Unlocking Your Company’s AI Future: Projects to Invest In Today Part 2

Universal User Identification and Generative AI

May 30, 2023

This article is part 2 in a series that discusses how startups can enhance their data foundations to get uniquely valuable insights from tools like generative AI. The last article walked through why building proprietary data sets is key and how to get started with first-party data collection. Today, we will delve into the backbone of any data strategy: the identification of users across various systems.

Part #2: Identify Users Early, Often and Consistently

The significance of assigning each user a unique, company-wide user ID cannot be overstated. It allows a company (and its AI systems) to quickly and accurately link users across systems and migrate them from one system to another. User records can then be connected and analyzed through their entire lifecycle at the company rather than sitting in the third-party data silos of individual applications.

If a user's journey across multiple systems through the acquisition and support funnel cannot be seamlessly tied together, the result is gaps in analyses and integrations. Those gaps lead to incorrect conversion rates and inaccurate insights that push the business to focus on the wrong thing, and to do so with confidence! All because users changed computers, browsers, or email addresses, or just got lost in business systems along the way. This issue can be avoided relatively easily and cheaply if it is addressed early in a startup's lifecycle, by making the unification of user IDs a part of the company’s culture.
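To make the conversion-rate problem concrete, here is a minimal sketch with made-up data. The records, field names and the `conversion_rate` helper are all assumptions for illustration: four users sign up on the website, two of them later purchase, and one of those two purchases with a different email address.

```python
# Hypothetical records from two systems. Every row carries both an email
# address and a universal user_id so the two join keys can be compared.
signups = [
    {"user_id": "u1", "email": "ana@personal.example"},
    {"user_id": "u2", "email": "ben@personal.example"},
    {"user_id": "u3", "email": "cy@personal.example"},
    {"user_id": "u4", "email": "di@personal.example"},
]
purchases = [
    {"user_id": "u1", "email": "ana@personal.example"},
    # Ben signed up with a personal address but purchased with a work one.
    {"user_id": "u2", "email": "ben@company.example"},
]


def conversion_rate(signups, purchases, key):
    """Share of signed-up users who also appear in purchases, joined on `key`."""
    signed_up = {row[key] for row in signups}
    purchased = {row[key] for row in purchases}
    return len(signed_up & purchased) / len(signed_up)


rate_by_email = conversion_rate(signups, purchases, "email")    # 0.25
rate_by_id = conversion_rate(signups, purchases, "user_id")     # 0.5
```

Joining on email reports half the true conversion rate in this toy example, and nothing in the output hints that a user was lost along the way.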

Universal User ID Acquisition Funnel Diagram

User identity resolution refers to the process of connecting a user's identity across multiple systems and tracking their activities over time. If the frontline systems that run the business don’t have consistent user identity information, then connecting the profile of a user from one system to another can become impossible. If user profiles can’t be built consistently across systems, the user journey is broken and the startup can’t troubleshoot bottlenecks and become more efficient. As generative AI drives down the cost of ad hoc analyses, this will become a competitive disadvantage for companies that don’t implement a universal user ID. Automated analyses will suffer from the same disconnects, leading to incorrect yet confident answers.

Universal user ID capabilities - Marc Stone

High-quality user identity processes serve as a significant force multiplier for all data and analytics projects within a company. They build momentum that compounds over time in agility and data quality. The divide between startups that have clean user identity implemented and those that don’t is subtle but pervasive. The gap shows up as a higher headcount of system administrators and data engineers than would otherwise be needed, which means more engineering effort goes to maintenance rather than to delivering net new projects.

💡 This image is a simplification, but it effectively illustrates how the cost of connecting systems escalates over time and how a small investment in maintaining a universal ID for the business yields compounding benefits.

One of the most effective ways to enable user identity resolution is by creating a universal user ID. This unique identifier can link all user-related data across various platforms and touch points.

To accomplish this, a company needs to create a persistent identifier that is unique to each user across all systems (even if they have multiple email addresses). The ID needs to be static and never change for the lifetime of the user, including when a lapsed user later returns.

Establishing a Universal User ID

The universal user ID should be stored in a lightweight, company-wide user database that is exposed through an API endpoint able to create or retrieve a user ID at any time. The goal of the service is to generate and resolve user IDs as far up the acquisition funnel as possible for new users (often before CRMs or product signups are involved).

There are many ways to create and track user IDs across the lifetime of a user, and they vary based on the size and type of company. I’ll show one example of a custom-built solution to illustrate how it works on a public website (which is one place where many user journeys get lost due to a lack of robust identification).

Universal User ID Endpoint for Website Form Submissions

When a new user is detected by the endpoint, a new user ID is created, stored and returned to the page for submission with the form. Future requests to the service for the same user will return the existing user ID.
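The create-or-retrieve behavior can be sketched in a few lines. This is only an illustration under my own assumptions, not a prescribed implementation: the `UserIdRegistry` class and its `get_or_create` method are hypothetical names, a random UUID stands in for the ID format, a normalized email address is used to recognize a returning user, and an in-memory dictionary stands in for the company-wide user database.

```python
import uuid
from typing import Dict, Tuple


class UserIdRegistry:
    """In-memory stand-in for the company-wide user ID database (assumption)."""

    def __init__(self) -> None:
        # Maps a normalized email address to its universal user ID.
        self._id_by_email: Dict[str, str] = {}

    @staticmethod
    def _normalize(email: str) -> str:
        # Treat case and surrounding whitespace as insignificant.
        return email.strip().lower()

    def get_or_create(self, email: str) -> Tuple[str, bool]:
        """Return (user_id, created). Known users get their existing ID back."""
        key = self._normalize(email)
        existing = self._id_by_email.get(key)
        if existing is not None:
            return existing, False
        # New user detected: create, store and return a fresh ID.
        user_id = str(uuid.uuid4())
        self._id_by_email[key] = user_id
        return user_id, True


registry = UserIdRegistry()
first_id, was_created = registry.get_or_create("Ada@Example.com")
second_id, _ = registry.get_or_create("ada@example.com")
# first_id and second_id are the same value; only the first call created it.
```

In a real deployment the dictionary would be a database table behind the API endpoint, and the lookup key could be whatever signal the company uses to recognize a user.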

The most important part is that the signup/form submission event must be submitted with both the email address and user_id data present to keep all downstream systems in sync.
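One way to enforce that rule is to refuse to build the event unless both values are present. The sketch below is an assumption on my part: the `build_form_event` function and the event field names are hypothetical and do not come from any particular vendor's schema.

```python
from datetime import datetime, timezone


def build_form_event(email: str, user_id: str, form_name: str) -> dict:
    """Build a form submission event that always carries email and user_id."""
    if not email or not user_id:
        # Sending only one of the two is how downstream systems drift apart.
        raise ValueError("both email and user_id are required")
    return {
        "event": "form_submitted",
        "form_name": form_name,
        "email": email,
        "user_id": user_id,
        "submitted_at": datetime.now(timezone.utc).isoformat(),
    }


event = build_form_event("ada@example.com", "u-123", "demo_request")
```

Every destination that receives this event can then store the `user_id` next to the email address on the record it creates.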

In this case, the API endpoint is called directly from the website/app as a client-side action. Alternatively, some behavioral data collection vendors can enrich the signup/form submission event with the user ID after submission and before passing it along to the destination system.

The same API endpoint should be connected to frontline systems like CRMs and support software so that the universal user ID is populated as a field on newly created user records in those systems as well. This way, as new users are collected across many systems, there is a central source of truth for user IDs that every system within the company can reference.

💡 If the company has a SaaS product then the product database itself can be used for this purpose if it can be extended to create/store users who have not signed up for the product yet but have identified themselves through web forms, support channels, sales processes, etc.

A Note on Email Addresses as Universal IDs

Email addresses are often chosen by startups for this purpose only to be abandoned later down the road, so I would like to save everyone some time here. Do not use email addresses as universal IDs. There are several reasons:

Email addresses are considered PII, which leads to greater regulatory scrutiny for every single system they are stored in. Have one system where PII can’t be stored? Now it’s disconnected from everything else, and a universal ID will need to be created to join the two. We’re right back where we started, except that it’s a couple of years later and implementing a universal ID is now 10x to 100x more expensive and painful.
People may change their email addresses over time, leading to multiple email addresses associated with the same user. This results in data fragmentation and difficulty in accurately linking user data across different systems. It may seem minor at first, but it quickly grows into long discrepancy research projects in which analysts and engineers spend days or weeks chasing down apparitions, work that produces no forward momentum for the business. The goal of this project is the opposite: to maintain agility and value creation for the business over time.
Users often have multiple email addresses (at least personal and company) and will often interact with some products using both at different times. This also fragments the user journey. It is better to allow multiple email addresses to be linked to a single user account, as is often seen in late-stage products for this exact reason.
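As a rough sketch of that last point, the ID can be generated independently of any email address, with email addresses stored as aliases that point at it. The `IdentityStore` class and its method names are hypothetical, and the in-memory dictionary again stands in for a real database table.

```python
import uuid
from typing import Dict, List, Optional


class IdentityStore:
    """Many email addresses may point at one universal user ID (illustrative)."""

    def __init__(self) -> None:
        # Alias table: normalized email address -> universal user ID.
        self._user_by_email: Dict[str, str] = {}

    @staticmethod
    def _normalize(email: str) -> str:
        return email.strip().lower()

    def create_user(self, email: str) -> str:
        """Create a user for a first-seen email, or return the existing ID."""
        key = self._normalize(email)
        if key in self._user_by_email:
            return self._user_by_email[key]
        user_id = str(uuid.uuid4())  # the ID is never derived from the email
        self._user_by_email[key] = user_id
        return user_id

    def link_email(self, user_id: str, email: str) -> None:
        """Attach an additional email address to an existing user."""
        key = self._normalize(email)
        owner = self._user_by_email.get(key)
        if owner is not None and owner != user_id:
            raise ValueError("email is already linked to a different user")
        self._user_by_email[key] = user_id

    def resolve(self, email: str) -> Optional[str]:
        """Return the user ID for an email address, or None if unknown."""
        return self._user_by_email.get(self._normalize(email))

    def emails_for(self, user_id: str) -> List[str]:
        """Return all email addresses linked to a user, sorted alphabetically."""
        return sorted(e for e, u in self._user_by_email.items() if u == user_id)


store = IdentityStore()
ada_id = store.create_user("ada@personal.example")
store.link_email(ada_id, "ada@company.example")
# Both addresses now resolve to the same universal user ID.
```

Because the ID lives apart from the email address, a changed or added address becomes a new row in the alias table rather than a new user, and systems that cannot hold PII can still store the ID on its own.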

Have any thoughts or feedback? Find me on LinkedIn!

Growth with Data
