Identity Resolution: Data Warehouse vs. Customer Data Platform

Learn how identity resolution occurs in data warehouses vs. customer data platforms and which one is right for your business.
Aug 15, 2022

10 min read

Arpit Choudhury

Founder, astorik

Everybody wants a single source of truth for customer data, but what it entails depends on who you’re asking.

Sure, the data warehouse is a “single store” for customer data collected across multiple sources; however, in the absence of identity resolution, the data is only half-true. Building a unified view of customer activity from the data is anything but trivial—those tasked with it can attest to the complexities involved in getting it right.

Moreover, the definition of identity resolution also varies from business to business—for certain industries, solving for identity resolution is a subset of a broader entity resolution problem.

Identity resolution, as the name suggests, refers to the identity of a person—an individual user or customer who is one of the several entities that a business deals with. Some of the others are accounts, products, suppliers, vendors, partners, and resellers.

In this guide though, I want to delve a little deeper into identity resolution and describe the systems where it takes place, the differences between automated and manual identity resolution, and the benefits of deterministic over probabilistic matching.

Identity resolution: Where and how it happens

Identity resolution, as you probably already know, is the process of unifying user (or customer) records that are captured across multiple sources (or touchpoints).

But where does this process take place? Who performs the unification? How is the data captured and stored? And what are the prerequisite data points to make it all possible?

It’s important to have answers to these questions before investing in an identity resolution endeavor.

Data warehouse (DWH)

Bill Inmon, known as the father of the data warehouse, recently wrote an article titled “What A Data Warehouse Is Not” where he debunks popular myths regarding what a data warehouse is—it’s a fascinating read and I highly recommend it if you want to gain a deeper understanding of what’s happening in the world of data warehousing.

The data warehouse, in its typical form, is a cloud database that stores customer data from disparate sources and is used for analytic workloads.

Before identity resolution can happen, data from first-party sources (apps, websites, or smart devices) has to be made available in the data warehouse, which is typically done using an internal or external customer data infrastructure (CDI) solution. What data is collected and how it is stored both matter, because identity resolution relies on a set of identifiers (IDs) that are used to match and merge user records originating across multiple sources.

Writing the unification code

The process of unifying or merging records starts once the requisite data is made available in the warehouse. This is typically done by analysts who have a good understanding of the datasets and are adept at writing SQL queries that perform complex joins across tables and store the results as new tables or materialized views. These unified tables then serve as the source of truth that is used for analysis and activation.
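To make the idea concrete, here is a minimal sketch of a unification query, run against an in-memory SQLite database from Python so that it is self-contained. The table names (`app_users`, `crm_contacts`, `unified_users`), the columns, and the sample rows are all hypothetical, and a real warehouse would have many more sources and identifiers to reconcile.

```python
import sqlite3


def demo_connection() -> sqlite3.Connection:
    """Create two hypothetical source tables with a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE app_users (user_id TEXT, email TEXT);
        CREATE TABLE crm_contacts (contact_id TEXT, email TEXT, plan TEXT);
        INSERT INTO app_users VALUES
            ('u1', 'Ana@example.com'),
            ('u2', 'bo@example.com');
        INSERT INTO crm_contacts VALUES
            ('c9', 'ana@example.com', 'pro');
        """
    )
    return conn


def build_unified_users(conn: sqlite3.Connection) -> list[tuple]:
    """Join the sources on a normalized email and store the result as a table."""
    conn.executescript(
        """
        DROP TABLE IF EXISTS unified_users;
        CREATE TABLE unified_users AS
        SELECT
            lower(a.email) AS email,
            a.user_id      AS app_user_id,
            c.contact_id   AS crm_contact_id,
            c.plan         AS plan
        FROM app_users AS a
        LEFT JOIN crm_contacts AS c
            ON lower(a.email) = lower(c.email);
        """
    )
    return conn.execute(
        "SELECT email, app_user_id, crm_contact_id, plan "
        "FROM unified_users ORDER BY app_user_id"
    ).fetchall()


rows = build_unified_users(demo_connection())
```

The `LEFT JOIN` keeps app users that have no CRM record, and lowercasing the email before comparing avoids missing a match because of casing alone.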

Probabilistic vs. deterministic matching

In the absence of identifiers such as email, mobile number, device ID, and user ID, or the ability to join them accurately due to other factors, one has to resort to what is referred to as probabilistic matching, which relies on signals rather than personally identifiable information (PII).

Also known as fuzzy matching, probabilistic matching looks for a combination of user properties such as name, location, operating system, IP address, etc. to then merge records when the potential match receives an acceptable score.
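A rough sketch of that scoring idea in Python follows. The field names, weights, and threshold are illustrative assumptions, not values from any particular product, and the string similarity comes from the standard library's `difflib.SequenceMatcher` rather than a purpose-built matching model.

```python
from difflib import SequenceMatcher

# Illustrative weights per user property; they sum to 1.0.
WEIGHTS = {"name": 0.5, "city": 0.2, "os": 0.1, "ip": 0.2}
# Hypothetical cut-off above which two records are treated as one person.
THRESHOLD = 0.8


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(rec_a: dict, rec_b: dict) -> float:
    """Weighted sum of per-field similarities; missing fields add nothing."""
    score = 0.0
    for field, weight in WEIGHTS.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a and b:
            score += weight * similarity(a, b)
    return score


def is_probable_match(rec_a: dict, rec_b: dict, threshold: float = THRESHOLD) -> bool:
    """Merge candidates only when the score clears the threshold."""
    return match_score(rec_a, rec_b) >= threshold
```

Where the threshold sits is the trade-off: set it lower and more records merge, including some that belong to different people.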

In simple terms, probabilistic matching is more flexible but is not 100% accurate. It makes sense to employ it for critical use cases such as fraud detection where the datasets are large and complex; however, it’s not recommended if your goal is to build data-powered personalized experiences.

Deterministic matching is more accurate simply because there’s no “guesswork” involved—it’s a 0 or 1 scenario based on the available identifiers. The benefits of this approach are covered below.
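As a minimal sketch of the deterministic approach, the Python below groups records that share an exact identifier value, using a small union-find structure so that chains of shared identifiers end up in one group. The identifier fields and the record shape are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical identifier fields; a record may carry any subset of them.
ID_FIELDS = ("email", "phone", "device_id", "user_id")


def resolve_identities(records: list[dict]) -> list[list[int]]:
    """Return groups of record indexes that share at least one exact identifier."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        # Follow parent links to the group root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    first_seen: dict[tuple[str, str], int] = {}
    for idx, rec in enumerate(records):
        for field in ID_FIELDS:
            value = rec.get(field)
            if not value:
                continue
            key = (field, str(value).strip().lower())
            if key in first_seen:
                union(idx, first_seen[key])  # exact match: same person
            else:
                first_seen[key] = idx

    groups = defaultdict(list)
    for idx in range(len(records)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())
```

There is no score and no threshold here: two records either share an identifier value or they do not.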

I’m hoping that you now have a fair understanding of how identity resolution is handled in the data warehouse. It’s time to understand how it’s done by CDPs.

Read my guide with Amplitude on Behavioral Data & Event Tracking to learn more about laying your data foundation.

Customer data platform (CDP)

I wanted to link to an article describing what a CDP is not (here’s what a CDP is), but unfortunately I couldn’t find one. So first, a quick note: a CDP is not a CDI, nor is it a CRM.

In essence, a customer data platform is, well, a platform on top of customer data infrastructure—the platform enables folks to segment and sync audiences with third-party tools using a visual interface.

So where does identity resolution take place and how?

Generally speaking, it takes place at the time of, or soon after, data collection. Under the hood, a CDP stores a copy of the data and automatically performs deterministic matching based on the supplied identifiers.
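The sketch below illustrates the general idea of resolving identity at collection time: events from an unknown device are held under an anonymous profile and folded into the known profile as soon as an event arrives with a user ID. The `IdentityGraph` class and its method names are hypothetical and do not represent any vendor's API.

```python
from typing import Optional


class IdentityGraph:
    """Toy in-memory store that stitches anonymous activity to known users."""

    def __init__(self) -> None:
        self.device_to_user: dict[str, str] = {}
        self.events_by_profile: dict[str, list[str]] = {}

    def track(self, device_id: str, event: str, user_id: Optional[str] = None) -> str:
        """Record an event and return the profile it was attributed to."""
        if user_id is not None:
            self._identify(device_id, user_id)
        profile = self.device_to_user.get(device_id, f"anon:{device_id}")
        self.events_by_profile.setdefault(profile, []).append(event)
        return profile

    def _identify(self, device_id: str, user_id: str) -> None:
        """Link the device to the user and move earlier anonymous events over."""
        self.device_to_user[device_id] = user_id
        anonymous_events = self.events_by_profile.pop(f"anon:{device_id}", [])
        self.events_by_profile.setdefault(user_id, []).extend(anonymous_events)


graph = IdentityGraph()
graph.track("d1", "page_view")               # attributed to "anon:d1"
graph.track("d1", "signup", user_id="u1")    # device d1 is now linked to u1
```

After the second call, both events sit under the `u1` profile, which is the unified view that downstream segmentation would read from.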

As mentioned earlier, personally identifiable information (PII) plays a key role in enabling deterministic matching and offers a high level of accuracy—an integrated system to collect the data and perform the unification is what makes a CDP appealing.

Some CDP vendors have taken the probabilistic route and tout their offerings as superior. Instead of detailing the downsides of probabilistic matching, I’d like to highlight some of the key benefits of deterministic matching.

Deterministic identity resolution: Key benefits

Personalization is the holy grail for SaaS and ecommerce businesses, but when it goes wrong or is ill-timed, it can prove more detrimental than no personalization at all.

Deterministic identity resolution not only ensures accurate personalization at scale but also enables businesses to be more privacy-friendly and adhere to regulations more strictly. Allow me to unpack this.

Personalization

Since deterministic identity resolution takes place only when the system is able to match user records on identifiers provided by the user directly (typically email or phone number), it’s highly unlikely that a personalized message will reach the wrong person.

Additionally, timeliness is ensured since CDPs are able to automatically perform identity resolution at the time of data collection.

A simple use case that applies to most SaaS businesses is to send a highly personalized welcome email to users—almost immediately after they sign up—that also takes into account other user attributes such as location, industry, or preferences.

SaaS businesses typically allow a user to create multiple accounts or workspaces, but sending the same standard welcome email to an existing user makes little sense. Deterministic identity resolution coupled with pre-defined segmentation and real-time syncing can ensure that the user is not treated as a new user and that the communication they receive reflects that.
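A small Python sketch of that decision, assuming the set of already-known emails comes from the resolved user profiles. The function name and the template names are made up for illustration.

```python
def choose_welcome_template(email: str, known_emails: set[str]) -> str:
    """Pick an email template based on whether the address is already known."""
    normalized = email.strip().lower()
    if normalized in known_emails:
        # Existing user creating another account or workspace.
        return "new_workspace_existing_user"
    known_emails.add(normalized)
    return "welcome_new_user"


known = {"ana@example.com"}
template = choose_welcome_template("Ana@Example.com", known)
```

Here `template` is `"new_workspace_existing_user"`, because the address matches a known one once casing and whitespace are normalized.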

A broader example that applies to pretty much all industries is to notify users when they log into their account on a new device or in an unrecognized location. Since the system already has the user ID associated with a specific IP address and device ID, it is able to immediately recognize unknown patterns and notify the user in real time.
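One way to sketch this check in Python is shown below. It assumes the IP address stands in for location and that a user's first-ever login should not trigger a notification; the `LoginMonitor` class is hypothetical.

```python
from collections import defaultdict


class LoginMonitor:
    """Remember the (device, IP) pairs seen per user and flag unfamiliar ones."""

    def __init__(self) -> None:
        self.known: dict[str, set[tuple[str, str]]] = defaultdict(set)

    def check_login(self, user_id: str, device_id: str, ip_address: str) -> bool:
        """Return True when the user should be notified about this login."""
        seen = self.known[user_id]
        is_first_login = not seen
        known_devices = {device for device, _ in seen}
        known_ips = {ip for _, ip in seen}
        unrecognized = device_id not in known_devices or ip_address not in known_ips
        seen.add((device_id, ip_address))
        # The very first login has nothing to compare against, so stay quiet.
        return unrecognized and not is_first_login
```

This only works because every login is tied to a resolved user ID; without that link there would be no history to compare the new device against.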

Privacy-friendly

Nobody needs a lesson in why a privacy-friendly approach is critical for businesses—the ramifications of not adhering to GDPR or CCPA can be brutal.

With deterministic matching, brands can be certain that if a user has opted out of receiving communication or wants to be forgotten, they are accurately identified across downstream systems—email, SMS, advertisement channels, and so on—and their data is wiped clean from everywhere.
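The sketch below shows why the resolved set of identifiers matters for a deletion request. It models each downstream system as a plain dictionary keyed by whatever identifier that system uses, which is a simplifying assumption; real systems expose their own deletion APIs.

```python
def forget_user(
    identifiers: set[str],
    downstream: dict[str, dict[str, dict]],
) -> dict[str, int]:
    """Delete every record keyed by one of the user's identifiers.

    Returns the number of records removed from each downstream system.
    """
    removed = {}
    for system, records in downstream.items():
        matches = [key for key in records if key in identifiers]
        for key in matches:
            del records[key]
        removed[system] = len(matches)
    return removed


# Hypothetical downstream systems, each keyed by a different identifier.
systems = {
    "email_tool": {"ana@example.com": {"subscribed": True}},
    "sms_tool": {"+15550100": {"opted_in": True}},
    "ads_tool": {"d1": {"segment": "trial"}, "d7": {"segment": "paid"}},
}
counts = forget_user({"ana@example.com", "+15550100", "d1"}, systems)
```

If the email, phone number, and device ID had not been resolved to the same person beforehand, the request would reach only the system that happens to use the identifier the user supplied.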

Achieving this level of compliance in the absence of a CDP with deterministic identity resolution capabilities is far from trivial and can result in multiple violations along the way.

Which form of identity resolution is right for you?

The goal of this guide is to provide an overview of how identity resolution is achieved in different environments under different constraints, and hopefully, I’ve managed to do that.

These tips and suggestions are better suited for the realm of product, growth, and marketing use cases, primarily at B2B SaaS companies. Moreover, this piece is not meant to conclude that one approach is better than the other, and based on certain factors, managing identity resolution in the data warehouse using fuzzy matching might work better for some businesses after all.

Learn more about identity resolution in the Amplitude CDP by speaking with a product expert.

About the author
Arpit Choudhury

Founder, astorik

Arpit is growing databeats (databeats.community), a B2B media company, whose mission is to beat the gap between data people and non-data people for good.
