Identity Graphs 101: MAID, HEM, CTV, Household

The phrase 'identity graph' has been used for everything from a simple email-to-MAID lookup table to a thousand-node cross-device resolution engine with confidence scoring. The resulting confusion has cost a lot of marketing teams real money, because the lowest-tier lookup tables are priced like full resolution engines and the full engines are priced like the post-cookie insurance policy they effectively are. A buyer who can't tell the difference ends up with the wrong tool for the job. This piece walks through what an identity graph actually does, which pieces are deterministic versus probabilistic, and where the economic value concentrates — with specific reference to the GSDSI Global Mobility & Location Data identity layer and its 200M+ US MAID-to-HEM links and 700M+ international entries across 150+ countries.

Key Takeaways

Three atomic units: HEM (hashed email), MAID (mobile ad ID), and CTV ID (household-level streaming ID). Every mature identity graph links all three.
Deterministic resolution (shared logins) carries 0.95+ confidence; probabilistic resolution (IP, device fingerprints, postal) carries 0.4–0.7 — the best graphs expose the score.
Household resolution is what unlocks CTV attribution — without it, the Roku impression and the mobile search can't be tied together.
The IAB Tech Lab's identity framework has standardized the vocabulary; buyers should require vendors to map their graph to it for apples-to-apples comparison.

The Atomic Units: HEM, MAID, CTV ID

Every identity graph is built on three primary key types. A hashed email address (HEM) is a one-way hash of an email the consumer supplied somewhere: a newsletter signup, a purchase confirmation, a login. A mobile advertising ID (MAID) is the identifier the device's operating system assigns for advertising and attribution purposes (IDFA on iOS, AAID on Android); a single consumer may have multiple MAIDs across phone, tablet, and work device. Apple's App Tracking Transparency framework governs IDFA availability on iOS and has materially reduced MAID supply on that platform since 2021. A connected-TV (CTV) ID is generated by the TV's ACR or the streaming platform's SDK; it represents a household rather than an individual, which has important measurement implications. A mature identity graph links all three key types, and often adds probabilistic IP-and-device fingerprints, postal-based household IDs, and cookie-pool bridges where they still exist.
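As a rough illustration of the HEM definition above, the sketch below hashes an email after normalizing it. The choice of SHA-256 and the trim-and-lowercase normalization are assumptions made for the example, not something the graph described here is confirmed to use; the function name `hashed_email` is hypothetical.

```python
import hashlib


def hashed_email(raw_email: str) -> str:
    """Return a SHA-256 hex digest of a normalized email address."""
    # Assumed normalization: strip surrounding whitespace and lowercase.
    # Both sides of a match must apply the same rules, or identical
    # emails will produce different hashes and fail to join.
    normalized = raw_email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Two spellings of the same address collapse to one HEM.
a = hashed_email("  Jane.Doe@Example.com ")
b = hashed_email("jane.doe@example.com")
c = hashed_email("john.doe@example.com")
```

The point of the normalization step is that the hash is one-way: a mismatch in casing or whitespace cannot be repaired after hashing, so it has to be settled before the list is sent to a vendor.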

Deterministic vs Probabilistic Resolution

Resolution is the work of saying 'these atomic IDs belong to the same person or household.' Deterministic resolution is the cleanest form: the graph has a record of the same consumer logging in with the same email on the mobile app (producing a HEM-to-MAID pair) and on the smart-TV app (producing a HEM-to-CTV-ID pair). That shared HEM is the deterministic bridge; confidence scores for these links are typically 0.95+ because a login event backs them. Probabilistic resolution fills the gaps where deterministic signal is missing, using IP address overlap, device-graph co-occurrence, and postal-code matching. Probabilistic links carry lower confidence (often 0.4–0.7), and every serious identity graph exposes the score so the buyer can pick a threshold that matches the use case's risk tolerance. Practical guidance on when each tier matters:

Suppression (don't send email to existing customers): 0.6+ is fine — a false positive costs one unsent email.
Audience targeting for brand campaigns: 0.7+ is the common threshold — false positives waste impressions but don't damage the brand.
Measurement and attribution: 0.85+ — false positives corrupt the ground truth the entire measurement program depends on.
Personalized direct-mail/outbound: 0.9+ and deterministic links only, because false positives produce real consumer complaints.
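The thresholds above can be applied as a simple filter over scored links. This is a minimal sketch that assumes each link carries a confidence score and a deterministic flag; the `Link` record, the use-case keys, and the sample data are hypothetical and not any vendor's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    source_id: str
    target_id: str
    confidence: float
    deterministic: bool


# Confidence floors taken from the guidance above; the keys are made-up labels.
MIN_CONFIDENCE = {
    "suppression": 0.6,
    "brand_targeting": 0.7,
    "measurement": 0.85,
    "direct_mail": 0.9,
}
# Use cases that additionally require a deterministic (login-backed) link.
DETERMINISTIC_ONLY = {"direct_mail"}


def usable_links(links, use_case):
    """Keep only the links that are safe to use for the given use case."""
    floor = MIN_CONFIDENCE[use_case]
    kept = []
    for link in links:
        if link.confidence < floor:
            continue
        if use_case in DETERMINISTIC_ONLY and not link.deterministic:
            continue
        kept.append(link)
    return kept


# Illustrative data only; the scores are not drawn from a real graph.
links = [
    Link("hem_1", "maid_a", 0.97, True),
    Link("hem_1", "ctv_x", 0.65, False),
    Link("hem_2", "maid_b", 0.92, False),
    Link("hem_3", "maid_c", 0.45, False),
]
for_suppression = usable_links(links, "suppression")
for_direct_mail = usable_links(links, "direct_mail")
```

Keeping the floors in one table means the same graph export can serve several programs, each reading only the links its risk tolerance allows.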

The Household Resolution Layer

The household-resolution layer sits on top of the individual graph and answers a different question: which of these people live together? Postal-address matching anchors it; home IP co-occurrence and device co-location signals strengthen it; household panels like Comscore or Nielsen provide the ground truth for validation. Household resolution is what unlocks CTV attribution — the ad impression on the Roku in the living room needs to be tied to the mobile search that happened at the same household's kitchen table three hours later. Without the household layer, the buyer sees a CTV-ID impression and a MAID search event and has no way to connect them. With it, cross-channel measurement becomes tractable, attribution becomes honest, and the walled-garden premium the buyer used to pay for attribution inside a single platform's ecosystem stops being necessary.
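A toy version of the Roku-to-mobile example shows what the household layer contributes. The sketch assumes the layer has already produced a device-to-household mapping; `HOUSEHOLD_OF`, the IDs, and the 24-hour lookback window are all hypothetical choices for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical household-layer output: device-level ID -> household ID.
HOUSEHOLD_OF = {
    "ctv_roku_1": "hh_100",
    "maid_phone_1": "hh_100",
    "maid_phone_2": "hh_200",
}


def attribute(impressions, conversions, window=timedelta(hours=24)):
    """Pair each conversion with earlier CTV impressions in the same household."""
    matches = []
    for conv_id, conv_time in conversions:
        conv_household = HOUSEHOLD_OF.get(conv_id)
        if conv_household is None:
            continue  # no household link, so there is nothing to connect
        for imp_id, imp_time in impressions:
            same_household = HOUSEHOLD_OF.get(imp_id) == conv_household
            # The conversion must follow the impression, within the window.
            in_window = timedelta(0) <= conv_time - imp_time <= window
            if same_household and in_window:
                matches.append((imp_id, conv_id, conv_household))
    return matches


impressions = [("ctv_roku_1", datetime(2026, 1, 10, 18, 0))]
conversions = [
    ("maid_phone_1", datetime(2026, 1, 10, 21, 0)),  # same household, 3h later
    ("maid_phone_2", datetime(2026, 1, 10, 21, 0)),  # different household
]
matches = attribute(impressions, conversions)
```

Without the mapping, both conversions look identical to the buyer; with it, only the search from the household that saw the ad is credited.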

Evaluating an Identity Graph

For a buyer evaluating identity-graph products, three practical considerations matter more than the marketing slides. First: scale against the specific audience. A graph with 200 million US individual IDs is average; a graph with 200 million+ US MAID-to-HEM links plus 700 million+ international entries across 150+ countries is a rare commercial asset that supports actual cross-border activation. Second: match rate against the buyer's CRM. The buyer should send a hashed customer list and measure how many of those customers the graph actually resolves, across which IDs, at which confidence levels. Third: activation surface. A graph that only ships as a flat file is less valuable than a graph that integrates into the buyer's DSP, CDP, or clean room. The winning identity-graph deployment in 2026 looks like a deterministic-first graph, with probabilistic links at disclosed confidence scores, a household layer validated against panel data, and activation hooks into every channel the buyer runs, not a data file sitting in an S3 bucket waiting to be joined. The parallel diligence framework on B2B contact database evaluation covers a similar procurement discipline from the other side of the identity stack.
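The match-rate test described above can be sketched as a small report: for each ID type, what share of the CRM resolves at or above a chosen confidence floor. The tuple layout `(hem, id_type, confidence)` and the sample values are assumptions for illustration, not a real vendor export format.

```python
def match_report(crm_hems, graph_links, min_confidence):
    """Share of CRM HEMs resolved to each ID type at or above min_confidence."""
    crm = set(crm_hems)
    if not crm:
        return {}
    resolved = {}  # id_type -> set of CRM HEMs with at least one qualifying link
    for hem, id_type, confidence in graph_links:
        if hem in crm and confidence >= min_confidence:
            resolved.setdefault(id_type, set()).add(hem)
    return {id_type: len(hems) / len(crm) for id_type, hems in resolved.items()}


# Illustrative data: four CRM customers, five links returned by the graph.
crm_hems = ["h1", "h2", "h3", "h4"]
graph_links = [
    ("h1", "MAID", 0.97),
    ("h1", "CTV", 0.60),
    ("h2", "MAID", 0.55),
    ("h3", "MAID", 0.96),
    ("h9", "MAID", 0.99),  # not in the CRM, so it does not count
]
any_match = match_report(crm_hems, graph_links, min_confidence=0.0)
reliable_match = match_report(crm_hems, graph_links, min_confidence=0.85)
```

Running the report at several floors makes the gap between 'a match somewhere' and 'a reliable match' visible: in this toy data the MAID match rate drops from 75% to 50% once links below 0.85 are excluded.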

Frequently Asked Questions

What's the practical difference between deterministic and probabilistic identity resolution?

Deterministic uses a shared authentication event (same login, same email) as the bridge between two IDs, with confidence of 0.95+. Probabilistic uses co-occurrence signals (same IP, same device graph cluster, same postal), with confidence of 0.4–0.7. Every identity graph combines both. The buyer's job is to pick a confidence threshold that matches each use case's risk tolerance, not to pretend one is strictly better than the other.

Is a CTV ID a person or a household?

A household. CTV IDs are generated at the TV or streaming-app level and represent everyone who watches that device. That is why household resolution is so important in CTV: without mapping the CTV ID to the person-level IDs that share the household, the buyer can't attribute a mobile conversion to a specific CTV exposure.

How does iOS ATT affect MAID-based identity graphs?

Apple's App Tracking Transparency reduced IDFA availability on iOS materially. Identity graphs compensate with deterministic HEM bridges (login events still produce HEMs even without IDFA) and household-level probabilistic links (IP + device graph). The effect on graph scale is concentrated on iOS; Android AAID availability is far less affected.

What match rate should a buyer expect when testing an identity graph against their CRM?

For a US B2C brand with a solid first-party database, 60–85% resolution into additional IDs (HEM → MAID, HEM → CTV ID) is the realistic range. International CRMs see lower match rates (30–60%) because the cross-border graph coverage is thinner. Vendors claiming 95%+ across any CRM without confidence-score disclosure are conflating 'we found a match somewhere' with 'we found a reliable match.'