Distance-matrix feeds — precomputed pairwise distances between points of interest — are one of the most quietly useful geospatial products in retail, CRE, and CPG analytics. They power site selection (where should I open next?), cannibalization modeling (how much of the new store's traffic comes from the existing one?), and competitive catchment analytics (whose trade area does my site overlap?). The math is elementary; the use-cases are decisive. This piece is the working reference — what a distance-matrix feed is, the Euclidean-versus-drive-time tradeoff, which use-cases want which distance metric, and how buyers procure and validate these feeds. For the POI-layer framing underneath, see POI data quality in depth.
Key Takeaways
A Euclidean distance feed precomputes straight-line distances between every relevant POI pair — cheap to compute, cheap to store, instantly queryable.
Euclidean is right for strategic analytics (site selection screen, competitive-catchment heatmap, cannibalization ranking) because it is direction-agnostic and doesn't require network routing.
Drive-time distance is right for tactical analytics (exact trade-area polygons, last-mile logistics, delivery feasibility) but is expensive, routing-API-dependent, and changes with traffic.
The most common procurement mistake is asking for drive-time when Euclidean would suffice — the analytic conclusions are almost always identical, and drive-time costs 50–100x more to compute at scale.
Pairwise-distance feeds enable O(1) cannibalization and overlap queries at run time because the compute has already been amortized offline — the architectural reason they exist as a product.
What a Euclidean Feed Actually Is
A Euclidean feed (sometimes called a pairwise-distance feed or POI-adjacency matrix) is a precomputed dataset of straight-line distances between POI records. For a catalog of N POIs, the full pairwise matrix is N² entries; most feeds are filtered to nearest-K neighbors per source POI (typically K=50 to K=500) so the storage is linear in N rather than quadratic. GSDSI's Euclidean Feed ships a nearest-K structure with distance in meters, NAICS and category filters, and POI identifiers that join back to POI & Geofencing. The core computation is the haversine formula applied to POI centroids — a great-circle distance on a sphere approximating the WGS84 ellipsoid, accurate to within a fraction of a percent of the true ellipsoidal distance at retail-relevant scales.
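The core computation is compact enough to sketch directly. The snippet below is a minimal illustration of the haversine distance and a brute-force nearest-K pass, not GSDSI's production implementation; the `id`/`lat`/`lon` field names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation of WGS84

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def nearest_k(source, pois, k=50):
    """Nearest-K neighbors of `source` by straight-line distance.

    Brute force for clarity; a production build would use a spatial index.
    Returns [(distance_m, poi_id), ...] sorted ascending.
    """
    ranked = sorted(
        (haversine_m(source["lat"], source["lon"], p["lat"], p["lon"]), p["id"])
        for p in pois
        if p["id"] != source["id"]
    )
    return ranked[:k]
```

Running this once per source POI, offline, and persisting the top-K rows is the whole feed-build pattern in miniature.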
What the feed enables: instead of computing distances at query time ("which competitors are within 5 miles of this proposed site?"), the feed answers in O(1) because the nearest-K competitors are already indexed. For site-selection and portfolio-level analytics — where a typical query fans out across hundreds of candidate sites and thousands of competitive POIs — this architectural shift is the difference between a responsive dashboard and a 20-minute batch job.
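The query-time payoff can be sketched as a plain dictionary lookup. This is an illustrative shape only, assuming feed rows of `(source_id, neighbor_id, distance_m, naics)` pre-sorted by distance; it is not GSDSI's delivery schema.

```python
from collections import defaultdict

def build_index(pairs):
    """Build the nearest-K index once, offline.

    pairs: iterable of (source_id, neighbor_id, distance_m, naics),
    pre-sorted by ascending distance within each source.
    """
    index = defaultdict(list)
    for src, nbr, dist, naics in pairs:
        index[src].append((dist, nbr, naics))
    return index

def competitors_within(index, poi_id, radius_m, naics_prefix):
    """'Which competitors are within 5 miles?' as an index lookup, not a scan."""
    return [
        nbr
        for dist, nbr, naics in index.get(poi_id, [])
        if dist <= radius_m and naics.startswith(naics_prefix)
    ]
```

At run time the dashboard never touches raw coordinates: each candidate site resolves to its pre-ranked neighbor list in constant time, and the filter over K rows is trivial.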
Euclidean vs Drive-Time: When Each Is Right
The recurring question: should I use Euclidean (straight-line) or drive-time distance? The answer depends on what you are measuring.
Euclidean is direction-agnostic, network-independent, and computes in constant time. It is right for analytics where relative distance matters more than precise travel time — site-selection screens, cannibalization ranking, competitive-catchment counts, CPG-retailer cross-mapping.
Drive-time incorporates the road network, traffic, and one-way streets. It is right for tactical analytics where actual travel matters — exact trade-area polygons, last-mile logistics optimization, delivery feasibility, customer-origin mapping for local marketing.
Drive-time is strictly more expensive: computing N² or nearest-K drive-times requires either a licensed routing API (Google Maps, Mapbox, HERE) with per-request billing, or a self-hosted router with significant compute cost. For a catalog of 10M POIs, full pairwise drive-time is infeasible; nearest-50 drive-time is expensive enough that buyers typically compute it only for a working subset.
The working heuristic for retail and CRE analytics: use Euclidean for strategic screens (which markets are underserved? which competitors threaten this site?) and drive-time only for tactical decisions (what is the actual 15-minute trade area of this specific store?). For most portfolio-level questions the Euclidean answer and the drive-time answer correlate above 0.95 at retail-relevant distances; the cost-benefit is decisively Euclidean.
Site Selection and Cannibalization
The canonical use-case. A retailer evaluating a new store location wants to know: which competitors are already present, at what density, and which existing own-brand stores would the new location cannibalize. Both questions are pairwise-distance queries. With a Euclidean feed: rank own-brand stores by distance to candidate site, take the nearest K, overlay foot-traffic data to estimate cannibalization share. Rank competitors by distance and NAICS, take the nearest K, overlay category share-of-wallet data to estimate competitive pressure. The entire analysis runs at interactive latency when the distance compute has been amortized offline. GSDSI's Trade Area Explorer in Free Tools demonstrates this pattern in a lightweight form.
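The "rank by distance, overlay foot traffic" step can be sketched with a simple gravity-style weighting — an illustrative model choice on our part (weight proportional to visits over squared distance), not GSDSI's cannibalization methodology. Inputs are assumed to come from the Euclidean feed and a foot-traffic overlay.

```python
def cannibalization_shares(neighbors, visits, k=5):
    """Estimate which existing stores a candidate site would draw from.

    neighbors: [(distance_m, store_id), ...] for own-brand stores,
               sorted ascending, as delivered by a nearest-K feed.
    visits:    {store_id: annual_visits} from a foot-traffic overlay.
    Returns {store_id: share} over the nearest k stores, where share
    is an illustrative gravity weight: visits / distance^2, normalized.
    """
    top = neighbors[:k]
    weights = {sid: visits.get(sid, 0) / max(d, 1.0) ** 2 for d, sid in top}
    total = sum(weights.values()) or 1.0
    return {sid: w / total for sid, w in weights.items()}
```

The point of the sketch is the shape of the analysis: with distances precomputed, the only run-time work is a k-row weighting, which is why the query stays interactive.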
Competitive Catchment and Trade-Area Overlap
Landlords, REITs, and retail chains run recurring competitive-catchment analyses to understand which trade areas overlap. The computation: for each POI of interest, find the nearest-K POIs of competitor brands (via NAICS or explicit brand filter), then overlay foot-traffic origin-destination data from Global Mobility & Location Data to measure shared customer draw. The Euclidean feed is the indexing backbone — without it, every catchment query would recompute distances at run time, and portfolio-scale analyses become impractical. NAIOP research on trade-area analytics consistently identifies catchment overlap as one of the higher-leverage quantitative inputs to CRE leasing and underwriting decisions.
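The overlay step reduces to a set-intersection over visitor origins. A minimal sketch, assuming anonymized visitor or home-area identifiers per POI from an origin-destination dataset; the overlap metric shown (fraction of A's visitors also seen at B) is one common choice, not the only one.

```python
def catchment_overlap(visitors_a, visitors_b):
    """Shared customer draw: fraction of A's visitors also observed at B.

    visitors_a, visitors_b: sets of anonymized visitor or home-area IDs
    drawn from origin-destination mobility data.
    Note: this measure is asymmetric; overlap(A, B) != overlap(B, A) in general.
    """
    if not visitors_a:
        return 0.0
    return len(visitors_a & visitors_b) / len(visitors_a)
```

The Euclidean feed decides *which* competitor pairs are worth measuring (nearest-K by brand or NAICS); the mobility data supplies the visitor sets for only those pairs, keeping the portfolio-scale job tractable.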
Euclidean Feed Procurement Diagnostics
A short checklist for buyers evaluating distance-matrix feeds:
What is the K in nearest-K, and is it configurable per source POI? K=50 is a typical minimum for retail analytics; K=200+ for dense-urban work.
Are distances in meters, with documented precision? Published feeds should ship WGS84 haversine with at least 1-meter precision.
Do the POI identifiers join cleanly back to a production POI & Geofencing catalog with NAICS and brand metadata? A distance feed without clean joins is analytically dead weight.
What is the refresh cadence, and how are POI opening and closing events propagated? A distance feed that lags the POI catalog by six months will silently reference closed stores.
Is there a drive-time companion feed for tactical use-cases, or does the buyer own drive-time enrichment separately?
Buyers who license distance-matrix feeds and find the analytics underperforming almost always have a POI-join failure (identifiers don't match the production POI catalog) or a refresh-lag problem (stale POIs in the distance matrix). Both are diagnosable in minutes with a join-audit script and a sample-size check; both are procurement-quality tells.
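A join audit of the kind described above fits in a few lines. This is a generic sketch against abstract ID lists, not a GSDSI deliverable; the report keys are hypothetical.

```python
def join_audit(distance_feed_ids, poi_catalog_ids, sample=10):
    """Measure how cleanly a distance feed joins back to the production POI catalog.

    distance_feed_ids: POI identifiers referenced anywhere in the distance feed.
    poi_catalog_ids:   identifiers in the live POI catalog.
    Returns a small report: a match rate near 1.0 means clean joins; a low
    rate points at the identifier-mismatch failure mode described above.
    """
    feed = set(distance_feed_ids)
    catalog = set(poi_catalog_ids)
    matched = feed & catalog
    return {
        "feed_ids": len(feed),
        "match_rate": len(matched) / len(feed) if feed else 0.0,
        "orphans": sorted(feed - catalog)[:sample],  # sample of unjoinable IDs
    }
```

Orphan IDs concentrated in recent vintages usually indicate refresh lag (the feed references POIs the catalog has since retired); orphans spread evenly usually indicate an identifier-scheme mismatch.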
Frequently Asked Questions
When should I use Euclidean distance vs drive-time?
Use Euclidean for strategic analytics — site-selection screens, competitive-catchment counts, cannibalization ranking — because it is direction-agnostic, network-independent, and computes in constant time. Use drive-time for tactical decisions where actual travel matters — exact trade areas, last-mile logistics, delivery feasibility. Drive-time is 50–100x more expensive to compute at scale; for most portfolio-level questions the Euclidean answer and the drive-time answer correlate above 0.95.
What is a "nearest-K" distance feed?
A distance-matrix structure that stores, for each source POI, only the K closest neighbors rather than the full pairwise N² matrix. K is typically configured at 50 to 500 depending on use-case — K=50 is fine for most retail analytics; K=200+ is needed for dense-urban work or very granular cannibalization modeling. Nearest-K trades completeness for linear storage; most analytics workloads don't need more than the top few hundred neighbors.
How does a Euclidean feed integrate with foot-traffic analytics?
The distance feed is the indexing backbone — it answers "which POIs are near which other POIs?" at O(1). Global Mobility & Location Data provides the visit signal — "who visits what?". Combining them enables cannibalization modeling (new store draws X% of its traffic from existing Y), catchment overlap (my trade area shares Z% of customers with competitor W), and origin-destination analytics at portfolio scale. Neither signal alone supports these use-cases; both together are decisive.
Why not compute distances at query time instead of using a precomputed feed?
Latency and cost. For a portfolio-level analysis spanning hundreds of candidate sites and thousands of competitive POIs, computing distances at query time — especially drive-time distances requiring a routing API — is the difference between an interactive dashboard and a multi-minute batch job. A precomputed feed amortizes the cost offline and delivers O(1) queries at run time. See GSDSI's Euclidean Feed for the production version of this pattern.