The Six Moats of Data Businesses

7 min readOct 11, 2023

There are thousands of data businesses across industries, across a complex value chain. Some companies advertise that they curate data better than others, some advertise that they have exclusive access to X data set, some advertise that they offer the best analytics, and some advertise that they are the easiest to use.

The vast majority of these businesses do not go on to become large successes — even if they execute well and build a great data set — because they do not have a defensible moat.

This post walks through the different moats that exist in data businesses — and how a few dozen companies found defensibility and became $1+ billion data businesses over time.

Features That Aren’t Moats

Before jumping into what moats exist, I’ll outline some features that companies focus on that are not defensible:

Better curation/quality/coverage/etc.
Data that is easier to use (great API, easy pricing, better formatting, etc.)
Speed of data collection
Better analytics
Dashboards & UIs

All of these may improve the customer’s experience, and they may be very useful features — but they are not defensible enough to create a large business around (despite hundreds of companies trying to use these to differentiate). Ideally, companies may use some of these features to smooth their sales cycles on their paths to establishing a moat, but these are not sufficient themselves as they are easily replicable.

The Most Effective Moat: Data Currencies

Almost every data business that is a $1+ billion enterprise has a data currency aspect to its model. A data currency is used by two (or more) parties that rely on a particular data set to complete a transaction; meanwhile the data company that controls the currency takes a tax on its use.

Once established, these businesses have massive network effects: the existence of the currency unlocks a large number of transactions, and it is difficult for any company to stop using the currency, because that’s what everyone else is using.

Examples of $1+ billion companies that have become data currencies:

D&B — the DUNS number is routinely used as the identifier for businesses between a variety of parties, establish a currency that is controlled by D&B
Fair Isaac — the FICO score is used as a currency between borrowers and financial institutions to help determine eligibility for a loan
Standard & Poor’s –S&P and other ratings agencies create a score used by third parties to determine the creditworthiness of companies, shaping their contracts with lenders
Nielsen — when Nordstrom buys a TV ad from NBC, the contract requires that X million viewers see the ad. Nordstrom and NBC don’t know how many viewers saw the ad directly, so Nielsen is used to adjudicate the contract.
Datavant — the Datavant key is used between parties in healthcare as a common patient identifier, so two parties know if they have information about the same patient
IQVIA — IQVIA benchmarks are used for performance measurement and performance guarantees between product manufacturers and their sales & marketing channels
LiveRamp — when two parties in the marketing industry want to exchange data about a consumer, they use LiveRamp IDs as the unique identifier

If a company can become a currency, this is by far the most defensible business model in data — as it creates a sticky, multi-sided network. As a result, many of the biggest companies in data have this as a foundational component of their business model.

Moat #2: Navigating the long-tail

Another approach that can create large data businesses is navigating the long-tail of relationships. In these business models, a company focuses on a large investment in making thousands of partnerships and integrating data across them, and bets that it does not make economic sense for a second company to attempt to replicate that work once they have become an incumbent.

The primary risk in this business model: if a large portion of value can be captured with just a handful of relationships, the defensibility only applies to the value of the long-tail portion. Long-tail aggregators should assume that they’ll be disintermediated at the top of the market, and need to ensure there’s still a big enough prize if their moat only protects the long tail.

Some examples of multi-billion dollar companies built on this moat:

Lexis Nexis built infrastructure to pull data from thousands of local governments and courthouses, aggregating legal records and public information about individuals
Plaid has built hundreds of technical integrations with financial services institutions, making it easier for data to flow through its pipes versus building a company’s own pipes
Bloomberg has built its position as the largest data vendor to investors because of the comprehensiveness of its data — making the long-tail of information available through its terminals
Change Healthcare became a “data clearinghouse” in healthcare, managing the many-to-many data flow between thousands of providers and hundreds of payers

Proprietary Data Moats

Many companies aim to have proprietary, differentiated data. While this can work as a short-term strategy, it is usually difficult to build a moat around. Typically there are substitutes to the data that exist, so gaining enough proprietary data that another aggregator can’t do the same is difficult.

This section will walk through various models of building a proprietary data set that can be defensible, and their pitfalls.

Moat #3: Proprietary Data from Exclusive Sources

One approach is for a data aggregator to create exclusive relationships with data originators. This is difficult: if the aggregator proves that the data is valuable, the data sources generally try to capture the value (rather than allowing the aggregator to do so); as a result, it works best in industries without a single dominant data source where no individual party has enough market power to extract concessions from the aggregator.

There are a few examples of this:

IQVIA established its initial dominance in healthcare by signing long-term exclusive relationships to obtain data rights from most major pharmacies (and then augmented this position by becoming a currency). No individual pharmacy had enough market power to disintermediate them, and the long-term nature of their contracts made it difficult for other aggregators to come into the picture.
Neustar built its business on top of a decades-long, government-granted exclusivity on the “Number Portability Administration Center” (NPAC) dataset. The NPAC data set tells telco companies which carrier manages a particular phone number, so is critical for successfully routing calls. While Neustar was able to build a successful business around this, this is also a case study of the pitfalls of exclusive data sources: in 2016, the government issued a new RFP, which was fiercely competitive as other companies saw Neustar’s profitability, and Neustar ultimately lost the rights to the data set.

Moat #4: Proprietary Data from Give-to-Get Models

Another approach to proprietary data is “give-to-get” models: companies have to share data, and in return they get access to data that has been shared with them. There are a number of industry-specific benchmarking tools and “data coops” that follow this model, and a handful of $1+ billion companies:

Credit Bureaus (Experian, Equifax, Transunion) all have business models where lenders are required to share with them information about defaults and payments, and that information is aggregated and made available to inform credit ratings
Tegus makes transcripts of expert interviews available to other Teagus members, so each interview enhances Teagus’ data asset
Glassdoor requires employees to provide salary data and feedback on their employers, and then they can research feedback and salaries at other companies

Moat #5: Proprietary Data Creation

Another approach is proprietary data creation. This is also a risky model — as other companies can often create similar data assets and make similar investments — but there are several examples of proprietary data assets that have stood the test of time:

Nielsen has a panel to understand TV viewing behavior of its members (and has used this to create a currency)
Gartner/Forrester/Consumer Reports are examples of expert-generated data.
Yelp is an example of user-generated data

Moat #6: Proprietary Exhaust Data

“Exhaust data” is data that comes from another line of business. While it is a near-impossible strategy to build from scratch, the largest data companies in the world generally employ this strategy.

Big tech (Amazon, Facebook, Google) all make large amounts of money from data, but the knowledge they have comes as exhaust from the “core” products they offer consumers.
Many retail companies (Walmart, CVS, Target, etc.) have built media networks and data businesses utilizing their knowledge of who is buying products and monetizing that for CPG companies.
Flatiron Health is a great example of a company built around secondary use of data. Flatiron built one of the first large businesses around oncology data. In order to aggregate oncology data, they own an Electronic Medical Record system for oncology practices, which is essentially a loss-leader to obtain data rights.

***

Data businesses are fiercely competitive, both with other data aggregators, but also with the other players in the value chain (such as the data sources), so moats are foundational to building a large data business. In fact, many of the best data businesses employ multiple of these moats and have several layers of defensibility.

In the decade ahead, there will be dozens more $1+ billion data businesses created, and thousands more created that won’t hit that level of success. The ones that succeed won’t just be the companies with the most comprehensive data or the most curation, but they’ll also require thoughtful strategies designed around strong network effects.