Trusted Data Intermediaries: A Comparative Advantage for Europe
Victoria Ivanova, Matt Prewitt
January 7, 2025
Europe is currently much less likely than either China or the United States to emerge as the global leader in AI technology. And it is likely infeasible to close this gap merely by imitating the approaches of its global competitors, i.e., by standing aside for the interests of industrial and state power. Instead, Europe must develop its distinct advantage as a leader in democratic infrastructure in order to retain a unique and desirable role in the AI economy. It can do this by leveraging its area of leadership – crafting rational legislation and creating spaces for rapid ecosystemic prototyping – in order to roll out a comprehensive new market structure for training data. In this way, the EU has an opportunity to distinguish itself as the leading global supplier of one of AI’s critical factors of production: high-quality, up-to-date, privacy-preserving datasets.
How could this be done? The EU must pass legislation to define a new class of regulated organizations, trusted data intermediaries (TDIs), sitting at the heart of the market for AI training data. Crucially, TDIs would be legally defined as the only permissible sellers of the rights to train large AI models on datasets. Every large model trainer would then have the burden of answering the following question, regardless of how they actually obtained their data: for each of the datasets I used, from which TDI did I obtain valid rights to train this model? Moreover, TDIs would act as regulated collective-bargaining agents, bound by important fiduciary-style obligations to protect the privacy and interests of the stakeholders affected by the datasets that they represent in the marketplace. At the same time, their role would be to develop balanced, streamlined arrangements that ensure both data contributors and AI developers get what they cannot get in other AI ecosystems.
This new market structure promises to solve market failures and potentially unlock a much greater flow of valuable data into the AI economy. Consumers, workers, and businesses who know that their interests are genuinely and meaningfully represented by trusted intermediaries will be dramatically more willing to share data with them proactively, confident that these entities will not only protect privacy rights (such as through anonymization, encryption, and related techniques), but also meaningfully negotiate for their economic and other interests.
It is sometimes supposed that today’s consumers irrationally hand over unlimited data to private entities for virtually nothing. But this is an unhelpful exaggeration. First, while consumers do hand over immense amounts of data for very little, this is not strictly irrational, because they correctly apprehend that they have practically no bargaining power. This would change if they were represented by TDIs. Second, there are limits, of great technical and economic importance, to the data that AI model trainers are able to obtain. Consumers whose interests in data are meaningfully represented by TDIs will be more willing to share higher-quality data, knowing it will be used safely in the public interest, and/or to the benefit of themselves and their communities. Therefore, a training data market that ran through TDIs would have a meaningful chance of becoming the world’s premier supplier of high-quality data inputs for leading-edge AI models.
Europe therefore has a chance to achieve two goals at once through bold legislation: vindicating its citizens’ underserved interests in what happens to their information in the AI marketplace, and securing a unique comparative advantage for itself in the emerging AI economy.