
Author: Benjamin Sobel
Abstract: This chapter addresses a crucial problem in artificial intelligence: many applications of machine learning depend on unauthorized uses of copyrighted data. Scholars and lawmakers often articulate this problem as a deficiency in copyright’s exceptions and limitations, reasoning that legal uncertainties surrounding today’s AI stem from the lack of a clear exception or limitation, and that such an exception or limitation could resolve the current predicament. In fact, the current predicament is a product of two systemic features of the copyright regime (the absence of formalities and the low threshold of copyrightable originality) combined with a technological environment that turns routine activities into acts of authorship. Equilibrating the economy for human expression in the AI age requires a solution that focuses not only on exceptions to existing copyrights, but also on the aforementioned doctrinal features that determine the ownership and scope of copyright entitlements at their inception.
The chapter taxonomizes different applications of machine learning according to the qualities of their training data. Four categories emerge: (1) public-domain training data, (2) licensed training data, (3) market-encroaching uses of copyrighted training data, and (4) non-market-encroaching uses of copyrighted training data. Copyright can only regulate market-encroaching uses of data, but these uses represent a narrow subset of AI applications and exclude many of the most socially harmful uses of copyrighted materials. Moreover, paradoxically, copyright’s property-style remedies are ill-suited to addressing market-encroaching uses, and are in fact much more appropriate remedies for the categories of worrisome AI that fall outside copyright’s normative mandate.
Finally, this chapter discusses a variety of remedies to the “AI problems” it identifies, with an emphasis on facilitating market-encroaching uses while affording human creators due compensation. It concludes that the exception for Text and Data Mining in the European Union’s Directive on Copyright in the Digital Single Market represents a positive development precisely because the exception addresses some structural causes of the training data problem that this chapter identifies. The TDM provision styles itself as an exception, but it may in fact be better understood as a formality: it requires rights holders to take positive action to exercise a right to exclude their materials from training datasets. Thus, the TDM exception addresses a root cause of the AI dilemma rather than trying to patch up the copyright regime post hoc. The chapter concludes that the next step for an equitable AI framework will be to transition towards rules that not only clarify that non-market-encroaching uses do not infringe copyright, but also facilitate remunerated uses of copyrighted works for market-encroaching purposes.
Citation: Sobel, Benjamin, A Taxonomy of Training Data: Disentangling the Mismatched Rights, Remedies, and Rationales for Restricting Machine Learning (August 19, 2020). Artificial Intelligence and Intellectual Property (Reto Hilty, Jyh-An Lee, Kung-Chung Liu, eds.), Oxford University Press, Forthcoming, Available at SSRN: https://ssrn.com/abstract=3677548 or http://dx.doi.org/10.2139/ssrn.3677548

