
Comment endorsed by 16 members of the Global Expert Network on Copyright User Rights. (PDF)
We submit this comment in response to the World Intellectual Property Organization's request in relation to its work on the impact of artificial intelligence (AI) on intellectual property (IP). We are members of the Global Expert Network on Copyright User Rights with particular interest in the application of copyright to the use of text and data mining technology, including for the purposes of machine learning and AI.
We comment here only on the copyright related questions in section 13. Some of our comments with regard to the framing of the questions and defining the differences between AI, machine learning and text and data mining may apply more broadly to the entire document.
I. NEW QUESTIONS
We first address elements that we propose WIPO add to the existing set of questions.
Defining Text and Data Mining, Machine Learning, and AI
As a threshold matter, all of the questions in this section (and perhaps the rest of the questionnaire) meld the definitions of text and data mining with machine learning and AI. As a result, many of the questions are confusing and difficult to answer accurately.
Text and data mining (TDM) should be used to refer to applying computational processes to materials (which could include copyrighted works) to derive data about those works. Machine learning and AI involve applying programming techniques to data (often derived from text and data mining) to enable machines to dynamically “learn” from the data inputted. Text and data mining has many other applications, including in medicine, humanities, and social science, that do not necessarily involve machine learning for the purpose of AI. Many of the copyright rules discussed in this section of questions would potentially affect both text and data mining research that is used to train AI and text and data mining research that may be unrelated to AI.
New Question on WIPO’s Role
Before moving to specific comments on the questions asked, we propose a question on WIPO's role on this issue:
What actions may WIPO take that may help balance the proper role of the copyright system in promoting creativity, disseminating knowledge, and fostering technological development in relation to the development of machine learning, artificial intelligence, and text and data mining? For example:
II. COMMENTS ON PROPOSED QUESTIONS
13(i). Should the use of the data subsisting in copyright works without authorization for machine learning constitute an infringement of copyright? If not, should an explicit exception be made under copyright law or other relevant laws for the use of such data to train AI applications?
We suggest that this question be rephrased as follows for the reasons expressed below:
Should existing law (including relevant exceptions of general applicability) be understood to permit applying computational processes to copyrighted works without authorization to derive data about those works, including for the purposes of machine learning and AI, assuming the reproductions do not express the work to the public and even if such processes involve making temporary or ephemeral reproductions of the works studied?
Should existing law (including relevant exceptions of general applicability) be understood to permit the technical reproduction and storage of copyrighted works to enable the application of computational processes to derive data about those works, including for the purposes of machine learning and AI, assuming the reproductions do not express the work to the public?
Our proposed redrafted question focuses on the descriptive issue about the current state of the law because the normative question of what the law should be is addressed below.
The phrase “use of the data subsisting in copyright works without authorization” needs to be clarified throughout the questions in this section. The word “data” should be used with more precision. Text and data mining uses copies of copyrighted works as the “data” that is being analyzed or “mined”. The output of text and data mining analysis is data about the copyrighted works. That “data” does not “subsist in” those works, but rather is a product of observation of them.
The most relevant copyright question is whether and when temporary or more permanent copies of works may be made to enable text and data mining processes, including to train machines for AI. For this reason, the question should distinguish between at least two relevant categories of research using copyrighted works required in machine learning and AI, which may have very different treatment under copyright law:[1]
As currently phrased, the question could be answered negatively (“No, use of the data alone should not be considered infringement”) by parties who nevertheless believe that the making of a corpus to facilitate machine learning and AI tools may require authorization or the operation of a copyright exception.
13(ii). If the use of the data subsisting in copyright works without authorization for machine learning is considered to constitute an infringement of copyright, what would be the impact on the development of AI and on the free flow of data to improve innovation in AI?
We offer the following reformulation of the question:
13(ii). If copyright law in some or all countries were understood to prohibit applying computational processes to copyrighted works without authorization, or were understood to prohibit the making and storing of reproductions of works to create corpora to be mined, what would be the impact on development of text and data mining research, machine learning and AI?
WIPO could ask more specific sub-questions to draw attention to specific impact areas, e.g.:
13(iii). If the use of the data subsisting in copyright works without authorization for machine learning is considered to constitute an infringement of copyright, should an exception be made for at least certain acts for limited purposes, such as the use in non-commercial user-generated works or the use for research?
We suggest the following reformulation:
13(iii). If copyright laws were understood to prohibit applying computational processes to copyrighted works without authorization to derive data about those works, or were understood to prohibit the making and storing of reproductions of works to create corpora to be mined, including for the purposes of machine learning, should new exceptions be made under copyright law or other relevant laws to enable such activities, and subject to what restrictions, if any?
We reiterate our concerns above about the “use of the data subsisting” formulation.
The current question asks about “limited purposes, such as the use in non-commercial user-generated works or for research.”
A canvassing of the existing research exceptions that may apply to allow text and data mining activities, including to train machine learning and AI, displays at least nine different categories of internal limits, with different possible impacts on the field. The questions could ask what the benefits or drawbacks may be from including such limits in research rights as compared to the models that are more open.
Open exceptions with “fair” practice limits. U.S. and other fair use exceptions (e.g. Israel) and open fair dealing exceptions (e.g. Singapore, Malaysia) are “open” in the sense of potentially applying to any purpose (commercial and non-commercial); any use implicating an exclusive right (e.g. reproduction, storage, making available, etc.); all kinds of works; and uses by all kinds of users.
The operative limitation in open exceptions is that the particular use must be “fair” to the rights holder. The fairness criteria include assessment of any impact on the market for the work.
In a line of recent cases, the fair use right in U.S. law has been interpreted to permit the reproduction of copyrighted works to create a corpus for computational uses (including of the kind that could train AI), and the making available of data from the corpus to other researchers through a search tool, as long as the process used does not re-express works to the public in a way that could compete in the market for the work.[3]
Purpose restrictions. There is variation in how the purposes of exceptions are drafted between countries. Canada and many other fair dealing countries have exceptions broadly applying to “research.”[4] Japan’s exceptions cover any non-expressive use[5] or “information analysis.”[6] The EU “Copyright in the Digital Single Market” Directive (2019) allows acts of reproduction and extraction “for the purposes of text and data mining” by research organisations for scientific research purposes (Article 3), or for any purpose but with the possibility to opt out (Article 4).[7]
Commercial use restrictions. Some research exceptions — including text and data mining exceptions passed in the EU before the most recent directive — are limited in their application to “non-commercial” research. WIPO should inquire into the application and impact of commercial use restrictions. How do these restrictions impact the growth of public-private partnerships[8] or public interest commercial activities like journalism? How can the line between commercial and non-commercial activities be drawn in regard to many broadly socially beneficial commercial text and data mining products, such as Internet search, language translation, and projects that seek to harness AI for the public good?[9]
Uses implicating exclusive rights. Many research exceptions, especially those based in fair use or fair dealing, potentially apply to any use that implicates an exclusive right. Others specify the uses that are authorized, thus potentially excluding application to other uses. Specified authorized uses included in some but not all current research exceptions include:
● reproduction of the corpus[10]
● making the corpus available to other researchers[11]
● adaptation[12]
● storage[13]
● extraction[14]
● reuse[15]
WIPO may ask what the implications may be of authorizing some, but not all, uses that may be needed in data mining and machine learning. For example, in many cases, researchers need to access works from a distance. Providing such access may involve the making available right, not only the reproduction right.
Works. All the specific exceptions, except France’s current law (which may need to be changed to comply with the DSM),[16] apply their research exception to all kinds of copyrighted works. WIPO may ask for examples where data mining is useful outside the strict confines of photographs and written text that most of the literature focuses on. For example, text and data mining of audiovisual works and broadcasts is used for a variety of purposes from media monitoring to the development of language translation tools.
Transfer and sharing. Germany’s law is the only one to explicitly address uses needed to share a data mining corpus with other researchers. It permits the making available of a corpus only to a “specifically limited circle of persons for their joint scientific research, as well as to individual third persons” for quality assurance.[17] It does not appear to permit the making available of the corpus more broadly. Art. 3(2) and 4(2) of the EU DCDSM have different wording on the need for replicability, e.g. in order to ensure that the AI has been trained in a fair, transparent, and accountable manner. WIPO may ask about the circumstances when rights to reproduce and share a corpus are necessary to accomplish machine learning and digital research ends as well as to ensure public interest regulatory objectives.
Lawfully accessed source. Three of the specific exceptions for research require that the materials used to create a corpus be “lawfully accessed.” Other provisions are silent on this matter.[18] WIPO may ask what the implications of a restriction or silence may be on this matter.
Cross-border rights. Perhaps most importantly for WIPO, there is little legal certainty on whether and when a researcher can transfer, share, make available, or otherwise allow the use of a lawfully created research corpus in a country other than the one in which it was lawfully created. WIPO may ask whether and when cross-border rights, including rights to reproduce and transfer a corpus, may be necessary for some kinds of beneficial research activities, including in the training of machines for AI.
Contract and TPM override. Notable examples of non-copyright barriers to digital research which could impede uses for machine learning and AI include contract law (e.g. purchasing or licensing restrictions on research uses) and prohibitions on the circumvention of technological protection measures.[19] WIPO should ask about such issues.
13(iv). If the use of the data subsisting in copyright works without authorization for machine learning is considered to constitute an infringement of copyright, how would existing exceptions for text and data mining interact with such infringement?
We propose deleting this question as it would be answered in response to our reformulated question 13(i).
13(v). Would any policy intervention be necessary to facilitate licensing if the unauthorized use of data subsisting in copyright works for machine learning were to be considered an infringement of copyright?
We would reformulate this question as follows:
13(v). In the absence of applicable exceptions, are there policy interventions that could facilitate licensing works for text and data mining research, including to train machines for AI? What would be the strengths and weaknesses of those interventions, and how could they be made to work across borders?
The essential problem for licensing solutions in this area is that “[t]raining data sets are likely to contain millions of different works with thousands of different owners,” such that “allowing a copyright claim is tantamount to saying, not that copyright owners will get paid, but that no one will get the benefit of this new use.”[20] Crafting a licensing mechanism to respond to these massive transaction costs would be exceedingly complicated. WIPO could ask specifically about some of those complications, e.g.:
13(vi). How would the unauthorized use of data subsisting in copyright works for machine learning be detected and enforced, in particular when a large number of copyright works are created by AI?
The question should be edited to make its phrasing consistent with other questions in regard to eliminating the “use of data subsisting in copyright works” formulation.
We propose adding the following question:
What would be the impact of different enforcement regimes including, for example, the over-deterrence that may result from application of statutory damages in cases of infringement of potentially millions of works in the act of training machine learning?
SIGNED:
Sean Flynn, Counsel of Record
Professorial Lecturer & Director
Program on Information Justice and Intellectual Property
American University Washington College of Law
SFlynn@wcl.american.edu
Michael Carroll
Professor of Law
American University Washington College of Law
Matthew Sag
Professor of Law
Loyola University, Chicago
Prof. Lucie Guibault
Associate Dean
Schulich School of Law, Dalhousie University
Dr. Thomas Margoni
Senior Lecturer
School of Law – CREATe Centre, University of Glasgow
Brandon Butler
Director of Information Policy
University of Virginia Library
Allan Rocha de Souza
Lawyer, Professor & Researcher
Federal University of Rio de Janeiro
Dr. Maja Bogataj Jančič, LL.M.
Founder & Member
Intellectual Property Institute
Peter Jaszi
Professor of Law Emeritus
American University Washington College of Law
Dr. João Pedro Quintais
Post-doctoral Researcher
Institute for Information Law (IViR), University of Amsterdam
Christophe Geiger
Professor of Law
Director of the Research Department
Center for International Intellectual Property Studies (CEIPI), University of Strasbourg
Caroline Ncube
Professor of Law
University of Cape Town
Ben White
Doctoral Researcher
Centre for Intellectual Property Policy & Management, Bournemouth University
Arul George Scaria, Ph.D.
Professor of Law
National Law University, Delhi
Carolina Botero
Executive Director
Karisma Foundation – Colombia
Dr. Carys Craig
Associate Professor of Law
Osgoode Hall Law School, York University, Toronto
FOOTNOTES

