infoJustice

A Taxonomy of Training Data: Disentangling the Mismatched Rights, Remedies, and Rationales for Restricting Machine Learning

[Benjamin Sobel] Abstract: This chapter addresses a crucial problem in artificial intelligence: many applications of machine learning depend on unauthorized uses of copyrighted data. Scholars and lawmakers often articulate this problem as a deficiency in copyright’s exceptions and limitations, reasoning that legal uncertainties surrounding today’s AI stem from the lack of a clear exception or limitation, and that such an exception or limitation could resolve the current predicament. In fact, the current predicament is a product of two systemic features of the copyright regime (the absence of formalities and the low threshold of copyrightable originality), combined with a technological environment that turns routine activities into acts of authorship. Equilibrating the economy for human expression in the AI age requires a solution that focuses not only on exceptions to existing copyrights, but also on the aforementioned doctrinal features that determine the ownership and scope of copyright entitlements at their inception.

Papers

October 26, 2020