Lawmakers are advancing proposals that would require startups to disclose detailed information about their AI training data, specifically the sources and copyright status of the content used. This week, lawmakers in California and on a key House committee considered proposals that would impose rigid disclosure requirements on AI developers. These efforts attempt to apply static, legacy IP frameworks to a field where developers constantly iterate and build on one another’s work, which would stifle innovation.
On Tuesday, a key committee in the California Assembly passed AB 412, which would allow individual rightsholders to request information about whether their copyrighted content is included in an AI developer’s training data. And a House Judiciary Subcommittee hearing on Wednesday touched on a similar federal proposal and examined the risks of exposing proprietary information to foreign adversaries and the role of trade secret protections. In both cases, lawmakers are overlooking the immediate challenge that startups face in navigating misaligned IP frameworks in the rapidly evolving landscape of AI innovation: the need for legal clarity around using copyrighted data to train their models.
This disconnect between applying static IP requirements and the realities of AI innovation is a real barrier faced by AI developers. Chip Kennedy, CEO of CivicReach, said, “When working with software and software-enabled companies, it's clear how much is borrowed. … In general, we’ve been able to build our product because of open access to innovation and the idea that a lot of code is available to us.” While CivicReach has intellectual property protections for “the truly proprietary things we have created … [i]t would be weird to capture IP on how [software] systems come together,” Kennedy said.
On the surface, AB 412 appears to promote transparency. In practice, however, the bill threatens to derail AI innovation by imposing compliance requirements that are unworkable for small businesses and startups. AB 412 would require developers to document every copyrighted work in their training data, identify each copyright holder, and maintain a system for handling disclosure requests. Developers who fail to respond to a copyright holder’s inquiry within 30 days of the request could face actual damages or penalties of at least $1,000 per violation. Additionally, AB 412 mandates that developers retain these records for five years after an AI model has been retired.
For startups, AB 412 is a nonstarter. If enacted, the bill would force startups to divert their limited time and resources away from scaling their businesses and toward compliance with an overly burdensome policy framework. It would also make startups easy targets for copyright holders seeking damages for noncompliance. Most AI training data is sourced from large-scale datasets compiled from licensed and/or open source material. This makes it practically infeasible to pinpoint and verify the copyright status of every work in a training dataset. The massive compliance costs, possible financial penalties, and logistical burdens involved in complying with these rules could force startups to exit the California AI ecosystem or close their businesses entirely.
As the debate over AI and copyright unfolds, numerous cases are currently making their way through the courts. The outcome of these cases will have significant implications for the future of AI development. If the courts determine that AI training infringes copyright, it would set a precedent that would hinder AI innovation, especially for startups that cannot afford data licensing agreements or to defend against costly infringement suits. Additionally, the U.S. Copyright Office is working on releasing its third report on AI and copyright, which may provide much-needed clarity. Startups need a forward-looking approach to copyright law that accounts for the dynamic nature of AI innovation.
Engine is a non-profit technology policy, research, and advocacy organization that bridges the gap between policymakers and startups. Engine works with government and a community of thousands of high-technology, growth-oriented startups across the nation to support the development of technology entrepreneurship through economic research, policy analysis, and advocacy on local and national issues.