The UK plans to regulate the use of copyrighted material in AI training.

The United Kingdom is developing measures to regulate the use of copyrighted content by technology companies for training their artificial intelligence models.

The UK government launched a consultation on Tuesday, aiming to clarify for both the creative industries and artificial intelligence developers how intellectual property is obtained and used by AI firms for training purposes.

Some artists and publishers are unhappy with how their content is freely collected by companies like OpenAI and Google to train their large language models—AI models that learn from vast amounts of data to generate human-like responses.

Large language models are a foundational technology behind modern generative AI systems, including ChatGPT by OpenAI, Gemini by Google, and Claude by Anthropic.

Last year, The New York Times filed a lawsuit against Microsoft and OpenAI, accusing the companies of infringing its copyrights and misusing intellectual property to train large language models.

In response, OpenAI denied the accusations, stating that the use of open web data to train AI models should be considered “fair use” and that the company provides an “opt-out” for rights holders “because it is the right thing to do.”

Separately, the image distribution platform Getty Images sued another AI development firm, Stability AI, in the UK, accusing it of extracting millions of images from its websites without consent to train its Stable Diffusion AI model. Stability AI contested the lawsuit, stating that the training and development of its model took place outside the UK.

Proposals for consideration

Firstly, the consultation will consider an exception to copyright law that would permit AI training for commercial purposes, while still allowing rights holders to reserve their rights and control how their content is used.

Secondly, measures will be proposed to help creators license and receive compensation for their content used in training AI models and to provide AI developers with clarity on what material can be used to train their models.

The government stated that both creative industries and technology companies must do more to ensure that any standards and requirements for rights reservation and transparency are effective, accessible, and widely accepted.

The government is also considering proposals that would require AI model developers to be more transparent about their training datasets and how they are obtained so that rights holders can understand when and how their content is used for training AI.

This could prove controversial, as technology firms are generally not willing to disclose data that powers their algorithms or how they train them, given the commercial sensitivity tied to revealing such secrets to potential competitors.

Previously, under former Prime Minister Rishi Sunak, the government tried to adopt a voluntary copyright code for AI.

Copyright Rules for AI: UK vs US

In a recent interview with CNBC, Matt Calkins, CEO of software company Appian, said the UK is in a strong position to be a “global leader” on the issue.

“The UK has staked a claim by saying it favors intellectual property rights,” Calkins said, pointing to the 2018 Data Protection Act as an example of how the UK is closely tied to intellectual property rights.

The UK is also not subject to the same overwhelming lobbying from domestic AI leaders as the US, Calkins added, which means UK policymakers may be less susceptible to pressure from tech giants than their US counterparts.

“In the US, anyone writing AI laws will hear from Amazon, Oracle, Microsoft, or Google before that bill ever gets to a committee,” he said.

“That’s a powerful force that prevents anyone from writing sensible legislation or defending the rights of those whose intellectual property is wholesale taken by these big AI players.”

The issue of potential copyright infringement by AI companies is becoming more pronounced as tech firms move toward more “multimodal” AI systems, meaning systems that can understand and create content in different forms, such as images, video, and text.

Last week, OpenAI released its video-generation model, Sora, in the US and most countries worldwide. The tool allows users to input a scene description and generate a high-definition video clip.