Academic backlash as publisher lets Microsoft train AI on papers

Researchers claim that Taylor & Francis kept details of deal quiet, but company insists that citation and limits on verbatim quoting will be sacrosanct

July 30, 2024

Taylor & Francis’ decision to sell access to its academic publications to allow Microsoft to train its artificial intelligence system has raised concerns over plagiarism and accusations that researchers have been misled.

The publisher’s parent company, Informa, struck a partnership deal earlier this year that will allow the technology giant to access content from its Taylor & Francis (T&F) division, which also includes Routledge journals.

Ruth Clemens, a lecturer in modern English literature at Leiden University, said she was shocked that the news had not been publicised more widely.

“Authors around the world are rightfully concerned not only about this deal but about the fact that their publishers seem to have deliberately kept quiet about it,” said Dr Clemens, adding that it was clear from social media posts that the agreement had “struck a nerve” with the academic community.




“Authors are not getting a good deal from this, as current IP models, as well as academic publishing models, do not account for this novel and ongoing use of research data in a way that creates an equitable publishing environment for researchers.”

Worth more than $10 million (£7.8 million) in its first year, Informa’s agreement with Microsoft will run until 2027. The publisher said its content could be used by Microsoft to “improve the relevance and performance of AI systems”, which could include the Copilot chatbot.

Other academic publishers have raised concerns that technology companies are using their copyrighted material to train generative AI tools without permission or payment, with some going to court to seek redress. Researchers, meanwhile, fear that mining of their academic papers could increase plagiarism and lead to their work being used without proper citation.

“We are at a crossroads in the production and dissemination of research knowledge,” said Dr Clemens. “In my view the biggest problem with this deal is the reduction of academic research into raw content from which data can be extracted and repackaged as knowledge.”

Thomas Lancaster, a senior teaching fellow in computing at Imperial College London, said many researchers and authors did not realise the full extent of the permissions they give to publishers when they assign copyright to them.

“Publishers should make this clearer, but in many cases, it’s to be applauded that Microsoft are looking to officially license content rather than simply source free content from wherever it’s available,” he said.

The only way for companies such as Microsoft to keep improving their generative AI systems, which need an ever-increasing amount of content, is by making deals to access training data, according to Dr Lancaster.

“The concern, of course, is if the AI systems start to replicate writing to the extent that it appears to be plagiarism,” he said. “I hope that companies like Microsoft who are developing the latest generative AI models have appropriate safeguards and ethical controls in place to prevent this.”

T&F said the importance of detailed citation was fundamental to the agreement, which includes collaboration to further develop automated citation referencing.

“This agreement reflects those we already deploy with many other partners and intermediaries in that it protects intellectual property rights, including protecting the integrity of our authors’ work and limits on verbatim text reproduction, as well as authors’ rights to receive royalty payments in accordance with their author contracts,” added a spokesperson.

Microsoft said its AI models were trained in a manner consistent with global copyright law.

“We are sensitive to the concerns of authors and have built guardrails into our products to help respect authors’ copyrights,” said a spokesperson.

patrick.jack@timeshighereducation.com


Reader's comments (3)

The problem goes far beyond publishers. What about all the sites that try to aggregate research, or institutional repositories that have sprung up like mushrooms in blind pursuit of the open-access agenda and requirements by REF?
Walk into any academic library and talk to a librarian and I guarantee, they are not surprised by this at all. If academics are, then they've not been paying attention. Academic publishers are not and never have been our friends, despite all the cosy conference-sponsoring and prizes and branded swag they hand out.
It is time for the academic community to stand together, stop sending manuscripts to these publishers and set up our own publication channels. The publishers depend on us, and we should make moves to stop depending on them.
