AI text detectors ‘biased against non-native English speakers’

Methods used by new tools ‘inadvertently flag’ work written by those who tend to use a smaller variety of words and phrases

July 11, 2023

Twitter: @TWilliamsTHE

Tools designed to detect whether academic writing has been generated by artificial intelligence “inherently discriminate” against non-native English speakers, a study has found.

Researchers tested the performance of seven widely used detectors on 91 essays that had been written by Chinese students as part of their Test of English as a Foreign Language (Toefl) exams. More than half were incorrectly labelled as “AI-generated”, equivalent to an average false-positive rate of 61.3 per cent.
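
As a rough illustration of how an average false-positive rate of this kind is calculated, the sketch below takes the share of human-written essays each detector wrongly flags and then averages those shares across detectors. The detector names and flags are hypothetical placeholders, not data from the study.

```python
def false_positive_rate(flags):
    """Share of human-written essays wrongly flagged as AI-generated.

    flags: list of booleans, one per human-written essay;
    True means the detector labelled that essay "AI-generated".
    """
    return sum(flags) / len(flags)


def average_fpr(flags_by_detector):
    """Mean of the per-detector false-positive rates."""
    rates = [false_positive_rate(flags) for flags in flags_by_detector.values()]
    return sum(rates) / len(rates)


# Hypothetical example: two made-up detectors, four human-written essays each.
example = {
    "detector_a": [True, False, True, True],    # 3 of 4 wrongly flagged
    "detector_b": [False, False, True, True],   # 2 of 4 wrongly flagged
}
print(average_fpr(example))
```

Because every essay in the sample was written by a person, any “AI-generated” label counts as a false positive.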

The study – published by Stanford University academics Weixin Liang, Mert Yuksekgonul, Yining Mao, Eric Wu and James Zou in the journal Patterns – also analysed the detectors’ performance when presented with 88 eighth-grade essays written by American students and found that these were accurately classified.

“The design of many GPT detectors inherently discriminates against non-native authors, particularly those exhibiting restricted linguistic diversity and word choice,” the authors conclude, adding that they believe the findings emphasise “the need for increased focus on the fairness and robustness of GPT detectors”.

ChatGPT’s emergence late last year sparked the launch of several AI writing detectors, all claiming high degrees of accuracy. Major players such as Turnitin have vied with start-ups and apps created by students to become the go-to detector used by universities concerned about whether students are using AI to cheat in tests.

Detectors use “text perplexity” to spot AI-generated text, the study explains, meaning that they predict what will be the next word in a sentence, mirroring the methods used by the text generators themselves. If words are easy to predict, text perplexity is low and AI is more likely to have been used; if the next word is hard to predict, text perplexity will be high.
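
The idea can be sketched in a few lines of Python. This is a minimal illustration only: it assumes a toy bigram word model with add-one smoothing, whereas real detectors rely on large neural language models, and the function names and sample sentences are invented for the example.

```python
import math
from collections import Counter


def train_bigram(tokens):
    """Count single words and adjacent word pairs in a training text."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams, set(tokens)


def perplexity(tokens, model):
    """Perplexity of a text of at least two words under the toy model.

    Lower values mean each next word was easier to predict.
    """
    unigrams, bigrams, vocab = model
    vocab_size = len(vocab) + 1  # one extra slot for unseen words
    log_prob = 0.0
    pairs = list(zip(tokens, tokens[1:]))
    for prev, word in pairs:
        # Add-one smoothing keeps unseen word pairs from scoring zero.
        p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(pairs))


# Hypothetical training text and two test sentences.
model = train_bigram("the cat sat on the mat . the dog sat on the rug .".split())
predictable = perplexity("the cat sat on the mat".split(), model)
surprising = perplexity("mat the on rug cat dog".split(), model)
print(predictable < surprising)
```

A detector built on this principle would treat the first sentence, with its lower perplexity, as more likely to be machine-written.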

Because non-native speakers often have a smaller vocabulary and “exhibit less linguistic variability”, they are more likely to be inadvertently penalised, the study finds.

The authors were also able to fool the detectors by prompting ChatGPT to self-edit its text by adding “more literary language” and therefore increasing the text perplexity. This caused detection rates to “plummet to near-zero”.

“The implications of GPT detectors for non-native writers are serious, and we need to think through them to avoid situations of discrimination,” the study concludes.

Potential repercussions include researchers from non-English-speaking countries being excluded from academic conferences or journals that prohibit the use of GPT, it warns.

“Non-native students bear more risks of false accusations of cheating, which can be detrimental to a student’s academic career and psychological well-being,” the paper adds. “Even if the accusation is revoked later, the student’s reputation is already damaged.”

Non-native speakers might also “ironically” be forced to turn to ChatGPT to develop their writing, the study suggests, because it can be used to “refine their vocabulary and linguistic diversity to sound more native”.

In light of the findings, the authors said it was “crucial” that “more robust and equitable methods” be developed by the companies creating AI detectors and that their use in educational settings be curtailed until then.

“Even for native English speakers, linguistic variation across different socioeconomic backgrounds could potentially subject certain groups to a disproportionately higher risk of false accusations,” they warn.

Detectors should not use a “one-size-fits-all approach” and should instead be designed in collaboration with users and benchmarked against diverse writing samples “that reflect the heterogeneity of users”, the study adds.

They should also be subjected to “rigorous evaluation”, and users should be made more aware of their potential flaws.

tom.williams@timeshighereducation.com
