Last week, OpenAI shared a blog post targeted at teachers. In this post, they highlighted how some educators are using ChatGPT to assist in their teaching and offered suggestions to help them get started.
Additionally, in a related FAQ, OpenAI acknowledged what experts have long pointed out: AI writing detectors often don’t work well and can unfairly punish students with false accusations.
In the FAQ section titled “Do AI detectors work?”, OpenAI straightforwardly stated, “In short, they don’t. Despite various attempts, including OpenAI’s own efforts, to create tools that claim to spot AI-generated content, none of these tools have consistently and reliably distinguished between AI-generated and human-generated content.”
Limitations and Challenges in Detecting AI-Generated Text
Back in July, we took a close look at the shortcomings of AI writing detectors like GPTZero, which experts have often criticized for their limited effectiveness. These detectors frequently produce false positives, flagging human-written content as AI-generated, mainly because they rely on unproven detection metrics such as “perplexity,” a measure of how predictable a piece of text is to a language model.
Ultimately, there’s no foolproof way to reliably distinguish AI-generated text from human-written text, and these detectors can be defeated simply by rephrasing. That same month, OpenAI discontinued its AI Classifier, an experimental tool designed to identify AI-generated text, primarily due to its very low accuracy rate of just 26 percent.
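To make concrete what this kind of scoring looks like, here is a minimal sketch of a perplexity-based check, assuming the Hugging Face transformers library and the small GPT-2 model; the threshold is invented purely for illustration and is exactly the sort of unproven heuristic that makes these tools unreliable.

```python
# A minimal sketch of perplexity-style "AI detection," the kind of
# unproven heuristic tools like GPTZero lean on. Assumes the Hugging Face
# `transformers` library and the small GPT-2 model; the threshold below
# is a hypothetical cutoff for illustration, not a validated value.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Return the model's perplexity on `text` (lower = more predictable)."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing labels makes the model return its own cross-entropy loss.
        loss = model(enc.input_ids, labels=enc.input_ids).loss
    return torch.exp(loss).item()

THRESHOLD = 40.0  # invented cutoff; no principled value exists

def naive_ai_guess(text: str) -> str:
    # Low perplexity means "predictable" text, which these tools treat
    # as AI-like. A human writing plainly, or an AI output that has been
    # lightly paraphrased, easily lands on the wrong side of the line.
    return "maybe AI" if perplexity(text) < THRESHOLD else "maybe human"
```

Because paraphrasing changes how predictable a passage looks to the scoring model, a light rewording is often enough to flip the verdict, which is precisely the weakness described above.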
OpenAI’s recent FAQ also clears up another common misunderstanding: the idea that ChatGPT can somehow tell whether text was written by a human or an AI. OpenAI explains, “ChatGPT doesn’t have the ability to recognize if content was created by AI.
Sometimes, it might give responses to questions like ‘did you write this [essay]?’ or ‘could this have been created by AI?’ But these answers are random and not based on any real knowledge.”
Similarly, OpenAI acknowledges that its AI models sometimes fabricate false information, a topic we’ve explored in detail at Ars. OpenAI states, “Occasionally, ChatGPT can sound very convincing but may provide incorrect or misleading information, which is often referred to as ‘hallucination’ in the literature.” The company also cautions, “It can even invent details like quotes or references, so relying solely on it as your primary research source is not advisable.”
(As an example, in May a lawyer faced consequences for doing just that: citing six fictitious cases that he got from ChatGPT.)
Human Insights and Clues for Spotting AI-Generated Writing
Even though automated AI detectors don’t work well, that doesn’t mean humans can never spot AI-generated writing. For example, a teacher who knows how a student typically writes can notice a sudden change in style or quality.
Furthermore, some careless attempts to pass off AI-generated content as human-written can accidentally reveal clues. For instance, if someone includes the phrase “as an AI language model,” it suggests that they simply copied and pasted ChatGPT’s output without checking it carefully.
Additionally, a recent article in the scientific journal Nature reported how readers spotted the phrase “Regenerate response” in a published scientific paper; the phrase is the label of a button in the ChatGPT interface.
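Slip-ups like these are mechanical enough that a plain substring search catches them. Here is a minimal sketch; the phrase list is illustrative, not exhaustive (the “as a large language model” variant is an assumption, not drawn from the examples above), and it obviously defeats no one who proofreads before submitting.

```python
# A minimal sketch of scanning a document for copy-paste artifacts from
# the ChatGPT interface. The phrase list below is illustrative, not
# exhaustive; the last entry is an assumed variant, not from the article.
TELLTALE_PHRASES = [
    "as an ai language model",
    "regenerate response",
    "as a large language model",  # assumed variant
]

def find_telltales(text: str) -> list[str]:
    """Return any known ChatGPT boilerplate phrases found in `text`."""
    lowered = text.lower()
    return [phrase for phrase in TELLTALE_PHRASES if phrase in lowered]

# Example: an unedited pasted reply gets caught immediately.
sample = "As an AI language model, I cannot verify this citation."
print(find_telltales(sample))  # ['as an ai language model']
```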
Given the current state of technology, it’s a good idea to avoid using automated AI detection tools altogether. As Ethan Mollick, a well-known AI expert and professor at Wharton, emphasized in an interview with Ars in July, “Right now, it’s tough to tell if writing is generated by AI or not, and it’s likely to stay that way. AI detectors often make a lot of mistakes, so using them isn’t a good idea.”
Conclusion
OpenAI’s recent admissions underscore how unreliable AI writing detectors are. For now, human judgment and telltale slip-ups like the ones above remain the more dependable ways to discern AI-generated content. While the technology continues to evolve, relying solely on automated detectors remains unwise given their significant flaws.