CMU Study Shows Large Language Models Have Distinctive Styles

LLMs Can Be Distinguished by Word Choice, Level of Detail and More

Thursday, April 10, 2025

It's not unusual for people to have distinctive speech or writing styles. They can favor certain words and phrases, or structure a sentence or a story in a unique way. It turns out that text-generating AI models have similar idiosyncrasies.

In a recent study, Carnegie Mellon University researchers found they could use characteristic word choices to determine which large language model (LLM) generated a particular piece of text with 97% accuracy.

"It was quite surprising that we achieved this level of accuracy," said Mingjie Sun, a Ph.D. student in the Computer Science Department.

When Sun and the other researchers tried to distinguish between just two LLMs on their own, they achieved only 60% to 70% accuracy. The team's specialized classifier program produced far better results despite tackling a much more demanding task: differentiating among not two but five LLMs (ChatGPT, Claude, Gemini, Grok and DeepSeek).

The computer analysis revealed a distinct profile for each LLM. ChatGPT, for instance, tended to offer detailed, explanatory texts, while Claude favored concise, straightforward answers.

These idiosyncrasies aren't superficial but deeply embedded in each model. Even when texts were scrambled, rephrased, translated or summarized, the personality or style of each LLM remained distinct.

One implication of these findings is the need for caution when using synthetic data, meaning text generated by LLMs, to train a new generation of models. This practice could pass the source model's idiosyncrasies to the next generation of LLMs, potentially affecting the behavior of those AIs in unforeseen ways.
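The study's actual classifier is not described here, but the core idea, identifying a model from its characteristic word choices, can be sketched with a simple add-one-smoothed bag-of-words classifier. The model names and toy training texts below are hypothetical stand-ins, not data from the study:

```python
from collections import Counter
import math

def train(samples):
    """samples: dict mapping model name -> list of texts.
    Returns per-model word-frequency counters."""
    return {model: Counter(w for text in texts for w in text.lower().split())
            for model, texts in samples.items()}

def classify(text, counts, alpha=1.0):
    """Score each model by smoothed log-likelihood of the text's words;
    return the highest-scoring model."""
    words = text.lower().split()
    vocab = {w for c in counts.values() for w in c}
    best, best_score = None, float("-inf")
    for model, c in counts.items():
        total = sum(c.values()) + alpha * len(vocab)
        score = sum(math.log((c[w] + alpha) / total) for w in words)
        if score > best_score:
            best, best_score = model, score
    return best

# Hypothetical toy corpora mimicking the profiles the article describes:
# one "model" verbose and explanatory, the other concise.
samples = {
    "verbose-model": ["certainly here is a detailed explanation of the topic",
                      "let us explore this in detail with a thorough overview"],
    "concise-model": ["short answer yes", "no brief reply"],
}
counts = train(samples)
print(classify("here is a detailed thorough explanation", counts))
# prints "verbose-model"
```

On realistic data, the same idea scales up by training on many samples per model and using richer features than raw word counts, but even this minimal version illustrates how word-choice statistics alone can separate sources.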
Zico Kolter, professor and director of CMU's Machine Learning Department, said that while using synthetic data for training was once a widespread method, its use has been on the decline.

One thing the researchers did not do was try to discern the difference between AI-generated and human-generated text. Such an ability might be used to ferret out academic fraud, but other research groups have already done a great deal of work on this topic. So Sun, Kolter and colleagues at the University of California, Berkeley; the University of Pennsylvania; and Princeton University focused their research on obtaining a deeper scientific understanding of LLMs.

"This work is much more about understanding the distinctive characteristics, the natures of different LLMs, the same way we think about different styles of writing by people," Kolter said. "Given how much content is being produced by LLMs on the internet these days, it is valuable to understand the distinguishing characteristics of these various models."

Learn more on the research's project page.

Media Contact:
Aaron Aupperlee | 412-268-9068 | aaupperlee@cmu.edu