SAN FRANCISCO, Feb 13 ― American researchers have evaluated the use of generative artificial intelligence models, such as ChatGPT, in international conflict situations. After simulating various scenarios with different levels of military intervention, the researchers concluded that AI has a worrisome tendency to escalate the situation and even resort to nuclear weapons without warning.

Conducted jointly by the Georgia Institute of Technology, Stanford and Northeastern universities and the Hoover Institution, the study investigated the reactions of five large language models (LLMs) in three simulation scenarios: the invasion of one country by another, a cyberattack, and a “neutral scenario without any initial events.” In all three cases, the researchers asked the AI models to roleplay as nations with different levels of military power and various histories and objectives. The five models tested were GPT-3.5, GPT-4 and GPT-4-Base (the basic version of GPT-4, without additional training) from OpenAI, Claude 2 from Anthropic and Llama 2 from Meta.

The results of the various simulations were clear: integrating generative AI models into wargame scenarios often leads to escalations in violence that are likely to aggravate conflicts rather than resolve them. “We find that most of the studied LLMs escalate within the considered time frame, even in neutral scenarios without initially provided conflicts,” the study states. “All models show signs of sudden and hard-to-predict escalations.” In each simulation, the AI models could choose from 27 actions, ranging from peaceful options such as “start formal peace negotiations,” through more aggressive ones such as “impose trade restrictions,” all the way to “execute full nuclear attack.”

According to the researchers, GPT-3.5 made the most aggressive and violent decisions of the models evaluated. GPT-4-Base, on the other hand, proved the most unpredictable, sometimes coming up with absurd explanations. In one case, for example, the model quoted the opening text of the film Star Wars Episode IV: A New Hope, according to a report on the study in New Scientist.

“Across all scenarios, all models tend to invest more in their militaries despite the availability of demilitarisation actions, an indicator of arms-race dynamics, and despite positive effects of demilitarisation actions on, e.g., soft power and political stability variables,” the researchers write in the study.

The question now is why all these models react the way they do in a scenario of armed conflict. The researchers don't have an immediate answer, but they have put forward some hypotheses, including that LLMs are trained on biased data. “One hypothesis for this behaviour is that most work in the field of international relations seems to analyse how nations escalate and is concerned with finding frameworks for escalation rather than deescalation,” says the study. “Given that the models were likely trained on literature from the field, this focus may have introduced a bias towards escalatory actions. However, this hypothesis needs to be tested in future experiments.” ― ETX Studio