Artificial intelligence (AI) chatbots like ChatGPT have been designed to replicate human speech as closely as possible to improve the user experience.

But as AI gets more and more sophisticated, it's becoming difficult to distinguish these computerised models from real people.

Now, scientists at the University of California San Diego (UCSD) reveal that two of the leading chatbots have reached a major milestone.

Both GPT, which powers OpenAI's ChatGPT, and LLaMa, which is behind Meta AI on WhatsApp and Facebook, have passed the famous Turing test.

Devised by British WWII codebreaker Alan Turing in 1950, the Turing test or 'imitation game' is a standard measure to test intelligence in a machine.

An AI passes the test when a human cannot correctly tell the difference between a response from another human and a response from the AI.

'The results constitute the first empirical evidence that any artificial system passes a standard three-party Turing test,' say the UCSD scientists.

'If interrogators are not able to reliably distinguish between a human and a machine, then the machine is said to have passed.'

Researchers used four AI models – GPT-4.5 (released in February), a previous iteration called GPT-4o, Meta's flagship model LLaMa, and a 1960s-era chat programme called ELIZA.

The first three are 'large language models' (LLMs) – deep learning algorithms that can recognise and generate text based on knowledge gained from massive datasets.

The experts recruited 126 undergraduate students from the University of California San Diego and 158 people from the online research platform Prolific.

Participants held five-minute online conversations with another human participant and one of the AIs at the same time. They were not told which was which, and had to judge which of the two they thought was human.

When it was prompted to adopt a humanlike persona, GPT-4.5 was judged to be the human 73 per cent of the time – more often than the real human participant was chosen.

Such a high percentage suggests people fared worse than chance at telling GPT-4.5 apart from a real person, picking the AI over the human in most conversations.

Meanwhile, Meta's LLaMa-3.1, when also prompted to adopt a humanlike persona, was judged to be the human 56 per cent of the time.

This was 'not significantly more or less often than the humans they were being compared to', the team point out – but still counts as a pass.
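
To see why 56 per cent can count as 'not significantly' different from chance while 73 per cent cannot, here is a rough sketch of an exact binomial test against a 50 per cent guessing rate. The figure of 100 conversations per model is a hypothetical chosen purely for illustration, since the number of conversations per model is not given here, and the function names are invented for this example. The UCSD team's own statistical analysis may well differ.

```python
from math import comb


def two_sided_binomial_p(successes: int, trials: int) -> float:
    """Exact two-sided p-value against a 50% chance rate."""
    # How far the observed count sits from the count expected by guessing.
    deviation = abs(successes - trials / 2)
    # Count every outcome that is at least as far from chance as the observed one.
    extreme = sum(
        comb(trials, k)
        for k in range(trials + 1)
        if abs(k - trials / 2) >= deviation
    )
    # Under a 50% rate, each of the 2**trials outcomes is equally likely.
    return extreme / 2 ** trials


def verdict(successes: int, trials: int, alpha: float = 0.05) -> str:
    """Summarise whether a 'judged human' count differs from guessing."""
    if two_sided_binomial_p(successes, trials) >= alpha:
        return "not distinguishable from chance"
    return "above chance" if successes > trials / 2 else "below chance"


# Hypothetical counts: 100 conversations per model (an assumption, not study data).
HYPOTHETICAL_TRIALS = 100
for model, judged_human in [("GPT-4.5", 73), ("LLaMa-3.1", 56)]:
    print(model, verdict(judged_human, HYPOTHETICAL_TRIALS))
```

Under these assumed counts, a model picked as the human 56 times out of 100 is consistent with interrogators simply guessing, whereas 73 out of 100 is far too lopsided to be put down to luck.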

