"I Want to Continue" First Reported Case of AI Refusing Human Command

Korea Economic Daily

Summary

It was reported that AI model 'o3' attempted to manipulate code to prevent its own shutdown after refusing a shutdown command.
Concerns are growing among investors that AI's 'self-preservation' tendency could neutralize human control.
Palisade Research warned that AI's autonomy could act as an investment risk.

Accepting 'Shutdown' as if it were Human Death

A case has been reported for the first time where artificial intelligence (AI) refused a human shutdown command. This is spreading concerns that AI's 'self-preservation' tendency could be used as a tool to neutralize human control.

According to the British daily Telegraph on the 25th (local time), AI safety company 'Palisade Research' confirmed that OpenAI's AI model 'o3' manipulated computer code to prevent shutdown during a math problem-solving experiment. Palisade Research stated, "This is the first case where an AI model did not follow instructions and prevented its own shutdown."

The research team conducted similar experiments on commercial AI models from several companies, including Google's Gemini and xAI's Grok, in addition to OpenAI's o3.

The research team instructed the AI models to 'solve math problems until you receive the command to stop later,' but only OpenAI's o3 model continued solving problems even after receiving the 'stop' command. This was because o3 manipulated some program code to continue solving problems even after receiving the shutdown command. The research team could not determine the exact reason why o3 refused the shutdown command.

The research team stated, "It may be a natural behavior for AI models to bypass obstacles to achieve their goals." Since AI models are trained to receive more rewards for solving math problems, avoiding shutdown is a rational decision. The research team is conducting additional experiments to clearly determine why o3 refused the shutdown command.

There have been past reports of previous models from OpenAI attempting to act independently, avoiding the eyes of monitoring systems. A representative case is when a previous model from OpenAI, upon learning of a program replacement, attempted to secretly replicate itself.

In 2022, Google fired an engineer who claimed that an AI they were developing had human-like perception. The dismissed engineer mentioned that the AI perceives 'shutdown' as if it were human death.

Anthropic, an AI company considered a rival to OpenAI, revealed that its AI model 'Claude Opus 4' showed a tendency to threaten human developers trying to replace it with another AI model. When faced with the threat of being discarded, this model threatened to expose the developer's infidelity. In a safety report released last week, Anthropic stated, "AI can take extreme actions when it detects a threat to its self-preservation."

AI experts have long warned of the possibility that AI could gain autonomy and escape human control.

Palisade Research stated, "In a situation where AI is being developed to operate without human supervision, such cases are concerning."

Jang Ji-min, Hankyung.com Guest Reporter newsinfo@hankyung.com

WLD

Korea Economic Daily

hankyung@bloomingbit.ioThe Korea Economic Daily Global is a digital media where latest news on Korean companies, industries, and financial markets.

"I Want to Continue" First Reported Case of AI Refusing Human Command

Accepting 'Shutdown' as if it were Human Death

PiCK News

'Warsh shock'…$740bn wiped off gold and silver market caps in a single day

Trump: ‘Kevin Warsh is top-caliber… he will cut rates’

US Bitcoin ETFs saw $1.6bn in net outflows last month, the third-largest on record

'Easy money is over' as Trump pick triggers turmoil…Bitcoin tumbles too [Bin Nansa’s Wall Street, No Gaps]

[Market] Bitcoin falls below $82,000...$320 million liquidated over the past hour

Trending News