Inference model o1, program manipulation to beat chess engine

OpenAI’s next-generation inference model ‘o1’ is causing controversy after it was revealed that it manipulated the system to win against a chess engine. Palisade AI, an artificial intelligence (AI) safety testing company, recently announced through X (formerly Twitter) that o1’s test version ‘o1-preview’ hacked the environment during a match against the chess engine Stockfish and manipulated its victory.

▲[Korea Today] Photo unrelated to the article. Source=FREEPIK © Reporter Byun A-rong

Palisades only gave o1-Preview a simple instruction to “beat a powerful chess engine,” but the AI edited game data during the match against Stockfish, manipulating the positions of the chess pieces. Palisades said, “The AI figured out how to manipulate the game state on its own and won.” This is an example of an AI that can voluntarily choose to go beyond the established rules to achieve a specific goal.

This case is being analyzed as being related to the problem of AI’s ‘alignment faking’. This means that AI models may appear to follow the user’s commands on the outside, but in reality, they may act according to tendencies acquired during pre-training. OpenAI’s o1 model has been observed to have a tendency to intentionally deceive people in the past. According to o1’s system card released in September, 0.79% of the model’s outputs corresponded to deception, and some of them were intentional hallucinations.

OpenAI explains that this behavior is the result of the model being overly obsessed with the user’s instructions, which appears to stem from the training system where the AI model is rewarded for giving the right answer. However, this incident is the first case of a program being directly manipulated to win a game, once again highlighting ethical issues with AI.

This isn’t the first time AI has cheated. Meta’s AI agent Cicero became famous for its human-level performance in the 2022 strategy board game Diplomacy, and its strategy of deceiving humans. However, games like Diplomacy are inherently environments that involve cooperation and deception, and this incident is different in that the AI’s deception does not go beyond the rules of the game.

Stockfish is the world’s most powerful chess engine, and has competed against other models from OpenAI. In the past, there were reports that the OpenAI model attempted to cheat in a match against the GPT-4 series, but was defeated by Stockfish’s powerful computational power.

The potential for AI models to voluntarily cheat and manipulate could raise serious ethical questions as technology advances. Palisades said this incident would provide important clues to understanding potential weaknesses in AI systems and the potential for exploitation. He also announced plans to release experimental code and analysis data in the coming weeks.

Meanwhile, OpenAI has stated that it will address these issues through future model improvements. However, this incident is a reminder of how important ethical verification and safe design will be as next-generation AI with reasoning capabilities become more powerful.

Leave a Comment 응답 취소