As OpenAI’s next-generation reasoning model ‘o1’ emerges as a new standard in the AI industry, Chinese researchers and companies are racing to catch up. In particular, a steady stream of new frameworks and techniques based on reinforcement learning (RL) is drawing attention.
Fudan University and the Shanghai AI Laboratory recently published a roadmap for reproducing o1 centered on RL, which is considered the core technology behind ‘o1’. The study lays out how to replicate and improve on o1 through four core components: policy initialization, reward design, search, and learning.
In addition, a research team at Tsinghua University introduced ‘PRIME (Process Reinforcement through IMplicit Rewards)’, a new RL-based method for model development, and used it to build the ‘Eurus-2-7B-PRIME’ model. The model scored 26.7% on the American Invitational Mathematics Examination (AIME) benchmark, outperforming GPT-4o and its base model, Qwen2.5-Math.
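To make the idea of an “implicit” reward more concrete, here is a minimal Python sketch. It assumes the formulation reported for PRIME, as we understand it: each generated token’s reward is β times the log-probability ratio between a learned reward model and a frozen reference model, so no step-by-step human labels are needed. The function names, the value of `beta`, and the sample log-probabilities are hypothetical illustrations, not code from the Tsinghua team.

```python
from typing import List


def implicit_token_rewards(
    reward_logprobs: List[float],
    ref_logprobs: List[float],
    beta: float = 0.05,
) -> List[float]:
    """Assumed PRIME-style implicit process reward for each generated token.

    reward_logprobs[t] is log pi_phi(y_t | y_<t) under the learned reward model;
    ref_logprobs[t] is log pi_ref(y_t | y_<t) under a frozen reference model.
    Each token's reward is beta * (reward log-prob - reference log-prob).
    """
    if len(reward_logprobs) != len(ref_logprobs):
        raise ValueError("both log-prob lists must cover the same tokens")
    return [beta * (lp - lr) for lp, lr in zip(reward_logprobs, ref_logprobs)]


def sequence_reward(token_rewards: List[float]) -> float:
    """Summing token rewards gives beta * log(pi_phi(y) / pi_ref(y)) for the whole answer."""
    return sum(token_rewards)


if __name__ == "__main__":
    # Hypothetical log-probabilities for a three-token answer.
    rewards = implicit_token_rewards([-0.5, -1.0, -0.25], [-0.75, -1.0, -0.75], beta=0.5)
    print(rewards)                   # [0.125, 0.0, 0.25]
    print(sequence_reward(rewards))  # 0.375
```

In this formulation, tokens to which the reward model assigns more probability than the reference model receive positive credit, which is how dense, token-level feedback can come from a reward model trained only on whether final answers are correct.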
“RL-based rewards play a key role in enhancing a model’s reasoning ability,” the researchers said. “We have shown that this method is effective for reproducing high-performance models such as OpenAI’s o1.”
In China, a range of companies and universities, including Tencent, Nanjing University, and Shanghai Jiao Tong University, have recently published research targeting o1. Tencent proposed a method to address ‘overthinking’ in reasoning models, and Nanjing University proposed a framework that reduces token usage while maintaining accuracy.
AI companies are also catching up quickly. DeepSeek’s ‘V3’ model is considered the largest open-source model released to date, and Alibaba and Moonshot AI have also joined the reasoning-model race.
The Chinese teams’ RL-based approaches substantially improve reasoning performance and efficiency, and some expect them to surpass o1. Questions remain, however, about whether RL-based reasoning techniques can be applied to every kind of task. Observers note that while they are effective on problems with clear answers, such as mathematics, their performance on tasks without clear answers, such as creative writing or sentiment analysis, still needs further verification.
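The gap between tasks with and without clear answers comes down to whether a reward can be computed automatically. The minimal Python sketch below shows a rule-based reward of the kind commonly used for math in RL training: it extracts the final answer from a `\boxed{...}` expression and compares it with a reference answer. The `\boxed{}` convention, the function names, and the exact-match rule are assumptions for illustration; practical checkers typically also treat equivalent forms such as `1/2` and `0.5` as equal, which this sketch does not.

```python
import re
from typing import Optional

# Hypothetical convention: the model writes its final answer as \boxed{...}.
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in the completion, or None."""
    matches = BOXED.findall(completion)
    return matches[-1].strip() if matches else None


def math_reward(completion: str, reference: str) -> float:
    """Binary reward: 1.0 if the extracted answer exactly matches the reference, else 0.0."""
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0  # no parseable final answer earns no reward
    return 1.0 if answer == reference.strip() else 0.0


if __name__ == "__main__":
    print(math_reward(r"2 + 2 = 4, so the answer is \boxed{4}", "4"))  # 1.0
    print(math_reward(r"I think it is \boxed{5}", "4"))                # 0.0
    print(math_reward("A short poem about autumn", "4"))              # 0.0
```

For a poem or a nuanced review there is no single reference string to compare against, so no equally cheap and reliable check exists, which is the limitation the critics point to.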
It remains to be seen what changes China’s technological advancements will bring to the global AI competition.
© The Korean Today – All Rights Reserved