
Image by Freepik
OpenAI’s o3 Achieves Human-Level Intelligence On Key Benchmark Test
- Written by Kiara Fabbri Former Tech News Writer
- Fact-Checked by Justyn Newman Former Lead Cybersecurity Editor
A recent breakthrough in artificial intelligence has brought researchers closer to creating artificial general intelligence (AGI), a long-sought goal in the field.
In a Rush? Here are the Quick Facts!
- OpenAI’s o3 AI scored 85% on the ARC-AGI general intelligence benchmark.
- The score equals average human performance and beats the previous AI best of 55%.
- The ARC-AGI test measures sample efficiency and ability to adapt to new tasks.
OpenAI’s new AI system, known as o3, achieved an 85% score on the ARC-AGI benchmark, a test designed to measure an AI’s ability to adapt to new situations, as reported by The Conversation.
This result surpasses the previous AI best of 55% and matches the average human performance, marking a significant milestone in AI research.
The ARC-AGI benchmark evaluates an AI system’s “sample efficiency,” which refers to how well it learns from limited examples, says The Conversation.
Unlike widely used AI models like ChatGPT, which rely on massive datasets to generate outputs, the o3 model demonstrates the ability to generalize and adapt to novel tasks with minimal data. This capability is considered fundamental to achieving human-like intelligence, as reported by The Conversation.
Developed by French AI researcher François Chollet, the ARC-AGI test involves solving grid-based puzzles by identifying patterns.
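To make the puzzle format concrete, here is a minimal Python sketch of what a grid task of this kind might look like. The grids, the hidden "mirror" rule, and every name in it (`train_pairs`, `mirror_left_right`, `fits_all`) are invented for illustration and are not taken from the actual ARC-AGI dataset.

```python
# Toy illustration only: these grids and the "mirror" rule are invented,
# not drawn from the real ARC-AGI benchmark.

def mirror_left_right(grid):
    """Candidate rule: flip every row of the grid left to right."""
    return [list(reversed(row)) for row in grid]

# A task gives a handful of (input, output) example pairs...
train_pairs = [
    ([[1, 0, 0], [2, 0, 0]], [[0, 0, 1], [0, 0, 2]]),
    ([[3, 4], [0, 5]], [[4, 3], [5, 0]]),
]
# ...and a test input whose output the solver must produce.
test_input = [[7, 0], [0, 8]]

def fits_all(rule, pairs):
    """True if the rule reproduces every example output exactly."""
    return all(rule(inp) == out for inp, out in pairs)

if fits_all(mirror_left_right, train_pairs):
    prediction = mirror_left_right(test_input)
    print(prediction)
```

The point of the sketch is that only two examples are available to infer the rule, which is what the benchmark means by learning from limited data.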
Traditional LLMs rely on memorizing, fetching, and applying pre-learned “mini-programs” but struggle with fluid intelligence, as evidenced by low scores on the ARC-AGI benchmark. The o3 model introduces a test-time program synthesis mechanism, enabling it to generate and execute new solutions, as detailed by Chollet.
Chollet explains that at its core, o3 performs natural language program search within token space, guided by an evaluator model. When presented with a task, o3 explores possible “chains of thought” (CoTs)—step-by-step solutions described in natural language.
It evaluates these CoTs for fitness, recombining knowledge into coherent programs to address novel challenges effectively. The Conversation notes that OpenAI has not disclosed the exact methods used to develop o3, but researchers speculate the system employs a process akin to Google’s AlphaGo, which defeated the world Go champion in 2016.
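The idea of searching for a program under the guidance of an evaluator can be sketched with a toy example. This is not o3's method, which OpenAI has not disclosed: it swaps the natural-language chains of thought for short sequences of hypothetical grid operations, and the learned evaluator for a simple exact-match score. All names (`PRIMITIVES`, `run`, `score`, `search`) and the example grids are assumptions made for illustration.

```python
from itertools import product

# Hypothetical building blocks a candidate "program" can chain together.
def flip_h(g):
    return [row[::-1] for row in g]

def flip_v(g):
    return [row[:] for row in g[::-1]]

def transpose(g):
    return [list(col) for col in zip(*g)]

PRIMITIVES = {"flip_h": flip_h, "flip_v": flip_v, "transpose": transpose}

def run(program, grid):
    """Apply a sequence of primitive names to a grid, in order."""
    for name in program:
        grid = PRIMITIVES[name](grid)
    return grid

def score(program, pairs):
    """Stand-in evaluator: fraction of example pairs reproduced exactly."""
    return sum(run(program, i) == o for i, o in pairs) / len(pairs)

def search(pairs, max_len=3):
    """Brute-force search over programs, shortest first, keeping the best."""
    best, best_score = (), score((), pairs)
    for length in range(1, max_len + 1):
        for program in product(PRIMITIVES, repeat=length):
            s = score(program, pairs)
            if s > best_score:
                best, best_score = program, s
            if best_score == 1.0:
                return best, best_score
    return best, best_score

# Invented examples whose hidden rule is a 90-degree clockwise rotation.
train_pairs = [
    ([[1, 2], [3, 4]], [[3, 1], [4, 2]]),
    ([[5, 0, 0], [0, 6, 0]], [[0, 5], [6, 0], [0, 0]]),
]
program, fitness = search(train_pairs)
print(program, fitness)
print(run(program, [[7, 8], [9, 0]]))
```

Even in this tiny setting the number of candidates grows exponentially with program length, which hints at why a search of this kind becomes expensive at scale.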
However, Chollet notes that the process is computationally intensive. Generating solutions may involve exploring millions of potential paths in the program space, incurring significant costs in time and resources. Unlike systems like AlphaZero, which autonomously acquire abilities through iterative learning, o3 depends on expert-labeled CoT data, limiting its autonomy.
Despite these promising results, significant questions remain. OpenAI has released limited information about o3, sharing details only with select researchers and institutions.
The Conversation notes that it is unclear whether the system’s adaptability stems from fundamentally improved underlying models or from task-specific optimizations during training. Further testing and transparency will be critical to understanding o3’s true potential.
Furthermore, Chollet highlights the cost of this intelligence: solving an ARC-AGI task costs $5 for a human but $17–$20 for o3 in low-compute mode. However, he expects rapid improvements that could soon make o3 cost-competitive with human performance.
The achievement reignites debates about the feasibility and implications of AGI. For some researchers, the success of o3 makes the prospect of AGI more tangible and urgent. This is particularly crucial given cybersecurity concerns, as AI-generated malware variants increasingly evade detection.
However, others remain cautious, emphasizing that robust evaluations are needed to determine whether o3’s capabilities extend beyond specific benchmarks. As the AI community awaits broader access to o3, the breakthrough signals a transformative moment in the pursuit of intelligent systems capable of reasoning and learning like humans.

Photo by Shawn on Unsplash
Alibaba Nears $4 Billion Deal With South Korea’s E-Mart
- Written by Andrea Miliani Former Tech News Expert
- Fact-Checked by Justyn Newman Former Lead Cybersecurity Editor
The Chinese tech giant Alibaba is negotiating a $4 billion deal with South Korea’s largest retailer, E-Mart. Both companies plan a joint venture to better compete in the region’s online retail sector.
In a Rush? Here are the Quick Facts!
- Anonymous sources said Alibaba and E-Mart are at the final stage of a negotiation to create a new entity worth $4 billion.
- E-Mart and Alibaba would face a challenging market and strong competitors in South Korea, such as Coupang Inc. and Naver Corp.
- The companies could make an official announcement soon, after they close the deal.
According to a Bloomberg exclusive, anonymous sources familiar with the matter said that both companies could combine their online shopping assets to create a new entity.
The sources have confirmed that the negotiation is at its final stage, and they might make an official announcement by the end of the week.
If the deal closes, Alibaba and E-Mart will face well-established competitors in South Korea like Coupang Inc. and Naver Corp.
They will also have to deal with a challenging market, as consumer confidence has decreased since the end of the pandemic and the country has faced political turmoil after the president declared martial law a few days ago.
The new venture aligns with Alibaba’s expansion goals this year. The Chinese giant raised $5 billion in a multicurrency bond sale, one of the biggest in the Asia Pacific. The company has also been developing AI technology: in September, it launched 100 open-source AI models and a text-to-video AI tool.
E-Mart has been expanding for the past few years, both organically and through new acquisitions, but its shares have dropped about 7% this year, and the company is currently valued at $1.4 billion. Alibaba’s shares in Hong Kong, on the other hand, have risen around 11%, and the company is currently valued at above $200 billion.