
In a significant leap for artificial intelligence capabilities, ByteDance’s latest model, Doubao, has demonstrated superior performance across a wide array of benchmarks, positioning it as a frontrunner among top-tier language models. The company released detailed evaluation results showing Doubao outperforming competitors like GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro in categories ranging from scientific reasoning to multimodal understanding and agent-based tasks.
The benchmarks reveal Doubao’s strengths in core academic areas. In mathematics contests such as AIME 2025 and HMMT, it achieved near-perfect and perfect scores of 99 percent and 100 percent, respectively, surpassing rivals that scored roughly 87 to 95 percent. Similarly, in coding evaluations like LiveCodeBench, Doubao scored 90.7 percent, edging out others at 84.7 to 87.8 percent. These results highlight its ability to solve complex problems without external tools, a critical measure for real-world applications.
Beyond text-based tasks, Doubao excels in multimodal domains. On MathVista, a test for visual mathematical reasoning, it reached 89.8 percent accuracy, matching or exceeding leading models. In video analysis benchmarks like VideoMMMU, it scored 88.1 percent, demonstrating robust comprehension of dynamic content. Even in challenging areas like long-context processing on MRCR v2, Doubao hit 89.4 percent, far ahead of the 50 to 79 percent range of competitors, indicating improved handling of extensive inputs.
Agent and tool-use evaluations further underscore its versatility. Doubao led in SWE-Bench Verified with 80.9 percent, a key metric for software engineering tasks, and in search agent tests like BrowseComp at 77.9 percent. In science discovery benchmarks such as Superchem, it attained 63.2 percent, outperforming text-only scores from other models. These gains suggest potential advancements in practical AI assistants for coding, research, and everyday problem-solving.
No model is flawless, and areas like certain visual puzzles and real-world tasks show room for improvement. Still, these results mark a milestone for ByteDance, challenging the dominance of Western AI giants. As the AI landscape evolves rapidly, Doubao’s broad capabilities could accelerate innovations in education, healthcare, and beyond, though ongoing ethical considerations remain essential.
