Perplexity AI launched DRACO, an open-source benchmark evaluating research agents via 100 tasks from...
Technology
1 min read
US

The AMW Read

Perplexity (a key player in research/search) is updating the agentic evaluation landscape with a production-grounded benchmark, but this is an incremental tool release rather than a structural shift.
AI Agents · Player Map

Perplexity AI launched DRACO, an open-source benchmark that evaluates research agents on 100 tasks drawn from real user queries. The benchmark spans 10 domains; Perplexity leads with 89.4 percent accuracy in Law and 82.4 percent in Academic research. Shifting from synthetic puzzles to production-grounded data sets a more rigorous standard for multi-step reasoning and pushes the AI industry to prioritize factual depth over conversational fluency. 🚀

#AIResearch #DRACO #PerplexityAI #LLM #Technology

How This Connects

Based on AI Agents · Player Map

  1. 2w ago · UniPat AI releases SaaS-Bench, Claude Opus 4.7 passes only 3.8% of 106 real-office tasks, breaking the illusion of full office automation.
  2. 1mo ago · Anthropic is shifting focus to compete with OpenAI and Microsoft over the agent control plane, the o... · Anthropic
  3. 1mo ago · Adobe launches Adobe CX Enterprise, an agentic AI system for customer experience · Adobe
  4. 1mo ago · Anthropic Launches 10 Financial Services Agents, Sending FactSet Shares Down 8% · Anthropic
  5. 1mo ago · Alibaba's Metis agent slashes redundant AI tool calls from 98% to 2%, boosting accuracy · Alibaba
  6. 4mo ago · Perplexity AI launched DRACO, an open-source benchmark evaluating research agents via 100 tasks from... · THIS ARTICLE
