We’ve all heard stories about AI writing reports or sifting through code faster than any human. But how do we actually measure how complex a job AI can handle? A new study from the Model Evaluation & Threat Research (METR) team took a different approach: they looked at what kinds of tasks real human experts do, timed how long those tasks typically take, and then checked if AI could do them too.
Why Lengthy Tasks Matter
Whether you’re fixing a bug in software, researching a legal case, or editing a series of videos, some jobs take many steps. They might require logical thinking, problem-solving, and even some creativity. Until recently, AI was only really good at shorter, bite-sized tasks—like summarizing a paragraph or suggesting a quick fix in code. But this study found that AI is getting better at tackling longer, more detailed work.
What the Researchers Did
- Gathered Real Tasks: They collected about 170 tasks from fields like coding, data analysis, and cybersecurity. Each was something a skilled human typically handles, sometimes in minutes, sometimes in hours.
- Measured Human Times: They hired people with the right expertise to do each task. The humans timed themselves (e.g., “It took 12 minutes to fix this script”), so each job had a known “normal” duration.
- Tested AIs Over the Years: They ran the same tasks on older AI models (like GPT-2 from 2019) and newer ones (like GPT-4 and recent Claude models) and recorded how often each model succeeded.
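To make the three steps concrete, here is a minimal sketch of how success rates could be grouped by how long a task takes a human. The records, bucket boundaries, and function names are hypothetical and chosen only for illustration; they are not data or code from the study.

```python
from collections import defaultdict

# Hypothetical records: (minutes a human expert needed, did the AI succeed?).
# These values are made up for illustration, not taken from the study.
RESULTS = [
    (2, True), (3, True), (5, True),
    (12, True), (15, False), (30, True), (45, False),
    (90, False), (240, False),
]


def bucket(minutes: float) -> str:
    """Assign a task to a coarse human-time bucket (boundaries are arbitrary)."""
    if minutes < 10:
        return "under 10 min"
    if minutes < 60:
        return "10-60 min"
    return "over 60 min"


def success_by_bucket(results):
    """Return the fraction of tasks the AI completed in each bucket."""
    totals = defaultdict(lambda: [0, 0])  # bucket -> [successes, attempts]
    for minutes, succeeded in results:
        counts = totals[bucket(minutes)]
        counts[0] += int(succeeded)
        counts[1] += 1
    return {name: wins / attempts for name, (wins, attempts) in totals.items()}


if __name__ == "__main__":
    for name, rate in success_by_bucket(RESULTS).items():
        print(f"{name}: {rate:.0%} success")
```

With these invented records, the success rate drops as the human time goes up, which is the kind of pattern the researchers were looking for when they compared models.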
The Findings, in Simple Terms
Over the last six years, the longest task an AI can finish about half the time, measured by how long that task takes a human expert, has been growing at a surprisingly fast rate. In 2019, AI might manage small chores, roughly the complexity of writing a short email. Today's AI can tackle bigger challenges, like diagnosing a subtle software bug or writing a more involved script.
- Doubling of Capability: The length of tasks that AI can handle, measured by how long they take a skilled human, has doubled roughly every seven months.
- Messy Work Is Still Tricky: If a project is really disorganized—like lacking instructions or involving multiple people talking at once—AI still struggles a lot more. Even so, the study found improvements there as well.
- One Big Caveat: Just because AI can do a task doesn’t mean it’ll do it perfectly every time. Sometimes it stalls or repeats mistakes, and certain tasks involving creativity or uncertain information remain tough.
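The doubling claim is easy to turn into arithmetic. The sketch below assumes the trend stays a clean exponential with a seven-month doubling time, which is a simplification; the 60-minute starting point is a hypothetical placeholder, not a figure from the study.

```python
import math

# From the article: task length roughly doubles every seven months.
DOUBLING_MONTHS = 7.0


def projected_minutes(start_minutes: float, months_elapsed: float,
                      doubling_months: float = DOUBLING_MONTHS) -> float:
    """Task length after `months_elapsed`, assuming steady exponential growth."""
    return start_minutes * 2 ** (months_elapsed / doubling_months)


def months_until(start_minutes: float, target_minutes: float,
                 doubling_months: float = DOUBLING_MONTHS) -> float:
    """Months needed to grow from `start_minutes` to `target_minutes`."""
    return doubling_months * math.log2(target_minutes / start_minutes)


if __name__ == "__main__":
    start = 60.0  # hypothetical starting point: a one-hour task
    print(projected_minutes(start, 14))   # two doublings later
    print(months_until(start, 8 * 60))    # time to reach an eight-hour task
```

Under the same assumption, six years (72 months) is a little over ten doublings, so the task length would grow by a bit more than a thousandfold. Real progress may speed up or slow down, so treat any projection like this as a rough guide.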
What This Means for You
- Bigger Projects Could Use AI Soon: The tasks that AI can handle are no longer just short, single-step fixes. If progress keeps up, more multi-hour or even multi-day tasks might be within AI’s reach.
- Human Expertise Still Matters: While the study shows AI can get further than before, there’s a difference between “finishing a checklist item” and “making great strategic decisions.” Humans often bring judgment, context, or ethical considerations that AI can’t replicate.
- Watch for Rapid Changes: Because improvements are happening quickly, jobs or chores that felt “too big for AI” a year ago might be AI-friendly sooner than you think.
In Summary
The research is exciting because it looks at real, complicated tasks, not just quiz questions. Over time, AI is moving from simple “copy-paste” roles to more involved, multi-step work. The next few years could be a tipping point, where tasks that used to take you hours or days might be partially automated—freeing you up for the bigger-picture stuff that truly needs a human mind (and heart!).
Bottom line: If you’re an everyday user or a business leader, keep an eye on how these AI tools evolve. They’re getting better at “long-haul” tasks, but they’re not miracle workers just yet. As always, the best solutions often come from a mix of human expertise and AI speed.
Source: https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/