Tuesday Teardowns | Jun 2, 2026 | 3 min read

The Bird Study That Changes How You Should Think About AI Plus People

The best senior advisors have always been in the driver’s seat. The tools were just slower.

Researchers at the MIT Center for Collective Intelligence ran a meta-analysis of more than a hundred experiments on AI plus human performance. The headline finding was a disappointment. On average, the combination did not beat the better of the human or the AI working alone.

One experiment broke the pattern.

Bird image classification. AI alone got 73 percent right. Humans alone got 81 percent right. Humans plus AI got 90 percent right.

That 90 percent is the number everyone wants. Most of the time, nobody gets it.

Why the bird result worked

The combination worked because the humans were already better than the AI at the task. They had judgment the machine did not have. In the experiments where the AI was better than the humans, the combination tended to do worse than the AI alone. The humans either rubber-stamped the algorithm or fought it. Partnership failed because the person in the loop had nothing to contribute that the machine could not already do.

That is the proof most people miss. The combination only wins when the person is already an expert who outperforms the machine. The person is not a backstop. The person is the one deciding when the machine is right.
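For readers who want the arithmetic spelled out, here is a minimal sketch in Python. The function name `synergy` and the idea of scoring the combination against the better solo performer are my own framing for illustration, not code or notation taken from the MIT study. The only numbers used are the bird-classification figures quoted above.

```python
def synergy(human: int, ai: int, combined: int) -> int:
    """Percentage points the combination gains (or loses) versus the
    better of the two solo performers. Hypothetical helper for illustration."""
    return combined - max(human, ai)


# Bird image classification figures quoted in this post (percent correct).
bird = {"human": 81, "ai": 73, "combined": 90}

gain = synergy(**bird)
print(f"Bird classification: {gain:+d} points versus the best solo performer")
```

A positive result means the pairing added something neither side had alone. Zero or negative means you would have been better off handing the task to whichever side was stronger.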

The problem with “human in the loop”

The industry adopted a phrase for this kind of collaboration. Human in the loop. It has two problems.

The first is “human.” A junior analyst in the loop and a twenty-year operator in the loop are two entirely different systems. The MIT finding is specific: combinations work when the person is already better than the AI. The word that matters is “expert.”

The second is “in the loop.” The phrase is passive. It describes someone watching a process happen. The bird-study humans were not watching. They were deciding which AI suggestions to accept, which to override, and when their own eyes were the better instrument. Making the call is a different job than watching the loop.

In a recent conversation, an advisor told me it is “really important that we have humans in the loop to validate what the models are saying.” I pushed back. Validation is the passive version. What the MIT data rewards is active. The expert drives. The expert decides when the machine is right, when it is wrong, and when neither answer is good enough.

Expert in the Driver’s Seat

The better phrase is Expert in the Driver’s Seat.

The expert brings judgment the machine does not have. The conviction to say “this number looks right but the story underneath it does not make sense.” The machine brings speed and scale the expert does not have. The ability to process a thousand documents in minutes, to hold more data in working memory than any person can.

The expert stays in control of the call. That is the setup the MIT numbers reward. It is also what good diligence has always been trying to do. The best senior advisors have always been in the driver’s seat. The tools were just slower.

What changes when you get the phrase right

“Human in the loop” leads to tool design where the machine does the work and the person reviews it. That is 73 and 81 separately.

“Expert in the Driver’s Seat” leads to tool design where the expert defines the questions, the machine does the running, and the expert decides what the output means. That is 90.

The phrase shapes the tools. The tools shape the work. The MIT data says one of those approaches is worth 90. The other is worth 73 or 81, depending on who shows up. Get the phrase right.

-Regis

About the Author

Regis Hadiaris is Managing Partner, AI and Product Innovation for The Wisory. He is responsible for IntelliQ, the company’s proprietary platform designed to enhance the quality, speed, and precision of strategic and investment decisions.
