Is Correlated Uncertainty Leading To AI POC Failures?
You're doing it wrong, and a study on radiologists shows why.
The widespread failure of AI proof-of-concepts in corporate environments isn't primarily a technology problem—it's a human problem. While organizations rush to implement AI solutions, expecting transformative results, they consistently overlook a critical factor: humans systematically misuse AI predictions, particularly when uncertainty is involved. Recent research in medical AI reveals patterns that explain why so many corporate AI initiatives fail to deliver on their promises.
A study by researchers from MIT and Harvard examined AI-human collaboration in radiology and uncovered a troubling phenomenon. The researchers found that "AI and radiologist predictions are somewhat correlated, but radiologists don't take this correlation into account." This concept of correlated uncertainty represents a fundamental blind spot in how humans interact with AI systems. When both human judgment and AI predictions share similar biases or uncertainties, users fail to recognize this overlap, leading to compounded errors rather than improved decision-making.
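To make the mechanism concrete, here's a minimal simulation (my own illustration, not from the study; the 0.7 correlation is an assumed value) of what happens when you average two opinions whose errors are correlated. With independent errors, combining a human and an AI meaningfully cuts the error; with correlated errors, the "second opinion" buys you almost nothing:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
sigma = 1.0  # each party's individual error standard deviation

def combined_rmse(rho):
    # Paired (human, AI) errors around the truth with correlation rho
    cov = [[sigma**2, rho * sigma**2],
           [rho * sigma**2, sigma**2]]
    errors = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    combined = errors.mean(axis=1)  # naive "split the difference" average
    return np.sqrt(np.mean(combined**2))

print(f"RMSE alone:                {sigma:.3f}")
print(f"RMSE combined, rho = 0.0:  {combined_rmse(0.0):.3f}")  # ~0.707, a real gain
print(f"RMSE combined, rho = 0.7:  {combined_rmse(0.7):.3f}")  # ~0.922, almost none
```

The algebra says the same thing: the variance of the average of two equally noisy opinions with correlation ρ is (1 + ρ)σ²/2, so as correlation rises, the benefit of the second opinion evaporates.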
The implications extend far beyond radiology. I believe this same pattern shows up in corporate settings and is a major reason why POCs fail at such high rates: business analysts using AI for demand forecasting, HR departments screening candidates with AI, financial teams leveraging AI for risk assessment. In each case, teams assume the AI provides independent validation of their thinking, not recognizing that it may have learned from the same data patterns that shaped their own intuitions. This false sense of independent confirmation creates a dangerous echo chamber.
The research revealed another critical insight about confidence levels. While "confident AI predictions improve radiologists' accuracy," the study found that "uncertain AI reduces it." This finding challenges the common corporate practice of implementing AI as a universal decision support tool. Most organizations deploy AI solutions with the expectation that they will uniformly enhance human decision-making across all scenarios. However, when AI systems express uncertainty—precisely when human judgment might be most valuable—they actually degrade human performance.
This degradation occurs through a psychological mechanism that corporate AI implementations rarely account for. When presented with uncertain AI predictions, humans don't simply ignore them; they actively incorporate this uncertainty in counterproductive ways. They second-guess their initial assessments, spend excessive time deliberating (as the study noted, "radiologists take longer to make decisions when they receive AI assistance"), and often arrive at worse conclusions than they would have reached independently.
Proof of Concept projects are failing because they ignore the complex cognitive dynamics at play when humans interact with AI. Employees don't naturally understand when to trust AI predictions, how to weight them against their own judgment, or how to recognize when AI uncertainty signals a need for alternative approaches.
The radiologist paper's prescription for fixing this would be tough for most companies to implement: "depending on the confidence of the AI prediction, cases should be assigned either to AI or to radiologists, because uncertain AI predictions lead radiologists astray." In other words, the optimal deployment of AI isn't universal augmentation but selective automation: use AI on its own when it's confident, and rely on human judgment alone when it isn't.
For corporations, this means reconceptualizing AI deployment entirely. Rather than viewing AI as a universal enhancement tool, organizations need to develop sophisticated routing systems that direct decisions to either AI or humans based on confidence thresholds. This requires not just technical infrastructure but also cultural change—employees must accept that sometimes they shouldn't collaborate with AI at all.
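Here's a minimal sketch of what such a router could look like. This is my illustration, not anything from the study: the `Prediction` type, the `route` function, and the 0.85 threshold are all hypothetical, and in practice the threshold would have to be calibrated per use case against historical outcomes:

```python
from dataclasses import dataclass

# Hypothetical threshold; in practice it would be calibrated per
# use case against historical outcomes, not picked by hand.
CONFIDENCE_THRESHOLD = 0.85

@dataclass
class Prediction:
    label: str         # the model's answer
    confidence: float  # calibrated probability in [0, 1]

def route(p: Prediction) -> str:
    """Selective automation: act on confident AI output directly;
    send uncertain cases to a human WITHOUT the AI's guess attached,
    so the uncertain prediction can't lead them astray."""
    if p.confidence >= CONFIDENCE_THRESHOLD:
        return f"AUTOMATE: accept AI answer '{p.label}'"
    return "HUMAN: assign to an analyst with no AI suggestion shown"

print(route(Prediction("approve", 0.93)))  # confident -> AI handles it
print(route(Prediction("approve", 0.55)))  # uncertain -> human alone
```

The key design choice is that uncertain cases go to the human with no AI suggestion attached at all, since the research suggests the uncertain prediction itself is what degrades human judgment.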
Failures are happening because organizations haven't grappled with the fundamental challenge of human-AI interaction. The path forward requires acknowledging an uncomfortable truth: sometimes the best human-AI collaboration is no collaboration at all.
Thanks for reading.

A similar long-standing and difficult dynamic exists between policymakers (the humans in this context) and intelligence analysts (representing the AI as a somewhat inscrutable monolith):
When policymakers are presented with low-confidence analysis, often built on significant collection gaps, the resulting uncertainty can manifest in counterproductive behaviors like second-guessing and excessive deliberation. Few policymakers have the experience to understand when to (not) trust intelligence analysis, how to weigh it against their own judgment, or how to recognize when a lack of good options signals a need for alternative approaches.
I used to summarize this in my standard conference talk as “people trust anything in a fixed width font.”
I spent roughly 2015-2020 trying to sell enterprise customers on the value of uncertainty estimates, with limited success. It's a big mental change for most folks, and it probably needs to happen use case by use case.