Neil Mix
Tech Innovator & Startup Advisor
The Woodland Studio
Why AI Projects Fail
Mostly accurate isn't the same as mostly useful.

Here's a classic example of AI adoption that doesn't go well.

  • A project is identified to use machine learning to answer a difficult question.
  • Early progress demonstrates that the AI is capable of answering with a high degree of correctness, approaching human accuracy.
  • Plans begin to bring the project to production.
  • However, after some time, the project is deemed unworkable and cancelled.

What went wrong? The project was assumed to work from the start, even though it uses a technology that is fundamentally non-deterministic.

It's seductively easy with AI to forget a fundamental principle: start with your outcome in mind. It's easy to build an AI prototype that does something correct and human-like much of the time. But that's a long way from achieving an actual business outcome.

Start with: are you trying to replace the human or augment the human? Both are hard! But for very different reasons. Replacing the human is statistically difficult but easy to automate. Augmenting the human is statistically easy but challenging to automate.

The easiest way to doom a machine learning project before it's even started is to assume you can replace the human. It takes a ton of work to build AI that doesn't require human oversight, and you won't even know if it's possible until the end of the project. Few AI projects get to this level of accuracy, but all of them require substantial upfront investment with an uncertain outcome.

If you eventually discover that your pet project can't get accurate enough to replace a human, you have to redesign it from scratch to place the human in the loop. This substantially changes the financial case for the project. Now, rather than saving an entire human salary, you have to make that human efficient enough to pay for the cost of the AI.
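To make that shift in economics concrete, here's a minimal break-even sketch. Every number in it - the salary, the AI cost, the fraction of time saved - is an illustrative assumption, not a figure from any real project:

```python
# Break-even sketch for replace-vs-augment economics.
# All numbers below are illustrative assumptions.

def replacement_value(salary: float) -> float:
    """If the AI fully replaces a worker, the benefit is the whole salary."""
    return salary

def augmentation_value(salary: float, time_saved_fraction: float) -> float:
    """If a human stays in the loop, the benefit is only the slice of
    their time the AI actually saves."""
    return salary * time_saved_fraction

annual_ai_cost = 60_000   # assumed cost to build, run, and maintain the AI
salary = 150_000          # assumed fully loaded cost of one employee

print(replacement_value(salary) - annual_ai_cost)         # 90000: a strong case
print(augmentation_value(salary, 0.25) - annual_ai_cost)  # negative: underwater
```

Under these made-up numbers, full replacement nets a healthy return, while augmentation that saves a quarter of the same person's time doesn't even cover the AI's cost.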

We humans tend to assume, without realizing it, that because AI gives very human-like answers, it will also be human-like in its failures. As any AI chat user knows, nothing could be further from the truth. When AI gets things wrong, it tends to get them way wrong - so far wrong that almost no working human would come to the same conclusion. This is why AI can be so maddening: the failure mode can be severe and non-human, even though the success mode feels so human.

And it turns out this is really important! It's not enough to say the AI gets things right 95% of the time and claim success. You also have to consider how severely wrong it can be when it yields incorrect answers. A 95% success rate on rivets isn't helpful if one of the failures causes the building to collapse! So even when AI achieves human-like success rates, projects fail because the unpredictable, severe failure cases raise fears of liability.
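One way to see this is to weight errors by their cost rather than just counting them. A toy sketch, with invented error rates and dollar figures:

```python
# Severity-weighted evaluation: same accuracy, very different expected damage.
# The error rates and dollar costs below are invented for illustration.

def expected_cost(error_rate: float, cost_per_error: float, volume: int) -> float:
    """Expected total cost of errors across `volume` decisions."""
    return error_rate * cost_per_error * volume

# A human and an AI that are both wrong 5% of the time -- but the AI's
# mistakes are assumed to be 100x more severe when they do happen.
human = expected_cost(error_rate=0.05, cost_per_error=500, volume=10_000)
ai = expected_cost(error_rate=0.05, cost_per_error=50_000, volume=10_000)

print(f"human: ~${human:,.0f}")  # ~$250,000
print(f"ai:    ~${ai:,.0f}")     # ~$25,000,000
```

Identical accuracy, two orders of magnitude more expected damage - which is exactly the gap an accuracy-only evaluation hides.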

We humans also tend to forget that errors stack up rapidly across a series of chained decisions. For example, after five consecutive decisions, a 1-in-20 failure rate compounds into more than a 1-in-5 failure rate. Even worse, a 4-in-5 success rate becomes a coin flip after just three chained decisions.
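The arithmetic above is just compounded probability - assuming each step's errors are independent, the pipeline only succeeds if every step succeeds:

```python
# Compounded success across chained decisions: the whole pipeline succeeds
# only when every individual decision goes right (independence assumed).

def pipeline_success(step_success: float, steps: int) -> float:
    """Probability that `steps` independent decisions all succeed."""
    return step_success ** steps

print(round(pipeline_success(0.95, 5), 3))  # 0.774 -> ~23% chance of failure
print(round(pipeline_success(0.80, 3), 3))  # 0.512 -> roughly a coin flip
```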

I've seen a machine learning project - along with an entire industry trying to replicate that same project - fail to adopt AI for all of these reasons. (I apologize for the vagueness here, but I'm obligated not to disclose details.) The industry in question showed tremendous interest in adopting AI for a specific use-case: automating the work of very high-cost employees. But the AI just couldn't get good enough to render judgement without human review. And when the industry pivoted into augmenting their costly human workflows to find efficiency savings, they found that the human time savings simply weren't enough to pay for the AI. Dead end, after years of effort and untold millions spent.

To work through these challenges and find use-cases that work, I follow a process that assumes AI failure from the start. Start small and keep things cheap, always prepared to walk away if it's just not working out. Get as close to end users as quickly as possible to get direct feedback and mitigate workflow integration challenges. From there it's a process of iteration, prototyping and rapid revision to build confidence.

AI-assisted coding really helps with speed here. In fact, it flips the script and transforms how projects like this get executed. Old world: projects require several people coordinating over an extended period, so we only get one shot to do it right. New world: a single person can define the concept and code it in weeks, not months. We can try, fail fast, and iterate repeatedly at a fraction of the time and cost of the old process. (See "The rise of super-humans" section near the end of my first vibe coding essay.)

A big challenge is getting people's time to review prototypes and try them out. In the modern business world, everyone is busy all the time, and they're skeptical of anything that might disrupt their flow. It's frustrating to build something that should be helpful only to get a "meh," noncommittal response. There's a conundrum here - the initiatives are speculative, and yet we want to treat them as real projects so people take them seriously.

It turns out that this catch-22 is a feature, not a bug. One thing I learned from my time in VC is that improvement isn't enough to get end-user attention. An idea needs to be transformative to really catch on. So when I find people are having trouble finding time to review prototypes, I take that as a signal we're merely improving rather than transforming.

This happens more than you might think. The distance between executives and the front line is larger than it appears, and something that looks disruptive to executives often turns out to be "meh" on the front lines. That's OK and to be expected - going back to the drawing board isn't so awful when you haven't bet the farm on a project. Not every project matures to production, but a lightweight, fail-fast approach mitigates costly "boil the ocean" risk.

February 24th, 2026 · copyright 2025 Neil Mix · Creative Commons Attribution 4.0