The other side of ROI in AI: Managing Mistakes
I’ve had many conversations lately about the benefits of Artificial Intelligence (AI) and how to calculate its Return On Investment (ROI). But one question has remained constant in the back of my mind: why are we so laser-focused on the benefits and returns AI will generate, while giving so little attention to the actual cost of making errors?
Nano Banana’s representation of The Other Side of AI ROI: Managing Mistakes

I’ve been down many rabbit holes on the intricacies of models and agents, but for the sake of this article I will be taking things up a level and generalising a lot.
Large Language Models (LLMs) are probabilistic beasts: they work by predicting the next word in a sequence, and that selection depends on how certain the model is that a given word is correct. We can tweak the hyperparameters as much as we want to get creativity (or randomness), but what is the cost of making a wrong selection? As in software, errors are inevitable; as a matter of fact, errors are an integral part of Machine Learning (ML) solutions. Since we cannot fix all of them, we are left with the only option of embracing them.
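To make the probabilistic point a bit more tangible, here is a minimal Python sketch of temperature-scaled next-word (token) sampling. The vocabulary, scores and temperature below are toy values I made up for illustration, not output from any real model:

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, rng=None):
    """Sample a next-token index from raw model scores (logits).

    Higher temperature flattens the distribution (more creativity/randomness);
    lower temperature concentrates probability on the most likely token.
    """
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())   # numerically stable softmax
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs), probs

# Toy vocabulary and made-up scores, purely for illustration.
tokens = ["cat", "sat", "fridge"]
idx, probs = sample_next_token([2.0, 1.5, 0.1], temperature=0.7)
print(tokens[idx], probs.round(3))
```

Even with sensible settings, the most probable word is not guaranteed to be the correct one, which is exactly where errors creep in.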
The probabilistic paradox
But before diving into the cost of errors, we should probably try to understand why they happen. As referenced above, LLMs are probabilistic: actions have reward functions tied to them, and models are trained to maximise the reward. In this process they are also penalised for incorrect decisions. So, much like students facing multiple-choice questions, they are encouraged to guess when they are not certain. This is also part of the reason why hallucinations happen: models or agents are simply guessing.
Personally, I believe a model should rather reply with “Apologies, I don’t know” or “I am not certain this is the correct answer”. From a psychological safety perspective, humans are more likely to trust systems that admit limitations or uncertainty. And of course, many errors come not from the model itself, but from the way we design prompts, context, or workflows (another rabbit hole).
I know there is also an entire conversation about bias being baked into training data and systems. I’m intentionally omitting it here to keep this article more focused.
Estimating the error cost
Let’s assume we are making a prediction or offering an insight to Maro. This Maro, from a parallel universe, is someone who trusts AI decisions implicitly. Maro needs to make a decision with a value of £1000 tied to it, and he relies solely on the agent’s recommendation. He goes with the recommendation, and after some time reality hits: the decision was wrong. This error cost him £1000, but what are the odds of it repeating? So he starts to wonder whether he can estimate the expected cost of the error.
As I’ve already mentioned, each decision the model/agent makes has a probability, or confidence, tied to it, usually bounded between zero and one. Let’s say the confidence in the correct result is 75%, which would be 0.75. It looks like we now have a formula for calculating the expected error cost, and it is the following: 1 (perfect confidence) minus 0.75 (actual confidence), times £1000 (decision cost), which equals £250.
(1 - 0.75) x £1000 = £250
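As a quick sanity check, here is that calculation as a tiny Python helper (the function name is mine, purely illustrative):

```python
def expected_error_cost(confidence: float, decision_value: float) -> float:
    """Back-of-the-envelope expected cost of a single wrong decision."""
    return (1 - confidence) * decision_value

print(expected_error_cost(0.75, 1000))  # 250.0
```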
This may or may not seem pricey, but let’s consider the current trend where an agent makes multiple decisions in a chain. Let’s assume this agent chains three decisions, each with its own confidence (for simplicity, 0.75 each). Let’s look at the maths for this one:
(1 - (0.75)³) x £1000 =
(1 - (0.75 x 0.75 x 0.75)) x £1000 =
(1 - 0.421875) x £1000 =
0.578125 x £1000 =
£578.13
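The same logic in code, treating the chain’s overall confidence as the product of the individual step confidences (again, names and numbers are just for illustration):

```python
from math import prod

def chained_error_cost(confidences: list[float], decision_value: float) -> float:
    """Expected error cost when an agent chains several decisions.

    The chain only succeeds if every step is correct, so the combined
    confidence is the product of the individual step confidences.
    """
    combined = prod(confidences)
    return (1 - combined) * decision_value

print(chained_error_cost([0.75, 0.75, 0.75], 1000))  # 578.125
```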
This is where the alarm sounds, as our agent (in this scenario) is more likely to make an error than a correct prediction, and it is a pricey one.
But this is just the monetary value assigned to the error. There are also indirect costs tied to errors, such as loss of trust, regulatory penalties, and reputational damage, among others. Not all errors cost the same. To keep this blog more or less digestible, I will explore indirect costs in a later piece.
Can we learn to live with errors?
What can we do to minimise errors and live with them? There are many different approaches, and one of them is providing users with multiple options to choose from.
Think about it: when you search a term on Google you are not given one result, you are presented with a ranked list. Google (even before AI became prevalent) realised that the best result might not be the top result, but it is highly likely to be within the first 10 (what are the odds of the best result not being in the top 10?).
Multiple options are less common in LLM interactions, but chat interfaces do sometimes offer more than one answer to choose from. Under the bonnet (or hood?), it is very likely that this feedback is captured to improve the models further.
We covered earlier in the piece the importance of transparency over illusion; instead of hallucinating answers, the model should simply say “I don’t know” when certainty is low. We can also redesign models to output an explicit confidence score and set rules around it.
For example:
• confidence >85% → auto answer
• 50-85% → answer with a warning
• <50% → I don’t know response
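Here is a minimal sketch of that routing logic, assuming the model exposes a confidence score alongside its answer; the thresholds simply mirror the illustrative rules above:

```python
def route_answer(answer: str, confidence: float) -> str:
    """Decide how to present an answer based on the model's confidence score.

    The thresholds mirror the example rules above; in practice they should
    be tuned to how costly an error is in your domain.
    """
    if confidence > 0.85:
        return answer                                    # auto answer
    if confidence >= 0.50:
        return f"{answer} (confidence {confidence:.0%}, please verify)"
    return "I don't know."                               # refuse rather than guess

print(route_answer("Paris is the capital of France.", 0.92))
print(route_answer("The meeting moved to Tuesday.", 0.60))
print(route_answer("Revenue will grow 14% next year.", 0.30))
```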
However, to achieve this level of change there would have to be a shift in design philosophy. Engineers should be encouraged to build models that prioritise correctness over completeness. Or, at the very least, audit trails should be detailed enough that decisions are understandable to users and laypeople. There is a whole field of AI research called Explainable AI (XAI) which focuses on creating systems whose decisions are transparent, interpretable, and verifiable.
The key takeaway here is that humans should always be in the loop, armed with a dose of scepticism, especially in high-cost environments.
Conclusion
AI’s value should not be determined solely by what it gets right and how much return it generates; it should also include what it gets wrong and how much those wrong answers cost us. As long as we work with probabilistic systems, error is not a flaw we need to eliminate, but a reality we need to manage.
Hopefully, calculating the expected error cost gives us a clearer picture of the risk we are taking when relying solely on model/agent decisions. We should also consider designing solutions that acknowledge uncertainty, or that offer alternatives backed by reasoning.
With better transparency and stronger safeguards, we can make AI systems more trustworthy. After all, progress in AI will be measured not just by accuracy and ROI, but also by how gracefully we handle errors and inaccuracies.