Agentic AI: From Hidden Tool to Trusted Partner

TL;DR 

After two decades in software engineering, I’ve seen plenty of “next big things.” But agentic AI feels different: not just another tool, but a partner stepping out of the shadows. This post explores how these systems are changing how we work, from obedient tools to autonomous collaborators. I share my framework for understanding their evolution (the four stages of agentic AI), unpack three domains of risk (data, hallucination, and inference), and argue why “human in the loop” isn’t optional; it’s essential. The future isn’t about replacing people but about amplifying human judgment with machines we can trust.

Why I’m Reflecting on the Rise of Agentic AI

I’ve been sharing fragments of these thoughts in shorter form on LinkedIn over the past few weeks and months, but I thought it was time to put some more considered thinking into a longer piece.

For a bit of context, I’ve been a software engineer for over 20 years. I’ve seen a lot of shiny new tools and frameworks come across my desk, some I loved, many I ignored, but all of them left me with an opinion or two. My centre of gravity has always been productivity. We only get a certain number of tokens to spend each week, so I’m constantly looking for ways to spend them where it counts.

And while people might call me a technologist, I’ve come to realise that I’m actually more of a people-and-process person. I’m most effective when I can ensure we’ve got the right skills and capacity on a project and that we’re investing in understanding the why and reflecting on the how. Delivery, after all, is a team sport. That’s why I focus on helping new teams get started quickly and operate efficiently.

So, what does all this have to do with AI?

 

For a deeper look at agentic AI in the enterprise, see our blog Agentic AI for Your Enterprise: Cutting Through the Hype.

 

From Traditional Tools to Agentic AI: How AI Is Changing Collaboration

For most of my career, I’ve sold the idea that people can do less and the tools can do more. But something feels different this time.

In previous generations, our tools lived in the shadows. They did what we told them to do and were limited by the knowledge of the humans wielding them. That’s where consulting teams had real impact: we knew what the tools could do, how to apply them, and how to multiply their value through experience.

At Cevo, for example, we’ve always focused on productivity through DevOps, removing the pain of manually configured this and inconsistently executed that, and replacing them with consistent, automated pipelines. Those pipelines carried our opinions, our best practices, our lessons learned.

But these new tools, like LLMs and agentic frameworks, don’t feel quite like that. They’re no longer in the shadows. They can respond to open-ended questions. They can collaborate. When was the last time you “interacted” with your CI/CD pipeline to determine what the best course of action should be?

That’s the difference. These new tools aren’t just following commands. They’re starting to think with us.

Agentic AI systems are no longer just automating; they are collaborating.

Balancing Autonomy and Control in Agentic AI Systems 

With this change in capability, a whole raft of new questions is being raised, and it feels like we’re all circling the same challenge: how do we embrace this agentic AI environment without giving up too much control or letting the robots drive us into a Terminator-style future?

I have started to look at it through a risk-based lens. As we give these tools more autonomy and capability, we let go of control, and for many of us, that control was only partial to begin with.

We hire humans and manage them with trust and governance: we set boundaries, we review, and we define what’s acceptable. The same principles should apply to AI. Governance isn’t about slowing innovation; it’s about shaping it safely. But this is new and different, and not something we’ve had to do for tools before, because previously we knew up front what a tool did (or was supposed to do), and we could govern its use simply by choosing whether or not to use it. Our new agentic tools are different: their behaviours are less bounded, so our appetite for risk shows up in our willingness to adopt and “trust” these tools.

What’s even more challenging is how fast these tools are changing, and how rapidly we’re composing complex solutions with them. As we move up the complexity curve, from simple assistants to semi-autonomous agents, our governance needs to evolve too.

 

You can explore practical approaches to evaluating AI agents in AI Agent Evaluation Techniques: Step 1 in AI Excellence.

 

Implementing Guardrails for Humans and Connected AI Agents

At the simplest level, our guardrails should exist around the people using these tools: what data can I safely share with a chatbot? These guardrails should be in place today; we likely already have data use policies and access controls to ensure only those who should have access to that data do. The subtle extension is delivering training to users on which modern Gen AI tooling is company approved, and in some scenarios going as far as blocking unauthorised tooling at the network edge. All of this uses tools and processes we already have today.
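
To make that concrete, here’s a minimal sketch of what a pre-send guardrail could look like: a small check sitting in front of an approved chatbot. The patterns, the approved-tool list, and the function name are all illustrative assumptions, stand-ins for whatever your data classification standard and tooling policy actually say.

```python
import re

# Illustrative patterns only; real ones come from your data classification standard.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "api_key": re.compile(r"\b(?:sk|key)-[A-Za-z0-9]{16,}\b"),
}

# Assumption: the company-approved Gen AI tools; everything else gets blocked.
APPROVED_TOOLS = {"company-chatgpt", "internal-assistant"}


def prepare_prompt(tool: str, prompt: str) -> str:
    """Block unapproved tools and redact sensitive data before it leaves the building."""
    if tool not in APPROVED_TOOLS:
        raise PermissionError(f"{tool} is not an approved Gen AI tool")
    for label, pattern in SENSITIVE_PATTERNS.items():
        prompt = pattern.sub(f"[REDACTED:{label}]", prompt)
    return prompt


print(prepare_prompt("company-chatgpt", "Summarise the complaint from jane@example.com"))
# -> Summarise the complaint from [REDACTED:email]
```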

As we move to connected agents accessing live systems and APIs, those guardrails change, and we must extend them into policy and auditing.

Again, this isn’t new territory. We already have vendor risk frameworks, data protection policies, and third-party assessments. We just need to extend them into this new domain of AI, this “shadow IT” of automation that’s spreading faster than we can approve it. Our assessment frameworks need new Gen AI-specific constructs focused on bias, data security, and data usage, but these are extensions of our existing tooling evaluation frameworks, not replacements.
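
As a sketch of what “extending into policy and auditing” might look like in practice, the snippet below wraps every action a connected agent takes in an allow-list check and an audit record. The action names, the allow-list, and the dispatch function are assumptions; the point is the shape: policy before the call, an audit trail after it.

```python
import json
import time

# Assumption: an allow-list derived from the same vendor-risk and data-usage
# assessment we already run for any other third-party tool.
ALLOWED_ACTIONS = {"crm.read_contact", "files.read", "tickets.create"}


def call_tool(agent: str, action: str, payload: dict) -> dict:
    """Gate and audit every action a connected agent takes against a live system."""
    record = {"ts": time.time(), "agent": agent, "action": action, "payload": payload}
    if action not in ALLOWED_ACTIONS:
        record["decision"] = "denied"
        print(json.dumps(record))  # in practice, ship this to your audit store
        raise PermissionError(f"{agent} is not permitted to perform {action}")
    record["decision"] = "allowed"
    print(json.dumps(record))
    return dispatch(action, payload)


def dispatch(action: str, payload: dict) -> dict:
    """Placeholder for the real call into the CRM, file store, or ticketing API."""
    return {"status": "ok", "action": action}


call_tool("sales-assistant", "crm.read_contact", {"id": "12345"})
```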

We are entering an era where AI does not just do what we ask; it decides what needs to be done.

The Four Stages of Agentic AI Evolution 

As we work to develop these assessment frameworks, it helps to look at how AI agents are evolving. As I see it at the moment, there seem to be four stages emerging.

 

Stage 1: Autonomous agents 

These operate on a fixed data set, like a junior who follows instructions well but needs a lot of context upfront. Think ChatGPT and other chat-based solutions.

 

Stage 2: Connected agents 

Now they have access to external systems: APIs, CRMs, file stores. They can act in the real world, like tools on steroids. This introduces new challenges of trust, exposure, and misuse, but they can be managed through tooling-specific assessments, human validation, and clearer evaluation approaches.

 

Stage 3: Collaborative agents 

This is where we’re starting to play today: multiple agents, each performing specialised tasks, working together toward a shared goal. It’s reminiscent of small command-line utilities piped together to produce complex outcomes. While the complexity here increases, I still feel that governance and evaluation apply at the individual agent level, just as in Stage 2; it’s just that these agents start providing feedback and context to each other, and therefore experienced human oversight becomes even more essential.

 

Stage 4: Ecosystem agents 

Somewhere in the future (likely the near future), I can see clusters of agents collaborating across domains: networks of intelligence that interact, learn, and adapt. This is where small mistakes can cascade and amplify through interconnected systems. It’s thrilling and terrifying in equal measure.

 

The more we invest in understanding how each previous generation works and is governed, the better positioned we’ll be to exploit the next one.

Trust, clarity, and context will define whether agentic AI becomes a partner or a liability.

Understanding Agentic AI Risks 

As I’ve been working with these systems, three main domains of risk keep surfacing. 

1. Platform and Data Risk

When we hand over credentials, data, or system access to third-party tools, how do we really know what they’re doing? Do we trust the vendors? Are they storing logs responsibly? Using our data for training?  

Governance must evolve to answer those questions in real time, not after the fact. 

 

For a guide to navigating AI ethics and responsibilities, see Navigating the Ethical Landscape of AI: Challenges and Responsibilities.

 

2. AI Hallucination and False Confidence

Or, as I call it, misplaced conviction. Humans do it too, confidently quoting a “fact” that turns out to be wrong. AI just does it faster and with more authority. The danger isn’t that AI is wrong; it’s that it sounds so right. 

That’s why human review isn’t optional. It’s essential. You can’t delegate to a machine something you don’t understand yourself. If you wouldn’t trust your own expertise to validate an outcome, don’t ask an AI to produce it for you. 

I’m even starting to evolve from talking about “human-in-the-loop” to “expert-in-the-loop”: if you can’t call bullshit on what these tools generate, then you’re probably overstepping your use of them.
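
To make expert-in-the-loop a little more concrete, here’s a small sketch that assumes you keep a registry of which people can genuinely vouch for which domains. The names and the registry are hypothetical; the rule is what matters: no qualified reviewer, no delegation.

```python
# Hypothetical registry: which people can genuinely vouch for which domains.
EXPERTS = {"terraform": ["alice"], "app-security": ["bob"], "tax-advice": []}


def accept_output(domain: str, output: str, reviewer: str) -> str:
    """Only accept AI-generated work once a named expert in that domain signs off."""
    qualified = EXPERTS.get(domain, [])
    if not qualified:
        # Nobody in-house can call bullshit on this, so don't delegate it to an agent.
        raise RuntimeError(f"No in-house expert for '{domain}'; don't ask an AI to produce it")
    if reviewer not in qualified:
        raise PermissionError(f"'{reviewer}' can't validate {domain} output; escalate to {qualified}")
    return output  # signed off, safe to use


accept_output("terraform", 'resource "aws_s3_bucket" ...', reviewer="alice")
```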

 

3. Inference and Contextual Drift

As agents collaborate, they build context: summaries, assumptions, mental models. Over time, these evolve into a kind of personality. And like humans, they can develop bias or drift.

We already see this in AI coding assistants. They perform brilliantly at the start of a small project but start to stumble as complexity and history build up. It’s not that the tool is “wrong”; it’s just lost the context. 

The answer isn’t to throw the tools away; it’s to mentor them: to design systems where agents collaborate like human teammates do. One writes the code, another tests it, a third reviews security, a fourth checks for cost efficiency. Each has its focus, and together they deliver more than the sum of their parts.
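
Here’s a rough sketch of that mentoring pattern: four specialised agents chained so that each one’s output becomes part of the next one’s context, with the whole transcript still landing in front of a human expert at the end. The role prompts and the ask_llm() helper are stand-ins for whichever model client and prompts you’d actually use.

```python
# Illustrative role prompts; a real system would give each agent its own
# carefully crafted instructions, tools, and evaluation criteria.
ROLES = {
    "writer": "Write code that satisfies this requirement.",
    "tester": "Write tests for the code so far and report any failures.",
    "security": "Review the code and tests so far for security issues.",
    "cost": "Flag anything in the code so far that looks expensive to run.",
}


def ask_llm(instructions: str, context: str) -> str:
    """Placeholder for a real model call; swap in your provider's client."""
    return f"[{instructions[:20]}...] response based on: {context[:40]}..."


def review_pipeline(requirement: str) -> dict:
    """Run each specialised agent in turn, feeding its output forward as context."""
    context, findings = requirement, {}
    for role, instructions in ROLES.items():
        output = ask_llm(instructions, context)
        findings[role] = output
        context = f"{context}\n\n{role} said:\n{output}"
    return findings  # an expert still reads this before anything ships


findings = review_pipeline("Add an endpoint that exports customer invoices as CSV.")
```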

 

Designing for Trust, Clarity, and Context 

So where does that leave us? 

We’ve explored the four stages of agentic evolution, from single agents to collaborative ecosystems, and the three domains of risk: data, hallucination, and inference.

My message is simple: don’t fear complexity; govern it.

We don’t need to slow innovation. We just need to design for trust, clarity, and context. Expert-in-the-loop isn’t optional. It’s essential. 

As I wrap up these musings, I’ll leave you with a few questions that still guide my own thinking: 

  • How will we know when to trust agents and when to challenge them? 
  • What does accountability look like when outcomes come from dozens of interacting AIs? 
  • And most importantly, how do we design systems that make humans not redundant, but more capable? 

 

Because in the end, this isn’t a story about machines replacing people. It’s a story about amplifying human judgment, scaling what makes us good at what we do, while keeping our hands firmly on the wheel. 

The horizon might be blurring, but if we stay attentive, curious, and accountable, we can still see the path ahead. 
