In 2026, productivity matters, time matters, and for a lot of people, the pressure to move faster to avoid the next round of layoffs is very real. They are trying to keep up. They are trying to stay useful. They are trying not to get buried.
So when a workflow gets faster because AI can read the task, pull the context, review the doc, and help move the work along, that is a real gain. It should be treated like one.
The problem is that this can also create accidental shadow IT in the name of productivity. Nobody wakes up trying to make attribution fuzzy or governance harder. They are just trying to get the work done better. But once the system starts acting across tools and systems, the record of who actually did the work can get a lot murkier than the reality.
The value is real. The attribution can still be off.
Take a simple workflow like working on a draft across Asana and Google Docs. One prompt can read the task brief, pull the research notes, inspect the doc, suggest revisions, tighten a section, and write updates back into both systems. That is real value. It cuts drag. It cuts tab-hopping. It cuts the dumbest kind of context switching. If you care about throughput at all, that should not be hard to admit.
The problem is that the record of what happened can get a lot fuzzier than the reality.
Because now the workflow has gotten faster, but the systems of record may mostly tell the story as if one very active, very productive human did everything manually. If you looked at the logs cold, a lot of it could read like a really productive Matt day.
That is not fully wrong. It is also not fully right.
Why the record gets fuzzy
This is where most recent writing on the topic goes wrong: it either goes full utopian and pretends the machine is just helping cleanly and harmlessly, or it goes full panic mode and treats tool use like the beginning of the robot uprising. Both are lazy.
The real version is messier than that.
The workflow feels like help right up until you follow the path. Then you realize it is not just help. It is action. And once it is action, the identity and logging questions stop being optional.
That is the point where a productivity workflow becomes a governance workflow.
Not because the machine helped. It should help. The problem is that not every tool call is the same, and once the system starts changing state across tools, the identity, scope, approval, and logging model has to keep up with that reality, or the record starts telling a cleaner story than what actually happened.
What does that mean technically?
Not every action path is the same. Reading a document is not the same as editing it. Drafting text is not the same as posting it. Creating a task is not the same as closing one. Updating a field is not the same as approving access. Retrieval is not deletion. A call that stays inside a narrow workflow is not the same thing as one that can change state in a real system and leave consequences behind.
That distinction matters because the fuzzy audit story does not come from “AI” in the abstract. It comes from the system invoking tools, crossing boundaries, acting under some identity, and leaving behind an incomplete account of how that happened.
That is why the underlying question is not “can the machine use tools?” The real question is what it was allowed to do, under what identity, with what scope, and how honestly that got recorded afterward.
The second a tool call can do something real, you have crossed into delegated authority. From there, the question is not whether it matters. The question is how much.
This is where the implementation starts lying by omission.
A connector exists. The integration works. And that is usually where the fuzziness starts.
Because the connector is not the point. The point is what the connector is now allowed to do.
That is a different question, and it is the one people love to pooh-pooh and ignore.
They say the app is already approved. They say the workflow owner knows what they are doing. They say the assistant is only helping. They say it is just using the same permissions the user already had. They say they can always turn it off later.
All of that sounds reassuring right up until you actually follow the path. That is the point where a productivity workflow becomes a governance workflow, and now the details matter. Retrieval path. Drafting path. Update path. Approval path. Delete path. Underlying identity. Delegated scope. Environment boundary. Who initiated the action? What system executed the action? What actually got recorded afterward?
That is why the Replit database deletion incident is useful. Not because it is some cinematic AI cautionary tale. The opposite, really. It is useful because it is operationally boring. The system had authority it never should have had. The environment boundaries were weaker than people thought. The damage followed from the permissions in front of it. The Register covered it too, and it is also logged in the AI Incident Database. Different wrappers. Same lesson.
The system did exactly what it was allowed to do.
That is why OWASP’s 2025 LLM Top 10 is useful. “Excessive Agency” is a better frame than a lot of softer AI language because it pulls the discussion back toward authority, scope, and action instead of vibes.
And it is also why the OWASP Top 10 for Agentic AI Applications matters. It treats tool misuse and identity abuse as real categories, not edge cases that somebody can ignore until the quarter closes.
The MCP specification is explicit about this too. Tools expose real actions outside the model itself, and the spec requires explicit user consent before invocation. The protocol names the risk. The implementation is still on the operator.
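To make that concrete, here is a minimal sketch of a consent gate in Python. None of this is MCP library code; the `Tool` class, the `changes_state` flag, and the `invoke` function are all hypothetical. The shape is the point: calls that change state get a gate, retrieval does not.

```python
# Hypothetical sketch of consent-before-invocation. Not MCP library code;
# it only illustrates the principle the spec names.

from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Tool:
    name: str
    changes_state: bool          # retrieval-only tools can skip the gate
    fn: Callable[..., object]

def invoke(tool: Tool, approve: Callable[[str], bool], **kwargs):
    """Run a tool, but require explicit approval for anything that writes."""
    if tool.changes_state and not approve(f"Allow '{tool.name}' with {kwargs}?"):
        raise PermissionError(f"User declined tool call: {tool.name}")
    return tool.fn(**kwargs)

# Usage: a read passes through, a write needs an explicit yes.
read_doc = Tool("read_doc", changes_state=False, fn=lambda doc_id: f"contents of {doc_id}")
close_task = Tool("close_task", changes_state=True, fn=lambda task_id: f"closed {task_id}")

print(invoke(read_doc, approve=lambda msg: False, doc_id="doc-1"))   # no gate needed
print(invoke(close_task, approve=lambda msg: True, task_id="T-42"))  # gated, approved
```

The interesting design choice is that the gate lives in the invocation path, not in the model. The model can ask for anything; the operator decides what actually executes.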
The issue is not that the machine can “use tools.” The issue is whether anyone can say clearly what those tools are allowed to do, under what identity, and how the resulting actions are supposed to be attributed later.
Approved tools get more reach. The review boundary usually does not.
This part matters because a lot of the time, the problem does not start with some flashy new agent platform.
It starts with an already-approved tool.
The app is already in the environment. Procurement signed off. Security reviewed it. People trust it. Then the product gets better. New retrieval path. New memory. New connector. New embedded action. New way to stop just informing and start doing.
Nothing about that is inherently bad. Modern software should get more useful.
The problem is when the capability surface changes and everyone keeps acting like the old approval still covers the new reality.
That is where things drift.
Not because nobody reviewed the app. Because everybody reviewed the app once, then stopped asking what changed when the workflow got more capable.
That is part of why I keep pulling on the ownership angle too. If bounded execution is one half of the problem, ownership and re-review are the other half. I wrote about that in Approved Tool, Expanding Agent. The short version is that approved does not mean permanently safe in whatever shape the platform happens to be in now.
That is the same lesson from a different side.
So what actually matters?
Once you stop flattening the problem, the controls start making more sense.
Allowlists matter because not every callable function belongs in scope just because the platform supports it. Scopes matter because “it can use the tool” is not precise enough to be useful.
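A sketch of what that precision can look like. The tool names and scope strings below are made up for illustration; the point is that the check is per-tool and per-scope, not a single yes/no on the connector.

```python
# Hypothetical sketch: an allowlist with per-tool scopes, rather than
# "the agent can use the connector" as a single yes/no decision.

ALLOWLIST = {
    # tool name -> set of action scopes granted to this workflow
    "asana": {"task:read", "task:update"},   # no task:delete, no project:admin
    "gdocs": {"doc:read", "doc:comment"},    # suggest, don't silently rewrite
}

def is_allowed(tool: str, scope: str) -> bool:
    """A call is in scope only if the tool is listed AND the scope is granted."""
    return scope in ALLOWLIST.get(tool, set())

assert is_allowed("asana", "task:update")
assert not is_allowed("asana", "task:delete")   # supported by the platform, not by us
assert not is_allowed("jira", "task:read")      # not on the allowlist at all
```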
This is exactly the kind of thing I was writing about in Practical RAG Retrieval Authorization Patterns. Different surface, same principle. The path matters. The scope matters. The quiet defaults matter.
Distinct identities matter because the difference between initiated by and executed by should survive logging. If the system is acting through broad delegated user context, or through some over-privileged service identity, then the model is not the only thing you need to worry about. You need to know who initiated the work, what system executed it, under whose authority it landed, and whether the logs preserve that chain truthfully.
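One way to keep that chain from flattening is to log it explicitly. The field names below are illustrative, not any product's schema; the point is that initiated-by, executed-by, and on-behalf-of are three separate fields, not one.

```python
# Hypothetical sketch: a log record that preserves the actor chain instead of
# flattening everything into "user did X". Field names are illustrative.

import datetime
import json

def audit_record(action, target, initiated_by, executed_by, on_behalf_of, scopes):
    return {
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "action": action,                  # e.g. "doc.update"
        "target": target,
        "initiated_by": initiated_by,      # the human who asked for the work
        "executed_by": executed_by,        # the system/agent identity that acted
        "on_behalf_of": on_behalf_of,      # whose authority the change landed under
        "delegated_scopes": scopes,        # what that identity was allowed to do
    }

rec = audit_record(
    action="doc.update",
    target="gdocs:doc-123",
    initiated_by="matt@example.com",
    executed_by="agent:workflow-assistant",
    on_behalf_of="matt@example.com",
    scopes=["doc:read", "doc:update"],
)
print(json.dumps(rec, indent=2))
```

If `initiated_by` and `executed_by` are always the same value in your logs, that is usually the flattening, not the reality.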
Approval boundaries matter because some actions should stay bounded, visible, and reviewable. Ownership and re-review matter because approved tools do not stay still. They gain more reach over time.
These are much harder problems than “did the AI help?”
Break-glass matters too, but this is another place where people get fuzzy. There is a difference between controlled exception handling and leaving a dangerous path lying around because somebody might need it someday. Temporary access is one thing. Standing privilege because vibes is another. If you need an emergency route, make it explicit, narrow, time-bound, and reviewable.
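A rough sketch of the difference, using a made-up `BreakGlassGrant` shape: the grant names one identity, one scope, one recorded reason, and one hard expiry, instead of a standing privileged path.

```python
# Hypothetical sketch: a break-glass grant that is explicit, narrow,
# time-bound, and reviewable, instead of standing privilege.

import datetime
from dataclasses import dataclass

@dataclass(frozen=True)
class BreakGlassGrant:
    identity: str
    scope: str                     # one narrow scope, not "*"
    reason: str                    # recorded for later review
    expires_at: datetime.datetime  # hard expiry, not "until someone remembers"

    def permits(self, identity: str, scope: str, now: datetime.datetime) -> bool:
        return (identity == self.identity
                and scope == self.scope
                and now < self.expires_at)

now = datetime.datetime.now(datetime.timezone.utc)
grant = BreakGlassGrant(
    identity="agent:workflow-assistant",
    scope="task:delete",
    reason="INC-481: cleanup of duplicate tasks",
    expires_at=now + datetime.timedelta(hours=1),
)

assert grant.permits("agent:workflow-assistant", "task:delete", now)
assert not grant.permits("agent:workflow-assistant", "task:delete",
                         now + datetime.timedelta(hours=2))   # expired
```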
Logging matters because this is where the whole thing either holds or falls apart. If you cannot reconstruct what happened later, the rest of the governance story starts looking decorative. NIST’s draft Control Overlays for Securing AI Systems land in the same place good operators usually do: task-scoped access, human approval gates for high-impact actions, and enough telemetry to reconstruct what the system actually did afterward. If you want the framework version of the same argument, that is the document to read.
That is not anti-tooling. It is anti-fuzziness.
What this gives people
A lot of people already know these workflows are useful. They do not need to be sold on that. What they often do not yet have is clean language for where the discomfort starts.
This gives them some.
The workflow is better. The productivity gain is real. The system is worth wanting. But the identity and audit story can lag behind the usefulness curve.
That is a real problem, and it is not well served by generic “AI governance matters” filler.
People need language for:
- Delegated authority
- Initiated by versus executed by
- Approved tools gaining new reach
- Logs that flatten the actor chain
- Workflows that are genuinely better but not cleanly attributable
- Systems of record that tell a simpler story than what actually happened
What I would actually do
I would start by inventorying every place the system can call a tool or act through delegated authority.
Then I would classify the action paths honestly:
- Retrieve
- Draft
- Create
- Update
- Approve
- Delete
- Trigger
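If you want that classification to be more than a list in a doc, it can be a tiny table in code. The tool names below are invented and the ordering is one reasonable judgment call, not a standard; the useful part is that the review boundary tracks the riskiest path, not the average one.

```python
# Hypothetical sketch: classify each tool call into an honest action path,
# so "it uses the connector" becomes "it can retrieve and update, not delete."

from enum import IntEnum

class ActionPath(IntEnum):
    # Roughly ordered by blast radius: later entries change more state.
    RETRIEVE = 1
    DRAFT    = 2
    CREATE   = 3
    UPDATE   = 4
    APPROVE  = 5
    DELETE   = 6
    TRIGGER  = 7

# Illustrative inventory: one workflow, its callable tools, their real paths.
INVENTORY = {
    "asana.get_task":     ActionPath.RETRIEVE,
    "gdocs.suggest_edit": ActionPath.DRAFT,
    "asana.update_task":  ActionPath.UPDATE,
}

def riskiest(inventory: dict) -> ActionPath:
    """The review boundary should track the riskiest path, not the average one."""
    return max(inventory.values())

assert riskiest(INVENTORY) == ActionPath.UPDATE
```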
Then I would look at the identity behind each one.
What identities are new? Were they added quietly because a trusted platform got more capable, and nobody really re-reviewed the implications?
Which calls happen through the user's context, and which through a distinct system or agent identity?
Which identities are permissioned narrowly? Which are too broad?
Then I would look at the logs and ask a very simple question:
If somebody came back to this six months from now, would the record tell the truth about how the work happened?
Not the cleaned-up story. Not the comforting story. The truth.
Because that is the actual standard here.
Not “did the workflow move faster?” It probably did.
Not “did the machine help?” It probably did.
Did the system of record stay honest about what happened?
That is the real test.
The point
Tool calling is where AI starts doing useful work.
That should be encouraged.
The goal is not to scare people away from better workflows. The goal is to make sure better workflows do not quietly outrun the identity, approval, and logging models that are supposed to keep the story straight.
Because that is the real risk here.
Not that the machine helped.
That the machine helped, acted, changed things in multiple places, and then left behind a record that made the whole chain look simpler and more human than it really was.
That is where the truth starts slipping.
And once the truth starts slipping, the controls are no longer abstract governance ideas. They are just the things you need if you want the workflow to stay fast, useful, and honest at the same time.
Keep pulling on the thread
If this resonated, there are two directions worth following.
The first is ownership. Bounded execution only holds if someone is actually accountable for what the system is allowed to do, and that accountability has to survive org chart drift, platform updates, and the quiet accumulation of reach that happens when a trusted tool gets more capable over time. Approved Tool, Expanding Agent is about that.
The second is retrieval. The action path problem and the data access problem are the same problem from different angles. If the system can retrieve more than it should, the scope question applies there too, not just to writes and updates. Practical RAG Retrieval Authorization Patterns gets into what that actually looks like in practice.
And if you are thinking about where to start this week: inventory. Every place the system can call a tool or act through delegated authority is a place where the record either stays honest or starts slipping. If you know what you have, you can make a plan.
