Agentic Operations: The Missing Discipline in Data Science

Written by

Rulan Hersono

When Jesper's Agent Wrote the Presentation

There is something fitting about a talk on agentic operations that was partly built by an agent. At this year's Data Innovation Summit in Stockholm, Zimply's CTO Jesper Fredriksson took to the Databases & Data Quality Stage [M9] with a session titled "Agentic Operations: The Missing Discipline in Data Science?" The session had been on the schedule for less than two weeks. Jesper had used his own AI agent to gather data and assemble the slides. He admitted, with a grin, that he had accidentally uploaded the agent's version rather than his final edited one.

It was the right accident for the right talk.

The Exponential Is Already Here

Jesper opened with the METR benchmark: a graph that plots, against each AI model's release date, the length of task that the model can complete with 50% probability. When GPT-4 launched, that task length was a matter of minutes. As of early 2026, Claude Opus 4.6 clocks in at approximately 12 hours of sustained human-equivalent work.

The implication: within this calendar year, we are likely to see AI agents capable of tackling tasks that would take a human 100 hours. That is not a distant forecast; it is the trajectory we are already on.
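To make the arithmetic behind that trajectory concrete, here is a minimal sketch using only the two figures quoted above (12 hours today, 100 hours as the target). The time windows of 6, 9, and 12 months are hypothetical inputs chosen for illustration; the talk did not state a doubling time, so the sketch only shows what doubling time each window would imply.

```python
import math

# Figures quoted in the talk: roughly 12 hours today, 100 hours as the target.
current_horizon_hours = 12.0
target_horizon_hours = 100.0

# Number of doublings needed to get from the current horizon to the target.
doublings_needed = math.log2(target_horizon_hours / current_horizon_hours)


def implied_doubling_time(months_available: float) -> float:
    """Longest doubling time (in months) that still reaches the target in time."""
    return months_available / doublings_needed


print(f"doublings needed: {doublings_needed:.2f}")
for months in (6, 9, 12):  # hypothetical time windows, not from the talk
    print(f"{months:>2} months -> doubling every {implied_doubling_time(months):.1f} months")
```

Going from 12 to 100 hours takes about three doublings, so the claim holds only if the task horizon keeps doubling every few months.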

But AI capability is only one side of the story.

Agents Are Already Running in Your Organisation — Whether You Know It or Not

The second data point Jesper presented came from CrowdStrike's RSA 2026 analysis of Falcon endpoint telemetry. Their data shows 1,800 distinct AI applications running on enterprise endpoints, with 160 million unique application instances across their customer base.

Source: CrowdStrike, RSA 2026 — Falcon endpoint telemetry

Agents are not coming; they are already running inside most organisations. Many of them are unsanctioned: shadow AI deployed by individuals and teams without organisational oversight.

And yet, according to a March 2026 report from DigitalApplied:

78% of organisations have an agent pilot
Only 14% are running agents in production

Source: DigitalApplied, AI Agent Scaling Gap (March 2026)

The gap between pilot and production is enormous. The organisations that have successfully bridged it share one structural practice: they created a dedicated AI operations function — separate from both IT and the business unit.

Agentic Operations in Production: Zimply's Experience

Zimply has been automating business processes since 2018. Today, the company operates roughly 100 AI assistants in production across clients in accounting, order management, and other domains. Deploying agents is one thing. Keeping them working reliably — week after week, as the underlying models, tools, and data change — is something else entirely.

That operational reality is what the rest of the talk was built around.

Agentic Operations: Why Coding Was the Easy Part

Jesper referenced a statement from Boris Cherny, Head of Claude Code at Anthropic, in February 2026: "Coding is largely solved."

Whether or not one agrees completely with that framing, the point Jesper was making is accurate and important. The hard parts of agentic AI are not the code generation. They are:

Governance — what is the agent permitted to do?
Trust — how do you know it is doing what it should?
Access control — who and what can the agent reach?
Reliability under change — what happens when a component is updated?

These are operational and organisational challenges, not engineering ones in the traditional sense.

Why Operating an Agent Is Fundamentally Different from Operating a Model

This is the conceptual heart of the talk. The distinction Jesper drew is worth dwelling on.

In classical MLOps, you are managing a model. Inputs and outputs are relatively well-defined. Drift means a measurable distribution shift on a metric. Failure looks like accuracy dropping below an SLA threshold. You train, validate, deploy, and monitor.

With agents, you are managing a system — one that includes the model, its memory structures, the tools it can call, prompt scaffolding, other agents it may coordinate with, and the environments it acts in. Failure modes are different in kind:

LLM upgrades: A new minor model version ships and your carefully constructed prompts subtly break. There is no regression test to catch it.

Prompt drift: Edits accumulate over months. Nobody remembers why a clause was added. Nobody wants to remove it. The prompt becomes archaeology.

Changing components: A tool, retriever, or downstream API changes and the agent's reasoning path shifts silently with it.

Silent quality decay: Outputs get worse before anyone notices. There is no red bar on a dashboard. There is just a customer complaint.
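One way to start catching the first of these failure modes is to rerun a small set of golden cases against every candidate model version. The sketch below is an illustration, not Zimply's tooling: `GoldenCase`, `run_regression`, and the two stand-in model functions are hypothetical names, and a real harness would call the vendor's API where the stubs return canned strings.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GoldenCase:
    name: str
    prompt: str
    must_contain: str  # a substring the answer has to include


def run_regression(model: Callable[[str], str], cases: list[GoldenCase]) -> list[str]:
    """Return the names of the golden cases that the given model fails."""
    failures = []
    for case in cases:
        answer = model(case.prompt)
        if case.must_contain.lower() not in answer.lower():
            failures.append(case.name)
    return failures


# Stand-ins for two model versions (hypothetical); replace with real API calls.
def model_v1(prompt: str) -> str:
    return "REFUND_DENIED: order is older than 30 days"


def model_v2(prompt: str) -> str:
    return "I'm sorry, but this order can no longer be refunded."


cases = [GoldenCase("refund-window", "Order placed 45 days ago. Refund?", "REFUND_DENIED")]

print(run_regression(model_v1, cases))  # []
print(run_regression(model_v2, cases))  # ['refund-window']
```

The second stub gives a reasonable answer in a different format, which is exactly the kind of subtle break that downstream parsing would hit. Substring checks are the crudest possible scorer; the point of the sketch is the habit of rerunning fixed cases on every model change.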

Agentic Operations and Emergent Failures

Jesper cited IBM Research's 2025 paper "Agentic AI Needs a Systems Theory" for the framing of emergent failures — defined as failures that arise not from any single component, but from the complex interactions between agents, tools, environments, and humans.

Source: IBM Research, Agentic AI Needs a Systems Theory (2025)

This is the frontier of what makes agentic operations genuinely difficult. Unit tests on the model do not catch emergent failures. You need observability over the entire trajectory — every tool call, every reasoning step, every branch the agent took to arrive at its action.
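As a rough illustration of what capturing the entire trajectory can mean in practice, the following sketch records every step of a run as structured data that can be queried and serialised. The `Step` and `Trajectory` classes, the step kinds, and the tool names are all assumptions made for this example rather than anything shown in the talk.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class Step:
    kind: str  # e.g. "reasoning", "tool_call", "memory_read"
    name: str
    payload: dict[str, Any]
    timestamp: float = field(default_factory=time.time)


@dataclass
class Trajectory:
    run_id: str
    steps: list[Step] = field(default_factory=list)

    def record(self, kind: str, name: str, **payload: Any) -> None:
        """Append one step to the trace, in the order it happened."""
        self.steps.append(Step(kind, name, payload))

    def tool_calls(self) -> list[str]:
        """Names of all tools the agent called, in order."""
        return [s.name for s in self.steps if s.kind == "tool_call"]

    def to_json(self) -> str:
        """Serialise the whole trace for storage or later review."""
        return json.dumps(asdict(self), indent=2)


# Hypothetical run: the tool names and payloads are made up for illustration.
trace = Trajectory(run_id="run-001")
trace.record("reasoning", "plan", text="Look up the invoice before replying")
trace.record("tool_call", "lookup_invoice", invoice_id="INV-42")
trace.record("tool_call", "send_email", to="customer@example.com")

print(trace.tool_calls())
```

Storing the trace as plain JSON keeps it independent of any particular tracing product, so it can be diffed between runs or handed to a reviewer.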

Trust in Agentic Operations

One of the most quotable moments of the talk was Jesper's observation about trust in operational contexts. It was not phrased as a data point. It was phrased as hard-won experience.

Trust is earned slowly: months of clean trajectories, predictable cost, predictable latency, no surprises in the audit log.

Trust is lost instantly: one agent that fires off the wrong refund, one that leaks data into the wrong tenant, one unexplained loop that runs for six hours. When that happens, the system goes back to manual approval that quarter.

This is the operational reality at Zimply, and it shapes everything about how they approach deployment.

What Is Coming Next for Agentic Operations

Jesper outlined three near-term trends from the operator's seat:

The buyer side wakes up. Procurement will start asking: show me the evals, show me the work. Governance readiness will become a commercial requirement.
Agents consuming agents. As multi-agent architectures become more common, the blast radius of any single failure expands. Failures propagate through chains.
LLM deprecations as everyday work. The question will increasingly be: which model from vendor X is actually stable right now? Model lifecycle management becomes an operational discipline.

Where This Is Going: The Agentic Operating Model

Jesper cited the MIT Sloan Management Review — specifically a piece by Davenport and Bean, Five Trends in AI and Data Science for 2026 — for the prediction that within five years, AI agents will handle most transactions in large-scale business processes.

Source: Davenport & Bean, Five Trends in AI and Data Science for 2026, MIT Sloan Management Review

That future requires a new operating model. The Berkeley California Management Review introduced a framework for what they call the Agentic Operating Model, built around four interdependent layers:

Cognitive specialisation
Coordination architecture
Real-time control
Organisational governance

Source: Berkeley CMR, Governing the Agentic Enterprise

The concept of "human in the loop" is giving way to "human on the loop" — where humans design the governance structures and protocols, then step back from moment-to-moment validation.

Why Agentic Operations Requires Governance

Grant Thornton's 2026 AI Impact Survey of approximately 1,000 senior leaders found that 78% of C-level executives lack strong confidence that they could pass an independent AI governance audit within 90 days.

Source: Grant Thornton, 2026 AI Impact Survey (n ≈ 1,000 senior leaders)

That number is striking. It suggests that the majority of organisations deploying agents cannot fully account for what those agents are doing.

MLOps → AgentOps: Why Data Scientists Should Own This

Jesper's central argument is that the discipline of agentic operations — "AgentOps" — is the natural successor to MLOps, and that data scientists are the right people to own it.

The instincts are the same: measurement, reproducibility, regression testing. What changes is the surface area.

Dimension	MLOps	AgentOps
Unit of work	One model, scored	A system, behaving
Workflow	Train · validate · deploy · monitor	Trajectory · trace · rollback · re-eval
I/O definition	Well-defined inputs and outputs	Spans tools, memory, other agents
Drift definition	Distribution shift on a metric	Subtle change in how the agent reasons
Failure modes	Accuracy below SLA	Looping, deception, leakage, silent decay

Engineering will not take this on alone. Compliance cannot. Somebody has to. The argument is that data scientists, with their native fluency in evaluation, monitoring, and systematic thinking about model behaviour, are the right people to take that seat at the table.

Five Agentic Operations Disciplines to Build

Jesper closed with a practical framework — five operational disciplines that turn a pilot into a production system you can stand behind.

Trajectory Observability
Capture the full path the agent took, not just the final response. Every tool call, every memory access, every intermediate decision. If an agent takes an action you did not expect, you need to be able to step back and understand why. A PR reviewer agent, for instance, should not just review the code; it should also review the trace of how the coding agent arrived at that code.
System Versioning
Pin the model, the prompt, the tool schemas, and the scaffolding as a single versioned unit. Roll back the system, not just the model. Jesper described maintaining his agent's memory structures in a Git repository — so he can always return to the memory state that existed when a specific action was taken.
Action Governance
Define what the agent is allowed to do. Implement approval gates, blast-radius limits, dry-run modes, and audit trails. Start with more human control and gradually expand autonomy as confidence is established — not the other way around.
Evals Over Trajectories
Score the path, not just the answer. Did the agent use the right tool? Did it stay in scope? Did it avoid leakage? Evaluating final outputs alone is insufficient when the process matters as much as the result. This is also where the world-state problem becomes acute: if the agent took an irreversible action in the real world, you cannot always reconstruct the conditions under which it acted.
Escalation by Design
Know when to hand off. Build in confidence thresholds, ambiguity detection, and clear human-in-the-loop pathways. Escalation should not be a failure state — it should be a designed feature of any production agent system.
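
Several of these disciplines can be sketched in a few dozen lines. The example below is a minimal illustration under stated assumptions, not a description of Zimply's implementation: `system_version`, `Action`, `Policy`, `decide`, and `score_trajectory` are hypothetical names, and the tool names, amount limit, and confidence threshold are invented for the example. It hashes model, prompt, and tool schemas into one version identifier, gates each proposed action (deny, escalate, dry run, or execute), and scores a trajectory on whether it stayed in scope.

```python
import hashlib
import json
from dataclasses import dataclass


def system_version(model: str, prompt: str, tool_schemas: dict) -> str:
    """Hash model, prompt and tool schemas together so they roll back as one unit."""
    manifest = json.dumps(
        {"model": model, "prompt": prompt, "tools": tool_schemas}, sort_keys=True
    )
    return hashlib.sha256(manifest.encode("utf-8")).hexdigest()[:12]


@dataclass(frozen=True)
class Action:
    tool: str
    amount: float = 0.0
    confidence: float = 1.0


@dataclass(frozen=True)
class Policy:
    allowed_tools: frozenset
    max_amount: float      # blast-radius limit per action
    min_confidence: float  # below this, hand off to a human
    dry_run: bool = True   # start with more human control, loosen later


def decide(action: Action, policy: Policy) -> str:
    """Return 'deny', 'escalate', 'dry_run' or 'execute' for a proposed action."""
    if action.tool not in policy.allowed_tools:
        return "deny"
    if action.amount > policy.max_amount or action.confidence < policy.min_confidence:
        return "escalate"
    return "dry_run" if policy.dry_run else "execute"


def score_trajectory(tools_used: list, policy: Policy, required_tool: str) -> dict:
    """Score the path: did the agent stay in scope and use the required tool?"""
    return {
        "in_scope": all(t in policy.allowed_tools for t in tools_used),
        "used_required_tool": required_tool in tools_used,
    }


# Hypothetical policy and actions, invented for illustration.
policy = Policy(
    frozenset({"lookup_invoice", "issue_refund"}), max_amount=100.0, min_confidence=0.8
)
audit_log = []
for action in [
    Action("issue_refund", amount=40.0, confidence=0.95),
    Action("issue_refund", amount=900.0, confidence=0.95),
    Action("delete_customer"),
]:
    audit_log.append((action.tool, decide(action, policy)))

print(system_version("model-x", "You are a billing assistant.", {"issue_refund": {"amount": "number"}}))
print(audit_log)
print(score_trajectory(["lookup_invoice", "issue_refund"], policy, required_tool="lookup_invoice"))
```

In this sketch the small refund runs as a dry run, the large one is escalated to a human, and the out-of-scope tool is denied outright. Escalation is an ordinary return value rather than an exception, which mirrors the idea that handing off is a designed outcome and not a failure state.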

Closing Thoughts

Agentic AI is not a future problem. It is a present operational challenge that most organisations are not yet structured to address. The pilot-to-production gap is real, and it will not close on its own.

The organisations that will succeed are the ones that treat agentic operations as a discipline — not an afterthought. That means building the harness around the model: the observability, the versioning, the governance, the evals, and the escalation pathways.

At Zimply, this is the work we do every week. We deploy agents and we operate them — in production, for real clients, where the cost of failure is real.

If you are thinking about compliance, governance, and agent behaviour at scale, this is the conversation worth having.

This article is based on a talk by Jesper Fredriksson, CTO at Zimply, presented at Data Innovation Summit 2026, Stockholm, May 7. He previously served as AI Engineer Lead at Volvo Cars, with five years in automotive data science and a decade in medical imaging research. He has been a returning Data Innovation Summit speaker since 2016.
