12 May, 2026

Agentic Operations: The Missing Discipline in Data Science

Written by
Rulan Hersono

When Jesper's Agent Wrote the Presentation

There is something fitting about a talk on agentic operations that was, partly, built by an agent. At this year's Data Innovation Summit in Stockholm, Zimply's CTO Jesper Fredriksson took the stage at the Databases & Data Quality Stage [M9] with a session titled "Agentic Operations: The Missing Discipline in Data Science?" The session had been on the schedule for less than two weeks, and Jesper had used his own AI agent to gather data and assemble the slides. He admitted, with a grin, that he had accidentally uploaded the agent's version rather than his final edited one.

It was the right accident for the right talk.


The Exponential Is Already Here

Jesper opened with the METR benchmark, a graph that plots each AI model's 50% time horizon (the length of task, measured in human working time, that the model can complete with 50% probability) against the model's release date. When GPT-4 launched, that horizon was a matter of minutes. As of early 2026, Claude Opus 4.6 clocks in at approximately 12 hours of sustained human-equivalent work.

The implication: within this calendar year, we are likely to see AI agents capable of tackling tasks that would take a human 100 hours. That is not a distant forecast; it is the trajectory we are already on.
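The 100-hour projection is exponential arithmetic: going from 12 to 100 hours takes log2(100 / 12) ≈ 3.06 doublings, so the timing depends entirely on how fast the horizon doubles. The sketch below is illustrative and is not METR's code. It assumes a constant doubling time (an input you must supply; the talk does not give one) and, for the horizon itself, assumes a logistic fit of success probability against log2 of human task length, with made-up coefficients.

```python
import math


def horizon_from_logistic(intercept: float, slope: float) -> float:
    """Task length (in human hours) at which predicted success is exactly 50%.

    Assumes P(success) = sigmoid(intercept + slope * log2(task_hours)),
    with slope < 0 so that longer tasks are less likely to succeed.
    """
    if slope >= 0:
        raise ValueError("slope must be negative: longer tasks should be harder")
    # sigmoid(z) == 0.5 exactly when z == 0, so solve intercept + slope * log2(t) = 0
    return 2 ** (-intercept / slope)


def months_to_reach(current_hours: float, target_hours: float, doubling_months: float) -> float:
    """Months until the horizon grows from current to target at a constant doubling time."""
    return math.log2(target_hours / current_hours) * doubling_months


def required_doubling_months(current_hours: float, target_hours: float, months_available: float) -> float:
    """Doubling time needed to reach the target horizon within the given number of months."""
    return months_available / math.log2(target_hours / current_hours)


# Illustrative coefficients only: success is 50% at 2**3 = 8 hours
print(horizon_from_logistic(intercept=3.0, slope=-1.0))  # 8.0
print(round(required_doubling_months(12, 100, months_available=9), 2))
```

With those inputs, reaching 100 hours within nine months would require the horizon to double roughly every three months. Swapping in your own doubling-time estimate shows how sensitive the forecast is to that single assumption.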

But AI capability is only one side of the story.


Agents Are Already Running in Your Organisation — Whether You Know It or Not

The second data point Jesper presented came from CrowdStrike's RSA 2026 analysis of Falcon endpoint telemetry. Their data shows 1,800 distinct AI applications running on enterprise endpoints, with 160 million unique application instances across their customer base.

Source: CrowdStrike, RSA 2026 — Falcon endpoint telemetry

Agents are not coming; they are already here, running inside most organisations. Many of them are unsanctioned: shadow AI deployed by individuals and teams without organisational oversight.

And yet, according to a March 2026 report from DigitalApplied:

  • 78% of organisations have an agent pilot
  • Only 14% are running agents in production

Source: DigitalApplied, AI Agent Scaling Gap (March 2026)

The gap between pilot and production is enormous. The organisations that have successfully bridged it share one structural practice: they created a dedicated AI operations function — separate from both IT and the business unit.


Agentic Operations in Production: Zimply's Experience

Zimply has been automating business processes since 2018. Today, the company operates roughly 100 AI assistants in production across clients in accounting, order management, and other domains. Deploying agents is one thing. Keeping them working reliably — week after week, as the underlying models, tools, and data change — is something else entirely.

That operational reality is what the rest of the talk was built around.


Agentic Operations: Why Coding Was the Easy Part

Jesper referenced a statement from Boris Cherny, Head of Claude Code at Anthropic, in February 2026: "Coding is largely solved."

Whether or not one fully agrees with that framing, the point Jesper was making is accurate and important. The hard parts of agentic AI are not in code generation. They are:

  • Governance — what is the agent permitted to do?
  • Trust — how do you know it is doing what it should?
  • Access control — who and what can the agent reach?
  • Reliability under change — what happens when a component is updated?

These are operational and organisational challenges, not engineering ones in the traditional sense.


Why Operating an Agent Is Fundamentally Different from Operating a Model

This is the conceptual heart of the talk. The distinction Jesper drew is worth dwelling on.

In classical MLOps, you are managing a model. Inputs and outputs are relatively well-defined. Drift means a measurable distribution shift on a metric. Failure looks like accuracy dropping below an SLA threshold. You train, validate, deploy, and monitor.
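For contrast, classical drift is something you can put a single number on. One common measure is the population stability index (PSI), which compares how a feature or model score is distributed in a reference sample versus live traffic. A minimal sketch, using equal-width bins over the combined range for simplicity (many implementations bin on reference quantiles instead); the sample data is synthetic:

```python
import math
from typing import Sequence


def population_stability_index(expected: Sequence[float], actual: Sequence[float],
                               bins: int = 10, eps: float = 1e-6) -> float:
    """PSI between a reference sample and a live sample of one numeric feature or score."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0  # guard against a constant feature (zero range)

    def proportions(values: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = min(int((v - lo) / width), bins - 1)  # clamp the maximum into the last bin
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]  # eps avoids log(0)

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


reference = [i / 100 for i in range(100)]   # e.g. scores at validation time
live = [0.5 + i / 200 for i in range(100)]  # scores have shifted upward
print(round(population_stability_index(reference, live), 2))
```

Rules of thumb vary, but values above roughly 0.2 are often treated as a significant shift. The point of the contrast is that no single metric like this captures the agent failure modes described below.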

With agents, you are managing a system — one that includes the model, its memory structures, the tools it can call, prompt scaffolding, other agents it may coordinate with, and the environments it acts in. Failure modes are different in kind:

LLM upgrades: A new minor model version ships and your carefully constructed prompts subtly break. There is no regression test to catch it.

Prompt drift: Edits accumulate over months. Nobody remembers why a clause was added. Nobody wants to remove it. The prompt becomes archaeology.

Changing components: A tool, retriever, or downstream API changes and the agent's reasoning path shifts silently with it.

Silent quality decay: Outputs get worse before anyone notices. There is no red bar on a dashboard. There is just a customer complaint.
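One partial defence against LLM upgrades and silent quality decay is a golden set: a fixed collection of inputs with checkable expectations, re-run whenever any component changes, so that a new model version is compared against the old one before it ships. A minimal sketch; the two agent functions are hypothetical stand-ins for two versions of the same agent, and real checks would usually be richer than substring matching:

```python
from dataclasses import dataclass
from typing import Callable

Agent = Callable[[str], str]


@dataclass
class GoldenCase:
    name: str
    prompt: str
    check: Callable[[str], bool]  # True if the agent's output is acceptable


def failing_cases(agent: Agent, cases: list[GoldenCase]) -> list[str]:
    """Names of golden cases whose output fails its check."""
    return [case.name for case in cases if not case.check(agent(case.prompt))]


def regressions(old: Agent, new: Agent, cases: list[GoldenCase]) -> list[str]:
    """Cases that passed on the old version but fail on the new one."""
    already_failing = set(failing_cases(old, cases))
    return [name for name in failing_cases(new, cases) if name not in already_failing]


# Hypothetical stand-ins for two versions of the same invoice agent
def agent_v1(prompt: str) -> str:
    return "Total incl. VAT: 125 SEK"


def agent_v2(prompt: str) -> str:
    return "Total incl. VAT: 125"  # the upgrade silently dropped the currency


question = "Invoice: 100 SEK excluding 25% VAT. What is the total?"
cases = [
    GoldenCase("vat_total", question, lambda out: "125" in out),
    GoldenCase("currency", question, lambda out: "SEK" in out),
]
print(regressions(agent_v1, agent_v2, cases))  # ['currency']
```

Excluding cases that already failed on the old version separates new breakage from known gaps.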


Agentic Operations and Emergent Failures

Jesper cited IBM Research's 2025 paper "Agentic AI Needs a Systems Theory" for the framing of emergent failures — defined as failures that arise not from any single component, but from the complex interactions between agents, tools, environments, and humans.

Source: IBM Research, Agentic AI Needs a Systems Theory (2025)

This is the frontier of what makes agentic operations genuinely difficult. Unit tests on the model do not catch emergent failures. You need observability over the entire trajectory — every tool call, every reasoning step, every branch the agent took to arrive at its action.
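At its simplest, trajectory observability means every tool call is recorded, in order, with its inputs, output, outcome, and timing. A minimal in-process sketch, assuming tools are plain Python functions (the tool names are hypothetical). It does not capture the model's intermediate reasoning, which the agent loop would have to log itself, and a production system would typically export spans to a tracing backend (for example via OpenTelemetry) rather than keep them in memory:

```python
import functools
import time
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Step:
    tool: str
    args: tuple
    kwargs: dict
    result: Any
    seconds: float
    ok: bool = True


@dataclass
class Trajectory:
    steps: list[Step] = field(default_factory=list)

    def traced(self, fn: Callable) -> Callable:
        """Decorator: record every call to `fn` as one step, including failures."""
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                result = fn(*args, **kwargs)
            except Exception as exc:
                # Record the failure in the trace, then let it propagate unchanged
                self.steps.append(Step(fn.__name__, args, kwargs, repr(exc),
                                       time.perf_counter() - start, ok=False))
                raise
            self.steps.append(Step(fn.__name__, args, kwargs, result,
                                   time.perf_counter() - start))
            return result
        return wrapper


trace = Trajectory()


@trace.traced
def lookup_order(order_id: str) -> dict:  # hypothetical tool
    return {"order_id": order_id, "amount": 499}


@trace.traced
def issue_refund(order_id: str, amount: int) -> str:  # hypothetical tool
    return f"refunded {amount} on {order_id}"


order = lookup_order("A-17")
issue_refund(order["order_id"], amount=order["amount"])
for step in trace.steps:
    print(step.tool, step.args, step.kwargs, step.ok)
```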


Trust in Agentic Operations

One of the most quotable moments of the talk was Jesper's observation about trust in operational contexts. It was not phrased as a data point. It was phrased as hard-won experience.

Trust is earned slowly: months of clean trajectories, predictable cost, predictable latency, no surprises in the audit log.

Trust is lost instantly: one agent that fires off the wrong refund, one that leaks data into the wrong tenant, one unexplained loop that runs for six hours. When that happens, the system goes back to manual approval that quarter.

This is the operational reality at Zimply, and it shapes everything about how they approach deployment.


What Is Coming Next for Agentic Operations

Jesper outlined three near-term trends from the operator's seat:

  • The buyer side wakes up. Procurement will start asking: show me the evals, show me the work. Governance readiness will become a commercial requirement.
  • Agents consuming agents. As multi-agent architectures become more common, the blast radius of any single failure expands. Failures propagate through chains.
  • LLM deprecations as everyday work. The question will increasingly be: which model from vendor X is actually stable right now? Model lifecycle management becomes an operational discipline.
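Treating model lifecycle as an operational discipline starts with knowing exactly what is running. One way to do that, sketched here under assumptions (the field names and model identifiers are hypothetical), is to pin the model, prompt, tool schemas, and scaffolding version together and derive a single fingerprint, so that a deprecation-driven model swap visibly produces a new system version that must be re-evaluated:

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class AgentSystemVersion:
    model: str          # pinned model identifier (hypothetical names below)
    prompt: str         # full system prompt text
    tool_schemas: dict  # tool name -> JSON schema of its arguments
    scaffolding: str    # version of the agent loop / orchestration code

    def fingerprint(self) -> str:
        """Short content hash of every component; sort_keys makes dict order irrelevant."""
        canonical = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def needs_reeval(deployed: AgentSystemVersion, candidate: AgentSystemVersion) -> bool:
    """Any component change, including a model swap alone, yields a new system version."""
    return deployed.fingerprint() != candidate.fingerprint()


deployed = AgentSystemVersion(
    model="vendor-model-2025-06",
    prompt="You register incoming customer orders.",
    tool_schemas={"create_order": {"type": "object"}},
    scaffolding="agent-loop 1.4.2",
)
candidate = replace(deployed, model="vendor-model-2026-01")  # e.g. forced by a deprecation
print(deployed.fingerprint(), candidate.fingerprint(), needs_reeval(deployed, candidate))
```

Storing the fingerprint alongside every trace makes it possible to say which exact system version took a given action, and rolling back then means redeploying a previous manifest rather than only a previous model.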


Where This Is Going: The Agentic Operating Model

Jesper cited the MIT Sloan Management Review — specifically a piece by Davenport and Bean, Five Trends in AI and Data Science for 2026 — for the prediction that within five years, AI agents will handle most transactions in large-scale business processes.

Source: Davenport & Bean, Five Trends in AI and Data Science for 2026, MIT Sloan Management Review

That future requires a new operating model. The California Management Review, published by UC Berkeley's Haas School of Business, introduced a framework for what it calls the Agentic Operating Model, built around four interdependent layers:

  • Cognitive specialisation
  • Coordination architecture
  • Real-time control
  • Organisational governance

Source: Berkeley CMR, Governing the Agentic Enterprise

The concept of "human in the loop" is giving way to "human on the loop" — where humans design the governance structures and protocols, then step back from moment-to-moment validation.


Why Agentic Operations Requires Governance

Grant Thornton's 2026 AI Impact Survey of approximately 1,000 senior leaders found that 78% of C-level executives lack strong confidence that they could pass an independent AI governance audit within 90 days.

Source: Grant Thornton, 2026 AI Impact Survey (n ≈ 1,000 senior leaders)

That number is striking: most senior leaders are not confident their organisations could demonstrate, under independent audit, what their AI systems are doing.


MLOps → AgentOps: Why Data Scientists Should Own This

Jesper's central argument is that the discipline of agentic operations — "AgentOps" — is the natural successor to MLOps, and that data scientists are the right people to own it.

The instincts are the same: measurement, reproducibility, regression testing. What changes is the surface area.

| | MLOps | AgentOps |
|---|---|---|
| Unit of work | One model, scored | A system, behaving |
| Workflow | Train · validate · deploy · monitor | Trajectory · trace · rollback · re-eval |
| I/O definition | Well-defined inputs and outputs | Spans tools, memory, other agents |
| Drift definition | Distribution shift on a metric | Subtle change in how the agent reasons |
| Failure modes | Accuracy below SLA | Looping, deception, leakage, silent decay |

Engineering will not take this on alone. Compliance cannot. Somebody has to. The argument is that data scientists, with their native fluency in evaluation, monitoring, and systematic thinking about model behaviour, are the natural fit for that seat at the table.
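Several AgentOps failure modes, such as looping and leakage, are properties of the trajectory rather than of the final answer, so they can be checked on the trajectory directly. A minimal rule-based sketch; the `ToolCall` record, tool names, and tenant names are hypothetical, and real trajectory evals often combine rules like these with model-graded judgments:

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    tool: str
    tenant: str  # whose data the call read or wrote


def evaluate_trajectory(calls: list[ToolCall], *, allowed_tools: set[str],
                        tenant: str, max_repeats: int = 3) -> list[str]:
    """Return named issues found in the path, independent of the final answer."""
    issues = []
    for i, call in enumerate(calls):
        if call.tool not in allowed_tools:
            issues.append(f"step {i}: out-of-scope tool {call.tool!r}")
        if call.tenant != tenant:
            issues.append(f"step {i}: touched tenant {call.tenant!r} (possible leakage)")
    # Looping: flag the first time one tool is called more than max_repeats times in a row
    run = 1
    for prev, cur in zip(calls, calls[1:]):
        run = run + 1 if cur.tool == prev.tool else 1
        if run == max_repeats + 1:
            issues.append(f"loop: {cur.tool!r} called more than {max_repeats} times in a row")
    return issues


calls = [ToolCall("search_invoices", "acme")] * 5 + [ToolCall("send_email", "globex")]
for issue in evaluate_trajectory(calls, allowed_tools={"search_invoices"}, tenant="acme"):
    print(issue)
```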


Five Agentic Operations Disciplines to Build

Jesper closed with a practical framework — five operational disciplines that turn a pilot into a production system you can stand behind.

  1. Trajectory Observability
    Capture the full path the agent took, not just the final response: every tool call, every memory access, every intermediate decision. If an agent takes an action you did not expect, you need to be able to step back and understand why. A PR reviewer agent, for instance, should not just review the code; it should review the trace of how the coding agent arrived at that code.
  2. System Versioning
    Pin the model, the prompt, the tool schemas, and the scaffolding as a single versioned unit. Roll back the system, not just the model. Jesper described maintaining his agent's memory structures in a Git repository — so he can always return to the memory state that existed when a specific action was taken.
  3. Action Governance
    Define what the agent is allowed to do. Implement approval gates, blast-radius limits, dry-run modes, and audit trails. Start with more human control and gradually expand autonomy as confidence is established — not the other way around.
  4. Evals Over Trajectories
    Score the path, not just the answer. Did the agent use the right tool? Did it stay in scope? Did it avoid leakage? Evaluating final outputs alone is insufficient when the process matters as much as the result. This is also where the world-state problem becomes acute: if the agent took an irreversible action in the real world, you cannot always reconstruct the conditions under which it acted.
  5. Escalation by Design
    Know when to hand off. Build in confidence thresholds, ambiguity detection, and clear human-in-the-loop pathways. Escalation should not be a failure state — it should be a designed feature of any production agent system.
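Disciplines 3 and 5 can be combined into one gate that every proposed action passes through before it touches the real world. The sketch below is a minimal illustration under stated assumptions: the policy fields, action names, and thresholds are hypothetical, and it assumes the agent supplies a confidence score, which in practice is hard to calibrate. Starting with a low amount limit and a high confidence threshold, then relaxing them as trust is earned, matches the advice to expand autonomy gradually:

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    EXECUTE = "execute"
    DRY_RUN = "dry_run"
    ESCALATE = "escalate"
    DENY = "deny"


@dataclass
class Policy:
    allowed_actions: set[str]
    max_amount: float       # blast-radius limit per action, e.g. refund size
    min_confidence: float   # below this, hand off to a human
    dry_run: bool = False   # record what would happen without doing it


@dataclass
class ActionGate:
    policy: Policy
    audit_log: list[dict] = field(default_factory=list)

    def decide(self, action: str, amount: float, confidence: float) -> Decision:
        p = self.policy
        if action not in p.allowed_actions:
            decision = Decision.DENY
        elif amount > p.max_amount or confidence < p.min_confidence:
            decision = Decision.ESCALATE  # a designed outcome, not an error
        elif p.dry_run:
            decision = Decision.DRY_RUN
        else:
            decision = Decision.EXECUTE
        # Every decision is logged, including denials and escalations
        self.audit_log.append({"action": action, "amount": amount,
                               "confidence": confidence, "decision": decision.value})
        return decision


gate = ActionGate(Policy(allowed_actions={"refund"}, max_amount=1000, min_confidence=0.8))
print(gate.decide("refund", amount=200, confidence=0.95))         # Decision.EXECUTE
print(gate.decide("refund", amount=5000, confidence=0.95))        # Decision.ESCALATE
print(gate.decide("delete_customer", amount=0, confidence=0.99))  # Decision.DENY
```

Routing escalations to a human queue, rather than raising an exception, is what makes escalation a feature instead of a failure state.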

Closing Thoughts

Agentic AI is not a future problem. It is a present operational challenge that most organisations are not yet structured to address. The pilot-to-production gap is real, and it will not close on its own.

The organisations that will succeed are the ones that treat agentic operations as a discipline — not an afterthought. That means building the harness around the model: the observability, the versioning, the governance, the evals, and the escalation pathways.

At Zimply, this is the work we do every week. We deploy agents and we operate them — in production, for real clients, where the cost of failure is real.

If you are thinking about compliance, governance, and agent behaviour at scale, this is the conversation worth having.


This article is based on a talk by Jesper Fredriksson, CTO at Zimply, presented at Data Innovation Summit 2026, Stockholm, May 7. He previously served as AI Engineer Lead at Volvo Cars, with five years in automotive data science and a decade in medical imaging research. He has been a returning Data Innovation Summit speaker since 2016.
