By Ralf Klein · 4 min read

The hidden cost of untracked AI agents

Untracked automations get cut first when budgets tighten. The three hidden costs of running AI agents, and the minimum tracking that fixes each.

When a budget review arrives, the first automations on the chopping block are the ones nobody can describe in numbers. Not the worst performers. The invisible ones. An agent quietly clearing 3,000 tickets a month gets cut because it has no line in the report, while a flashier project with a slide deck survives on narrative alone.

This is the trap of the untracked AI agent. It can be doing real work and still lose the argument, because the argument is decided on what shows up in the report, not on what happened in production.

Forrester's 2026 predictions put a number on the consequence: enterprises will defer 25% of planned AI spend into 2027, with fewer than one in three decision makers able to tie AI value to financial growth. That deferral is not a verdict on whether the agents work. It is a verdict on whether anyone could prove it.

Untracked agents carry three specific costs. None of them show up on an invoice, which is exactly why they are dangerous.

Cost one: budget exposure

Every agent you run costs money to operate: token spend, platform fees, the engineering time to keep it alive. That side of the ledger is always visible, because someone gets a bill.

The savings side is only visible if you instrument it. So an untracked agent shows up in your finances as pure cost with no offsetting value. When the FinOps review runs, it looks identical to waste.

This is the asymmetry that kills automations. The cost is automatic and the value is opt-in. Gartner forecasts that over 40% of agentic AI projects will be canceled by the end of 2027, citing escalating costs and unclear business value. Unclear is the operative word. A canceled agent is rarely a proven failure. It is usually an unproven success.

The minimum protection is one number per agent: human hours saved, priced at a defensible baseline. Not a quarterly estimate. A running total that sits next to the run cost so the two can be read in the same glance.
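
As a rough sketch of what that one number could look like, the snippet below keeps a running total of minutes saved, prices it at an hourly rate, and prints it beside the run cost. The `AgentLedger` class, the 45.0 hourly rate, the 3,000-ticket volume, and the 1,200 run cost are all illustrative assumptions, not figures from a real deployment.

```python
from dataclasses import dataclass


@dataclass
class AgentLedger:
    """Running totals for one agent (hypothetical structure for illustration)."""
    agent_id: str
    hourly_rate: float          # assumed baseline price of one human hour
    minutes_saved: float = 0.0  # running total of human minutes saved
    run_cost: float = 0.0       # running total of what the agent cost to operate

    def record_saving(self, human_baseline_minutes: float) -> None:
        self.minutes_saved += human_baseline_minutes

    def record_cost(self, amount: float) -> None:
        self.run_cost += amount

    @property
    def hours_saved(self) -> float:
        return self.minutes_saved / 60

    @property
    def value_saved(self) -> float:
        return self.hours_saved * self.hourly_rate

    def summary(self) -> str:
        # Value and cost on one line, so they are read in the same glance.
        return (f"{self.agent_id}: {self.hours_saved:.1f} h saved "
                f"(${self.value_saved:,.2f}) vs ${self.run_cost:,.2f} run cost")


ledger = AgentLedger(agent_id="support-triage", hourly_rate=45.0)
for _ in range(3000):        # assumed: 3,000 tickets at a 4-minute baseline
    ledger.record_saving(4)
ledger.record_cost(1200.0)   # assumed monthly run cost
print(ledger.summary())
```

The point of the design is that savings and cost live in the same record and update continuously, so neither can be reported without the other.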

Cost two: attribution loss

The second cost shows up when an agent does create value and someone else claims it.

A triage agent routes inbound requests so the support team closes tickets faster. The support team reports faster resolution times. The agent that made it possible appears nowhere in that report. Six months later, the support tooling gets the credit and the renewal, and the agent gets questioned.

Attribution loss is worse than budget exposure because the value is real and still lands on the wrong line. The work happened. The reporting routed the credit to a human team or an adjacent tool, and the agent became invisible inside its own win.

The fix is per-execution attribution. Each time the agent completes a unit of work, it logs a savings event tagged with its own identity:

{
  "agent_id": "support-triage",
  "task_type": "ticket_routing",
  "outcome": "success",
  "human_baseline_minutes": 4
}

At a realistic volume of 5,000 to 40,000 executions a month, those events add up to a precise hours-saved figure, attributed to the agent that earned it rather than to whoever happened to be downstream.
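
To make the arithmetic concrete: with the 4-minute baseline from the event above, monthly hours saved is executions times 4, divided by 60. The sketch below assumes every execution succeeds and shares the same baseline, which is a simplification. The `hours_saved` helper is a hypothetical name.

```python
def hours_saved(executions: int, human_baseline_minutes: float) -> float:
    """Convert a count of successful executions into human hours saved."""
    return executions * human_baseline_minutes / 60


# The two ends of the volume range quoted in the text.
for executions in (5_000, 40_000):
    print(f"{executions:>6,} executions -> {hours_saved(executions, 4):,.0f} hours")
```

That works out to roughly 333 hours a month at the low end and roughly 2,667 at the high end.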

Cost three: scope creep blindness

The third cost is the slowest and the most expensive.

Agents drift. A workflow built to classify invoices starts handling refunds because someone added a branch. A summarizer quietly grows a second job. Each addition is reasonable on its own. The aggregate is an agent doing five things, costing five things, and reported as one thing.

Without per-task tracking you cannot see the drift, so you cannot tell which part of the agent earns its keep and which part is dead weight. You renew or cancel the whole thing as a unit, when the right move may well be to keep two branches and cut three.

Scope creep blindness is why teams end up with automations nobody fully understands. The protection is tracking at the task level, not the agent level. Tag each task type separately so a single agent reports a breakdown:

curl -s -X POST "https://humanhours.dev/api/v1/track" \
  -H "Authorization: Bearer hh_live_..." \
  -H "Content-Type: application/json" \
  -d '{"agent_id":"finance-ops","task_type":"invoice_classification","outcome":"success","human_baseline_minutes":6}'

When the breakdown shows invoice classification saving 200 hours a month and the bolted-on refund branch saving 4, the cut decision makes itself.
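
A minimal sketch of how such a breakdown might be computed from logged events follows. The event list is made up, the `refund_handling` task type and the `"failure"` outcome value are hypothetical, and counting only successful runs is an assumption rather than a documented rule.

```python
from collections import defaultdict

# Hypothetical events; in practice these would come from wherever events are stored.
events = [
    {"agent_id": "finance-ops", "task_type": "invoice_classification",
     "outcome": "success", "human_baseline_minutes": 6},
    {"agent_id": "finance-ops", "task_type": "invoice_classification",
     "outcome": "success", "human_baseline_minutes": 6},
    {"agent_id": "finance-ops", "task_type": "refund_handling",
     "outcome": "failure", "human_baseline_minutes": 10},
    {"agent_id": "finance-ops", "task_type": "refund_handling",
     "outcome": "success", "human_baseline_minutes": 10},
]


def breakdown_by_task(events):
    """Sum saved minutes per (agent_id, task_type), counting successes only."""
    totals = defaultdict(float)
    for event in events:
        if event["outcome"] != "success":
            continue  # assumption: a failed run saves no human time
        key = (event["agent_id"], event["task_type"])
        totals[key] += event["human_baseline_minutes"]
    return dict(totals)


for (agent, task), minutes in sorted(breakdown_by_task(events).items()):
    print(f"{agent} / {task}: {minutes / 60:.2f} h")
```

Keying on the pair of agent and task type is what lets one agent report several lines instead of one.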

The pattern underneath all three

The three costs are one problem wearing three masks. Cost is measured automatically and value is not, so the default state of any agent is expensive and unproven. Tracking flips the default.

The minimum that protects against all three is the same small commitment: one savings event per execution, attributed to the agent, broken down by task type, priced against a baseline you can defend in a finance review. That is not a dashboard project. It is one HTTP call placed where the work completes.
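
Here is one way that call might sit in agent code, sketched with Python's standard library. The endpoint and payload fields come from the curl example above. The `HH_API_KEY` environment variable, the `classify_invoice` function, and the choice to swallow network errors are assumptions. The response format is not specified here, so the sketch only checks the status code.

```python
import json
import os
import urllib.request

TRACK_URL = "https://humanhours.dev/api/v1/track"  # endpoint from the curl example


def build_track_request(agent_id, task_type, outcome, human_baseline_minutes, api_key):
    """Build the POST request for one savings event without sending it."""
    payload = {
        "agent_id": agent_id,
        "task_type": task_type,
        "outcome": outcome,
        "human_baseline_minutes": human_baseline_minutes,
    }
    return urllib.request.Request(
        TRACK_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def track(request, timeout=5.0):
    """Send the event; a tracking failure should not break the agent's real work."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:  # URLError, HTTPError and timeouts are all OSError subclasses
        return False


def classify_invoice(invoice):
    """Hypothetical unit of work; the tracking call sits right after it completes."""
    label = "utilities" if "electric" in invoice.lower() else "other"
    request = build_track_request(
        "finance-ops", "invoice_classification", "success", 6,
        api_key=os.environ.get("HH_API_KEY", ""),  # assumed variable name
    )
    # track(request)  # uncomment to send; left off so the sketch runs offline
    return label, request
```

Building the request separately from sending it keeps the payload easy to inspect in tests, and returning `False` on a network error means a tracking outage costs a data point instead of a failed task.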

The agents that survive the next budget cycle will not be the best ones. They will be the legible ones. The two sets overlap less often than anyone wants to admit, and the difference between them is whether someone instrumented the value before the review, not after.