6/24/2026 · 7 min read · PlannerPoker Team

Gartner's AI Coding Cost Warning Belongs in Sprint Planning

Gartner predicts AI coding token costs could surpass the average developer salary by 2028. Product owners should bring token budgets, model tiers, agent loops, and ROI assumptions into planning poker before AI-assisted work enters the sprint.

A laptop, notebook, and calculator representing AI coding cost planning
Image: desktop with laptop and calculator, from Unsplash, released under CC0 1.0 via Wikimedia Commons.

Gartner's June 24, 2026 warning should land directly in sprint planning: AI coding costs may overtake the average developer's salary by 2028 as token consumption surges.

The headline is provocative, but the planning lesson is practical. Gartner says rising large language model token consumption, agentic coding workflows, and consumption-based licensing models are straining software engineering budgets. Its related guidance on token optimization says some teams have already seen costs rise by as much as 100x compared with older coding assistants as they move from autocomplete-style help to agentic tools that read code, plan changes, run loops, and generate larger outputs.

That matters to product owners because AI-assisted work is no longer automatically cheap just because the first draft appears quickly.

Planning poker should not turn story points into dollars. But token cost is a real signal about scope, complexity, uncertainty, and operational risk.

AI coding cost is part of the work now

Most teams started with AI coding tools as a seat-license experiment. A developer paid for an assistant, got completions, asked a few questions, and maybe generated a small patch. That model was easy to ignore in sprint planning.

Agentic coding is different.

An agent can read a large repository, load long histories, inspect tests, generate multiple patch attempts, run commands, evaluate failures, rewrite files, and ask for more context. Each loop can consume tokens. Each high-context request can cost more. Each frontier model choice can change the bill.

A backlog item that says "use AI to modernize this module" may include:

  • Repo-wide context loading.
  • Multiple plan and critique loops.
  • Test execution and failure analysis.
  • Pull request description generation.
  • AI code review.
  • Security or dependency scanning.
  • Rework after human review.
  • Additional prompts to explain generated changes.

That is work. It is also spend.
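
To make the arithmetic concrete, here is a minimal sketch of how loop count, context size, and model price multiply into spend. Every name and number in it is an assumption for illustration: the prices are placeholders, not any vendor's rates, and real billing may also include caching discounts or per-seat fees.

```python
from dataclasses import dataclass


@dataclass
class AgentRunAssumptions:
    """Hypothetical planning inputs; none of these figures come from a vendor."""
    loops: int                     # plan/critique/retry iterations expected
    input_tokens_per_loop: int     # context sent on each iteration
    output_tokens_per_loop: int    # tokens generated on each iteration
    usd_per_million_input: float   # placeholder price; use your contract rate
    usd_per_million_output: float  # placeholder price; use your contract rate


def estimate_run_cost(a: AgentRunAssumptions) -> float:
    """Return the estimated spend in USD for one agent run."""
    input_cost = a.loops * a.input_tokens_per_loop * a.usd_per_million_input / 1_000_000
    output_cost = a.loops * a.output_tokens_per_loop * a.usd_per_million_output / 1_000_000
    return round(input_cost + output_cost, 2)


# Same placeholder prices, different workflow assumptions.
narrow = AgentRunAssumptions(loops=1, input_tokens_per_loop=20_000,
                             output_tokens_per_loop=2_000,
                             usd_per_million_input=1.0, usd_per_million_output=4.0)
wide = AgentRunAssumptions(loops=8, input_tokens_per_loop=200_000,
                           output_tokens_per_loop=10_000,
                           usd_per_million_input=1.0, usd_per_million_output=4.0)

print(estimate_run_cost(narrow))  # 0.03
print(estimate_run_cost(wide))    # 1.92
```

The absolute figures are invented. The point is the ratio: at identical prices, more loops and wider context multiply the cost of the same backlog item many times over.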

Planning poker should expose token assumptions

Imagine a story that says: "Use an AI coding agent to refactor the billing rules service."

One engineer votes 5 because the agent can update code quickly. Another votes 13 because the service has legacy tests, payment edge cases, customer data boundaries, long audit history, and uncertain review effort.

Both estimates may be rational. The voters are working from different assumptions about token use and review effort.

The low voter is assuming:

  • Narrow context.
  • One model run.
  • Good existing tests.
  • Small patch.
  • Human review is straightforward.
  • No repeated agent loops.

The high voter is assuming:

  • Long repository context.
  • Several failed attempts.
  • Expensive model tier.
  • Large generated diff.
  • Extra security review.
  • Human time spent validating agent output.
  • Retry cost when tests fail.

Do not average those votes immediately. Ask what agent workflow each person imagined.

The final estimate should follow the actual operating model, not the idea that AI makes the work disappear.
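
A facilitator could flag a wide spread instead of averaging it. The sketch below is one possible rule, not a standard: the deck, the `needs_discussion` helper, and the one-card tolerance are all assumptions.

```python
FIBONACCI_DECK = [1, 2, 3, 5, 8, 13, 21]  # a common planning poker deck; adjust to your own


def needs_discussion(votes: list[int], max_steps_apart: int = 1) -> bool:
    """True when the lowest and highest votes are more than max_steps_apart cards apart."""
    positions = [FIBONACCI_DECK.index(v) for v in votes]
    return max(positions) - min(positions) > max_steps_apart


votes = [5, 13, 8, 5]  # the billing rules story from the example above
if needs_discussion(votes):
    print(f"Spread {min(votes)} to {max(votes)}: compare agent workflow assumptions before re-voting")
```

Measuring distance in deck positions rather than raw points keeps the rule consistent across the scale: 5 versus 13 is two cards apart, which is enough here to trigger the conversation.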

Product owners need AI cost acceptance criteria

AI cost control is often treated as an engineering management problem, but product owners own the promise behind the work. If the story needs an AI planning report, an AI code migration, or an AI-generated workflow, the acceptance criteria should include the conditions that make the cost sustainable.

Useful criteria might include:

  • "Use the lowest-cost model that meets the quality threshold."
  • "Summarize long context before sending it to a frontier model."
  • "Cap retries and show a clear failure state."
  • "Log token usage per report, workspace, or pull request."
  • "Require human approval before a high-cost agent run starts."
  • "Do not send irrelevant files or comments into model context."
  • "Cache stable context where privacy rules allow it."
  • "Show admins usage before the budget is exhausted."

These are not accounting details. They shape user experience, support, pricing, and trust.
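
Several of these criteria can be expressed as a small guard around an agent run. The following is a rough sketch, assuming a hypothetical `RunPolicy` and `RunLedger`; the thresholds are made-up defaults, and a real system would take token counts from its model provider's usage reports.

```python
from dataclasses import dataclass, field


@dataclass
class RunPolicy:
    max_retries: int = 2               # "Cap retries"
    token_budget: int = 500_000        # hypothetical per-run ceiling
    approval_threshold: int = 200_000  # estimated tokens above this need human sign-off


@dataclass
class RunLedger:
    policy: RunPolicy
    tokens_used: int = 0
    attempts: int = 0
    log: list[str] = field(default_factory=list)  # "Log token usage"

    def may_start(self, estimated_tokens: int, approved: bool) -> bool:
        """Gate a run before it starts: high-cost runs need human approval."""
        if estimated_tokens > self.policy.approval_threshold and not approved:
            self.log.append("blocked: human approval required")
            return False
        return True

    def record_attempt(self, tokens: int) -> str:
        """Call after an attempt that did not pass; returns 'continue' or a clear failure state."""
        self.attempts += 1
        self.tokens_used += tokens
        self.log.append(f"attempt {self.attempts}: {tokens} tokens")
        if self.tokens_used > self.policy.token_budget:
            return "failed: token budget exhausted"
        if self.attempts > self.policy.max_retries:  # first attempt plus max_retries retries
            return "failed: retry limit reached"
        return "continue"


ledger = RunLedger(RunPolicy())
print(ledger.may_start(300_000, approved=False))  # False
```

Writing the criteria this way forces the team to pick actual numbers during refinement, which is where the disagreement usually surfaces.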

Experimentation still needs a budget shape

Business Insider reported comments from Claude Code creator Boris Cherny that companies are right to focus on AI ROI, but should still leave room for experimentation. That is the balance product teams need.

If teams lock down every token too early, they may miss useful workflows. If they ignore spend entirely, they may create a product that looks productive in demos but is painful in production.

A healthier planning pattern separates three budgets:

  • Discovery budget: controlled experiments that search for valuable AI workflows.
  • Delivery budget: known agent runs attached to sprint stories.
  • Operations budget: recurring usage after the feature ships.

Planning poker can reveal which budget a story belongs to. A spike should not be estimated like a production workflow. A production workflow should not be treated like a free-form experiment.
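
One lightweight way to keep the three budgets separate is to tag every recorded spend with the budget it belongs to. This is an illustrative sketch only: the `Budget` enum and `spend_by_budget` helper are hypothetical names, and the dollar amounts are invented.

```python
from collections import defaultdict
from enum import Enum


class Budget(Enum):
    DISCOVERY = "discovery"    # controlled experiments
    DELIVERY = "delivery"      # agent runs attached to sprint stories
    OPERATIONS = "operations"  # recurring usage after the feature ships


def spend_by_budget(entries: list[tuple[Budget, float]]) -> dict[Budget, float]:
    """Total recorded spend (USD) per budget."""
    totals: dict[Budget, float] = defaultdict(float)
    for budget, usd in entries:
        totals[budget] += usd
    return dict(totals)


# Invented figures for illustration.
entries = [(Budget.DISCOVERY, 12.5), (Budget.DELIVERY, 40.0), (Budget.DISCOVERY, 7.5)]
for budget, usd in spend_by_budget(entries).items():
    print(budget.value, usd)
```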

Token cost changes story splitting

When token cost is invisible, teams often ship one large AI story.

When token cost is visible, it becomes easier to split:

  • Measure token use on five representative issues.
  • Compare small, medium, and frontier model output quality.
  • Add context pruning before generation.
  • Add admin usage visibility.
  • Add retry limits and failure states.
  • Move one workflow from prototype to production.
  • Expand only after cost and quality are known.

Each story creates evidence. That evidence makes the next estimate better.

This is exactly where planning poker helps. The team is not only estimating development effort. It is estimating uncertainty in context size, model choice, retry rate, review effort, and production behavior.
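
The first split, measuring token use on a handful of representative issues, can produce a simple summary for the next refinement session. The sketch below assumes the team has already collected one token total per issue; the sample numbers are invented.

```python
from statistics import mean, median


def summarize_token_samples(samples: list[int]) -> dict[str, float]:
    """Summarize measured token use so the team can see typical cost and outliers."""
    mid = median(samples)
    return {
        "min": min(samples),
        "median": mid,
        "mean": mean(samples),
        "max": max(samples),
        "max_over_median": round(max(samples) / mid, 1),  # how bad the worst case was
    }


# Five hypothetical measurements, one per representative issue.
samples = [40_000, 55_000, 60_000, 90_000, 400_000]
print(summarize_token_samples(samples))
```

A large gap between the median and the maximum suggests the story still hides a retry or context-size risk that is worth splitting out.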

The takeaway for June 24

Gartner's prediction is a warning against lazy AI planning. AI coding tools can help teams move faster, but agentic workflows also create new cost curves. Token spend, model tier, context length, review loops, and retry behavior are now part of the delivery system.

Product owners should bring those assumptions into refinement before the sprint commitment.

Before voting, ask:

  • Which AI tool or model is expected?
  • How much context will it need?
  • How many retries are acceptable?
  • What happens when the token budget is hit?
  • Who reviews AI-generated changes?
  • What evidence proves the AI run was worth it?
  • Which parts of the story can be done without an expensive agent?

Then vote privately. Reveal the spread. Discuss the assumptions behind the low and high estimates. Record the cost and review boundaries before the story enters the sprint.
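
The pre-vote questions can also be captured as a record attached to the story, so gaps are visible before the vote. This is a sketch under assumptions: the `AiStoryAssumptions` fields simply mirror the questions above, and the names are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class AiStoryAssumptions:
    model_or_tool: Optional[str] = None       # Which AI tool or model is expected?
    context_needed: Optional[str] = None      # How much context will it need?
    max_retries: Optional[int] = None         # How many retries are acceptable?
    on_budget_hit: Optional[str] = None       # What happens when the token budget is hit?
    reviewer: Optional[str] = None            # Who reviews AI-generated changes?
    evidence_of_value: Optional[str] = None   # What proves the AI run was worth it?
    work_without_agent: Optional[str] = None  # What can be done without an expensive agent?


def unanswered(a: AiStoryAssumptions) -> list[str]:
    """Return the names of the questions nobody has answered yet."""
    return [f.name for f in fields(a) if getattr(a, f.name) is None]


story = AiStoryAssumptions(model_or_tool="small model first", max_retries=2)
print(unanswered(story))
```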

AI coding may change how software is built. Planning poker still protects the team from pretending uncertainty is free.

Sources