6/12/2026 · 7 min read · PlannerPoker Team

OpenAI's Ona Deal Turns Long-Running Agents Into Sprint Scope

OpenAI plans to acquire Ona so Codex agents can run in secure, persistent cloud environments. Agile teams should estimate the review gates, permissions, logs, and operational work before sprint commitment.

Image: a neural network diagram representing long-running Codex agents in sprint planning. Diagram by Loxaxs, released under CC0 1.0 via Wikimedia Commons.

OpenAI's June 11, 2026 announcement that it plans to acquire Ona is a strong signal for engineering and product teams: AI agents are moving from short sessions to long-running production workflows.

OpenAI says Ona will bring secure cloud execution and orchestration technology into the Codex ecosystem. The goal is to give Codex agents a persistent place to work, including customer-controlled cloud environments where agents can continue after a laptop is closed or the original session ends. OpenAI also says more than 5 million people now use Codex each week, and that the most valuable agent work is increasingly unfolding over hours or days rather than minutes.

That is more than acquisition news. It is sprint planning news.

When an agent can keep working across tools, branches, cloud services, credentials, tickets, and review loops, the story is no longer just "let Codex do it." The story includes the environment the agent runs in, what it can access, how humans steer it, how work is reviewed, how logs are captured, and how the team recovers if the agent takes the wrong path.

Planning poker needs to estimate that whole operating model.

Persistent agents change the unit of work

Most teams already understand a short AI session. A developer asks a question, generates a patch, reviews it, runs tests, and decides what to keep.

Long-running agents are different.

A persistent Codex workflow might investigate a failing test suite, update dependencies, modernize old code, draft a pull request, respond to review comments, run a migration, or triage vulnerabilities over several hours. It may need repository access, package credentials, issue tracker context, cloud logs, staging data, and permission to trigger automated checks.

That makes the work larger than the prompt.

Before voting, ask:

  • Where will the agent run?
  • Which repository, system, ticket, or dataset can it access?
  • Which credentials are available, and how are they scoped?
  • What can the agent change without approval?
  • Who reviews intermediate decisions?
  • Which logs prove what happened?
  • What happens when the agent gets stuck?
  • How does the team stop, roll back, or restart the work?

If those answers are missing, the estimate should reflect that uncertainty: vote higher, or send the story back to refinement until the answers exist.

The environment is now part of the estimate

OpenAI describes Ona's value as secure, persistent environments where agents can access the tools, systems, and context they need over time. That detail matters because the agent environment is now part of the deliverable.

For normal app work, teams often estimate code, tests, and release steps. For long-running agent work, the estimate may also include:

  • Building or configuring the cloud workspace.
  • Connecting repositories and package registries.
  • Setting secrets and token boundaries.
  • Defining branch and pull request rules.
  • Capturing agent activity logs.
  • Adding human approval checkpoints.
  • Creating fallback paths for failed runs.
  • Updating support or incident notes.

That work is easy to miss because it feels like setup. But if setup determines whether the agent can safely touch production systems, it belongs in the story.

A low vote and high vote may both be right

Planning poker is useful because it exposes different assumptions before the team commits.

Imagine a story that says: "Use Codex to modernize the billing service dependency stack."

One engineer votes 5 because Codex can inspect the repo, update packages, run tests, and draft a pull request. Another votes 13 because the billing service touches customer data, payment logic, deployment scripts, alerts, and compliance evidence.

Both voters may be right. They are estimating different boundaries.

The low voter is assuming:

  • Codex runs in a contained development workspace.
  • No production credentials are available.
  • The agent only opens a pull request.
  • Existing tests are reliable.
  • Human review catches business logic issues.

The high voter is assuming:

  • Dependency updates may change payment behavior.
  • Secrets and package tokens need review.
  • Test coverage is incomplete.
  • Audit logs are required.
  • Rollback needs to be planned.
  • Security and finance stakeholders may need signoff.

Do not average those votes too quickly. Ask what environment, permissions, and review model each person assumed.

The final estimate should follow the risk boundary, not the demo.
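One way to make "do not average too quickly" concrete is to check how far apart the votes sit on the deck before anyone computes a number. The sketch below is illustrative only: the deck values, the function names, and the one-step threshold are assumptions, not part of any planning poker tool.

```python
# Hypothetical sketch: the deck, names, and threshold are illustrative assumptions.
FIBONACCI_DECK = [1, 2, 3, 5, 8, 13, 21]


def deck_distance(low: int, high: int) -> int:
    """Count the deck steps between two cards (5 to 13 is two steps)."""
    return FIBONACCI_DECK.index(high) - FIBONACCI_DECK.index(low)


def review_votes(votes: dict[str, int], max_steps: int = 1) -> str:
    """Return 'discuss' when votes are too far apart, otherwise 'converged'."""
    low, high = min(votes.values()), max(votes.values())
    if deck_distance(low, high) > max_steps:
        return "discuss"
    return "converged"


# The billing example: one engineer votes 5, another votes 13.
votes = {"engineer_a": 5, "engineer_b": 13}
print(review_votes(votes))  # discuss
```

A "discuss" result is the cue to ask each voter which environment, permissions, and review model they assumed, then vote again.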

Separate agent execution from agent acceptance

Long-running agents can make progress while people are away, but that does not make the result automatically accepted.

Good sprint planning separates three things.

First, execution. What can the agent attempt? This includes code changes, analysis, migrations, test runs, documentation drafts, issue triage, or dependency updates.

Second, supervision. When should humans interrupt, redirect, approve, or reject? This includes checkpoints, notifications, pull request review, and escalation paths.

Third, acceptance. What evidence proves the work is done? This includes passing tests, security checks, screenshots, logs, data validation, customer impact review, and release readiness.

If a ticket only estimates execution, it will look too small. The team still has to estimate supervision and acceptance.

Write agent boundaries into Jira before the vote

The best way to keep long-running agent work from becoming vague is to write the operating boundaries before planning poker starts.

Useful Jira notes might say:

  • "Codex may update code and tests, but cannot merge or deploy."
  • "Agent runs in a customer-controlled cloud workspace with repo access only."
  • "Secrets are read-only and scoped to staging."
  • "Human approval required before dependency lockfile changes are accepted."
  • "Activity logs must be attached to the ticket."
  • "Security review required because this touches authentication."
  • "Split this story if the agent needs production data."

These notes make the estimate more honest. They also help future AI summaries preserve the decisions the team made.
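Teams that want these notes to look the same on every ticket could keep them as a small structured record and render the text from it. This is a minimal sketch under stated assumptions: the `AgentBoundaries` class and its field names are hypothetical, and it only produces plain text to paste into a ticket. It does not call Jira or any Codex API.

```python
from dataclasses import dataclass, field


@dataclass
class AgentBoundaries:
    """Hypothetical record of the operating boundaries for one story."""

    may_do: list[str]
    may_not_do: list[str]
    workspace: str
    secrets_scope: str
    approvals_required: list[str] = field(default_factory=list)
    evidence_required: list[str] = field(default_factory=list)

    def to_ticket_note(self) -> str:
        """Render the boundaries as plain text for a ticket description."""
        lines = [
            f"Agent may: {', '.join(self.may_do)}",
            f"Agent may not: {', '.join(self.may_not_do)}",
            f"Workspace: {self.workspace}",
            f"Secrets: {self.secrets_scope}",
        ]
        lines += [f"Approval required: {item}" for item in self.approvals_required]
        lines += [f"Evidence required: {item}" for item in self.evidence_required]
        return "\n".join(lines)


boundaries = AgentBoundaries(
    may_do=["update code", "update tests"],
    may_not_do=["merge", "deploy"],
    workspace="customer-controlled cloud workspace, repo access only",
    secrets_scope="read-only, scoped to staging",
    approvals_required=["dependency lockfile changes"],
    evidence_required=["activity logs attached to the ticket"],
)
print(boundaries.to_ticket_note())
```

An empty `may_not_do` or `approvals_required` list is itself useful information during the vote, because it shows a boundary nobody has decided yet.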

Product managers should care too

It is tempting to treat persistent Codex work as an engineering implementation detail. It is not.

If agents can work across the software lifecycle, product managers need to understand how that changes delivery promises. A long-running agent might shorten the time to first pull request, but the team may spend more time on review, validation, permissions, and operational safety.

That affects sprint scope.

For product teams, the planning questions become:

  • Is this agent work exploration or delivery?
  • Which human role owns the result?
  • Is the agent allowed to change customer-facing behavior?
  • Does the story need a new acceptance checklist?
  • Are we creating a reusable workflow or a one-time task?
  • How will the team explain the work during sprint review?

The agent can move fast. The product commitment still needs to be clear.

A readiness checklist for long-running Codex work

Before committing persistent agent work to a sprint, check seven areas.

First, scope. What outcome should the agent pursue, and what is explicitly out of bounds?

Second, environment. Where does the agent run, and who controls that workspace?

Third, access. Which repositories, systems, credentials, data, and tools can it use?

Fourth, authority. Can the agent read, draft, modify, open pull requests, trigger jobs, or deploy?

Fifth, supervision. When does a human review progress or approve the next step?

Sixth, evidence. What logs, tests, diffs, screenshots, or reports prove what happened?

Seventh, recovery. How will the team stop, revert, retry, or escalate if the agent goes wrong?

If the team cannot answer these, the next step is refinement, not sprint commitment.
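The seven areas can be treated as a simple gate. The sketch below assumes the team records a short written answer per area; the function names and the example answers are hypothetical and only illustrate the rule that any unanswered area sends the story back to refinement.

```python
# Hypothetical sketch: area names follow the checklist, everything else is illustrative.
READINESS_AREAS = [
    "scope",
    "environment",
    "access",
    "authority",
    "supervision",
    "evidence",
    "recovery",
]


def unanswered_areas(answers: dict[str, str]) -> list[str]:
    """List the checklist areas that have no answer or only whitespace."""
    return [area for area in READINESS_AREAS if not answers.get(area, "").strip()]


def next_step(answers: dict[str, str]) -> str:
    """Any unanswered area means refinement rather than sprint commitment."""
    return "refinement" if unanswered_areas(answers) else "sprint commitment"


answers = {
    "scope": "Update billing service dependencies; no schema changes.",
    "environment": "Contained development workspace.",
    "access": "Billing repo only, no production credentials.",
    "authority": "May open pull requests; cannot merge or deploy.",
    "supervision": "Human review on every pull request.",
    "evidence": "",
    # "recovery" has not been discussed yet.
}
print(unanswered_areas(answers))  # ['evidence', 'recovery']
print(next_step(answers))  # refinement
```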

The takeaway for June 12

OpenAI's planned acquisition of Ona shows where AI coding agents are heading. The next frontier is not just better model output. It is persistent, secure, governed execution over real work.

That makes estimation more important.

Planning poker gives teams a practical way to keep humans in charge of the commitment. Vote privately. Reveal the spread. Ask what each person assumed about access, authority, logs, review, and rollback. Record the reasoning before the story enters the sprint.

Long-running agents can help teams move faster, but only if the sprint plan includes the work that makes them safe to trust.
