Engineering Deterministic Outcomes: Why Unconstrained Agentic Loops Fail and the Case for Constrained Feedback Harnesses

title: "Engineering Deterministic Outcomes: The Perils of Unconstrained Agentic Loops"
date: 2026-06-09
description: "An analysis of agentic loops, token economics, and the implementation of constrained feedback engines in software development."
tags: [ai, agents, engineering, devops, llm]

The current discourse surrounding Large Language Model (LLM) orchestration is dominated by a single, seductive concept: the Agentic Loop. Proponents of autonomous agency suggest that we are moving away from "prompting" and toward "system building," where an agent is given a high-level specification—a spec.md or a Product Requirements Document (PRD)—and left to iterate autonomously until a goal is achieved.

However, beneath the hype lies a significant engineering distinction: Human-in-the-Loop (HITL) workflows versus Unconstrained Agentic Loops. The former provides governance through iterative human validation; the latter introduces high-variance "slot machine" dynamics that can lead to catastrophic token expenditure and architectural drift.

The Architecture of Iteration: HITL vs. Autonomous Loops

To understand the technical divergence, we must first define the two primary operational modes of AI-assisted development.

1. Human-in-the-Loop (HITL)

In a standard HITL workflow, using tools like Cursor, Claude, or OpenAI's ecosystem, the human acts as the central controller of the feedback cycle. Each cycle follows a supervised sequence of steps:

Prompt/Instruction: The developer provides a specific task (e.g., "Implement authentication logic").
Generation: The agent generates code or a feature.
Validation: The human reviews the output, runs tests, and verifies against the mental model of the product.
Iteration: The human provides corrective prompts based on the observed result.

In this model, the "loop" is closed by human cognition. This ensures that every architectural decision remains aligned with the intended product vision, preventing the agent from making unmanaged assumptions.
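The four steps above can be sketched as a small control loop. This is a minimal illustration, not any tool's actual API: `hitl_loop`, `generate`, `review`, and `max_rounds` are hypothetical names, with `generate` standing in for the coding agent and `review` standing in for the human reviewer.

```python
from typing import Callable, Optional


def hitl_loop(
    task: str,
    generate: Callable[[str], str],
    review: Callable[[str], Optional[str]],
    max_rounds: int = 10,
) -> str:
    """Iterate until the human reviewer accepts the output.

    `generate` is a stand-in for the agent. `review` is a stand-in for the
    human: it returns None to accept, or a corrective prompt to iterate.
    Both are hypothetical callables supplied by the caller.
    """
    prompt = task
    for _ in range(max_rounds):
        output = generate(prompt)      # Generation
        correction = review(output)    # Validation by the human
        if correction is None:
            return output              # Accepted: the human closes the loop
        # Iteration: fold the human's correction into the next prompt
        prompt = f"{task}\nCorrection: {correction}"
    raise RuntimeError("No accepted output within max_rounds")
```

The point of the sketch is that the only exit from the loop runs through `review`, so no output is accepted without a human decision.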

2. Unconstrained Agentic Loops

The emerging trend, championed by high-scale researchers, involves removing the human from the middle of the loop. In this architecture, the agent is provided a task list (e.g., via slash_goal or similar autonomous primitives) and is instructed to iterate on its own output using its own generated results as feedback.

Theoretically, the agent reads its output, evaluates it against the initial .md specification, and continues generating code until completion. While this promises high-velocity development, it introduces two critical failure modes: Assumption Drift and Token Exhaustion.

The Failure Modes of Autonomy

Assumption Drift and Specification Incompleteness

No PRD or spec.md is ever truly exhaustive. As an agent iterates autonomously, it inevitably encounters edge cases not defined in the initial prompt. Without a human to adjudicate these "n0" decisions (decisions with no predefined instruction), the agent must make assumptions regarding UI/UX, state management, and error handling.

When an agent is allowed to "hallucinate" architectural decisions to keep a loop moving, it creates technical debt that often stays invisible until the entire run has concluded. The result is a finished product that runs as coded but fails to meet the unstated requirements of the human stakeholder.

The Economics of Token Burn

The second failure mode is purely economic. Autonomous loops are computationally expensive. In an unconstrained loop, every iteration consumes significant tokens for context processing and generation. For developers operating on standard $20/month or even $100/month tiers, a single runaway loop can exhaust a monthly budget in hours.
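A rough cost model makes the growth visible. The sketch below assumes that each iteration re-reads the full context and that the agent's output is appended to that context for the next turn; real harnesses may truncate or cache context. The function name and all prices and token counts are hypothetical placeholders, not any provider's actual rates.

```python
def loop_cost_usd(
    iterations: int,
    base_context_tokens: int,
    output_tokens_per_iter: int,
    usd_per_million_input: float,
    usd_per_million_output: float,
) -> float:
    """Estimate the spend of a loop whose context grows every iteration."""
    total_input = 0
    total_output = 0
    context = base_context_tokens
    for _ in range(iterations):
        total_input += context                # the whole context is re-read
        total_output += output_tokens_per_iter
        context += output_tokens_per_iter     # assumption: output is fed back in
    return (
        total_input * usd_per_million_input
        + total_output * usd_per_million_output
    ) / 1_000_000


# Hypothetical numbers: 50 iterations, 20k-token starting context,
# 2k tokens generated per turn, $3 and $15 per million input/output tokens.
print(loop_cost_usd(50, 20_000, 2_000, 3.0, 15.0))  # 11.85
```

Under these assumed numbers, one 50-iteration run costs about $11.85, and because input tokens accumulate with every turn, the cost grows roughly quadratically with the iteration count rather than linearly.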

The scale of this issue is best illustrated by industry outliers; there are documented cases where high-scale researchers have burned upwards of $1.3 million in tokens within a single month to power sophisticated agentic research. For the vast majority of engineers, unconstrained loops represent an unsustainable "token donation" to trillion-dollar AI providers rather than a productive development tool.

The Solution: Constrained Feedback Engines

The path forward is not to abandon loops but to implement Constrained Loops. A successful loop needs a fixed feedback mechanism, binary or otherwise highly structured, that does not rely on subjective human intuition at every turn.

Case Study: The `greploop` Implementation

A viable engineering pattern involves using an agentic loop specifically for Code Review, where the feedback is quantitative rather than qualitative.

Consider a workflow utilizing Cursor (as the IDE/Harness), GitHub (as Version Control), and Greptile or CodeRabbit (as the Code Review Agent). The architecture of a "constrained loop" can be structured as follows:

The Trigger: A developer pushes code to GitHub via Cursor.
The Evaluator: An automated agent (e.g., Greptile) intercepts the Pull Request (PR). This agent performs static and dynamic analysis, providing a numerical score (0-5) based on security, performance, and adherence to standards.
The Loop Logic (greploop): A custom skill or script is executed in which the agent:
- Reads the Greptile review and score from GitHub.
- Checks whether Score < 4.
- If so, pulls the feedback into Cursor, applies fixes, and pushes a new commit to trigger a re-review.
- Stops when Score >= 4 or after a maximum of N iterations (e.g., 5 turns).
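The loop logic can be sketched as follows. This is an illustration under stated assumptions rather than the actual greploop script: the three callables are hypothetical hooks that the caller would wire to GitHub, the review agent, and the editor, and no real Greptile, CodeRabbit, or Cursor API is used here.

```python
from typing import Callable


def greploop(
    fetch_score: Callable[[], float],
    fetch_feedback: Callable[[], str],
    apply_fixes_and_push: Callable[[str], None],
    target_score: float = 4,
    max_iterations: int = 5,
) -> bool:
    """Fix and re-push until the review score reaches the target.

    Returns True if the target score was reached, False if the iteration
    budget ran out first. All three callables are hypothetical hooks.
    """
    for _ in range(max_iterations):
        if fetch_score() >= target_score:
            return True                        # objective exit condition
        # Score below target: pull the feedback, fix, push for re-review
        apply_fixes_and_push(fetch_feedback())
    # Budget spent: report whether the last push got there
    return fetch_score() >= target_score
```

Both exits are mechanical: a numeric threshold and a hard iteration cap. The cap is what bounds token spend when the score never converges.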

Technical Constraints for Success

For this constrained loop to remain stable, two technical constraints must be observed:

Granularity of Change: The agent should not attempt to review or fix PRs exceeding 1,000 lines of code. Beyond this threshold, context window limits and the complexity of dependency mapping make it more likely that the agent loses track of the global state and fails to reach the target score.
Binary/Quantitative Feedback: The loop must rely on an objective metric (like a security score or test pass/fail) rather than subjective "style" preferences, which an autonomous agent struggles to converge on without human intervention.

Conclusion: The Future of Agentic Engineering

We are currently in the "experimental" phase of agentic loops. They are highly effective for low-stakes prototyping—such as building a simple simulation or an SEO-driven content generator where the output is high-volume and low-complexity.

However, for mission-critical software engineering, the Human-in-the-Loop remains the superior architecture. The future of AI development lies not in replacing the human with an autonomous loop, but in augmenting the human with highly specialized, constrained loops that handle the repetitive, verifiable aspects of the SDLC (Software Development Life Cycle), such as automated linting, security auditing, and unit test enforcement.