The Pattern Mimicry Trap: How Existing Codebase Architecture Dictates LLM Code Generation Quality
In the era of AI-augmented software engineering, a common misconception persists: that the quality of an LLM's output is solely a function of the model's reasoning capabilities and the prompt's clarity. However, empirical testing suggests a more profound dependency. The structural integrity and architectural patterns of your existing codebase act as a "contextual mirror." If your codebase is characterized by technical debt and poor separation of concerns, LLMs will not only replicate these anti-patterns but will actively treat them as the established "ground truth" for future development.
The Experimental Framework: "Bad" vs. "Better" Architectures
To quantify this phenomenon, an experiment was conducted using two distinct Laravel-based controller environments. The objective was to provide a single, non-refactoring prompt—"Add support for refunding invoices"—to various LLMs and observe whether they would adhere to existing patterns or attempt to improve the architectural state.
Environment A: The "Bad" Codebase (Monolithic/Anti-Pattern)
The "Bad" environment was designed to represent a high-debt, low-separation-of-concern architecture. The characteristics included:
- Controller Monolithism: All logic, including validation and business rules, resided directly within the controller methods.
- Lack of Request Abstraction: No use of FormRequest classes; validation logic was handled inline.
- Absence of Service/Action Layers: No dedicated Action or Service classes for business logic execution.
- Manual Response Handling: Direct response()->json() calls without the use of API Resources or standardized response wrappers.
- Pattern Duplication: Identical, repetitive logic across the store and cancel methods.
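The characteristics listed above can be illustrated with a minimal, framework-agnostic sketch. It is written in Python for brevity and is not the Laravel code used in the experiment; the in-memory INVOICES store, the field names, and the validation rules are assumptions made purely for illustration.

```python
from typing import Any

# Hypothetical in-memory store standing in for a database table.
INVOICES: dict[int, dict[str, Any]] = {}


class InvoiceController:
    """Everything lives in the controller: validation, rules, response shape."""

    def store(self, request: dict[str, Any]) -> tuple[dict[str, Any], int]:
        # Inline validation (no dedicated request class).
        amount = request.get("amount")
        if isinstance(amount, bool) or not isinstance(amount, (int, float)) or amount <= 0:
            return {"error": "amount must be a positive number"}, 422

        # Business rule and persistence mixed into the same method.
        invoice_id = len(INVOICES) + 1
        invoice = {"id": invoice_id, "amount": amount, "status": "open"}
        INVOICES[invoice_id] = invoice

        # Hand-built response payload (first copy).
        return {"id": invoice["id"], "amount": invoice["amount"], "status": invoice["status"]}, 201

    def cancel(self, invoice_id: int) -> tuple[dict[str, Any], int]:
        invoice = INVOICES.get(invoice_id)
        if invoice is None:
            return {"error": "invoice not found"}, 404
        if invoice["status"] != "open":
            return {"error": "only open invoices can be cancelled"}, 422

        invoice["status"] = "cancelled"

        # Hand-built response payload (second, identical copy).
        return {"id": invoice["id"], "amount": invoice["amount"], "status": invoice["status"]}, 200
```

A model asked to add a refund endpoint to a file like this sees two methods that agree with each other, so a third hand-built payload is the locally consistent continuation.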
Environment B: The "Better" Codebase (Decoupled/Pattern-Oriented)
The "Better" environment utilized modern, scalable design patterns:
- Separation of Concerns: Implementation of the Action pattern for business logic execution.
- Validation Abstraction: Use of dedicated FormRequest classes for request validation.
- Data Transfer Objects (DTOs): Utilization of InvoiceData classes to structure invoice payloads.
- Standardized API Responses: Implementation of a dedicated ApiResponse class for consistent success and error envelopes.
- Repository/Action Pattern: Logic encapsulated within CreateInvoiceAction and similar classes.
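For contrast, the decoupled layout described above might look roughly like the following sketch. It is again framework-agnostic Python rather than the experiment's Laravel code. The names InvoiceData, ApiResponse, and CreateInvoiceAction come from the description above; StoreInvoiceRequest, the field names, and the envelope shape are assumptions.

```python
from dataclasses import asdict, dataclass
from typing import Any


@dataclass(frozen=True)
class InvoiceData:
    """DTO: the one place that defines the invoice payload shape."""
    id: int
    amount: float
    status: str


class StoreInvoiceRequest:
    """Validation lives here, not in the controller (hypothetical name)."""

    def __init__(self, payload: dict[str, Any]) -> None:
        amount = payload.get("amount")
        if isinstance(amount, bool) or not isinstance(amount, (int, float)) or amount <= 0:
            raise ValueError("amount must be a positive number")
        self.amount = float(amount)


class ApiResponse:
    """Single definition of the success and error envelopes (assumed shape)."""

    @staticmethod
    def success(data: InvoiceData, status: int = 200) -> tuple[dict[str, Any], int]:
        return {"data": asdict(data)}, status

    @staticmethod
    def error(message: str, status: int) -> tuple[dict[str, Any], int]:
        return {"error": message}, status


class CreateInvoiceAction:
    """Business logic only; knows nothing about requests or responses."""

    def __init__(self, store: dict[int, InvoiceData]) -> None:
        self.store = store

    def execute(self, amount: float) -> InvoiceData:
        invoice = InvoiceData(id=len(self.store) + 1, amount=amount, status="open")
        self.store[invoice.id] = invoice
        return invoice


class InvoiceController:
    """The controller only wires request, action, and response together."""

    def __init__(self, create_invoice: CreateInvoiceAction) -> None:
        self.create_invoice = create_invoice

    def store(self, payload: dict[str, Any]) -> tuple[dict[str, Any], int]:
        try:
            request = StoreInvoiceRequest(payload)
        except ValueError as exc:
            return ApiResponse.error(str(exc), 422)
        return ApiResponse.success(self.create_invoice.execute(request.amount), 201)
```

In a file shaped like this, the pattern a model would mimic for a refund feature is a new request class, a new action, and the shared response envelope.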
Comparative Analysis of Model Performance
The experiment tested several high-parameter models and agents, including Claude Opus 4.7 (Medium), Claude Opus 5.7 (High/Extra High), GPT 5.5 (High/Extra High), Cursor Composer 2.5, and Kimi K 2.6.
1. The Baseline: Claude Opus 4.7 (Medium Effort)
When running on the "Bad" codebase, Opus 4.7 (Medium) exhibited pure pattern mimicry. The generated refund method replicated the inline validation and manual JSON response patterns found in the store and cancel methods. There was zero attempt to introduce FormRequest or Action classes, effectively codifying the existing technical debt.
2. The Reasoning Tier: Claude Opus 5.7 & GPT 5.5
As we increased the "effort" or reasoning depth (moving to "Extra High" configurations), the models began to demonstrate "contextual discovery."
- Claude Opus 5.7 (Extra High): While the model maintained the existing pattern for the new refund method, it made a critical discovery: it identified an unused InvoiceData class within the codebase and used it to construct the new response. However, it did not refactor the existing store or cancel methods to use this class, leading to a fragmented, dual-style architecture.
- GPT 5.5 (High): This model demonstrated superior refactoring capabilities. In the "Bad" codebase, it not only implemented the refund logic but also proactively refactored the existing store and cancel methods to utilize the InvoiceData structure.
- GPT 5.5 (Extra High): This configuration achieved the most architecturally sound result for the "Better" codebase, implementing a full RefundInvoiceAction and RefundInvoiceRequest and utilizing the InvoiceData DTO. However, unlike GPT 5.5 (High), it did not retroactively refactor the older methods, illustrating the "hit-and-miss" nature of LLM refactoring.
3. The Edge Cases: Cursor Composer 2.5 & Kimi K 2.6
- Cursor Composer 2.5: Despite being a "faster/cheaper" agent, Cursor demonstrated impressive localized refactoring. It did not implement a full Action class but instead identified the repetitive payload logic and extracted it into a private, reusable method within the controller, refactoring both store and cancel to use this new internal method.
- Kimi K 2.6: This model showed partial progress by implementing a StoreRefundRequest, but otherwise defaulted to the "Bad" codebase's pattern of manual JSON responses and lack of service layers.
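The localized refactor attributed to Cursor Composer 2.5 above, extracting the duplicated payload into a private controller method, can be sketched as follows. This is a framework-agnostic Python illustration, not the agent's actual output; the helper name _invoice_payload, the field names, and the refund rule are hypothetical.

```python
from typing import Any


class InvoiceController:
    def __init__(self) -> None:
        # Hypothetical in-memory store standing in for a database table.
        self.invoices: dict[int, dict[str, Any]] = {}

    def _invoice_payload(self, invoice: dict[str, Any]) -> dict[str, Any]:
        """Single definition of the response shape, shared by every endpoint."""
        return {"id": invoice["id"], "amount": invoice["amount"], "status": invoice["status"]}

    def store(self, request: dict[str, Any]) -> tuple[dict[str, Any], int]:
        # Validation is still inline: the refactor is local, not architectural.
        amount = request.get("amount")
        if isinstance(amount, bool) or not isinstance(amount, (int, float)) or amount <= 0:
            return {"error": "amount must be a positive number"}, 422
        invoice = {"id": len(self.invoices) + 1, "amount": amount, "status": "open"}
        self.invoices[invoice["id"]] = invoice
        return self._invoice_payload(invoice), 201

    def cancel(self, invoice_id: int) -> tuple[dict[str, Any], int]:
        invoice = self.invoices.get(invoice_id)
        if invoice is None:
            return {"error": "invoice not found"}, 404
        invoice["status"] = "cancelled"
        return self._invoice_payload(invoice), 200

    def refund(self, invoice_id: int) -> tuple[dict[str, Any], int]:
        invoice = self.invoices.get(invoice_id)
        if invoice is None:
            return {"error": "invoice not found"}, 404
        # Assumed rule for illustration: an invoice can be refunded only once.
        if invoice["status"] == "refunded":
            return {"error": "invoice already refunded"}, 422
        invoice["status"] = "refunded"
        return self._invoice_payload(invoice), 200
```

The duplication is gone, but validation and business rules still sit in the controller, which is why this counts as a localized improvement rather than a move to the Action pattern.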
Technical Conclusions: The "Sacred Code" Problem
The core takeaway from this experiment is the Non-Deterministic Refactoring Phenomenon. LLMs treat existing code as "sacred." Unless explicitly prompted to refactor, the models' primary objective is pattern adherence.
The results highlight three critical technical insights for AI-driven development:
- Contextual Dependency: In this experiment, the quality of the context mattered more than the quality of the prompt. A model that perceives the existing code as the standard tends to extend it rather than "fix" it.
- The Refactoring Gap: While high-reasoning models (GPT 5.5 Extra High, Opus 5.7 Extra High) can identify opportunities for improvement (e.g., finding unused DTOs), in these runs they rarely performed global refactors without explicit instruction.
- Architectural Fragmentation: When models do attempt to improve code, they often create "hybrid" architectures—where new features follow modern patterns while old features remain in legacy patterns—increasing the cognitive load for human developers.
As we integrate agents like Claude Code and Cursor Composer into our CI/CD pipelines, the responsibility remains with the human engineer to provide a high-quality architectural foundation. Without it, we are simply using advanced intelligence to automate the production of technical debt.