Architecting an Agentic AI SaaS: A Deep Dive into Next.js 15, Gemini OCR, and MCP-Driven Workflows

Building a scalable AI-driven Software-as-a-Service (SaaS) requires more than just a prompt and an API key. It requires a robust system architecture capable of handling asynchronous processing, complex data extraction, and multi-channel agentic interactions.

In this post, I will break down the full-stack architecture of BookZero.ai, an AI-powered financial management platform that automates receipt and bank statement processing. Within 28 days of launch, the platform passed 1,150 active users, most of whom arrived through organic search. This breakdown covers the four pillars of the system: the Client Side, the Controller, External Services, and Observability.

The Four-Pillar System Architecture

To maintain a high-performance, scalable environment, I partitioned the architecture into four distinct functional sections. This separation of concerns allows for independent scaling of the frontend, the business logic (controller), the heavy-lifting background jobs, and the monitoring layer.

1. The Client-Side Layer: Multi-Channel Interaction

The user interface is not limited to a single web portal. The system interacts with users through three primary channels:

Web Application: The primary dashboard for managing expenses and viewing analytics.
Telegram Bot: A lightweight interface for quick interactions, such as uploading receipts or checking usage via slash commands.
Email Notifications: Automated transactional emails for system updates and alerts.
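Here is a minimal sketch of how a Telegram message might be split into a command and its arguments before anything reaches the chat agent. The command name `usage`, the bot username `BookZeroBot`, and the `parseSlashCommand` helper are all hypothetical. The sketch relies only on Telegram's convention that commands start with `/` and may carry an `@botname` suffix in group chats.

```typescript
// Hypothetical result of parsing a Telegram slash command such as
// "/usage" or "/usage@BookZeroBot 2025-01".
interface ParsedCommand {
  command: string;
  args: string[];
}

// Returns null for ordinary messages so they can fall through to the chat agent.
function parseSlashCommand(text: string, botUsername?: string): ParsedCommand | null {
  const trimmed = text.trim();
  if (!trimmed.startsWith("/")) return null;

  // First token is the command; the rest are whitespace-separated arguments.
  const [head, ...args] = trimmed.split(/\s+/);
  // In group chats a command can carry an "@botname" suffix.
  const [rawCommand, mention] = head.slice(1).split("@");
  if (mention && botUsername && mention.toLowerCase() !== botUsername.toLowerCase()) {
    return null; // addressed to a different bot
  }
  if (rawCommand.length === 0) return null; // a bare "/" is not a command
  return { command: rawCommand.toLowerCase(), args };
}
```

Returning `null` for non-commands lets free-form messages fall through to the chat agent. That way one webhook can serve both quick commands and conversation.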

For the web application, I used Next.js 15 and React 19. The rest of the frontend stack was chosen for efficient data fetching and state management:

Styling: Tailwind CSS for utility-first responsive design.
UI Components: Shadcn UI for a consistent, accessible design system.
Server State Management: TanStack Query to handle asynchronous data fetching, caching, and synchronization with the backend.
Client State Management: Zustand for lightweight, high-performance local state management (e.g., managing UI toggles or temporary upload states).
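One common way to organize TanStack Query's server state is a query-key factory. TanStack Query caches each query under its key array, and `queryClient.invalidateQueries({ queryKey })` matches keys by prefix by default. Hierarchical keys therefore let one call refresh a whole family of queries. The `expenses` keys below are hypothetical and are not taken from the BookZero.ai codebase.

```typescript
// Hypothetical query-key factory for expense data. Each level extends the
// previous one, so invalidating a shorter key also covers the longer ones.
const expenseKeys = {
  all: ["expenses"] as const,
  lists: () => [...expenseKeys.all, "list"] as const,
  list: (month: string) => [...expenseKeys.lists(), { month }] as const,
  detail: (id: string) => [...expenseKeys.all, "detail", id] as const,
};
```

After a receipt upload succeeds, `queryClient.invalidateQueries({ queryKey: expenseKeys.all })` would mark every cached expense list and detail as stale. A `useQuery({ queryKey: expenseKeys.list(month), queryFn })` call, by contrast, reads only the month it needs.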

2. The Controller: The Logic Engine

The "Controller" acts as the brain of the application, handling request routing and orchestration. A critical distinction in this architecture is that the Next.js App Router sits in the Controller layer rather than the Client layer. Because routes are rendered on the server (Server-Side Rendering, SSR), the browser receives HTML that is ready to index and display, which benefits both SEO and initial load performance.

The Controller manages two types of execution paths:

Server Actions: These are used for internal service calls, such as handling user authentication, onboarding flows, or updating user profiles within our internal database.
API Route Handlers: These are dedicated to interacting with external services. This includes the heavy-duty logic for our OCR (Optical Character Recognition) pipeline and managing large-scale imports from Google Drive.

By separating these, we can maintain a clean boundary between internal business logic and external third-party integrations.
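The sketch below illustrates the Route Handler side. It shows the shape of an App Router handler that accepts an OCR request and hands it off to a queue rather than calling the model inline. The path `app/api/ocr/route.ts`, the request fields, and `enqueueOcrJob` are hypothetical. In this architecture the hand-off would presumably go to QStash, but that call is stubbed out here.

```typescript
// app/api/ocr/route.ts (hypothetical path)
// Accepts an OCR request and queues it instead of running OCR inline.

type DocumentKind = "receipt" | "bank_statement";

interface OcrRequestBody {
  fileUrl: string;
  kind: DocumentKind;
}

// Hypothetical stand-in for the real queue publish (e.g. a QStash call).
async function enqueueOcrJob(body: OcrRequestBody): Promise<string> {
  return `job_${body.kind}_${Date.now()}`;
}

// Runtime check, since request bodies are untrusted JSON.
function isOcrRequestBody(value: unknown): value is OcrRequestBody {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.fileUrl === "string" && (v.kind === "receipt" || v.kind === "bank_statement");
}

function json(data: unknown, status: number): Response {
  return new Response(JSON.stringify(data), {
    status,
    headers: { "Content-Type": "application/json" },
  });
}

// Route Handlers are exported functions named after HTTP methods.
export async function POST(request: Request): Promise<Response> {
  let payload: unknown;
  try {
    payload = await request.json();
  } catch {
    return json({ error: "Body must be valid JSON" }, 400);
  }
  if (!isOcrRequestBody(payload)) {
    return json({ error: "Expected { fileUrl: string, kind: 'receipt' | 'bank_statement' }" }, 400);
  }
  const jobId = await enqueueOcrJob(payload);
  // 202 Accepted: the document has been queued, not processed yet.
  return json({ jobId }, 202);
}
```

Returning `202 Accepted` signals that the document was queued rather than processed. This keeps the request fast even when the OCR step itself is slow.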

3. External Services: AI, Data, and Background Processing

The core value proposition of BookZero.ai lies in its ability to extract structured data from unstructured documents. This requires a sophisticated orchestration of AI models and background workers.

The AI Pipeline

We employ a multi-model strategy to balance cost, speed, and accuracy:

Gemini (Google): Utilized specifically for the OCR pipeline. Gemini's multimodal capabilities allow us to ingest receipt images and bank statement PDFs, extracting key entities such as vendor names, transaction dates, and total amounts with high precision.
OpenAI: Powers the Chat Agent. When users query their spending via the dashboard or Telegram, OpenAI handles the natural language understanding (NLU) and the generation of structured responses.
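The model's output feeds directly into the ledger, so it is worth validating before anything is written to the database. The sketch below assumes the OCR prompt asks Gemini to reply with JSON containing `vendor`, `date`, `total`, and `currency`; these field names are hypothetical. It also converts the total to integer cents. The Gemini API can be asked for a JSON response, but the actual model call is omitted here.

```typescript
// Hypothetical normalized record produced from the model's JSON reply.
interface ReceiptExtraction {
  vendor: string;
  date: string; // YYYY-MM-DD
  totalCents: number; // integer minor units to avoid floating-point drift
  currency: string; // three-letter code, e.g. "USD"
}

function parseReceiptExtraction(raw: string): ReceiptExtraction {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("expected a JSON object");
  }
  const data = parsed as Record<string, unknown>;

  const vendor = typeof data.vendor === "string" ? data.vendor.trim() : "";
  const date = typeof data.date === "string" ? data.date : "";
  // Accept a number or a numeric string; anything else becomes NaN.
  const total =
    typeof data.total === "number"
      ? data.total
      : typeof data.total === "string" && data.total.trim() !== ""
        ? Number(data.total)
        : NaN;
  const currency = typeof data.currency === "string" ? data.currency.toUpperCase() : "";

  if (!vendor) throw new Error("missing vendor");
  // Format check only; this does not validate calendar days.
  if (!/^\d{4}-\d{2}-\d{2}$/.test(date)) throw new Error(`invalid date: ${date}`);
  if (!Number.isFinite(total) || total < 0) throw new Error(`invalid total: ${String(data.total)}`);
  if (!/^[A-Z]{3}$/.test(currency)) throw new Error(`invalid currency: ${currency}`);

  return { vendor, date, totalCents: Math.round(total * 100), currency };
}
```

A descriptive error on malformed output gives the pipeline a clear signal. It can then let the queue retry the job or flag the document for manual review.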

Data and Infrastructure

Database & Auth: Supabase serves as our primary backend-as-a-service, providing PostgreSQL capabilities and robust authentication.
Background Jobs & Queue Management: To prevent hitting rate limits on the Gemini or OpenAI APIs, we use QStash. QStash acts as a serverless messaging and scheduling solution, allowing us to implement auto-retries, batching, and parallel request processing.
Caching: Upstash Redis is implemented as a hot cache to reduce latency for frequently accessed data and to manage rate-limiting logic.
Integrations: The system integrates with Stripe for subscription management, Resend for transactional email, and Google Drive for automated cloud-based document ingestion.
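For Redis's rate-limiting role, a fixed-window counter is one of the simplest schemes. It increments a per-window key and rejects requests once the count exceeds the limit. The sketch below assumes a minimal `CounterStore` interface (a hypothetical name) whose `incr` and `expire` methods mirror the Redis commands of the same name, which the `@upstash/redis` client also exposes. An in-memory store stands in for Redis so the logic can run on its own. This is not necessarily the scheme BookZero.ai uses; Upstash also publishes a dedicated `@upstash/ratelimit` package with ready-made algorithms.

```typescript
// Minimal subset of a Redis client needed for counting (hypothetical interface).
interface CounterStore {
  incr(key: string): Promise<number>;
  expire(key: string, seconds: number): Promise<unknown>;
}

// Fixed-window limiter: one counter per (identifier, window) pair.
async function allowRequest(
  store: CounterStore,
  identifier: string,
  limit: number,
  windowSeconds: number,
  nowMs: number = Date.now(),
): Promise<boolean> {
  const windowIndex = Math.floor(nowMs / 1000 / windowSeconds);
  const key = `ratelimit:${identifier}:${windowIndex}`;
  const count = await store.incr(key);
  if (count === 1) {
    // First hit in this window: let the key expire once the window is over.
    await store.expire(key, windowSeconds);
  }
  return count <= limit;
}

// In-memory stand-in for tests; expiry is a no-op because each window gets a new key.
class MemoryStore implements CounterStore {
  private counts = new Map<string, number>();
  async incr(key: string): Promise<number> {
    const next = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, next);
    return next;
  }
  async expire(): Promise<unknown> {
    return 1;
  }
}
```

This scheme has two trade-offs worth knowing:
- A fixed window can admit up to twice the limit in a burst that straddles a window boundary.
- `INCR` followed by `EXPIRE` is not atomic, so production code often wraps both in a Lua script or a transaction.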

Agentic Integration via MCP

A key innovation in this architecture is how we connect AI agents to the SaaS product. While low-code tools like n8n can be used for simple Telegram/WhatsApp integrations, I implemented a custom Chat Orchestrator within the Controller.

This orchestrator utilizes the Model Context Protocol (MCP). By building an MCP server, we can expose our internal API endpoints as "tools" that the AI agent can call. This allows the agent to perform real-world actions—like querying a specific month's spending or generating a chart—directly through the LLM's tool-calling capabilities. This logic is reused across both the Web and Telegram interfaces, ensuring a unified agent experience.
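The following is a dependency-free illustration of the tool contract that MCP defines. Each tool is advertised with a `name`, a `description`, and a JSON Schema `inputSchema`. A tool call returns a list of `content` items with an optional `isError` flag. The sketch does not use the official `@modelcontextprotocol/sdk` package, which also handles the JSON-RPC transport. The tool `get_monthly_spending` and its data are hypothetical.

```typescript
interface ToolResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>; // JSON Schema for the arguments
  handler: (args: Record<string, unknown>) => Promise<ToolResult>;
}

// Hypothetical spending totals in cents, keyed by "YYYY-MM".
// A real handler would query the database instead.
const spendingByMonth: Record<string, number> = { "2025-01": 128450, "2025-02": 97300 };

const tools: ToolDefinition[] = [
  {
    name: "get_monthly_spending",
    description: "Total spending in cents for a given month (YYYY-MM).",
    inputSchema: {
      type: "object",
      properties: { month: { type: "string", pattern: "^\\d{4}-\\d{2}$" } },
      required: ["month"],
    },
    handler: async ({ month }) => {
      const total = typeof month === "string" ? spendingByMonth[month] : undefined;
      if (total === undefined) {
        return { content: [{ type: "text", text: `No data for ${String(month)}` }], isError: true };
      }
      return { content: [{ type: "text", text: JSON.stringify({ month, totalCents: total }) }] };
    },
  },
];

// What a tools/list response exposes to the model; handlers stay server-side.
function listTools() {
  return tools.map(({ name, description, inputSchema }) => ({ name, description, inputSchema }));
}

// Shared entry point: web chat and the Telegram bot would both route tool calls here.
async function callTool(name: string, args: Record<string, unknown>): Promise<ToolResult> {
  const tool = tools.find((t) => t.name === name);
  if (!tool) return { content: [{ type: "text", text: `Unknown tool: ${name}` }], isError: true };
  return tool.handler(args);
}
```

Both channels go through the same `callTool` entry point. A fix to a tool's behavior therefore reaches the web chat and the Telegram bot at once.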

4. Observability: Monitoring and Analytics

A production-grade system is blind without observability. I use a dual-layered approach to monitor both system health and user behavior.

System Observability (Sentry): Sentry is our primary tool for error tracking and performance monitoring. It allows us to capture runtime exceptions, monitor API latency, and receive real-time alerts for high-severity outages or regressions in the OCR pipeline.
Product Analytics (PostHog): To understand user retention and conversion, we use PostHog. We track specific events (e.g., "click on buy button," "receipt upload success") to build conversion funnels. This data is critical for identifying where users drop off in the onboarding flow. Interestingly, we also pull PostHog analytics via API to display growth metrics directly within our internal admin dashboards.
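To show what a conversion funnel computes, here is a simplified version: a user counts at step k only if they triggered steps 1 through k in order. This is an illustrative sketch, not PostHog's implementation, which runs server-side and adds options such as conversion windows. The event names are hypothetical.

```typescript
interface AnalyticsEvent {
  userId: string;
  name: string;
  timestamp: number; // ms since epoch
}

// Returns how many users reached each step of an ordered funnel.
function funnelCounts(events: AnalyticsEvent[], steps: string[]): number[] {
  // Group events per user.
  const byUser = new Map<string, AnalyticsEvent[]>();
  for (const e of events) {
    const list = byUser.get(e.userId) ?? [];
    list.push(e);
    byUser.set(e.userId, list);
  }

  const counts = steps.map(() => 0);
  byUser.forEach((userEvents) => {
    userEvents.sort((a, b) => a.timestamp - b.timestamp);
    let reached = 0; // number of funnel steps completed so far, in order
    for (const e of userEvents) {
      if (reached < steps.length && e.name === steps[reached]) reached++;
    }
    // A user who reached step k also counts toward every earlier step.
    for (let i = 0; i < reached; i++) counts[i]++;
  });
  return counts;
}
```

Comparing adjacent counts shows where users drop off. For example, a large gap between the first two steps would point at the onboarding flow.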

The Development Workflow: From Spec to Implementation

The speed of development for BookZero.ai was driven by an AI-augmented workflow. My process follows a strict roadmap:

Specification & Planning: Using G-Stack and specialized prompting techniques to create a technical spec and implementation plan.
UI/UX Design: Utilizing Claude Design (an AI-driven design tool) to generate high-fidelity wireframes and UI components based on prompts.
Implementation: Leveraging Claude Code for the heavy lifting of writing the Next.js 15 logic, implementing the MCP server, and configuring the QStash queues.

By integrating AI at every stage—from design to deployment—we can move from a concept to a revenue-generating, multi-channel AI SaaS in a fraction of the traditional development time.
