Automating High-Fidelity Image Generation: Integrating NanoBanana 2 with Claude Code via MCP Skills and JSON Prompting

The current landscape of generative AI is often fragmented. While state-of-the-art image generation models like NanoBanana 2 and NanoBanana Pro produce industry-leading visual fidelity, the workflow remains bottlenecked by manual intervention. Users are typically forced into a high-latency loop: navigating web interfaces, manually engineering prompts, and context-switching between the browser and their development environment.

This post explores a paradigm shift in generative workflows: the integration of Claude Code (an agentic CLI/desktop interface) with the NanoBanana 2 API using Model Context Protocol (MCP) skills. By leveraging structured JSON prompting, we can transform Claude Code from a text-based assistant into an autonomous image generation engine capable of executing complex, multi-turn visual tasks directly from the terminal.

The Problem: The Limitations of Natural Language Prompting

Standard natural language prompting is inherently imprecise. When you ask an LLM to "make a perfume bottle," the resulting image is subject to the model's stochastic interpretation of "perfume bottle." You lose control over critical aesthetic parameters such as lighting temperature, surface materials, focal length, and environmental composition.

The solution lies in JSON prompting. A structured JSON schema makes the LLM state each attribute explicitly instead of leaving it to the image model's interpretation. Generation itself remains stochastic, but the specification becomes explicit and repeatable. A robust schema includes keys such as:

  • type: The core subject classification.
  • name: Specific identifiers for the object.
  • description: Granular detail of the subject.
  • mid-ground/background: Spatial depth and environmental context.
  • lighting: Specificity regarding luminosity and shadows.
  • surface_materials: Detailed textures (e.g., brushed aluminum, frosted glass).

When Claude Code generates these JSON objects, it acts as a high-level orchestrator, translating vague user intent into a high-density technical specification that the NanoBanana 2 model can execute with precision.
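As a minimal sketch of what such a specification might look like, the Python below builds one JSON prompt for the perfume-bottle example and checks that the keys listed above are present. The exact key spelling (including the split of mid-ground/background into two keys), the example values, and the missing_keys helper are assumptions made for illustration; the actual NanoBanana schema may differ.

```python
import json

# Keys taken from the list above. Splitting "mid-ground/background" into two
# separate keys is an assumption made for this sketch.
REQUIRED_KEYS = (
    "type", "name", "description",
    "mid_ground", "background",
    "lighting", "surface_materials",
)


def missing_keys(spec: dict) -> list[str]:
    """Return the required keys that are absent or empty in the spec."""
    return [key for key in REQUIRED_KEYS if not spec.get(key)]


# Illustrative values only; none of them come from a real schema.
perfume_spec = {
    "type": "product",
    "name": "perfume bottle",
    "description": "tall rectangular bottle with a gold cap",
    "mid_ground": "marble tabletop",
    "background": "soft beige studio backdrop",
    "lighting": "warm key light from the left, soft shadows",
    "surface_materials": ["frosted glass", "brushed aluminum"],
}

# Serialized form that would be handed to the image model.
prompt_json = json.dumps(perfume_spec, indent=2)
```

A check like this lets the orchestrator reject an incomplete specification before any paid generation call is made.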

Architectural Overview: The Two-Skill System

To achieve this integration, we implement a dual-skill architecture within Claude Code using the Model Context Protocol (MCP).

Skill 1: The JSON Prompting Engine

The first skill is a translator. Its sole responsibility is to ingest a natural language prompt and output a valid JSON object adhering to the NanoBanana schema. The skill is installed with the skillfish utility, run through npx, which makes it globally available across all detected Claude agents.

Installation via npx:

npx skillfish add [username]/claude-code-nanobanana-skills

This skill provides the "intelligence" layer, ensuring that even a simple prompt like "a girl drinking a beer" is expanded into a multi-dimensional JSON object containing lighting, framing, and environmental metadata.
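One way to picture the expansion step is as an instruction that wraps the user's prompt, plus a parser that pulls the JSON object out of the model's reply. The sketch below is hypothetical: the instruction wording, the key names, and both helper functions are assumptions, not the contents of the actual skill.

```python
import json

# Assumed key names, mirroring the schema keys described earlier.
SCHEMA_KEYS = [
    "type", "name", "description",
    "mid_ground", "background",
    "lighting", "surface_materials",
]


def build_expansion_instruction(user_prompt: str) -> str:
    """Wrap a short user prompt in an instruction asking for structured JSON."""
    keys = ", ".join(SCHEMA_KEYS)
    return (
        "Expand the following image request into a single JSON object "
        f"with exactly these keys: {keys}. Return only the JSON.\n"
        f"Request: {user_prompt}"
    )


def parse_expansion_reply(reply: str) -> dict:
    """Extract the JSON object from a reply that may include extra text."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("reply does not contain a JSON object")
    # json.loads raises a ValueError subclass if the span is not valid JSON.
    return json.loads(reply[start:end + 1])
```

Parsing defensively matters here because a model may surround the JSON with commentary even when asked not to.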

Skill 2: The NanoBanana 2 Execution Engine

The second skill is the execution layer. This skill interfaces directly with the NanoBanana 2 Python API. It takes the JSON output from Skill 1, parses the parameters, and executes a call to the model.

This execution engine is built on a Python-based implementation of the NanoBanana API, supporting:

  • Multi-turn image editing: Using a reference image as an input to modify specific elements (e.g., swapping a logo).
  • Resolution and Aspect Ratio Control: Programmatic adjustment of output dimensions (e.g., 9:16 for Instagram).
  • Reference Image Injection: Passing existing image buffers to the model for style or content consistency.
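The options above could be gathered into a single request object before the call is made. The following is a sketch under stated assumptions: the GenerationRequest class, its field names, and the payload layout are hypothetical and do not reflect the real API's request format.

```python
import base64
import re
from dataclasses import dataclass, field

# Accepts ratios such as "1:1" or "9:16".
ASPECT_RATIO_PATTERN = re.compile(r"^[1-9]\d*:[1-9]\d*$")


@dataclass
class GenerationRequest:
    """Hypothetical container for one generation call."""
    prompt_json: str
    aspect_ratio: str = "1:1"
    reference_images: list[bytes] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Fail early on malformed ratios instead of paying for a bad call.
        if not ASPECT_RATIO_PATTERN.match(self.aspect_ratio):
            raise ValueError(f"invalid aspect ratio: {self.aspect_ratio!r}")

    def to_payload(self) -> dict:
        """Build a JSON-serializable payload; field names are assumptions."""
        return {
            "prompt": self.prompt_json,
            "aspect_ratio": self.aspect_ratio,
            # Raw image buffers are base64-encoded so they survive JSON.
            "reference_images": [
                base64.b64encode(image).decode("ascii")
                for image in self.reference_images
            ],
        }
```

For an Instagram story, a caller would construct GenerationRequest(prompt_json, aspect_ratio="9:16") and pass any reference image bytes in reference_images.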

Implementation Workflow

1. Environment Configuration and API Provisioning

The backend of this workflow relies on the Gemini API via Google AI Studio. To let the agent authenticate, the API key must be exported as an environment variable.

Security Note: You can run export GEMINI_API_KEY="your_key" directly in the terminal, but avoid pasting the key into the Claude Code conversation, where it would persist in chat history. A more secure method, such as a secrets manager or an untracked .env file, is highly recommended.
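On the Python side, the execution skill can read the key from the environment rather than accepting it as an argument or a hard-coded string. This is a minimal sketch; the load_api_key and mask_key helpers are illustrative names, not part of any SDK.

```python
import os


def load_api_key(env_var: str = "GEMINI_API_KEY") -> str:
    """Read the API key from the environment and fail loudly if it is unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it in your shell or load it "
            "from a secrets manager before starting the agent"
        )
    return key


def mask_key(key: str) -> str:
    """Return a log-safe form that shows only the last four characters."""
    return "*" * max(len(key) - 4, 0) + key[-4:]
```

Logging mask_key(load_api_key()) instead of the raw value keeps the credential out of terminal output that an agent might echo back into the conversation.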

Setting a Spending Cap: To prevent runaway costs during autonomous agentic loops, it is critical to configure a spending cap within the Google AI Studio billing console. Setting a hard limit (e.g., $10.00) ensures that even if an agent enters an infinite loop of generation tasks, the financial impact is contained.

2. Deploying the Execution Skill

The execution skill is deployed by providing Claude Code with the necessary Python source files. The agent then wraps these files into a functional MCP skill. The core logic involves:

  1. Loading the NanoBanana 2 Python library.
  2. Parsing the JSON schema from the prompt engine.
  3. Executing the generate method with the specified parameters.
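The three steps above can be sketched as a single function. Because the library's real interface is not shown in this post, the generate callable is injected rather than imported; its signature (prompt text and aspect ratio in, image bytes out) is an assumption made for this example.

```python
import json
from typing import Callable

# Hypothetical signature standing in for the library's generate method:
# (prompt_text, aspect_ratio) -> encoded image bytes.
GenerateFn = Callable[[str, str], bytes]


def execute_spec(
    prompt_json: str,
    generate: GenerateFn,
    aspect_ratio: str = "1:1",
) -> bytes:
    """Parse the JSON prompt and run one generation call."""
    # Step 1 (loading the library) is represented by the injected callable.
    # Step 2: parse and sanity-check the JSON from the prompt engine.
    spec = json.loads(prompt_json)
    if not isinstance(spec, dict):
        raise ValueError("prompt must be a JSON object")
    # Step 3: execute the generate call with the specified parameters.
    image_bytes = generate(json.dumps(spec), aspect_ratio)
    if not image_bytes:
        raise RuntimeError("model returned no image data")
    return image_bytes
```

Injecting the callable also makes the skill easy to test offline with a stub that returns placeholder bytes.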

3. The Agentic Loop in Action

Once configured, the workflow becomes entirely autonomous. A developer can issue a single command in Claude Code:

"Generate a 9:16 image for Instagram of a person holding a Sprite and a Mac Mini with the OpenClaw logo on the screen, using the NanoBanana 2 skill."

The Agent's Internal Process:

  1. Prompt Expansion: Claude Code invokes the JSON Prompting Skill to expand the prompt into a structured schema (defining the Sprite bottle's condensation, the Mac Mini's aluminum texture, and the specific lighting).
  2. API Execution: Claude Code invokes the NanoBanana 2 Skill, passing the JSON payload to the Python API.
  3. Image Retrieval: The model generates the image and saves the asset to the local directory.
  4. Verification: The agent confirms the file creation and presents the result.
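The four-step process above can be approximated in plain Python. In this sketch both skills are replaced by stub functions, so it only illustrates the control flow; the function names and the stub outputs are assumptions, and the real agent makes these decisions itself rather than running a fixed script.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def run_pipeline(
    user_prompt: str,
    expand: Callable[[str], dict],
    generate: Callable[[str], bytes],
    output_path: Path,
) -> Path:
    """Expand the prompt, generate the image, save it, and verify the file."""
    spec = expand(user_prompt)                # 1. Prompt expansion
    image_bytes = generate(json.dumps(spec))  # 2. API execution
    output_path.write_bytes(image_bytes)      # 3. Image retrieval
    # 4. Verification: the file must exist and be non-empty.
    if not output_path.is_file() or output_path.stat().st_size == 0:
        raise RuntimeError(f"no image was written to {output_path}")
    return output_path


# Stubs standing in for the two skills; outputs are placeholders.
def stub_expand(prompt: str) -> dict:
    return {"description": prompt, "lighting": "soft daylight"}


def stub_generate(prompt_json: str) -> bytes:
    return b"placeholder image bytes"


with tempfile.TemporaryDirectory() as tmp:
    result = run_pipeline(
        "a person holding a Sprite",
        stub_expand,
        stub_generate,
        Path(tmp) / "instagram.png",
    )
    result_name = result.name
    result_size = result.stat().st_size
```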

Advanced Use Case: Multi-Turn Image Manipulation

The true power of this integration is revealed in multi-turn editing. Because the execution skill supports reference images, you can perform complex "image-to-image" tasks. For example, you can upload an existing .png and instruct Claude Code: "Swap the Claude logo in this image for a ChatGPT logo using the NanoBanana 2 skill." The agent parses the existing file, identifies the target area via the API, and generates a modified version while maintaining the original composition.
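Multi-turn editing amounts to feeding each result back in as the reference image for the next instruction. The sketch below shows that chaining under stated assumptions: the edit callable (image bytes and an instruction in, edited image bytes out) is a hypothetical stand-in for the execution skill's editing call.

```python
from typing import Callable, Iterable

# Hypothetical editing call: (reference_image, instruction) -> edited image.
EditFn = Callable[[bytes, str], bytes]


def apply_edits(
    image: bytes,
    instructions: Iterable[str],
    edit: EditFn,
) -> list[bytes]:
    """Apply each instruction to the previous turn's output.

    Returns every version, starting with the original image, so any
    intermediate turn can be inspected or rolled back.
    """
    history = [image]
    for instruction in instructions:
        # The latest image becomes the reference for the next turn.
        history.append(edit(history[-1], instruction))
    return history
```

Keeping the full history rather than only the final image makes it cheap to retry a single turn when one edit drifts from the original composition.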

Conclusion

By moving image generation from a web-based UI to an MCP-enabled terminal environment, we unlock the ability to scale visual asset production. The combination of Claude Code's reasoning, JSON-based structural precision, and the NanoBanana 2 API's generative power creates an efficient, programmable pipeline for tasks such as automated UI/UX design and marketing asset generation.