Architecting an Agentic Knowledge Base: Migrating a 29GB Legacy Database to a Markdown-Centric PKA Framework

Data migration has historically been a high-friction endeavor, with real risk of broken relational links, lost metadata, and structural degradation of interconnected notes. The emergence of capable Large Language Models (LLMs) and agentic workflows has lowered that friction considerably. This post explores the technical implementation of migrating a 21-year-old, 29 GB knowledge base of more than 27,000 items into a highly structured, local-first Personal Knowledge Assistant (PKA) framework.

The PKA Architecture: A Multi-Agent Ecosystem

The migration is more than a file move: it replaces a proprietary, database-driven structure (SQLite) with a transparent, Markdown-based ecosystem. The PKA framework uses a multi-agent architecture designed to operate within a local directory. It is built on the principle of "Agentic Orchestration," where a fleet of specialized agents performs discrete tasks within a unified folder structure.

The current iteration of the PKA ecosystem consists of nine specialized agents, each defined by .md files containing its Standard Operating Procedures (SOPs). The nine agents are:

  • Larry (The Orchestrator): Acting as the central SPOC (single point of contact), Larry manages the high-level execution of tasks and directs queries to specialized sub-agents.
  • Silas (The Database Architect): Responsible for maintaining structural integrity, ensuring that the migration from SQLite to Markdown preserves the relational density of the original dataset.
  • Nolan (The HR Agent): Manages the lifecycle of new agents, handling the "hiring" (configuration) of new specialized personas.
  • Packs (The Researcher): Facilitates deep-dive research to support agent onboarding.
  • Mac (The Automation Expert): Bridges the gap between the local folder and external tool integrations.
  • Charter (The Infographic Designer): A specialized agent that generates HTML-based graphics and converts them into user-facing visual assets.
  • Pixel (The Image Generator): Interfaces with external APIs (such as OpenAI or Nano Banana) to generate high-fidelity imagery.
  • Iris (The Design System Architect): Ensures all generated content adheres to a consistent design language.
  • The Journal Writer: Manages the temporal aspect of the knowledge base, utilizing Wiki-links to interconnect daily logs with long-term knowledge nodes.

Technical Implementation: The Migration Workflow

The migration process leverages Claude Code (or Claude via terminal) to execute a programmatic transformation of the legacy data. The technical challenge involves parsing a 29 GB directory containing 17,000+ Markdown files, 17,000+ attachments, and complex SQLite-based relational data, and re-mapping them into the PKA scaffold.

1. Environment Initialization

The process begins by initializing the LLM within the new PKA scaffold. By using the terminal to launch Claude within the specific directory, the model gains immediate context of the folder structure.

A critical optimization here is the Redirection Layer. Loading the entire knowledge base into the context window would cause massive token consumption and latency, so the system instead relies on a lightweight CLAUDE.md file (the project instruction file that Claude Code reads at startup). This file acts as a pointer, instructing the LLM to look into specific sub-directories and agent.md files only when necessary. This "just-in-time" context loading is essential for maintaining performance in large repositories.
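To make the just-in-time idea concrete, here is a minimal sketch of a pointer table that selects which SOP files to read for a given task. In practice the LLM makes this decision itself from the pointer file; the `ROUTES` table, the `agents/` paths, and both function names are assumptions for illustration, not part of the PKA framework.

```python
from pathlib import Path

# Hypothetical routing table: task keyword -> SOP file, relative to the PKA root.
# In the post, this role is played by the lightweight pointer file.
ROUTES = {
    "migration": "agents/silas.md",
    "infographic": "agents/charter.md",
    "journal": "agents/journal-writer.md",
}


def select_context(task: str, routes: dict[str, str] = ROUTES) -> list[str]:
    """Return only the SOP paths whose keyword appears in the task text."""
    lowered = task.lower()
    return [path for keyword, path in routes.items() if keyword in lowered]


def load_context(root: Path, task: str) -> str:
    """Read the selected SOP files; files that do not exist are skipped."""
    parts = []
    for rel in select_context(task):
        path = root / rel
        if path.is_file():
            parts.append(path.read_text(encoding="utf-8"))
    return "\n\n".join(parts)
```

Only the files that match the task are ever read, so the cost of a request scales with the SOPs it needs rather than with the size of the repository.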

2. The Role of the 1M Context Window

For a migration of this scale, the choice of model matters. Smaller models can handle individual file transformations, but orchestrating a 29 GB migration calls for a very large context window, ideally 1 million tokens. Note that Claude 3.5 Sonnet and Claude 3 Opus top out at 200K tokens, so a 1M window requires a newer Claude model that offers it. A window of that size lets the model ingest the "inventory" of the legacy database, and so grasp the scope of the 27,000 items, before generating the migration scripts.
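A back-of-the-envelope calculation shows why window size matters for the inventory step. The sketch below assumes roughly 4 characters per token (a rule of thumb, not a real tokenizer), a hypothetical average of 60 characters per inventory entry, and a budget that reserves half the window for instructions and output; none of these figures come from the original migration.

```python
CHARS_PER_TOKEN = 4  # rule-of-thumb assumption, not a real tokenizer


def inventory_tokens(item_count: int, avg_chars_per_entry: int) -> int:
    """Estimate the tokens needed to list every item once."""
    return (item_count * avg_chars_per_entry) // CHARS_PER_TOKEN


def fits(tokens: int, window: int, reserve_fraction: float = 0.5) -> bool:
    """True if the estimate fits in the window after reserving a share of it."""
    return tokens <= window * (1 - reserve_fraction)


needed = inventory_tokens(27_000, 60)  # 27,000 items x 60 chars / 4
print(needed, fits(needed, 200_000), fits(needed, 1_000_000))
# prints: 405000 False True
```

Under these assumptions, a one-line-per-item inventory overflows a 200K window but fits comfortably in a 1M window.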

3. Data Normalization and Schema Mapping

The migration is not a simple copy-paste operation. Silas, the Database Architect agent, first analyzes the source schema. During the migration, the LLM flags discrepancies between source and target, such as entities with no counterpart (e.g., a "quotes" table in the source that has no equivalent in the target Markdown structure).
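A discrepancy check of this kind can be sketched with Python's standard `sqlite3` module. The example assumes, purely for illustration, that each source table should map to a same-named folder in the target scaffold; the real mapping rules are not described in this post.

```python
import sqlite3
from pathlib import Path


def source_tables(conn: sqlite3.Connection) -> set[str]:
    """List user tables in the legacy database, skipping SQLite internals."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    )
    return {name for (name,) in rows}


def unmapped_tables(conn: sqlite3.Connection, target_root: Path) -> set[str]:
    """Return source tables with no same-named folder in the target.

    The table-to-folder convention is an assumption of this sketch.
    """
    target_folders = {p.name for p in target_root.iterdir() if p.is_dir()}
    return source_tables(conn) - target_folders
```

With a source containing `notes` and `quotes` tables and a target containing only a `notes` folder, `unmapped_tables` returns `{"quotes"}`, which is the kind of gap that has to be resolved before any rows are converted.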

The LLM then executes a multi-step transformation:

  1. Parsing: Reading the legacy SQLite/Markdown source.
  2. Normalization: Converting database rows into standardized Markdown files.
  3. Relational Mapping: Reconstructing the original connections using Wiki-links (e.g., [[Note Name]]).
  4. Metadata Injection: Utilizing YAML frontmatter to store persistent metadata, such as owner, date, and tags, ensuring the data remains machine-readable for all agents in the ecosystem.
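The four steps above can be sketched as a single export function. The schema here is hypothetical: a `notes` table (`id`, `title`, `body`, `owner`, `created`, `tags` as a comma-separated string) and a `links` table (`src_id`, `dst_id`). The legacy database's actual schema is not given in this post, so treat the column names and the "Related" section as placeholders.

```python
import json
import sqlite3
from pathlib import Path


def safe_name(title: str) -> str:
    """Make a title usable as a file name and wiki-link target."""
    return title.replace("/", "-").strip()


def frontmatter(meta: dict) -> str:
    """Render a flat dict as YAML frontmatter.

    json.dumps is used for quoting: JSON strings and lists are valid YAML.
    """
    lines = [f"{k}: {json.dumps(v, ensure_ascii=False)}" for k, v in meta.items()]
    return "---\n" + "\n".join(lines) + "\n---\n"


def export_notes(conn: sqlite3.Connection, out_dir: Path) -> int:
    """Write one Markdown file per row of the (hypothetical) notes table."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # 1. Parsing: read the legacy rows.
    notes = conn.execute(
        "SELECT id, title, body, owner, created, tags FROM notes"
    ).fetchall()
    titles = {row[0]: safe_name(row[1]) for row in notes}
    for note_id, _title, body, owner, created, tags in notes:
        # 3. Relational mapping: turn link rows into wiki-links.
        targets = conn.execute(
            "SELECT dst_id FROM links WHERE src_id = ? ORDER BY dst_id",
            (note_id,),
        ).fetchall()
        related = [f"- [[{titles[dst]}]]" for (dst,) in targets if dst in titles]
        # 4. Metadata injection: owner, date, and tags go into frontmatter.
        tag_list = [t.strip() for t in (tags or "").split(",") if t.strip()]
        meta = {"owner": owner, "date": created, "tags": tag_list}
        # 2. Normalization: one standardized Markdown file per row.
        text = frontmatter(meta) + "\n" + (body or "")
        if related:
            text += "\n\n## Related\n" + "\n".join(related)
        (out_dir / f"{titles[note_id]}.md").write_text(text + "\n", encoding="utf-8")
    return len(notes)
```

Because wiki-links resolve by file name, the same `safe_name` result is used for both the file and every link pointing at it, which keeps the reconstructed connections intact.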

Markdown vs. HTML: The Agentic Split

A key architectural decision in the PKA framework is the strict separation of concerns between Markdown and HTML.

  • Markdown (.md): The primary format for all internal knowledge, agent SOPs, and long-term storage. Markdown is chosen for its high "readability" for LLMs, its lightweight nature, and its compatibility with tools like Obsidian for graph visualization.
  • HTML: Reserved exclusively for user-facing content and visual assets generated by agents like Charter.

This split ensures that while the agents work in a highly efficient, natural-language-optimized environment, the end-user receives polished, visually rich outputs.
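One simple way to enforce the split is to derive the output location from the intended audience rather than letting each agent choose a format. The folder names `knowledge` and `deliverables` below are hypothetical; the post specifies the formats, not the directory layout.

```python
from pathlib import Path

# Hypothetical folder names; only the .md / .html split comes from the post.
TARGETS = {
    "agent": ("knowledge", ".md"),
    "user": ("deliverables", ".html"),
}


def output_path(root: Path, audience: str, name: str) -> Path:
    """Pick folder and extension from the intended audience."""
    if audience not in TARGETS:
        raise ValueError(f"unknown audience: {audience!r}")
    folder, extension = TARGETS[audience]
    return root / folder / f"{name}{extension}"
```

For example, `output_path(Path("pka"), "user", "q3-infographic")` yields `pka/deliverables/q3-infographic.html`, while the same call with `"agent"` yields a `.md` path under `knowledge`.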

Conclusion: The Future of Local-First AI

The success of this migration demonstrates that the future of knowledge management lies in LLM Independence. By moving away from proprietary, closed-loop ecosystems (like Claude's internal "memory" or specific plugin architectures) and toward a local, Markdown-based folder structure, users gain total sovereignty over their data.

Whether using Claude, Gemini, or a local LLM, the PKA framework remains functional because the intelligence resides in the SOPs and the folder structure, not the provider's server. This architecture transforms a static archive into a living, breathing, agentic ecosystem.