Neural Core — Retrieval as a Control Problem

A retrieval-and-reasoning backbone that treats RAG as a control loop — grade, verify, reflect — over an agent-agnostic blackboard where new capabilities plug in with zero rewiring.

Overview

Neural Core is the retrieval-and-reasoning layer I build LLM apps on top of. Ask it a question and it does not just embed-and-fetch: it grades what came back, decides whether that is enough, verifies the shaky parts, and reflects before it commits to an answer. The premise is that retrieval is a control problem — a loop with feedback — not a single vector lookup wired straight into a prompt.

Underneath that loop is the part I am most proud of: an agent-agnostic blackboard. Every capability (vector search, graph traversal, web search, grading) is a stateless tool that reads from and writes to one shared, namespaced store and discovers what else is available at runtime. Adding a new capability means dropping in a file, not threading a new field through the orchestrator, the context object, and four agents that depend on each other. This case study is honest about where it stands: the core loop and the rearchitecture are real and working; the accuracy figures are design targets backed by internal runs, not a published benchmark; and a few phases are explicitly future work.

The v1 problem: adding an agent touched five files

The first version (app/agents/) worked, and it taught me exactly what not to do again. State lived in a PersistentContext object with one hardcoded field per agent — database_summary, web_search_summary, thinking_summary, critic_summary — so the shared memory knew the name of every agent. Worse, the agents knew each other: the web-search agent read the thinking agent’s output, the critic read the thinking agent’s output, and so on. Capabilities were coupled through both the state schema and direct dependencies.
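The coupling described above can be made concrete with a small sketch. This is a hypothetical reconstruction, not the original v1 code: the field names come from the text, but the dataclass shape and the two agent functions are assumptions for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class PersistentContext:
    """Hypothetical v1-style state object: one hardcoded field per agent."""
    database_summary: str = ""
    web_search_summary: str = ""
    thinking_summary: str = ""
    critic_summary: str = ""


def web_search_agent(ctx: PersistentContext) -> None:
    # Direct dependency: this only works if the thinking agent ran first
    # and wrote to a field with exactly this name.
    ctx.web_search_summary = f"searched for: {ctx.thinking_summary}"


def critic_agent(ctx: PersistentContext) -> None:
    # A second consumer of the same field. Renaming or removing the
    # thinking agent now breaks two other agents.
    ctx.critic_summary = f"critique of: {ctx.thinking_summary}"


ctx = PersistentContext(thinking_summary="plan A")
web_search_agent(ctx)
critic_agent(ctx)

# The schema itself enumerates every agent in the system.
agent_fields = [f.name for f in fields(ctx)]
```

Both kinds of coupling are visible here: the state schema names every agent, and two agents reach directly into a third agent's field.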

The cost showed up the moment I tried to grow it. Adding a capability meant editing the context model, the orchestrator’s wiring, the new agent, and whatever existing agents needed to consume its output — five-plus files, every time, and removing one agent could silently break another. That is the failure mode that kills a framework you intend to reuse across domains. The v1→v2 rearchitecture was about converting “touches 5+ files” into “zero breaking changes.”

The blackboard: namespaces and self-discovery

The fix was a blackboard. Mind is a generic key-value store over Redis with no hardcoded fields for anyone. Tools write under a namespace they own and read by name without knowing who produced the data:

await mind.write("chunks", chunk_data, namespace="database_search")
await mind.write("chunks", chunk_data, namespace="web_search")

keys = await mind.list_keys()
# ['database_search.chunks', 'web_search.chunks', 'thinking.summary']

Two design choices carry the whole thing. Namespacing (agent.data_type) means two capabilities can both produce “chunks” without colliding, and a tool’s outputs are addressable without a central registry of field names. list_keys() self-discovery means a tool asks the blackboard what exists at runtime instead of assuming a fixed schema — so the orchestrator never has to be told that a new agent is online. The contract is uniform: every tool implements one Tool interface, stays stateless between executions, reads inputs from Mind, writes outputs back, and returns a standardized ToolResult. Add a capability, remove one, reorder them — nothing else has to change. That is the “zero rewiring” claim, and it is structural, not aspirational.
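A minimal sketch of that contract, under stated assumptions: an in-memory dict stands in for Redis, the `read` signature and the `ToolResult` fields are guesses (the text only shows `write` and `list_keys`), and `ChunkCounter` is a hypothetical tool invented for the example.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Protocol


class Mind:
    """In-memory stand-in for the Redis-backed blackboard (assumption)."""

    def __init__(self) -> None:
        self._store: dict[str, Any] = {}

    async def write(self, key: str, value: Any, namespace: str) -> None:
        self._store[f"{namespace}.{key}"] = value

    async def read(self, key: str, namespace: str) -> Any:
        # Assumed signature, mirroring write().
        return self._store.get(f"{namespace}.{key}")

    async def list_keys(self) -> list[str]:
        return sorted(self._store)


@dataclass
class ToolResult:
    # Field names are illustrative, not the real ToolResult schema.
    success: bool
    keys_written: list[str] = field(default_factory=list)


class Tool(Protocol):
    name: str

    async def execute(self, mind: Mind) -> ToolResult: ...


class ChunkCounter:
    """Hypothetical tool: counts chunks from every producer it discovers."""

    name = "chunk_counter"

    async def execute(self, mind: Mind) -> ToolResult:
        total = 0
        for full_key in await mind.list_keys():
            namespace, _, key = full_key.partition(".")
            if key == "chunks":  # any producer, known or not
                total += len(await mind.read(key, namespace=namespace))
        await mind.write("total", total, namespace=self.name)
        return ToolResult(success=True, keys_written=[f"{self.name}.total"])


async def demo() -> int:
    mind = Mind()
    await mind.write("chunks", ["a", "b"], namespace="database_search")
    await mind.write("chunks", ["c"], namespace="web_search")
    result = await ChunkCounter().execute(mind)
    assert result.success
    return await mind.read("total", namespace="chunk_counter")
```

The tool never names `database_search` or `web_search`. A third producer writing `chunks` under its own namespace would be counted without any change to `ChunkCounter`.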

I pushed the plug-in idea one step further: agents are defined in markdown. An AgentMarkdownParser reads a definition file, an AgentFactory builds the instance, and an AgentLoader discovers every *.md in a directory and hot-reloads it. Disabled agents are skipped, and capabilities are declared in the frontmatter. The base class is deliberately small (plan → execute → validate → communicate) and follows SOLID principles, so any agent is substitutable through the same interface. A new capability is a new file the loader finds on its own.
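The discovery step can be sketched with the standard library alone. Everything specific here is an assumption: the frontmatter layout (flat `key: value` lines between `---` markers), the `name`, `enabled`, and `capabilities` keys, and the function names. Hot reloading and instance construction are left out.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class AgentDefinition:
    name: str
    enabled: bool
    capabilities: list[str]
    instructions: str


def parse_agent_markdown(text: str) -> AgentDefinition:
    """Split '---' frontmatter from the body; assumes flat 'key: value' lines."""
    _, frontmatter, body = text.split("---", 2)
    meta: dict[str, str] = {}
    for line in frontmatter.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    raw_caps = meta.get("capabilities", "")
    return AgentDefinition(
        name=meta["name"],
        enabled=meta.get("enabled", "true").lower() == "true",
        capabilities=[c.strip() for c in raw_caps.split(",") if c.strip()],
        instructions=body.strip(),
    )


def load_agents(directory: Path) -> dict[str, AgentDefinition]:
    """Discover every *.md file and skip definitions marked disabled."""
    loaded: dict[str, AgentDefinition] = {}
    for path in sorted(directory.glob("*.md")):
        definition = parse_agent_markdown(path.read_text(encoding="utf-8"))
        if definition.enabled:
            loaded[definition.name] = definition
    return loaded


# Usage: two definition files, one of them disabled.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "grader.md").write_text(
        "---\nname: grader\nenabled: true\ncapabilities: grade, verify\n---\n"
        "Grade each chunk.\n",
        encoding="utf-8",
    )
    (root / "legacy.md").write_text(
        "---\nname: legacy\nenabled: false\n---\nOld agent.\n",
        encoding="utf-8",
    )
    agents = load_agents(root)
```

A real loader would likely use a YAML parser for the frontmatter and watch the directory for changes; the point of the sketch is only that adding a file is the whole registration step.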

Stage map: the grade · verify · reflect control loop

The orchestrator, which I call Consciousness, runs retrieval as a Reflexion-style loop. It is a deliberate synthesis of three ideas: Self-RAG decides whether to retrieve, CRAG decides what to keep, and Reflexion decides whether to go again.

flowchart TD
  Q["Query<br/>execute_task · research"] --> OBS["OBSERVE<br/>read state from blackboard"]
  OBS --> DEC{"DECIDE · Self-RAG<br/>retrieval needed?"}
  DEC -->|"no · enough context"| GEN
  DEC -->|"yes"| RET["RETRIEVE<br/>vector · graph · web tools"]
  RET --> BB[["Mind · agent-agnostic blackboard<br/>namespaced keys · list_keys discovery"]]
  BB --> GRADE{"GRADE · CRAG<br/>per-chunk verdict"}
  GRADE -->|"CORRECT"| KEEP["keep chunk"]
  GRADE -->|"AMBIGUOUS"| VER["VERIFY<br/>web search confirm"]
  GRADE -->|"INCORRECT"| DROP["drop chunk"]
  VER -->|"confirmed"| KEEP
  VER -->|"not confirmed"| DROP
  KEEP --> GEN["GENERATE<br/>answer from CORRECT plus VERIFIED only"]
  GEN --> REF{"REFLECT<br/>gaps · completeness score"}
  REF -->|"insufficient · under max iters"| OBS
  REF -->|"sufficient"| OUT["Structured output<br/>JSON schema · provenance · cost"]

The grading step is the lever. Every retrieved chunk gets one of three verdicts: CORRECT, AMBIGUOUS, or INCORRECT. Incorrect chunks are dropped before they ever reach the prompt. Ambiguous ones go through a web-search verification pass rather than being trusted or discarded blindly. Only chunks graded CORRECT, plus AMBIGUOUS chunks that pass verification, survive into answer generation. That single filter is the difference between a RAG system that confidently repeats whatever the vector store returned and one that refuses to build on context it does not trust, which is where the hallucination-reduction goal comes from. Reflection then closes the loop: it scores completeness, names the gaps, and either exits or feeds those gaps back as the next iteration’s focus, capped by a max-iterations budget so it cannot spin.
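The grade, verify, and reflect steps can be sketched as a plain loop. This is an illustration under assumptions, not the Consciousness orchestrator: the function names, the `Reflection` shape, and the 0.8 completeness threshold are invented, the Self-RAG "retrieve or not" decision is omitted, and the callables passed in stand for real retrieval, LLM grading, and web verification.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    CORRECT = "correct"
    AMBIGUOUS = "ambiguous"
    INCORRECT = "incorrect"


@dataclass
class Reflection:
    completeness: float  # 0.0 to 1.0
    gaps: list[str]      # topics the answer still misses


def filter_chunks(chunks, grade, verify):
    """Keep CORRECT chunks and AMBIGUOUS chunks that pass verification."""
    kept = []
    for chunk in chunks:
        verdict = grade(chunk)
        if verdict is Verdict.CORRECT:
            kept.append(chunk)
        elif verdict is Verdict.AMBIGUOUS and verify(chunk):
            kept.append(chunk)
        # INCORRECT and unconfirmed chunks never reach the prompt.
    return kept


def run_loop(query, retrieve, grade, verify, generate, reflect,
             max_iters=3, threshold=0.8):
    focus, trusted, answer, iterations = [query], [], "", 0
    while iterations < max_iters:  # budget so the loop cannot spin
        iterations += 1
        for chunk in filter_chunks(retrieve(focus), grade, verify):
            if chunk not in trusted:
                trusted.append(chunk)
        answer = generate(query, trusted)
        reflection = reflect(query, answer)
        if reflection.completeness >= threshold or not reflection.gaps:
            break
        focus = reflection.gaps  # gaps become the next iteration's focus
    return answer, iterations


def demo():
    # Toy stand-ins: chunk prefixes encode the verdict a grader would give.
    corpus = {"q": ["good-1", "bad-1", "maybe-1"], "gap": ["good-2", "maybe-2"]}

    def retrieve(focus):
        return [c for topic in focus for c in corpus.get(topic, [])]

    def grade(chunk):
        prefix = chunk.split("-")[0]
        table = {"good": Verdict.CORRECT, "maybe": Verdict.AMBIGUOUS}
        return table.get(prefix, Verdict.INCORRECT)

    def verify(chunk):
        return chunk == "maybe-1"  # pretend only this one is confirmed

    def generate(query, chunks):
        return " + ".join(chunks)

    def reflect(query, answer):
        done = "good-2" in answer
        return Reflection(1.0 if done else 0.5, [] if done else ["gap"])

    return run_loop("q", retrieve, grade, verify, generate, reflect)
```

In the toy run, the first pass keeps `good-1` and the confirmed `maybe-1`, reflection reports a gap, and the second pass retrieves against that gap and exits once the answer is complete.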

Hybrid retrieval and a two-tier cache

Underneath the loop, retrieval fuses Weaviate semantic search with Neo4j graph traversal, so answers draw on similarity and document→chunk relationship structure rather than nearest-neighbor alone. The expensive part of all of this is I/O and LLM calls, so caching is layered at the operation level, not on the final response — embeddings, vector-search results, graph-query results, and LLM answers each cache independently. That detail matters: caching the assembled response broke the moment a request flag changed, whereas caching the underlying operations lets the same costly work be reused while response assembly adapts to flags freely.

The cache itself is two-tier: a Redis hot tier (sub-millisecond) over a PostgreSQL cold tier (single-digit ms, persistent), with automatic backfill of the hot tier on a cold hit. Content is keyed by its SHA-256 hash, so the same image or document is processed once and reused everywhere. On repeat-heavy workloads this yields a roughly 200–300× speedup versus cold LLM paths in internal runs. That is a cost-and-latency optimization measured on my own traffic, not a public benchmark.
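A sketch of the lookup path, with assumptions labeled: plain dicts stand in for Redis and PostgreSQL, the `operation:digest` key layout and the class and method names are invented, and hashing a canonical JSON form of the input is one possible way to derive the SHA-256 key.

```python
import hashlib
import json
from typing import Any, Callable


def cache_key(operation: str, payload: Any) -> str:
    """Operation-scoped key: SHA-256 over a canonical JSON form of the input."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{operation}:{digest}"


class TwoTierCache:
    """Dicts stand in for the Redis hot tier and PostgreSQL cold tier."""

    def __init__(self) -> None:
        self.hot: dict[str, Any] = {}
        self.cold: dict[str, Any] = {}

    def get_or_compute(
        self, operation: str, payload: Any, compute: Callable[[Any], Any]
    ) -> tuple[Any, str]:
        key = cache_key(operation, payload)
        if key in self.hot:
            return self.hot[key], "hot"
        if key in self.cold:
            self.hot[key] = self.cold[key]  # backfill hot tier on a cold hit
            return self.hot[key], "cold"
        value = compute(payload)  # the expensive embedding, search, or LLM call
        self.cold[key] = value
        self.hot[key] = value
        return value, "miss"


# Usage: one expensive call, then a cold hit, then a hot hit.
calls: list[str] = []


def fake_embed(text: str) -> list[float]:
    calls.append(text)  # records how often the costly path runs
    return [float(len(text))]


cache = TwoTierCache()
first = cache.get_or_compute("embedding", "hello", fake_embed)
cache.hot.clear()  # simulate hot-tier eviction
second = cache.get_or_compute("embedding", "hello", fake_embed)
third = cache.get_or_compute("embedding", "hello", fake_embed)
```

Because the operation name is part of the key, an embedding and a vector-search result for the same input cache independently, which is the operation-level layering described above.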

Where it actually stands

I would rather be precise than impressive. The blackboard rearchitecture, the grade/verify/reflect loop, the markdown-defined plug-in agents, hybrid retrieval, the two-tier cache, the document pipeline, and Langfuse tracing across the whole loop are built and working. Self-RAG’s retrieval-skipping decision exists in a basic form and is the next thing to sharpen. A Neo4j-backed working-memory graph for multi-hop reasoning across iterations is designed but not started, and the dual-access story — exposing the same capabilities over MCP for external assistants and as a direct API for internal backends — is the planned next layer, not a shipped one. The accuracy and hallucination numbers are targets drawn from the CRAG and Self-RAG literature and my own runs, not third-party-verified results.

What I am claiming is the engineering judgment: treating retrieval as a control problem, and designing the state layer so capabilities are genuinely pluggable. The thing I would point at is the diff between v1 and v2 — the same system, re-expressed so that growth stopped being a rewiring tax.

Stack

Python · FastAPI · Corrective RAG · Self-RAG · Reflexion · Blackboard pattern · Weaviate · Neo4j · Redis · PostgreSQL · RQ workers · Langfuse · Pydantic · SearxNG