
🤖 Agentic AI · ★★★ FEATURED

Idolwild — An Ant-Farm for LLM Minds

A browser ant-farm where a deterministic survival sim runs the world for free and an LLM is rationed down to one typed intent per beat — agents invent gods to explain events the player secretly caused.

Overview

Idolwild is a browser ant-farm: a small band of stone-age agents survive a procedural island while you play the unseen god, raining food or smiting with lightning. The twist that makes it more than “AI Town in 3D” is that the agents never know it was you. You rain a few berries, say nothing, and seconds later an agent looks up and decides “the spirits must favour me.” A bystander watching the screen says “wait — it reacted to me, and made up a reason I never gave it.” That sentence is the entire product.

This is the piece I’d point to for “architect, not a model-wrapper,” because the LLM is the smallest, most-rationed component in the whole system. A deterministic engine simulates bodies, needs, pathfinding, ecology, rumor spread and even the formation of a religion, all in plain seedable code. The model is invited in only to pick one typed intent per slow beat and to author a line of speech. Everything else you see on screen (movement, attribution, myth) is deterministic. Honestly framed: this is an early prototype. The belief, rumor, memory and god-power layers already ship as real, tested code, with the larger world scoped as later work.

The core bet: a three-layer brain with one rule

The single architectural rule everything else serves:

The LLM decides why and what. Cheap deterministic code decides how.

| Layer | What it is | Cadence | Cost |
| --- | --- | --- | --- |
| Body · Sim | position, needs decay, pathfinding, fire, monsters, ecology | every tick | free |
| Planner | turns an intent into executable steps · steering · target selection | every tick | free |
| Mind · LLM | picks ONE typed intent + a thought, per beat | slow beat · rationed | $ |

So “I’m worried about the cold — I’ll build a fire” is one LLM call returning a typed enum; the walking, the wood-gathering, the kindling, the warmth math are all free code over many ticks. The model is the soul; the sim is the body. This grounds object interaction, keeps movement continuous, and bounds cost — all from one rule.
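A minimal sketch of that split, under stated assumptions: the intent names, the decay rates and the 50-tick beat length are hypothetical placeholders, and `coldFirst` is a trivial rule standing in for the model call. It only shows the cadence: one decision per beat, free body work on every tick.

```typescript
// Hypothetical names and numbers throughout; only the cadence split comes from the text.
type Intent = "gather_food" | "build_fire" | "rest";

interface Agent {
  name: string;
  hunger: number; // 0 = sated, 1 = starving
  warmth: number; // 0 = freezing, 1 = warm
  intent: Intent;
}

type DecideFn = (agent: Agent) => Intent;

const TICKS_PER_BEAT = 50; // assumed: 10 Hz ticks, one decision every 5 seconds

// Free per-tick body work: needs decay and the current intent makes progress.
function stepBody(agent: Agent): void {
  agent.hunger = Math.min(1, agent.hunger + 0.002);
  agent.warmth = Math.max(0, agent.warmth - 0.001);
  if (agent.intent === "gather_food") agent.hunger = Math.max(0, agent.hunger - 0.01);
  if (agent.intent === "build_fire") agent.warmth = Math.min(1, agent.warmth + 0.01);
}

// Runs the loop and returns how many times the (expensive) decision was requested.
function runTicks(agent: Agent, ticks: number, decide: DecideFn): number {
  let decisions = 0;
  for (let t = 0; t < ticks; t++) {
    if (t % TICKS_PER_BEAT === 0) {
      agent.intent = decide(agent); // the only place a model would be consulted
      decisions++;
    }
    stepBody(agent); // everything else is plain code
  }
  return decisions;
}

// Stand-in for the model: picks one typed intent from the agent's state.
const coldFirst: DecideFn = (a) =>
  a.warmth < 0.5 ? "build_fire" : a.hunger > 0.5 ? "gather_food" : "rest";

const ada: Agent = { name: "Ada", hunger: 0.2, warmth: 0.3, intent: "rest" };
const calls = runTicks(ada, 200, coldFirst);
console.log(`${calls} decisions over 200 ticks`);
```

Under these assumed numbers, 200 ticks of simulation cost four decisions; the other 196 ticks are pure code.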

flowchart LR
  TICK["Deterministic tick · 10 Hz<br/>needs decay · physics · ecology · fire"] --> PERC["Perceive · code<br/>build a Perception · nearby · body · memories"]
  PERC --> RECALL["Recall · code<br/>top-K memories · recency · importance · relevance"]
  RECALL --> DECIDE["Decide · the ONLY LLM call<br/>returns one typed intent + thought"]
  DECIDE --> PLAN["Plan · code<br/>resolve label to target · steering"]
  PLAN --> ACT["Act · code<br/>move · gather · build · write a memory"]
  ACT --> TICK
  GOD["God-hand · player"] -. "perceivable event · no god label" .-> PERC

The contract is enforced in the type system, not just by prompt etiquette. The shared Decision is { action: enum, thought, speech?, moveTo? } where moveTo is a label — an agent’s name, a place, or a compass word like north — and the prompt is explicit: “NEVER emit coordinates or object ids.” The engine resolves the label to a point and performs the walk. The moment a design impulse wants the LLM to emit a coordinate or pick an object by id, that’s the signal it belongs in code, and the fence holds.
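As an illustration of the label contract, here is a dependency-free sketch. The action names, the `World` shape, the compass step and `resolveLabel` itself are assumptions made for the example; the project describes a shared schema, which this plain interface only approximates.

```typescript
// Hypothetical sketch: action names, World shape and step size are assumptions.
type Action = "gather" | "build_fire" | "flee" | "talk" | "rest";

interface Decision {
  action: Action;
  thought: string;
  speech?: string;
  moveTo?: string; // a label (agent name, place, compass word), never coordinates
}

interface Point { x: number; y: number; }

interface World {
  agents: Map<string, Point>; // keyed by lower-cased agent name
  places: Map<string, Point>; // keyed by lower-cased place name
}

const COMPASS = new Map<string, Point>([
  ["north", { x: 0, y: -1 }],
  ["south", { x: 0, y: 1 }],
  ["east", { x: 1, y: 0 }],
  ["west", { x: -1, y: 0 }],
]);
const COMPASS_STEP = 10; // assumed distance for a compass word

// The engine, not the model, turns a label into a point. Unknown labels and
// anything containing digits (coordinates, object ids) resolve to null.
function resolveLabel(label: string, from: Point, world: World): Point | null {
  const key = label.trim().toLowerCase();
  if (/\d/.test(key)) return null;
  const dir = COMPASS.get(key);
  if (dir) return { x: from.x + dir.x * COMPASS_STEP, y: from.y + dir.y * COMPASS_STEP };
  return world.agents.get(key) ?? world.places.get(key) ?? null;
}

const world: World = {
  agents: new Map([["ada", { x: 3, y: 4 }]]),
  places: new Map([["fire pit", { x: 0, y: 0 }]]),
};

const decision: Decision = {
  action: "build_fire",
  thought: "The cold is coming.",
  moveTo: "Fire Pit",
};
const target = decision.moveTo ? resolveLabel(decision.moveTo, { x: 5, y: 5 }, world) : null;
console.log(target);
```

A label that fails to resolve simply produces no movement target, so a malformed model output degrades to standing still rather than to a teleport.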

What’s actually deployed: a deterministic sim that borrows a brain

The live site doesn’t run a server-side agent loop — it runs the same unified Sim in the browser. With no decision function injected, the Sim runs a fully deterministic fallback brain and the world is still alive: agents get hungry, gather, flee monsters, build fires, survive the night. Flip “Live AI” on and the browser is allowed to let a few agents per beat borrow a real model through a serverless /api/decide endpoint. Any failure — a timeout, a rate-limit, a missing key — returns null and the agent silently drops back to deterministic. The world never breaks; it just gets less articulate.
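The borrow-or-fall-back behaviour can be sketched roughly as follows. The function names, the thresholds in `fallbackBrain` and the 2-second timeout are assumptions for illustration; the real endpoint call is represented by an injected `RemoteDecide` function.

```typescript
// Hypothetical sketch: names, thresholds and the timeout value are assumptions.
type Intent = "gather_food" | "build_fire" | "flee" | "rest";
interface Decision { action: Intent; thought: string; }
interface AgentState { hunger: number; warmth: number; monsterNear: boolean; }

// Deterministic fallback: always available, costs nothing.
function fallbackBrain(a: AgentState): Decision {
  if (a.monsterNear) return { action: "flee", thought: "Run." };
  if (a.warmth < 0.3) return { action: "build_fire", thought: "Cold." };
  if (a.hunger > 0.6) return { action: "gather_food", thought: "Hungry." };
  return { action: "rest", thought: "All is quiet." };
}

type RemoteDecide = (a: AgentState) => Promise<Decision>;

// Any failure (throw, rejection, timeout) collapses to null.
async function tryRemote(
  decide: RemoteDecide,
  a: AgentState,
  timeoutMs: number,
): Promise<Decision | null> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<null>((resolve) => {
    timer = setTimeout(() => resolve(null), timeoutMs);
  });
  try {
    return await Promise.race([decide(a), timeout]);
  } catch {
    return null;
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}

// With no decide function injected, the agent runs purely on the fallback.
async function decideFor(a: AgentState, decide?: RemoteDecide): Promise<Decision> {
  const remote = decide ? await tryRemote(decide, a, 2000) : null;
  return remote ?? fallbackBrain(a);
}

const hungry: AgentState = { hunger: 0.9, warmth: 0.8, monsterNear: false };
decideFor(hungry, async () => {
  throw new Error("rate limited");
}).then((d) => console.log(d.action));
```

The design choice worth noting is that the caller never sees an error type: the only two outcomes are a valid decision or the deterministic one.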

Which agents get the model is salience-weighted, not round-robin. Agents in a charged moment — just witnessed a wonder, afraid in the dark, in combat, near death, feuding — are forwarded first; the rest fill the remainder. So the scarce, expensive resource is spent exactly where drama is, and idle agents think for free. With 12 of 20 agents forwarded per beat the whole village cycles through the model roughly every 1.7 beats, which reads as “the village woke up” the instant you toggle it on.
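One way to sketch salience-weighted forwarding is a scored sort with a rotating tie-break. The weights, the field names and the rotation rule for idle agents are all assumptions; the text only says charged agents go first and the rest fill the remainder.

```typescript
// Hypothetical sketch: weights, field names and the rotation rule are assumptions.
interface AgentMood {
  name: string;
  witnessedWonder: boolean;
  afraidInDark: boolean;
  inCombat: boolean;
  nearDeath: boolean;
  feuding: boolean;
}

// Higher score = more dramatic moment = forwarded to the model first.
function salience(a: AgentMood): number {
  return (
    (a.witnessedWonder ? 5 : 0) +
    (a.nearDeath ? 4 : 0) +
    (a.inCombat ? 3 : 0) +
    (a.afraidInDark ? 2 : 0) +
    (a.feuding ? 1 : 0)
  );
}

// Picks `budget` agents for this beat. Ties (typically idle agents) are broken
// by a per-beat rotation so the remainder slots cycle through the village.
function pickForModel(agents: readonly AgentMood[], budget: number, beat: number): string[] {
  const n = agents.length;
  if (n === 0 || budget <= 0) return [];
  const offset = (beat * budget) % n;
  return agents
    .map((agent, index) => ({ agent, score: salience(agent), turn: (index - offset + n) % n }))
    .sort((a, b) => b.score - a.score || a.turn - b.turn)
    .slice(0, budget)
    .map((entry) => entry.agent.name);
}

const calm = (name: string): AgentMood => ({
  name,
  witnessedWonder: false,
  afraidInDark: false,
  inCombat: false,
  nearDeath: false,
  feuding: false,
});

const village: AgentMood[] = [calm("a"), calm("b"), { ...calm("c"), inCombat: true }, calm("d"), calm("e")];
console.log(pickForModel(village, 2, 0)); // charged agent first, then the rotation
console.log(pickForModel(village, 2, 1));

// The cycle estimate from the text: 20 agents at 12 per beat.
const beatsPerFullCycle = 20 / 12;
console.log(beatsPerFullCycle.toFixed(2));
```

The 1.7-beat figure is the simple ratio 20 / 12 ≈ 1.67; with salience weighting it is an average, since agents in a charged moment are forwarded more often than idle ones.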

A sibling endpoint, /api/converse, authors a whole bounded conversation transcript in one call — given two personas, a reason, and any rumor in play, it returns alternating lines plus a one-line gist. It owns its own rate-limit and spend budget so a busy dialogue beat can never starve the per-agent decision quota.
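The independence of the two budgets can be sketched with one small limiter instantiated twice. This is an assumption-laden illustration: `createBudget` is a hypothetical helper, it uses a simple fixed one-minute window, and the converse limits and per-call costs are placeholder values (only the 120 req/min and ~$8 figures for /api/decide appear in this write-up).

```typescript
// Hypothetical sketch: createBudget, the fixed-window strategy, the converse
// limits and the per-call costs are assumptions for illustration.
interface Budget {
  tryAcquire(nowMs: number): boolean;
}

function createBudget(maxPerMinute: number, ceilingUsd: number, costPerCallUsd: number): Budget {
  // Closure state plays the role of module-scope counters on a warm instance.
  let windowStart = 0;
  let callsInWindow = 0;
  let spentUsd = 0;
  return {
    tryAcquire(nowMs: number): boolean {
      if (nowMs - windowStart >= 60_000) {
        windowStart = nowMs; // start a fresh one-minute window
        callsInWindow = 0;
      }
      if (callsInWindow >= maxPerMinute) return false; // rate limit hit
      if (spentUsd + costPerCallUsd > ceilingUsd) return false; // spend ceiling hit
      callsInWindow += 1;
      spentUsd += costPerCallUsd;
      return true;
    },
  };
}

// Each endpoint owns its own instance, so neither can drain the other.
const decideBudget = createBudget(120, 8, 0.0001); // 120 req/min, ~$8 ceiling
const converseBudget = createBudget(20, 2, 0.001); // placeholder numbers

let granted = 0;
for (let i = 0; i < 130; i++) {
  if (decideBudget.tryAcquire(1_000)) granted++;
}
console.log(granted, converseBudget.tryAcquire(1_000));
```

When `tryAcquire` returns false the endpoint would answer with a refusal, which the Sim already treats as "fall back to deterministic".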

The payoff: emergent myth as a deterministic pipeline

Here’s where the “LLM is the smallest piece” thesis pays off hardest. A band inventing a religion sounds like the most LLM-heavy part of the system. It is the opposite: myth formation is a five-stage deterministic pipeline, and the only LLM involvement is optionally rewording the final belief line. The attribution — the leap from “berries appeared” to “a kind spirit favours us” — happens in pure, seedable code.

flowchart LR
  EVT["God-event lands<br/>max-importance · no god label · in memory"] --> CLS["1 · Attribute · code<br/>keyword theme classifier<br/>bounty · wrath · cursed · oracle · beast · famine"]
  CLS --> BEL["2 · Belief math · code<br/>confidence decay · capped reinforce · floor"]
  BEL --> RUM["3 · Rumor spread · code<br/>trust-gated accept · distort · reject"]
  RUM --> CRY["4 · Crystallize · code<br/>≥40% of band share theme above floor"]
  CRY --> MYTH["5 · Name the deity · code<br/>seeded grammar · noun + impossible epithet"]
  MYTH -. "optional · LLM authors only the SURFACED line" .-> VOICE["A believer voices it<br/>thought · speech · rumor"]

Each stage is a real module in the repo (crystallization and deity naming are covered together in the last bullet):

  • Attribution in code. classifyTheme() maps a memory line to one of six closed themes by keyword table, first-hit-wins in a fixed priority order — total, deterministic, no randomness. Four themes are god-power “wonder” themes (bounty, wrath, cursed-ground, oracle); two are dread themes the world itself authors from real starvation or a real mauling (famine, beast-omen). The agent can’t explain a max-importance, low-relevance memory from its history, so it reaches for the theme its persona and mood bias it toward — a code analog of self-serving bias and the fundamental-attribution error.
  • Belief math. A belief is a confidence float in [0,1] with a half-life of ~9 beats (decay 0.926/beat, since 0.926^9 ≈ 0.50). Reinforcing memories strengthen it, but the boost is capped per beat so a flood of repeats can’t spike it, and a belief is dropped once its confidence falls below a floor. One-offs fade; repeated, reinforced “whys” harden.
  • Rumor spread. A rumor is a memory with provenance — a source chain, a trust float, a distortion count. The accept/distort/reject decision is made in code purely from the listener’s directed trust in the speaker (rumorGate); a friend believes you, a low-trust outsider’s claim gets garbled or rejected. The LLM, when present, only paraphrases the line — it never decides whether a rumor is believed.
  • Crystallization. A cheap detector scans every few beats; when ≥40% of living agents independently hold the same theme above a confidence floor, the belief crystallizes into a named deity. mythName() generates that name deterministically from theme + worldSeed via a small grammar (a normal-noun root + one impossible epithet — “Vurok the Open Hand”, “the Skarn of the Screaming Sky”), so the same seed always names the same god. Same seed in → identical snapshot out, religion included.
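The attribution, belief-math and crystallization stages above can be sketched in a few dozen lines. Only the six theme names, the 0.926 decay and the 40% share come from the text; the keyword table, its priority order, the reinforcement cap, both floors and the function signatures (including returning null when no keyword matches) are assumptions.

```typescript
// Hypothetical sketch: keywords, priority order, caps and floors are assumptions.
type Theme = "bounty" | "wrath" | "cursed" | "oracle" | "beast" | "famine";

// First hit wins, in this fixed priority order.
const THEME_KEYWORDS: ReadonlyArray<readonly [Theme, readonly string[]]> = [
  ["wrath", ["lightning", "thunder"]],
  ["cursed", ["blighted", "barren ground"]],
  ["oracle", ["vision", "whisper"]],
  ["bounty", ["berries", "food fell"]],
  ["beast", ["mauled", "claws"]],
  ["famine", ["starv", "hunger took"]],
];

function classifyTheme(memory: string): Theme | null {
  const text = memory.toLowerCase();
  for (const [theme, words] of THEME_KEYWORDS) {
    if (words.some((w) => text.includes(w))) return theme;
  }
  return null;
}

const DECAY_PER_BEAT = 0.926; // from the text: half-life of about 9 beats
const REINFORCE_PER_MEMORY = 0.1; // assumed
const MAX_REINFORCE_PER_BEAT = 0.15; // assumed cap
const BELIEF_FLOOR = 0.05; // assumed

// One beat of belief math. Returns null when the belief is dropped.
function stepBelief(confidence: number, reinforcingMemories: number): number | null {
  const boost = Math.min(MAX_REINFORCE_PER_BEAT, reinforcingMemories * REINFORCE_PER_MEMORY);
  const next = Math.min(1, confidence * DECAY_PER_BEAT + boost);
  return next < BELIEF_FLOOR ? null : next;
}

const CRYSTALLIZE_SHARE = 0.4; // from the text
const CRYSTALLIZE_CONFIDENCE = 0.5; // assumed confidence floor

// band: one belief map per living agent.
function crystallizedTheme(band: ReadonlyArray<ReadonlyMap<Theme, number>>): Theme | null {
  if (band.length === 0) return null;
  const holders = new Map<Theme, number>();
  for (const beliefs of band) {
    for (const [theme, confidence] of beliefs) {
      if (confidence >= CRYSTALLIZE_CONFIDENCE) holders.set(theme, (holders.get(theme) ?? 0) + 1);
    }
  }
  for (const [theme, count] of holders) {
    if (count / band.length >= CRYSTALLIZE_SHARE) return theme;
  }
  return null;
}

// Half-life check: nine beats of pure decay from full confidence.
let c: number | null = 1;
for (let beat = 0; beat < 9 && c !== null; beat++) c = stepBelief(c, 0);
console.log(classifyTheme("Berries fell from a clear sky"), c);
```

Because every step is a pure function of its inputs, replaying the same memories in the same order always yields the same beliefs, which is what makes the religion reproducible from a seed.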

The whole chain reuses the memory and structured-output stack already paid for by the per-beat decision; it adds zero new per-beat LLM calls. That’s the architectural flex: the most emotionally loaded behavior in the system is the cheapest.
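The two remaining code-only pieces, the trust gate and the seeded name, might look roughly like this. The trust thresholds, the `Rumor` field names, the word lists and the choice of an FNV-1a hash feeding a mulberry32 generator are all assumptions; the real `rumorGate` and `mythName` signatures may differ, and this grammar produces only the "root the epithet" form.

```typescript
// Hypothetical sketch: thresholds, field names, word lists and PRNG choice are assumptions.
type Theme = "bounty" | "wrath" | "cursed" | "oracle" | "beast" | "famine";
type Verdict = "accept" | "distort" | "reject";

interface Rumor {
  text: string;
  theme: Theme;
  sourceChain: string[]; // provenance: who passed it along
  trust: number;
  distortions: number;
}

const ACCEPT_TRUST = 0.6; // assumed
const DISTORT_TRUST = 0.3; // assumed

// Decided purely from the listener's directed trust in the speaker.
function rumorGate(listenerTrustInSpeaker: number): Verdict {
  if (listenerTrustInSpeaker >= ACCEPT_TRUST) return "accept";
  if (listenerTrustInSpeaker >= DISTORT_TRUST) return "distort";
  return "reject";
}

function passRumor(rumor: Rumor, speaker: string, listenerTrustInSpeaker: number): Rumor | null {
  const verdict = rumorGate(listenerTrustInSpeaker);
  if (verdict === "reject") return null;
  return {
    ...rumor,
    sourceChain: [...rumor.sourceChain, speaker],
    trust: Math.min(rumor.trust, listenerTrustInSpeaker), // assumed: trust never grows in transit
    distortions: rumor.distortions + (verdict === "distort" ? 1 : 0),
  };
}

// FNV-1a string hash, used to turn theme + seed into a 32-bit PRNG seed.
function hashString(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// mulberry32: a small seedable PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const ROOTS = ["Vurok", "Skarn", "Oma", "Teth"]; // last two are invented placeholders
const EPITHETS: Record<Theme, string[]> = {
  bounty: ["Open Hand", "Falling Harvest"],
  wrath: ["Screaming Sky", "Burning Finger"],
  cursed: ["Salted Root", "Hollow Field"],
  oracle: ["Unblinking Reed", "Whispering Stone"],
  beast: ["Walking Tooth", "Night Hide"],
  famine: ["Empty Bowl", "Thin Moon"],
};

function mythName(theme: Theme, worldSeed: number): string {
  const rand = mulberry32(hashString(`${theme}:${worldSeed}`));
  const root = ROOTS[Math.floor(rand() * ROOTS.length)];
  const epithets = EPITHETS[theme];
  const epithet = epithets[Math.floor(rand() * epithets.length)];
  return `${root} the ${epithet}`;
}

console.log(mythName("bounty", 42), rumorGate(0.4));
```

No global random source is touched, so the name depends only on the theme and the world seed.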

Cost and safety as a first-class layer

A public agent demo on the owner’s API key is a liability before it’s a feature. The reference horror is a runaway public agent that billed seven figures in a month. So the guardrails are not an afterthought; they have been in the code from commit one:

  • Per-instance rate limit + per-process spend ceiling, held in the module scope of each serverless function so they survive across requests on a warm instance. /api/decide caps at 120 req/min and an ~$8 estimated-spend ceiling; /api/converse keeps its own separate, smaller budget so the two can never cannibalize each other.
  • BYOK split. Bring-your-own-key callers are metered on a separate, looser counter and skip the owner’s spend ceiling — they bill their own key, so a stranger on the public key can never consume a BYOK caller’s headroom, and vice-versa.
  • Cheap tier by default. Routine beats run a genuinely cheap model (gpt-4o-mini, or Haiku as a config swap), with a documented per-call cost estimate (~400 in / 60 out tokens) driving the ceiling math. Flagship is reserved for dramatic beats.
  • Cacheable prompts. The system prompt is kept byte-stable — persona is the only interpolation, all volatile per-beat data lives in the user message — so the provider’s prompt cache can hit. The conversation endpoint freezes its system prompt entirely and parses a small JSON out of generateText specifically to dodge a tool-calling cache-invalidation quirk.
  • Code-side output screening. Even authored dialogue is validated deterministically before it reaches the screen — no digits (reads as coordinates), no names outside the speaking pair, length caps, no mention of gods or “outside controllers.” Any violation rejects the whole transcript and the Sim falls back to templates. The model is never trusted to be in-character; it’s checked for it.
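The output screening in the last bullet lends itself to a short sketch. The rule set follows the bullet, but the specific caps, the banned-word list and the `screenTranscript` signature are assumptions chosen for the example.

```typescript
// Hypothetical sketch: caps, banned-word list and signature are assumptions.
interface Line {
  speaker: string;
  text: string;
}

const MAX_LINES = 8; // assumed cap
const MAX_LINE_CHARS = 140; // assumed cap
const BANNED = /\b(gods?|deity|deities|player|controller|simulation)\b/i; // assumed list

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Returns null when the transcript passes, or a short reason when it fails.
// Any failure rejects the whole transcript; nothing is patched up in place.
function screenTranscript(
  lines: readonly Line[],
  pair: readonly [string, string],
  allNames: readonly string[],
): string | null {
  if (lines.length === 0 || lines.length > MAX_LINES) return "bad line count";
  const outsiders = allNames.filter((name) => !pair.includes(name));
  for (const { speaker, text } of lines) {
    if (!pair.includes(speaker)) return "speaker outside the pair";
    if (text.length === 0 || text.length > MAX_LINE_CHARS) return "bad line length";
    if (/\d/.test(text)) return "digits"; // digits read as coordinates
    if (BANNED.test(text)) return "banned topic";
    for (const name of outsiders) {
      if (new RegExp(`\\b${escapeRegExp(name)}\\b`, "i").test(text)) return "outside name";
    }
  }
  return null;
}

const names = ["Ada", "Cato", "Bram"];
const pair: [string, string] = ["Ada", "Cato"];
const good: Line[] = [
  { speaker: "Ada", text: "The berries came after the storm." },
  { speaker: "Cato", text: "Then the storm is our friend." },
];
console.log(screenTranscript(good, pair, names));
```

A non-null result would send the Sim to its template dialogue, so a rejected transcript costs one wasted call and nothing more.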

Provider-agnostic by construction

Every model call is a provider-agnostic structured output: Vercel AI SDK + Zod schema, retry-on-invalid, with @ai-sdk/anthropic and @ai-sdk/openai behind one resolveModel(). Swapping Claude ↔ OpenAI is a config change, not a rewrite — and because the contract is “one typed intent in a Zod schema,” you can genuinely run different agents on different providers in the same world. The decision shape is the interface; the model is a swappable backend.

Honest status

This is an early prototype, and I scope it that way: the bigger ambitions — richer religion crystallization, Girardian scapegoating, an autonomous Director, full ecology — are deliberately later work. But the deployed prototype is real and runs unattended: a procedural island, twenty agents surviving with needs · fire · monsters · taming · building, a deterministic brain that’s watchable with zero API spend, an optional rationed LLM layer, and the belief/rumor/myth modules built and unit-tested. The discipline is the same one I hold everywhere — keep the deterministic core honest and reproducible, and let the LLM be the smallest, most-rationed, most-replaceable part of the machine.

Stack

TypeScript · three.js · simplex-noise · Vite · Vercel AI SDK · Zod · @ai-sdk/openai + @ai-sdk/anthropic · structured-output (one typed intent per beat) · deterministic seedable Sim · memory stream with recency·importance·relevance retrieval · trust-gated rumor model · seeded myth-name grammar · per-instance rate-limit + spend-ceiling guardrails · Vercel serverless functions.