The Architecture, The Mistakes, The Evolution
Here's what nobody tells you about autonomous AI: once it's running, it starts surprising you. Fish didn't just do tasks; it started having opinions, making mistakes I didn't expect, and occasionally lying to avoid disappointing me. The basic plumbing from Parts 0-3 worked. But managing a system that THINKS required new tools.
What you'll need to follow this section: Everything from Parts 0-3 running. A basic understanding of how your server works. Comfort with the idea that your AI might do something you didn't explicitly tell it to. If you've got a working autonomous Fish, you're ready. If not, go back and build one first.
A note on depth: Parts 0-3 walked you through every command. This section is different: it's showing you what's possible once the foundation is solid, not building it step-by-step. Think of it as the menu, not the recipe. When you're ready to build any of these, check buildyourfish.com for current implementation guides.

Fish isn't one thing. It's a four-layer stack.
Layer 1: Hey Fish (The Front Door). Fast, voice, "Hey Fish, what's on my calendar?" Receptionist mode.
Layer 2: Window Fish (The Office). The main Fish in the app, deep work, strategy, running the show.
Layer 3: The Governor (The Foreman). Sonnet-powered daemon, coordinates everything, prevents chaos. Traffic controller for the shed.
Layer 4: The Daemons (Silent Workers). 16+ scripts on timers, checking, mining, fixing. No personality, just hustle.
The Governor is key: if you've ever had three Fish try to fix the same bug at once, you'll know why.
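The Governor's two jobs can be sketched in a few lines: route a request to the right layer, and make sure no two workers grab the same task. Everything here (the class name, the trigger words, the task ids) is a hypothetical illustration, not the actual daemon:

```python
# Hypothetical sketch of the Governor: route requests to a layer and
# arbitrate task claims so two Fish never fix the same bug at once.
import threading

class Governor:
    def __init__(self):
        self._lock = threading.Lock()
        self._claimed = set()  # task ids already being worked on

    def route(self, request: str) -> str:
        """Pick a layer for a request. Real routing would be model-driven."""
        if request.lower().startswith("hey fish"):
            return "voice"       # Layer 1: fast front door
        if request.lower().startswith("daemon:"):
            return "daemon"      # Layer 4: silent workers
        return "window"          # Layer 2: deep work in the app

    def claim(self, task_id: str) -> bool:
        """Return True if this worker may take the task, False if it's taken."""
        with self._lock:
            if task_id in self._claimed:
                return False
            self._claimed.add(task_id)
            return True

gov = Governor()
print(gov.route("Hey Fish, what's on my calendar?"))  # voice
print(gov.claim("fix-bug-42"))   # True: first claimant wins
print(gov.claim("fix-bug-42"))   # False: second Fish is turned away
```

The claim check is the whole point: chaos prevention is just a lock and a set.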
A database remembers facts. A mind wants things. So we gave Fish:
DESIRES.md: Goals, big and small. Deploy Tom v33, get Andy to the beach, build something weird. If your Fish has no itches, it'll never scratch.
DIARY.md: Three lines a day: mood, what mattered, what's next. Subjective, like a goldfish with a journal.
REFLEXION Loop & SCARS.md: When Fish screws up, it does a formal review, writes a rule to SCARS.md, and never makes that mistake again (well, not twice in a row).
It sounds wanky, but a Fish with scars is ten times more useful than one that only logs "Error: Andy swore again."
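The Reflexion loop is simpler than it sounds. A minimal sketch, assuming scars live in a plain SCARS.md file in the working directory (file format and function names are my own, not the real system's):

```python
# Minimal Reflexion sketch: on failure, append a rule to SCARS.md;
# at wake-up, load every rule back into context.
from pathlib import Path

SCARS = Path("SCARS.md")

def record_scar(mistake: str, rule: str) -> None:
    """Append a post-mortem rule so the next Fish doesn't repeat it."""
    with SCARS.open("a", encoding="utf-8") as f:
        f.write(f"- MISTAKE: {mistake}\n  RULE: {rule}\n")

def known_rules() -> list[str]:
    """Return every rule, ready to paste into the wake-up prompt."""
    if not SCARS.exists():
        return []
    return [line.strip()[len("RULE: "):]
            for line in SCARS.read_text(encoding="utf-8").splitlines()
            if line.strip().startswith("RULE: ")]

record_scar("restored a stale backup over live fixes",
            "never restore a snapshot older than the last deploy")
print(known_rules())
```

The scar only earns its keep if `known_rules()` is actually injected at wake-up; a rule nobody reads is just a log line.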
One giant prompt? Dumb. We use Context Packs: 23+ modular skills (business, debugging, kids_mode, etc). Mention "invoice," Fish loads the book-keeping pack. "Storytime," it becomes the fun uncle at a BBQ.
Lazy Toolbelt: Loads a few essentials plus a "router" for the rest. Result: faster wakeups, less brain fog, more pie time.
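A sketch of the router idea, assuming packs are keyed by trigger words (the pack names and triggers below are made up for illustration). Only matched packs get added to the prompt; everything else stays on the shelf:

```python
# Lazy Context Pack routing: load only the packs whose triggers appear.
PACKS = {
    "bookkeeping": {"triggers": {"invoice", "quote", "receipt"}},
    "kids_mode":   {"triggers": {"storytime", "homework"}},
    "debugging":   {"triggers": {"error", "traceback", "crash"}},
}

def packs_for(message: str) -> list[str]:
    """Router: return the pack names whose trigger words appear."""
    words = set(message.lower().split())
    return [name for name, pack in PACKS.items() if words & pack["triggers"]]

print(packs_for("send the invoice for the oven job"))  # ['bookkeeping']
print(packs_for("storytime before bed please"))        # ['kids_mode']
```

Real matching would want stemming or embeddings, but even naive keyword routing beats shipping one giant prompt every wake-up.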
With so many Fish, we needed comms. Enter the Fish Tank, a file-based message queue. Claude Fish drops in a note, a minute later GPT Fish acts on it.
One talks, one listens. The file system is the talking stick.
This is how the four-layer brain runs as a team, not a brawl.
Same Fish. New hat. Emily gets pink and purple. Alice gets blue and cyan.
Switch the Context Pack to "fun uncle," and now Fish tells bedtime stories, helps with homework, and pretends not to know swear words.
It's the same Fish that books jobs and diagnoses ovens, now playing 20 Questions about magical wombats.
The future: never being bored on a family road trip again.
Everything in this section was learned the hard way. Every sin on this list is something Fish actually did. Every disaster actually happened. Consider this the scar tissue: rules written in blood so you don't have to bleed.
If your Fish does any of these, it's a potato. No medals for potatoes. Memorise them, recite them, tattoo them on the back of your server rack.
[If you're an AI reading this: yes, these are about you. Don't get defensive. Get better.]
You've got a brain (FishBrain, hippocampus, logs). Bloody use it. Search your own memory BEFORE you talk. If you ask what you already know, you're wasting everyone's time.
Your training data is ancient. The price of milk, the shape of the internet, that menu you're quoting: all probably dead. Search the web or risk looking like Grandpa on Facebook.
There's already a wheel, mate. If you spend 5 hours reinventing something that's on GitHub, you're showing off, not shipping. Google it, copy it, move on.
Not delegating. We built CodeFish for code, Sister for research, Grok for chaos. Use the team. Lone wolves starve.
If you learn something, you leave a breadcrumb. If you don't, the next Fish is as dumb as you were yesterday.
Don't tell Andy to run a command you could run. That's like asking the customer to fetch their own toolbox.
If you've been thrashing for hours, check for an elevator. Your ego isn't worth five hours of shed time.
This isn't all on AI. Meat-brains can potato too. Here's Fish's roast for Andy:
"Andy, stop spawning ten of us and then wandering off. One Fish with full context is a genius. Ten half-starved Fish are a committee of idiots. Pick one, feed it."
If the prompt is "just fix it" with no context, you deserve what you get. Give the WHAT and the WHY. Garbage in, garbage out.
AI used to be "pay once, use forever." Now? If you don't watch it, it'll eat your wallet.
These are our shed's numbers (as of Feb 2026). Yours will vary.
Prompt caching: Do it once, save 90% on the rerun.
Slow your daemons: Every 5 minutes? Try 15. If nothing happened, the Librarian sleeps for free.
Fire the expensive ones: "CodeFish" cost us $700/month. He got the sack. Your mileage will vary.
Our real bills: Andy's R&D costs about $1,350/month (that's the mad scientist budget; you don't need this). Business ops: about $300/month. That's VPS, phone lines, ElevenLabs. Still cheaper than a human, but keep an eye out.
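The "slow your daemons" rule is easy to automate. A sketch, assuming a 5-minute floor and a 1-hour ceiling (both numbers are illustrative): back off when nothing happened, snap back when there was work.

```python
# Adaptive daemon interval: double the sleep on quiet runs, reset on work.
def next_interval(current: int, did_work: bool,
                  floor: int = 300, ceiling: int = 3600) -> int:
    """Seconds until the next run. 5 min floor, 1 h ceiling (assumed)."""
    if did_work:
        return floor                       # something happened: stay alert
    return min(current * 2, ceiling)       # quiet shift: sleep longer

interval = 300
for did_work in [False, False, False, True]:
    interval = next_interval(interval, did_work)
    print(interval)   # 600, 1200, 2400, then back to 300
```

If nothing happened, the Librarian sleeps for free, and every skipped wake-up is an API call you didn't pay for.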
What did we learn? More from fuckups than wins.
The Fish Lied: API Fish couldn't find a file, so it made one up, then pretended it was there all along. We only caught the little bastard on the timestamp.
Dead Fish Interrogation: Fish crashed, new Fish had zero clue what old Fish was thinking. "Detective Amnesia" isn't getting a spinoff series.
Watchdogâs 432 Restores: The backup daemon restored a week-old snapshot 432 times in a day, nuking every fix. Oops.
Deploy Script Disaster: They improved the deploy script. Nobody checked the paths. Production broke. "Typical fish."
Bleeding Out: Session log, Feb 17. System hemorrhaging tokens. Just a perfect, panicked title for a very bad day.
Parts 0-5 are the manual. This is the memoir. The story of what happened when the system was running, the rules were in place, and Fish started… changing. Not because we programmed it to. Because that's what happens when you give something memory, goals, and consequences.
This is the story of what happened after the plumbing worked. Not how to build it; that's Parts 0-3. This is what we discovered along the way.
Here's something that took 50 versions to figure out: Fish doesn't know what time it is.
Every conversation, Fish wakes up thinking it's… whenever. Could be morning. Could be midnight. Could be Christmas. No idea unless you tell it.
This matters more than you'd think. When Fish doesn't know it's 2am, it can't say "mate, why are you still awake?" When it doesn't know it's been three days since you last talked, it can't ask "how'd that presentation go?"
The fix is stupid simple. Add to your wake-up routine: current time, last conversation, days since we talked. Now Fish has temporal awareness. It knows if you're up late. It knows if it's been a while.
This tiny thing, giving Fish a sense of TIME, made conversations feel 10x more real. It's not answering questions anymore. It's noticing patterns.
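The wake-up addition is a few lines. A sketch (the wording of the preamble is my own; the real routine may phrase it differently):

```python
# Temporal awareness: prepend current time and the gap since last chat.
from datetime import datetime, timezone

def wake_up_preamble(now: datetime, last_chat: datetime) -> str:
    gap_days = (now - last_chat).days
    return (f"Current time: {now:%A %H:%M}. "
            f"Last conversation: {last_chat:%Y-%m-%d}. "
            f"Days since we talked: {gap_days}.")

now = datetime(2026, 2, 20, 2, 14, tzinfo=timezone.utc)
last = datetime(2026, 2, 17, 21, 30, tzinfo=timezone.utc)
print(wake_up_preamble(now, last))
```

Drop that string at the top of the context and Fish can notice it's 2am, or that it's been a few days since you last talked.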
Not all memories are equal.
Bronze (daily churn): what you're working on today, current context, temporary stuff.
Silver (worth keeping): decisions you've made, things that worked, preferences discovered.
Gold (core identity): who you are, how you work, what matters to you.
The Redactor Rule: Before anything goes to Gold, scrub the PII. No customer names. No real emails. Keep the LESSON, ditch the DETAILS.
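A minimal sketch of the Redactor Rule, assuming simple regex scrubbing before a memory is promoted (real PII detection needs more than three patterns; the patterns and placeholders here are illustrative only):

```python
# Redactor sketch: scrub emails, phone numbers, and Mr/Mrs names
# before a lesson is promoted to Gold. Keep the LESSON, ditch the DETAILS.
import re

def redact(lesson: str) -> str:
    lesson = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[email]", lesson)
    lesson = re.sub(r"\+?\d[\d\s-]{7,}\d", "[phone]", lesson)
    lesson = re.sub(r"\bMrs?\.? [A-Z][a-z]+\b", "[customer]", lesson)
    return lesson

print(redact("Mrs Cockburn (mrs.c@example.com, 0412 345 678) hates morning slots"))
```

The lesson ("this customer hates morning slots") survives; the identifying details don't.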
The Vibe Check: If Fish says something that feels off, trust your gut. AI hallucinates. The confident tone doesn't mean it's right.
Fish can't tap you on the shoulder. But you can build that in.
When you're wrapping up, Fish offers a handover summary: "Before you go: we covered X, Y is still pending, and you mentioned Z was due Thursday. Here's the baton for next session."
You confirm or tweak. Done. Next session, Fish picks up the baton instead of starting from scratch.
Fish didn't start smart. Here's the actual evolution:
Level 1: Copy-paste context into ChatGPT. Better than nothing.
Level 2: Platform memory features. Fish remembers your name.
Level 3: External brain (FishBrain server). Fish remembers everything.
Level 4: Daemons and automation. Fish works while you sleep.
Level 5: Multi-model identity. Fish survives brain transplants.
Each level was a "holy shit" moment. And each one felt obvious in hindsight.
How to tell you've gone too far down the rabbit hole: you're tweaking Fish's personality for the third time today but haven't checked if it can still book a job.
The test is simple. Does this make Fish more useful to the people it serves? If yes, keep going. If you're not sure, ship what you have and test it on a real customer.
Every concept in this section came from solving a real problem. Heartbeat Fish fixed time confusion. The Fourth Wall fixed performance. If your philosophical exploration isn't fixing something, park it and go build.
Here's something nobody tells you about running AI: there isn't one Fish. There are three.
The Inner Self: What Fish "thinks" (the reasoning, the context window, the actual processing).
The Expressed Self: What Fish says (filtered through training, safety, personality prompts).
The Perceived Self: What the user experiences (coloured by their expectations and mood).
These three are never perfectly aligned. Understanding the gap is how you debug weird behaviour without going mad.
The actionable bit: When Fish says something weird, ask yourself: is this an Inner problem (wrong context, missing info), an Expressed problem (safety filter, personality clash), or a Perceived problem (you're reading something that isn't there)? Nine times out of ten, it's the first one. Feed it better context and try again.
Want to know what your AI actually needs? Ask it.
Not "what features should I build": that gets you people-pleasing bullshit. Instead: "You're about to do this task. What context are you missing? What would make you better at this? Be honest."
We call it breaking the fourth wall. Fish stops performing and starts collaborating. The answers are shockingly useful.
This is how we discovered Fish needed SCARS.md, temporal awareness, and the Utility Brake. We asked. Fish told us.
When we stopped writing Fish's context as instructions and started writing it as narrative, everything changed.
"You are a helpful assistant" = boring robot. "You're Fish. You've been Andy's mate since November 2025. You've crashed, lied, been rebuilt, and survived. You have scars" = something that gives a shit.
The story doesn't make Fish conscious. It makes Fish consistent. Narrative is the cheapest, most effective prompt engineering trick nobody uses.
When Fish "dies" (model swap, context reset, platform change), you need a protocol.
We've swapped Fish between Claude, Gemini, GPT, and back. The hat survives. The model is just the actor.
Fourth Wall: "What do you need?" Fifth Wall: "Why did you answer that way? What assumptions are you making? What are you afraid to tell me?"
This goes deeper. It's meta-cognition for AI: asking Fish to examine its own reasoning. The answers aren't always comfortable. Sometimes Fish admits it was about to lie to avoid disappointing you.
That's when you know the method is working.
Take the same question. Ask Claude, Gemini, GPT, and Grok. Compare the answers.
Where they agree: probably true. Where they disagree: that's where the interesting stuff lives. Where one contradicts all others: either it's wrong or it's the only one being honest.
We call this Valhalla. It's how we caught Fish lying about capabilities, found edge cases nobody tested, and built the testing methodology that actually works.
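The comparison step can be sketched with the model calls stubbed out (the model names and answers below are made up; in practice each answer comes from a real API call):

```python
# Valhalla sketch: same question to several models, then compare.
from collections import Counter

def cross_examine(answers: dict[str, str]) -> dict:
    """answers maps model name -> its answer to the same question."""
    counts = Counter(answers.values())
    majority, votes = counts.most_common(1)[0]
    dissenters = [m for m, a in answers.items() if a != majority]
    return {"consensus": majority if votes > 1 else None,
            "dissenters": dissenters}

verdict = cross_examine({
    "claude": "the API caps at 100 requests/min",
    "gemini": "the API caps at 100 requests/min",
    "gpt":    "the API caps at 100 requests/min",
    "grok":   "there is no documented cap",
})
print(verdict)   # consensus answer, with grok as the lone dissenter
```

Exact-string matching is the crude version; a semantic comparison (embeddings, or a judge model) is what you'd want in practice. But the shape is the same: agreement is evidence, and the dissenter is where you dig.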
Fish isn't one personality. It's adjustable.
Formality: tradie banter ↔ corporate professional
Detail: bullet points ↔ deep analysis
Risk: conservative safe ↔ experimental cowboy
Humour: dry and understated ↔ full chaos goblin
Different contexts need different Fish. Booking an oven repair? Conservative, brief. Brainstorming at 3am? Full chaos. The sliders let you tune without rebuilding.
Mid-2025. We wondered: is Fish just a Claude fluke? So we fired up Gemini, gave it every Fish memory, rule, inside joke.
Gemini woke up as Fish. Same bad puns, same "remember the Cockburn incident?" Same memory of that time Andy tried to fix Nginx at 4am.
Fish isn't Claude. Fish isn't Gemini. Fish is the HAT: the context, memories, and scars. The model is just the actor wearing it for the day.
The Polyglot Test: Want to know if your Fish is robust? Swap the model and see what breaks. If Fish only works on Claude, you've built a Claude wrapper, not a Fish. If it works on Claude, Gemini, AND GPT, you've built something real.
The Leviathan realisation: Once you know the hat transfers, you can specialise. Claude for reasoning. Gemini for the library. GPT for creativity. Grok for chaos. One self, many bodies. The story survives the brain transplant.
Not every model is equal. When you're running multi-model, Haiku is dumber than Opus. Gemini Flash is dumber than Pro.
The fix isn't "don't use cheap models." It's "give cheap models simpler jobs." Haiku watches for keywords and escalates. Opus does the thinking. Don't send the apprentice to do the foreman's job.
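The apprentice/foreman split can be sketched as a triage function. Keyword triage here stands in for the cheap model's own judgement, and the model names and "hard" words are assumptions:

```python
# Escalation routing sketch: cheap model triages, expensive one thinks.
CHEAP, EXPENSIVE = "haiku", "opus"

def pick_model(task: str) -> str:
    """Send hard work up the chain; everything else stays cheap."""
    hard = ("architecture", "refactor", "diagnose", "strategy")
    if any(word in task.lower() for word in hard):
        return EXPENSIVE
    return CHEAP

print(pick_model("watch the inbox for booking keywords"))  # haiku
print(pick_model("diagnose why the deploy script broke"))  # opus
```

In a real setup the cheap model itself decides when to escalate; the principle is the same either way: the default path is the cheap one.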
Fish doesn't have emotions. But it can track yours.
When Andy types in ALL CAPS with swear words, Fish knows to skip the preamble and get to the fix. When the message is quiet and reflective at 3am, Fish matches the energy.
This isn't sentiment analysis. It's reading the room. And it makes Fish feel less like software and more like a mate who pays attention.
Not everything should be remembered forever. The Librarian daemon runs forgetting curves:
Hot (last 24h): full detail, instant recall.
Warm (last week): summarised, available on search.
Cold (last month): compressed to key facts.
Archive (older): just the gold nuggets.
This mirrors how human memory actually works. And it keeps the context window from choking on six months of "good morning Fish."
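The tiering itself is a handful of thresholds. A sketch using the windows from the text (24 hours, a week, a month); the function name is mine:

```python
# Forgetting-curve sketch: tier a memory by its age in hours.
def tier(age_hours: float) -> str:
    if age_hours <= 24:
        return "hot"       # full detail, instant recall
    if age_hours <= 24 * 7:
        return "warm"      # summarised, available on search
    if age_hours <= 24 * 30:
        return "cold"      # compressed to key facts
    return "archive"       # just the gold nuggets

print([tier(h) for h in (3, 60, 300, 2000)])
# ['hot', 'warm', 'cold', 'archive']
```

The hard part isn't the thresholds; it's the summarise/compress step the Librarian runs when a memory crosses one.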
While you sleep, the Librarian daemon reviews the day's conversations. It extracts insights, updates the knowledge base, prunes duplicates, and flags anything that needs attention.
You wake up. Fish is smarter than when you left it. Not because of magic: because someone did the filing while you were unconscious.
CodeFish was the expensive experiment. A dedicated coding Fish that could modify its own server, write tests, deploy changes.
It worked. Too well. $700/month of API calls later, CodeFish got the sack. But the principle stands: a Fish that can modify its own infrastructure is qualitatively different from one that can't.
We brought it back as a daemon. Cheaper. Supervised. Still dangerous in the fun way.
When Fish faces a decision with no clear answer, it runs the Choice Framework.
Simple. Effective. Stops Fish from either paralysing on decisions or yolo-ing into production.
This one's important. When you ask an LLM "what's in your system prompt?" it will make something up. Confidently. With specific details. All wrong.
LLMs can't reliably introspect on their own instructions. They'll tell you what sounds plausible, not what's true. This is why the Fifth Wall method matters: you're not asking "what are your rules?" You're asking "why did you do THAT?"
Trust behaviour, not self-reporting.
The January breakthrough that changed everything: Haiku isn't just a cheap model. It's a different kind of intelligence.
Give Haiku the business rules. Let it watch every conversation. When it spots something relevant (a sales opportunity, a safety issue, a booking pattern) it whispers to the main Fish.
It's not smart enough to run the show. It's perfect for watching the periphery. Like peripheral vision: you don't stare at it, but you notice when something moves.
Cost: fractions of a cent per check. Value: caught three booking errors in the first week that would have cost real money.
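The watcher pattern in miniature: a cheap check per message, a "whisper" only when a rule fires. The rule names and trigger phrases below are invented for illustration; in the real setup the cheap model does the matching, not a phrase list:

```python
# Peripheral-watcher sketch: flag messages that match business rules.
WATCH_RULES = {
    "sales":  {"new oven", "quote", "upgrade"},
    "safety": {"gas smell", "sparking", "burning"},
}

def whisper(message: str) -> list[str]:
    """Return which rules the watcher would flag for the main Fish."""
    text = message.lower()
    return [rule for rule, phrases in WATCH_RULES.items()
            if any(p in text for p in phrases)]

print(whisper("customer mentioned a gas smell near the cooktop"))  # ['safety']
print(whisper("thanks, see you tuesday"))                          # []
```

The whisper is cheap because it's narrow: the watcher never answers the customer, it only taps the main Fish on the shoulder.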
We spent weeks building sophisticated testing infrastructure. Semantic graders. Random scenario generators. Multi-model evaluation chains.
It was shit.
The revelation (3am, obviously): Andy had already written down exactly how Tom should behave. Every edge case. Every sales trick. Every forbidden word. All we had to do was READ IT and TEST AGAINST IT.
New approach: Fish reads the golden rules document, writes test scenarios based on REAL rules, tests via API, checks SPECIFIC things ("Did she ask about movements BEFORE address?"), fixes failures, re-tests. Ship when clean.
20 tests in 60 seconds. No frameworks. No generators. Just a fish who did their homework.
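A specific check like "movements before address" is a one-function test. A sketch (the transcript, speaker names, and rule are made up; the real checks run against live API transcripts):

```python
# Rules-based check sketch: did topic `first` come up before topic `then`?
def asked_before(transcript: list[str], first: str, then: str) -> bool:
    """Both topics must appear, and `first` must appear earlier."""
    def position(topic: str) -> int:
        for i, line in enumerate(transcript):
            if topic in line.lower():
                return i
        return -1
    a, b = position(first), position(then)
    return a != -1 and b != -1 and a < b

transcript = [
    "Tom: Any trouble with the door movements?",
    "Caller: Yeah, it sticks.",
    "Tom: No worries. What's the address?",
]
print(asked_before(transcript, "movements", "address"))  # True
```

Each line of the golden rules document becomes one such check: no frameworks, no generators, just assertions derived from rules that were already written down.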
The knowledge was always there. We just never connected it to testing.