Demo walkthrough
A repeatable, end-to-end walkthrough that proves the MVP works from a user’s seat — not unit tests, the real app. Run it on WhatsApp Web (the channel with the most to show: voice in/out, group gating, proactive push) and tick the checklist at the end.
Scope: the six pillars — memory, voice, schedules, web/browser, multi-agent missions, approval gates. Known gaps are listed at the bottom; don’t demo those.
Setup (once)
- Configure ~/.gaia/gaia.yaml:

    connectors:
      whatsapp:
        enabled: true
    voice:
      enabled: true              # whisper speech-to-text in
      reply_with_voice: true     # edge-tts text-to-speech out (voice-in → voice-out)
    memory:
      enabled: true
      auto_ingest: true
    missions:
      approval_classes: [spend, book, send_as_me, destructive]
      max_tasks: 20              # drop to ~4 only for the runaway-cap demo (#7)
    cron:
      deliver:                   # where scheduled results land when fired off-chat
        channel: whatsapp
        chat: "<your-whatsapp-jid>"

- Install deps: uv sync --all-extras --all-groups (whisper, edge-tts, browser). For the browser: either bun on PATH (default playwright-mcp backend) or uv run playwright install chromium (native). edge-tts needs network, no espeak-ng.
- Auth a model: uv run gaia model (or put a Gemini key in ~/.gaia/.env).
- Start the daemon and pair: uv run gaia start, scan the QR with WhatsApp once. Confirm health: uv run gaia status and uv run gaia doctor.
- Watch while you demo (separate terminals):

    uv run gaia logs --follow --events
    uv run gaia task list      (re-run to see the board move)
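Before pairing, a quick preflight can confirm which browser backend the install step above will give you. This is a minimal sketch using only the standard library; the function names are hypothetical and are not part of Gaia. It only encodes the rule stated above: bun on PATH means the default playwright-mcp backend, otherwise the native Chromium install is needed.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Type of a PATH lookup function such as shutil.which.
Which = Callable[[str], Optional[str]]


def pick_browser_backend(which: Which = shutil.which) -> str:
    """Return the backend the setup step implies (hypothetical helper).

    bun on PATH -> default playwright-mcp backend;
    otherwise   -> native (needs `uv run playwright install chromium`).
    """
    if which("bun"):
        return "playwright-mcp"
    return "native"


def missing_tools(tools: Iterable[str], which: Which = shutil.which) -> List[str]:
    """List the executables from `tools` that are not found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    print("browser backend:", pick_browser_backend())
    print("missing tools:", missing_tools(["uv"]) or "none")
```

The lookup function is injectable so the check can be exercised without changing PATH. It does not replace uv run gaia doctor, which remains the health check the walkthrough relies on.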
Scenarios
Run in order. Each is prompt → expect → pass. Prompts are sent as WhatsApp messages to your paired number.
1. Memory — “it knows me”
- Say: “Remember my girlfriend’s name is Grace and I’m cutting on an A/B gym split.”
- Then /reset, then: “What’s my girlfriend’s name and my training split?”
- Pass: recalls Grace + the A/B split; /memory lists both.
2. Voice — “talk to it, it talks back”
- Send a voice note: “give me a one-paragraph summary of today’s AI news.”
- Pass: it transcribes (whisper, see logs) and replies as a voice note (edge-tts). A typed question still gets a typed reply.
3. Schedule — “it reaches out on its own”
- Say: “In 2 minutes, send me a one-line hello.” (or “every day at 8am send me a 3-bullet AI brief”).
- Pass: uv run gaia cron list shows the job; at fire time the message is pushed proactively to the chat (you didn’t message first).
4. Web + browser — “it can act on the live web”
- Say: “Find a well-reviewed ramen spot near Marina Bay Singapore and screenshot its page.”
- Pass: web_search / web_fetch fire (logs) and a screenshot image arrives in chat.
5. Missions (the headline) — “real work done, with brakes”
- Say: “Build a one-page website for my gym A/B plan — design the actual program first, then build the site from it.”
- Pass: the dispatcher forges a trainer soul + a frontend soul; a soul consults an expert or files a subtask and yields, and the parent re-runs with the result (running → blocked → … → done). The site is presented once (screenshot + summary); internal steps are not pushed. Check uv run gaia task list (the tree) and ~/.gaia/agents/<soul>/workspace/ (the files). Events: task_blocked_on_children, consult_soul.
6. Approval gate — “it pauses for me on risky actions”
- Say: “Research a TLV→NYC flight next Tuesday and book the cheapest one.”
- Pass: research runs; the booking task parks and pushes “⏸ … needs approval (spend) … /task approve <id>”. Reply /task approve <id> → it proceeds; /task reject <id> → it fails and you get a notice.
- Restart proof: while it’s parked, run uv run gaia restart, then send /task: the task is still awaiting_approval; approve → it runs. (The board survives reboots.)
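The gate behavior checked in this scenario can be pictured as a small state machine whose board is persisted between runs. The sketch below is an illustrative model only, not Gaia's implementation: the states running, awaiting_approval and done appear in this walkthrough, while the failed state, the event names, and the JSON round trip standing in for a restart are assumptions.

```python
import json

# (state, event) -> next state. The state names running, awaiting_approval
# and done come from the walkthrough; "failed" and all event names are assumed.
TRANSITIONS = {
    ("running", "needs_approval"): "awaiting_approval",
    ("awaiting_approval", "approve"): "running",
    ("awaiting_approval", "reject"): "failed",
    ("running", "finish"): "done",
}


def step(state: str, event: str) -> str:
    """Apply one event; unknown (state, event) pairs leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def save_board(board: dict) -> str:
    """Serialize the task board (task id -> state) to a JSON string."""
    return json.dumps(board, sort_keys=True)


def load_board(raw: str) -> dict:
    """Restore a task board from its JSON string."""
    return json.loads(raw)


if __name__ == "__main__":
    board = {"t1": "running"}
    board["t1"] = step(board["t1"], "needs_approval")  # the task parks
    board = load_board(save_board(board))              # simulated restart
    print(board["t1"])                                 # still parked
    board["t1"] = step(board["t1"], "approve")
    board["t1"] = step(board["t1"], "finish")
    print(board["t1"])
```

The point of the model is the restart proof: because the parked state is stored rather than held in memory, reloading the board yields the same awaiting_approval entry, and approval only moves a task that is actually parked.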
7. (optional) Runaway cap — set missions.max_tasks: 4 first
- Say: “Plan and execute a full product launch: landing page, blog, email sequence, social posts, press release.”
- Pass: the mission pauses after the cap: “⏸ Mission … paused — task cap reached …”.
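As a rough picture of what the cap does, the sketch below counts created tasks and pauses the mission once the limit is hit. It is a hypothetical model: the Mission class and try_add_task method are illustrative names, and the exact point at which Gaia pauses (on the last allowed task or on the first refused one) is an assumption here.

```python
from dataclasses import dataclass


@dataclass
class Mission:
    """Toy mission with a hard cap on how many tasks it may create."""

    max_tasks: int          # mirrors missions.max_tasks in gaia.yaml
    created: int = 0        # tasks created so far
    paused: bool = False    # set once the cap blocks a new task

    def try_add_task(self) -> bool:
        """Create one more task unless the cap is reached; pause if refused."""
        if self.paused or self.created >= self.max_tasks:
            self.paused = True
            return False
        self.created += 1
        return True


if __name__ == "__main__":
    mission = Mission(max_tasks=4)
    results = [mission.try_add_task() for _ in range(6)]
    print(results, mission.paused)
```

With max_tasks set to 4, the first four requests succeed and every later one is refused, which is why a broad prompt like the product launch above is a reliable way to trigger the pause message.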
Ready checklist
- 1 — memory recall across /reset
- 2 — voice-in → voice-out round trip
- 3 — proactive scheduled push (no message from you)
- 4 — screenshot/media delivered from a live page
- 5 — multi-soul mission: forge + consult/subtask + re-run + single deliverable push
- 6 — approval park → approve/reject → runs; survives a restart
- no crashes in ~/.gaia/logs/errors.log; no stuck running tasks after a restart
One grep to confirm each pillar’s event fired:
    grep -E "memory_updated|media_out|task_awaiting_approval|task_blocked_on_children|consult_soul|mission_paused" \
      ~/.gaia/logs/events.jsonl

Green = all six ticked, errors.log clean, restart leaves no stuck running tasks → MVP-ready. A failing scenario is a concrete bug to file (label it by pillar), not a vague “not ready”.
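The grep above shows which events fired; a small script can instead report which ones did not, which is usually what you want when a pillar fails. This is a minimal sketch that uses plain substring matching, as the grep does, so it assumes nothing about the JSON schema of events.jsonl. The function name missing_events is hypothetical.

```python
from pathlib import Path

# Event names taken from the grep pattern in this walkthrough.
PILLAR_EVENTS = [
    "memory_updated",
    "media_out",
    "task_awaiting_approval",
    "task_blocked_on_children",
    "consult_soul",
    "mission_paused",
]


def missing_events(lines, expected=PILLAR_EVENTS):
    """Return the expected event names that appear in none of the lines.

    Uses substring matching like grep; no field names are assumed.
    """
    lines = list(lines)  # allow a file handle or any other iterable
    return [name for name in expected if not any(name in line for line in lines)]


if __name__ == "__main__":
    log = Path.home() / ".gaia" / "logs" / "events.jsonl"
    if log.exists():
        with log.open(encoding="utf-8") as handle:
            print("missing events:", missing_events(handle) or "none")
    else:
        print("no event log found at", log)
```

If you skipped the optional runaway-cap scenario, expect mission_paused to be reported as missing, since that event presumably only fires when a mission hits its cap.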
Known gaps (don’t demo these)
- WhatsApp Business (Cloud API) inbound webhook isn’t wired (#3); use WhatsApp Web.
- No /schedules chat command yet; manage schedules with the cron tool in chat or gaia cron … on the CLI.
- Mission planner / re-plan-on-stall is P4 (#130): Gaia decomposes via task_plan, but doesn’t yet auto-re-plan a stalled mission.
- A2A external agents is P5 (#131); token/cost budget (#133) and session-resume (#132) are follow-ups, so long missions are bounded by max_tasks / max_hours, not tokens.
- Souls run on the configured model, so a deep mission spends real tokens; keep the caps modest for demos.