A visual map of the stack: edge static site, Koyeb inference, Infisical secrets, and the sft-wagmi training loop that feeds Wagmi CPU and GPU tiers.
This page is the short stack map with diagrams. For the full story (frugal ops, sovereignty, GDPR, AI Act, economics), see The Web Site is the Demo.
Why this shape? One TypeScript codebase stays portable. Public pages are cheap at the edge. Inference is tiered: a small CPU model for everyone, a large GPU model when authentication makes it worth the cost. Models are trained in our sft-wagmi repo, baked into Docker images, and configured through Infisical so dev, staging, and prod never drift apart silently.
flowchart TB
visitor["Visitor or signed-in user"]
subgraph edge ["Edge and runtime"]
cf["Cloudflare Pages static and API proxy"]
web["Koyeb web: Next.js plus CPU llama-server"]
gpu_svc["Koyeb GPU: wagmi-sft-14b scale-to-zero"]
end
subgraph app ["Application"]
ui["React 19, Assistant UI, next-intl"]
api["API routes: chat, auth, health"]
end
subgraph data ["Data and configuration"]
inf["Infisical secrets"]
pg[("PostgreSQL via Drizzle")]
sb["Supabase email OTP"]
end
subgraph factory ["Model factory sft-wagmi"]
dexm_ds["dexm-one-page JSONL export"]
hf_space["HF Space pipeline.py"]
hub["Private GGUF on Hugging Face"]
hub_dh["Docker Hub images"]
end
visitor --> cf
cf -->|"/api proxy"| web
visitor --> web
ui --> api
api --> pg
api --> sb
inf -.-> web
inf -.-> gpu_svc
api -->|anonymous| web
api -->|authenticated| gpu_svc
dexm_ds --> hf_space
hf_space --> hub
hub --> hub_dh
hub_dh --> web
hub_dh --> gpu_svc
| Merit | What it buys us |
|---|---|
| Frugal | Static marketing on Cloudflare; GPU billed only when auth traffic wakes it |
| Sovereign | Our code, our weights, OpenAI-compatible endpoints we can swap |
| Inspectable | Blog, datasets, and deploy workflows live in public repos |
| Safe defaults | Zod validation, rate limits, BM25 RAG on the small tier, release and AI Act gates |
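To make the BM25 grounding in the last row concrete, here is a minimal ranking sketch. It is not the site's local-rag.ts: the `Doc` shape, the `tokenize` helper, and the `bm25Rank` function are hypothetical names, and the `k1 = 1.2`, `b = 0.75` values are common textbook defaults rather than settings taken from the repo.

```typescript
// Minimal BM25 ranking sketch. All names here are illustrative assumptions,
// not the actual local-rag.ts implementation.

interface Doc {
  id: string;
  text: string;
}

interface Scored {
  id: string;
  score: number;
}

// Lowercase and split on anything that is not an ASCII letter or digit.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

function bm25Rank(query: string, docs: Doc[], k1 = 1.2, b = 0.75): Scored[] {
  const tokenized = docs.map((d) => tokenize(d.text));
  const totalLen = tokenized.reduce((sum, toks) => sum + toks.length, 0);
  const avgLen = totalLen / Math.max(docs.length, 1);

  // Document frequency: how many documents contain each term.
  const df = new Map<string, number>();
  for (const toks of tokenized) {
    for (const term of Array.from(new Set(toks))) {
      df.set(term, (df.get(term) ?? 0) + 1);
    }
  }

  const queryTerms = Array.from(new Set(tokenize(query)));
  const scored = docs.map((doc, i) => {
    const toks = tokenized[i] ?? [];
    let score = 0;
    for (const term of queryTerms) {
      const tf = toks.filter((t) => t === term).length;
      if (tf === 0) continue;
      const n = df.get(term) ?? 0;
      // Smoothed IDF; the "1 +" inside the log keeps it non-negative.
      const idf = Math.log(1 + (docs.length - n + 0.5) / (n + 0.5));
      // Term-frequency saturation with document-length normalisation.
      const norm = tf + k1 * (1 - b + (b * toks.length) / avgLen);
      score += idf * ((tf * (k1 + 1)) / norm);
    }
    return { id: doc.id, score };
  });

  // Highest score first.
  return scored.sort((x, y) => y.score - x.score);
}
```

A caller would pass the user's question as `query`, keep the top few results, and prepend their text to the prompt for the small model. Documents that share no term with the query score zero.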
Chat picks an inference tier from the Supabase session, not from a hidden client flag.
flowchart TD
start["POST /api/chat"]
session{"Valid Supabase session?"}
anon["Anonymous tier"]
auth["Authenticated tier"]
rag["BM25 RAG on wagmi-skills and ai.txt"]
cpu["CPU wagmi-sft 1.5B on loopback 127.0.0.1"]
cpu_first["CPU-first reply while GPU wakes"]
wake["GET LLM_GPU_WAKE_URL or mesh probe"]
gpu_ready{"GPU model available?"}
gpu["GPU wagmi-sft-14b premium tier"]
start --> session
session -->|no| anon --> rag --> cpu
session -->|yes| auth
auth --> cpu_first --> cpu
auth --> wake --> gpu_ready
gpu_ready -->|yes| gpu
gpu_ready -->|no| cpu
Anonymous: always CPU (wagmi-sft inside the web container). Authenticated: CPU-first by default (CHAT_AUTH_CPU_FIRST), then upgraded to GPU once /v1/models shows the auth model ready. UI labels make the active tier visible.
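The routing above can be sketched as two small pure functions. This is an illustration under stated assumptions, not the code behind /api/chat: `pickTier`, `isModelListed`, and the `TierDecision` shape are hypothetical names, the sketch assumes the default CPU-first behaviour, and it assumes the GPU service answers /v1/models with the usual OpenAI-compatible list shape (`{ data: [{ id: ... }] }`).

```typescript
// Sketch of the tier routing described above. Function and type names are
// hypothetical; only /v1/models and the model name come from this page.

type Tier = "cpu" | "gpu";

interface TierDecision {
  tier: Tier;        // which model answers this request
  wakeGpu: boolean;  // whether to ping the GPU service so it starts scaling up
}

// Checks an OpenAI-compatible /v1/models response body for a model id.
// The body is treated as unknown so malformed responses return false.
function isModelListed(body: unknown, modelId: string): boolean {
  if (typeof body !== "object" || body === null) return false;
  const data = (body as { data?: unknown }).data;
  if (!Array.isArray(data)) return false;
  return data.some(
    (m) => typeof m === "object" && m !== null && (m as { id?: unknown }).id === modelId,
  );
}

// hasValidSession: result of the server-side Supabase session check.
// gpuModelReady: e.g. isModelListed(modelsResponse, "wagmi-sft-14b").
function pickTier(hasValidSession: boolean, gpuModelReady: boolean): TierDecision {
  if (!hasValidSession) {
    // Anonymous traffic never touches the GPU.
    return { tier: "cpu", wakeGpu: false };
  }
  // Authenticated: answer on CPU while the GPU wakes, upgrade once it is ready.
  return { tier: gpuModelReady ? "gpu" : "cpu", wakeGpu: !gpuModelReady };
}
```

Keeping the decision a pure function of the session check and the readiness probe makes the "no hidden client flag" property easy to test: nothing sent by the browser other than the session itself can change the result.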
Weights are not edited by hand. Content and guardrails become JSONL in dexm-one-page, then sft-wagmi trains, evaluates, and exports GGUF for Docker bakes.
flowchart LR
content["Blog, wagmi-skills, ai.txt, Obsidian notes"]
gen["generate-wagmi-sft-dataset.ts"]
jsonl["datasets/wagmi-sft train and eval JSONL"]
sync["pnpm dataset:wagmi sync or refresh"]
space["sft-wagmi HF Space Gradio"]
pipeline["pipeline.py preflight train eval redteam export"]
recurring["Cursor SDK recurring daily and weekly"]
gguf["GGUF weights on Hub"]
docker["Docker build bake into web and GPU images"]
koyeb["Koyeb deploy"]
content --> gen --> jsonl --> sync --> space
recurring -.->|orchestrates| space
space --> pipeline --> gguf --> docker --> koyeb
sft-wagmi (README) runs Unsloth plus TRL on two profiles: small (1.5B, anonymous tier) and auth (14B, tool-capable tier). Steps include eval_sft, eval_sft_rag, redteam, and export-merged before GGUF conversion. Recurring automation (automation/cursor-sdk, scripts/hf/recurring_runner.py) schedules light daily qwen/small runs and heavier weekly qwen/auth jobs on Hugging Face infrastructure, with pass or fail gates before a build is release-worthy.
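The pass or fail gate can be pictured as a simple all-must-pass check over the evaluation steps named above. The sketch below is an assumption about the shape of such a gate, written in TypeScript for consistency with the rest of this page; the actual pipeline lives in pipeline.py, and `StepResult`, `REQUIRED_STEPS`, and `isReleaseWorthy` are hypothetical names.

```typescript
// Illustrative release gate: a build is release-worthy only if every
// required step reported a pass. Step names come from this page; the
// result shape and function are assumptions for illustration.

type StepName = "eval_sft" | "eval_sft_rag" | "redteam";

interface StepResult {
  step: StepName;
  passed: boolean;
}

const REQUIRED_STEPS: StepName[] = ["eval_sft", "eval_sft_rag", "redteam"];

interface GateVerdict {
  ok: boolean;
  blocking: StepName[]; // steps that failed or never reported
}

function isReleaseWorthy(results: StepResult[]): GateVerdict {
  const blocking = REQUIRED_STEPS.filter((step) => {
    const result = results.find((r) => r.step === step);
    // A missing result is treated as a failure, so skipped steps block a release.
    return result === undefined || !result.passed;
  });
  return { ok: blocking.length === 0, blocking };
}
```

Treating a missing step as a failure is a deliberate choice in this sketch: a run that silently skips redteam should not look the same as one that passed it.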
In the site repo:
pnpm run dataset:wagmi:refresh # regenerate and sync JSONL into sft-wagmi/data/
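For readers unfamiliar with JSONL, the hand-off format is one JSON object per line. The sketch below shows how examples might be serialised and split into train and eval files. The chat-style `messages` record shape, the every-Nth split rule, and the function names are all assumptions for illustration, not the behaviour of generate-wagmi-sft-dataset.ts.

```typescript
// Illustrative JSONL export. The record shape and split rule are
// assumptions, not the actual dataset generator.

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface SftExample {
  messages: ChatMessage[];
}

// One JSON object per line. JSON.stringify escapes newlines inside
// strings, so each record stays on a single line.
function toJsonl(examples: SftExample[]): string {
  if (examples.length === 0) return "";
  return examples.map((e) => JSON.stringify(e)).join("\n") + "\n";
}

// Deterministic split: every `evalEvery`-th example goes to the eval set.
function splitTrainEval(
  examples: SftExample[],
  evalEvery = 10,
): { train: SftExample[]; evalSet: SftExample[] } {
  const train: SftExample[] = [];
  const evalSet: SftExample[] = [];
  examples.forEach((example, index) => {
    if ((index + 1) % evalEvery === 0) {
      evalSet.push(example);
    } else {
      train.push(example);
    }
  });
  return { train, evalSet };
}
```

A deterministic split keeps the eval set stable between refreshes as long as the example order is stable, which makes scores comparable from one run to the next.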
| Layer | Choice |
|---|---|
| App | Next.js 16, React 19, TypeScript strict, Tailwind, Radix |
| Chat | Vercel AI SDK, Assistant UI, local-rag.ts BM25 grounding |
| Secrets | Infisical EU (dev / staging / prod) |
| Data | PostgreSQL, Drizzle, Supabase OTP auth |
| Content | Content Collections, Markdown in content/blog/ |
| Images | Docker Hub (jeanbapt/deal-ex-machina-web, GPU images) |
| Staging | Koyeb web plus llama-gpu, scale-to-zero where configured |
| Production front | Cloudflare Pages static plus Functions proxy to Koyeb |
| Quality | Biome, Vitest, Playwright, Lighthouse CI, AI Act gate in deploy workflow |