Deal ex Machina: the technical stack

A visual map of the stack: edge static site, Koyeb inference, Infisical secrets, and the sft-wagmi training loop that feeds Wagmi CPU and GPU tiers.

This page is the short stack map with diagrams. For the full story (frugal ops, sovereignty, GDPR, AI Act, economics), see The Web Site is the Demo.

Why this shape? One TypeScript codebase stays portable. Public pages are cheap at the edge. Inference is tiered: a small CPU model for everyone, a large GPU model when authentication makes it worth the cost. Models are trained in our sft-wagmi repo, baked into Docker images, and configured through Infisical so dev, staging, and prod never diverge in secret.


Stack overview

flowchart TB
  visitor["Visitor or signed-in user"]

  subgraph edge ["Edge and runtime"]
    cf["Cloudflare Pages static and API proxy"]
    web["Koyeb web: Next.js plus CPU llama-server"]
    gpu_svc["Koyeb GPU: wagmi-sft-14b scale-to-zero"]
  end

  subgraph app ["Application"]
    ui["React 19, Assistant UI, next-intl"]
    api["API routes: chat, auth, health"]
  end

  subgraph data ["Data and configuration"]
    inf["Infisical secrets"]
    pg[("PostgreSQL via Drizzle")]
    sb["Supabase email OTP"]
  end

  subgraph factory ["Model factory sft-wagmi"]
    dexm_ds["dexm-one-page JSONL export"]
    hf_space["HF Space pipeline.py"]
    hub["Private GGUF on Hugging Face"]
    hub_dh["Docker Hub images"]
  end

  visitor --> cf
  cf -->|"/api proxy"| web
  visitor --> web
  ui --> api
  api --> pg
  api --> sb
  inf -.-> web
  inf -.-> gpu_svc
  api -->|anonymous| web
  api -->|authenticated| gpu_svc
  dexm_ds --> hf_space
  hf_space --> hub
  hub --> hub_dh
  hub_dh --> web
  hub_dh --> gpu_svc

| Merit | What it buys us |
| --- | --- |
| Frugal | Static marketing on Cloudflare; GPU billed only when auth traffic wakes it |
| Sovereign | Our code, our weights, OpenAI-compatible endpoints we can swap |
| Inspectable | Blog, datasets, and deploy workflows live in public repos |
| Safe defaults | Zod validation, rate limits, BM25 RAG on the small tier, release and AI Act gates |
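
One of the safe defaults listed above is BM25 grounding for the small tier. The site's actual local-rag.ts is not shown here, so the following is only a minimal sketch of standard Okapi BM25 scoring. The function names, the tokenizer, and the k1 and b defaults are assumptions chosen for illustration.

```typescript
// Hypothetical sketch of Okapi BM25 scoring; not the site's local-rag.ts.
// Assumed tokenizer: lowercase, split on anything that is not a-z or 0-9.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Returns one BM25 score per document, in the same order as `docs`.
function bm25Scores(query: string, docs: string[], k1 = 1.2, b = 0.75): number[] {
  const tokenized = docs.map(tokenize);
  const n = tokenized.length;
  if (n === 0) return [];
  const avgLen = tokenized.reduce((sum, d) => sum + d.length, 0) / n;

  // Document frequency: in how many documents each term appears.
  const df = new Map<string, number>();
  for (const doc of tokenized) {
    new Set(doc).forEach((term) => df.set(term, (df.get(term) ?? 0) + 1));
  }

  const queryTerms = Array.from(new Set(tokenize(query)));
  return tokenized.map((doc) => {
    // Term frequency inside this document.
    const tf = new Map<string, number>();
    for (const term of doc) tf.set(term, (tf.get(term) ?? 0) + 1);

    let score = 0;
    for (const term of queryTerms) {
      const f = tf.get(term) ?? 0;
      if (f === 0) continue; // term absent: contributes nothing
      const nq = df.get(term) ?? 0;
      const idf = Math.log(1 + (n - nq + 0.5) / (nq + 0.5));
      const lengthNorm = 1 - b + (b * doc.length) / avgLen;
      score += (idf * f * (k1 + 1)) / (f + k1 * lengthNorm);
    }
    return score;
  });
}
```

A caller would sort passages by score and pass the top few to the CPU model as context. Documents that share no term with the query score exactly zero, which makes a cheap cutoff for "nothing relevant found".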

Wagmi routing: sessions, CPU, and GPU

Chat picks an inference tier from the Supabase session, not from a hidden client flag.

flowchart TD
  start["POST /api/chat"]
  session{"Valid Supabase session?"}
  anon["Anonymous tier"]
  auth["Authenticated tier"]
  rag["BM25 RAG on wagmi-skills and ai.txt"]
  cpu["CPU wagmi-sft 1.5B on loopback 127.0.0.1"]
  cpu_first["CPU-first reply while GPU wakes"]
  wake["GET LLM_GPU_WAKE_URL or mesh probe"]
  gpu_ready{"GPU model available?"}
  gpu["GPU wagmi-sft-14b premium tier"]

  start --> session
  session -->|no| anon --> rag --> cpu
  session -->|yes| auth
  auth --> cpu_first --> cpu
  auth --> wake --> gpu_ready
  gpu_ready -->|yes| gpu
  gpu_ready -->|no| cpu

Anonymous: always CPU (wagmi-sft inside the web container). Authenticated: CPU-first by default (CHAT_AUTH_CPU_FIRST), then upgrade to GPU when /v1/models shows the auth model ready. UI labels make the active tier visible.
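
The routing rule above can be written as a small pure function. This is a sketch under stated assumptions, not the site's route handler: the type and function names are hypothetical, the session check and the /v1/models lookup are assumed to have happened already, and only the default CPU-first behavior is modeled.

```typescript
// Hypothetical sketch of the tier decision; names are not from the site repo.
type Tier = "cpu" | "gpu";

interface RoutingInput {
  hasValidSession: boolean; // outcome of the Supabase session check
  gpuModelIds: string[];    // ids listed by the GPU service's /v1/models ([] while asleep)
  authModelId: string;      // id of the auth model, e.g. "wagmi-sft-14b"
}

interface RoutingDecision {
  tier: Tier;       // which tier answers this request
  wakeGpu: boolean; // whether to trigger the wake probe for later requests
}

function pickTier(input: RoutingInput): RoutingDecision {
  // Anonymous traffic never touches the GPU, even if it happens to be awake.
  if (!input.hasValidSession) {
    return { tier: "cpu", wakeGpu: false };
  }
  // Authenticated: upgrade only once the auth model is actually listed.
  const gpuReady = input.gpuModelIds.includes(input.authModelId);
  return gpuReady
    ? { tier: "gpu", wakeGpu: false }
    : { tier: "cpu", wakeGpu: true }; // CPU-first reply while the GPU wakes
}
```

Keeping the decision free of I/O makes it easy to unit test and to surface in the UI label, since the same `tier` value can drive both the upstream call and the badge the user sees.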


Training loop: dexm-one-page to production

Weights are not edited by hand. Content and guardrails become JSONL in dexm-one-page, then sft-wagmi trains, evaluates, and exports GGUF for Docker bakes.

flowchart LR
  content["Blog, wagmi-skills, ai.txt, Obsidian notes"]
  gen["generate-wagmi-sft-dataset.ts"]
  jsonl["datasets/wagmi-sft train and eval JSONL"]
  sync["pnpm dataset:wagmi sync or refresh"]
  space["sft-wagmi HF Space Gradio"]
  pipeline["pipeline.py preflight train eval redteam export"]
  recurring["Cursor SDK recurring daily and weekly"]
  gguf["GGUF weights on Hub"]
  docker["Docker build bake into web and GPU images"]
  koyeb["Koyeb deploy"]

  content --> gen --> jsonl --> sync --> space
  recurring -.->|orchestrates| space
  space --> pipeline --> gguf --> docker --> koyeb

sft-wagmi (README) runs Unsloth plus TRL on two profiles: small (1.5B, anonymous tier) and auth (14B, tool-capable tier). Steps include eval_sft, eval_sft_rag, redteam, and export-merged before GGUF conversion. Recurring automation (automation/cursor-sdk, scripts/hf/recurring_runner.py) schedules light daily qwen/small runs and heavier weekly qwen/auth jobs on Hugging Face infrastructure, with pass or fail gates before a build is release-worthy.
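
The pass or fail gate can be pictured as a check over the step results. The real pipeline lives in pipeline.py; the TypeScript below is only an illustrative sketch, and the result shape, the list of gating steps, and the function name are assumptions. It treats a missing step the same as a failed one.

```typescript
// Hypothetical sketch of a release gate; the real logic lives in sft-wagmi.
type StepName = "eval_sft" | "eval_sft_rag" | "redteam";

interface StepResult {
  step: StepName;
  passed: boolean;
}

// Assumption: these three steps gate a release.
const REQUIRED_STEPS: StepName[] = ["eval_sft", "eval_sft_rag", "redteam"];

function checkReleaseGate(results: StepResult[]): { ok: boolean; blocking: StepName[] } {
  // A step blocks the release if it never ran or if any of its runs failed.
  const blocking = REQUIRED_STEPS.filter((step) => {
    const runs = results.filter((r) => r.step === step);
    return runs.length === 0 || runs.some((r) => !r.passed);
  });
  return { ok: blocking.length === 0, blocking };
}
```

Returning the list of blocking steps, rather than a bare boolean, lets a recurring job report why a build was held back.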

On the site repo:

pnpm run dataset:wagmi:refresh   # regenerate and sync JSONL into sft-wagmi/data/
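
For readers unfamiliar with JSONL: each training example is one JSON object on its own line. The sketch below shows the round trip in TypeScript. The `prompt` and `completion` fields are a hypothetical record shape chosen for illustration; the actual schema written by generate-wagmi-sft-dataset.ts is not described in this post.

```typescript
// Hypothetical record shape; the real dataset schema may differ.
interface SftExample {
  prompt: string;
  completion: string;
}

// Serialize examples as JSONL: one JSON object per line, trailing newline.
// JSON.stringify escapes newlines inside strings, so each record stays on one line.
function toJsonl(examples: SftExample[]): string {
  if (examples.length === 0) return "";
  return examples.map((e) => JSON.stringify(e)).join("\n") + "\n";
}

// Parse JSONL back into records, skipping blank lines.
// No schema validation is done here; a real loader would validate each record.
function fromJsonl(text: string): SftExample[] {
  return text
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as SftExample);
}
```

Because every record is self-contained, train and eval files can be diffed, concatenated, and streamed line by line without loading the whole dataset.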

Core technologies (quick index)

| Layer | Choice |
| --- | --- |
| App | Next.js 16, React 19, TypeScript strict, Tailwind, Radix |
| Chat | Vercel AI SDK, Assistant UI, local-rag.ts BM25 grounding |
| Secrets | Infisical EU (dev / staging / prod) |
| Data | PostgreSQL, Drizzle, Supabase OTP auth |
| Content | Content Collections, Markdown in content/blog/ |
| Images | Docker Hub (jeanbapt/deal-ex-machina-web, GPU images) |
| Staging | Koyeb web plus llama-gpu, scale-to-zero where configured |
| Production front | Cloudflare Pages static plus Functions proxy to Koyeb |
| Quality | Biome, Vitest, Playwright, Lighthouse CI, AI Act gate in deploy workflow |
