Deal ex Machina: the technical stack

This page is the short stack map with diagrams. For the full story (frugal ops, sovereignty, GDPR, AI Act, economics), see The Web Site is the Demo.

Why this shape? One TypeScript codebase stays portable. Public pages are cheap to serve at the edge. Inference is tiered: a small CPU model for everyone, and a large GPU model only when an authenticated session makes it worth the cost. Models are trained in our sft-wagmi repo, baked into Docker images, and configured through Infisical so dev, staging, and prod never drift apart unnoticed.

Stack overview

flowchart TB
  visitor["Visitor or signed-in user"]

  subgraph edge ["Edge and runtime"]
    cf["Cloudflare Pages static and API proxy"]
    web["Koyeb web: Next.js plus CPU llama-server"]
    gpu_svc["Koyeb GPU: wagmi-sft-14b scale-to-zero"]
  end

  subgraph app ["Application"]
    ui["React 19, Assistant UI, next-intl"]
    api["API routes: chat, auth, health"]
  end

  subgraph data ["Data and configuration"]
    inf["Infisical secrets"]
    pg[("PostgreSQL via Drizzle")]
    sb["Supabase email OTP"]
  end

  subgraph factory ["Model factory sft-wagmi"]
    dexm_ds["dexm-one-page JSONL export"]
    hf_space["HF Space pipeline.py"]
    hub["Private GGUF on Hugging Face"]
    hub_dh["Docker Hub images"]
  end

  visitor --> cf
  cf -->|"/api proxy"| web
  visitor --> web
  ui --> api
  api --> pg
  api --> sb
  inf -.-> web
  inf -.-> gpu_svc
  api -->|anonymous| web
  api -->|authenticated| gpu_svc
  dexm_ds --> hf_space
  hf_space --> hub
  hub --> hub_dh
  hub_dh --> web
  hub_dh --> gpu_svc

| Merit | What it buys us |
| --- | --- |
| Frugal | Static marketing on Cloudflare; GPU billed only when auth traffic wakes it |
| Sovereign | Our code, our weights, OpenAI-compatible endpoints we can swap |
| Inspectable | Blog, datasets, and deploy workflows live in public repos |
| Safe defaults | Zod validation, rate limits, BM25 RAG on the small tier, release and AI Act gates |

Wagmi routing: sessions, CPU, and GPU

Chat picks an inference tier from the Supabase session, not from a hidden client flag.

flowchart TD
  start["POST /api/chat"]
  session{"Valid Supabase session?"}
  anon["Anonymous tier"]
  auth["Authenticated tier"]
  rag["BM25 RAG on wagmi-skills and ai.txt"]
  cpu["CPU wagmi-sft 1.5B on loopback 127.0.0.1"]
  cpu_first["CPU-first reply while GPU wakes"]
  wake["GET LLM_GPU_WAKE_URL or mesh probe"]
  gpu_ready{"GPU model available?"}
  gpu["GPU wagmi-sft-14b premium tier"]

  start --> session
  session -->|no| anon --> rag --> cpu
  session -->|yes| auth
  auth --> cpu_first --> cpu
  auth --> wake --> gpu_ready
  gpu_ready -->|yes| gpu
  gpu_ready -->|no| cpu

Anonymous requests always run on CPU (wagmi-sft inside the web container). Authenticated requests are CPU-first by default (CHAT_AUTH_CPU_FIRST): the first reply comes from the CPU model while the GPU wakes, then chat upgrades to the GPU once /v1/models shows the auth model ready. UI labels make the active tier visible.
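The routing rule above can be sketched as a pure function. This is a minimal illustration, not the site's actual code: the names `pickTier`, `isModelListed`, `RoutingInput`, and `ModelsResponse` are hypothetical, and the sketch assumes the readiness check amounts to finding the model id in an OpenAI-compatible `/v1/models` list. It does not model the `CHAT_AUTH_CPU_FIRST` flag itself.

```typescript
// Hypothetical names throughout; only the tier rule follows the text above.
type Tier = "cpu" | "gpu";

// Assumed minimal shape of an OpenAI-compatible GET /v1/models payload.
interface ModelsResponse {
  data?: Array<{ id: string }>;
}

// True when the payload lists the given model id.
function isModelListed(body: ModelsResponse, modelId: string): boolean {
  return (body.data ?? []).some((model) => model.id === modelId);
}

interface RoutingInput {
  hasValidSession: boolean; // result of the server-side session check
  gpuModelReady: boolean; // e.g. isModelListed(body, "wagmi-sft-14b")
}

function pickTier(input: RoutingInput): Tier {
  // Anonymous traffic never reaches the GPU.
  if (!input.hasValidSession) return "cpu";
  // Authenticated traffic stays on CPU until the GPU model is available.
  return input.gpuModelReady ? "gpu" : "cpu";
}
```

Keeping the decision in a function of the session and a readiness probe means the tier is derived on the server for every request, which is what rules out a client-side flag.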

Training loop: dexm-one-page to production

Weights are not edited by hand. Content and guardrails become JSONL in dexm-one-page, then sft-wagmi trains, evaluates, and exports GGUF for Docker bakes.

flowchart LR
  content["Blog, wagmi-skills, ai.txt, Obsidian notes"]
  gen["generate-wagmi-sft-dataset.ts"]
  jsonl["datasets/wagmi-sft train and eval JSONL"]
  sync["pnpm dataset:wagmi sync or refresh"]
  space["sft-wagmi HF Space Gradio"]
  pipeline["pipeline.py preflight train eval redteam export"]
  recurring["Cursor SDK recurring daily and weekly"]
  gguf["GGUF weights on Hub"]
  docker["Docker build bake into web and GPU images"]
  koyeb["Koyeb deploy"]

  content --> gen --> jsonl --> sync --> space
  recurring -.->|orchestrates| space
  space --> pipeline --> gguf --> docker --> koyeb

sft-wagmi (README) runs Unsloth plus TRL on two profiles: small (1.5B, anonymous tier) and auth (14B, tool-capable tier). Steps include eval_sft, eval_sft_rag, redteam, and export-merged before GGUF conversion. Recurring automation (automation/cursor-sdk, scripts/hf/recurring_runner.py) schedules light daily qwen/small runs and heavier weekly qwen/auth jobs on Hugging Face infrastructure, with pass or fail gates before a build is release-worthy.
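A pass or fail gate over pipeline steps might look like the sketch below. The step names come from the paragraph above, but the result shape and the choice to treat exactly these three steps as blocking are assumptions for illustration; the real gates live in sft-wagmi and may differ.

```typescript
// Step names are taken from the text; treating them as the blocking set is an assumption.
const REQUIRED_STEPS = ["eval_sft", "eval_sft_rag", "redteam"] as const;
type Step = (typeof REQUIRED_STEPS)[number];

// Hypothetical result shape: a missing step counts as not passed.
type StepResults = Partial<Record<Step, "pass" | "fail">>;

// Lists every required step that failed or did not run.
function failedGates(results: StepResults): Step[] {
  return REQUIRED_STEPS.filter((step) => results[step] !== "pass");
}

// A build is release-worthy only when no required step is outstanding.
function isReleaseWorthy(results: StepResults): boolean {
  return failedGates(results).length === 0;
}
```

Treating a missing result the same as a failure keeps the gate closed by default, so a skipped red-team run cannot slip a build through.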

In the site repo:

pnpm run dataset:wagmi:refresh   # regenerate and sync JSONL into sft-wagmi/data/
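For readers unfamiliar with JSONL, the format is one JSON object per line. The sketch below shows a round trip under an assumed record shape; `SftExample`, `toJsonl`, and `parseJsonl` are hypothetical names, and the actual schema written by generate-wagmi-sft-dataset.ts is not shown on this page.

```typescript
// Hypothetical record shape; the real dataset schema may differ.
interface SftExample {
  prompt: string;
  completion: string;
}

// Serializes each example as one JSON object per line.
// JSON.stringify escapes embedded newlines, so a record never spans two lines.
function toJsonl(examples: SftExample[]): string {
  return examples.map((example) => JSON.stringify(example) + "\n").join("");
}

// Parses JSONL back into records, skipping blank lines.
function parseJsonl(text: string): SftExample[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as SftExample);
}
```

Because every record sits on its own line, train and eval files can be appended to, split, and diffed with ordinary line-based tools.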

Core technologies (quick index)

| Layer | Choice |
| --- | --- |
| App | Next.js 16, React 19, TypeScript strict, Tailwind, Radix |
| Chat | Vercel AI SDK, Assistant UI, `local-rag.ts` BM25 grounding |
| Secrets | Infisical EU (`dev` / `staging` / `prod`) |
| Data | PostgreSQL, Drizzle, Supabase OTP auth |
| Content | Content Collections, Markdown in `content/blog/` |
| Images | Docker Hub (`jeanbapt/deal-ex-machina-web`, GPU images) |
| Staging | Koyeb `web` plus `llama-gpu`, scale-to-zero where configured |
| Production front | Cloudflare Pages static plus Functions proxy to Koyeb |
| Quality | Biome, Vitest, Playwright, Lighthouse CI, AI Act gate in deploy workflow |
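As background on the BM25 grounding named in the Chat row, here is a minimal scoring sketch. It is not the contents of `local-rag.ts`: the function names, the tokenizer, and the parameter defaults (`k1 = 1.2`, `b = 0.75`, common textbook values) are assumptions chosen for illustration.

```typescript
// Lowercases and splits on non-alphanumerics; a deliberately simple tokenizer.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Returns one BM25 score per document for the given query.
function bm25Scores(query: string, docs: string[], k1 = 1.2, b = 0.75): number[] {
  const tokenized = docs.map(tokenize);
  const totalLen = tokenized.reduce((sum, doc) => sum + doc.length, 0);
  const avgLen = totalLen / Math.max(tokenized.length, 1);

  // Document frequency: in how many documents each term appears.
  const docFreq = new Map<string, number>();
  tokenized.forEach((doc) => {
    Array.from(new Set(doc)).forEach((term) => {
      docFreq.set(term, (docFreq.get(term) ?? 0) + 1);
    });
  });

  const queryTerms = Array.from(new Set(tokenize(query)));
  return tokenized.map((doc) => {
    let score = 0;
    for (const term of queryTerms) {
      const tf = doc.filter((token) => token === term).length;
      if (tf === 0) continue;
      const df = docFreq.get(term) ?? 0;
      // Smoothed IDF that stays positive even for very common terms.
      const idf = Math.log(1 + (docs.length - df + 0.5) / (df + 0.5));
      // Term frequency saturates with k1 and is normalized by document length with b.
      const lengthNorm = k1 * (1 - b + (b * doc.length) / avgLen);
      score += idf * ((tf * (k1 + 1)) / (tf + lengthNorm));
    }
    return score;
  });
}
```

A lexical scorer like this needs no embedding model or vector store, which fits a small CPU tier: the highest-scoring passages can simply be prepended to the prompt.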

