This site itself. Six days, 31 commits, 853 lines of typed source, Lighthouse 100 on the three reports archived in the repo. The page documents the build — what the agents did, what stayed human.
Context
The portfolio was built with AI agents as the primary writing surface — mostly Claude Code, plus several MCPs. The architecture decisions, the design direction, the debugging calls, and the judgment on gray areas are mine. The page exists because the workflow that produced the four case studies sitting next to it deserves the same documentation as the systems described in them.
Constraints at start:
- Solo build, six-day window. Scope: multi-page site, four client case studies, a curated public samples repo.
- A locked design direction, validated by cross-checking four LLMs (Claude, Gemini, Grok, Mistral) on the same brief, then synthesized into a single document —
CLAUDE.md— that every subsequent agent session reads on entry. - Zero invented metrics. Every number on screen is either measurable or marked
—until its source is wired.
Stack
| Layer | Choice | Why this one |
|---|---|---|
| Framework | Next.js 16 + React 19 + TypeScript strict | Same stack as three of the client projects on this site — knowledge transfer |
| Styling | Tailwind v4, CSS-first via @theme | Design tokens centralized, no tailwind.config.js |
| Motion | Lenis 1.3 driven by the GSAP ticker (single RAF loop) + Motion | Smooth scroll without fighting the React lifecycle |
| Content | Markdown via react-markdown + remark-gfm | Case studies as flat .md files, parsed at build |
| Typography | PP Neue Montreal + PP Neue Montreal Mono (Pangram, self-hosted) + Inter | Same foundry for sans + mono, unified system |
| Hosting | Scaleway VPS + Docker + Traefik | Same infra pattern as the other client projects |
| Build agent | Claude Code (Opus 4.7 / 4.8, xhigh effort) | Primary execution surface |
| MCP servers | GitHub, 21st.dev Magic, shadcn, Chrome DevTools | Project-scoped in .mcp.json |
Method
Plan-first, subagent-parallelized, review-automated, human-gated on gray areas.
Each session starts by reading CLAUDE.md — a 126-line document with 10 hard rules, 10 explicit anti-patterns, the voice ("verbs over adjectives, numbers over claims"), the layout principles, and the dimensions to not optimize for (mobile-first, AAA accessibility, bundle below 100kb — explicitly off the table). The rules are non-negotiable: no light mode, no emojis, no skill bars, no "Currently building" sections.
A typical session unfolds in five phases: scope confirmation, plan (no code yet), parallel subagent dispatch, automated review (Playwright + Lighthouse + pre-commit checks), and manual decision on the gray areas before commit.
Direction-creative work was cross-validated by four LLMs in parallel before locking. Each got the same brief and pushed back differently. The synthesis filtered consensus from contested judgment calls that became human decisions.
Verification
The risk of an AI-orchestrated build is silent AI slop — code that compiles, looks fine, breaks at production load. The mitigations are mechanical:
Every commit follows conventional commits. Commitlint blocks non-conforming pushes.
Every interactive route gets a Lighthouse pass before merge. Performance, best-practices, and SEO scored 100 on every archived report. Accessibility climbed 98 → 100 on the home grid after the review agent caught a heading-order regression and an aria-label content mismatch — both fixed mid-session (8585d63, f6854bb).
Every visual change closes with a Playwright screenshot at 1440px to confirm the rendered DOM matches the spec. The convention caught a doubled // // build heading prefix before it hit main — fixed cleanly in the same session.
Every code path handling external data was read for spread-order issues. One was caught: 2d1653f — react-markdown forwarding node/props could have overwritten rel="noopener noreferrer" on external links.
Six review-caught bugs across 31 commits is roughly one in five. That ratio is the cost of moving fast with agents; the review loop is what keeps it from compounding.
The samples repo dbnathan/nathan-code-samples added one more layer: every file referenced by a case study permalink was extracted via the GitHub MCP at the exact source SHA, scanned against 15 regex patterns plus manual review, and individually approved or redacted before push. Of 24 files, 23 passed; one had a sample-CSV email scrubbed at line 757, with line count preserved so case-study line fragments still resolve.
Gray-area calls
Decisions the agents surfaced and a human chose. The interesting ones:
- Version displayed in the status bar. Claude Code defaulted to
nathan@v0.1.0reading frompackage.json. The pivot was defining aNATHAN_VERSIONconstant —5.2.0— anchored in a future/changelog, on the basis that "every metric must be real" (Rule 7) means real to its true source, not real to the first available proxy. - Permalink strategy for private client repos. Case study permalinks pointed to repos that 404 for the public, which would have collapsed the verifiable-architecture thesis. The chosen path: build a public samples repo curated from the privates, with security scrubs.
- Inclusion of BackStageRate samples. The eight Python files are the strongest AI-Product-Engineer evidence in the portfolio. They are also the most contractually ambiguous. Decision: include them in the private samples repo immediately; condition the public flip on explicit client sign-off.
- Tone of the AI-workflow disclosure. Three levels were considered: soft (mention only on this page), mid (a
// buildparagraph on each case study), hard (full pivot, manifesto). Mid was chosen as the only level honest without asking the reader to suspend disbelief. - The doubled
// // buildheading. The implementation plan wrote## // buildin the markdown source, missing that the case-study renderer auto-prepends//to every h2. Caught at the Playwright screenshot stage. Fix landed as a one-line markdown change in all four files, byte-identical via md5 verification, before any commit hitmain.
Outcomes
| Metric | Value |
|---|---|
| Project span | 6 days (2026-05-26 → 2026-06-01) |
| Total commits | 31 |
| Conventional breakdown | 19 feat · 4 fix · 4 chore · 2 refactor · 2 docs |
| Active development days | 3 — 26 commits on a single day |
| TS / TSX / CSS source | 853 LOC |
| Tracked files | 73 |
| Case studies | 4, totalling 5,313 words and 45 GitHub permalinks |
| Pages built | 14 static + 3 dynamic routes |
| Lighthouse perf / a11y / best-practices / SEO | 100 / 100 / 100 / 100 across three archived reports |
| FCP / LCP / CLS | 0.2-0.3s / 0.6-0.8s / 0.000 |
| Code samples extracted, scrubbed, pushed | 24 files · 96% clean first pass · 1 redacted, line count preserved |
Retrospective
Worked: the CLAUDE.md document. Locking ten hard rules and ten anti-patterns before any code was written meant subsequent sessions refused entire categories of generic AI output — purple gradients, skill bars, "Currently building" sections, emoji status indicators — without negotiation. The two times the rules tightened mid-project, they tightened by adding rigor.
Worked: parallel subagent review caught real bugs. Six in 31 is not nothing, and the cadence held.
Would do differently: the four .claude/agents/*.md files were created as stubs and never populated with their actual role descriptions. The behavior worked anyway because CLAUDE.md carries the agent expectations in prose, but the formal files should match. Discipline I didn't budget for.
Would do differently: bundle sizes weren't emitted by the Next 16 Turbopack build table. A single @next/bundle-analyzer run would have replaced an [unmeasurable] with a real number. Cheap fix, didn't make the budget.
Would do differently: only three Lighthouse reports were archived. A per-page audit run would make the "Lighthouse 100 reproducible" claim verifiable on every route, not three samples. Same tool, more rigorous output.
The site is not done — /repl, /status, /changelog, /incidents, /availability, and /llms.txt are sketched in the CLAUDE.md route table and not yet built. It is shippable in the sense that what's documented and styled is real and measurable.