BackStageRate

Private client repository. Permalinks below point to security-scrubbed excerpts in dbnathan/nathan-code-samples.

A B2B SaaS for employee engagement surveys with an AI analytics pipeline on top, positioned as a direct competitor to OfficeVibe / Supermood. Three-container Docker architecture (React frontend + Node API + Python AI service) running on a 32GB Scaleway VPS. Built over five focused weeks, with two production tags shipped and a complete pre-prod audit that surfaced four critical issues nobody had noticed.

Context

The product is a survey engine that asks employees questions on a recurring cadence, computes engagement scores, and runs an AI pipeline over the verbatim answers. Constraints:

MongoDB Atlas M0 free tier. 512MB ceiling. Stress-tested at 537 employees, 22 surveys, 72,786 responses, and 14,322 pre-aggregations, the system stayed under quota because all dashboard data is materialized at write time, not computed on read.
GDPR hard rules. No manager-level alert below 5 respondents. No cross-dimension analysis below 3. Every verbatim is regex-anonymized (email, French phone numbers) before it ever reaches Claude.
Solo dev, no CI/CD. Manual rsync + docker compose build. No test automation (Jest/pytest both absent). All quality control runs through manual Playwright passes and a pre-release audit document.
Inherited migration. Came from a Vercel + Supabase + AWS SDK stack that had been ripped out before the first commit in this repo. The visible git history starts at the V2 production build — five weeks of prior work are squashed into the initial commit.
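
The GDPR rules above can be sketched as follows. This is a minimal illustration, not the project's anonymizer.py: the constant names, function names, placeholder tokens, and both regex patterns are assumptions made for the example. Only the thresholds (5 and 3) and the two kinds of data being scrubbed come from the text.

```python
import re

# Thresholds taken from the rules above; the constant names are hypothetical.
MIN_RESPONDENTS_FOR_ALERT = 5
MIN_RESPONDENTS_FOR_CROSS_DIMENSION = 3

# Illustrative patterns only, not the project's actual regexes.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# French numbers: "0X" or "+33 X" / "0033 X", followed by four pairs of
# digits, with optional space, dot, or hyphen separators.
FR_PHONE_RE = re.compile(r"(?:(?:\+|00)33[\s.-]?|0)[1-9](?:[\s.-]?\d{2}){4}")


def can_alert_manager(respondent_count: int) -> bool:
    """Manager-level alerts are suppressed below 5 respondents."""
    return respondent_count >= MIN_RESPONDENTS_FOR_ALERT


def can_cross_analyze(respondent_count: int) -> bool:
    """Cross-dimension analysis is suppressed below 3 respondents."""
    return respondent_count >= MIN_RESPONDENTS_FOR_CROSS_DIMENSION


def anonymize(verbatim: str) -> str:
    """Replace emails, then French phone numbers, with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", verbatim)
    return FR_PHONE_RE.sub("[PHONE]", text)
```

Running the email pass first keeps the phone pattern from ever seeing digits that belong to an address. Regexes of this kind catch the common formats only, so they are a floor for anonymization rather than a guarantee.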

Stack

Layer	Choice	Why this one
Frontend	React 18 + CRA + Tailwind + DaisyUI + Recharts	Inherited; kept for delivery speed
Backend	Node 20 + Express 4 + Mongoose 8	Pragmatic; already running in V1
Database	MongoDB Atlas M0 (free)	Cost zero until 512MB — pre-aggregations compensate
Auth	JWT (`jose`) + bcrypt, 24h expiry	Reduced from 7d during the hardening pass
AI service	Python 3.11 + FastAPI + `anthropic>=0.49` (Claude Sonnet 4) + scikit-learn 1.3	Isolated runtime to contain LLM costs and latency
Payments	Stripe 17 (Basic / Pro, test mode pending KYC)	Standard B2B SaaS
Hosting	Scaleway VPS (32GB / 6 cores) + Docker + nginx + Let's Encrypt	Fixed cost, full control
Monitoring	`monitor-health.sh` cron + Docker healthchecks	Minimum viable solo — no APM, no Sentry
Email	Brevo (legacy SDK)	Inherited; sender unverified, see retrospective
Exports	pdfkit (chosen over puppeteer because of Alpine incompatibility)	One of the cleaner forced pivots

The AI pipeline

This is the part of the project that's hardest to do well and easiest to do badly. The pipeline runs once per completed survey, gated behind a "Pro plan" feature flag.

Seven steps, each isolated for failure containment, all stored atomically:

Collect verbatims from the database by sentSurveyId.
Anonymize with two regex passes (emails and French phone numbers) in 12 lines of Python. The shortest and most important file in the codebase.
Sentiment batch. Verbatims are sent in chunks of 20, numbered inside a single prompt, with one JSON response per chunk. This cuts the per-verbatim API cost by roughly 20x compared with one call per verbatim. Gap analysis catches verbatims whose sentiment contradicts the quantitative rating (sentiment_service.py).
Theme clustering via a single prompt that clusters, summarizes, and recommends in one pass (theme_service.py).
Drivers analysis. RandomForestRegressor(n_estimators=100, max_depth=5) over the per-employee scores, with X.corrwith(y) to recover the sign of each correlation — feature importance alone tells you the magnitude but not the direction (driver_service.py).
Risk + weak signal detection from the previous stages.
Atomic upsert into ai_analyses, indexed by sentSurveyId so reruns are idempotent.
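
The batching in step 3 can be sketched as below. This is an illustration under stated assumptions, not the contents of sentiment_service.py: the prompt wording, the JSON shape ({"id", "sentiment"}), and every function name are hypothetical. Only the chunk size of 20, the numbering of verbatims inside one prompt, and the single JSON reply per chunk come from the text. The actual call to Claude is left out.

```python
import json
from typing import Iterator

CHUNK_SIZE = 20  # from the text: 20 verbatims per prompt


def chunked(items: list[str], size: int = CHUNK_SIZE) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` verbatims."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_sentiment_prompt(chunk: list[str]) -> str:
    """Number the verbatims so the model can answer by id in one reply."""
    numbered = "\n".join(f"{i}. {text}" for i, text in enumerate(chunk, start=1))
    return (
        "Classify the sentiment of each numbered employee comment as "
        '"positive", "neutral", or "negative".\n'
        "Reply with a JSON array only, one object per comment: "
        '{"id": <number>, "sentiment": <label>}.\n\n'
        f"{numbered}"
    )


def parse_sentiment_response(raw: str, chunk: list[str]) -> list[dict]:
    """Map the JSON reply back onto the verbatims, failing on skipped ids."""
    by_id = {item["id"]: item["sentiment"] for item in json.loads(raw)}
    missing = [i for i in range(1, len(chunk) + 1) if i not in by_id]
    if missing:
        raise ValueError(f"model skipped comment ids: {missing}")
    return [
        {"text": text, "sentiment": by_id[i]}
        for i, text in enumerate(chunk, start=1)
    ]
```

Checking for skipped ids matters with batched prompts: a model that silently drops one comment would otherwise shift or lose results without raising an error.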

The whole pipeline is orchestrated in one file, commented step by step. If something fails in the middle, the upsert never happens — the next call retries from the top cleanly.
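
The all-or-nothing behaviour described above can be sketched like this. It is a simplified model, not analyze.py: a plain dict stands in for the ai_analyses collection, and the names run_pipeline, Step, and store are hypothetical.

```python
from typing import Any, Callable

# A step receives the results accumulated so far and returns its own fields.
Step = Callable[[dict[str, Any]], dict[str, Any]]


def run_pipeline(
    sent_survey_id: str,
    steps: list[Step],
    store: dict[str, dict[str, Any]],
) -> dict[str, Any]:
    """Run every step in order, then write the result once, keyed by survey id."""
    result: dict[str, Any] = {"sentSurveyId": sent_survey_id}
    for step in steps:
        # An exception raised here propagates before the write below,
        # so a failed run leaves the store untouched.
        result.update(step(result))
    # Single write keyed by sentSurveyId: a rerun replaces the previous
    # document instead of adding a second one.
    store[sent_survey_id] = result
    return result
```

Because nothing is persisted until the last step succeeds, a retry never has to reason about partial state; the trade-off is that a late failure discards the work (and API spend) of the earlier steps.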

The cost of running an LLM in production

I instrumented every Claude call with an action label and tracked the spend across 212 real API calls. The numbers (in EUR, model claude-sonnet-4-20250514):

Action	Per-call cost
Full AI analysis (12 sentiment batches + 1 themes + 1 drivers)	€0.234
PDF report generation	€0.011
Chatbot message (simple → complex)	€0.003 → €0.008
Recommendation generation	€0.018

Projected monthly cost for an intensive client (4 analyses + 2 reports + 300 chats + 4 reco runs): €2.41/month. Total spend over the entire measurement window: €2.86.
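
The projection can be checked against the per-call table. The text does not say which average chat cost was assumed, so this sketch computes the lower and upper bounds from the stated chat range and then the average chat cost implied by the €2.41 figure. The variable names are illustrative.

```python
# Per-call costs in EUR, copied from the table above.
COST_ANALYSIS = 0.234
COST_REPORT = 0.011
COST_RECO = 0.018
COST_CHAT_MIN, COST_CHAT_MAX = 0.003, 0.008

# Monthly usage of the "intensive client" scenario.
ANALYSES, REPORTS, CHATS, RECO_RUNS = 4, 2, 300, 4

# Everything except chat has a single fixed per-call price.
fixed = ANALYSES * COST_ANALYSIS + REPORTS * COST_REPORT + RECO_RUNS * COST_RECO
low = fixed + CHATS * COST_CHAT_MIN
high = fixed + CHATS * COST_CHAT_MAX

# Average chat cost that would produce the quoted projection.
implied_chat_cost = (2.41 - fixed) / CHATS

print(f"non-chat spend:     €{fixed:.2f}")              # €1.03
print(f"monthly range:      €{low:.2f} to €{high:.2f}")  # €1.93 to €3.43
print(f"implied chat cost:  €{implied_chat_cost:.4f}")   # €0.0046
```

The quoted €2.41 sits inside the €1.93 to €3.43 range and corresponds to an average chat message of about €0.0046, which suggests a mix weighted toward simple messages. Chat volume, not the full analysis, is the largest variable in the bill.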

That number alone has changed how I think about LLM products. The narrative around AI cost is dominated by the worst-case prices of unbatched, full-context, top-tier API calls. Real applied costs, with batching and a sentiment-vs-themes split, sit two orders of magnitude lower. This is the kind of thing that's worth measuring before pricing a product.

The plan-gating incident

Discovered during the V3 pre-release audit, not in production — the user base was tiny enough that no real customer had hit it yet, but every freshly-signed-up Pro account was silently degrading itself to Basic.

Root cause

The JWT issued by registerStepOne contained { email, id, userType } but not companyId. The endpoint /api/v2/plans/me required companyId to resolve the user's plan and returned 401 without it. On the frontend, PlanContext caught the 401 and fell back silently to currentPlan: 'basic' — instead of bubbling up the error.

Net effect: paid Pro features were gated off for new accounts. Old sessions kept working because their JWT pre-dated the change.

Fix

Single commit, cf5c9ea. It introduces a helper, resolveCompanyId(req), that looks up companyId server-side via Admin.findById(req.user.id).companies[0] instead of expecting it in the JWT. Applied to /plans and to /stripe, whose subscription routes had the same bug (same root cause, same fix).

Bonus discovery in the same audit: for two backend routes, the requireFeature('benchmarks_internal') check was enforced in the frontend only. A motivated user could have called the API directly. Patched in the same commit, before it was ever noticed externally.

What's worth reading

analyze.py — the seven-step pipeline, commented
anonymizer.py — 12 GDPR-critical lines
driver_service.py — RandomForest + signed correlation
engagementScoreService.js — adaptive weighting (70/30/0 without AI, 50/20/30 with) with graceful fallback
analyticsService.js — pre-aggregation bulkWrite with fire-and-forget alert hook
server.js — security posture (helmet, sanitize, three rate-limit tiers)
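
The "RandomForest + signed correlation" idea behind driver_service.py can be sketched as below. The hyperparameters and the use of X.corrwith(y) come from the pipeline description. The function name, the fixed random_state, the output layout, and the synthetic data are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def signed_drivers(X: pd.DataFrame, y: pd.Series) -> pd.DataFrame:
    """Rank features by forest importance and attach the sign of their correlation."""
    # random_state is added here only to make the sketch reproducible.
    model = RandomForestRegressor(n_estimators=100, max_depth=5, random_state=0)
    model.fit(X, y)
    # Importance is always non-negative: it gives magnitude, not direction.
    importance = pd.Series(model.feature_importances_, index=X.columns)
    # The sign of the Pearson correlation recovers the direction.
    direction = np.sign(X.corrwith(y))
    return (
        pd.DataFrame({"importance": importance, "direction": direction})
        .sort_values("importance", ascending=False)
    )


# Synthetic per-employee scores (hypothetical column names): engagement
# falls with workload, rises with recognition, and ignores the third column.
rng = np.random.default_rng(0)
X = pd.DataFrame(
    rng.normal(size=(200, 3)),
    columns=["workload", "recognition", "commute"],
)
y = -3 * X["workload"] + 2 * X["recognition"] + rng.normal(scale=0.1, size=200)

drivers = signed_drivers(X, y)
print(drivers)
```

A linear correlation sign is a coarse summary. For a driver with a non-monotonic effect the forest can report high importance while the correlation sits near zero, so the sign is best read as a hint.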

Retrospective

Five things I would change today:

MongoDB M0 is too tight. Pre-aggregations buy headroom but one more dimension or one multi-tenant client and the quota cracks. M10 at ~$57/month is the obvious move the moment Stripe goes live.
Tests should exist. Zero automated tests for 21 routes, 19 controllers, 10 services, 16 collections. The pre-release audit found four critical issues a baseline supertest + pytest harness would have caught for free. The reason there are none is honest — I prioritized shipping over coverage — but that math flips fast.
Cost instrumentation in the codebase, not external. The €2.86 figure exists because I logged every call manually outside the application. That data should live in a table next to ai_analyses and be exposed in the admin UI. I'm building a product whose unit economics depend on LLM spend; not measuring it inside the product is a gap I'd close first.
The Brevo sender has been unverified for two months. The reminder engine — one of the most visible Pro features — ships emails from a personal Gmail address. SPF/DKIM are at zero, deliverability is silently degraded, and this is a product risk more than a technical one. It needs the client to do the DNS work, which is a different kind of blocker than I'm used to handling.
CRA → Vite. The frontend carries four overlapping UI libraries (two icon sets, heroicons and lucide, plus two word-cloud packages, react-tagcloud and wordcloud) and 20 orphan components flagged by the audit. A migration to Vite plus a one-afternoon cleanup would cut cold start and bundle size meaningfully. The reason it hasn't happened is purely "later".

Build

Workflow: spec → plan-first session → parallel subagents → automated review → manual call on the gray areas. Production decisions, architecture, debugging, and incident response are mine. Code generation is the agent's. The portfolio itself documents this workflow in /projects/portfolio.
