projects

BackStageRate

status: live
domain: app.backstagerate.com
period: 2026-04 → present (~6 weeks at writing)
versions: v3.0.0 (2026-04-15) · v3.1.0 (2026-05-12)
commits: 11 visible · history scrubbed for leaked secrets
client: anonymized (B2B SaaS sponsor)

A B2B SaaS for employee engagement surveys with an AI analytics pipeline on top, positioned as a direct competitor to OfficeVibe and Supermood. Three-container Docker architecture (React frontend + Node API + Python AI service) running on a 32GB Scaleway VPS. Built over five focused weeks, with two production tags shipped and a complete pre-prod audit that surfaced four critical issues nobody had noticed.

Context

The product is a survey engine that asks employees questions on a recurring cadence, computes engagement scores, and runs an AI pipeline over the verbatim answers. Constraints:

  • MongoDB Atlas M0 free tier. 512MB ceiling. A stress test with 537 employees, 22 surveys, 72,786 responses, and 14,322 pre-aggregations stayed under quota, because all dashboard data is materialized at write time, not computed on read.
  • GDPR hard rules. No manager-level alert below 5 respondents. No cross-dimension analysis below 3. Every verbatim is regex-anonymized (email, French phone numbers) before it ever reaches Claude.
  • Solo dev, no CI/CD. Manual rsync + docker compose build. No test automation (Jest/pytest both absent). All quality control runs through manual Playwright passes and a pre-release audit document.
  • Inherited migration. Came from a Vercel + Supabase + AWS SDK stack that had been ripped out before the first commit in this repo. The visible git history starts at the V2 production build — five weeks of prior work are squashed into the initial commit.
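
The write-time materialization and the respondent thresholds in the list above can be sketched together. This is a minimal in-memory illustration under stated assumptions, not the project's code: `PreAggregate`, `record_response`, and `read_mean` are hypothetical names, and a dict stands in for the MongoDB pre-aggregation collection. Only the two thresholds (5 and 3) come from the constraints.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Thresholds taken from the GDPR constraints above.
MIN_RESPONDENTS_MANAGER = 5  # no manager-level alert below this
MIN_RESPONDENTS_CROSS = 3    # no cross-dimension analysis below this


@dataclass
class PreAggregate:
    """Running totals for one (survey, team) bucket, updated on every write."""
    count: int = 0
    total: float = 0.0


# Hypothetical stand-in for a pre-aggregation collection.
aggregates: Dict[Tuple[str, str], PreAggregate] = {}


def record_response(survey_id: str, team: str, score: float) -> None:
    """Update the bucket at write time so reads never scan raw responses."""
    bucket = aggregates.setdefault((survey_id, team), PreAggregate())
    bucket.count += 1
    bucket.total += score


def read_mean(survey_id: str, team: str, min_respondents: int) -> Optional[float]:
    """Return the bucket mean, or None when too few people answered."""
    bucket = aggregates.get((survey_id, team))
    if bucket is None or bucket.count < min_respondents:
        return None
    return bucket.total / bucket.count
```

The read path only touches the small aggregate, and the threshold is enforced at the point where data leaves the store, so a dashboard cannot skip it by accident.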

Stack

Layer | Choice | Why this one
Frontend | React 18 + CRA + Tailwind + DaisyUI + Recharts | Inherited; kept for delivery speed
Backend | Node 20 + Express 4 + Mongoose 8 | Pragmatic; already running in V1
Database | MongoDB Atlas M0 (free) | Cost zero until 512MB; pre-aggregations compensate
Auth | JWT (jose) + bcrypt, 24h expiry | Reduced from 7d during the hardening pass
AI service | Python 3.11 + FastAPI + anthropic>=0.49 (Claude Sonnet 4) + scikit-learn 1.3 | Isolated runtime to contain LLM costs and latency
Payments | Stripe 17 (Basic / Pro, test mode pending KYC) | Standard B2B SaaS
Hosting | Scaleway VPS (32GB / 6 cores) + Docker + nginx + Let's Encrypt | Fixed cost, full control
Monitoring | monitor-health.sh cron + Docker healthchecks | Minimum viable solo: no APM, no Sentry
Email | Brevo (SDK legacy) | Inherited; sender unverified, see retrospective
Exports | pdfkit (chosen over puppeteer: Alpine incompatibility) | One of the cleaner forced pivots

The AI pipeline

This is the part of the project that's hardest to do well and easiest to do badly. The pipeline runs once per completed survey, gated behind a "Pro plan" feature flag.

Seven steps, each isolated for failure containment, all stored atomically:

  1. Collect verbatims from the database by sentSurveyId.
  2. Anonymize with two regex passes (emails and French phone numbers) in 12 lines of Python. The shortest and most important file in the codebase.
  3. Sentiment batch. Chunks of 20 verbatims numbered in a single prompt, one JSON response per chunk. Cuts the per-verbatim API cost by ~20x. Gap analysis catches verbatims whose sentiment contradicts the quantitative rating (sentiment_service.py).
  4. Theme clustering via a single prompt that clusters, summarizes, and recommends in one pass (theme_service.py).
  5. Drivers analysis. RandomForestRegressor(n_estimators=100, max_depth=5) over the per-employee scores, with X.corrwith(y) to recover the sign of each correlation — feature importance alone tells you the magnitude but not the direction (driver_service.py).
  6. Risk + weak signal detection from the previous stages.
  7. Atomic upsert into ai_analyses, indexed by sentSurveyId so reruns are idempotent.
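
Steps 2 and 3 above can be sketched as follows. This is an illustration under assumptions, not the contents of the real files: the two regular expressions, the `[EMAIL]` / `[PHONE]` placeholders, the prompt wording, and every function name are hypothetical. Only the two-pass anonymization and the chunk size of 20 come from the description.

```python
import re
from typing import List

# Assumed patterns: the source only says "emails and French phone numbers".
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# French numbers: a leading 0, +33 or 0033, one digit 1-9, then four digit
# pairs, with optional space, dot, or dash separators.
PHONE_FR_RE = re.compile(
    r"(?<!\d)(?:(?:\+|00)33[\s.-]?|0)[1-9](?:[\s.-]?\d{2}){4}(?!\d)"
)


def anonymize(text: str) -> str:
    """Two regex passes: emails first, then French phone numbers."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_FR_RE.sub("[PHONE]", text)


def chunk(items: List[str], size: int = 20) -> List[List[str]]:
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_sentiment_prompt(verbatims: List[str]) -> str:
    """Number the verbatims so one JSON reply can be mapped back by index."""
    numbered = "\n".join(f"{i}. {v}" for i, v in enumerate(verbatims, 1))
    # Hypothetical instruction text; the real prompt is not in the source.
    return (
        "Classify the sentiment of each numbered answer. "
        "Reply with one JSON array, one object per number.\n" + numbered
    )


def build_prompts(verbatims: List[str], size: int = 20) -> List[str]:
    """Anonymize every verbatim, then build one prompt per chunk."""
    clean = [anonymize(v) for v in verbatims]
    return [build_sentiment_prompt(c) for c in chunk(clean, size)]
```

Anonymizing before chunking means no prompt builder downstream ever sees raw text, which keeps the privacy guarantee independent of how the batching evolves.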

The whole pipeline is orchestrated in one file, commented step by step. If something fails in the middle, the upsert never happens — the next call retries from the top cleanly.
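
The all-or-nothing behavior described above can be sketched like this. It is a minimal illustration, assuming a dict in place of the ai_analyses collection; `run_pipeline` and the step signature are hypothetical. With pymongo, the final write could be a `replace_one(..., upsert=True)` keyed on sentSurveyId, but whether the project does exactly that is not stated in the source.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for the ai_analyses collection, keyed by sentSurveyId.
ai_analyses: Dict[str, dict] = {}

# A step receives the state accumulated so far and returns its own result.
Step = Callable[[dict], object]


def run_pipeline(sent_survey_id: str, steps: List[Tuple[str, Step]]) -> dict:
    """Run every step in order; persist only if all of them succeed."""
    state: dict = {"sentSurveyId": sent_survey_id}
    for name, step in steps:
        # An exception raised here propagates before the write is reached,
        # so a half-finished analysis is never stored.
        state[name] = step(state)
    # Single write at the very end. Replacing by key makes reruns idempotent.
    ai_analyses[sent_survey_id] = state
    return state
```

Because nothing is persisted until the last line, a retry needs no cleanup: it simply starts again from the first step.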

The cost of running an LLM in production

I instrumented every Claude call with an action label and tracked the spend across 212 real API calls. The numbers (in EUR, model claude-sonnet-4-20250514):

Action | Per-call cost
Full AI analysis (12 sentiment batches + 1 themes + 1 drivers) | €0.234
PDF report generation | €0.011
Chatbot message (simple → complex) | €0.003 → €0.008
Recommendation generation | €0.018

Projected monthly cost for an intensive client (4 analyses + 2 reports + 300 chats + 4 reco runs): €2.41/month. Total spend over the entire measurement window: €2.86.
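
The projection can be re-derived from the table as a quick check. The per-action costs and usage counts below are copied from the text. The only thing inferred is the average chat cost implied by the €2.41 figure, since the source gives a range for chat messages rather than a mean.

```python
# Per-action costs in EUR, copied from the table above.
COST = {"analysis": 0.234, "report": 0.011, "reco": 0.018}
CHAT_MIN, CHAT_MAX = 0.003, 0.008

# Monthly usage of the "intensive client" scenario.
USAGE = {"analysis": 4, "report": 2, "reco": 4}
CHATS = 300

# Everything except chat has a single known price.
fixed = sum(COST[k] * USAGE[k] for k in COST)

# Chat cost depends on message complexity, so bound it on both sides.
low = fixed + CHATS * CHAT_MIN
high = fixed + CHATS * CHAT_MAX

# Average chat cost that would produce the stated 2.41 EUR/month (inferred).
implied_chat = (2.41 - fixed) / CHATS

print(f"non-chat: {fixed:.2f}  range: {low:.2f} to {high:.2f}")
print(f"implied average chat cost: {implied_chat:.4f}")
```

Under these numbers the monthly total lands between €1.93 and €3.43 depending on chat complexity, and €2.41 corresponds to an average chat message costing about €0.0046. Chat volume, not the full analyses, is the dominant term.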

That number alone has changed how I think about LLM products. The narrative around AI cost is dominated by the worst-case prices of unbatched, full-context, top-tier API calls. Real applied costs, with batching and a sentiment-vs-themes split, sit two orders of magnitude lower. This is the kind of thing that's worth measuring before pricing a product.

The plan-gating incident

Discovered during the V3 pre-release audit, not in production — the user base was tiny enough that no real customer had hit it yet, but every freshly-signed-up Pro account was silently degrading itself to Basic.

Root cause

The JWT issued by registerStepOne contained { email, id, userType } but not companyId. The endpoint /api/v2/plans/me required companyId to resolve the user's plan and returned 401 without it. On the frontend, PlanContext caught the 401 and fell back silently to currentPlan: 'basic' — instead of bubbling up the error.

Net effect: paid Pro features were gated off for new accounts. Old sessions kept working because their JWT pre-dated the change.

Fix

Single commit, cf5c9ea. It introduces a helper, resolveCompanyId(req), that loads the admin with Admin.findById(req.user.id) and reads companies[0] from that document instead of trusting a companyId from the JWT. Applied to /plans and to /stripe, whose subscription routes had the same bug (same root cause, same fix).
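
The logic of the fix, resolving the company from the database rather than from token claims, can be sketched independently of the framework. The real helper lives in the Node API. The version below is a Python rendering for illustration only, with dicts standing in for the Admin and plan collections and every name (`ADMINS`, `PLANS`, `Unauthorized`, `get_plan`) hypothetical.

```python
# Hypothetical stand-ins for the Admin collection and the plan lookup.
ADMINS = {"a1": {"email": "admin@example.com", "companies": ["c42"]}}
PLANS = {"c42": "pro"}


class Unauthorized(Exception):
    """Raised when no company can be resolved for the caller."""


def resolve_company_id(jwt_claims: dict) -> str:
    """Look the company up from the admin record, ignoring token contents."""
    admin = ADMINS.get(jwt_claims.get("id"))
    if not admin or not admin.get("companies"):
        raise Unauthorized("no company linked to this user")
    return admin["companies"][0]


def get_plan(jwt_claims: dict) -> str:
    """Resolve the plan; errors propagate instead of degrading to 'basic'."""
    return PLANS[resolve_company_id(jwt_claims)]


# A token shaped like the one from registerStepOne: no companyId claim.
claims = {"email": "admin@example.com", "id": "a1", "userType": "admin"}
print(get_plan(claims))
```

The lookup costs one extra read per request, but the plan no longer depends on which claims happened to be in the token at sign-up time.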

Bonus discovery in the same audit: two backend routes had no server-side requireFeature('benchmarks_internal') check; the gate existed only in the frontend. A motivated user could have called the API directly. Patched in the same commit, before anyone outside noticed.
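
A server-side feature gate of the kind described above can be sketched as a decorator. This is a Python rendering for illustration. The actual requireFeature is Express middleware in the Node API, and the plan-to-feature map, `Forbidden`, and the handler below are all hypothetical.

```python
from functools import wraps

# Hypothetical plan-to-feature map; the real mapping is not in the source.
PLAN_FEATURES = {"basic": set(), "pro": {"benchmarks_internal"}}


class Forbidden(Exception):
    """Raised when the caller's plan does not include the feature."""


def require_feature(feature: str):
    """Wrap a handler so the plan is checked on the server, on every call."""
    def decorator(handler):
        @wraps(handler)
        def wrapper(plan: str, *args, **kwargs):
            if feature not in PLAN_FEATURES.get(plan, set()):
                raise Forbidden(feature)
            return handler(plan, *args, **kwargs)
        return wrapper
    return decorator


@require_feature("benchmarks_internal")
def internal_benchmarks(plan: str) -> dict:
    # Placeholder payload; only reached when the gate passes.
    return {"benchmarks": []}
```

Hiding a button in the frontend is a usability choice. The check that counts is the one attached to the route itself.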

What's worth reading

Retrospective

Five things I would change today:

  • MongoDB M0 is too tight. Pre-aggregations buy headroom but one more dimension or one multi-tenant client and the quota cracks. M10 at ~$57/month is the obvious move the moment Stripe goes live.
  • Tests should exist. Zero automated tests for 21 routes, 19 controllers, 10 services, 16 collections. The pre-release audit found four critical issues a baseline supertest + pytest harness would have caught for free. The reason there are none is honest — I prioritized shipping over coverage — but that math flips fast.
  • Cost instrumentation in the codebase, not external. The €2.86 figure exists because I logged every call manually outside the application. That data should live in a table next to ai_analyses and be exposed in the admin UI. I'm building a product whose unit economics depend on LLM spend; not measuring it inside the product is a gap I'd close first.
  • The Brevo sender has been unverified for two months. The reminder engine — one of the most visible Pro features — ships emails from a personal Gmail address. SPF/DKIM are at zero, deliverability is silently degraded, and this is a product risk more than a technical one. It needs the client to do the DNS work, which is a different kind of blocker than I'm used to handling.
  • CRA → Vite. The frontend carries two icon libraries (heroicons + lucide), two word-cloud libraries (react-tagcloud + wordcloud), and 20 orphan components flagged by the audit. A migration to Vite plus a one-afternoon cleanup would cut cold start and bundle size meaningfully. The reason it hasn't happened is purely "later".

build

Workflow: spec → plan-first session → parallel subagents → automated review → manual call on the gray areas. Production decisions, architecture, debugging, and incident response are mine. Code generation is the agent's. The portfolio itself documents this workflow in /projects/portfolio.