A B2B SaaS for employee engagement surveys with an AI analytics pipeline on top, positioned as a direct competitor to OfficeVibe and Supermood. Three-container Docker architecture (React frontend + Node API + Python AI service) running on a 32GB Scaleway VPS. Built over five focused weeks, with two production tags shipped and a complete pre-prod audit that surfaced four critical issues nobody had noticed.
Context
The product is a survey engine that asks employees questions on a recurring cadence, computes engagement scores, and runs an AI pipeline over the verbatim answers. Constraints:
- MongoDB Atlas M0 free tier, with a 512MB ceiling. The system was stress-tested at 537 employees, 22 surveys, 72,786 responses, and 14,322 pre-aggregations and stayed under quota, because all dashboard data is materialized at write time rather than computed on read.
- GDPR hard rules. No manager-level alert below 5 respondents. No cross-dimension analysis below 3. Every verbatim is regex-anonymized (email, French phone numbers) before it ever reaches Claude.
- Solo dev, no CI/CD. Deployment is a manual `rsync` + `docker compose build`. No test automation (Jest and pytest are both absent). All quality control runs through manual Playwright passes and a pre-release audit document.
- Inherited migration. The project came from a Vercel + Supabase + AWS SDK stack that had been ripped out before the first commit in this repo. The visible git history starts at the V2 production build; five weeks of prior work are squashed into the initial commit.
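The two GDPR rules above can be sketched as two threshold guards plus a two-pass anonymizer. This is a minimal sketch under assumptions: the thresholds (5 and 3) come from the text, but the regex patterns and the `[EMAIL]`/`[PHONE]` placeholders are hypothetical and are not the contents of `anonymizer.py`.

```python
import re

# Thresholds taken from the GDPR constraints above.
MIN_RESPONDENTS_MANAGER_ALERT = 5
MIN_RESPONDENTS_CROSS_DIMENSION = 3

# Hypothetical patterns; the project's real regexes are not shown in the text.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# French numbers: 0X XX XX XX XX or +33 X XX XX XX XX, separators optional.
FR_PHONE_RE = re.compile(r"(?:\+33\s?|0)[1-9](?:[\s.-]?\d{2}){4}")


def anonymize(text: str) -> str:
    """First pass masks emails, second pass masks French phone numbers."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return FR_PHONE_RE.sub("[PHONE]", text)


def can_alert_manager(respondents: int) -> bool:
    # No manager-level alert below 5 respondents.
    return respondents >= MIN_RESPONDENTS_MANAGER_ALERT


def can_cross_analyze(respondents: int) -> bool:
    # No cross-dimension analysis below 3 respondents.
    return respondents >= MIN_RESPONDENTS_CROSS_DIMENSION
```

Running emails first keeps digits inside an address from being mistaken for part of a phone number.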
Stack
| Layer | Choice | Why this one |
|---|---|---|
| Frontend | React 18 + CRA + Tailwind + DaisyUI + Recharts | Inherited; kept for delivery speed |
| Backend | Node 20 + Express 4 + Mongoose 8 | Pragmatic; already running in V1 |
| Database | MongoDB Atlas M0 (free) | Cost zero until 512MB — pre-aggregations compensate |
| Auth | JWT (jose) + bcrypt, 24h expiry | Reduced from 7d during the hardening pass |
| AI service | Python 3.11 + FastAPI + anthropic>=0.49 (Claude Sonnet 4) + scikit-learn 1.3 | Isolated runtime to contain LLM costs and latency |
| Payments | Stripe 17 (Basic / Pro, test mode pending KYC) | Standard B2B SaaS |
| Hosting | Scaleway VPS (32GB / 6 cores) + Docker + nginx + Let's Encrypt | Fixed cost, full control |
| Monitoring | monitor-health.sh cron + Docker healthchecks | Minimum viable solo — no APM, no Sentry |
| Email | Brevo (legacy SDK) | Inherited; sender unverified, see retrospective |
| Exports | pdfkit (chosen over puppeteer because of Alpine incompatibility) | One of the cleaner forced pivots |
The AI pipeline
This is the part of the project that's hardest to do well and easiest to do badly. The pipeline runs once per completed survey, gated behind a "Pro plan" feature flag.
Seven steps, each isolated for failure containment, all stored atomically:
- Collect verbatims from the database by `sentSurveyId`.
- Anonymize with two regex passes (emails and French phone numbers) in 12 lines of Python. The shortest, most important file in the codebase.
- Sentiment batch. Chunks of 20 verbatims, numbered in a single prompt, with one JSON response per chunk. This cuts the per-verbatim API cost by ~20x. Gap analysis catches verbatims whose sentiment contradicts the quantitative rating (`sentiment_service.py`).
- Theme clustering via a single prompt that clusters, summarizes, and recommends in one pass (`theme_service.py`).
- Drivers analysis. `RandomForestRegressor(n_estimators=100, max_depth=5)` over the per-employee scores, with `X.corrwith(y)` to recover the sign of each correlation: feature importance alone tells you the magnitude but not the direction (`driver_service.py`).
- Risk + weak signal detection from the previous stages.
- Atomic upsert into `ai_analyses`, indexed by `sentSurveyId` so reruns are idempotent.
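The drivers step pairs a random forest (magnitude) with a plain correlation (sign). A minimal sketch, assuming a pandas DataFrame `X` with one column per survey dimension (per-employee scores) and a Series `y` holding each employee's engagement score; `signed_drivers` is a hypothetical name, the hyperparameters match the text, and `random_state` is added only for reproducibility.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def signed_drivers(X: pd.DataFrame, y: pd.Series, seed: int = 0) -> pd.DataFrame:
    """Rank dimensions by forest importance and attach the sign of their correlation."""
    model = RandomForestRegressor(n_estimators=100, max_depth=5, random_state=seed)
    model.fit(X, y)
    importance = pd.Series(model.feature_importances_, index=X.columns)
    # Importances are non-negative; Pearson correlation with y supplies the direction.
    corr = X.corrwith(y)
    out = pd.DataFrame({
        "importance": importance,
        "correlation": corr,
        "direction": np.sign(corr),
    })
    return out.sort_values("importance", ascending=False)
```

One caveat of this design: the correlation sign is a marginal, linear summary, so a dimension with a strongly non-monotonic effect can get a misleading direction even when its importance is high.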
The whole pipeline is orchestrated in one file, commented step by step. If something fails in the middle, the upsert never happens — the next call retries from the top cleanly.
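The orchestration contract (compute every step in memory, write once at the end, keyed by `sentSurveyId`) can be sketched as follows. This is an assumption-laden sketch, not `analyze.py`: `run_pipeline` is a hypothetical name and a plain dict stands in for the `ai_analyses` collection. With pymongo, the final write would be along the lines of `collection.replace_one({"sentSurveyId": sid}, doc, upsert=True)`.

```python
from typing import Any, Callable

Step = Callable[[dict[str, Any]], Any]


def run_pipeline(sent_survey_id: str, steps: list[tuple[str, Step]],
                 store: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Run every step against an in-memory context; persist only if all succeed."""
    ctx: dict[str, Any] = {"sentSurveyId": sent_survey_id}
    for name, step in steps:
        # An exception here propagates out, so the write below never happens.
        ctx[name] = step(ctx)
    # Keyed by sentSurveyId: a rerun replaces the previous document instead of adding one.
    store[sent_survey_id] = ctx
    return ctx
```

Because nothing is written until the last line, a failure in step five leaves no half-finished analysis behind, and a retry simply starts over.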
The cost of running an LLM in production
I instrumented every Claude call with an action label and tracked the spend across 212 real API calls. The numbers (in EUR, model claude-sonnet-4-20250514):
| Action | Per-call cost |
|---|---|
| Full AI analysis (12 sentiment batches + 1 themes + 1 drivers) | €0.234 |
| PDF report generation | €0.011 |
| Chatbot message (simple → complex) | €0.003 → €0.008 |
| Recommendation generation | €0.018 |
Projected monthly cost for an intensive client (4 analyses + 2 reports + 300 chats + 4 reco runs): €2.41/month. Total spend over the entire measurement window: €2.86.
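The monthly projection is simple arithmetic over the table. The sketch below recomputes it, assuming the projection uses exactly these per-call figures; since chat cost is given as a range, the result is a range, and the €2.41 figure falls inside it.

```python
# Per-call costs in EUR, copied from the table above.
COST = {
    "analysis": 0.234,
    "report": 0.011,
    "chat_min": 0.003,
    "chat_max": 0.008,
    "reco": 0.018,
}


def monthly_range(analyses: int, reports: int, chats: int, recos: int) -> tuple[float, float]:
    """Low/high monthly cost; only chat cost varies with message complexity."""
    fixed = analyses * COST["analysis"] + reports * COST["report"] + recos * COST["reco"]
    return fixed + chats * COST["chat_min"], fixed + chats * COST["chat_max"]


low, high = monthly_range(4, 2, 300, 4)
# Average per-chat cost implied by the 2.41 EUR projection.
implied_chat = (2.41 - (4 * 0.234 + 2 * 0.011 + 4 * 0.018)) / 300
print(f"{low:.2f} .. {high:.2f} EUR, implied avg chat {implied_chat:.4f} EUR")
# → 1.93 .. 3.43 EUR, implied avg chat 0.0046 EUR
```

The non-chat usage is about €1.03 a month, so chat volume is what actually moves the bill; the implied €0.0046 average sits closer to the "simple" end of the chat range.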
That number alone has changed how I think about LLM products. The narrative around AI cost is dominated by the worst-case prices of unbatched, full-context, top-tier API calls. Real applied costs, with batching and a sentiment-vs-themes split, sit about two orders of magnitude below those headline figures. This is worth measuring before pricing a product.
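The batching that drives these numbers can be sketched as below: 20 numbered verbatims per prompt and one JSON array back, so, for example, 240 verbatims take 12 calls instead of 240. This is a sketch under assumptions: the prompt wording and the `{"id", "sentiment"}` response shape are hypothetical, and the model call is injected as `call_model` so the batching logic stays testable. In the service, that callable would wrap `anthropic.Anthropic().messages.create(...)`.

```python
import json
from typing import Callable

BATCH_SIZE = 20  # from the text: 20 verbatims per prompt


def chunks(items: list[str], size: int = BATCH_SIZE) -> list[list[str]]:
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_prompt(batch: list[str]) -> str:
    # Hypothetical wording; numbering lets each answer be mapped back to its input.
    lines = [f"{i}. {text}" for i, text in enumerate(batch, start=1)]
    return (
        "Classify the sentiment of each numbered comment as positive, neutral or negative.\n"
        'Answer with a JSON array of objects like {"id": 1, "sentiment": "positive"}.\n\n'
        + "\n".join(lines)
    )


def classify_all(verbatims: list[str], call_model: Callable[[str], str]) -> list[str]:
    """One model call per batch of 20 instead of one call per verbatim."""
    results: list[str] = []
    for batch in chunks(verbatims):
        parsed = json.loads(call_model(build_prompt(batch)))
        by_id = {item["id"]: item["sentiment"] for item in parsed}
        # Preserve input order; a missing id raises KeyError instead of silently shifting rows.
        results.extend(by_id[i] for i in range(1, len(batch) + 1))
    return results
```

Mapping answers by `id` rather than by position is the main guard here: if the model skips or reorders an item, the mismatch surfaces as an error rather than as a sentiment attached to the wrong comment.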
The plan-gating incident
Discovered during the V3 pre-release audit rather than reported from production. The user base was small enough that no real customer had hit it yet, but every newly signed-up Pro account was silently degraded to Basic.
Root cause
The JWT issued by `registerStepOne` contained `{ email, id, userType }` but not `companyId`. The endpoint `/api/v2/plans/me` required `companyId` to resolve the user's plan and returned 401 without it. On the frontend, `PlanContext` caught the 401 and silently fell back to `currentPlan: 'basic'` instead of surfacing the error.
Net effect: paid Pro features were gated off for new accounts. Old sessions kept working because their JWT pre-dated the change.
Fix
Single commit, `cf5c9ea`. It introduces a helper, `resolveCompanyId(req)`, that loads the admin with `Admin.findById(req.user.id)` and takes `companies[0]` instead of trusting `companyId` from the JWT. Applied to `/plans` and to `/stripe`, whose subscription routes had the same bug (same root cause, same fix).
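The pattern behind the fix is language-neutral: trust only the user id from the token and read the company link from the database. It is sketched here in Python to keep one language across examples; the real helper is Express/Mongoose code. `Admin` is a hypothetical in-memory stand-in for the Mongoose model, and raising `Unauthorized` on a missing link is an assumption added so the failure is loud rather than a silent downgrade.

```python
from dataclasses import dataclass, field


class Unauthorized(Exception):
    """Raised instead of quietly defaulting to a lower plan."""


@dataclass
class Admin:
    id: str
    companies: list[str] = field(default_factory=list)


def resolve_company_id(user_id: str, admins: dict[str, Admin]) -> str:
    # The token only supplies the user id; companyId comes from the stored admin record.
    admin = admins.get(user_id)
    if admin is None or not admin.companies:
        raise Unauthorized(f"no company linked to user {user_id}")
    # Mirrors the fix described above: the first linked company wins.
    return admin.companies[0]
```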
Bonus discovery in the same audit: two backend routes relied on a `requireFeature('benchmarks_internal')` check that only ran in the frontend, so a motivated user could have called the API directly and bypassed it. Patched in the same commit, before anyone noticed externally.
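Server-side enforcement means the check wraps the handler itself, whatever the UI shows. A minimal Python sketch of the idea (in Express this would be a route middleware): `require_feature` borrows the feature name from the text, while the `PLAN_FEATURES` matrix, the `Forbidden` exception, and passing the plan as the handler's first argument are all hypothetical.

```python
from functools import wraps
from typing import Any, Callable

# Hypothetical plan-to-feature matrix; the real one is not described in the text.
PLAN_FEATURES: dict[str, set[str]] = {
    "basic": set(),
    "pro": {"benchmarks_internal"},
}


class Forbidden(Exception):
    pass


def require_feature(feature: str):
    """Reject the call on the server if the caller's plan lacks the feature."""
    def decorator(handler: Callable[..., Any]):
        @wraps(handler)
        def wrapper(plan: str, *args: Any, **kwargs: Any) -> Any:
            if feature not in PLAN_FEATURES.get(plan, set()):
                raise Forbidden(f"plan '{plan}' lacks '{feature}'")
            return handler(plan, *args, **kwargs)
        return wrapper
    return decorator


@require_feature("benchmarks_internal")
def internal_benchmarks(plan: str) -> dict:
    return {"benchmarks": []}
```

The frontend check can stay for UX, but it becomes a hint; the decorator is what actually protects the data.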
What's worth reading
- `analyze.py`: the seven-step pipeline, commented
- `anonymizer.py`: 12 GDPR-critical lines
- `driver_service.py`: RandomForest + signed correlation
- `engagementScoreService.js`: adaptive weighting (70/30/0 without AI, 50/20/30 with) with graceful fallback
- `analyticsService.js`: pre-aggregation `bulkWrite` with a fire-and-forget alert hook
- `server.js`: security posture (helmet, sanitize, three rate-limit tiers)
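The adaptive weighting in `engagementScoreService.js` can be illustrated with a small Python sketch. Only the two weight sets come from the text; the first two components are not named there, so `primary` and `secondary` are placeholder names, and treating "no AI score" as `None` is an assumption about how the fallback is triggered.

```python
from typing import Optional

# Weight sets from the text. Component names other than "ai" are placeholders.
WEIGHTS_WITHOUT_AI = (0.70, 0.30, 0.00)
WEIGHTS_WITH_AI = (0.50, 0.20, 0.30)


def engagement_score(primary: float, secondary: float, ai: Optional[float] = None) -> float:
    """Weighted score on the inputs' shared scale; falls back when no AI score exists."""
    if ai is None:
        # Fallback: no AI analysis (Basic plan or pipeline unavailable).
        w_p, w_s, _ = WEIGHTS_WITHOUT_AI
        return w_p * primary + w_s * secondary
    w_p, w_s, w_ai = WEIGHTS_WITH_AI
    return w_p * primary + w_s * secondary + w_ai * ai
```

For example, inputs of 80 and 60 give 74.0 without an AI score and 67.0 with an AI score of 50. Both weight sets sum to 1, so scores stay on the same scale whichever branch runs.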
Retrospective
Five things I would change today:
- MongoDB M0 is too tight. Pre-aggregations buy headroom, but one more dimension or one multi-tenant client and the quota cracks. M10 at ~$57/month is the obvious move the moment Stripe goes live.
- Tests should exist. Zero automated tests for 21 routes, 19 controllers, 10 services, 16 collections. The pre-release audit found four critical issues that a baseline `supertest` + `pytest` harness would have caught for free. The honest reason there are none: I prioritized shipping over coverage. That math flips fast.
- Cost instrumentation in the codebase, not external. The €2.86 figure exists because I logged every call manually outside the application. That data should live in a collection next to `ai_analyses` and be exposed in the admin UI. I'm building a product whose unit economics depend on LLM spend; not measuring it inside the product is the first gap I'd close.
- The Brevo sender has been unverified for two months. The reminder engine, one of the most visible Pro features, ships emails from a personal Gmail address. SPF and DKIM are not set up, deliverability is silently degraded, and this is a product risk more than a technical one. Fixing it requires the client to do the DNS work, which is a different kind of blocker than I'm used to handling.
- CRA → Vite. The frontend carries two icon libraries (heroicons, lucide) plus two word-cloud libraries (react-tagcloud, wordcloud), along with 20 orphan components flagged by the audit. A migration to Vite plus a one-afternoon cleanup would meaningfully cut cold start and bundle size. The reason it hasn't happened is purely "later".
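In-product cost instrumentation could start as small as the ledger below. It is a sketch under assumptions: `CostLedger` and `LlmCall` are hypothetical names, prices are constructor parameters (plug in the current rate card and EUR conversion rather than trusting any hard-coded figure), and persistence to a collection next to `ai_analyses` is left out. With the `anthropic` SDK, the token counts are available on `response.usage.input_tokens` and `response.usage.output_tokens`, so `record` could be called right after each `messages.create`.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class LlmCall:
    action: str  # action label, e.g. "analysis", "report", "chat", "reco"
    input_tokens: int
    output_tokens: int
    at: datetime


class CostLedger:
    """In-process record of every model call, priced per million tokens."""

    def __init__(self, eur_per_mtok_in: float, eur_per_mtok_out: float):
        self.price_in = eur_per_mtok_in
        self.price_out = eur_per_mtok_out
        self.calls: list[LlmCall] = []

    def record(self, action: str, input_tokens: int, output_tokens: int) -> None:
        self.calls.append(
            LlmCall(action, input_tokens, output_tokens, datetime.now(timezone.utc))
        )

    def cost(self, call: LlmCall) -> float:
        return (call.input_tokens * self.price_in + call.output_tokens * self.price_out) / 1_000_000

    def by_action(self) -> dict[str, float]:
        # Same breakdown as the per-action cost table, computed from live data.
        totals: dict[str, float] = defaultdict(float)
        for call in self.calls:
            totals[call.action] += self.cost(call)
        return dict(totals)
```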
Build
Workflow: spec → plan-first session → parallel subagents → automated review → manual call on the gray areas. Production decisions, architecture, debugging, and incident response are mine. Code generation is the agent's. The portfolio itself documents this workflow in /projects/portfolio.