Date: 2026-05-31
Status: Design approved by Ralf, pending written-spec review
Repo: ~/projects/agent-metrics (HumanHours, Next.js 16 + Supabase EU + Stripe monorepo)
1. Summary
Add an outside-in company-research engine to HumanHours. Given one or many company domains, the engine returns an enriched company record: company profile, estimated headcount per role, fully-loaded labour cost per role, total annual labour cost, and an AI-automation business case, each with a confidence score and cited sources.
Enriched companies live in a per-org library. Users keep, view, download (CSV/JSON), and pull them via the API to drive outreach campaigns. That outreach use case, file + API access to enriched companies, is the primary product goal. The CFO-style business case rides along on the same record.
This is a native rebuild of the 7-workflow n8n research engine built by Chris Anejo and Esther Ayodele. n8n is retired. We do not port 1:1; we keep what works (the pipeline shape, the validated ~80% logic) and fix the technical debt (see §3).
2. Scope
In scope (v1)
- Single + bulk company enrichment from a domain.
- Per-org company library with ownership, tags, and an external_id for outreach matching.
- Download as file (CSV + JSON) and a list/pull API for outreach.
- Confidence per datapoint, rolled up cost-weighted.
- Lookup-based quota and pricing (new "research lookups" dimension, hard cap per plan, upgrade to raise, no overage billing).
- A code-based accuracy eval gating >=80% against a versioned ground-truth set.
Out of scope (v1, explicitly deferred)
- PowerPoint / branded deck export (outreach needs file + API, not a deck). Later.
- Tying the estimate into the existing track/measure loop (the role -> task_type "vooraf = achteraf" moat, i.e. the up-front estimate equals the after-the-fact measurement, 87% verified). Later, designed not to be blocked by v1.
- Lifecycle phases and full two-dimensional ROI (value creation). Light/later.
- Licensed enrichment APIs (Apollo/Coresignal/PDL, KVK/Companies House). Adapter seam designed in now; integration later.
- Seniority (junior/medior/senior) as an exposed dimension. Schema is seniority-ready (see §5) but v1 uses one blended loaded wage per role.
3. Decisions log
Approved with Ralf on 2026-05-31:
- v1 = standalone enrichment product. Library + download + API for outreach; the measure-loop integration comes later.
- Data source = port the n8n approach (LLM research + official labour statistics + seeded survey reference tables), with an adapter seam for licensed APIs later. Global, not country-limited: wages resolve on-demand via grounded research for any country/role (Sonar fetches the prevailing wage with a source), cached per country/role in wages so it is paid once. Seeded reference tables are an optional fast-path, never a restriction to a fixed country set.
- Bulk infra = existing pg_cron + claim-based worker (SKIP LOCKED), no Inngest.
- Lookup billing = consume a lookup on a new company or a user-requested refresh. Existing enriched companies are always free to view/download/API. Stale data shows an "X days old, refresh?" badge; refresh is user-initiated. No auto-refresh charging. No overage billing: each plan has a hard monthly lookup cap; when it is reached the enrich/refresh endpoints return an upgrade-required error. The user upgrades to raise the cap. Bundles: Free 10 one-time, Pro 100/mo, Agency 500/mo pooled, Enterprise custom.
- Seniority = blended now, schema seniority-ready.
- Research grounding = Perplexity Sonar (perplexity/sonar via OpenRouter), real web search with citations, not a model-only call. This matches the original Asana intent; the n8n node was misnamed and actually ran Claude without web grounding.
Improvements over the n8n flows (not a 1:1 port)
- Deterministic core, LLM only at the edges. All math (wage lookup, loaded-cost, headcount distribution, automation savings, aggregation, confidence rollup) moves to pure, unit-tested functions in packages/core. The LLM does only two bounded jobs: grounded research/extraction, and the business-case narrative.
- Citable reference data instead of hardcoded tables. n8n hid wage figures and employer factors in Code nodes with live stat-calls disabled. We make them versioned, seeded reference tables with a source per datapoint. This is the Methodology moat.
- Cache-first that actually runs. n8n left cache-read nodes disabled, so every run re-ran the LLM. Cache-first is a tested first-class path here.
- Confidence as a real source-tier model. Never silently floor a wage to 0 (the weakest dimension in n8n validation). Explicit tiers, cost-weighted rollup.
- Code-based eval in CI against a versioned ground-truth fixture, not Google Sheets with half-disabled LLM comparisons.
- One clean path, one provider. Drop duplicate/disabled nodes, HTTP-vs-subworkflow variants, and the two-credential mess (n8n used both "HumanHours" and "Zorgservice XL" OpenRouter creds). One OpenRouter config routing to perplexity/sonar + Claude.
4. Architecture
Client / API ──► /v1/companies       (sync, 1)      ┐
             └─► /v1/companies/bulk  (async, many)  ┤
                                                    ▼
                                           enrichment service
                                        (apps/web/lib/enrichment)
                                                    │
  ┌───────────────────────┬─────────────────────────┼───────────────────────────┐
  ▼                       ▼                         ▼                           ▼
company_research        wages cache               reference data              deterministic core
cache (global,          (global, per              (seeded, versioned:         (packages/core/enrichment)
per domain, TTL)        country/role/year)        wages, employer factors,    pure fns: normalise roles,
  │                       │                       role distributions,         loaded-cost, aggregate,
  ▼                       ▼                       automation rates)           automation savings,
Sonar grounded          official stats                                        confidence rollup
research + Claude       (Eurostat/CBS/ECB)
extraction (LLM edge)   + seeded fallback
                                                    │
                                                    ▼
                                  org_companies (per-org library, RLS)
                                    company_lookups (billing ledger)
Background bulk: pg_cron triggers /api/cron/enrichment-worker every ~1 min, which
claims queued enrichment_job_items with SKIP LOCKED, runs the same enrichment
service per domain, writes to org_companies, and updates job counts with retry/backoff.
Deterministic core vs LLM edges
- LLM edge 1, grounded research: perplexity/sonar searches the web and returns company facts with citations. Claude then extracts these into the fixed JSON shape. Output: legal name, country, industry, headcount estimate, departments/roles, channels, languages, and a source URL per material fact.
- Deterministic core (packages/core/enrichment/): role normalisation to a canonical set, headcount distribution from a seeded industry->role-distribution benchmark (LLM fills only gaps), wage lookup (cache-first on wages, then official stat, then seeded reference), loaded-cost via one canonical employer-factor function, annual cost aggregation, automation savings via seeded role automation-rates, net-ROI (subtract agent cost), and cost-weighted confidence rollup. All unit-tested, reproducible.
- LLM edge 2, narrative: Claude writes the business-case summary, top automation opportunity, and risk factors from the deterministic numbers. Text only, never math.
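The cost steps of the deterministic core are plain arithmetic, which is what makes them reproducible and unit-testable. A minimal TypeScript sketch follows, assuming hypothetical names (RoleEstimate, loadedCost, annualLabourCost, automationSavings, netRoi) and one employer factor per company; the real packages/core/enrichment signatures are not defined in this spec.

```typescript
// Hypothetical shape: the spec names the steps, not these types.
interface RoleEstimate {
  role: string;
  headcount: number;
  grossAnnualWage: number; // blended gross wage per FTE
  automationRate: number;  // automatable share of the role, 0..1 (seeded per role)
}

// Loaded cost per FTE = gross wage x employer-cost factor (e.g. UK 1.25).
function loadedCost(grossAnnualWage: number, employerFactor: number): number {
  return grossAnnualWage * employerFactor;
}

// Total annual labour cost across all roles.
function annualLabourCost(roles: RoleEstimate[], employerFactor: number): number {
  let total = 0;
  for (let i = 0; i < roles.length; i++) {
    total += roles[i].headcount * loadedCost(roles[i].grossAnnualWage, employerFactor);
  }
  return total;
}

// Gross automation savings: each role's loaded cost scaled by its automation rate.
function automationSavings(roles: RoleEstimate[], employerFactor: number): number {
  let total = 0;
  for (let i = 0; i < roles.length; i++) {
    const r = roles[i];
    total += r.headcount * loadedCost(r.grossAnnualWage, employerFactor) * r.automationRate;
  }
  return total;
}

// Net ROI as described in the spec: savings minus the cost of running the agents.
function netRoi(savings: number, agentCost: number): number {
  return savings - agentCost;
}

// Illustrative numbers only; none of these figures come from the reference data.
const exampleRoles: RoleEstimate[] = [
  { role: "customer_support", headcount: 10, grossAnnualWage: 40000, automationRate: 0.5 },
];
const exampleTotal = annualLabourCost(exampleRoles, 1.25);
const exampleSavings = automationSavings(exampleRoles, 1.25);
const exampleNet = netRoi(exampleSavings, 30000);
```

With the UK factor of 1.25, ten FTEs at a gross wage of 40,000 come to a loaded annual cost of 500,000. A 0.5 automation rate gives 250,000 in savings, and an assumed agent cost of 30,000 leaves a net of 220,000.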
5. Data model
New migrations in packages/db/migrations/, RLS on every org-scoped table, following
existing conventions (uuid PKs, updated_at trigger, generated columns where it helps).
company_research (global cache, service-role only, not org-scoped)
One research result per domain, reused across all orgs. This is the cost-saver; it is never exposed directly and does not affect billing.
- id uuid pk, domain text unique
- company_data jsonb (profile + sources), roles jsonb (per-role headcount + seniority_mix?), wages jsonb, totals jsonb, business_case jsonb, confidence jsonb
- model_versions jsonb (which Sonar/Claude/reference-data versions produced this)
- researched_at timestamptz, ttl_days int (default: company 30)
- sources jsonb (cited URLs)
roles[] and wages jsonb shapes are seniority-ready: wage_data can later carry
{ blended, junior, medior, senior } and a role can carry seniority_mix. v1 populates
blended only.
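One way to read "seniority-ready" is that the blended wage is always present and the per-band wages are optional. The sketch below assumes a hypothetical WageData type and wageFor helper, and a fallback to blended when a band is missing; none of these names or rules are fixed by the spec.

```typescript
// Assumed shape of the wage_data jsonb: blended is required, bands are optional.
interface WageData {
  blended: number;
  junior?: number;
  medior?: number;
  senior?: number;
}

type Seniority = "junior" | "medior" | "senior";

// v1 callers omit `seniority` and always get the blended wage.
// A later version could pass a band; a missing band falls back to blended
// (assumption) so no caller ever receives a silent 0.
function wageFor(data: WageData, seniority?: Seniority): number {
  if (seniority === undefined) return data.blended;
  const band = data[seniority];
  return band === undefined ? data.blended : band;
}
```

Because v1 rows carry only blended, adding bands later becomes a data backfill rather than a schema change.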
org_companies (per-org library, RLS)
What the user owns, views, downloads, and pulls via API.
- id uuid pk, org_id uuid fk, domain text
- research_id uuid fk -> company_research
- external_id text null (customer's own id for outreach matching), tags text[]
- added_at timestamptz, last_refreshed_at timestamptz
- unique (org_id, domain)
wages (global cache, service-role only)
- id uuid pk, country text, role text, year int
- wage_data jsonb (seniority-ready), source text, source_url text, confidence numeric
- last_researched timestamptz, ttl_days int (default: wages 90)
- unique (country, role, year)
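Both global caches carry a research timestamp and a ttl_days column, so the cache-first decision and the "X days old" staleness badge can share one pure function. This is a sketch under assumptions: the helper names are hypothetical, and treating an entry as stale once its age reaches ttl_days is one possible boundary choice.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole days elapsed since the entry was researched (drives the staleness badge).
function ageInDays(researchedAt: Date, now: Date): number {
  return Math.floor((now.getTime() - researchedAt.getTime()) / MS_PER_DAY);
}

// Cache-first rule: serve the cached row only while it is younger than its TTL.
function isFresh(researchedAt: Date, ttlDays: number, now: Date): boolean {
  return ageInDays(researchedAt, now) < ttlDays;
}

// The same timestamp checked against the two defaults from the data model.
const researched = new Date("2026-01-01T00:00:00Z");
const today = new Date("2026-03-01T00:00:00Z");
const wageStillFresh = isFresh(researched, 90, today);    // wages default TTL
const companyStillFresh = isFresh(researched, 30, today); // company default TTL
```

After 59 days the wage row is still inside its 90-day TTL, while the company row has passed its 30-day TTL and would show the refresh badge.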
enrichment_jobs (per-org, RLS)
- id uuid pk, org_id uuid fk, status text (queued|running|done|failed)
- input_count int, accepted int, rejected int, duplicates_removed int
- created_at, started_at, finished_at timestamptz
enrichment_job_items
- id uuid pk, job_id uuid fk, org_id uuid fk, domain text
- status text (queued|running|done|failed), attempts int, retry_after timestamptz
- error text, created_at, updated_at
- index on (status, retry_after) for the claim query
company_lookups (billing ledger, append-only, per-org)
Drives metering, tier-bundle counting, and audit. One row per charged lookup.
- id uuid pk, org_id uuid fk, domain text, kind text (new|refresh)
- billing_period text (e.g. 2026-05), created_at timestamptz
- index on (org_id, billing_period)
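Because the ledger is append-only with one row per charged lookup, usage for a period is just a row count. The sketch below assumes a hypothetical in-memory LookupRow shape and a UTC calendar month as the billing period; the spec only gives the 2026-05 key format, and in production the count would be a query on the (org_id, billing_period) index.

```typescript
// Hypothetical in-memory mirror of a company_lookups row.
interface LookupRow {
  orgId: string;
  domain: string;
  kind: "new" | "refresh";
  billingPeriod: string; // e.g. "2026-05"
}

// Period key in the YYYY-MM form. Using UTC is an assumption.
function billingPeriodFor(date: Date): string {
  const month = date.getUTCMonth() + 1;
  return date.getUTCFullYear() + "-" + (month < 10 ? "0" + month : String(month));
}

// Lookups charged to one org in one billing period.
function lookupsUsed(ledger: LookupRow[], orgId: string, period: string): number {
  return ledger.filter((row) => row.orgId === orgId && row.billingPeriod === period).length;
}

const sampleLedger: LookupRow[] = [
  { orgId: "org-a", domain: "example.com", kind: "new", billingPeriod: "2026-05" },
  { orgId: "org-a", domain: "example.com", kind: "refresh", billingPeriod: "2026-05" },
  { orgId: "org-a", domain: "acme.io", kind: "new", billingPeriod: "2026-04" },
  { orgId: "org-b", domain: "example.com", kind: "new", billingPeriod: "2026-05" },
];
const usedInMay = lookupsUsed(sampleLedger, "org-a", billingPeriodFor(new Date("2026-05-31T12:00:00Z")));
```

Here org-a has used two lookups in 2026-05: one new and one refresh. The April row and the other org's row are not counted.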
Reference data (seeded, versioned, the Methodology)
Seeded via packages/db/scripts/ and/or migrations, each row cites a source:
- employer-cost factors per country (NL 1.40, DE 1.21, UK 1.25, US 1.30, FR 1.45, BE 1.35, ...)
- industry -> role-distribution benchmarks
- role automation-rates (cited, e.g. McKinsey MGI 2025)
- seeded national wage reference tables (fallback when live stats are unavailable)
refresh_log
Audit of any reference-data / cache-warming refresh runs.
Confidence model
Per datapoint a source tier (highest to lowest):
fetched+cited > official statistic > seeded reference table > LLM-inferred >
hard fallback. Each tier maps to a confidence band; the overall company confidence is
the cost-weighted average across roles/wages. Never silently emit 0; emit an explicit
low-confidence / unknown state instead.
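The rollup can be sketched as a cost-weighted mean over roles, with an explicit unknown state instead of a numeric 0. The tier names follow the list above, but the band values are placeholders chosen for illustration, and the RoleCost and Rollup types are hypothetical.

```typescript
type SourceTier =
  | "fetched_cited"
  | "official_statistic"
  | "seeded_reference"
  | "llm_inferred"
  | "hard_fallback";

// Placeholder bands: the spec fixes the tier order, not these numbers.
const TIER_CONFIDENCE: Record<SourceTier, number> = {
  fetched_cited: 0.9,
  official_statistic: 0.8,
  seeded_reference: 0.6,
  llm_inferred: 0.4,
  hard_fallback: 0.2,
};

interface RoleCost {
  role: string;
  annualCost: number; // loaded annual cost for the role, used as the weight
  tier: SourceTier;   // source tier of the wage behind that cost
}

type Rollup = { state: "scored"; confidence: number } | { state: "unknown" };

// Cost-weighted average: expensive roles dominate the company-level confidence.
function rollupConfidence(roles: RoleCost[]): Rollup {
  let totalCost = 0;
  let weighted = 0;
  for (let i = 0; i < roles.length; i++) {
    totalCost += roles[i].annualCost;
    weighted += roles[i].annualCost * TIER_CONFIDENCE[roles[i].tier];
  }
  // No cost to weight by: report "unknown" rather than a silent 0.
  if (totalCost <= 0) return { state: "unknown" };
  return { state: "scored", confidence: weighted / totalCost };
}
```

With these placeholder bands, a 300,000 role backed by an official statistic and a 100,000 role that is LLM-inferred roll up to (300,000 x 0.8 + 100,000 x 0.4) / 400,000 = 0.7.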
6. API surface
All under /v1, Node runtime, authenticated with existing api_keys (argon2id) and
gated by plan-gate.ts. New scopes: companies:read, companies:write. Same
machine-readable error format { error: { code, message, field?, hint? } }.
- POST /v1/companies: enrich one domain. Charges a lookup if the company is new to this org (or ?refresh=true). Returns the enriched record, or 202 + job_id if the research is cold and runs async.
- POST /v1/companies/bulk: body { domains: [...] } or CSV upload. Normalises + dedupes, creates a job + queued items, returns 202 { job_id, accepted, rejected, duplicates_removed }.
- GET /v1/companies: the org library, paginated, filter by tag/added_at, format=json|csv|ndjson. This is the outreach pull.
- GET /v1/companies/{domain}: one enriched record.
- POST /v1/companies/{domain}/refresh: user-requested refresh, charges a lookup.
- GET /v1/companies/export: whole (or filtered) library as a downloadable file.
- GET /v1/jobs/{id}: bulk job status (% complete, counts).
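The "normalises + dedupes" step of the bulk endpoint can be illustrated with a small pure function. The normalisation rules here (lowercase, strip scheme, path, port and a leading www.) are assumptions, as are the names normaliseDomain and intakeDomains; the spec only states that accepted, rejected and duplicates_removed counts are returned.

```typescript
interface BulkIntake {
  accepted: string[];   // normalised, unique domains that become queued items
  rejected: string[];   // raw inputs that are not a usable domain
  duplicatesRemoved: number;
}

// Reduce a pasted URL or hostname to a bare domain, or null if it is not one.
function normaliseDomain(input: string): string | null {
  let d = input.trim().toLowerCase();
  d = d.replace(/^[a-z][a-z0-9+.-]*:\/\//, ""); // scheme, e.g. https://
  d = d.replace(/[\/?#].*$/, "");               // path, query, fragment
  d = d.replace(/:\d+$/, "");                   // port
  d = d.replace(/^www\./, "");                  // leading www.
  // Dot-separated labels of letters, digits and inner hyphens; at least one dot.
  const hostname = /^[a-z0-9]([a-z0-9-]*[a-z0-9])?(\.[a-z0-9]([a-z0-9-]*[a-z0-9])?)+$/;
  return hostname.test(d) ? d : null;
}

function intakeDomains(inputs: string[]): BulkIntake {
  const seen: Record<string, boolean> = {};
  const out: BulkIntake = { accepted: [], rejected: [], duplicatesRemoved: 0 };
  for (let i = 0; i < inputs.length; i++) {
    const domain = normaliseDomain(inputs[i]);
    if (domain === null) {
      out.rejected.push(inputs[i]);
    } else if (seen[domain]) {
      out.duplicatesRemoved += 1;
    } else {
      seen[domain] = true;
      out.accepted.push(domain);
    }
  }
  return out;
}

const intake = intakeDomains([
  "https://www.Example.com/about",
  "example.com",
  "not a domain",
  "acme.io:443",
]);
```

For this input, example.com and acme.io are accepted, "not a domain" is rejected, and one duplicate is removed. The 202 response would report the lengths of these lists as its counts.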
Lookup charging rules
- A lookup is consumed when a domain is added to an org's library (new) or refreshed.
- Serving from the global company_research cache still consumes a lookup for the org (it is new data to them); the cache only saves us the LLM cost. That margin is intentional.
- Viewing, listing, downloading, and API-pulling existing library entries is always free.
- Hard cap: if the org's lookups this billing period have reached the plan bundle, the enrich/refresh endpoints return 402 { error.code = "lookup_quota_exceeded", hint: "upgrade your plan" }. No overage is billed. Caps: Free 10 one-time, Pro 100/mo, Agency 500/mo pooled across the agency tree, Enterprise custom.
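The charging rules reduce to one decision per enrich or refresh call. The sketch below uses hypothetical names (decideLookup, Usage, Decision) and assumes usage arrives pre-aggregated, with Agency already pooled across the tree. Enterprise is left out because its cap is custom.

```typescript
type Plan = "free" | "pro" | "agency";

interface Cap {
  limit: number;
  window: "one_time" | "monthly";
}

// Caps as listed in the spec. Enterprise is custom and not modelled here.
const CAPS: Record<Plan, Cap> = {
  free: { limit: 10, window: "one_time" },
  pro: { limit: 100, window: "monthly" },
  agency: { limit: 500, window: "monthly" },
};

// Hypothetical pre-aggregated usage (for Agency: summed across the agency tree).
interface Usage {
  thisPeriod: number;
  allTime: number;
}

type Decision =
  | { ok: true; charge: boolean }
  | { ok: false; status: 402; error: { code: "lookup_quota_exceeded"; message: string; hint: string } };

function decideLookup(plan: Plan, usage: Usage, alreadyInLibrary: boolean, refresh: boolean): Decision {
  // Existing library entries are free unless the user explicitly refreshes.
  if (alreadyInLibrary && !refresh) return { ok: true, charge: false };
  const cap = CAPS[plan];
  const used = cap.window === "one_time" ? usage.allTime : usage.thisPeriod;
  if (used >= cap.limit) {
    return {
      ok: false,
      status: 402,
      error: {
        code: "lookup_quota_exceeded",
        message: "Lookup cap reached for this plan.",
        hint: "upgrade your plan",
      },
    };
  }
  // New to the org, or a refresh: charge even when served from the global cache.
  return { ok: true, charge: true };
}
```

The function never checks whether the global company_research cache was hit, which matches the rule that a cached result still consumes a lookup for an org that has not seen it before.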
7. Bulk processing
- enrichment_job_items is a queue. /api/cron/enrichment-worker (pg_cron, ~1 min) claims a batch with UPDATE ... SET status='running' ... WHERE status='queued' AND (retry_after IS NULL OR retry_after <= now()) ORDER BY created_at LIMIT N FOR UPDATE SKIP LOCKED, runs the enrichment service per item, writes results, updates parent job counts, and sets the job done when no queued/running items remain.
- Failures: exponential backoff (1/5/15 min), max 3 attempts, then failed with the error recorded. Configurable batch size and concurrency.
- No auto-reindex of org libraries (per decision 4). Optional later: opportunistic global-cache warming for popular domains.
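Two details of the worker are worth a sketch. In PostgreSQL, ORDER BY, LIMIT and FOR UPDATE SKIP LOCKED belong to SELECT, so the claim is usually written as an UPDATE over a locking subquery. The retry schedule is a small pure function. The column list in RETURNING, the names CLAIM_SQL and nextStepAfterFailure, and the reading of "max 3 attempts" as three retries after the first run (so each of the 1/5/15 minute delays is used once) are all assumptions.

```typescript
// Claim query, parameterised on batch size ($1). The inner SELECT takes the row
// locks and skips rows another worker already holds; the outer UPDATE marks them.
const CLAIM_SQL = `
UPDATE enrichment_job_items
SET status = 'running'
WHERE id IN (
  SELECT id
  FROM enrichment_job_items
  WHERE status = 'queued'
    AND (retry_after IS NULL OR retry_after <= now())
  ORDER BY created_at
  LIMIT $1
  FOR UPDATE SKIP LOCKED
)
RETURNING id, job_id, org_id, domain, attempts`;

const BACKOFF_MINUTES = [1, 5, 15];
const MAX_RETRIES = BACKOFF_MINUTES.length; // assumed reading of "max 3 attempts"

interface RetryDecision {
  status: "queued" | "failed";
  retryAfter: Date | null;
}

// `failures` is how many runs of this item have failed so far, including the
// one that just failed. Failures 1..3 requeue with a growing delay; the 4th gives up.
function nextStepAfterFailure(failures: number, now: Date): RetryDecision {
  if (failures > MAX_RETRIES) return { status: "failed", retryAfter: null };
  const minutes = BACKOFF_MINUTES[failures - 1];
  return { status: "queued", retryAfter: new Date(now.getTime() + minutes * 60 * 1000) };
}
```

If "max 3 attempts" is instead meant as three runs in total, the same function works with MAX_RETRIES set to 2, and the 15-minute delay is never reached.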
8. Pricing and quota (no overage)
- New "research lookups" dimension is a hard monthly cap per plan, not metered overage. Counted from the company_lookups ledger for the current billing period.
- Tier caps in plan-gate.ts: Free 10 one-time, Pro 100/mo, Agency 500/mo pooled across the agency tree, Enterprise custom.
- When the cap is reached, enrich/refresh returns 402 lookup_quota_exceeded with an upgrade hint. No humanhours_lookup Stripe meter and no overage cron: the user upgrades to raise the cap. (The existing event-overage meter humanhours_event is untouched.)
- Cache marketed as a feature: "rerun free, you only pay once" (per-org ownership).
- Price migration Pro EUR 19 -> 49 and Agency EUR 99 -> 249 with 12-month grandfathering is its own guarded sub-step: create new Stripe price objects, keep existing customers on grandfathered prices, migrate new signups first. Treated as a careful billing-ops task, sequenced after the quota lands.
9. Frontend
New authed section app/(app)/companies/, using the existing dark cockpit design
system (mint #4ade80 accent, Geist/Geist Mono).
- Library (/companies): table of enriched companies with confidence badges and staleness badges, tags, bulk-upload (paste list or CSV), download buttons (CSV/JSON), per-row refresh.
- Detail (/companies/{domain}): company profile, labour cost per role, the AI business case, and source attribution per datapoint.
- Pricing page (app/(marketing)/pricing): add the "research lookups" and "bulk" columns and the migrated tier prices. No PowerPoint export in v1.
10. Docs, eval, quality
- Docs (docs/): API docs for the new /v1/companies endpoints, the enriched-record schema, export formats, bulk job lifecycle, and lookup billing. The reference-data methodology gets its own doc (seed of the open "AI Task Baseline Vocabulary" standard).
- Eval: port WF4 as a deterministic, code-based suite against a versioned ground-truth fixture (NL/UK/DE/US mix). Scores headcount/country/wage accuracy and gates overall >=80% in CI. Wage comparison strips the employer factor to compare like for like (ground truth is gross survey wage).
- Env: add the OpenRouter key (and any stat-API keys) to the apps/web/lib/env.ts Zod schema so missing config fails fast, consistent with existing env validation.
- Tests: unit tests for every packages/core/enrichment pure function; integration tests for the API routes and the claim-worker.
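The eval gate can be sketched as a pure scoring function over prediction and ground-truth pairs. Only the 80% gate and the employer-factor stripping come from this spec. The 20% tolerance, the equal weighting of the three checks, and all type and function names are assumptions for illustration.

```typescript
interface GroundTruth {
  domain: string;
  country: string;
  headcount: number;
  grossWage: number; // gross survey wage, no employer costs
}

interface Prediction {
  domain: string;
  country: string;
  headcount: number;
  loadedWage: number;     // engine output, employer costs included
  employerFactor: number; // factor the engine applied, e.g. 1.40 for NL
}

const GATE = 0.8;      // from the spec: overall accuracy must be >= 80%
const TOLERANCE = 0.2; // assumed: a numeric value passes within +/- 20% of truth

function within(actual: number, expected: number, tolerance: number): boolean {
  return Math.abs(actual - expected) <= tolerance * expected;
}

// Share of the three checks (country, headcount, wage) that one case passes.
function scoreCase(p: Prediction, t: GroundTruth): number {
  // Strip the employer factor so gross is compared with gross.
  const grossWage = p.loadedWage / p.employerFactor;
  const checks = [
    p.country === t.country,
    within(p.headcount, t.headcount, TOLERANCE),
    within(grossWage, t.grossWage, TOLERANCE),
  ];
  let passed = 0;
  for (let i = 0; i < checks.length; i++) if (checks[i]) passed += 1;
  return passed / checks.length;
}

// Mean case score across the fixture; an empty fixture scores 0 so it cannot pass.
function overallAccuracy(cases: Array<[Prediction, GroundTruth]>): number {
  if (cases.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < cases.length; i++) sum += scoreCase(cases[i][0], cases[i][1]);
  return sum / cases.length;
}

function passesGate(accuracy: number): boolean {
  return accuracy >= GATE;
}
```

In CI the suite would load the versioned fixture, run the engine per domain, and fail the build when passesGate(overallAccuracy(cases)) is false.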
11. Build sequence (phases)
- Data model + migrations (company_research, org_companies, wages, enrichment_jobs/items, company_lookups, refresh_log) + RLS + seeded reference data.
- Enrichment service: provider config (Sonar + Claude via OpenRouter), grounded research + extraction, deterministic core, cache-first, confidence rollup. Eval suite to >=80%.
- Sync API: POST /v1/companies, GET list/detail, refresh, export (CSV/JSON), new api-key scopes, lookup charging + ledger.
- Bulk: POST /v1/companies/bulk + pg_cron claim-worker + GET /v1/jobs/{id}.
- Pricing/quota: lookup caps in plan-gate (hard cap + 402 upgrade, no overage), pricing page columns. Then the guarded EUR 49/249 migration + grandfathering.
- Frontend: library + detail + upload + download + badges.
- Docs + methodology doc + SDK touch-ups.
12. Open questions
- Exact Sonar model tier (sonar vs sonar-pro) and per-lookup cost ceiling to stay within ~EUR 0.10-0.30 / company.
- Which official statistics to call live (Eurostat lc_lci_lev, CBS, ECB FX) vs rely on seeded reference tables, per country, balancing reliability against the >=80% gate.
- CSV column schema for the outreach export (which fields downstream campaigns need).
- Whether bulk on Pro is capped (e.g. max 10/job) or Agency-only, an open pricing decision carried over from the vault pricing note.