Enrichment Engine, Phase 1: Data Model + Seeded Reference Data, Implementation Plan · HumanHours docs

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Lay the complete Supabase schema and versioned reference-data foundation for the HumanHours company-enrichment engine, plus the shared Zod types, so later phases (enrichment service, API, bulk, pricing, frontend) build on a settled data model.

Architecture: Three new SQL migrations (core cache tables, job/billing tables, reference-data tables) following the repo's existing migration conventions, plus Zod schemas in packages/types for the enriched-company record (seniority-ready), and a verify script that asserts the schema and seed data. No application logic, no LLM calls, no API routes in this phase.

Tech Stack: Supabase Postgres (RLS, public. schema, gen_random_uuid(), RLS helpers is_org_member/has_org_role_in), the migrate.ts idempotent runner, Zod 3, Vitest, pg + tsx scripts.

Context the engineer needs

Migration runner: pnpm --filter @agent-metrics/db migrate applies every .sql in packages/db/migrations/ not yet recorded in public._am_migrations. It needs DATABASE_URL (the Supabase "Direct connection" URI, port 5432) in apps/web/.env.local. Migrations are immutable once applied; never edit an applied file, always add a new numbered one. Latest existing file is 0038_dashboard_aggregates.sql, so new files are 0039, 0040, 0041.
RLS conventions (see packages/db/migrations/0023_webhooks.sql): every table gets alter table ... enable row level security. Org-scoped read: using (public.is_org_member(org_id)). Admin write: using (public.has_org_role_in(org_id, array['owner','admin'])). Service-role (the API and scripts) bypasses RLS entirely, so global/service-only tables get RLS enabled with no member policy.
Seed-via-migration is an established pattern (0022_seed_task_types_full.sql seeds reference rows). We use it for reference data so the "methodology" is versioned in git.
Plan-name caveat: packages/types/src/index.ts OrgPlanSchema is ["free","starter","growth","scale"], but DB migrations use free/pro/agency/enterprise. Do NOT resolve this here. Phase 1 introduces nothing plan-named. The pricing phase plan will reconcile it.
Tables created-but-populated-later: role_distributions, automation_rates, and wage_reference are created in this phase with their final shape, but their numeric rows are sourced and seeded in Phase 2 (the data-source/methodology task), where live Eurostat/CBS and cited benchmarks are wired in. This is a deliberate phase boundary, not a placeholder: the verify script asserts these tables exist and are empty, and Phase 2's plan will assert they are populated.

File structure for this phase

Create: packages/db/migrations/0039_enrichment_core.sql — company_research, wages, org_companies + RLS.
Create: packages/db/migrations/0040_enrichment_jobs.sql — enrichment_jobs, enrichment_job_items, company_lookups, refresh_log + RLS + indexes.
Create: packages/db/migrations/0041_enrichment_reference.sql — canonical_roles, employer_factors, role_distributions, automation_rates, wage_reference + seed for the first two.
Create: packages/types/src/enrichment.ts — Zod schemas for the enriched record (seniority-ready).
Modify: packages/types/src/index.ts — re-export ./enrichment.
Create: packages/types/src/enrichment.test.ts — Vitest unit tests for the schemas.
Create: packages/db/scripts/verify-enrichment.ts — asserts schema + seed via pg.

Task 1: Migration 0039, core cache + library tables

Files:

Create: packages/db/migrations/0039_enrichment_core.sql
Step 1: Write the migration

-- 0039_enrichment_core.sql
-- Company-enrichment engine, core tables.
--
-- company_research is a GLOBAL cache: one outside-in research result per
-- domain, reused across every org. It is service-role-only (RLS on, no member
-- policy) and never billed against directly; org ownership lives in
-- org_companies. wages is a GLOBAL per-(country,role,year) cache of the loaded
-- hourly cost. org_companies is the PER-ORG library the customer owns, views,
-- downloads and pulls via the API.
 
-- 1. Global research cache. company_data/roles/wages/totals/business_case/
-- confidence are jsonb so the engine can evolve the shape without migrations.
-- The jsonb shapes are seniority-ready: a wage entry may later carry
-- {blended, junior, medior, senior}; v1 populates blended only.
create table if not exists public.company_research (
  id uuid primary key default gen_random_uuid(),
  domain text unique not null,
  company_data jsonb not null default '{}'::jsonb,   -- profile + per-fact source urls
  roles jsonb not null default '[]'::jsonb,          -- [{role, headcount, seniority_mix?, confidence}]
  wages jsonb not null default '[]'::jsonb,           -- [{country, role, wage_data, source, confidence}]
  totals jsonb not null default '{}'::jsonb,          -- {annual_cost_eur, potential_saving_eur, ...}
  business_case jsonb not null default '{}'::jsonb,   -- llm narrative + headline numbers
  confidence jsonb not null default '{}'::jsonb,      -- {overall, by_dimension}
  sources jsonb not null default '[]'::jsonb,         -- cited urls
  model_versions jsonb not null default '{}'::jsonb,  -- {research, extraction, reference_data}
  researched_at timestamptz not null default now(),
  ttl_days integer not null default 30,
  created_at timestamptz not null default now()
);
 
-- 2. Global wage cache. Keyed by country/role/year. wage_data is seniority-ready.
create table if not exists public.wages (
  id uuid primary key default gen_random_uuid(),
  country text not null,
  role text not null,
  year integer not null,
  wage_data jsonb not null default '{}'::jsonb,       -- {blended_hourly_eur, gross_hourly_eur, employer_factor, hours_per_year}
  source text,
  source_url text,
  confidence numeric(5, 2),
  last_researched timestamptz not null default now(),
  ttl_days integer not null default 90,
  created_at timestamptz not null default now(),
  unique (country, role, year)
);
 
-- 3. Per-org library. One row per (org, domain) the org has enriched. Points at
-- the shared research row. external_id lets a customer match a row back to their
-- own CRM/outreach record; tags are free-form for campaign segmentation.
create table if not exists public.org_companies (
  id uuid primary key default gen_random_uuid(),
  org_id uuid not null references public.organizations(id) on delete cascade,
  domain text not null,
  research_id uuid references public.company_research(id) on delete set null,
  external_id text,
  tags text[] not null default '{}',
  added_at timestamptz not null default now(),
  last_refreshed_at timestamptz not null default now(),
  unique (org_id, domain)
);
 
create index if not exists idx_org_companies_org_added
  on public.org_companies (org_id, added_at desc);
create index if not exists idx_org_companies_org_tags
  on public.org_companies using gin (tags);
 
-- 4. RLS. company_research and wages are service-role-only (no member policy).
-- org_companies is readable by any org member; all writes go through the API
-- under the service role, so no member write policy is defined.
alter table public.company_research enable row level security;
alter table public.wages enable row level security;
alter table public.org_companies enable row level security;
 
drop policy if exists "members read org_companies" on public.org_companies;
create policy "members read org_companies"
  on public.org_companies for select
  using (public.is_org_member(org_id));

Step 2: Apply the migration

Run: pnpm --filter @agent-metrics/db migrate Expected: a line [apply] 0039_enrichment_core.sql and Done. Applied 1 new migration. (later tasks apply more at once; that is fine).

Step 3: Commit

git add packages/db/migrations/0039_enrichment_core.sql
git commit -m "feat(db): enrichment core tables (company_research, wages, org_companies)"

Task 2: Migration 0040, jobs + billing-ledger tables

Files:

Create: packages/db/migrations/0040_enrichment_jobs.sql
Step 1: Write the migration

-- 0040_enrichment_jobs.sql
-- Bulk job queue + the lookup billing ledger.
--
-- enrichment_jobs is one bulk submission; enrichment_job_items is its queue,
-- drained by a pg_cron worker (Phase 4) that claims rows with FOR UPDATE SKIP
-- LOCKED. company_lookups is an append-only ledger: one row each time an org
-- consumes a lookup (a new company or a user-requested refresh). It drives the
-- monthly hard-cap count and audit. There is NO overage meter; reaching the cap
-- blocks further lookups until upgrade. refresh_log audits cache-warming runs.
 
create table if not exists public.enrichment_jobs (
  id uuid primary key default gen_random_uuid(),
  org_id uuid not null references public.organizations(id) on delete cascade,
  status text not null default 'queued'
    check (status in ('queued', 'running', 'done', 'failed')),
  input_count integer not null default 0,
  accepted integer not null default 0,
  rejected integer not null default 0,
  duplicates_removed integer not null default 0,
  processed_count integer not null default 0,
  error_count integer not null default 0,
  created_at timestamptz not null default now(),
  started_at timestamptz,
  finished_at timestamptz
);
 
create index if not exists idx_enrichment_jobs_org_created
  on public.enrichment_jobs (org_id, created_at desc);
 
create table if not exists public.enrichment_job_items (
  id uuid primary key default gen_random_uuid(),
  job_id uuid not null references public.enrichment_jobs(id) on delete cascade,
  org_id uuid not null references public.organizations(id) on delete cascade,
  domain text not null,
  status text not null default 'queued'
    check (status in ('queued', 'running', 'done', 'failed')),
  attempts integer not null default 0,
  retry_after timestamptz,
  error text,
  created_at timestamptz not null default now(),
  updated_at timestamptz not null default now()
);
 
-- The claim query orders by created_at and filters on (status, retry_after).
create index if not exists idx_enrichment_job_items_claim
  on public.enrichment_job_items (status, retry_after, created_at);
create index if not exists idx_enrichment_job_items_job
  on public.enrichment_job_items (job_id);
 
-- Billing ledger. billing_period is 'YYYY-MM' so the cap query is a cheap
-- equality scan. kind distinguishes a brand-new company from a refresh.
create table if not exists public.company_lookups (
  id uuid primary key default gen_random_uuid(),
  org_id uuid not null references public.organizations(id) on delete cascade,
  domain text not null,
  kind text not null check (kind in ('new', 'refresh')),
  billing_period text not null,
  created_at timestamptz not null default now()
);
 
create index if not exists idx_company_lookups_org_period
  on public.company_lookups (org_id, billing_period);
 
create table if not exists public.refresh_log (
  id uuid primary key default gen_random_uuid(),
  run_kind text not null,                  -- 'cache_warm' | 'reference_refresh'
  companies_refreshed integer not null default 0,
  wages_refreshed integer not null default 0,
  errors integer not null default 0,
  detail jsonb not null default '{}'::jsonb,
  created_at timestamptz not null default now()
);
 
-- RLS. jobs/items/lookups are readable by org members (usage display); writes
-- via service role. refresh_log is service-role-only.
alter table public.enrichment_jobs enable row level security;
alter table public.enrichment_job_items enable row level security;
alter table public.company_lookups enable row level security;
alter table public.refresh_log enable row level security;
 
drop policy if exists "members read enrichment_jobs" on public.enrichment_jobs;
create policy "members read enrichment_jobs"
  on public.enrichment_jobs for select
  using (public.is_org_member(org_id));
 
drop policy if exists "members read enrichment_job_items" on public.enrichment_job_items;
create policy "members read enrichment_job_items"
  on public.enrichment_job_items for select
  using (public.is_org_member(org_id));
 
drop policy if exists "members read company_lookups" on public.company_lookups;
create policy "members read company_lookups"
  on public.company_lookups for select
  using (public.is_org_member(org_id));

Step 2: Apply the migration

Run: pnpm --filter @agent-metrics/db migrate Expected: [apply] 0040_enrichment_jobs.sql and a Done. line.

Step 3: Commit

git add packages/db/migrations/0040_enrichment_jobs.sql
git commit -m "feat(db): enrichment jobs queue + lookup billing ledger"

Task 3: Migration 0041, reference-data tables + seed

Files:

Create: packages/db/migrations/0041_enrichment_reference.sql
Step 1: Write the migration

-- 0041_enrichment_reference.sql
-- Versioned, citable reference data: the "methodology". Each row names its
-- source so every number on a report is traceable. These tables are
-- service-role-only (the enrichment service reads them server-side).
--
-- This migration seeds canonical_roles and employer_factors (stable, known
-- values). role_distributions, automation_rates and wage_reference are created
-- here with their final shape but populated in Phase 2, where live statistics
-- and cited benchmarks are sourced. employer_factors values below are the
-- loaded-cost multipliers carried over from the n8n engine and are marked
-- provisional pending Phase 2 cross-check against Eurostat lc_lci.
 
-- 1. Canonical role vocabulary. The enrichment service normalises any raw
-- department/role name onto one of these keys. isco is the EU occupation code
-- used when querying official wage statistics.
create table if not exists public.canonical_roles (
  role text primary key,
  display_name text not null,
  isco text,
  created_at timestamptz not null default now()
);
 
insert into public.canonical_roles (role, display_name, isco) values
  ('customer_service', 'Customer Service', '4222'),
  ('sales',            'Sales',            '2433'),
  ('marketing',        'Marketing',        '2431'),
  ('operations',       'Operations',       '3341'),
  ('finance_admin',    'Finance & Admin',  '4311'),
  ('human_resources',  'Human Resources',  '2423'),
  ('engineering_it',   'Engineering & IT', '2512'),
  ('product',          'Product',          '2421'),
  ('data_analytics',   'Data & Analytics', '2511'),
  ('logistics',        'Logistics & Warehouse', '4321'),
  ('legal',            'Legal',            '2611'),
  ('support_helpdesk', 'IT Support / Helpdesk', '3512'),
  ('management',       'Management',       '1219'),
  ('other',            'Other',            null)
on conflict (role) do nothing;
 
-- 2. Employer-cost loading factor per country. gross hourly wage * factor =
-- fully-loaded employer hourly cost. version + source make it auditable.
create table if not exists public.employer_factors (
  country text primary key,
  factor numeric(4, 2) not null,
  hours_per_year integer not null default 1720,
  source text not null,
  source_url text,
  version text not null,
  valid_year integer not null,
  created_at timestamptz not null default now()
);
 
insert into public.employer_factors (country, factor, hours_per_year, source, source_url, version, valid_year) values
  ('NL', 1.40, 1720, 'n8n engine carry-over, provisional pending Eurostat lc_lci cross-check', null, 'v0-provisional', 2025),
  ('DE', 1.21, 1720, 'n8n engine carry-over, provisional pending Eurostat lc_lci cross-check', null, 'v0-provisional', 2025),
  ('UK', 1.25, 1720, 'n8n engine carry-over, provisional pending ONS cross-check',             null, 'v0-provisional', 2025),
  ('US', 1.30, 1720, 'n8n engine carry-over, provisional pending BLS cross-check',             null, 'v0-provisional', 2025),
  ('FR', 1.45, 1720, 'n8n engine carry-over, provisional pending Eurostat lc_lci cross-check', null, 'v0-provisional', 2025),
  ('BE', 1.35, 1720, 'n8n engine carry-over, provisional pending Eurostat lc_lci cross-check', null, 'v0-provisional', 2025)
on conflict (country) do nothing;
 
-- 3. Industry -> role headcount distribution. share is the fraction of total
-- headcount in that role for that industry. Populated in Phase 2.
create table if not exists public.role_distributions (
  id uuid primary key default gen_random_uuid(),
  industry text not null,
  role text not null references public.canonical_roles(role),
  share numeric(5, 4) not null,
  source text not null,
  version text not null,
  created_at timestamptz not null default now(),
  unique (industry, role, version)
);
 
-- 4. Role automation-rate: the fraction of a role's hours automatable. Cited.
-- Populated in Phase 2.
create table if not exists public.automation_rates (
  role text not null references public.canonical_roles(role),
  rate numeric(4, 3) not null,
  source text not null,
  source_url text,
  version text not null,
  created_at timestamptz not null default now(),
  primary key (role, version)
);
 
-- 5. Seeded gross-wage reference (fallback when live stats are unavailable).
-- Populated in Phase 2 from national salary surveys / official statistics.
create table if not exists public.wage_reference (
  id uuid primary key default gen_random_uuid(),
  country text not null,
  role text not null references public.canonical_roles(role),
  year integer not null,
  gross_hourly_eur numeric(8, 2) not null,
  source text not null,
  source_url text,
  version text not null,
  created_at timestamptz not null default now(),
  unique (country, role, year, version)
);
 
-- RLS: reference data is service-role-only (no member policy).
alter table public.canonical_roles enable row level security;
alter table public.employer_factors enable row level security;
alter table public.role_distributions enable row level security;
alter table public.automation_rates enable row level security;
alter table public.wage_reference enable row level security;

Step 2: Apply the migration

Run: pnpm --filter @agent-metrics/db migrate Expected: [apply] 0041_enrichment_reference.sql and a Done. line.

Step 3: Manually confirm the seed landed

Run: pnpm --filter @agent-metrics/db exec tsx scripts/verify-enrichment.ts (created in Task 5; if running tasks in order, do this confirmation as part of Task 5). Expected: canonical_roles=14, employer_factors=6.

Step 4: Commit

git add packages/db/migrations/0041_enrichment_reference.sql
git commit -m "feat(db): seeded reference-data tables (roles, employer factors) + methodology shells"

Task 4: Zod schemas for the enriched record

Files:

Create: packages/types/src/enrichment.ts
Modify: packages/types/src/index.ts (add re-export)
Test: packages/types/src/enrichment.test.ts
Step 1: Write the failing test

// packages/types/src/enrichment.test.ts
import { describe, expect, it } from "vitest";
 
import { EnrichedCompanySchema, WageDataSchema, RoleEstimateSchema } from "./enrichment";
 
describe("WageDataSchema", () => {
  it("accepts a blended-only wage (v1 shape)", () => {
    const parsed = WageDataSchema.parse({
      blended_hourly_eur: 38.5,
      gross_hourly_eur: 27.5,
      employer_factor: 1.4,
      hours_per_year: 1720,
    });
    expect(parsed.blended_hourly_eur).toBe(38.5);
    expect(parsed.junior_hourly_eur).toBeUndefined();
  });
 
  it("is seniority-ready (optional junior/medior/senior)", () => {
    const parsed = WageDataSchema.parse({
      blended_hourly_eur: 38.5,
      gross_hourly_eur: 27.5,
      employer_factor: 1.4,
      hours_per_year: 1720,
      junior_hourly_eur: 28,
      medior_hourly_eur: 38,
      senior_hourly_eur: 52,
    });
    expect(parsed.senior_hourly_eur).toBe(52);
  });
 
  it("rejects a negative wage", () => {
    expect(() =>
      WageDataSchema.parse({
        blended_hourly_eur: -1,
        gross_hourly_eur: 27.5,
        employer_factor: 1.4,
        hours_per_year: 1720,
      }),
    ).toThrow();
  });
});
 
describe("RoleEstimateSchema", () => {
  it("carries headcount, a confidence band and an optional seniority mix", () => {
    const parsed = RoleEstimateSchema.parse({
      role: "customer_service",
      headcount: 12,
      confidence: "official_statistic",
      seniority_mix: { junior: 0.5, medior: 0.35, senior: 0.15 },
    });
    expect(parsed.headcount).toBe(12);
    expect(parsed.confidence).toBe("official_statistic");
  });
 
  it("rejects an unknown confidence tier", () => {
    expect(() =>
      RoleEstimateSchema.parse({ role: "sales", headcount: 3, confidence: "vibes" }),
    ).toThrow();
  });
});
 
describe("EnrichedCompanySchema", () => {
  it("parses a minimal enriched record", () => {
    const parsed = EnrichedCompanySchema.parse({
      domain: "example.com",
      company_name: "Example BV",
      country: "NL",
      industry: "saas",
      headcount_estimate: 40,
      roles: [{ role: "sales", headcount: 5, confidence: "llm_inferred" }],
      confidence: { overall: 0.62 },
      sources: ["https://example.com/about"],
    });
    expect(parsed.domain).toBe("example.com");
    expect(parsed.roles).toHaveLength(1);
  });
});

Step 2: Run the test to verify it fails

Run: pnpm --filter @agent-metrics/types exec vitest run src/enrichment.test.ts Expected: FAIL, cannot resolve ./enrichment (module does not exist yet).

Step 3: Write the schema module

// packages/types/src/enrichment.ts
// Shared shapes for the company-enrichment engine. Seniority-ready: wage and
// role records carry optional junior/medior/senior fields that v1 leaves unset.
 
import { z } from "zod";
 
// Source tier for any datapoint, highest trust first. Drives confidence rollup.
export const ConfidenceTierSchema = z.enum([
  "fetched_cited",
  "official_statistic",
  "seeded_reference",
  "llm_inferred",
  "hard_fallback",
]);
export type ConfidenceTier = z.infer<typeof ConfidenceTierSchema>;
 
export const SeniorityMixSchema = z
  .object({
    junior: z.number().min(0).max(1),
    medior: z.number().min(0).max(1),
    senior: z.number().min(0).max(1),
  })
  .strict();
export type SeniorityMix = z.infer<typeof SeniorityMixSchema>;
 
export const WageDataSchema = z
  .object({
    blended_hourly_eur: z.number().nonnegative(),
    gross_hourly_eur: z.number().nonnegative(),
    employer_factor: z.number().positive(),
    hours_per_year: z.number().int().positive(),
    // Seniority-ready, optional in v1.
    junior_hourly_eur: z.number().nonnegative().optional(),
    medior_hourly_eur: z.number().nonnegative().optional(),
    senior_hourly_eur: z.number().nonnegative().optional(),
  })
  .strict();
export type WageData = z.infer<typeof WageDataSchema>;
 
export const RoleEstimateSchema = z
  .object({
    role: z.string(),
    headcount: z.number().nonnegative(),
    confidence: ConfidenceTierSchema,
    seniority_mix: SeniorityMixSchema.optional(),
  })
  .strict();
export type RoleEstimate = z.infer<typeof RoleEstimateSchema>;
 
export const RoleWageSchema = z
  .object({
    country: z.string(),
    role: z.string(),
    wage_data: WageDataSchema,
    source: z.string().optional(),
    source_url: z.string().url().optional(),
    confidence: ConfidenceTierSchema,
  })
  .strict();
export type RoleWage = z.infer<typeof RoleWageSchema>;
 
export const BusinessCaseSchema = z
  .object({
    annual_labour_cost_eur: z.number().nonnegative().optional(),
    potential_saving_eur: z.number().nonnegative().optional(),
    net_saving_eur: z.number().optional(),
    summary: z.string().optional(),
    top_automation_opportunity: z.string().optional(),
    risk_factors: z.array(z.string()).optional(),
  })
  .strict();
export type BusinessCase = z.infer<typeof BusinessCaseSchema>;
 
// The full enriched record. Kept permissive (loose nested objects) so the
// engine can extend it; the API serialises a typed subset to clients.
export const EnrichedCompanySchema = z
  .object({
    domain: z.string(),
    company_name: z.string().optional(),
    country: z.string().optional(),
    industry: z.string().optional(),
    headcount_estimate: z.number().nonnegative().optional(),
    roles: z.array(RoleEstimateSchema).default([]),
    wages: z.array(RoleWageSchema).default([]),
    business_case: BusinessCaseSchema.optional(),
    confidence: z.object({ overall: z.number().min(0).max(1) }).passthrough(),
    sources: z.array(z.string()).default([]),
  })
  .strict();
export type EnrichedCompany = z.infer<typeof EnrichedCompanySchema>;

Step 4: Re-export from the package index

In packages/types/src/index.ts, add this line directly under the existing import { z } from "zod"; line (top of file):

export * from "./enrichment";

Step 5: Run the test to verify it passes

Run: pnpm --filter @agent-metrics/types exec vitest run src/enrichment.test.ts Expected: PASS, all assertions green.

Step 6: Typecheck the package

Run: pnpm --filter @agent-metrics/types exec tsc --noEmit Expected: no errors.

Step 7: Commit

git add packages/types/src/enrichment.ts packages/types/src/enrichment.test.ts packages/types/src/index.ts
git commit -m "feat(types): seniority-ready enriched-company Zod schemas"

Task 5: Schema + seed verify script

Files:

Create: packages/db/scripts/verify-enrichment.ts
Step 1: Write the verify script

// packages/db/scripts/verify-enrichment.ts
// Asserts the Phase 1 enrichment schema applied and the seed data landed.
// Run after `pnpm --filter @agent-metrics/db migrate`:
//   pnpm --filter @agent-metrics/db exec tsx scripts/verify-enrichment.ts
// Exits non-zero on the first failed assertion.
 
import { existsSync } from "node:fs";
import { dirname, join, resolve } from "node:path";
import { fileURLToPath } from "node:url";
 
import { config as loadDotenv } from "dotenv";
import pg from "pg";
 
const __filename = fileURLToPath(import.meta.url);
const __dirname = dirname(__filename);
const REPO_ROOT = resolve(__dirname, "..", "..", "..");
for (const path of [
  join(REPO_ROOT, "apps", "web", ".env.local"),
  join(REPO_ROOT, "packages", "db", ".env"),
  join(REPO_ROOT, ".env.local"),
  join(REPO_ROOT, ".env"),
]) {
  if (existsSync(path)) loadDotenv({ path, override: false });
}
 
const databaseUrl = process.env.DATABASE_URL;
if (!databaseUrl) {
  console.error("DATABASE_URL is not set.");
  process.exit(1);
}
 
const client = new pg.Client({
  connectionString: databaseUrl,
  ssl: { rejectUnauthorized: false },
});
await client.connect();
 
const failures: string[] = [];
 
async function tableExists(name: string): Promise<boolean> {
  const { rows } = await client.query<{ exists: boolean }>(
    "select to_regclass($1) is not null as exists",
    [`public.${name}`],
  );
  return rows[0]?.exists ?? false;
}
 
async function rlsEnabled(name: string): Promise<boolean> {
  const { rows } = await client.query<{ relrowsecurity: boolean }>(
    "select relrowsecurity from pg_class where oid = to_regclass($1)",
    [`public.${name}`],
  );
  return rows[0]?.relrowsecurity ?? false;
}
 
async function count(name: string): Promise<number> {
  const { rows } = await client.query<{ n: string }>(
    `select count(*)::text as n from public.${name}`,
  );
  return Number(rows[0]?.n ?? "0");
}
 
const expectedTables = [
  "company_research",
  "wages",
  "org_companies",
  "enrichment_jobs",
  "enrichment_job_items",
  "company_lookups",
  "refresh_log",
  "canonical_roles",
  "employer_factors",
  "role_distributions",
  "automation_rates",
  "wage_reference",
];
 
for (const t of expectedTables) {
  if (!(await tableExists(t))) failures.push(`missing table: ${t}`);
  else if (!(await rlsEnabled(t))) failures.push(`RLS not enabled: ${t}`);
}
 
// Seeded reference data present.
if ((await tableExists("canonical_roles")) && (await count("canonical_roles")) !== 14) {
  failures.push(`canonical_roles expected 14 rows, got ${await count("canonical_roles")}`);
}
if ((await tableExists("employer_factors")) && (await count("employer_factors")) !== 6) {
  failures.push(`employer_factors expected 6 rows, got ${await count("employer_factors")}`);
}
 
// Phase-2-populated tables must exist but start empty.
for (const t of ["role_distributions", "automation_rates", "wage_reference"]) {
  if ((await tableExists(t)) && (await count(t)) !== 0) {
    failures.push(`${t} should be empty in Phase 1, got ${await count(t)} rows`);
  }
}
 
await client.end();
 
if (failures.length > 0) {
  console.error("VERIFY FAILED:");
  for (const f of failures) console.error("  - " + f);
  process.exit(1);
}
 
console.log("Enrichment Phase 1 schema verified:");
console.log("  tables: " + expectedTables.length + " present, RLS on");
console.log("  canonical_roles=14, employer_factors=6");
console.log("  role_distributions/automation_rates/wage_reference present and empty");

Step 2: Run migrations then the verify script

Run: pnpm --filter @agent-metrics/db migrate && pnpm --filter @agent-metrics/db exec tsx scripts/verify-enrichment.ts Expected: migrate prints [skip] for already-applied files and applies any new ones; verify prints Enrichment Phase 1 schema verified: and exits 0.

Step 3: Typecheck the db package

Run: pnpm --filter @agent-metrics/db exec tsc --noEmit Expected: no errors.

Step 4: Commit

git add packages/db/scripts/verify-enrichment.ts
git commit -m "test(db): verify enrichment Phase 1 schema + seed"

Self-review (completed by plan author)

Spec coverage: §5 data model tables all created (company_research, org_companies, wages, enrichment_jobs/items, company_lookups, refresh_log) + the reference-data/methodology tables; confidence-tier model encoded in ConfidenceTierSchema; seniority-ready jsonb + Zod. §6/§8 (API, quota) are out of scope for Phase 1 by design; company_lookups shape is in place for them. Items deferred to Phase 2 (role_distributions/automation_rates/wage_reference values, live stats) are explicitly flagged, not silently dropped.
Placeholder scan: no TBD/TODO in delivered code; the "populated in Phase 2" tables are a stated phase boundary with verify assertions (must exist + be empty), not placeholders.
Type consistency: ConfidenceTierSchema values match between the Zod module and the confidence jsonb usage description; table/column names used in the verify script match the three migrations exactly; org_companies.research_id references company_research(id).

Out of scope / next plans

Phase 2: enrichment service (Sonar + Claude via OpenRouter), deterministic packages/core/enrichment pure functions, cache-first, confidence rollup, and seeding role_distributions/automation_rates/wage_reference from cited sources + the >=80% eval suite.
Phase 3: sync API (/v1/companies + export) + lookup charging against company_lookups + hard-cap (402) + new api-key scopes.
Phase 4: bulk (/v1/companies/bulk) + pg_cron claim-worker + /v1/jobs/{id}.
Phase 5: pricing/quota (plan-gate caps Free 10 / Pro 100 / Agency 500, no overage) + pricing page + the EUR 49/249 migration. Reconcile the OrgPlanSchema naming first.
Phase 6: frontend library + detail + upload + download + badges.
Phase 7: docs + methodology doc + SDK touch-ups.