Claims · Receipts · Source of Truth

Every public number, backed by a receipt.

Every quantitative claim on this site — benchmarks, win-rates, retention scores, prices — is backed by code, a paper section, or a frozen experimental log. This page is a contract. If a number on our site doesn't reconcile to its source, that's a bug we want to know about.

Last reviewed: 2026-05-07
Report a discrepancy: modelbrewai@gmail.com

CRMA drift, ablation, and retention numbers.

CL-01 to CL-03 are 3-seed averages from our internal protocol on Mistral-7B (5 sequential domains: medical → legal → finance → code → equity research). CL-04 is a same-weights inference-time ablation on Gemma-2-9B. CL-05 to CL-07 state their own model and protocol in the source line. The per-seed MODULAR and NAIVE ranges do not overlap: every MODULAR seed shows less forgetting than every NAIVE seed. All forgetting numbers are conditional on correct inference-time routing, and we say so on every page that cites them.

CL-01 −0.17% ± 0.17 MODULAR drift across 5 sequential domains (Mistral-7B, 3 seeds).

Pages: continual.html · all-features.html · index.html
Source: 3-seed average across seeds 0, 42, 1234. Frozen experiment logs in private research repo. Reproduced in paper3-zero-forgetting; full method in our preprint.
Last verified: 2026-05-06

CL-02 +42.96% ± 5.5 NAIVE sequential-LoRA forgetting (same protocol, 3 seeds).

Pages: continual.html · all-features.html
Source: Naive baseline run alongside CL-01 on identical seeds and identical domain order. The per-seed MODULAR and NAIVE ranges do not overlap.
Last verified: 2026-05-06

CL-03 +1.95% ± 0.64 FROZEN-backbone drift (same protocol, 3 seeds).

Pages: continual.html · all-features.html
Source: Frozen-backbone baseline (no CRMA, no per-task adapter). Same seeds, same domain order. CL-03 is the gap our CRMA backbone closes.
Last verified: 2026-05-06

CL-04 98/100 with CRMA vs 38/100 without on the Gemma-2-9B inference ablation. Wilson 95% CI [93.0%, 99.5%] vs [29.0%, 47.8%].

Pages: continual.html · all-features.html · index.html
Source: Same-weights ablation: 100 held-out questions, identical Gemma-2-9B checkpoint, only CRMA toggled at inference. First-author rated. A blinded two-rater audit is on our roadmap and will replace this number when it lands.
Last verified: 2026-05-06
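
The Wilson intervals quoted in CL-04 can be re-derived from the raw counts alone. The sketch below uses the standard Wilson score formula with z = 1.96. The function name `wilson_interval` is ours and is not taken from the cited repo. The results should land within roughly 0.1 percentage points of the quoted bounds; differences in the last digit can come from rounding or from the exact z value used.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Raw counts from CL-04: 98/100 with CRMA, 38/100 without.
for label, k in (("with CRMA", 98), ("without CRMA", 38)):
    lo, hi = wilson_interval(k, 100)
    print(f"{label}: {k}/100 -> [{lo:.1%}, {hi:.1%}]")
```

Because the two intervals are far apart, the conclusion does not depend on the last-digit rounding.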

CL-05 18/18 Saul-7B legal sub-domain retention.

Pages: continual.html · all-features.html · finetuning.html
Source: Per-sub-domain retention check after sequential CL on Saul-7B. First-author rated. Same caveat as CL-04: a blinded two-rater audit is on our roadmap.
Last verified: 2026-05-06

CL-06 Mixing-matrix spectral norm ‖M‖₂ held at 1.0 within float32 precision across 867 logged training steps. Max deviation < 1.2×10⁻⁷.

Pages: continual.html · all-features.html
Source: Logged at every step of the Gemma-2-9B 5-domain run. Birkhoff bound holds by construction (parameterization in utils/crma.py); the log is the empirical confirmation.
Last verified: 2026-05-06
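
For readers who want the intuition behind the bound, here is a small numerical illustration. It assumes the "Birkhoff bound" refers to the mixing matrix being doubly stochastic: by the Birkhoff-von Neumann theorem such a matrix is a convex combination of permutation matrices, each with spectral norm 1, so ‖M‖₂ ≤ 1, and the all-ones vector is mapped to itself, so the bound is attained. The code does not reproduce the parameterization in utils/crma.py; both helper names are hypothetical.

```python
import math
import random

def random_doubly_stochastic(n: int, k: int, rng: random.Random) -> list[list[float]]:
    """Convex combination of k random n x n permutation matrices."""
    weights = [rng.random() for _ in range(k)]
    total = sum(weights)
    m = [[0.0] * n for _ in range(n)]
    for w in weights:
        perm = list(range(n))
        rng.shuffle(perm)
        for row, col in enumerate(perm):
            m[row][col] += w / total
    return m

def spectral_norm(m: list[list[float]], iters: int = 500) -> float:
    """Largest singular value, estimated by power iteration on M^T M."""
    n = len(m)
    v = [1.0 + 0.01 * i for i in range(n)]  # generic positive start vector
    for _ in range(iters):
        mv = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]   # M v
        w = [sum(m[i][j] * mv[i] for i in range(n)) for j in range(n)]   # M^T (M v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
    return math.sqrt(sum(x * x for x in mv))  # ||M v|| for unit-length v

M = random_doubly_stochastic(n=6, k=4, rng=random.Random(0))
print(abs(spectral_norm(M) - 1.0))  # deviation from 1.0
```

This runs in float64, so the deviation it prints is not comparable to the float32 figure logged during training; it only shows why the norm sits at 1 for this class of matrices.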

CL-07 Validated on 5 production models across 4 architecture families.

Pages: all-features.html
Source: 5 models: Mistral-7B, Saul-7B (Mistral-derived), Qwen3-8B, Gemma-2-9B, Llama-3.1-8B. 4 architecture families: Mistral (covers Saul + Mistral), Qwen, Gemma, Llama. Defined in backend/server.py ALLOWED_MODELS (lines 1778-1785).
Last verified: 2026-05-07

02 · Dataset cleaner

Cleaner throughput & memory benchmarks.

Numbers below come from tools/bench_cleaner.py, run on a single CPU worker against two real corpora (OASST1 and a 100k military-text corpus). Same code that processes your dataset in production. No skip-on-cap, no silent passes.

CLN-01 211 rows / sec, 475 s end-to-end, peak RSS 863 MB on a 100,000-row military corpus.

Pages: optimizer.html
Source: Single-worker run of tools/bench_cleaner.py. OPSEC + jailbreak + judge-rubric all on. Output preserved as a frozen log; rerun when the cleaner pipeline materially changes.
Last verified: 2026-04-29
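
The headline throughput in CLN-01 reconciles with the other two figures on the same card. A quick check, using only the numbers quoted above:

```python
rows = 100_000     # corpus size quoted in CLN-01
elapsed_s = 475    # end-to-end wall-clock seconds quoted in CLN-01

rows_per_sec = rows / elapsed_s
print(round(rows_per_sec))  # 211
```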

CLN-02 Linear scale-out: add workers, throughput scales linearly. No distributed-state setup.

Pages: optimizer.html
Source: Each row is processed independently by stateless validators + an idempotent autofix pass (see backend/cleaner/autofix.py). Workers share no per-row state, so wall-clock time is bounded by the slowest worker rather than by coordination between workers.
Last verified: 2026-04-29
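
A minimal sketch of why stateless, idempotent per-row processing needs no distributed-state setup. `validate_row` is a hypothetical stand-in, not one of the validators in backend/cleaner/; the point is only that the output does not depend on how rows are split across workers.

```python
from concurrent.futures import ThreadPoolExecutor

def validate_row(row: str) -> str:
    """Hypothetical stateless fix: the result depends only on the input row."""
    return " ".join(row.split())  # collapse runs of whitespace

def clean(rows: list[str], workers: int) -> list[str]:
    """Apply the validator to every row; pool.map preserves input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(validate_row, rows))

rows = ["a  b", " c ", "d\te"] * 100
print(clean(rows, workers=1) == clean(rows, workers=8))  # same output either way
```

Threads keep the sketch short; in CPython, CPU-bound validators would need separate processes or machines to see an actual speed-up.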

CLN-03 Score-floor + revert-on-degrade: post-clean quality score is never lower than pre-clean.

Pages: security.html · optimizer.html
Source: Enforced per-row in backend/cleaner/. If the rewrite would lower the row's score, the original is kept. Idempotency property test: tests/test_cleaner_idempotency.py.
Last verified: 2026-05-07
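
The score-floor rule can be sketched in a few lines. This is an illustration under stated assumptions, not the code in backend/cleaner/: `rewrite` and `score` are hypothetical callables standing in for the autofix pass and the quality scorer.

```python
from typing import Callable

def clean_with_score_floor(
    row: str,
    rewrite: Callable[[str], str],
    score: Callable[[str], float],
) -> str:
    """Return the rewritten row unless its score is lower than the original's."""
    candidate = rewrite(row)
    if score(candidate) < score(row):
        return row  # revert-on-degrade: keep the original
    return candidate

# Toy scorer: fewer double spaces is better.
toy_score = lambda s: -s.count("  ")
helpful = lambda s: s.replace("  ", " ")
harmful = lambda s: s + "  "

print(clean_with_score_floor("a  b", helpful, toy_score))  # rewrite kept
print(clean_with_score_floor("a b", harmful, toy_score))   # original kept
```

With this shape, the post-clean score can never fall below the pre-clean score for any row, whatever the rewrite does.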

CLN-04 MinHash + LSH dedup: 128 permutations, 3-word shingles, O(n) at 100k+ rows.

Pages: optimizer.html
Source: Configuration set in cleaner dedup module. Candidate pairs returned by the LSH buckets are then verified with the RapidFuzz ratio.
Last verified: 2026-04-29
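
For readers unfamiliar with the technique, here is a self-contained sketch of MinHash signatures with LSH banding, using the two parameters quoted in CLN-04 (128 permutations, 3-word shingles). Everything else is an assumption made for illustration: the 32 × 4 banding, the universal-hash construction, and all function names. It is not the cleaner's dedup module, and it omits the RapidFuzz verification step.

```python
import hashlib
import random
from collections import defaultdict

NUM_PERM = 128       # from CLN-04
SHINGLE_WORDS = 3    # from CLN-04
BANDS = 32           # assumption: 32 bands of 4 rows each
ROWS = NUM_PERM // BANDS
PRIME = (1 << 61) - 1

_rng = random.Random(0)
# One (a, b) pair per simulated permutation: h -> (a*h + b) mod PRIME.
COEFFS = [(_rng.randrange(1, PRIME), _rng.randrange(PRIME)) for _ in range(NUM_PERM)]

def shingles(text: str) -> set[str]:
    words = text.lower().split()
    if len(words) < SHINGLE_WORDS:
        return {" ".join(words)}
    return {" ".join(words[i:i + SHINGLE_WORDS])
            for i in range(len(words) - SHINGLE_WORDS + 1)}

def _base_hash(shingle: str) -> int:
    digest = hashlib.sha1(shingle.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % PRIME

def minhash(text: str) -> list[int]:
    hashes = [_base_hash(s) for s in shingles(text)]
    return [min((a * h + b) % PRIME for h in hashes) for a, b in COEFFS]

def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    return sum(x == y for x, y in zip(sig_a, sig_b)) / NUM_PERM

def lsh_candidates(signatures: dict[str, list[int]]) -> set[tuple[str, str]]:
    """Documents that share at least one identical band become candidate pairs."""
    buckets = defaultdict(list)
    for doc_id, sig in signatures.items():
        for band in range(BANDS):
            key = (band, tuple(sig[band * ROWS:(band + 1) * ROWS]))
            buckets[key].append(doc_id)
    pairs = set()
    for ids in buckets.values():
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                pairs.add(tuple(sorted((ids[i], ids[j]))))
    return pairs

docs = {
    "a": "the quick brown fox jumps over the lazy dog",
    "b": "the quick brown fox jumps over the lazy dog",
    "c": "completely unrelated sentence about tokenizer throughput numbers",
}
sigs = {doc_id: minhash(text) for doc_id, text in docs.items()}
print(lsh_candidates(sigs))  # only the duplicate pair ("a", "b") shares a bucket
```

Bucketing touches each document a fixed number of times, which is where the linear scaling comes from; only documents that land in the same bucket are compared pairwise.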

CLN-05 Detoxify (BERT, unbiased-small) at 0.80 / 0.55 thresholds; skips above 200 rows with a visible warning.

Pages: optimizer.html
Source: Threshold constants in cleaner toxicity module. Cap warning is surfaced on the scan summary; never silent.
Last verified: 2026-04-29

03 · Pricing

What you pay, where it's hard-coded.

Pricing claims are easy to verify: the source of truth is the backend code. Dollar amounts on the marketing site reconcile to constants in backend/pricing.py, backend/db.py, or the file cited on each entry.

PRC-01 Fine-tuning: $3.99 per million tokens, all 7–9B models (advanced tier).

Pages: finetuning.html · all-features.html · index.html
Source: backend/pricing.py:25 — ("ft", "advanced"): 399 (cents per million tokens).
Last verified: 2026-05-07
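
Since the cited constant is in cents per million tokens, the price of a job is one multiplication. A minimal sketch: the function name and the round-up rule are assumptions, not the billing logic in backend/pricing.py.

```python
FT_ADVANCED_CENTS_PER_M = 399  # value cited from backend/pricing.py:25

def finetune_cost_cents(tokens: int, cents_per_m: int = FT_ADVANCED_CENTS_PER_M) -> int:
    """Price in whole cents, rounded up (the rounding direction is an assumption)."""
    return -(-tokens * cents_per_m // 1_000_000)  # integer ceiling division

print(finetune_cost_cents(1_000_000))  # 399 cents = $3.99
print(finetune_cost_cents(2_500_000))  # 998 cents (997.5 rounded up)
```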

PRC-02 Free tier: 3 training runs per day on TinyLlama, no card required.

Pages: finetuning.html · all-features.html
Source: backend/server.py:2700 carries the message "Free tier limit: 3 runs/day reached."; the limit itself is enforced by an atomic conditional INSERT in backend/db.py:2189.
Last verified: 2026-05-07
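
An atomic conditional INSERT folds the limit check and the write into a single statement, so two concurrent requests cannot both slip under the limit. The sketch below shows the pattern with SQLite from the Python standard library. The table, columns, and function name are hypothetical; this is not the schema or the query in backend/db.py.

```python
import sqlite3

FREE_RUNS_PER_DAY = 3  # limit quoted in PRC-02

def try_start_free_run(conn: sqlite3.Connection, user_id: str, day: str) -> bool:
    """Insert a run row only if the user has fewer than 3 runs on `day`."""
    cur = conn.execute(
        """
        INSERT INTO free_runs (user_id, day)
        SELECT ?, ?
        WHERE (SELECT COUNT(*) FROM free_runs
               WHERE user_id = ? AND day = ?) < ?
        """,
        (user_id, day, user_id, day, FREE_RUNS_PER_DAY),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows inserted means the limit was reached

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE free_runs (user_id TEXT, day TEXT)")
results = [try_start_free_run(conn, "u1", "2026-05-07") for _ in range(4)]
print(results)  # [True, True, True, False]
```

The alternative, a SELECT COUNT followed by a separate INSERT, leaves a window in which a second request can pass the check before the first one writes.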

PRC-03 75 credits ($7.50) at signup, no card.

Pages: finetuning.html · all-features.html
Source: backend/db.py:1016 — SIGNUP_BONUS_CENTS = 750. Granted via add_credits on user create (backend/db.py:1033).
Last verified: 2026-05-07

PRC-04 Minimum credit purchase: $20. Credits never expire.

Pages: finetuning.html · all-features.html
Source: backend/pricing.py:6 docstring — "Minimum credit purchase: $20 (credits never expire)." Stripe checkout enforces the minimum.
Last verified: 2026-05-07

PRC-05 Clean-with-AI: $5 per 200 rows (50 credits per 200 rows).

Pages: all-features.html · finetuning.html
Source: Pricing-table row in deploy/all-features.html:4291 and deploy/finetuning.html:486: 50 credits per 200 rows. 1 credit = $0.10.
Last verified: 2026-05-07
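
The credit figures on this page all follow from the stated rate of 1 credit = $0.10. A quick reconciliation, with a helper name of our own choosing:

```python
CENTS_PER_CREDIT = 10  # 1 credit = $0.10, as stated in PRC-05

def credits_to_dollars(n_credits: int) -> str:
    cents = n_credits * CENTS_PER_CREDIT
    return f"${cents // 100}.{cents % 100:02d}"

print(credits_to_dollars(50))  # $5.00 (Clean-with-AI, per 200 rows)
print(credits_to_dollars(75))  # $7.50 (signup bonus, PRC-03)
```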

PRC-06 Per-account daily ceiling on cleaner-AI spend: $50/day soft cap.

Pages: security.html
Source: _cleaner_ai_daily_cap_cents() in backend/routes/cleaner.py; rollup in backend/db.py:2632-2661; atomic charge in backend/db.py:2814 (charge_for_cleanai_run_under_daily_cap). Default ceiling: 5000 cents = $50.
Last verified: 2026-05-07

PRC-07 Inference (combined input + output): $1.00 per million tokens, advanced tier; free on TinyLlama.

Pages: finetuning.html · all-features.html
Source: backend/pricing.py:34-37 — PRICE_PER_M_INFERENCE_TOKENS = {"advanced": 100, "free": 0}.
Last verified: 2026-05-07

04 · Model coverage

What you can fine-tune today.

MOD-01 6 supported models: TinyLlama (free), Mistral-7B, Saul-7B, Qwen3-8B, Gemma-2-9B, Llama-3.1-8B.

Pages: finetuning.html · all-features.html
Source: backend/server.py:1778-1785 — ALLOWED_MODELS dict. Live at GET /supported_models and GET /models.
Last verified: 2026-05-07

MOD-02 Production GPU: A100 on Modal.

Pages: all-features.html
Source: Every @app.function in modal_deploy.py sets gpu="A100". Modal Pro tier (confirmed 2026-04-05).
Last verified: 2026-05-07

MOD-03 Continual learning is in beta — not self-serve.

Pages: continual.html · finetuning.html · all-features.html
Source: No CL price published on any marketing surface (the previous $4.99/M tier was pulled site-wide on 2026-05-06, commit 737337b). CL endpoints exist but require manual access grant.
Last verified: 2026-05-07

05 · Disclosure

What we don't claim — on purpose.

A short list of things our marketing pages do not say, because we don't have the receipts to back them yet. We'd rather list the absence than fudge the citation.

Not claimed today

No priority-GPU SLA. Modal A100 is shared scheduling. We don't sell or promise dedicated capacity.
No SOC 2 / HIPAA / BAA. Small team, pre-revenue. We'll publish those when (and only when) they're audited and signed.
No third-party-blinded benchmark numbers. The 98/100 and 18/18 scores (CL-04, CL-05) are first-author rated. A blinded two-rater audit is on the roadmap.
No PEFT (parameter-efficient fine-tuning) advantage claim. CRMA is for continual learning. We do not claim CRMA fine-tunes a single domain better than naive LoRA.
No multi-region deployment. Single Modal region; single Turso primary.
No vector DB / RAG product. Fine-tuning is the product. RAG is the comparison baseline.
No status-page uptime number. A public status page is on the roadmap; until then, don't infer a 99.x% from absence.

06 · Update policy

How we keep this page honest.

Update policy

Numbers update when underlying experiments are re-run or code changes the source of truth. Stale claims are removed within 7 days of detection. The "last verified" date on each card tells you the most recent date we re-checked the citation against the cited file or log.

If you find a number on our marketing site that doesn't reconcile to its receipt — or a number anywhere on our site that we haven't put on this page — please let us know.

modelbrewai@gmail.com
