Public Claims & Receipts
Claims · Receipts · Source of Truth

Every public number, backed by a receipt.

Every quantitative claim on this site — benchmarks, win-rates, retention scores, prices — is backed by code, a paper section, or a frozen experimental log. This page is a contract. If a number on our site doesn't reconcile to its source, that's a bug we want to know about.

Last reviewed: 2026-05-07
Report a discrepancy: modelbrewai@gmail.com

CRMA drift, ablation, and retention numbers.

The drift and forgetting numbers (CL-01 to CL-03) are 3-seed averages from our internal continual-learning (CL) protocol on Mistral-7B (5 sequential domains: medical → legal → finance → code → equity research). The ablation number (CL-04) comes from a same-weights inference-time ablation on Gemma-2-9B. Per-seed MODULAR and NAIVE ranges are disjoint at every seed. All forgetting numbers are conditional on correct inference-time routing, and we say so on every page that cites them.

CL-01 −0.17% ± 0.17 MODULAR drift across 5 sequential domains (Mistral-7B, 3 seeds).
Pages
continual.html · all-features.html · index.html
Source
3-seed average across seeds 0, 42, 1234. Frozen experiment logs in private research repo. Reproduced in paper3-zero-forgetting; full method in our preprint.
Last verified
2026-05-06
CL-02 +42.96% ± 5.5 NAIVE sequential-LoRA forgetting (same protocol, 3 seeds).
Pages
continual.html · all-features.html
Source
Naive baseline run alongside CL-01 on identical seeds, identical domain order. Per-seed MODULAR and NAIVE ranges disjoint at every seed.
Last verified
2026-05-06
CL-03 +1.95% ± 0.64 FROZEN-backbone drift (same protocol, 3 seeds).
Pages
continual.html · all-features.html
Source
Frozen-backbone baseline (no CRMA, no per-task adapter). Same seeds, same domain order. CL-03 is the gap our CRMA backbone closes.
Last verified
2026-05-06
CL-04 98/100 with CRMA vs 38/100 without on the Gemma-2-9B inference ablation. Wilson 95% CI [93.0%, 99.5%] vs [29.0%, 47.8%].
Pages
continual.html · all-features.html · index.html
Source
Same-weights ablation: 100 held-out questions, identical Gemma-2-9B checkpoint, only CRMA toggled at inference. First-author rated. A blinded two-rater audit is on our roadmap and will replace this number when it lands.
Last verified
2026-05-06
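The Wilson intervals quoted in CL-04 can be rechecked from the raw counts alone. The sketch below assumes the standard Wilson score interval without continuity correction and z ≈ 1.96; the function name is ours, not taken from the research repo. Depending on the rounding convention, the last printed digit may differ by 0.1 percentage point from the figures on the card.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (no continuity correction)."""
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Counts from CL-04: 98/100 with CRMA, 38/100 without.
with_crma = wilson_interval(98, 100)
without_crma = wilson_interval(38, 100)

print(f"with CRMA:    [{with_crma[0]:.1%}, {with_crma[1]:.1%}]")
print(f"without CRMA: [{without_crma[0]:.1%}, {without_crma[1]:.1%}]")
# The two intervals do not overlap if the lower bound with CRMA
# sits above the upper bound without it.
print("disjoint:", without_crma[1] < with_crma[0])
```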
CL-05 18/18 Saul-7B legal sub-domain retention.
Pages
continual.html · all-features.html · finetuning.html
Source
Per-sub-domain retention check after sequential CL on Saul-7B. First-author rated. Same hedge as CL-04: a blinded two-rater audit is on our roadmap.
Last verified
2026-05-06
CL-06 Mixing-matrix spectral norm ‖M‖₂ held at 1.0 within float32 precision across 867 logged training steps. Max deviation < 1.2×10⁻⁷.
Pages
continual.html · all-features.html
Source
Logged at every step of the Gemma-2-9B 5-domain run. Birkhoff bound holds by construction (parameterization in utils/crma.py); the log is the empirical confirmation.
Last verified
2026-05-06
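Why a norm of exactly 1.0 can hold "by construction" is easy to illustrate. The sketch below assumes the mixing matrix is doubly stochastic, that is, a point in the Birkhoff polytope; the actual parameterization in utils/crma.py is not shown on this page, so treat this as an illustration rather than the production code. For such a matrix, ‖M‖₂ ≤ sqrt(‖M‖₁ · ‖M‖∞) = 1 because every row and column sums to 1, and ‖M‖₂ ≥ 1 because M maps the all-ones vector to itself.

```python
import itertools
import math

def doubly_stochastic(n: int, weights: list[float]) -> list[list[float]]:
    """Convex combination of the first len(weights) n-by-n permutation matrices."""
    total = sum(weights)
    perms = itertools.islice(itertools.permutations(range(n)), len(weights))
    m = [[0.0] * n for _ in range(n)]
    for w, perm in zip(weights, perms):
        for row, col in enumerate(perm):
            m[row][col] += w / total
    return m

def spectral_norm_bounds(m: list[list[float]]) -> tuple[float, float]:
    """Return (lower, upper) bounds on the spectral norm of a square matrix."""
    n = len(m)
    max_row = max(sum(abs(x) for x in row) for row in m)                      # induced inf-norm
    max_col = max(sum(abs(m[i][j]) for i in range(n)) for j in range(n))      # induced 1-norm
    upper = math.sqrt(max_row * max_col)
    # Lower bound: length of M applied to the unit vector (1, ..., 1) / sqrt(n).
    image = [sum(row) / math.sqrt(n) for row in m]
    lower = math.sqrt(sum(x * x for x in image))
    return lower, upper

M = doubly_stochastic(4, [0.5, 0.2, 0.2, 0.1])  # illustrative weights only
lower, upper = spectral_norm_bounds(M)
print(f"{lower:.12f} <= ||M||_2 <= {upper:.12f}")
```

When the two bounds coincide, the norm is pinned without computing a singular value decomposition; the logged deviation in CL-06 would then reflect float32 rounding only.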
CL-07 Validated on 5 production models across 4 architecture families.
Pages
all-features.html
Source
5 models: Mistral-7B, Saul-7B (Mistral-derived), Qwen3-8B, Gemma-2-9B, Llama-3.1-8B. 4 architecture families: Mistral (covers Saul + Mistral), Qwen, Gemma, Llama. Defined in backend/server.py ALLOWED_MODELS (lines 1778-1785).
Last verified
2026-05-07

Cleaner throughput & memory benchmarks.

Numbers below come from tools/bench_cleaner.py, run on a single CPU worker against two real corpora (OASST1 and a 100k military-text corpus). Same code that processes your dataset in production. No skip-on-cap, no silent passes.

CLN-01 211 rows / sec, 475 s end-to-end, peak RSS 863 MB on a 100,000-row military corpus.
Pages
optimizer.html
Source
Single-worker run of tools/bench_cleaner.py. OPSEC + jailbreak + judge-rubric all on. Output preserved as a frozen log; rerun when the cleaner pipeline materially changes.
Last verified
2026-04-29
CLN-02 Linear scale-out: add workers, throughput scales linearly. No distributed-state setup.
Pages
optimizer.html
Source
Each row is processed independently by stateless validators plus an idempotent autofix pass (see backend/cleaner/autofix.py). Workers share no per-row state, so wall-clock time is bounded by the slowest worker.
Last verified
2026-04-29
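The property behind CLN-02 is that cleaning a row depends on nothing but the row. A minimal sketch, assuming a hypothetical whitespace-normalizing autofix (the real pass lives in backend/cleaner/autofix.py and is not reproduced here):

```python
from concurrent.futures import ThreadPoolExecutor

def autofix(row: str) -> str:
    """Hypothetical idempotent fix: collapse whitespace runs and trim the ends."""
    return " ".join(row.split())

def clean_rows(rows: list[str], workers: int) -> list[str]:
    """Map autofix over rows. No state is shared between rows or workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(autofix, rows))  # pool.map preserves input order

rows = ["  hello   world ", "already clean", "tabs\tand\nnewlines"]
one_worker = clean_rows(rows, workers=1)
four_workers = clean_rows(rows, workers=4)

# Same output regardless of worker count, and a second pass changes nothing.
print(one_worker == four_workers)
print([autofix(r) for r in one_worker] == one_worker)
```

Threads are used here only to show that the result is independent of worker count. For CPU-bound cleaning in CPython, separate processes or machines would be the natural way to get an actual speedup.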
CLN-03 Score-floor + revert-on-degrade: post-clean quality score is never lower than pre-clean.
Pages
security.html · optimizer.html
Source
Enforced per-row in backend/cleaner/. If the rewrite would lower the row's score, the original is kept. Idempotency property test: tests/test_cleaner_idempotency.py.
Last verified
2026-05-07
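The score-floor rule in CLN-03 amounts to a guarded rewrite per row. The sketch below uses a toy word-count score and two made-up rewrites; the production scorer and rewriter in backend/cleaner/ are not shown on this page, so every name here is hypothetical.

```python
from typing import Callable

def clean_with_floor(
    row: str,
    rewrite: Callable[[str], str],
    score: Callable[[str], float],
) -> str:
    """Apply rewrite, but keep the original row if the score would drop."""
    candidate = rewrite(row)
    return candidate if score(candidate) >= score(row) else row

def toy_score(row: str) -> float:
    """Hypothetical quality score: number of whitespace-separated words."""
    return float(len(row.split()))

original = "  keep all of these words  "

# A rewrite that truncates the row loses words, so it is reverted.
reverted = clean_with_floor(original, lambda r: r[:6], toy_score)

# A rewrite that only trims whitespace keeps the score, so it is accepted.
accepted = clean_with_floor(original, str.strip, toy_score)

print(repr(reverted))
print(repr(accepted))
```

Because the comparison is made per row, the post-clean score can never be lower than the pre-clean score under whatever scoring function is plugged in.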
CLN-04 MinHash + LSH dedup: 128 permutations, 3-word shingles, O(n) at 100k+ rows.
Pages
optimizer.html
Source
Configuration set in cleaner dedup module. Verified by RapidFuzz ratio on candidate pairs returned by the LSH bucket.
Last verified
2026-04-29
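For readers unfamiliar with the technique in CLN-04, here is a self-contained sketch of MinHash signatures over 3-word shingles with LSH banding. The 128 permutations and 3-word shingles come from the card; the hash function (salted BLAKE2b), the 32-band split, and all function names are assumptions made for illustration, and the RapidFuzz verification step is omitted.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

NUM_PERM = 128       # from CLN-04
SHINGLE_WORDS = 3    # from CLN-04
BANDS = 32           # assumption: 32 bands of 4 rows each

def shingles(text: str, k: int = SHINGLE_WORDS) -> set[str]:
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def seeded_hash(shingle: str, seed: int) -> int:
    # One salted hash per "permutation"; stands in for a random permutation.
    digest = hashlib.blake2b(
        shingle.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")

def minhash(text: str) -> list[int]:
    sh = shingles(text)
    if not sh:
        raise ValueError("cannot compute a signature for empty text")
    return [min(seeded_hash(s, seed) for s in sh) for seed in range(NUM_PERM)]

def estimated_jaccard(a: list[int], b: list[int]) -> float:
    return sum(x == y for x, y in zip(a, b)) / len(a)

def candidate_pairs(signatures: list[list[int]], bands: int = BANDS) -> set[tuple[int, int]]:
    """Documents that agree on every row of at least one band become candidates."""
    rows = NUM_PERM // bands
    buckets: dict[tuple, list[int]] = defaultdict(list)
    for doc_id, sig in enumerate(signatures):
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].append(doc_id)
    pairs: set[tuple[int, int]] = set()
    for ids in buckets.values():
        pairs.update(combinations(ids, 2))
    return pairs

docs = [
    "the quick brown fox jumps over the lazy dog while the cat sleeps on the warm windowsill all day",
    "the quick brown fox jumps over the lazy dog while the cat sleeps on the warm windowsill all night",
    "invoices must be paid within thirty days of receipt unless otherwise agreed in writing by both parties",
]
sigs = [minhash(d) for d in docs]
pairs = candidate_pairs(sigs)
print(sorted(pairs))
print(round(estimated_jaccard(sigs[0], sigs[1]), 2))
```

Each document is hashed into a fixed number of buckets, so candidate generation grows with the number of rows rather than with the number of row pairs, which is where the roughly linear behaviour comes from when buckets stay small.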
CLN-05 Detoxify (BERT, unbiased-small) at 0.80 / 0.55 thresholds; skips above 200 rows with a visible warning.
Pages
optimizer.html
Source
Threshold constants in cleaner toxicity module. Cap warning is surfaced on the scan summary; never silent.
Last verified
2026-04-29

What you pay, where it's hard-coded.

Pricing claims are easy to verify because the source of truth is code. Dollar amounts on the marketing site reconcile to constants in backend/pricing.py or backend/db.py, or to the file cited on the card.

PRC-01 Fine-tuning: $3.99 per million tokens, all 7–9B models (advanced tier).
Pages
finetuning.html · all-features.html · index.html
Source
backend/pricing.py:25 ("ft", "advanced"): 399 (cents per million tokens).
Last verified
2026-05-07
PRC-02 Free tier: 3 training runs per day on TinyLlama, no card required.
Pages
finetuning.html · all-features.html
Source
backend/server.py:2700: "Free tier limit: 3 runs/day reached." Enforced by an atomic conditional INSERT in backend/db.py:2189.
Last verified
2026-05-07
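An atomic conditional INSERT of the kind cited in PRC-02 can be sketched with Python's built-in sqlite3 module. The table name, columns, and function name below are hypothetical; only the limit of 3 runs per day and the error message come from the card. The point is that the count check and the insert happen in one statement, so two concurrent requests cannot both slip past the limit.

```python
import sqlite3

FREE_RUNS_PER_DAY = 3  # from PRC-02

def try_start_free_run(conn: sqlite3.Connection, user_id: str, day: str) -> bool:
    """Insert a run row only if the user has fewer than the daily limit."""
    cur = conn.execute(
        """
        INSERT INTO free_runs (user_id, day)
        SELECT ?, ?
        WHERE (SELECT COUNT(*) FROM free_runs WHERE user_id = ? AND day = ?) < ?
        """,
        (user_id, day, user_id, day, FREE_RUNS_PER_DAY),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows inserted means the limit was reached

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE free_runs (user_id TEXT NOT NULL, day TEXT NOT NULL)")

results = [try_start_free_run(conn, "alice", "2026-05-07") for _ in range(4)]
print(results)
if not results[-1]:
    print("Free tier limit: 3 runs/day reached.")
```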
PRC-03 75 credits ($7.50) at signup, no card.
Pages
finetuning.html · all-features.html
Source
backend/db.py:1016 SIGNUP_BONUS_CENTS = 750. Granted via add_credits on user create (backend/db.py:1033).
Last verified
2026-05-07
PRC-04 Minimum credit purchase: $20. Credits never expire.
Pages
finetuning.html · all-features.html
Source
backend/pricing.py:6 docstring — "Minimum credit purchase: $20 (credits never expire)." Stripe checkout enforces the minimum.
Last verified
2026-05-07
PRC-05 Clean-with-AI: $5 per 200 rows (50 credits per 200 rows).
Pages
all-features.html · finetuning.html
Source
Pricing-table row in deploy/all-features.html:4291 and deploy/finetuning.html:486: 50 credits per 200 rows. 1 credit = $0.10.
Last verified
2026-05-07
PRC-06 Per-account daily ceiling on cleaner-AI spend: $50/day soft cap.
Pages
security.html
Source
_cleaner_ai_daily_cap_cents() in backend/routes/cleaner.py; rollup in backend/db.py:2632-2661; atomic charge in backend/db.py:2814 (charge_for_cleanai_run_under_daily_cap). Default ceiling: 5000 cents = $50.
Last verified
2026-05-07
PRC-07 Inference (combined input + output): $1.00 per million tokens, advanced tier; free on TinyLlama.
Pages
finetuning.html · all-features.html
Source
backend/pricing.py:34-37 PRICE_PER_M_INFERENCE_TOKENS = {"advanced": 100, "free": 0}.
Last verified
2026-05-07
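To see how the per-million-token constants turn into a bill, here is a small worked sketch. PRICE_PER_M_INFERENCE_TOKENS and its values are quoted from PRC-07, and the 399 and 750 cent figures from PRC-01 and PRC-03. The name PRICE_PER_M_FT_TOKENS, the cost_cents helper, and the round-up-to-the-next-cent rule are assumptions for illustration; the actual rounding in backend/pricing.py is not shown on this page.

```python
PRICE_PER_M_INFERENCE_TOKENS = {"advanced": 100, "free": 0}   # cents, quoted in PRC-07
PRICE_PER_M_FT_TOKENS = {("ft", "advanced"): 399}             # hypothetical name; value from PRC-01
SIGNUP_BONUS_CENTS = 750                                       # from PRC-03
CENTS_PER_CREDIT = 10                                          # 1 credit = $0.10 (PRC-05)

def cost_cents(tokens: int, cents_per_million: int) -> int:
    """Integer cost in cents, rounded up to the next whole cent (assumed rule)."""
    return (tokens * cents_per_million + 999_999) // 1_000_000

ft_cost = cost_cents(2_500_000, PRICE_PER_M_FT_TOKENS[("ft", "advanced")])
inference_cost = cost_cents(10_000_000, PRICE_PER_M_INFERENCE_TOKENS["advanced"])
free_cost = cost_cents(10_000_000, PRICE_PER_M_INFERENCE_TOKENS["free"])
signup_credits = SIGNUP_BONUS_CENTS // CENTS_PER_CREDIT

print(f"fine-tune 2.5M tokens: ${ft_cost / 100:.2f}")
print(f"inference 10M tokens:  ${inference_cost / 100:.2f}")
print(f"TinyLlama inference:   ${free_cost / 100:.2f}")
print(f"signup bonus:          {signup_credits} credits")
```

Keeping prices in integer cents avoids floating-point drift when many small charges are summed.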

What you can fine-tune today.

MOD-01 6 supported models: TinyLlama (free), Mistral-7B, Saul-7B, Qwen3-8B, Gemma-2-9B, Llama-3.1-8B.
Pages
finetuning.html · all-features.html
Source
backend/server.py:1778-1785 ALLOWED_MODELS dict. Live at GET /supported_models and GET /models.
Last verified
2026-05-07
MOD-02 Production GPU: A100 on Modal.
Pages
all-features.html
Source
Every @app.function in modal_deploy.py sets gpu="A100". Modal Pro tier (confirmed 2026-04-05).
Last verified
2026-05-07
MOD-03 Continual learning is in beta — not self-serve.
Pages
continual.html · finetuning.html · all-features.html
Source
No CL price published on any marketing surface (the previous $4.99/M tier was pulled site-wide on 2026-05-06, commit 737337b). CL endpoints exist but require manual access grant.
Last verified
2026-05-07

What we don't claim — on purpose.

A short list of things our marketing pages do not say, because we don't have the receipts to back them yet. We'd rather list the absence than fudge the citation.

Not claimed today

  • No priority-GPU SLA. Our Modal A100s run on shared scheduling. We don't sell or promise dedicated capacity.
  • No SOC 2 / HIPAA / BAA. Small team, pre-revenue. We'll publish those when (and only when) they're audited and signed.
  • No third-party-blinded benchmark numbers. The 98/100 and 18/18 scores (CL-04, CL-05) are first-author rated. A blinded two-rater audit is on the roadmap.
  • No PEFT (parameter-efficient fine-tuning) advantage claim. CRMA is for continual learning. We do not claim CRMA fine-tunes a single domain better than naive LoRA.
  • No multi-region deployment. Single Modal region; single Turso primary.
  • No vector DB / RAG product. Fine-tuning is the product. RAG is the comparison baseline.
  • No status-page uptime number. A public status page is on the roadmap; until then, don't infer a 99.x% uptime figure from its absence.

How we keep this page honest.

Update policy

Numbers update when underlying experiments are re-run or code changes the source of truth. Stale claims are removed within 7 days of detection. The "last verified" date on each card tells you the most recent date we re-checked the citation against the cited file or log.

If you find a number on our marketing site that doesn't reconcile to its receipt — or a number anywhere on our site that we haven't put on this page — please let us know.

modelbrewai@gmail.com