Claims · Receipts · Source of Truth
Every public number, backed by a receipt.
Every quantitative claim on this site (benchmarks, win-rates, retention scores, prices) is backed by code, a paper section, or a frozen experimental log. This page is a contract.
If a number on our site doesn't reconcile to its source, that's a bug we want to know about.
Last reviewed: 2026-05-07
Report a discrepancy: modelbrewai@gmail.com
01 · Continual learning
CRMA drift, ablation, and retention numbers.
The drift and forgetting numbers (CL-01 to CL-03) are 3-seed averages from our internal protocol on Mistral-7B (5 sequential domains: medical → legal → finance → code → equity research). CL-04 is a same-weights inference-time ablation on Gemma-2-9B. Per-seed MODULAR and NAIVE ranges are disjoint at every seed. All forgetting numbers are conditional on correct inference-time routing; we say so on every page that cites them.
CL-01
−0.17% ± 0.17 MODULAR drift across 5 sequential domains (Mistral-7B, 3 seeds).
- Pages
- continual.html · all-features.html · index.html
- Source
- 3-seed average across seeds 0, 42, 1234. Frozen experiment logs are kept in a private research repo. Reproduced in paper3-zero-forgetting; full method in our preprint.
- Last verified
- 2026-05-06
CL-02
+42.96% ± 5.5 NAIVE sequential-LoRA forgetting (same protocol, 3 seeds).
- Pages
- continual.html · all-features.html
- Source
- Naive baseline run alongside CL-01 on identical seeds, identical domain order. Per-seed MODULAR and NAIVE ranges are disjoint at every seed.
- Last verified
- 2026-05-06
CL-03
+1.95% ± 0.64 FROZEN-backbone drift (same protocol, 3 seeds).
- Pages
- continual.html · all-features.html
- Source
- Frozen-backbone baseline (no CRMA, no per-task adapter). Same seeds, same domain order. CL-03 is the gap our CRMA backbone closes.
- Last verified
- 2026-05-06
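A minimal sketch of how the 3-seed summaries (CL-01 to CL-03) and the per-seed disjointness check could be recomputed from per-seed logs. Two readings are our assumptions, not statements from the logs: the "±" is taken as the sample standard deviation across seeds 0, 42, and 1234, and a "per-seed range" is taken as the spread of one seed's per-domain drift values. The function names are hypothetical.

```python
from statistics import mean, stdev

SEEDS = (0, 42, 1234)  # seeds named on CL-01


def summarize(per_seed: dict[int, float]) -> tuple[float, float]:
    """Mean and sample standard deviation (n - 1 denominator) of one metric across SEEDS.

    Assumption: the "±" on the cards is this sample std; a standard error
    would be smaller by a factor of sqrt(3).
    """
    values = [per_seed[s] for s in SEEDS]
    return mean(values), stdev(values)


def ranges_disjoint(a: list[float], b: list[float]) -> bool:
    """True if the intervals [min(a), max(a)] and [min(b), max(b)] do not overlap."""
    return max(a) < min(b) or max(b) < min(a)


def disjoint_at_every_seed(modular: dict[int, list[float]],
                           naive: dict[int, list[float]]) -> bool:
    """Assumption: each list holds one seed's per-domain drift values (%)."""
    return all(ranges_disjoint(modular[s], naive[s]) for s in SEEDS)
```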
CL-04
98/100 with CRMA vs 38/100 without on the Gemma-2-9B inference ablation. Wilson 95% CI [93.0%, 99.5%] vs [29.0%, 47.8%].
- Pages
- continual.html · all-features.html · index.html
- Source
- Same-weights ablation: 100 held-out questions, identical Gemma-2-9B checkpoint, only CRMA toggled at inference. First-author rated. A blinded two-rater audit is on our roadmap and will replace this number when it lands.
- Last verified
- 2026-05-06
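The Wilson score interval quoted on CL-04 can be recomputed with the standard library alone. This is a minimal sketch of the textbook formula without continuity correction, with z taken from the normal quantile (about 1.96 at 95%); it is not the script that produced the card.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (no continuity correction)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided normal quantile
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


for k in (98, 38):
    lo, hi = wilson_interval(k, 100)
    print(f"{k}/100: [{lo:.1%}, {hi:.1%}]")
```

For 98/100 and 38/100 this agrees with the CL-04 bounds to within 0.1 percentage point.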
CL-05
18/18 Saul-7B legal sub-domain retention.
- Pages
- continual.html · all-features.html · finetuning.html
- Source
- Per-sub-domain retention check after sequential CL on Saul-7B. First-author rated. Same caveat as CL-04: a blinded two-rater audit is on our roadmap.
- Last verified
- 2026-05-06
CL-06
Mixing-matrix spectral norm ‖M‖₂ held at 1.0 within float32 precision across 867 logged training steps. Max deviation < 1.2×10⁻⁷.
- Pages
- continual.html · all-features.html
- Source
- Logged at every step of the Gemma-2-9B 5-domain run. The Birkhoff bound holds by construction (parameterization in utils/crma.py); the log is the empirical confirmation.
- Last verified
- 2026-05-06
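Why ‖M‖₂ = 1 for any doubly stochastic M: every row and column sums to 1, so the max row sum ‖M‖∞ and the max column sum ‖M‖₁ are both 1, and ‖M‖₂ ≤ √(‖M‖₁‖M‖∞) = 1; the all-ones vector satisfies M𝟙 = 𝟙, so the bound is attained. Float32 machine epsilon is 2⁻²³ ≈ 1.19×10⁻⁷, the same scale as the logged deviation. The sketch below builds a doubly stochastic matrix with Sinkhorn-Knopp normalization and measures its spectral norm by power iteration. Sinkhorn is our stand-in for illustration; we are not asserting it is the parameterization in utils/crma.py.

```python
import math
import random


def sinkhorn(logits: list[list[float]], iters: int = 500) -> list[list[float]]:
    """Sinkhorn-Knopp: exponentiate, then alternately rescale rows and columns to sum to 1."""
    m = [[math.exp(x) for x in row] for row in logits]
    n = len(m)
    for _ in range(iters):
        for row in m:                          # every row sums to 1
            s = sum(row)
            row[:] = [x / s for x in row]
        for j in range(n):                     # every column sums to 1
            s = sum(m[i][j] for i in range(n))
            for i in range(n):
                m[i][j] /= s
    return m


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def spectral_norm(m: list[list[float]], iters: int = 200) -> float:
    """Largest singular value of a square matrix via power iteration on M^T M."""
    n = len(m)
    mt = [list(col) for col in zip(*m)]        # transpose
    v = [float(i + 1) for i in range(n)]       # non-uniform start vector
    for _ in range(iters):
        u = matvec(mt, matvec(m, v))           # u = M^T M v
        norm = math.sqrt(sum(x * x for x in u))
        v = [x / norm for x in u]
    mv = matvec(m, v)
    return math.sqrt(sum(x * x for x in mv))   # ||M v|| for unit-length v


random.seed(0)
logits = [[random.gauss(0.0, 1.0) for _ in range(6)] for _ in range(6)]
M = sinkhorn(logits)
print(f"||M||_2 = {spectral_norm(M):.12f}")
```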
CL-07
Validated on 5 production models across 4 architecture families.
- Pages
- all-features.html
- Source
- 5 models: Mistral-7B, Saul-7B (Mistral-derived), Qwen3-8B, Gemma-2-9B, Llama-3.1-8B. 4 architecture families: Mistral (Mistral-7B and Saul-7B), Qwen, Gemma, Llama. All five are defined in ALLOWED_MODELS in backend/server.py (lines 1778-1785).
- Last verified
- 2026-05-07
02 · Dataset cleaner
Cleaner throughput & memory benchmarks.
Numbers below come from tools/bench_cleaner.py, run on a single CPU worker against two real corpora (OASST1 and a 100k military-text corpus). Same code that processes your dataset in production. No skip-on-cap, no silent passes.
CLN-01
211 rows/sec, 475 s end-to-end, peak RSS 863 MB on a 100,000-row military corpus.
- Pages
- optimizer.html
- Source
- Single-worker run of tools/bench_cleaner.py. OPSEC + jailbreak + judge-rubric all on. Output preserved as a frozen log; rerun when the cleaner pipeline materially changes.
- Last verified
- 2026-04-29
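A minimal single-worker harness in the spirit of CLN-01, assuming a per-row cleaning callable; clean_row is a hypothetical stand-in, and this is not tools/bench_cleaner.py. The CLN-01 figures are self-consistent: 100,000 rows / 475 s ≈ 210.5 rows/s. Peak RSS comes from resource.getrusage, which is Unix-only and reports the peak over the whole process lifetime, in kilobytes on Linux and bytes on macOS.

```python
import resource
import sys
import time
from typing import Callable, Iterable


def bench(clean_row: Callable[[str], str], rows: Iterable[str]) -> dict[str, float]:
    """Run clean_row over every row on one worker; report time, throughput, and peak RSS."""
    rows = list(rows)
    start = time.perf_counter()
    for row in rows:
        clean_row(row)                       # output discarded; only timing matters here
    elapsed = time.perf_counter() - start
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss units differ by platform: bytes on macOS, kilobytes on Linux.
    peak_mib = peak / 2**20 if sys.platform == "darwin" else peak / 2**10
    return {
        "rows": len(rows),
        "seconds": elapsed,
        "rows_per_sec": len(rows) / elapsed if elapsed > 0 else float("inf"),
        "peak_rss_mib": peak_mib,
    }


if __name__ == "__main__":
    print(bench(lambda s: " ".join(s.split()), ["  some   row text  "] * 50_000))
```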
CLN-02
Linear scale-out: throughput grows linearly with the number of workers, with no distributed-state setup.
- Pages
- optimizer.html
- Source
- Each row is processed independently by stateless validators and an idempotent autofix pass (see backend/cleaner/autofix.py). Workers share no per-row state, so wall-clock time is set by the slowest worker (the maximum per-worker time), not by the total work.
- Last verified
- 2026-04-29
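A minimal sketch of the reasoning behind CLN-02, with process_row as a hypothetical stateless per-row function: splitting rows into contiguous shards and processing each shard independently yields the same output as a serial pass, and the wall-clock time of a sharded run is the cost of its slowest shard. It simulates per-row costs rather than spawning processes.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def shard(rows: Sequence[T], workers: int) -> list[Sequence[T]]:
    """Split rows into `workers` contiguous shards whose sizes differ by at most one."""
    size, extra = divmod(len(rows), workers)
    shards, start = [], 0
    for w in range(workers):
        end = start + size + (1 if w < extra else 0)
        shards.append(rows[start:end])
        start = end
    return shards


def run_sharded(process_row: Callable[[T], R], rows: Sequence[T], workers: int) -> list[R]:
    """Process each shard independently, then concatenate in order.

    Because process_row keeps no state between rows, this equals a serial pass;
    each shard could just as well run in its own process or machine.
    """
    return [process_row(r) for s in shard(rows, workers) for r in s]


def wall_clock(row_costs: Sequence[float], workers: int) -> float:
    """Wall-clock time of a sharded run: the total cost of the slowest shard."""
    return max(sum(s) for s in shard(row_costs, workers))
```

With uniform per-row cost, 100 rows on 4 workers take 25 cost units instead of 100. Uneven rows make one shard the bottleneck, which would pull the speedup below the linear ideal.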
CLN-03
Score-floor + revert-on-degrade: post-clean quality score is never lower than pre-clean.
- Pages
- security.html · optimizer.html
- Source
- Enforced per-row in backend/cleaner/. If the rewrite would lower the row's score, the original is kept. Idempotency property test: tests/test_cleaner_idempotency.py.
- Last verified
- 2026-05-07
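A minimal sketch of the score-floor rule, with autofix and score as hypothetical stand-ins for the cleaner's rewrite and quality scorer (collapse_spaces and toy_score are toy examples, not the production versions). If autofix is idempotent and deterministic, the guarded version is idempotent too: a kept rewrite is a fixed point of autofix, and a reverted row makes the same decision on every pass.

```python
from typing import Callable


def clean_with_floor(row: str,
                     autofix: Callable[[str], str],
                     score: Callable[[str], float]) -> str:
    """Return the autofixed row unless it scores lower than the original (revert-on-degrade)."""
    candidate = autofix(row)
    return candidate if score(candidate) >= score(row) else row


def is_idempotent(f: Callable[[str], str], samples: list[str]) -> bool:
    """Property check: applying f twice equals applying it once on every sample."""
    return all(f(f(s)) == f(s) for s in samples)


def collapse_spaces(s: str) -> str:
    """Toy autofix: collapse runs of whitespace (idempotent)."""
    return " ".join(s.split())


def toy_score(s: str) -> float:
    """Toy quality score: fewer double spaces is better."""
    return -s.count("  ")


def guarded(s: str) -> str:
    return clean_with_floor(s, collapse_spaces, toy_score)
```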
CLN-04
MinHash + LSH dedup: 128 permutations, 3-word shingles, O(n) at 100k+ rows.
- Pages
- optimizer.html
- Source
- Configuration is set in the cleaner dedup module. Candidate pairs returned by the LSH buckets are confirmed with a RapidFuzz ratio.
- Last verified
- 2026-04-29
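A minimal pure-Python sketch of the CLN-04 configuration: 128 seeded hashes stand in for the 128 permutations, documents become 3-word shingles, and each signature is split into LSH bands so every row lands in a fixed number of buckets (linear work, as long as buckets stay small). The 32 × 4 banding and the 0.9 confirmation threshold are our assumptions, and difflib's SequenceMatcher.ratio() stands in for RapidFuzz's ratio; the two scores are similar in spirit but not identical.

```python
import hashlib
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

NUM_PERM = 128        # from CLN-04: 128 permutations
BANDS, ROWS = 32, 4   # assumption: 32 bands x 4 rows = 128 signature slots
SHINGLE_WORDS = 3     # from CLN-04: 3-word shingles


def shingles(text: str, k: int = SHINGLE_WORDS) -> set[str]:
    """Lower-cased k-word shingles; texts of k words or fewer become one shingle."""
    words = text.lower().split()
    if len(words) <= k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def seeded_hash(seed: int, token: str) -> int:
    """Stable 64-bit hash; seed i plays the role of permutation i.
    Python's built-in hash() is randomized per process, so it is avoided here."""
    digest = hashlib.blake2b(f"{seed}:{token}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash(text: str) -> list[int]:
    sh = shingles(text)
    return [min(seeded_hash(i, s) for s in sh) for i in range(NUM_PERM)]


def lsh_candidates(texts: list[str]) -> set[tuple[int, int]]:
    """Rows sharing any band bucket become candidate pairs (i < j)."""
    buckets: dict[tuple, list[int]] = defaultdict(list)
    for idx, text in enumerate(texts):
        sig = minhash(text)
        for b in range(BANDS):
            buckets[(b, tuple(sig[b * ROWS:(b + 1) * ROWS]))].append(idx)
    pairs: set[tuple[int, int]] = set()
    for members in buckets.values():
        pairs.update(combinations(members, 2))
    return pairs


def near_duplicates(texts: list[str], min_ratio: float = 0.9) -> list[tuple[int, int]]:
    """Confirm LSH candidates with a character-level similarity ratio."""
    return sorted(
        (i, j) for i, j in lsh_candidates(texts)
        if SequenceMatcher(None, texts[i], texts[j]).ratio() >= min_ratio
    )


docs = [
    "the quick brown fox jumps over the lazy dog today",
    "the quick brown fox jumps over the lazy dog now",
    "credits never expire and the minimum purchase is twenty dollars",
]
print(near_duplicates(docs))
```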
CLN-05
Detoxify (BERT, unbiased-small) at 0.80 / 0.55 thresholds; skips above 200 rows with a visible warning.
- Pages
- optimizer.html
- Source
- Threshold constants in cleaner toxicity module. Cap warning is surfaced on the scan summary; never silent.
- Last verified
- 2026-04-29
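A minimal sketch of a capped, never-silent toxicity pass. The card gives the two thresholds and the 200-row cap but not what each threshold triggers, so the block/flag split below (block at 0.80 or higher, flag from 0.55) is our assumption. score_batch is a hypothetical callable that returns one toxicity score per row; it could wrap a Detoxify model's predict() call.

```python
from dataclasses import dataclass, field
from typing import Callable, Sequence

ROW_CAP = 200      # CLN-05: skip above 200 rows
BLOCK_AT = 0.80    # assumption: score >= 0.80 is blocked
FLAG_AT = 0.55     # assumption: 0.55 <= score < 0.80 is flagged for review


@dataclass
class ToxicityScan:
    labels: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)


def scan_toxicity(rows: Sequence[str],
                  score_batch: Callable[[Sequence[str]], Sequence[float]]) -> ToxicityScan:
    """Label each row ok/flag/block, or skip the whole pass with a visible warning."""
    scan = ToxicityScan()
    if len(rows) > ROW_CAP:
        # Skipped, but never silently: the warning travels with the scan summary.
        scan.warnings.append(
            f"toxicity check skipped: {len(rows)} rows exceeds the {ROW_CAP}-row cap")
        return scan
    for score in score_batch(rows):
        if score >= BLOCK_AT:
            scan.labels.append("block")
        elif score >= FLAG_AT:
            scan.labels.append("flag")
        else:
            scan.labels.append("ok")
    return scan
```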
03 · Pricing
What you pay, where it's hard-coded.
Pricing claims are easy to verify: most dollar amounts on the marketing site reconcile to a constant in backend/pricing.py or backend/db.py, and each card below names the exact file and line.
PRC-01
Fine-tuning: $3.99 per million tokens, all 7–9B models (advanced tier).
- Pages
- finetuning.html · all-features.html · index.html
- Source
- backend/pricing.py:25 sets ("ft", "advanced"): 399 (cents per million tokens).
- Last verified
- 2026-05-07
PRC-02
Free tier: 3 training runs per day on TinyLlama, no card required.
- Pages
- finetuning.html · all-features.html
- Source
- Error message "Free tier limit: 3 runs/day reached." at backend/server.py:2700; the limit is enforced by an atomic-conditional INSERT in backend/db.py:2189.
- Last verified
- 2026-05-07
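A minimal sketch of an atomic-conditional INSERT for the 3-runs/day limit, using Python's sqlite3 module (Turso is SQLite-compatible). The table and function names are hypothetical, not the schema in backend/db.py. The count check and the insert are a single statement, so there is no gap between checking and writing for a concurrent request to slip through.

```python
import sqlite3

FREE_RUNS_PER_DAY = 3  # PRC-02


def init_free_runs(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS free_runs (user_id TEXT NOT NULL, day TEXT NOT NULL)")


def try_start_free_run(conn: sqlite3.Connection, user_id: str, day: str) -> bool:
    """Record a free run only if the user has fewer than FREE_RUNS_PER_DAY runs that day."""
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "INSERT INTO free_runs (user_id, day) "
            "SELECT ?, ? "
            "WHERE (SELECT COUNT(*) FROM free_runs WHERE user_id = ? AND day = ?) < ?",
            (user_id, day, user_id, day, FREE_RUNS_PER_DAY),
        )
    return cur.rowcount == 1  # 0 rows inserted means the limit was already reached
```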
PRC-03
75 credits ($7.50) at signup, no card.
- Pages
- finetuning.html · all-features.html
- Source
- backend/db.py:1016 sets SIGNUP_BONUS_CENTS = 750, granted via add_credits on user creation (backend/db.py:1033).
- Last verified
- 2026-05-07
PRC-04
Minimum credit purchase: $20. Credits never expire.
- Pages
- finetuning.html · all-features.html
- Source
- Docstring at backend/pricing.py:6: "Minimum credit purchase: $20 (credits never expire)." Stripe checkout enforces the minimum.
- Last verified
- 2026-05-07
PRC-05
Clean-with-AI: $5 per 200 rows (50 credits per 200 rows).
- Pages
- all-features.html · finetuning.html
- Source
- Pricing-table rows in deploy/all-features.html:4291 and deploy/finetuning.html:486: 50 credits per 200 rows. 1 credit = $0.10.
- Last verified
- 2026-05-07
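The credit arithmetic on PRC-03 and PRC-05 reduces to integer cents. A minimal sketch: the constants mirror the cards, while billing a partial 200-row block as a full block is our assumption, since the cards do not say how partial blocks are charged.

```python
CENTS_PER_CREDIT = 10             # PRC-05: 1 credit = $0.10
SIGNUP_BONUS_CENTS = 750          # PRC-03: 75 credits
CLEAN_AI_CREDITS_PER_BLOCK = 50   # PRC-05: 50 credits ...
CLEAN_AI_BLOCK_ROWS = 200         # ... per 200 rows


def credits_to_cents(credits: int) -> int:
    return credits * CENTS_PER_CREDIT


def clean_ai_credits(rows: int) -> int:
    """Credits for a Clean-with-AI run.
    Assumption: a partial 200-row block is billed as a full block."""
    blocks = -(-rows // CLEAN_AI_BLOCK_ROWS)   # integer ceiling division
    return blocks * CLEAN_AI_CREDITS_PER_BLOCK


def fmt_usd(cents: int) -> str:
    """Format a non-negative amount of cents as dollars."""
    return f"${cents // 100}.{cents % 100:02d}"


print(fmt_usd(credits_to_cents(75)), fmt_usd(credits_to_cents(clean_ai_credits(200))))
```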
PRC-06
Per-account daily ceiling on cleaner-AI spend: $50/day soft cap.
- Pages
- security.html
- Source
- _cleaner_ai_daily_cap_cents() in backend/routes/cleaner.py; rollup in backend/db.py:2632-2661; atomic charge in backend/db.py:2814 (charge_for_cleanai_run_under_daily_cap). Default ceiling: 5000 cents = $50.
- Last verified
- 2026-05-07
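A minimal sketch of a charge that cannot overshoot the daily ceiling, again with sqlite3 and hypothetical table and function names. It is not charge_for_cleanai_run_under_daily_cap itself and leaves out the credit debit. The cap check lives in the UPDATE's WHERE clause, so reading the running total and incrementing it happen in one statement.

```python
import sqlite3

DAILY_CAP_CENTS = 5000  # PRC-06 default ceiling ($50)


def init_daily_spend(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cleanai_daily_spend ("
        " user_id TEXT NOT NULL, day TEXT NOT NULL,"
        " cents INTEGER NOT NULL DEFAULT 0,"
        " PRIMARY KEY (user_id, day))")


def charge_under_daily_cap(conn: sqlite3.Connection, user_id: str, day: str,
                           cents: int, cap: int = DAILY_CAP_CENTS) -> bool:
    """Add `cents` to the user's rollup for `day` only if the new total stays within `cap`."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "INSERT OR IGNORE INTO cleanai_daily_spend (user_id, day) VALUES (?, ?)",
            (user_id, day))
        cur = conn.execute(
            "UPDATE cleanai_daily_spend SET cents = cents + ? "
            "WHERE user_id = ? AND day = ? AND cents + ? <= ?",
            (cents, user_id, day, cents, cap))
    return cur.rowcount == 1  # 0 means the charge would have crossed the cap
```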
PRC-07
Inference (combined input + output): $1.00 per million tokens, advanced tier; free on TinyLlama.
- Pages
- finetuning.html · all-features.html
- Source
- backend/pricing.py:34-37 sets PRICE_PER_M_INFERENCE_TOKENS = {"advanced": 100, "free": 0} (cents per million tokens).
- Last verified
- 2026-05-07
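Token pricing reduces to one multiplication per tier. A minimal sketch using the cents-per-million constants cited on PRC-01 and PRC-07; Fraction keeps sub-cent amounts exact. How production rounds to whole cents is not stated on these cards, so the sketch leaves rounding to the caller.

```python
from fractions import Fraction

FT_CENTS_PER_M = {"advanced": 399}                      # PRC-01
INFERENCE_CENTS_PER_M = {"advanced": 100, "free": 0}    # PRC-07


def cost_cents(tokens: int, cents_per_million: int) -> Fraction:
    """Exact cost in cents for `tokens` at a per-million-token rate."""
    return Fraction(tokens * cents_per_million, 1_000_000)


def inference_cost_cents(input_tokens: int, output_tokens: int, tier: str) -> Fraction:
    """Inference bills input and output tokens at one combined rate."""
    return cost_cents(input_tokens + output_tokens, INFERENCE_CENTS_PER_M[tier])


ft = cost_cents(2_500_000, FT_CENTS_PER_M["advanced"])      # 2.5M training tokens
inf = inference_cost_cents(200_000, 100_000, "advanced")    # 0.3M tokens in total
print(f"fine-tune ${float(ft) / 100:.4f}, inference ${float(inf) / 100:.4f}")
```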
04 · Model coverage
What you can fine-tune today.
MOD-01
6 supported models: TinyLlama (free), Mistral-7B, Saul-7B, Qwen3-8B, Gemma-2-9B, Llama-3.1-8B.
- Pages
- finetuning.html · all-features.html
- Source
- backend/server.py:1778-1785 defines the ALLOWED_MODELS dict. Live at GET /supported_models and GET /models.
- Last verified
- 2026-05-07
MOD-02
Production GPU: A100 on Modal.
- Pages
- all-features.html
- Source
- Every @app.function in modal_deploy.py sets gpu="A100". Modal Pro tier (confirmed 2026-04-05).
- Last verified
- 2026-05-07
MOD-03
Continual learning is in beta — not self-serve.
- Pages
- continual.html · finetuning.html · all-features.html
- Source
- No CL price is published on any marketing surface (the previous $4.99/M tier was pulled site-wide on 2026-05-06, commit 737337b). CL endpoints exist but require a manual access grant.
- Last verified
- 2026-05-07
05 · Disclosure
What we don't claim — on purpose.
A short list of things our marketing pages do not say, because we don't have the receipts to back them yet. We'd rather list the absence than fudge the citation.
Not claimed today
- No priority-GPU SLA. Modal A100 is shared scheduling. We don't sell or promise dedicated capacity.
- No SOC 2 / HIPAA / BAA. Small team, pre-revenue. We'll publish those when (and only when) they're audited and signed.
- No third-party-blinded benchmark numbers. The 98/100 and 18/18 scores (CL-04, CL-05) are first-author rated. A blinded two-rater audit is on the roadmap.
- No PEFT (parameter-efficient fine-tuning) advantage claim. CRMA is for continual learning. We do not claim CRMA fine-tunes a single domain better than naive LoRA.
- No multi-region deployment. Single Modal region; single Turso primary.
- No vector DB / RAG product. Fine-tuning is the product. RAG is the comparison baseline.
- No status-page uptime number. A public status page is on the roadmap; until then, don't infer a 99.x% from absence.
06 · Update policy
How we keep this page honest.
Update policy
Numbers update when underlying experiments are re-run or code changes the source of truth. Stale claims are removed within 7 days of detection. The "last verified" date on each card tells you the most recent date we re-checked the citation against the cited file or log.
If you find a number on our marketing site that doesn't reconcile to its receipt, or a number anywhere on our site that we haven't put on this page, please let us know.
modelbrewai@gmail.com