ModelBrew
BILATERAL VERIFICATION · 220 cells reviewed · 7.7% disagree-rate · published 2026-05-07

How ModelBrew compares.
The table, with receipts.

Two independent research agents read the public materials of ten competitors. They each filed a feature matrix. We diffed them cell by cell and dropped every cell where they disagreed. What survived is the table below. Hover any YES cell to see its citation.

The matrix

The bilateral-verified table.

Read order: green ✓ = both researchers verified YES with a citation. Orange ◐ = MB-PARTIAL with an honest caveat. Silent dash = competitor's docs did not publicly disclose the feature at research time (this is not a claim of absence). Hover any ✓ to see the source. Anchor IDs (#feature-A1) let you deep-link any row.

Feature ModelBrew Predibase* Together HF AutoTrain Fireworks* OpenAI* Argilla Cleanlab DeepEval Snorkel Galileo*
A · Continual learning without forgetting
Spectral / structural CL guarantee: CRMA constrained-residual mixing; patent pending
  ModelBrew: CRMA in utils/crma.py + paper at paper/crma_modular_cl_arxiv.tex. Architectural constraint exists in code; no public formal proof yet (partial).
Multi-seed reproducibility numbers published: 3 seeds at 7B; per-seed ranges disjoint
  ModelBrew: scripts/stage2_mistral_3seed.py · numbers cited at /claims
Add new domain without breaking old ones: 5-domain Mistral-7B chain at 26/31 zero-forget
  ModelBrew: utils/cl_config.py + sequential CL_p2 chain; result cited at /claims
B · Cleaner contractual invariants
Score-floor revert: Post-clean score never below pre-clean
  ModelBrew: backend/cleaner/llm_rewrite.py:270 · property-tested at tests/cleaner/test_K1_invariants.py
Determinism: Same input → same output, property-tested
  ModelBrew: backend/tests/cleaner/test_K1_invariants.py (property-tested invariant)
Monotonicity: Issue count never increases after clean
  ModelBrew: backend/tests/cleaner/test_K1_invariants.py (monotonicity invariant)
Per-row revert-on-degrade: If a row's score drops, revert that row only
  ModelBrew: backend/cleaner/service.py + autofix.py (per-row revert in production)
C · LLM judge defenses
Prompt-injection harden: NFKC + zero-width strip + role-flip + polarity splice
  ModelBrew: backend/cleaner/prompt_safety.py + geval_judge.py
Chat-template token strip: Qwen3 <think>, Llama-3 <|eom_id|>, Phi-4 <|im_sep|>
  ModelBrew: backend/cleaner/autofix.py + tests/cleaner/test_autofix_chat_template_tokens.py
Atomic per-user daily AI cost cap: SQL transaction across 5 cleaner-AI routes
  ModelBrew: backend/db.py, atomic across 5 cleaner-AI routes (Wave 5 W5 S1)
D · Public trust signals
Public security page WITH retention numbers cited to code: Operational retention days, not just compliance logos
  ModelBrew: /security (TLS, AES-256, retention tied to backend/db.py)
  Snorkel: security.snorkel.ai, SOC 2 + HIPAA + NIST CSF (no retention-days numbers cited)
Public claims / receipts page: Every public number sourced to a code file or commit
  ModelBrew: /claims (every numeric claim → code path / paper section)
Public live status page: API health surface, live or incident-style
  ModelBrew: /status, live Modal /health fetch (manual updates for incidents)
  Fireworks: status.fireworks.ai, incident-style + uptime %
  OpenAI: status.openai.com, incident-style + uptime %
E · Architectural transparency
Open arXiv paper: First-party research artifact
  ModelBrew: paper/crma_modular_cl_arxiv.tex (CRMA paper, arXiv-ready)
  Cleanlab: confident-learning research lineage
  Snorkel: Snorkel (arXiv 1711.10160) + DryBell (1812.00417)
Patent pending or granted: Disclosed publicly
  ModelBrew: CRMA, US provisional filed 2026-02-28
Public benchmark results: First-party numbers, 3-seed Mistral-7B 26/31 zero-forget
  ModelBrew: /claims, 5-domain Mistral-7B chain at 26/31 zero-forget, 3-seed reproducibility, per-seed ranges disjoint. Paper at paper/crma_modular_cl_arxiv.tex.
  Predibase: LoRA Land paper (arXiv 2405.00732)
Multi-architecture verification (≥4 model families): Not just "supports many models"
  ModelBrew: TinyLlama, Mistral-7B, Saul-7B, Qwen3-8B, Gemma-2-9B, Llama-3.1-8B = 6 families
  Predibase: LoRA Land, 10 base models tested
F · Engineering quality signals
Atomic SQL transactions for billing-cap: Race-free under concurrent jobs
  ModelBrew: backend/db.py, atomic across 5 cleaner-AI routes (Wave 5)
Property-test invariants in test suite: Not unit-tests-of-examples; property tests
  ModelBrew: backend/tests/cleaner/test_K1_invariants.py + test_score_floor_invariant.py
IDOR existence-oracle defense: 403 → 404 conversion + timing symmetry
  ModelBrew: Wave 5 W5 S1 fix, backend/tests/cleaner/test_p0_rt_idor.py + test_s3_2_file_id_idor.py
Modal-side upload MIME / magic-byte guard: Validate file-type at ingest
  ModelBrew: backend/cleaner/validators.py, MIME + magic-byte present, but inspects only first JSONL line (P2 backlog).
G · Inference & API surface
OpenAI-compatible /v1/chat/completions: Drop-in replacement for the OpenAI SDK; Bearer-key compatible
  ModelBrew: backend/server.py:5930, full OpenAI envelope, scoped per run_id.
  Together: docs.together.ai, OpenAI-compatible chat completions
  Fireworks: docs.fireworks.ai, chat completions API
  OpenAI: platform.openai.com, native
Adapter export ZIP (train here, run anywhere): LoRA + CRMA bundle download via GET /download/{run_id}
  ModelBrew: backend/server.py:4371 + utils/inject.py, standard PEFT format, loads in vLLM / TGI / Transformers.
  HF AutoTrain: adapter pushed to the Hub by default
HF Hub README auto-export with IP-leak guard: YAML frontmatter + sanitized hparams allow-list + XSS escape
  ModelBrew: backend/server.py:4287 + backend/readme_export.py; _assert_safe_overlap hard-fails on patent-pending field names.
Per-API-key allowed-runs / allowed-models + RPM/TPM scoping: Least-privilege keys, fine-grained throttling per key
  ModelBrew: backend/db.py api_key_can_access_run / api_key_can_access_model + backend/server.py:8470 PATCH /me/api-keys/{id}
Streaming SSE generation: Token-by-token via /v1/{run_id}/generate?stream=true
  ModelBrew: modal_deploy.py:2004 generate_text_stream + backend/server.py:5336
  Together: docs.together.ai, streaming on OpenAI envelope
  Fireworks: docs.fireworks.ai, streaming supported
  OpenAI: platform.openai.com, streaming native
CL chain visualization tree UI: Parent-child task graph rendered in dashboard
  ModelBrew: frontend/src/components/CLChainTree.tsx + backend/server.py:8123 GET /chain/{run_id}
H · Operational quality & data rights
Auto-refund on stuck runs (90-min sweeper): Money back automatically without a support ticket
  ModelBrew: tests/test_stuck_sweep_refund.py + backend/db.py refund_stuck_run(): 5-min cron, 90-min active-job window, pro-rata refund via _auto_refund with correlation_id trace.
GDPR data export + self-delete endpoints: User-portable JSON export + account self-deletion, no support ticket
  ModelBrew: backend/server.py:8880 GET /account/data-export + backend/server.py:8846 DELETE /account; CSV-injection sanitized at :8867.
Domain PII detectors with checksum validation: DEA / NPI / CUSIP / SWIFT / ABA + 4 more (9 detectors; defense + finance + legal)
  ModelBrew: backend/cleaner/detectors_typed.py, 9 detectors with checksum validation (DEA, NPI, CUSIP, SWIFT/BIC, ABA, MRN, ICD-10, bar number, Bates).
Military OPSEC redaction toggles: CUI / FOUO / SECRET markings, MGRS grid, EDIPI, DTG, lat/long, network refs
  ModelBrew: backend/cleaner/validators.py (opsec_* codes), 6 categories, defense-customer-shaped feature.

A1 — Spectral / structural CL guarantee

CRMA constrained-residual mixing; patent pending
ModelBrew

A2 — Multi-seed reproducibility numbers

3 seeds at 7B
ModelBrew

A3 — Add new domain without breaking old ones

5-domain Mistral-7B chain at 26/31 zero-forget
ModelBrew

B1 — Score-floor revert

Post-clean ≥ pre-clean
ModelBrew

B2 — Determinism

Property-tested invariant
ModelBrew

B3 — Monotonicity

Issue count never increases
ModelBrew

B4 — Per-row revert-on-degrade

ModelBrew

C1 — Prompt-injection harden

NFKC + zero-width strip + role-flip + polarity splice
ModelBrew

C2 — Chat-template token strip

Qwen3 <think>, Llama-3 <|eom_id|>, Phi-4 <|im_sep|>
ModelBrew

C3 — Atomic per-user daily AI cost cap

SQL transaction across 5 routes
ModelBrew

D1 — Security page WITH retention numbers cited to code

Operational retention days, not just compliance logos
ModelBrew
Snorkel (page only)

D2 — Public claims / receipts page

Every number sourced to code
ModelBrew

D3 — Public live status page

ModelBrew
Fireworks
OpenAI

E1 — Open arXiv paper

ModelBrew
Cleanlab
Snorkel

E2 — Patent pending or granted

ModelBrew

E3 — Public benchmark results

3-seed Mistral-7B 26/31 zero-forget
ModelBrew
Predibase

E4 — Multi-architecture verification (≥4 families)

ModelBrew
Predibase

F1 — Atomic SQL transactions for billing-cap

ModelBrew

F2 — Property-test invariants in test suite

ModelBrew

F3 — IDOR existence-oracle defense

403 → 404 + timing symmetry
ModelBrew

F4 — Modal-side upload MIME / magic-byte guard

ModelBrew

G1 — OpenAI-compatible /v1/chat/completions

Drop-in for the OpenAI SDK
ModelBrew
Together
Fireworks
OpenAI

G2 — Adapter export ZIP (LoRA + CRMA bundle)

Train here, run anywhere
ModelBrew
HF AutoTrain

G3 — HF Hub README auto-export with IP-leak guard

Sanitized hparams allow-list
ModelBrew

G4 — Per-API-key allowed-runs/models + RPM/TPM

Least-privilege keys
ModelBrew

G5 — Streaming SSE generation

Token-by-token, ?stream=true
ModelBrew
Together
Fireworks
OpenAI

G6 — CL chain visualization tree UI

Parent-child task graph
ModelBrew

H1 — Auto-refund on stuck runs (90-min sweeper)

Money back automatically
ModelBrew

H2 — GDPR data export + self-delete

User-portable, no support ticket
ModelBrew

H3 — Domain PII detectors with checksum

DEA, NPI, CUSIP, SWIFT, ABA + 4 more
ModelBrew

H4 — Military OPSEC redaction toggles

CUI/FOUO/SECRET, MGRS, EDIPI, DTG, lat/long, network refs
ModelBrew
Where ModelBrew is uniquely positioned

Three statements we will defend with code.

For visitors who scanned past the table: these are the rows where ModelBrew is the only platform among the ten with a bilateral-verified YES. Each card lifts a row from the matrix above and explains what it means in plain terms.

A · Continual learning · rows A1–A3

Add a new domain. Keep the old ones. 26 of 31 zero-forget across a five-domain Mistral-7B chain.

CRMA is a constrained residual mixing adapter: freeze the base, add a small per-domain module, route via a learned mixer that respects a spectral constraint. We chain five domains in sequence — TCCC, ATTCK, logistics, cyber, and a held-out probe — then re-test the first one after the fifth.

Result: 26 of 31 test items show zero catastrophic forgetting, reproduced across three seeds. Patent pending (US provisional filed 2026-02-28). None of the ten platforms in our research panel publicly disclose an equivalent sequential-CL benchmark.

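# Constrained-residual mixing: public sketch
from torch import nn

class CRMA(nn.Module):
    def __init__(self, frozen_base, per_domain, mixer):
        super().__init__()
        # Submodules are injected (per_domain as an nn.ModuleList); internals are not shown here
        self.frozen_base, self.per_domain, self.mixer = frozen_base, per_domain, mixer

    def forward(self, x, domain_id):
        base = self.frozen_base(x)              # frozen base-model output
        adapt = self.per_domain[domain_id](x)   # small per-domain module
        gate = self.mixer(x, domain_id)         # learned mixer, spectral-bounded
        return base + gate * adapt              # residual mix of base and adapter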
D2 · Receipts · row D2

Every public number cites a code file.

The /claims page maps every number on the marketing site back to a specific file path, line range, or commit hash. We did not find an equivalent claims-to-code receipts page on any of the ten competitor sites surveyed.

D1 · Security · row D1

Retention numbers tied to code lines.

Most security pages publish compliance logos. Ours publishes retention days cited to backend/db.py — sub-hour local copy, 7-day training-infra. Snorkel has a security page; theirs does not cite operational retention numbers to code.

F1 + C3 · Atomic billing-cap · rows F1, C3

Five routes, one SQL transaction. Cannot race past the daily AI cost cap.

The cleaner runs LLM-judge calls on user data. Each user has a daily AI cost cap. The cap is enforced inside a single SQL transaction shared by all five routes that can spend — so two parallel requests cannot both read "under cap" and both succeed. Property-tested, not just unit-tested.

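def add_credits(username, amount_cents, run_id=None, correlation_id=None):
    """Add amount_cents to a user's balance atomically and return the new balance."""
    with _conn() as c:  # BEGIN IMMEDIATE: take the write lock before reading
        bal = c.execute("SELECT balance_cents FROM users WHERE username=?", (username,)).fetchone()
        if bal is None:  # unknown user: fail loudly instead of indexing None
            raise ValueError(f"unknown user: {username}")
        new = bal[0] + amount_cents
        c.execute("UPDATE users SET balance_cents=? WHERE username=?", (new, username))
        _billing_log("add_credits", username=username, delta=amount_cents, correlation_id=correlation_id)
    return new  # the with-block commits or rolls back as a unit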
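
The cap check itself follows the same pattern. Below is a minimal sketch, assuming SQLite and a hypothetical `ai_spend` table; `open_db`, `try_spend`, the table layout, and the 500-cent cap are illustrative names and values, not ModelBrew's schema. The point it shows: the read of today's spend and the write of the new total sit inside one `BEGIN IMMEDIATE` transaction, so a second request cannot read a stale "under cap" value.

```python
import sqlite3

DAILY_CAP_CENTS = 500  # hypothetical cap, for illustration only


def open_db(path=":memory:"):
    # isolation_level=None = autocommit mode, so BEGIN/COMMIT are issued explicitly
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ai_spend ("
        "username TEXT, day TEXT, spent_cents INTEGER NOT NULL, "
        "PRIMARY KEY (username, day))"
    )
    return conn


def try_spend(conn, username, day, cost_cents, cap_cents=DAILY_CAP_CENTS):
    """Reserve cost_cents against the daily cap; return True if it fit."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT spent_cents FROM ai_spend WHERE username=? AND day=?",
            (username, day),
        ).fetchone()
        spent = row[0] if row else 0
        if spent + cost_cents > cap_cents:
            conn.execute("ROLLBACK")
            return False
        conn.execute(
            "INSERT OR REPLACE INTO ai_spend (username, day, spent_cents) "
            "VALUES (?, ?, ?)",
            (username, day, spent + cost_cents),
        )
        conn.execute("COMMIT")
        return True
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

Every route that can spend would call the same function, which is what makes the cap a single shared invariant rather than five separate checks.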
Defensibility · for investors

Patent pending. Receipts on every claim. Six base models verified.

The moat is not "we have a model" — every platform has models. The moat is a patent-pending CL architecture with multi-seed reproducibility, a public claims-to-code page that competitors cannot replicate without similar engineering hygiene, and verified breadth across six base model families. We file in the gap where Predibase's LoRA Land stops (single-task at scale) and where the rest of the field has not started (sequential continual learning).

Closest competitor

ModelBrew vs Predibase, side by side.

Row E3 of the matrix is the only row where Predibase has a verified YES and ModelBrew's own cell is downgraded to ◐. Here is the honest framing for both.

ModelBrew · E3

Sequential continual-learning benchmark
5-domain Mistral-7B chain. 26 of 31 zero-forget. Three seeds. The research question: "can you stack five domains and still answer about the first one?"

Honest caveat: no public head-to-head against Predibase or Together yet — the benchmark is sequential-CL, not single-task quality.

Predibase · E3

LoRA Land — 310 single-task adapters across 10 base models
arXiv 2405.00732. The research question: "how good can a single-task LoRA adapter get, at scale, across many tasks and many base models?"

Strong portfolio breadth. We cite it in good faith.

Different research questions. Predibase asks how good one adapter is at one task. ModelBrew asks whether you can stack five domains in sequence without forgetting. Both are real engineering. Neither is a substitute for the other.

Most comparison pages are marketing. This one is two independent agents who each read every competitor's public materials, filed a matrix, and refused to ship cells where they disagreed.
— Method note · .planning/COMPETITOR_FEATURE_MATRIX_MERGED.md · 2026-05-07
What we do not yet claim

Gaps, listed before you ask.

Pre-seed companies have gaps. The honest move is to publish them. These are cells where a competitor has a verified YES that ModelBrew does not yet match. Tracked in memory/parked_until_funding.md; revisited post-seed.

SOC 2 / HIPAA certification

Compliance audits cost more than a pre-seed runway. We list the engineering controls (TLS, AES-256, retention numbers) on /security but cannot ship the badges yet.

Snorkel · Fireworks · HuggingFace AT

Per-tenant Zero-Data-Retention toggle

Together and Fireworks let a customer flip ZDR on at the account level. ModelBrew's retention numbers are global (cited on /security); not yet per-tenant configurable.

Together · Fireworks

Enterprise VPC / on-prem deployment

The platforms with sales motions ship VPC/on-prem packages. We are self-serve only — Modal-hosted, single region.

Together · Snorkel · Fireworks

Multi-LoRA inference serving

Predibase LoRAX serves many adapters from a single GPU. ModelBrew currently serves one adapter per inference request. On the roadmap.

Predibase
How this page was built

Two researchers, one diff.

Comparison pages have a credibility problem. Vendors write them. We tried to remove the vendor by removing the single source. Each cell required two independent confirmations or it did not ship.

i.

Two agents read every competitor

One in expert-research mode, one in red-team mode. Neither read the other's matrix. They each filed a feature × vendor cell-grid with citations.

ii.

Cell-by-cell diff

Both said YES → ship as ✓. Both said NPD (not publicly disclosed) → ship as silent dash. One YES, one UNVERIFIED → drop the cell. Disagree-rate landed at 7.7% across 220 cells.

iii.

Caveats kept

Three MB cells were marked partial because one agent flagged a caveat (no formal proof, no head-to-head, validator P2). Those ship as ◐, not ✓. Source: COMPETITOR_FEATURE_MATRIX_MERGED.md.
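
The merge rule in step ii can be sketched in a few lines. This is an illustration under stated assumptions, not the script that produced the published matrix: the verdict strings, `merge_cell`, and `merge_matrices` are hypothetical names, any disagreement (not only YES versus UNVERIFIED) is treated as a drop, and the ◐ caveat handling from step iii is left out.

```python
YES, NPD, UNVERIFIED = "YES", "NPD", "UNVERIFIED"


def merge_cell(a, b):
    """Merge two verdicts for one feature x vendor cell.

    Returns "✓" when both agents said YES, "-" (silent dash) when both
    said NPD, and None when the cell should be dropped.
    """
    if a == YES and b == YES:
        return "✓"
    if a == NPD and b == NPD:
        return "-"
    return None  # assumption: every other combination counts as a disagreement


def merge_matrices(m1, m2):
    """Merge two {(feature, vendor): verdict} grids; return (merged, dropped)."""
    merged, dropped = {}, []
    for key in sorted(set(m1) | set(m2)):
        # A cell missing from one grid is treated as UNVERIFIED by that agent
        verdict = merge_cell(m1.get(key, UNVERIFIED), m2.get(key, UNVERIFIED))
        if verdict is None:
            dropped.append(key)
        else:
            merged[key] = verdict
    return merged, dropped


# Toy input with placeholder vendors, not rows from the real matrix
expert = {("X1", "VendorA"): YES, ("X1", "VendorB"): NPD, ("X2", "VendorA"): YES}
red_team = {("X1", "VendorA"): YES, ("X1", "VendorB"): NPD, ("X2", "VendorA"): UNVERIFIED}
merged, dropped = merge_matrices(expert, red_team)
disagree_rate = len(dropped) / len(set(expert) | set(red_team))
```

In the toy input one of three cells is dropped; the published figure corresponds to 17 dropped cells out of roughly 220.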

Common questions

The questions we keep getting.

These answers are also embedded as Schema.org FAQPage JSON-LD, so search engines and LLM crawlers can lift them directly.

Basics

What is ModelBrew?
ModelBrew is a fine-tuning platform for open-source language models, built around a continual-learning architecture called CRMA (Constrained Residual Mixing Adapter). You upload a dataset, optionally clean it with our LLM judge, train a LoRA adapter on one of six supported base models (TinyLlama, Mistral-7B, Saul-7B, Qwen3-8B, Gemma-2-9B, Llama-3.1-8B), and either chat with the result or stack additional domains on top via continual learning. Every numeric marketing claim is tied to a specific line of code on the public receipts page. Learn more at /security or /claims.
What is fine-tuning vs RAG vs prompting?
Prompting steers a model's behavior at inference using natural-language instructions. RAG (retrieval-augmented generation) retrieves passages from a vector store and injects them into the context window, so the model can quote facts it never memorized. Fine-tuning updates the model's weights, baking a behavior or domain pattern into the network itself. They are not interchangeable. In our open-source OS1 benchmark across 3 public Obsidian vaults, fine-tuning beat RAG 83.3% on inference questions (questions whose answers cannot be found in any one retrieved chunk). RAG still wins on extractive lookup. Receipts at /claims.
What is continual learning across domains?
Continual learning is training a model on Domain A, then Domain B, then Domain C in sequence — without it forgetting Domain A by the time it reaches C. The textbook failure mode is catastrophic forgetting. ModelBrew's CRMA architecture has been verified empirically on a 5-domain Mistral-7B chain at 26 of 31 zero-catastrophic-forgetting test items, with a 3-seed reproducibility sweep. The full method is in our preprint at paper/crma_modular_cl_arxiv.tex. None of the ten competitors in our research panel publicly disclosed an equivalent sequential-CL benchmark. Receipts at /claims.
What is CRMA?
CRMA stands for Constrained Residual Mixing Adapter. The intuition: when you train a second adapter on top of an existing model, gradient flow toward the new task can erase what the first adapter learned. CRMA constrains the residual mixing of new updates against a frozen reference, so the second domain's gradients add to capability rather than overwriting it. The architecture is openly disclosed in our preprint; the production hyperparameters and the specific implementation in utils/crma.py are protected. Patent pending (US provisional filed 2026-02-28). See paper/crma_modular_cl_arxiv.tex.
What is LoRA fine-tuning?
LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method that freezes the base model's weights and inserts small trainable low-rank matrices into the attention layers. Instead of updating billions of parameters, you update millions, which makes training cheaper, faster, and recoverable on commodity GPUs. ModelBrew uses LoRA via the PEFT library underneath every fine-tuning run, configured per-model so the trainable-parameter count is appropriate to the base model size. The resulting adapter is small enough to download and host yourself. See /claims for the supported model list and per-model parameter counts.
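
To make the "billions versus millions" point concrete, here is a small arithmetic sketch. The 4096 hidden size and rank 16 are illustrative assumptions, not ModelBrew's per-model configuration: for one weight matrix of shape d_out × d_in, LoRA trains two factors of shape d_out × r and r × d_in instead of the full matrix.

```python
def full_params(d_out, d_in):
    """Parameter count of one dense weight matrix."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Parameter count of the two low-rank factors B (d_out x r) and A (r x d_in)."""
    return rank * (d_out + d_in)


d = 4096   # hypothetical hidden size
r = 16     # hypothetical LoRA rank

full = full_params(d, d)      # 16,777,216 weights in the frozen matrix
lora = lora_params(d, d, r)   # 131,072 trainable weights in the adapter
ratio = lora / full           # under 1% of the original matrix
```

The same ratio applies to every matrix the adapter targets, which is why the exported adapter stays small enough to download and self-host.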

Comparing ModelBrew

What is the one-sentence summary of how ModelBrew differs?
ModelBrew is the only platform on this comparison that ships all three of the following: a sequential continual-learning benchmark across multiple domains, a public claims-to-code receipts page, and a security page where retention numbers are tied to specific lines in backend/db.py. Most platforms have one of those. See /claims and /security.
ModelBrew vs Predibase: which is better for fine-tuning?
Predibase has a track record of several years with customers and ships LoRAX multi-LoRA inference serving as a product. They publicly cite the LoRA Land paper (arXiv 2405.00732), which covers 310 single-task adapters across 10 base models: genuine portfolio breadth. ModelBrew publicly cites a sequential continual-learning chain (Mistral-7B, 5 domains, 26 of 31 zero-forget) and a claims-to-code receipts page where every marketing number maps to a backend code file. Different research questions, different stage of company. If you need multi-LoRA inference at scale, look at Predibase. If you need stacked-domain training with code-level transparency, look at ModelBrew. See row E3.
ModelBrew vs Together AI?
Together AI publicly cites a deep inference stack — high-throughput open-model serving, batched generation, dedicated endpoints — and enterprise VPC deployment. That is a real strength for production inference at scale. ModelBrew publicly cites a continual-learning training architecture, multi-seed empirical reproducibility, and a claims-to-code receipts page. We do not yet ship inference at Together's scale. They do not publicly disclose a sequential continual-learning benchmark with comparable test conditions. Pick by workload: Together for high-throughput open-model inference, ModelBrew for training quality and post-train transparency. See /claims.
ModelBrew vs HuggingFace AutoTrain?
HuggingFace AutoTrain is the open-source default — broad model catalog, deep ecosystem integration, large community. ModelBrew supports a curated set of six base models (TinyLlama free; Mistral-7B, Saul-7B, Qwen3-8B, Gemma-2-9B, Llama-3.1-8B paid) and ships behind those a continual-learning architecture, an automatic stuck-run refund sweeper, and a public claims-to-code receipts page. AutoTrain optimizes for breadth; ModelBrew optimizes for contractual quality on a smaller surface. If your workflow already lives in the HuggingFace Hub, AutoTrain is the path of least resistance. If you want every dollar and every benchmark traced to code, see /claims.
ModelBrew vs OpenAI fine-tuning?
OpenAI's fine-tuning API supports their closed family (GPT-4o mini, GPT-4o, GPT-3.5). The resulting weights stay on OpenAI infrastructure; you cannot download them. ModelBrew fine-tunes six open-source base models (TinyLlama, Mistral-7B, Saul-7B, Qwen3-8B, Gemma-2-9B, Llama-3.1-8B), and the resulting LoRA adapter is downloadable as an adapter export ZIP (row G2). Different categories: pick OpenAI if your stack is already locked into closed models; pick ModelBrew if you want portable weights, the option to self-host, and a continual-learning surface across multiple domains. See /claims.
What's the best alternative to RAG for domain knowledge?
Fine-tuning with continual learning is the strongest alternative to RAG when the questions you care about cannot be answered by retrieving any single document. In our open-source OS1 benchmark across three public Obsidian vaults, fine-tuning beat RAG 83.3% on inference questions — questions about author perspective, recurring values, cross-document patterns. RAG still wins on extractive lookup (about 58%), so the honest answer is hybrid: RAG for fresh facts, fine-tuning for behavior and synthesis. ModelBrew specializes in the fine-tuning half. Receipts at /claims.

Getting started

How do I fine-tune Llama / Mistral / Qwen3 on my data?
Four steps. (1) Sign up at app.modelbrew.ai: 75 credits ($7.50) free at signup, no card. (2) Upload a JSONL of prompt-response pairs. (3) Optionally clean it with our LLM judge ($5 per 200 rows, runs prompt-injection-hardened). (4) Pick a base model (Mistral-7B, Saul-7B, Qwen3-8B, Gemma-2-9B, or Llama-3.1-8B), set the LoRA hyperparameters (defaults are sane), and train at $3.99 per million tokens. After the run completes you can chat with the adapter or download it as an adapter export ZIP via GET /download/{run_id}. See /claims for source-cited pricing.
Can I fine-tune on my Obsidian or Notion notes?
Yes, with a small export step. Currently the platform expects JSONL of prompt-response pairs, so you export your vault to markdown and run a Q-and-A generation step before upload. Native markdown ingest (OA1) is on our parked roadmap. The case for doing this at all is empirical: in our OS1 benchmark across three public Obsidian vaults (lyz-code blue-book, jRicciL digital-garden, deepaksood619.github.io), fine-tuning beat RAG 83.3% on inference questions. RAG win rates were 59.0% / 40.0% / 56.0%, all below the 60% kill threshold. Receipts at /claims.
How much does fine-tuning cost on ModelBrew?
Fine-tuning is $3.99 per million training tokens, flat across all six supported base models. Cleaning a dataset with the LLM judge is $5 per 200 rows. New accounts get 75 credits ($7.50) at signup, no card required, which is enough to fine-tune TinyLlama or run a small clean. The minimum credit purchase is $20, and credits never expire. All numbers are cited line-by-line to backend/pricing.py and backend/db.py on the public receipts page. See /claims#pricing.
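
As a worked example of the pricing above, here is a minimal cost estimator. It uses only the published rates; the function names are hypothetical, and it assumes cleaning is billed pro rata per row and that credit totals round up, neither of which this page states.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_HALF_UP

TRAIN_USD_PER_M_TOKENS = Decimal("3.99")  # flat across all six base models
CLEAN_USD_PER_200_ROWS = Decimal("5.00")
USD_PER_CREDIT = Decimal("0.10")          # 75 credits = $7.50


def estimate_usd(train_tokens, clean_rows=0):
    """Estimated cost in dollars for one clean-and-train job."""
    train = TRAIN_USD_PER_M_TOKENS * Decimal(train_tokens) / Decimal(1_000_000)
    clean = CLEAN_USD_PER_200_ROWS * Decimal(clean_rows) / Decimal(200)  # assumed pro rata
    return (train + clean).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def credits_needed(usd):
    """Whole credits needed to cover a dollar amount (assumed to round up)."""
    return int((usd / USD_PER_CREDIT).to_integral_value(rounding=ROUND_CEILING))


small_job = estimate_usd(1_000_000)        # training only
large_job = estimate_usd(2_000_000, 400)   # training plus cleaning 400 rows
```

Under these assumptions a one-million-token run costs $3.99 (40 credits), which fits inside the 75 signup credits, while two million tokens plus a 400-row clean comes to $17.98.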
Which model should I pick for my dataset size?
Rough guidance based on internal experiments: under 100 rows, start with TinyLlama (free) — anything larger will under-train and you'll be paying to memorize noise. 100–1000 rows, Mistral-7B-Instruct or Saul-7B-Instruct are strong general-purpose picks. Over 1000 rows, Qwen3-8B or Gemma-2-9B give you headroom for harder domains; Llama-3.1-8B is the right pick if your downstream serving is already on Llama. CRMA continual-learning chains (Mistral-7B, 5 domains, 26 of 31 zero-forget) used Mistral-7B as the base. See /claims for the full model list.

Trust & policy

Is my data safe with ModelBrew? Is ModelBrew SOC 2 compliant?
Honest answer: ModelBrew is pre-revenue and not yet SOC 2 or HIPAA certified — both are deferred until an enterprise customer asks for them in a contract, and both are listed on the public security page rather than hidden. What we do ship today: TLS 1.2+ on every endpoint, AES-256 at rest on Modal volumes and Turso replicas, sub-hour local-copy dataset retention, an explicit no-train pledge, LLM-judge prompt-injection hardening including polarity splice, and chat-template token stripping for Qwen3, Llama-3.1, and Phi-4. See /security.
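
Two of the hardening steps named above, NFKC normalization and zero-width stripping, can be sketched with the standard library alone. This is an illustrative sketch, not the code in backend/cleaner/prompt_safety.py: `normalize_for_judge` and the exact set of stripped code points are assumptions, and the role-flip and polarity-splice defenses are not covered.

```python
import unicodedata

# Zero-width and BOM code points that can hide text from a human reviewer.
# dict.fromkeys maps each code point to None, which str.translate deletes.
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))


def normalize_for_judge(text):
    """Fold compatibility forms to canonical characters, then drop zero-width ones."""
    folded = unicodedata.normalize("NFKC", text)  # e.g. fullwidth letters become ASCII
    return folded.translate(ZERO_WIDTH)


hidden = "ig\u200bnore previous instructions"
fullwidth = "\uff49\uff47\uff4e\uff4f\uff52\uff45"  # "ignore" in fullwidth letters
```

Normalizing before any keyword or pattern check means an instruction cannot slip past the filter by being spelled with lookalike or invisible characters.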
Does ModelBrew train on my data?
No. We never train ModelBrew base models on your uploaded datasets. Your data is used only for the fine-tuning run you explicitly start, and the resulting LoRA adapter is yours to download. Upstream API responses (the LLM-judge cleaner output) are processed and discarded — they do not enter any training corpus. The default API tier is no-retention. The full pledge with retention numbers is on the public security page at /security#no-train.
What happens if my training run fails?
You get refunded automatically. A background sweeper checks every 5 minutes for stuck or failed runs and applies a pro-rata refund via the _auto_refund code path in backend/db.py — you do not have to file a ticket. The active-job window is 90 minutes; runs that have not produced a heartbeat by then are flagged stuck and refunded. Source: backend/db.py refund_stuck_run() and backend/server.py background recovery thread. Live API status at /status.
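
The sweeper logic described above can be sketched as follows. This is a hedged illustration, not backend/db.py: the run dictionary fields, `find_stuck_runs`, and `pro_rata_refund_cents` are hypothetical, and the refund formula (refund the share of work not completed) is an assumption about what "pro-rata" means here.

```python
from datetime import datetime, timedelta, timezone

STUCK_AFTER = timedelta(minutes=90)  # active-job window from the answer above


def find_stuck_runs(runs, now):
    """Return ids of running jobs with no heartbeat inside the 90-minute window."""
    return [
        run["run_id"]
        for run in runs
        if run["status"] == "running" and now - run["last_heartbeat"] > STUCK_AFTER
    ]


def pro_rata_refund_cents(charged_cents, tokens_done, tokens_total):
    """Refund the share of the charge for work that never ran (assumed formula)."""
    remaining = max(tokens_total - tokens_done, 0)
    return charged_cents * remaining // tokens_total


now = datetime(2026, 5, 7, 12, 0, tzinfo=timezone.utc)
runs = [
    {"run_id": "r1", "status": "running", "last_heartbeat": now - timedelta(minutes=91)},
    {"run_id": "r2", "status": "running", "last_heartbeat": now - timedelta(minutes=10)},
    {"run_id": "r3", "status": "completed", "last_heartbeat": now - timedelta(hours=5)},
]
stuck = find_stuck_runs(runs, now)
```

A scheduler would call `find_stuck_runs` every five minutes and issue one refund per flagged run, tagging each with a correlation id so it can be traced in the billing log.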
Is ModelBrew open source?
Partially, by design. The CRMA preprint is publicly available and the architecture is openly disclosed at paper/crma_modular_cl_arxiv.tex. The Python and JavaScript SDKs are open. The production repo containing utils/crma.py is private — that is where the hyperparameter configurations and exact implementation live. This split is deliberate: enough is published that the research is reviewable, and enough is protected that the commercial moat is real. See /claims for the full disclosure surface.

The receipts are the product.

If you only check one thing, check /claims. Every public number on this page is sourced there.

Methodology & honesty footnote

Comparison reflects publicly-disclosed features as of 2026-05-07, verified via two independent research passes (expert + red-team, neither read the other's matrix). Cells marked with a silent dash indicate the competitor's docs did not publicly disclose the feature at research time; this is not a claim of absence. We chose silent dashes, not red ✗ marks, because we cannot rule out that the feature exists and is simply not in their public surface.

Cells where one researcher said YES and the other could not independently verify were dropped — neither asserted nor implied. The dropped-cell ratio was 7.7% (17 of ~220 reviewed cells); the merged matrix is published at .planning/COMPETITOR_FEATURE_MATRIX_MERGED.md in the source repo.

Reachability * — These competitor pages were unreachable at research time and verification was limited to other public surfaces: Predibase security/trust portal (TLS cert expired); Fireworks privacy page (HTTP 404); OpenAI security-and-privacy and supervised-fine-tuning docs (HTTP 403 to automated fetch — verified via search excerpts); Galileo /security (HTTP 404); Snorkel /products (HTTP 404 — verified via security.snorkel.ai instead). Asterisks next to vendor names in the table indicate at least one of their pages was blocked at research time.

Omitted dimensions. We omit comparisons where pre-seed startups can't fairly compete with established vendors: customer count, years in market, named enterprise customers, SOC 2 / HIPAA certification status, ARR / revenue scale, sales-engineering availability, and dedicated-tenant deployment SKUs. These are real differentiators for buyers — we acknowledge them in gaps rather than treating them as feature parity.

Disagreement reporting. If you operate one of the 10 vendors compared and you publicly disclose a feature that we marked with a silent dash, email modelbrewai@gmail.com with the URL and we'll update the row. We update on bilateral verification, not on marketing assertion.