What is catastrophic forgetting in LLM fine-tuning?

Catastrophic forgetting occurs when a language model loses previously learned knowledge during fine-tuning on new data. For example, training on medical data may cause the model to forget legal or financial knowledge it already had.

How does CRMA prevent catastrophic forgetting?

CRMA is a modular LoRA method with a spectrally bounded shared backbone. In benchmarks, modular LoRA on a CRMA backbone achieved -0.17% ± 0.17 loss-relative drift across 5 sequential domains on Mistral-7B (3 seeds), compared to +42.96% ± 5.5 for naive sequential training. Per-seed ranges are disjoint. All forgetting numbers are conditional on correct inference-time routing.
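The forgetting numbers above assume each request reaches the right domain adapter at inference time. This page does not describe how CRMA routes requests. The following is only a hypothetical illustration: a keyword router that picks one adapter name per prompt. The adapter names, the keyword lists, and the fallback to the bare backbone are all invented for the example.

```python
# Hypothetical keyword router: picks one domain adapter per request.
# CRMA's real routing mechanism is not documented here; this only illustrates the idea.
from collections import Counter
from typing import Optional

# Assumed adapter names and trigger keywords (illustrative only).
DOMAIN_KEYWORDS = {
    "medical": {"patient", "diagnosis", "dose", "clinical"},
    "legal": {"contract", "clause", "plaintiff", "statute"},
    "financial": {"bond", "equity", "yield", "portfolio"},
    "code": {"function", "compile", "python", "bug"},
}


def route(prompt: str) -> Optional[str]:
    """Return the adapter whose keywords overlap the prompt most, or None."""
    tokens = {t.strip(".,?!:;").lower() for t in prompt.split()}
    scores = Counter({d: len(tokens & kws) for d, kws in DOMAIN_KEYWORDS.items()})
    domain, hits = scores.most_common(1)[0]
    # No keyword hits: serve the shared backbone without a domain adapter (assumption).
    return domain if hits > 0 else None


print(route("What dose should this patient get?"))
print(route("Is this contract clause enforceable?"))
```

A misrouted prompt would be served by the wrong adapter. That is why every forgetting figure on this page is stated as conditional on correct routing.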

What is continual learning for language models?

Continual learning allows a single model to be trained on multiple domains sequentially (e.g., medical, legal, financial, code) without forgetting earlier domains. CRMA is a managed continual-learning API with built-in adapter management; near-zero forgetting numbers reported on our benchmarks are conditional on correct inference-time routing.

How much does CRMA fine-tuning cost?

ModelBrew offers a free tier with 3 runs per day on TinyLlama. Fine-tuning costs $3.99 per million tokens. Continual learning is currently in closed beta; contact us for access. Load $20 in credits to start; credits never expire.

Fine-tune without forgetting Try Free

PATENT PENDING

FINE-TUNING WITH BUILT-IN CONTINUAL LEARNING

The Alternative to RAG
Near-Zero Forgetting

RAG looks up answers every time. ModelBrew bakes knowledge directly into the model. Train across multiple sequential domains — your model keeps what it learns, with prior-task drift within measurement noise on our 3-seed Mistral-7B benchmark. No vector database, no retrieval pipeline.

Start Free (No Credit Card) · Free Dataset Optimizer →

3 free runs/day on TinyLlama. Pro from $3.99/M tokens. See pricing

modelbrew · mistral-7b · drift monitor

backbone drift -0.17% (3-seed avg)

baseline LoRA +43.0% forgetting

model Mistral-7B-Instruct

-0.17%

MODULAR drift (3 seeds ± 0.17)

+42.96%

NAIVE forgetting (3 seeds ± 5.5)

98/100

Gemma inference ablation (vs 38/100 without)

18/18

Saul-7B legal sub-domains

Mistral-7B, 5 sequential domains, 3 seeds. Per-seed MODULAR and NAIVE ranges are disjoint at every seed. All forgetting numbers are conditional on correct inference-time routing.

1 US Patent Filed · 7 Research Papers · 5 Domains Benchmarked · Mistral-7B Validated

Near-Zero Drift

−0.17%

MODULAR backbone drift across 5 domains on Mistral-7B (3 seeds ± 0.17)

API & SDKs

3 Lines to First Run

REST API. Upload data, pick a model, start training. That's it.

Inference Ablation

98/100 vs 38/100

Same Gemma-2-9B weights, 100 held-out questions, only CRMA toggled. Wilson 95% intervals disjoint.

Pricing

From $3.99/M tokens

Credits never expire. Free tier: 3 runs/day on TinyLlama.

Benchmarked

5 Domains, 3 Seeds

Medical, legal, financial, code, science. Reproducible results.

Use Cases

Any Multi-Domain Team

Cloud API or on-prem Docker. With on-prem Docker, your data never leaves your environment.

Catastrophic Forgetting

Continual Learning

How It Works

RAG retrieves. ModelBrew remembers.

RAG systems look up answers from documents every time — slow, fragile, and expensive to maintain. ModelBrew trains the knowledge directly into the model weights. No vector database. No chunking pipeline. Your model just knows it.

Step 1

📚

Upload your data

Medical notes, legal docs, code, anything.

Step 2

🛡

Train with CRMA

CRMA guards the model so it can learn without forgetting.

Step 3

✅

Done. Nothing lost.

Your model knows the new stuff AND still remembers the old stuff.

Without CRMA

Train on medical data

● Medical

Then train on legal data...

✗ Medical — gone

● Legal

Then train on code...

✗ Medical — gone

✗ Legal — gone

● Code

It only remembers the last thing you taught it.

With CRMA

Train on medical data

✓ Medical

Then train on legal data...

✓ Medical — still there

✓ Legal

Then train on code...

✓ Medical — still there

✓ Legal — still there

✓ Code

It retains every domain it has learned. Tested at 7B parameters: near-zero drift (−0.17% ± 0.17 across 3 seeds), conditional on correct inference-time routing.

Use Cases

Built for teams that can't afford to forget.

Teams training models across multiple domains — without retraining from scratch every time.

Healthcare

Clinical NLP

Train on radiology reports, then clinical notes, then pathology — without forgetting prior specialties. Built by a healthcare practitioner who hit this problem firsthand.

Legal

Multi-Practice Firms

Fine-tune on contract review, then case law, then regulatory filings. Each practice area improves without degrading the others.

Finance

Cross-Asset Intelligence

Equities research, fixed income, credit analysis — one model that learns sequentially across asset classes without catastrophic forgetting.

Enterprise

Multi-Department AI

Support tickets, internal docs, product specs, HR policies. Add departments over time without retraining or managing dozens of separate models.

On-Prem / Private

Regulated Industries

When data can't leave your network, ModelBrew ships as a Docker container. Same API, same results — runs on your infrastructure with zero external calls.

ML Teams

Production Pipelines

Plug into existing CI/CD. Upload data per domain, choose standard FT or continual learning, track per-domain metrics and drift over time via API.

Free Tool

Dataset Optimizer

Clean your fine-tuning dataset before training. 60+ validator codes, AI-judge scoring with score-floor-gated rewrite, structural pair audit + judge-based polarity sample, tool-call validation, jailbreak + military-OPSEC + industry-specific PII detection — all in your browser. Free, no signup.

🔍

60+ validator codes

Format, schema, length, dedup (exact + near + semantic), encoding, GPT-slop, refusals, repetition, mislabel detection. Every flag points back to a row index.
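Exact and near-duplicate detection can be sketched with two pieces: a hash of normalized text, and Jaccard similarity over character 5-grams. This is an illustrative stand-in, not the Dataset Optimizer's implementation. The 0.8 threshold is an assumption, and semantic dedup (which needs an embedding model) is left out. Each flag carries row indices, in the spirit of the row-index reporting described above.

```python
# Illustrative exact + near-duplicate detection (not the Dataset Optimizer's code).
import hashlib
import re


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivial edits hash identically.
    return re.sub(r"\s+", " ", text.lower()).strip()


def shingles(text: str, k: int = 5) -> set:
    # Character k-grams of the normalized text.
    t = normalize(text)
    return {t[i:i + k] for i in range(max(len(t) - k + 1, 1))}


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def find_duplicates(rows, near_threshold=0.8):
    """Return (row_index, kind, matched_row_index) flags."""
    flags = []
    seen_hashes = {}  # normalized-text hash -> first row index
    kept = []         # (row index, shingle set) for rows not flagged as exact
    for i, row in enumerate(rows):
        h = hashlib.sha256(normalize(row).encode("utf-8")).hexdigest()
        if h in seen_hashes:
            flags.append((i, "exact", seen_hashes[h]))
            continue
        seen_hashes[h] = i
        s = shingles(row)
        for j, sj in kept:
            if jaccard(s, sj) >= near_threshold:
                flags.append((i, "near", j))
                break
        kept.append((i, s))
    return flags


rows = [
    "The patient was given 5 mg of the drug daily.",
    "the patient  was given 5 mg of the drug daily.",
    "The patient was given 5 mg of the drug daily!",
    "Summarize this contract clause in plain English.",
]
print(find_duplicates(rows))
```

The pairwise loop is quadratic. At 100k rows, a production scanner would more likely use MinHash with locality-sensitive hashing.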

⚖

AI judge + rewrite

Four-axis judge with calibration exemplars; optional 14-dim and G-Eval rubrics. Rewriter preserves every number, URL, named entity, and acronym — verified by a fact-diff before the row ships.

🔁

DPO / ORPO structural audit

Eight structural defect codes — identity pairs, near-duplicate chosen, both-refusals, both-too-short, extreme length bias, sycophantic chosen, refusal-as-chosen, missing prompt. The pair-level checks row-level scanning misses.
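Here is a minimal sketch of a few of these pair-level checks on a row with `prompt`, `chosen`, and `rejected` fields. The code names follow the list above. The refusal phrases, the three-word minimum, and the 5× length-ratio threshold are assumptions for illustration, not the Optimizer's settings.

```python
# Illustrative pair-level checks for a DPO/ORPO row {"prompt", "chosen", "rejected"}.
# Thresholds and refusal markers are assumptions, not the Dataset Optimizer's values.
REFUSAL_MARKERS = ("i can't help", "i cannot help", "i'm sorry, but", "as an ai")


def is_refusal(text: str) -> bool:
    t = text.lower()
    return any(m in t for m in REFUSAL_MARKERS)


def audit_pair(row: dict, min_words: int = 3, max_len_ratio: float = 5.0) -> list:
    prompt = (row.get("prompt") or "").strip()
    chosen = (row.get("chosen") or "").strip()
    rejected = (row.get("rejected") or "").strip()
    codes = []
    if not prompt:
        codes.append("missing_prompt")
    if chosen == rejected:
        codes.append("identity_pair")
    if len(chosen.split()) < min_words and len(rejected.split()) < min_words:
        codes.append("both_too_short")
    if is_refusal(chosen) and is_refusal(rejected):
        codes.append("both_refusals")
    elif is_refusal(chosen):
        codes.append("refusal_as_chosen")
    # Character-length ratio between the longer and shorter response.
    shorter = max(min(len(chosen), len(rejected)), 1)
    if max(len(chosen), len(rejected)) / shorter > max_len_ratio:
        codes.append("extreme_length_bias")
    return codes


print(audit_pair({"prompt": "", "chosen": "Paris is the capital.",
                  "rejected": "Paris is the capital."}))
```

These checks look at the relationship between the two responses, which is exactly what a row-by-row scan of single texts cannot see.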

🛠

Tool-call validation

OpenAI tool_calls and Anthropic tool_use shape detection. Missing-required-arg and wrong-arg-type are critical; unknown-arg is a warning. Built for shipping agentic fine-tunes.
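The two shapes named above differ mainly in where the arguments live. OpenAI puts a JSON-encoded string in `tool_calls[].function.arguments`. Anthropic puts a parsed object under `input` in a content block of type `"tool_use"`. The sketch below applies the severity rules described here against a JSON-Schema-style parameters object: a missing required or wrong-typed argument is critical, and an unknown argument is a warning. The function names and the extra `unknown_tool` code are hypothetical. Malformed argument JSON is not handled; `json.loads` raises on it.

```python
# Illustrative tool-call argument validation (not the Dataset Optimizer's code).
import json

# JSON Schema type names mapped to Python types (subset).
JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def extract_calls(message: dict) -> list:
    """Return (tool_name, args) pairs from an OpenAI- or Anthropic-shaped message."""
    calls = []
    # OpenAI: tool_calls[i]["function"]["arguments"] is a JSON-encoded string.
    for tc in message.get("tool_calls") or []:
        fn = tc.get("function", {})
        calls.append((fn.get("name"), json.loads(fn.get("arguments") or "{}")))
    # Anthropic: content is a list of blocks; tool calls are "tool_use" blocks with a dict "input".
    content = message.get("content")
    if isinstance(content, list):
        for block in content:
            if isinstance(block, dict) and block.get("type") == "tool_use":
                calls.append((block.get("name"), block.get("input") or {}))
    return calls


def check_args(args: dict, schema: dict) -> list:
    findings = []
    props = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            findings.append(("critical", "missing_required_arg", name))
    for name, value in args.items():
        if name not in props:
            findings.append(("warning", "unknown_arg", name))
            continue
        type_name = props[name].get("type")
        expected = JSON_TYPES.get(type_name)
        if expected is None:
            continue  # untyped or unsupported schema type: skip
        # bool is a subclass of int in Python, so reject it unless "boolean" is expected.
        if not isinstance(value, expected) or (isinstance(value, bool) and type_name != "boolean"):
            findings.append(("critical", "wrong_arg_type", name))
    return findings


def validate(message: dict, schemas: dict) -> list:
    findings = []
    for name, args in extract_calls(message):
        if name not in schemas:
            findings.append(("critical", "unknown_tool", name))  # assumed code
        else:
            findings.extend(check_args(args, schemas[name]))
    return findings
```

For example, `validate(msg, {"get_weather": schema})` on an OpenAI message whose arguments are `{"days": "3", "units": "C"}` reports three findings: a missing required `city`, a string where an integer `days` was expected, and an unknown `units`.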

🛡

Jailbreak · OPSEC · typed PII

Eight jailbreak categories, including prompt injection, role bypass, system extraction, and encoding attacks. Six military OPSEC codes (MGRS, EDIPI, classification markings, DTG, lat/long, network refs). Nine industry-specific PII detectors (medical: MRN/DEA/ICD-10/NPI, financial: CUSIP/SWIFT/ABA, legal: bar number/Bates) on top of the standard 10-type regex PII pass.
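A detector for identifiers like ABA routing numbers can cut false positives by pairing a pattern match with the number's check digit. The sketch below assumes a nine-digit regex plus the standard ABA checksum: 3·(d1+d4+d7) + 7·(d2+d5+d8) + (d3+d6+d9) must be divisible by 10. The Dataset Optimizer's actual detector is not shown on this page.

```python
# Illustrative ABA routing-number detector: regex candidate + checksum filter.
import re

# Exactly nine digits, not embedded in a longer run of digits.
ABA_CANDIDATE = re.compile(r"(?<!\d)\d{9}(?!\d)")


def aba_checksum_ok(digits: str) -> bool:
    """ABA check: 3*(d1+d4+d7) + 7*(d2+d5+d8) + (d3+d6+d9) must be divisible by 10."""
    d = [int(c) for c in digits]
    total = 3 * (d[0] + d[3] + d[6]) + 7 * (d[1] + d[4] + d[7]) + (d[2] + d[5] + d[8])
    return total % 10 == 0


def find_aba_numbers(text: str) -> list:
    return [m.group() for m in ABA_CANDIDATE.finditer(text) if aba_checksum_ok(m.group())]


print(find_aba_numbers("Wire to routing 021000021, ref 021000022, acct 1234567890"))
```

The checksum rejects most random nine-digit strings: roughly nine in ten fail it. This is why a plain `\d{9}` regex alone would over-flag order numbers and other IDs.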

🚀

Proven at 100,000 rows

250 rows / sec on a single worker, peak RSS under 1.5 GB. End-to-end scan of 100k OASST1 and 100k military corpora. Real benchmark, not a marketing number.

Try Dataset Optimizer — Free →

Supports JSONL, CSV, and JSON · Up to 50MB · No account needed

Quick Start

Three lines to your first training run.

# Fine-tune with CRMA (near-zero forgetting)
import requests

with open("my_data.jsonl", "rb") as f:
    response = requests.post(
        "https://fourwheels2512--crma-finetune-fastapi-app.modal.run/start_run",
        data={
            "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
            "epochs": "3",
            "use_crma": "true",
        },
        files={"file": f},
        headers={"Authorization": "Bearer YOUR_TOKEN"},
    )

print(response.json())
# {"run_id": "abc123", "status": "running", "model": "TinyLlama-1.1B"}
# → Downloads: crma_adapter.zip (PEFT-compatible, plug into Transformers)

Works with any JSONL dataset. Or use the web UI — no code needed. · Full API docs →

Active R&D

Three papers. Real experiments. Ongoing research.

CRMA comes from original research — not a wrapper around existing tools. We publish our methodology, run multi-seed experiments, and update the algorithm based on results. Patent pending (US provisional filed Feb 2026).

Analysis

Six CL Methods Tested — Six Failures

EWC, replay, gradient projection, knowledge distillation, O-LoRA, 10-component stacks. Best result: 58.4% forgetting. We tested them all so you don’t have to.

Preprint · v2-v7 experiments · TinyLlama & Mistral-7B

Read paper →

Results

Near-Zero Forgetting on Mistral-7B Across 5 Domains

Modular LoRA on a spectrally bounded CRMA backbone: −0.17% ± 0.17 MODULAR drift vs +42.96% ± 5.5 NAIVE forgetting across 3 seeds. Per-seed ranges disjoint. Validated on 5 models across 4 architecture families. Patent pending.

Preprint · 3 seeds · 5 domains · Mistral-7B & Gemma-2-9B

Read paper →

Current Research & Development

Validated

Multi-seed experiments across 3 random seeds on Mistral-7B. 5 real-world domains (medical, legal, financial, code, science). Results reproducible across seeds.

In Progress

Enhanced reasoning via self-distillation fine-tuning (SDFT). Scale testing beyond 7B. Head-to-head benchmark against O-LoRA and other academic CL methods.

Roadmap

Real-time continual learning (streaming updates). Agent fine-tuning with tool-use preservation. Automatic domain boundary detection.

Results

Numbers, not promises.

CRMA has been tested across multiple model scales and domains. Here's what the benchmarks show.

-0.17%

Backbone drift with CRMA

3-seed avg across 5 domains (Mistral-7B)

+43%

Forgetting without CRMA

3-seed avg, naive sequential training

98/100

Gemma inference ablation

Same weights, 100 questions, CRMA toggled. 38/100 without.

18/18

Saul-7B legal sub-domains

First-author-evaluated retention across 3 sequential legal sub-domains

Forgetting Rate: backbone drift after 5 domains (3-seed average, Mistral-7B)

Spectral Norm Invariant

‖M‖₂ held at 1.0 across 867 steps

Max deviation < 1.2 × 10⁻⁷ · Gemma-2-9B · Birkhoff bound holds by construction
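The card does not say how M is parameterized. Suppose, as the Birkhoff reference suggests, that M is kept doubly stochastic (nonnegative entries, every row and column summing to 1). Then the bound follows directly: ‖M‖₂ ≤ √(‖M‖₁‖M‖∞) = 1, and M·1 = 1 shows the bound is attained. The sketch below checks this numerically. Sinkhorn normalization is used here as one common way to produce such a matrix. That choice is an assumption, not a description of CRMA's construction.

```python
# Numerical check that a doubly stochastic matrix has spectral norm 1.
# Assumption: M is doubly stochastic; Sinkhorn is a stand-in for CRMA's construction.
import math
import random


def sinkhorn(a, iters=200):
    """Alternately normalize rows and columns of a positive square matrix (Sinkhorn-Knopp)."""
    m = [row[:] for row in a]
    n = len(m)
    for _ in range(iters):
        for row in m:
            s = sum(row)
            for j in range(n):
                row[j] /= s
        for j in range(n):
            s = sum(m[i][j] for i in range(n))
            for i in range(n):
                m[i][j] /= s
    return m


def spectral_norm(m, iters=500):
    """Largest singular value via power iteration on M^T M."""
    n = len(m)
    v = [float(i + 1) for i in range(n)]  # arbitrary start vector, not orthogonal to the top one
    for _ in range(iters):
        mv = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        w = [sum(m[i][j] * mv[i] for i in range(n)) for j in range(n)]  # M^T (M v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
    return math.sqrt(sum(x * x for x in mv))  # ||M v|| for unit v


random.seed(0)
raw = [[random.uniform(0.1, 1.0) for _ in range(6)] for _ in range(6)]
M = sinkhorn(raw)
print(spectral_norm(M))  # close to 1.0
```

Under this assumption the bound needs no training-time clipping. Any residual deviation from 1.0 comes only from how precisely the row and column sums are enforced in floating point.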

Method	Forgetting	Overhead	Price/M tokens	CL Support
CRMA	-0.17% drift	None	$3.99 (FT)	Built-in (closed beta)
Naive LoRA	+43% (7B) / +225% (1.1B)	None	Varies	No
OpenAI	No CL	N/A	$3-25	No
Mistral / Together	No CL	N/A	$0.48-9	No

How we measure: "Forgetting" = change in holdout loss on previously learned domains after training on new ones. Negative = the model got slightly better (ideal). Positive = knowledge was lost. Measured across 5 real-world domains (medical, legal, financial, code, science) on Mistral-7B, averaged over 3 random seeds.
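The page reports forgetting as a percentage change in holdout loss (the FAQ calls it "loss-relative drift"). Suppose the percentage is taken relative to a domain's holdout loss measured right after that domain was trained. Then the per-domain number reduces to the arithmetic below. The parameter names and the loss values in the example are hypothetical, not measured values.

```python
# Assumed definition of loss-relative drift for one previously learned domain.
def drift_pct(loss_after_domain: float, loss_after_all: float) -> float:
    """Relative change in a domain's holdout loss, in percent.

    Negative: loss went down (slight improvement / positive transfer).
    Positive: loss went up (knowledge was lost).
    """
    return 100.0 * (loss_after_all - loss_after_domain) / loss_after_domain


# Hypothetical holdout losses for illustration only.
print(drift_pct(1.20, 1.80))   # loss rose by half: forgetting
print(drift_pct(1.20, 1.194))  # loss fell slightly: negative drift
```

Because the metric is relative, drift values are comparable across domains whose absolute holdout losses differ.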

Per-Domain Drift After 5 Sequential Domains

Each domain was trained sequentially. Drift measures how much earlier domains degraded after all 5 were trained. Negative = slight improvement (positive transfer).

Domain	CRMA	Frozen	Naive LoRA
Medical	−0.56%	+2.22%	+149.6%
Legal	−0.55%	+1.83%	+34.3%
Financial	+0.59%	+1.74%	+17.8%
Code	−0.51%	+2.78%	+13.0%
Science	+0.20%	+1.17%	+0.08%
3-seed Avg	−0.17%	+1.95%	+42.96%

Key insight: CRMA drift (−0.17%) is roughly an order of magnitude smaller than FROZEN (+1.95%) and more than two orders of magnitude smaller than naive sequential LoRA (+42.96%). The five per-domain values in each column average to the bottom row (e.g., CRMA's average to −0.17%). Each per-domain value is a 3-seed average across seeds 0, 42, and 1234 on Mistral-7B.
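The reconciliation can be checked directly from the table: averaging the five per-domain values in each column reproduces the bottom row.

```python
# Recompute the bottom row of the per-domain drift table from its five rows.
from statistics import mean

# Per-domain drift (%) copied from the table: medical, legal, financial, code, science.
per_domain = {
    "CRMA": [-0.56, -0.55, 0.59, -0.51, 0.20],
    "Frozen": [2.22, 1.83, 1.74, 2.78, 1.17],
    "Naive LoRA": [149.6, 34.3, 17.8, 13.0, 0.08],
}

for method, values in per_domain.items():
    print(f"{method}: {mean(values):+.2f}%")
```

It prints `CRMA: -0.17%`, `Frozen: +1.95%`, and `Naive LoRA: +42.96%`. The naive average is dominated by the medical domain (+149.6%), the first domain trained.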

View full benchmark data & methodology

CRMA Internal (Mistral-7B, 5 domains, 3-seed avg): CRMA Modular −0.17% ± 0.17 drift, Frozen +1.95% ± 0.64, Naive +42.96% ± 5.5. Per-seed MODULAR and NAIVE ranges are disjoint. No replay, no EWC, no knowledge distillation.

Gemma-2-9B inference ablation: 98/100 with CRMA (Wilson 95% CI [93.0%, 99.5%]) vs 38/100 without (Wilson 95% CI [29.0%, 47.8%]). Same weights, same questions, only CRMA toggled.
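The Wilson score interval for k successes in n trials is ((p + z²/2n) ± z·√(p(1−p)/n + z²/4n²)) / (1 + z²/n), with p = k/n and z ≈ 1.96 for 95%. The sketch below recomputes both intervals from the counts above. Depending on the z value and rounding convention used, a bound can differ from the quoted figures in its last digit.

```python
# Wilson score interval for a binomial proportion.
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple:
    """Return (lower, upper) Wilson bounds; z = 1.96 gives ~95% coverage."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


with_crma = wilson_interval(98, 100)
without_crma = wilson_interval(38, 100)
print(f"with CRMA:    [{with_crma[0]:.1%}, {with_crma[1]:.1%}]")
print(f"without CRMA: [{without_crma[0]:.1%}, {without_crma[1]:.1%}]")
print("disjoint:", without_crma[1] < with_crma[0])
```

Wilson is preferred over the simple normal approximation here because it stays inside [0, 1] and remains well-behaved near 98/100.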

Pricing (April 2026): ModelBrew FT $3.99/M, all 7–9B models, with gradient visibility + built-in Dataset Optimizer. CL is in closed beta and not available for self-serve purchase at this time. OpenAI GPT-4.1 $3.00/M (no CL, FT only on their models). Together/Fireworks/OpenPipe $0.48-0.50/M (FT only, no cleaner, no CL). Mistral La Plateforme $1.00/M.

Head-to-head baselines: We have not run head-to-head comparisons against published CL methods (O-LoRA, InfLoRA, Lewandowski et al.) on our protocol. This is the single largest gap in our research; it is acknowledged openly in the paper. Our internal controls compare NAIVE vs FROZEN vs MODULAR on identical data.

CRMA results are from internal benchmarks using holdout evaluation. All forgetting-prevention numbers are conditional on correct inference-time routing.

Pricing

Pay only for what you use.

No subscriptions. Sign up and get 75 credits free ($7.50). Load $20 in credits when you're ready, pay only for tokens used. 3 free training runs per day on TinyLlama.

Free

75 credits at signup · no card

75 credits free at signup ($7.50)
3 runs per day on TinyLlama-1.1B
Fine-tuning mode
Download adapter ZIPs
Real-time training progress

Get Started

Pro

From $20

Load credits, pay only for tokens used

All models (Mistral-7B, Llama-3.1-8B, Saul-7B, Qwen3-8B, Gemma-2-9B)
Fine-tuning (continual learning in closed beta)
Priority GPU access
Cost estimates before each run
Credits never expire — balance rolls over

Buy Credits

All 7–9B models (Mistral-7B, Llama-3.1-8B, Saul-7B, Qwen3-8B, Gemma-2-9B)

Fine-Tuning	$3.99 / M tokens
Continual Learning	Closed beta — contact us
Clean with AI (Dataset Optimizer)	50 credits per 200 rows

Credits & Balance

Minimum credit purchase	$20
Credits roll over	Never expire
Failed jobs	Auto-refunded

Example: Fine-tune Mistral-7B on 500 medical Q&A pairs

Estimated tokens	~135K tokens
Rate (Fine-Tuning)	$3.99 / M tokens
Computed cost	$0.54
Deducted from balance	$0.54
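The example arithmetic above is tokens × rate: 135,000 / 1,000,000 × $3.99 = $0.53865, which rounds to $0.54. The sketch below reproduces that figure. It also converts Dataset Optimizer cleaning charges to dollars using the signup grant (75 credits = $7.50, i.e. $0.10 per credit). Two choices are assumptions not stated on this page: billing partial 200-row blocks as full blocks, and rounding half-up to cents.

```python
# Cost estimates from the published rates (rounding and block billing are assumptions).
import math
from decimal import ROUND_HALF_UP, Decimal

FT_RATE_PER_M = Decimal("3.99")        # $ per million tokens, fine-tuning
USD_PER_CREDIT = Decimal("7.50") / 75  # from "75 credits free ($7.50)" -> $0.10 per credit


def ft_cost_usd(tokens: int) -> Decimal:
    """Fine-tuning cost in dollars, rounded half-up to cents (assumed rounding rule)."""
    cost = FT_RATE_PER_M * tokens / Decimal(1_000_000)
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def cleaning_credits(rows: int) -> int:
    """AI cleaning at 50 credits per 200 rows; partial blocks billed as full (assumption)."""
    return 50 * math.ceil(rows / 200)


print(ft_cost_usd(135_000))  # the Mistral-7B example above
print(cleaning_credits(500), cleaning_credits(500) * USD_PER_CREDIT)
```

`Decimal` is used instead of `float` so that cent rounding is exact and predictable.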

Example: Continual learning on Mistral-7B — 5 domains

Continual learning is currently in closed beta and not available for self-serve purchase. Request access if you'd like to evaluate it on your data.

Refund Policy: If a training job fails due to a system error, your credits are automatically refunded — no action needed. Unused credits are non-refundable and non-transferable. All payments are processed securely by Stripe — we never see your card details. By purchasing credits, you agree to our Terms of Service.

Security

Built for regulated industries.

Production security. We lock down the API, storage, and runtime for healthcare, financial, and regulated teams.

Encryption at Rest

All model checkpoints and training data are encrypted at rest with Fernet (AES-128-CBC with HMAC-SHA256 authentication). Secure delete is enabled, so no residual data is left on disk.

Security Headers

HSTS, X-Frame-Options DENY, Content-Type nosniff, XSS protection, strict Referrer-Policy, and Permissions-Policy on every response.
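The card lists header names but not their values. Here is a minimal sketch of attaching those headers to every response. It assumes commonly used values; the exact HSTS `max-age`, Referrer-Policy, and Permissions-Policy values shown are illustrative, not ModelBrew's configuration.

```python
# Illustrative security headers; the values are common defaults, not ModelBrew's actual config.
SECURITY_HEADERS = {
    "Strict-Transport-Security": "max-age=31536000; includeSubDomains",
    "X-Frame-Options": "DENY",
    "X-Content-Type-Options": "nosniff",
    "X-XSS-Protection": "1; mode=block",  # legacy header; most current browsers ignore it
    "Referrer-Policy": "strict-origin-when-cross-origin",
    "Permissions-Policy": "camera=(), microphone=(), geolocation=()",
}


def apply_security_headers(headers: dict) -> dict:
    """Return a copy of the response headers with the security headers set."""
    merged = dict(headers)
    merged.update(SECURITY_HEADERS)
    return merged


print(apply_security_headers({"Content-Type": "application/json"}))
```

Applying the headers in one function (or one middleware) keeps every response consistent, instead of relying on each endpoint to set them.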

Audit Logging

Every API call logged with user, action, IP, and timestamp. Full audit trail for compliance reviews and incident response.

Role-Based Access

RBAC with granular permissions. Admin, user, and read-only roles. API keys separated from session tokens.
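Here is a minimal sketch of the admin / user / read-only split described above. The permission names are hypothetical; the page does not list ModelBrew's actual permission set.

```python
# Illustrative role-based access check; permission names are hypothetical.
from enum import Enum


class Role(Enum):
    ADMIN = "admin"
    USER = "user"
    READ_ONLY = "read-only"


PERMISSIONS = {
    Role.ADMIN: {"runs:read", "runs:create", "runs:delete", "users:manage"},
    Role.USER: {"runs:read", "runs:create", "runs:delete"},
    Role.READ_ONLY: {"runs:read"},
}


def is_allowed(role: Role, permission: str) -> bool:
    """Deny by default: only permissions explicitly granted to the role pass."""
    return permission in PERMISSIONS.get(role, set())


print(is_allowed(Role.READ_ONLY, "runs:create"))
```

Mapping roles to explicit permission sets, rather than checking role names inside each endpoint, keeps the policy in one reviewable place.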

GDPR & Data Rights

One-click data export and account deletion. Your data, your control. Full compliance with data protection regulations.

Hardened Runtime

Non-root containers, health checks, safe model loading (no arbitrary code execution), and sanitized error responses.

About

Built by a practitioner, not a lab.

Near-zero catastrophic forgetting, validated on Mistral-7B and Gemma-2-9B across 5 sequential domains (3 seeds). ModelBrew AI makes continual fine-tuning practical, accessible, and pilot-ready on the SFT path; preference tuning (SimPO/DPO) is in beta.

Company

ModelBrew AI

Based in Frederick, Maryland. We build mathematically constrained fine-tuning technology that lets AI teams train on new data without losing what their models already know. Our platform runs on serverless GPUs — no infrastructure to manage, no MLOps team required.

Founder

Kiran Nayudu

Healthcare practitioner who built CRMA after watching fine-tuned models forget critical knowledge with every training run. Background in regulated industries and hands-on ML engineering. Built CRMA from first experiment to deployed API.

Blog

Learn about fine-tuning and continual learning.

Technical articles about stable fine-tuning and why it matters for production AI.

Featured

Why RAG Falls Short — And What Happens When You Bake Knowledge Into the Model

Everyone is building RAG pipelines. We took a different path: train knowledge directly into the model weights, across sequential domains, with near-zero forgetting.

Comparison

DPO vs SimPO in 2026: Which Preference-Tuning Method Should You Use?

Side-by-side comparison of Direct Preference Optimization and SimPO — when each works, the trade-offs, and how ModelBrew picks the right one for your dataset.

Guide

What Is Fine-Tuning? Why It Matters and How It's Changing AI

Fine-tuning explained for a broader audience — real-world use cases in healthcare, legal, code, and finance.

Technical

What Are LoRA and QLoRA? A Practical Guide to Efficient Fine-Tuning

How LoRA and QLoRA made fine-tuning possible on consumer GPUs — and the stability problems they don’t solve.

Product

How CRMA Solves Continual Learning

Stable backbone, swappable domain adapters, near-zero forgetting. No replay buffers, no growing memory.

Analysis

Catastrophic Forgetting: The Silent Killer of Fine-Tuned Models

Why every fine-tuning run destroys prior knowledge, and what the research says about fixing it.

Comparison

CRMA vs LoRA: What's the Difference?

Side-by-side comparison of standard LoRA and CRMA — when you need each, and what happens when you don’t use CL.

Business

The Cost of Forgetting: Why Retraining From Scratch Is Unsustainable

The real-world compute, time, and quality costs of not having continual learning in your ML pipeline.

Contact

Get in touch.

Questions about CRMA, enterprise pricing, or fine-tuning? Reach out.

Reach us directly

✉ info@modelbrew.ai

𝐗 @MBrew26730

🤗 HuggingFace

💬 Reddit

ModelBrew AI
Frederick, Maryland

Vision

Roadmap

Live: Fine-Tuning

Live (closed beta): Continual Learning

In Progress: Enhanced Reasoning

Future: Agent Training

Future: Real-Time CL

Ready?

Ditch the vector database. Teach your model directly.

Start with 3 free runs on TinyLlama. No credit card, no setup, no retrieval pipeline to manage.

Launch ModelBrew — Free

Legal Disclaimers & Legal Notices

No Warranty. CRMA is provided "AS IS" without warranties. Not guaranteed to be uninterrupted or error-free.

Benchmarks. All metrics are from internal experiments under controlled conditions. Results are not guarantees — individual results vary by dataset, model, and configuration. Academic comparisons use different benchmarks.

AI Outputs. Fine-tuned models may produce inaccurate or harmful outputs. Users are responsible for validation. Not for medical, legal, or financial decisions without human review.

Liability. ModelBrew AI's total liability shall not exceed the amount paid in the preceding 12 months. No liability for indirect or consequential damages.

IP. CRMA is protected by U.S. provisional patent (filed Feb 2026). Third-party names used for identification only.

Data. Your training data is used only for your job, stored temporarily, deleted after completion. We never train on your data. See Privacy Policy.

Research. Papers are pre-publication drafts, not yet peer-reviewed. Some experiments are single-seed.

Third-Party Services. Built on Modal, Stripe, and Hugging Face. We're not responsible for their outages. Stripe handles payments — we never see your card.

Governing Law. State of Maryland, USA. Exclusive jurisdiction: Frederick County courts.

By using CRMA you agree to these disclaimers, our Terms, and Privacy Policy. Contact: info@modelbrew.ai.
