03 · Never forget — Patent-pending CRMA

A Continual-Learning Engine for Production Fine-tuning

A research-stage continual-learning engine for stacking domains onto the same model — medical → legal → finance → code — without erasing what came before. −0.17% drift on Mistral-7B across 5 domains (3 seeds), versus +43% with naive sequential LoRA, on our internal protocol. Closed beta — invite only. Built on patent-pending CRMA.


CRMA at a glance.

Tested across multiple model scales, three random seeds, five real-world domains. CRMA and naive drift ranges are disjoint at every seed.

−0.17%
Backbone drift with CRMA
3-seed avg, 5 domains, Mistral-7B
+43%
Forgetting without CRMA
Naive sequential LoRA, 3-seed avg
98/100
Gemma inference ablation
Same weights, 100 questions, CRMA toggled. 38/100 without.
18/18
Saul-7B legal sub-domains
First-domain retention validated across 3 sequential CL transitions

Most fine-tunes erase the past. CRMA doesn't.

Train a model on medical data. Then train it on legal data. Then code. With naive LoRA, each new training run partially erases the last. With CRMA, every domain stacks — and the model can answer questions across all of them.

Without CRMA — Naive Sequential LoRA

Step 1 — Train on medical data
● Medical
Step 2 — Then train on legal data
✗ Medical — gone
● Legal
Step 3 — Then train on code
✗ Medical — gone
✗ Legal — gone
● Code
It only remembers the last thing you taught it.

With CRMA — Continual Learning

Step 1 — Train on medical data
✓ Medical
Step 2 — Then train on legal data
✓ Medical — still there
✓ Legal
Step 3 — Then train on code
✓ Medical — still there
✓ Legal — still there
✓ Code
It remembers everything. Tested at 7B parameters: −0.17% drift across 3 seeds.
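
For readers who want the mechanics behind the diagram above: the sketch below (Hugging Face peft; the model id, domain names, and the train_on() helper are placeholders, and this is not the CRMA implementation) contrasts the two training patterns. Naive sequential LoRA keeps updating one adapter, so each domain partially overwrites the last; a modular setup freezes each finished adapter and trains a fresh one per domain. CRMA's modular LoRA is in that second family, but composes adapters through its spectrally bounded substrate, which is not shown here.

    # Illustrative sketch only -- not the CRMA implementation.
    # Assumes transformers + peft are installed; the model id, domain names,
    # and train_on() are placeholders for your own model and training loop.
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    cfg = LoraConfig(r=16, lora_alpha=32,
                     target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")

    def naive_sequential(base):
        # One adapter, retrained for every domain: each pass updates the same
        # low-rank matrices, so legal training drifts what encoded medical,
        # and code training drifts both.
        model = get_peft_model(base, cfg)
        for domain in ["medical", "legal", "code"]:
            train_on(model, domain)          # hypothetical training helper
        return model

    def modular_adapters(base):
        # One adapter per domain: finished adapters are never updated again,
        # so the knowledge they hold cannot be overwritten. Something must
        # route to the right adapter at inference time.
        model = get_peft_model(base, cfg, adapter_name="medical")
        train_on(model, "medical")
        for domain in ["legal", "code"]:
            model.add_adapter(domain, cfg)   # fresh, independent matrices
            model.set_adapter(domain)        # only this adapter is trained
            train_on(model, domain)
        model.set_adapter("medical")         # switching back recovers medical
        return model

    base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
    modular = modular_adapters(base)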

Three papers. Real experiments. Ongoing research.

CRMA comes from original research — not a wrapper around existing tools. We publish our methodology, run multi-seed experiments, and update the algorithm based on results. Patent pending (US provisional filed Feb 2026).

Analysis

Six CL Methods Tested — Six Failures

EWC, replay, gradient projection, knowledge distillation, O-LoRA, 10-component stacks. Best result: 58.4% forgetting. We tested them all so you don't have to.

Preprint · v2-v7 experiments · TinyLlama & Mistral-7B
Read paper →
Results

Near-Zero Forgetting on Mistral-7B Across 5 Domains

Modular LoRA on a spectrally bounded CRMA backbone: −0.17% ± 0.17 MODULAR drift vs +42.96% ± 5.5 NAIVE forgetting across 3 seeds. Per-seed ranges disjoint. Validated on 5 models across 4 architecture families.

Preprint · 3 seeds · 5 domains · Mistral-7B & Gemma-2-9B
Read paper →
Background

Why Catastrophic Forgetting Happens

The mathematical and architectural reasons LLMs forget when you fine-tune them sequentially. The shared-parameter dilemma. Why dropout, regularization, and learning-rate tricks don't fix it.

Blog · Conceptual primer · Recommended start
Read post →
Comparison

CRMA vs LoRA — What Changes

LoRA and CRMA both add small trainable adapters. The difference is what happens when you stack a second domain. LoRA overwrites; CRMA composes through a spectrally bounded substrate.

Blog · Architecture comparison
Read post →

Current Research & Development

Validated

Multi-seed experiments (3 random seeds) on Mistral-7B. 5 real-world domains (medical, legal, financial, code, science). Results reproducible across seeds.

In Progress

Enhanced reasoning via self-distillation fine-tuning (SDFT). Scale testing beyond 7B. Head-to-head benchmark against O-LoRA and other academic CL methods.

Roadmap

Real-time continual learning (streaming updates). Agent fine-tuning with tool-use preservation. Automatic domain boundary detection.


Per-domain breakdown after 5 sequential domains.

Each domain was trained sequentially. Drift measures how much earlier domains degraded after all 5 were trained. Negative = slight improvement (positive transfer).

Domain       CRMA      Frozen    Naive LoRA
Medical      −0.56%    +2.22%    +149.6%
Legal        −0.55%    +1.83%    +34.3%
Financial    +0.59%    +1.74%    +17.8%
Code         −0.51%    +2.78%    +13.0%
Science      +0.20%    +1.17%    +0.08%
3-seed Avg   −0.17%    +1.95%    +42.96%

Key insight: CRMA drift is roughly an order of magnitude lower than FROZEN (~1.95%) and two orders of magnitude lower than naive sequential LoRA (~43%). 3-seed average across seeds 0, 42, 1234; Mistral-7B.

Method comparison at a glance

Method               Forgetting                  Overhead   Price/M tokens             CL Support
CRMA                 −0.17% drift                None       Closed beta — contact us   Built-in
Naive LoRA           +43% (7B) / +225% (1.1B)    None       Varies                     No
OpenAI               No CL                       N/A        $3–25                      No
Mistral / Together   No CL                       N/A        $0.48–9                    No

How we measure: "Forgetting" = change in holdout loss on previously learned domains after training on new ones. Negative = the model got slightly better (ideal). Positive = knowledge was lost. Measured across 5 real-world domains (medical, legal, financial, code, science) on Mistral-7B, averaged over 3 random seeds.
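
In code, the metric is just a percent change in holdout loss. A minimal sketch (plain Python; the helper names and the loss values below are made up for illustration and are not benchmark data):

    # Sketch of the forgetting metric as defined above -- not the benchmark harness.
    # loss_before[d]: holdout loss on domain d right after d was trained.
    # loss_after[d]:  holdout loss on domain d after all 5 domains were trained.
    def forgetting_pct(loss_before: dict, loss_after: dict) -> dict:
        """Percent change in holdout loss per domain (negative = improvement)."""
        return {d: 100.0 * (loss_after[d] - loss_before[d]) / loss_before[d]
                for d in loss_before}

    def average_drift(per_domain: dict) -> float:
        """Mean drift across domains; averaged again over seeds in the paper."""
        return sum(per_domain.values()) / len(per_domain)

    # Hypothetical example values, for illustration only:
    before = {"medical": 1.80, "legal": 1.65, "financial": 1.70, "code": 1.20, "science": 1.55}
    after  = {"medical": 1.79, "legal": 1.64, "financial": 1.71, "code": 1.19, "science": 1.55}
    print(forgetting_pct(before, after))                  # small +/- values per domain
    print(average_drift(forgetting_pct(before, after)))   # near-zero average drift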

View full benchmark methodology & caveats

CRMA Internal (Mistral-7B, 5 domains, 3-seed avg): CRMA Modular −0.17% ± 0.17 drift, Frozen +1.95% ± 0.64, Naive +42.96% ± 5.5. Per-seed MODULAR and NAIVE ranges are disjoint. No replay, no EWC, no knowledge distillation.

Gemma-2-9B inference ablation: 98/100 with CRMA (Wilson 95% CI [93.0%, 99.5%]) vs 38/100 without (Wilson 95% CI [29.0%, 47.8%]). Same weights, same questions, only CRMA toggled.
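
The confidence intervals above are standard Wilson score intervals. A minimal sketch of the computation, in plain Python with z = 1.96 for 95% confidence:

    import math

    def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple:
        """Wilson score interval for a binomial proportion (95% CI at z = 1.96)."""
        p = successes / n
        denom = 1 + z**2 / n
        center = (p + z**2 / (2 * n)) / denom
        half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
        return center - half, center + half

    print(wilson_ci(98, 100))  # approx. (0.93, 0.99)  -- with CRMA
    print(wilson_ci(38, 100))  # approx. (0.29, 0.48)  -- without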

Spectral norm invariant: ‖M‖₂ held at 1.0 within float32 precision across 867 logged training steps spanning 5 sequential domains on Gemma-2-9B. Max deviation < 1.2 × 10⁻⁷. Birkhoff bound holds by construction.
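
The invariant is a direct check on the largest singular value of the monitored matrix: if the mixing matrix is doubly stochastic (a convex combination of permutation matrices, per Birkhoff), its spectral norm is exactly 1 by construction, so any drift beyond float32 round-off signals a bug. A minimal sketch of that check (PyTorch; M here is a stand-in, not CRMA internals, and the tolerance mirrors the max deviation quoted above):

    import torch

    def check_spectral_norm(M: torch.Tensor, tol: float = 1.2e-7) -> float:
        """Return |sigma_max(M) - 1| and assert it stays within float32 round-off."""
        sigma_max = torch.linalg.matrix_norm(M.float(), ord=2)  # largest singular value
        deviation = abs(sigma_max.item() - 1.0)
        assert deviation < tol, f"spectral norm drifted: {deviation:.2e}"
        return deviation

    # Illustrative only: a permutation matrix has sigma_max == 1 exactly.
    M = torch.eye(8)[torch.randperm(8)]
    print(check_spectral_norm(M))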

Pricing context (April 2026): ModelBrew FT $3.99/M, all 7–9B models, with gradient visibility + built-in Dataset Optimizer. CL is in closed beta and not available for self-serve purchase at this time. OpenAI GPT-4.1 $3.00/M (no CL, FT only on their models). Together/Fireworks/OpenPipe $0.48-0.50/M (FT only, no cleaner, no CL). Mistral La Plateforme $1.00/M.

Head-to-head baselines: We have not run head-to-head comparisons against published CL methods (O-LoRA, InfLoRA, Lewandowski et al.) on our protocol. This is the single largest gap in our research; it is acknowledged openly in the paper. Our internal controls compare NAIVE vs FROZEN vs MODULAR on identical data.

CRMA results are from internal benchmarks using holdout evaluation. All forgetting-prevention numbers are conditional on correct inference-time routing.


Stop forgetting. Start stacking.

CRMA continual learning is currently in closed beta. Fine-tuning ($3.99/M, free tier on TinyLlama) is available now.