CRMA at a glance.
Tested across multiple model scales, three random seeds, five real-world domains. Per-seed CRMA and naive ranges are disjoint at every seed.
A research-stage continual-learning engine for stacking domains onto the same model — medical → legal → finance → code — without erasing what came before. −0.17% drift on Mistral-7B across 5 domains (3 seeds), versus +43% with naive sequential LoRA, on our internal protocol. Closed beta — invite only. Built on patent-pending CRMA.
Train a model on medical data. Then train it on legal data. Then code. With naive LoRA, each new training run partially erases the last. With CRMA, every domain stacks — and the model can answer questions across all of them.
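For concreteness, here is a minimal sketch of the naive sequential-LoRA setup described above (the failure case, not CRMA), using Hugging Face transformers and peft. The model name, hyperparameters, and the `domain_datasets` mapping are illustrative placeholders, not our training pipeline.

```python
# Sketch only: naive sequential LoRA, the baseline that forgets.
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

# One adapter shared across every domain: each new training run updates the
# same low-rank matrices, which is why earlier domains get partially erased.
lora_cfg = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                      target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(base, lora_cfg)

# Placeholder: map each domain name to your own tokenized training dataset.
domain_datasets = {}  # e.g. {"medical": ..., "legal": ..., "code": ...}

for name, train_ds in domain_datasets.items():
    args = TrainingArguments(output_dir=f"runs/{name}", num_train_epochs=1,
                             per_device_train_batch_size=4)
    Trainer(model=model, args=args, train_dataset=train_ds).train()
    # After each domain, re-measure holdout loss on all previous domains to
    # quantify how much was erased (the drift metric tabulated below).
```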
CRMA comes from original research — not a wrapper around existing tools. We publish our methodology, run multi-seed experiments, and update the algorithm based on results. Patent pending (US provisional filed Feb 2026).
EWC, replay, gradient projection, knowledge distillation, O-LoRA, 10-component stacks. Best result: 58.4% forgetting. We tested them all so you don't have to. Read paper →
Modular LoRA on a spectrally bounded CRMA backbone: −0.17% ± 0.17 MODULAR drift vs +42.96% ± 5.5 NAIVE forgetting across 3 seeds. Per-seed ranges disjoint. Validated on 5 models across 4 architecture families. Read paper →
The mathematical and architectural reasons LLMs forget when you fine-tune them sequentially. The shared-parameter dilemma. Why dropout, regularization, and learning-rate tricks don't fix it. Read post →
LoRA and CRMA both add small trainable adapters. The difference is what happens when you stack a second domain. LoRA overwrites; CRMA composes through a spectrally bounded substrate. Read post →
Experiments across 3 random seeds on Mistral-7B and 5 real-world domains (medical, legal, financial, code, science). Results reproducible across seeds.
Enhanced reasoning via self-distillation fine-tuning (SDFT). Scale testing beyond 7B. Head-to-head benchmark against O-LoRA and other academic CL methods.
Real-time continual learning (streaming updates). Agent fine-tuning with tool-use preservation. Automatic domain boundary detection.
Each domain was trained sequentially. Drift measures how much earlier domains degraded after all 5 were trained. Negative = slight improvement (positive transfer).
| Domain | CRMA | Frozen | Naive LoRA |
|---|---|---|---|
| Medical | −0.56% | +2.22% | +149.6% |
| Legal | −0.55% | +1.83% | +34.3% |
| Financial | +0.59% | +1.74% | +17.8% |
| Code | −0.51% | +2.78% | +13.0% |
| Science | +0.20% | +1.17% | +0.08% |
| Average (5 domains, 3 seeds) | −0.17% | +1.95% | +42.96% |
Key insight: CRMA drift is roughly an order of magnitude lower than FROZEN (~1.95%) and two orders of magnitude lower than naive sequential LoRA (~43%). 3-seed average across seeds 0, 42, and 1234 on Mistral-7B.
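The bottom row of the table is the plain mean of the five per-domain drift values (each of which is already a 3-seed mean). A quick check using the numbers from the table above:

```python
# Recompute the table's bottom row from its per-domain rows (values in %).
crma   = [-0.56, -0.55, +0.59, -0.51, +0.20]
frozen = [+2.22, +1.83, +1.74, +2.78, +1.17]
naive  = [+149.6, +34.3, +17.8, +13.0, +0.08]

for name, drifts in [("CRMA", crma), ("Frozen", frozen), ("Naive LoRA", naive)]:
    print(f"{name:<10} mean drift = {sum(drifts) / len(drifts):+.2f}%")
# CRMA       mean drift = -0.17%
# Frozen     mean drift = +1.95%
# Naive LoRA mean drift = +42.96%
```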
| Method | Forgetting | Overhead | Price/M tokens | CL Support |
|---|---|---|---|---|
| CRMA | −0.17% drift | None | Closed beta — contact us | Built-in |
| Naive LoRA | +43% (7B) / +225% (1.1B) | None | Varies | No |
| OpenAI | No CL | N/A | $3–25 | No |
| Mistral / Together | No CL | N/A | $0.48–9 | No |
How we measure: "Forgetting" = change in holdout loss on previously learned domains after training on new ones. Negative = the model got slightly better (ideal). Positive = knowledge was lost. Measured across 5 real-world domains (medical, legal, financial, code, science) on Mistral-7B, averaged over 3 random seeds.
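As a worked example of that definition, read as a relative (percent) change in holdout loss, which is what the table's percentages suggest. The loss values below are illustrative only, not our benchmark numbers.

```python
# "Forgetting" for one earlier domain = percent change in its holdout loss
# after later domains have been trained. Illustrative loss values only.
def forgetting_pct(holdout_loss_before: float, holdout_loss_after: float) -> float:
    """Positive = knowledge lost; negative = the earlier domain improved slightly."""
    return 100.0 * (holdout_loss_after - holdout_loss_before) / holdout_loss_before

print(f"{forgetting_pct(2.00, 2.86):+.1f}%")  # +43.0% -> severe forgetting (naive regime)
print(f"{forgetting_pct(2.00, 1.99):+.1f}%")  # -0.5%  -> slight positive transfer
```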
CRMA Internal (Mistral-7B, 5 domains, 3-seed avg): CRMA Modular −0.17% ± 0.17 drift, Frozen +1.95% ± 0.64, Naive +42.96% ± 5.5. Per-seed MODULAR and NAIVE ranges are disjoint. No replay, no EWC, no knowledge distillation.
Gemma-2-9B inference ablation: 98/100 with CRMA (Wilson 95% CI [93.0%, 99.5%]) vs 38/100 without (Wilson 95% CI [29.0%, 47.8%]). Same weights, same questions, only CRMA toggled.
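The Wilson intervals above can be reproduced from the raw counts with the standard formula; nothing project-specific is involved.

```python
# Wilson score interval for a binomial proportion; reproduces the quoted
# Gemma-2-9B ablation intervals from the raw counts, up to rounding.
from math import sqrt

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

print(wilson_ci(98, 100))  # ~(0.930, 0.995)  with CRMA
print(wilson_ci(38, 100))  # ~(0.291, 0.478)  without
```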
Spectral norm invariant: ‖M‖₂ held at 1.0 within float32 precision across 867 logged training steps spanning 5 sequential domains on Gemma-2-9B. Max deviation < 1.2 × 10⁻⁷. Birkhoff bound holds by construction.
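One reading of that invariant (an assumption on our part: that M lies in the Birkhoff polytope, i.e. is doubly stochastic) can be checked numerically. A doubly stochastic matrix is a convex combination of permutation matrices, so ‖M‖₂ cannot exceed 1, and it equals 1 because the all-ones vector is fixed. The matrix below is a toy example, not a CRMA internal.

```python
# Toy check of the Birkhoff-style bound: a doubly stochastic matrix built as a
# convex combination of permutation matrices has spectral norm exactly 1.
import numpy as np

rng = np.random.default_rng(0)
n, k = 8, 16
weights = rng.dirichlet(np.ones(k))                        # convex weights, sum to 1
perms = [np.eye(n)[rng.permutation(n)] for _ in range(k)]  # random permutation matrices
M = sum(w * P for w, P in zip(weights, perms))             # doubly stochastic by construction

print(np.allclose(M.sum(axis=0), 1), np.allclose(M.sum(axis=1), 1))  # True True
print(np.linalg.norm(M, 2))  # 1.0 up to float precision (largest singular value)
```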
Pricing context (April 2026): ModelBrew FT $3.99/M, all 7–9B models, with gradient visibility + built-in Dataset Optimizer. CL is in closed beta and not available for self-serve purchase at this time. OpenAI GPT-4.1 $3.00/M (no CL, FT only on their models). Together/Fireworks/OpenPipe $0.48–0.50/M (FT only, no cleaner, no CL). Mistral La Plateforme $1.00/M.
Head-to-head baselines: We have not run head-to-head comparisons against published CL methods (O-LoRA, InfLoRA, Lewandowski et al.) on our protocol. This is the single largest gap in our research; it is acknowledged openly in the paper. Our internal controls compare NAIVE vs FROZEN vs MODULAR on identical data.
CRMA results are from internal benchmarks using holdout evaluation. All forgetting-prevention numbers are conditional on correct inference-time routing.
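To make the routing condition concrete: at inference time, the adapter matching the query's domain has to be active. CRMA's actual router is not shown here; the sketch below only illustrates the idea using peft's multi-adapter API, with placeholder adapter paths and a toy keyword rule.

```python
# Illustration only: per-domain adapter selection at inference time.
# Adapter paths and the routing rule are placeholders, not CRMA's router.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
model = PeftModel.from_pretrained(base, "adapters/medical", adapter_name="medical")
model.load_adapter("adapters/legal", adapter_name="legal")
model.load_adapter("adapters/code", adapter_name="code")

def route(query: str) -> str:
    """Toy keyword router; a real system would classify the query's domain."""
    if "contract" in query.lower():
        return "legal"
    if "def " in query or "function" in query.lower():
        return "code"
    return "medical"

model.set_adapter(route("Summarize the indemnification clause in this contract."))
# If routing picks the wrong adapter, answers come from the wrong domain,
# which is why the forgetting numbers above are conditional on correct routing.
```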
CRMA continual learning is currently in closed beta. Fine-tuning ($3.99/M, free tier on TinyLlama) is available now.