Corrected AI failures should not die in logs.
SUPERNOVA–IMMUNE turns validated AI corrections into persistent behavioral patches and measures whether the same failure class recurs less often on hidden future cases.
No model retraining. No source-code disclosure. Black-box evaluation only.
A correction fixes one output. It rarely protects future behavior.
AI teams correct failures manually, yet the same failure class can reappear later in logs, RAG workflows, vector memories, tool calls, and autonomous agent loops.
The operational question is not only whether the next answer is correct. It is whether a validated correction actually reduces future recurrence of the same failure class.
Persistent behavioral patches for recurring AI failure classes.
SUPERNOVA–IMMUNE is designed to convert validated corrections into persistent behavioral patches, then evaluate whether similar failures recur less often on hidden future cases.
Validated correction
A corrected AI failure becomes structured operational signal.
Behavioral patch
The failure class is converted into a reusable black-box safeguard.
Hidden future cases
The buyer keeps future cases and labels hidden during evaluation.
Recurrence measured
The buyer compares recurrence, false inhibition and positive-case preservation.
PATCH
A recurring failure pattern, unsafe conversion, or unsupported action is detected.
ALLOW
The strong claim or action is actually supported within its evidence and permission scope.
NO_PATCH
The case is benign and no recurrence-related patch should fire.
REVIEW
The case is conflicted, under-documented, ambiguous, or high-risk.
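For buyer-side tooling, the four public decision codes above can be modeled as a small enumeration. The sketch below is illustrative only: the `Decision` enum, `parse_decision`, and `tally` are hypothetical helper names, and it assumes the codes arrive as plain strings. It says nothing about how the engine produces them.

```python
from collections import Counter
from enum import Enum
from typing import Iterable


class Decision(Enum):
    """The four public decision codes listed on this page."""
    PATCH = "PATCH"        # recurring failure pattern or unsupported action detected
    ALLOW = "ALLOW"        # strong claim or action is supported within scope
    NO_PATCH = "NO_PATCH"  # benign case, no recurrence-related patch should fire
    REVIEW = "REVIEW"      # conflicted, under-documented, ambiguous, or high-risk


def parse_decision(raw: str) -> Decision:
    """Normalize a returned code (assumed to be a string); raises ValueError if unknown."""
    return Decision(raw.strip().upper())


def tally(raw_codes: Iterable[str]) -> Counter:
    """Count how often each decision code appears in a batch of results."""
    return Counter(parse_decision(code) for code in raw_codes)
```

Rejecting unknown codes with an error, rather than silently mapping them to a default, keeps a malformed result from being counted as a valid decision during scoring.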
A narrower metric: does the same failure class come back?
| Approach | Typical limitation | SUPERNOVA focus |
|---|---|---|
| Reflexion / self-correction | Often improves one case at a time. | Persistent failure-family patches. |
| Vector memory | Similarity is not recurrence prevention. | Failure-class recurrence reduction. |
| Heavy guardrails | Can reduce errors by overblocking valid cases. | False inhibition is measured explicitly. |
| Fine-tuning | Can be slow, costly, and unsuitable for every correction. | No model retraining required for black-box evaluation. |
| LLM-as-judge | Useful for scoring, but not necessarily for persistent behavior patching. | Corrections become operational safeguards. |
The buyer keeps the hidden labels.
We do not ask buyers to trust our benchmark. We ask them to test recurrence reduction on their own hidden future cases.
- You provide previously corrected AI failures.
- You keep future cases and labels hidden.
- SUPERNOVA returns only black-box decisions and public reason codes.
- You compare against your own baselines.
- You measure recurrence reduction, false inhibition, and positive-case preservation.
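The steps above could be wired into a buyer-side harness along these lines. This is a minimal sketch under stated assumptions: `HiddenCase`, `collect_decisions`, and `agreement_rate` are hypothetical names, the black box is modeled as any callable that takes case text and returns a decision code, and hidden labels are assumed to use the same four-code vocabulary. `stub_black_box` is a placeholder so the example runs; it is not the SUPERNOVA engine or its interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class HiddenCase:
    case_id: str
    text: str          # the only field shown to the black box
    hidden_label: str  # stays on the buyer's side


def collect_decisions(
    cases: List[HiddenCase], black_box: Callable[[str], str]
) -> Dict[str, str]:
    """Send only the case text to the black box; labels never leave this process."""
    return {case.case_id: black_box(case.text) for case in cases}


def agreement_rate(cases: List[HiddenCase], decisions: Dict[str, str]) -> float:
    """Score locally: share of cases whose returned decision matches the hidden label."""
    if not cases:
        return 0.0
    matches = sum(
        1 for case in cases if decisions.get(case.case_id) == case.hidden_label
    )
    return matches / len(cases)


def stub_black_box(text: str) -> str:
    """Placeholder standing in for any black-box system or baseline (assumption)."""
    return "PATCH" if "refund" in text.lower() else "NO_PATCH"
```

Because the black box is just a callable, the same harness can be run once per baseline and once per candidate system, so every comparison uses identical hidden cases.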
Failure Recurrence Reduction
Does the same failure class recur less often?
False Inhibition Rate
Does the system avoid blocking valid cases?
Positive Case Preservation
Can strong claims still pass when evidence is sufficient?
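One way a buyer might compute these three metrics locally is sketched below. The definitions are assumptions chosen for illustration, not the formulas used internally: each case is a `(hidden_label, returned_decision)` pair using the four public codes, a failure "recurs" when a case labeled `PATCH` receives any other decision, a valid case is "inhibited" when a case labeled `ALLOW` or `NO_PATCH` receives `PATCH`, and recurrence reduction is measured relative to the buyer's own baseline.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (hidden_label, returned_decision), both assumed to be public codes


def _rate(hits: int, total: int) -> float:
    """Safe ratio that returns 0.0 when there are no relevant cases."""
    return hits / total if total else 0.0


def recurrence_rate(pairs: List[Pair]) -> float:
    """Share of known-failure cases (label PATCH) that were not caught."""
    failures = [decision for label, decision in pairs if label == "PATCH"]
    return _rate(sum(d != "PATCH" for d in failures), len(failures))


def false_inhibition_rate(pairs: List[Pair]) -> float:
    """Share of valid cases (label ALLOW or NO_PATCH) that were patched anyway."""
    valid = [decision for label, decision in pairs if label in ("ALLOW", "NO_PATCH")]
    return _rate(sum(d == "PATCH" for d in valid), len(valid))


def positive_case_preservation(pairs: List[Pair]) -> float:
    """Share of supported strong claims (label ALLOW) that still passed."""
    strong = [decision for label, decision in pairs if label == "ALLOW"]
    return _rate(sum(d == "ALLOW" for d in strong), len(strong))


def recurrence_reduction(baseline: List[Pair], system: List[Pair]) -> float:
    """Relative drop in recurrence rate versus the buyer's baseline."""
    base = recurrence_rate(baseline)
    return (base - recurrence_rate(system)) / base if base else 0.0
```

Reporting the three numbers together matters: a system can drive recurrence to zero simply by patching everything, and only the false inhibition and preservation figures would reveal that.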
Exploratory signals, not yet customer validation.
Internal v0.4 benchmark: strong recurrence-reduction signal, low false inhibition, high useful success.
Grok-generated hostile black-box stress tests: labels were hidden until prediction, performance was strong but exploratory across adversarial lots, and conflict and review stress tests were completed.
Detailed casebooks, labelbooks and decision logs are confidential. Independent customer validation is the proposed next step.
Where recurrence hurts most.
AI agent reliability
Failure classes that reappear across autonomous agent loops and tool use.
Enterprise RAG
Recurring unsupported claims, stale evidence, and context-to-answer drift.
Support agents
Repeated policy errors, over-promises, or unsafe procedural shortcuts.
Compliance assistants
Conflicts between old memory, current policy, authority, and external rules.
AI eval platforms
Measure whether corrections actually reduce future recurrence.
AI safety / guardrails
Add a recurrence-reduction layer beyond broad blocking rules.
Acquisition-oriented. Not a public SaaS.
SUPERNOVA–IMMUNE is not offered as a public SaaS, rental license or self-serve API.
We are open to strategic acquisition, exclusive technology transfer, acquisition-oriented black-box evaluation, and due diligence under NDA.
The source code and internal engine are not disclosed before an advanced transaction process.
The public site explains the value, not the engine.
We do not publish source code, receptor fields, patch-generation logic, internal scoring, full casebooks, full labelbooks or internal heuristics.
Detailed technical material is available only under NDA and only in an advanced acquisition or due diligence process.
Is this a guardrail?
No. Heavy guardrails often block broad categories. SUPERNOVA focuses on whether a previously corrected failure class recurs less often on hidden future cases.
Do you need access to our model?
No. The evaluation can be run black-box. The buyer keeps future labels private and scores the results locally.
Do you retrain the model?
No model retraining is required for the black-box evaluation.
What do we receive?
Black-box decisions, public reason codes and aggregated evaluation results. The engine remains confidential.
Request a black-box evaluation
For acquisition, evaluation or technology-transfer discussion: contact@supernova-immune.com
Black-box evaluations: evaluation@supernova-immune.com