Positive formal evidence
When a candidate class is kernel-proved, the proof gives strong evidence for that generated Lean statement.
COVCAL turns Lean traces into a selective prediction system: accept formal answers when the proof signal is covered and calibrated; abstain when unresolved rival mass makes the judgment unsafe.
A Lean proof is strong positive evidence for the statement it checked. But missing proof can mean bad formalization, missing library context, timeout, or semantic drift. COVCAL treats these cases as unresolved instead of negative labels.
When a candidate class is kernel-proved, the proof gives strong evidence for that generated Lean statement.
Unproved answer classes may still be correct, unformalized, or merely out of prover budget. They cannot be treated as false.
The formal selector becomes reliable only when enough answer mass is typed, proved, and separated from unresolved rivals.
The pipeline groups raw candidates into normalized answer classes, records Lean statuses, computes coverage diagnostics, then accepts only when a calibrated rule certifies the risk of formal predictions.
Cluster equivalent strings like 1/2, 0.5, and 2/4 into answer classes with self-consistency weights.
Each class is marked proved, typechecked, timeout, illtyped, or unformalized from the formal pipeline.
Compute typed coverage, proved coverage, unresolved rival mass, and formal margin.
Bonferroni or dev-then-cal calibration certifies accepted formal answers, not fallback predictions.
On MATH-500, the formal signal is sharply coverage-dependent. Better autoformalization changes certificate feasibility more than the certificate machinery itself.
This toy panel mirrors the COVCAL decision logic: high typed coverage, high proved coverage, and a positive formal margin make formal selection trustworthy; otherwise the system abstains.
The proved class dominates the unresolved rival mass, so the formal trace is inside the trusted region.
Full pdf here.