{"id":"confidence-unreliable","text":"LLM self-assessed confidence does not reliably track accuracy. Confirmed across 4 models (corrected results): Opus r=0.280, Sonnet r=0.223, Flash r=0.267, Pro r=0.137. Confidence therefore explains only 2-8% of the variance in accuracy (r² from 0.019 to 0.078). Revising answers based on self-assessment lowers accuracy in all 4 models (-3pp to -41.5pp). This is the same structural flaw as human overconfidence (Kahneman): the answer and the confidence in it come from the same process.","truth_value":"IN","source":"repo:beliefs-pi/CLAUDE.md","source_url":"","source_hash":"","justifications":[],"dependents":["eem-replaces-confidence","generate-and-critique"],"metadata":{},"created_at":"","updated_at":"","reviewed_at":"","verified_at":"","retracted_at":"","explanation":{"steps":[{"node":"confidence-unreliable","truth_value":"IN","reason":"premise"}]}}