{
  "results": [
    {
      "id": "belief-registry-externalizes-critique",
      "text": "The belief registry externalizes and persists the critic's judgments. Instead of relying on the same LLM to both generate and evaluate (which fails: self-critique lowers accuracy by 3pp to 41.5pp), the registry stores review outcomes as truth values (IN/OUT), retraction records, and nogoods. The critic's work survives across sessions and is available to any model.",
      "truth_value": "IN",
      "justification_count": 0,
      "dependent_count": 1,
      "challenges": [],
      "last_reviewed": null,
      "review_result": null,
      "source_type": ""
    },
    {
      "id": "generate-and-critique",
      "text": "LLMs are extraordinary generators but unreliable critics. The belief registry externalizes and persists the critic's judgments, replacing internal self-assessment with external, structured tracking.",
      "truth_value": "IN",
      "justification_count": 3,
      "dependent_count": 0,
      "challenges": [],
      "last_reviewed": "2026-05-30T07:02:40",
      "review_result": "invalid",
      "source_type": ""
    },
    {
      "id": "hybrid-tms",
      "text": "ftl-reasons is a hybrid TMS: the symbolic TMS handles structure (justifications, propagation, cascades, backtracking, challenge/defend), while LLMs handle semantic operations (derive generates beliefs, review-beliefs critiques them, contradiction detection finds nogoods).",
      "truth_value": "IN",
      "justification_count": 1,
      "dependent_count": 5,
      "challenges": [],
      "last_reviewed": "2026-05-29T17:30:21",
      "review_result": "pass",
      "source_type": ""
    },
    {
      "id": "model-stacking",
      "text": "Multi-pass agent pattern: Model A generates candidates → TMS records them with provenance → Review critiques (machine + human) → Model B receives validated beliefs → Model B derives new beliefs → Review critiques derivations → Repeat. Each level is a full model pass with fresh context, with the critique pipeline as the quality gate.",
      "truth_value": "IN",
      "justification_count": 2,
      "dependent_count": 0,
      "challenges": [],
      "last_reviewed": "2026-05-29T17:30:21",
      "review_result": "invalid",
      "source_type": ""
    },
    {
      "id": "model-stacking-evidence",
      "text": "Multi-pass agent pattern observed: Model A generates candidates, TMS records them with provenance, review critiques them (machine + human), Model B receives validated beliefs, Model B derives new beliefs. Demonstrated in the expert-build pipeline, where Sonnet summarizes sources, then derives, then reviews; each pass gets fresh context, with the TMS as the persistent layer between passes.",
      "truth_value": "IN",
      "justification_count": 0,
      "dependent_count": 1,
      "challenges": [],
      "last_reviewed": null,
      "review_result": null,
      "source_type": ""
    },
    {
      "id": "self-critique-harmful",
      "text": "LLM revision based on self-critique makes answers worse: Sonnet -11pp, Flash -21pp, Pro -56.5pp. Self-critique fails because the same model that made the error also evaluates it.",
      "truth_value": "IN",
      "justification_count": 0,
      "dependent_count": 1,
      "challenges": [],
      "last_reviewed": null,
      "review_result": null,
      "source_type": ""
    }
  ],
  "count": 6,
  "limit": 20,
  "offset": 0
}
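The hybrid-tms belief says the symbolic layer handles justifications, propagation, and cascades, and every record carries a truth_value plus justification and dependent counts. The following is a minimal Python sketch of that kind of IN/OUT label propagation, assuming a simplified, monotonic justification-based TMS: a premise is IN unless retracted, and a derived belief is IN if every antecedent of at least one of its justifications is IN. ftl-reasons' real data model is not shown in this output, so the `Belief` class, the `label` function, and the belief ids are hypothetical, and nogoods, backtracking, and challenge/defend are left out.

```python
from dataclasses import dataclass, field


# Hypothetical node type: ftl-reasons' real schema is not shown in the output.
@dataclass
class Belief:
    id: str
    premise: bool = False    # asserted directly rather than derived
    retracted: bool = False  # premise withdrawn, e.g. after a failed review
    # Each justification is a set of antecedent ids that must all be IN.
    justifications: list[set[str]] = field(default_factory=list)


def label(beliefs: dict[str, Belief]) -> dict[str, str]:
    """Compute IN/OUT labels as a least fixed point.

    Everything starts OUT and only flips OUT -> IN when an unretracted premise
    or a fully IN justification supports it, so circular support alone never
    makes a belief IN.
    """
    labels = {bid: "OUT" for bid in beliefs}
    changed = True
    while changed:
        changed = False
        for bid, b in beliefs.items():
            if labels[bid] == "IN":
                continue
            grounded = b.premise and not b.retracted
            justified = any(
                all(labels.get(a) == "IN" for a in just) for just in b.justifications
            )
            if grounded or justified:
                labels[bid] = "IN"
                changed = True
    return labels


# Hypothetical registry: two premises support one derivation, which supports another.
registry = {
    "premise-a": Belief("premise-a", premise=True),
    "premise-b": Belief("premise-b", premise=True),
    "derived-c": Belief("derived-c", justifications=[{"premise-a", "premise-b"}]),
    "derived-d": Belief("derived-d", justifications=[{"derived-c"}]),
}
print(label(registry))  # all four beliefs are IN

# Retracting one premise cascades: derived-c loses its only justification,
# and derived-d in turn loses derived-c.
registry["premise-b"].retracted = True
print(label(registry))  # premise-a stays IN; premise-b, derived-c, derived-d are OUT
```

Recomputing every label from all-OUT keeps the sketch short and guarantees that mutually justifying beliefs cannot hold each other IN; a production TMS would more likely propagate incrementally from the changed node through its dependents.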