{"results":[{"id":"atms-de-kleer-1986","text":"de Kleer (1986) ATMS uses assumption-based environments and nogoods. TMS beats ATMS for EEM because revision matters more than multiple environments when the problem solver (LLM) produces 13-37% errors","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null,"source_type":""},{"id":"continuity-human-problem","text":"The human cannot track what the LLM currently has in context. Context windows are opaque and compaction destroys justification networks. EEM solves this via visibility and persistence — the human can always inspect the current belief state regardless of what the model has in context.","truth_value":"IN","justification_count":2,"dependent_count":1,"challenges":[],"last_reviewed":"2026-05-30T07:02:40","review_result":"invalid","source_type":""},{"id":"credibility-is-presentation-problem","text":"The credibility gap on llmeem.ai is a presentation problem, not a substance problem. The evidence (eval harnesses, question sets, raw results, methodology writeups) exists but is not linked or publicly accessible. Fixing credibility requires linking to existing evidence, not generating new evidence.","truth_value":"IN","justification_count":1,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null,"source_type":""},{"id":"evidence-exists-but-not-linked","text":"The eval harnesses, question sets, JSON result files, Langfuse traces, and methodology writeups all exist in project repos (beliefs-pi, expert-service, claude_code_langgraph). They are not public or linked from llmeem.ai. The credibility gap is a presentation problem, not a substance problem.","truth_value":"IN","justification_count":0,"dependent_count":1,"challenges":[],"last_reviewed":null,"review_result":null,"source_type":""},{"id":"frame-problem","text":"McCarthy & Hayes (1969) frame problem: what persists across state changes. check-stale addresses this by detecting when source files change under beliefs","truth_value":"IN","justification_count":0,"dependent_count":2,"challenges":[],"last_reviewed":null,"review_result":null,"source_type":""},{"id":"ftl-reasons-is-tms","text":"ftl-reasons implements actual Doyle-style TMS architecture: SL justifications with antecedents and outlists, BFS propagation cascades with restoration, entrenchment-scored dependency-directed backtracking. LLMs fill the problem-solver role Doyle left open","truth_value":"IN","justification_count":2,"dependent_count":6,"challenges":[],"last_reviewed":"2026-05-30T07:02:40","review_result":"pass","source_type":""},{"id":"llm-as-problem-solver","text":"Putting an LLM in the TMS problem-solver slot (generator via derive, critic via review-beliefs and contradiction detection) is what Doyle's architecture prescribes. The open question is whether an LLM is a good problem solver, not whether using one is faithful to the design","truth_value":"IN","justification_count":1,"dependent_count":1,"challenges":[],"last_reviewed":"2026-05-29T17:30:21","review_result":"pass","source_type":""},{"id":"llm-generation-capability","text":"LLMs are strong generators: they produce fluent, contextually relevant, often correct text across domains. The 88% A-grade with EEM, 98.5% dual-path, and consistent ~155 derivations per 10-round cycle across 6 domains demonstrate generation capability. The problem is reliability, not capability — 13-38% of derivations fail review.","truth_value":"IN","justification_count":0,"dependent_count":1,"challenges":[],"last_reviewed":null,"review_result":null,"source_type":""},{"id":"self-improvement","text":"The system finds problems in itself. The derive-review cycle surfaces errors (13-38% retraction rate), and corrections cascade through the network, improving the foundation for the next derivation round. Each cycle builds on a cleaner network than the last.","truth_value":"IN","justification_count":1,"dependent_count":0,"challenges":[],"last_reviewed":"2026-05-29T17:30:21","review_result":"invalid","source_type":""},{"id":"source-change-problem","text":"Beliefs derived from source material that has since changed may no longer be true. Source documents are updated, deprecated, or replaced over time. Without tracking which sources support which beliefs, the belief network silently diverges from reality.","truth_value":"IN","justification_count":0,"dependent_count":1,"challenges":[],"last_reviewed":null,"review_result":null,"source_type":""},{"id":"stale-belief-problem","text":"A belief derived from source material that has since changed may no longer be true. Without staleness detection, the belief network silently diverges from reality — it answers 'is this justified?' correctly for the old state but not the current one","truth_value":"IN","justification_count":2,"dependent_count":0,"challenges":[],"last_reviewed":"2026-05-29T17:30:21","review_result":"invalid","source_type":""},{"id":"staleness-addresses-frame-problem","text":"check-stale is EEM's answer to the frame problem: instead of tracking everything that didn't change, it detects what did change and flags affected beliefs for re-evaluation. This bounds the maintenance cost to changed sources rather than the entire belief set","truth_value":"IN","justification_count":1,"dependent_count":0,"challenges":[],"last_reviewed":"2026-05-29T17:30:21","review_result":"pass","source_type":""},{"id":"tms-doyle-1979","text":"Doyle (1979) designed Truth Maintenance Systems with SL justifications, propagation, retraction cascades, and an exogenous problem-solver slot. The TMS substrate is content-agnostic by design","truth_value":"IN","justification_count":0,"dependent_count":2,"challenges":[],"last_reviewed":null,"review_result":null,"source_type":""}],"count":13,"limit":20,"offset":0}