{"results":[{"id":"belief-registry-externalizes-critique","text":"The belief registry externalizes and persists the critic's judgments. Instead of relying on the same LLM to both generate and evaluate (which fails — self-critique damages accuracy -3pp to -41.5pp), the registry stores review outcomes as truth values (IN/OUT), retraction records, and nogoods. The critic's work survives across sessions and is available to any model.","truth_value":"IN","justification_count":0,"dependent_count":1,"challenges":[],"last_reviewed":null,"review_result":null,"source_type":""},{"id":"cognitive-budget","text":"Cognitive budget principle borrowed from graphics frame budgets: decompose work into focused passes (TMS pass, RAG pass, merge pass) each within the model's attention budget. Mixing beliefs and document chunks in a single prompt degrades performance (Opus drops 95.5% to 86%); three focused passes achieve 100%","truth_value":"IN","justification_count":0,"dependent_count":1,"challenges":[],"last_reviewed":null,"review_result":null,"source_type":""},{"id":"confidence-unreliable","text":"LLM self-assessed confidence does not reliably track accuracy. Confirmed across 4 models (corrected results): Opus r=0.280, Sonnet r=0.223, Flash r=0.267, Pro r=0.137. Confidence explains only 2-8% of variance. Revision based on self-assessment damages accuracy in all 4 models (-3pp to -41.5pp). Same structural flaw as human overconfidence (Kahneman) — answer and confidence come from the same process.","truth_value":"IN","justification_count":0,"dependent_count":2,"challenges":[],"last_reviewed":null,"review_result":null,"source_type":""},{"id":"context-opacity","text":"The human cannot track what the LLM currently has in context. Context windows are opaque — the human sees their messages and tool results but not the model's internal state, compaction decisions, or what was silently dropped.","truth_value":"IN","justification_count":0,"dependent_count":1,"challenges":[],"last_reviewed":null,"review_result":null,"source_type":""},{"id":"continuity-human-problem","text":"The human cannot track what the LLM currently has in context. Context windows are opaque and compaction destroys justification networks. EEM solves this via visibility and persistence — the human can always inspect the current belief state regardless of what the model has in context.","truth_value":"IN","justification_count":2,"dependent_count":1,"challenges":[],"last_reviewed":"2026-05-30T07:02:40","review_result":"invalid","source_type":""},{"id":"cross-model-portability","text":"EEM works across model providers and sizes. The same belief network can be queried by Claude, Gemini, local models, or any LLM that can read text. Model upgrades, provider swaps, and cost optimization (Opus→Haiku) preserve all knowledge. The beliefs are plain text with structure — no model-specific format.","truth_value":"IN","justification_count":0,"dependent_count":1,"challenges":[],"last_reviewed":null,"review_result":null,"source_type":""},{"id":"eem-compensates-model-size","text":"EEM compensates for model size — smaller models with EEM match larger models without it","truth_value":"IN","justification_count":1,"dependent_count":0,"challenges":[],"last_reviewed":"2026-05-30T07:02:40","review_result":"unnecessary","source_type":""},{"id":"eem-definition","text":"External Epistemic Memory (EEM) is knowledge that lives outside the model, carries its justifications with it, and lets you understand how the system knows what it knows","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null,"source_type":""},{"id":"eem-external","text":"External means outside model parameters, in a separate substrate. Survives compaction, model swaps, session boundaries. Six properties: separable (exists independently of the model), copyable (can be duplicated), shareable (multiple agents can access it), inspectable (humans can read it), editable (humans can modify it), auditable (justification chains are traversable).","truth_value":"IN","justification_count":0,"dependent_count":6,"challenges":[],"last_reviewed":null,"review_result":null,"source_type":""},{"id":"eem-vs-context","text":"Conversation history and context windows are ephemeral — lost at session boundaries, destroyed by compaction. EEM persists across sessions and model swaps. Context compaction destroys justification networks (quantified across 33 measured compaction events)","truth_value":"IN","justification_count":2,"dependent_count":0,"challenges":[],"last_reviewed":"2026-05-30T07:02:40","review_result":"pass","source_type":""},{"id":"evidence-model-compensation","text":"EEM compensates for model size: Sonnet+beliefs approximates Opus without beliefs. Haiku with dual-path achieves 94% A+B, matching Opus at 98%","truth_value":"IN","justification_count":0,"dependent_count":1,"challenges":[],"last_reviewed":null,"review_result":null,"source_type":""},{"id":"karpathy-llm-wiki","text":"Andrej Karpathy (2026) independently proposed an 'LLM Wiki' — a persistent structured knowledge base that an LLM incrementally builds and maintains instead of re-discovering knowledge from scratch. Same diagnosis (RAG is stateless waste), same general solution (persistent knowledge artifact). EEM goes further: justification chains, retraction cascades, measured results across 4 model families. Independent convergence from a credible source validates the core insight.","truth_value":"IN","justification_count":0,"dependent_count":1,"challenges":[],"last_reviewed":null,"review_result":null,"source_type":""},{"id":"model-stacking","text":"Multi-pass agent pattern: Model A generates candidates → TMS records with provenance → Review critiques (machine + human) → Model B receives validated beliefs → Model B derives new beliefs → Review critiques derivations → Repeat. Each level is a full model pass with fresh context and critique pipeline as quality gate","truth_value":"IN","justification_count":2,"dependent_count":0,"challenges":[],"last_reviewed":"2026-05-29T17:30:21","review_result":"invalid","source_type":""},{"id":"model-stacking-evidence","text":"Multi-pass agent pattern observed: Model A generates candidates, TMS records with provenance, review critiques (machine + human), Model B receives validated beliefs, Model B derives new beliefs. Demonstrated in expert-build pipeline where Sonnet summarizes sources, then Sonnet derives, then Sonnet reviews — each pass gets fresh context with the TMS as the persistent layer between passes.","truth_value":"IN","justification_count":0,"dependent_count":1,"challenges":[],"last_reviewed":null,"review_result":null,"source_type":""},{"id":"self-critique-harmful","text":"LLM revision based on self-critique makes answers worse: Sonnet -11pp, Flash -21pp, Pro -56.5pp. Self-critique fails because the same model that made the error evaluates the error","truth_value":"IN","justification_count":0,"dependent_count":1,"challenges":[],"last_reviewed":null,"review_result":null,"source_type":""},{"id":"site-lacks-model-versions","text":"llmeem.ai does not specify which model versions (Sonnet, Opus, Haiku version numbers) or dates experiments were run against. Model version matters significantly for reproducibility.","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null,"source_type":""},{"id":"training-finetuning-cost-comparison","text":"Model fine-tuning costs $10K-$100K+ for a single domain adaptation, requires ML expertise, and produces a model locked to one provider. Training from scratch costs millions. EEM construction costs ~$300 (Sonnet) to ~$1,500 (Opus) for enterprise scale (13,511 beliefs, 6 departments), requires no ML expertise, and produces a portable knowledge artifact usable by any model. EEM is 10-100x cheaper than fine-tuning and works across providers.","truth_value":"IN","justification_count":0,"dependent_count":1,"challenges":[],"last_reviewed":null,"review_result":null,"source_type":""}],"count":17,"limit":20,"offset":0}