# Belief Registry

## Claims

### agm-belief-revision [IN] OBSERVATION
AGM (Alchourrón, Gärdenfors, Makinson 1985) provides a formal theory of rational belief revision. Entrenchment scoring in backtracking is a crude approximation of AGM.

### amortization-argument [IN] OBSERVATION
EEM construction is expensive but amortizes. ~$300 Sonnet for 13,511 beliefs. Each query costs ~$0.01. Breakeven at 100-250 queries. After that, every query is cheaper than re-reading source documents from scratch.
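
The breakeven arithmetic can be sketched as follows. This is a minimal illustration, not the project's cost model: the $300 construction cost and $0.01 per-query cost come from the entry above, while the baseline per-query costs ($1.21 and $3.01 for re-reading sources) are hypothetical values chosen because they reproduce the stated 100-250 query range.

```python
def breakeven_queries(construction_cents: int, baseline_cents: int, eem_cents: int) -> int:
    """Smallest query count at which EEM's total cost is no higher than the baseline's."""
    savings = baseline_cents - eem_cents
    if savings <= 0:
        raise ValueError("EEM only breaks even if each query is cheaper than the baseline")
    # Integer ceiling division avoids floating-point rounding on money.
    return -(-construction_cents // savings)


# 30_000 cents = $300 construction; 1 cent = $0.01 per EEM query (from the entry).
# 301 and 121 cents are hypothetical baseline costs per query.
print(breakeven_queries(30_000, 301, 1))  # 100
print(breakeven_queries(30_000, 121, 1))  # 250
```

Working in integer cents keeps the division exact, which matters when the result is reported as a whole number of queries.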

### atms-de-kleer-1986 [IN] OBSERVATION
de Kleer's (1986) ATMS uses assumption-based environments and nogoods. A TMS fits EEM better than an ATMS because revision matters more than maintaining multiple environments when the problem solver (an LLM) produces 13-37% errors.

### automated-overnight-construction [IN] OBSERVATION
The derive-review-research cycle is mechanical enough to run unattended. Three of six substrate validations were run by an autonomous Claude session with no human intervention, producing consistent results. The target: kick off expert-build derive-review-repair in the evening, wake up to a reviewed knowledge base. Learn while you sleep, build while you're awake.

### automation-evidence [IN] OBSERVATION
Three of six substrate derive-review-research cycles (bare-metal, AWS, OpenShift) were run by an autonomous Claude session with no human intervention. Results were consistent with manually-run cycles: same four failure categories, same proportional breakdown, same repair strategies. The process is mechanical enough for a single LLM session to execute the full loop.

### belief-registry-externalizes-critique [IN] OBSERVATION
The belief registry externalizes and persists the critic's judgments. Relying on the same LLM to both generate and evaluate fails: revision based on self-critique changes accuracy by -3pp to -41.5pp. Instead, the registry stores review outcomes as truth values (IN/OUT), retraction records, and nogoods. The critic's work survives across sessions and is available to any model.

### beliefs-cli-vs-reasons-cli [IN] OBSERVATION
Two CLIs at different levels: beliefs CLI is a structured markdown KB with provenance and manual maintenance (simple, flat). reasons CLI (ftl-reasons) is a full TMS with automatic propagation, cascades, backtracking, and LLM-driven operations (powerful, dependency-aware). Use beliefs for independent facts, reasons for justified conclusions with dependency chains

### challenge-defend [IN] OBSERVATION
Dialectical argumentation: challenging a node makes it go OUT. Defending neutralizes the challenge. Multi-level chains supported — challenge the defense, defend the defense, etc. Preserves the original argument unlike retract
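
One way to picture multi-level challenge/defend chains is as a recursive check: a node stands only if none of its challenges stands, and a defense is simply a challenge to a challenge. The sketch below is an assumption about the evaluation rule, not ftl-reasons code; the `attackers` mapping and `stands` function are hypothetical names, and the recursion assumes the chain has no cycles.

```python
def stands(node: str, attackers: dict[str, list[str]]) -> bool:
    """A node stands (IN) when none of the nodes attacking it stands."""
    return not any(stands(a, attackers) for a in attackers.get(node, []))


attackers: dict[str, list[str]] = {}
print(stands("claim", attackers))         # True: nothing challenges the claim

attackers["claim"] = ["challenge"]        # challenging the claim takes it OUT
print(stands("claim", attackers))         # False

attackers["challenge"] = ["defense"]      # defending neutralizes the challenge
print(stands("claim", attackers))         # True

attackers["defense"] = ["counter"]        # challenging the defense
print(stands("claim", attackers))         # False
```

Nothing is deleted at any level, which mirrors the point that challenge/defend preserves the original argument where retract does not.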

### cognitive-budget [IN] OBSERVATION
Cognitive budget principle borrowed from graphics frame budgets: decompose work into focused passes (TMS pass, RAG pass, merge pass) each within the model's attention budget. Mixing beliefs and document chunks in a single prompt degrades performance (Opus drops 95.5% to 86%); three focused passes achieve 100%

### compaction-destroys-networks [IN] OBSERVATION
Context compaction destroys justification networks. Quantified across 33 measured compaction events in beliefs-pi. Justification chains, dependency structures, and correction history are lost when the context window is compressed.

### confidence-unreliable [IN] OBSERVATION
LLM self-assessed confidence does not reliably track accuracy. Confirmed across 4 models (corrected results): Opus r=0.280, Sonnet r=0.223, Flash r=0.267, Pro r=0.137. Confidence explains only 2-8% of variance. Revision based on self-assessment damages accuracy in all 4 models (-3pp to -41.5pp). Same structural flaw as human overconfidence (Kahneman) — answer and confidence come from the same process.
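
The "2-8% of variance" figure follows from squaring the correlation coefficients. A short check, using only the r-values listed above:

```python
correlations = {"Opus": 0.280, "Sonnet": 0.223, "Flash": 0.267, "Pro": 0.137}


def variance_explained_pct(r: float) -> float:
    """Share of variance explained by a linear relationship: r squared, as a percentage."""
    return round(100 * r * r, 1)


for model, r in correlations.items():
    print(f"{model}: r={r:.3f} -> {variance_explained_pct(r)}% of variance")
```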

### construction-cost-measured [IN] OBSERVATION
EEM construction cost measured for enterprise scale (6 departments, 5,366 sources, 13,511 beliefs): ~$300 at Sonnet pricing, ~$1,500 at Opus pricing. Dominant cost is the summarize step (~98M tokens). Per-query breakeven at 100-250 queries — after that, every query is cheaper than re-reading source documents.

### construction-vs-retrieval [IN] OBSERVATION
Construction cost dominates: O(chunks) + O(beliefs × rounds), paid once. Retrieval is O(queries) at a low per-query cost, so construction amortizes across all queries: expensive to build, cheap to query at scale.

### context-opacity [IN] OBSERVATION
The human cannot track what the LLM currently has in context. Context windows are opaque — the human sees their messages and tool results but not the model's internal state, compaction decisions, or what was silently dropped.

### continuity-human-problem [IN] OBSERVATION
The human cannot track what the LLM currently has in context. Context windows are opaque and compaction destroys justification networks. EEM solves this via visibility and persistence — the human can always inspect the current belief state regardless of what the model has in context.

### credibility-is-presentation-problem [IN] OBSERVATION
The credibility gap on llmeem.ai is a presentation problem, not a substance problem. The evidence (eval harnesses, question sets, raw results, methodology writeups) exists but is not linked or publicly accessible. Fixing credibility requires linking to existing evidence, not generating new evidence.

### cross-model-portability [IN] OBSERVATION
EEM works across model providers and sizes. The same belief network can be queried by Claude, Gemini, local models, or any LLM that can read text. Model upgrades, provider swaps, and cost optimization (Opus→Haiku) preserve all knowledge. The beliefs are plain text with structure — no model-specific format.

### depth-reset-methodology [IN] OBSERVATION
Experiments reset derivation depth by providing new depth-0 observations. When derive saturates at depth 7-8, adding new premises from fresh source analysis provides new combinable pairs, restarting the derivation process from depth 1.

### derive-overshoot-observed [IN] OBSERVATION
Derive over-generates (produces beliefs that don't survive review) and review over-retracts (flags beliefs that could be recovered). Measured: 13-38% retraction rate across 6 infrastructure domains (939 derivations). Working through candidate retractions reveals insights — smuggled premises are usually recoverable (44-59% search-and-link recovery rate).

### derive-then-review [IN] OBSERVATION
Over-derive first; review then catches errors, and retraction cascades propagate the corrections. Both roles overshoot (derive over-generates, review over-retracts). Working through candidate retractions is where insights hide.

### dual-path-architecture [IN] OBSERVATION
Dual-path retrieval: TMS path (pre-computed beliefs) + FTS path (source chunk search), merged by a third pass. This is how EEM is queried at scale. Each path stays within cognitive budget
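
The three passes can be sketched as a small orchestration function. This is an illustration under stated assumptions: `search_beliefs`, `search_chunks`, and `ask` are hypothetical callables standing in for a belief search, a full-text search over source chunks, and an LLM call, and the prompt wording is invented for the example.

```python
from typing import Callable, List


def dual_path_answer(
    question: str,
    search_beliefs: Callable[[str], List[str]],
    search_chunks: Callable[[str], List[str]],
    ask: Callable[[str], str],
) -> str:
    """Answer with three focused passes instead of one mixed prompt."""
    # Pass 1 (TMS path): the prompt contains beliefs only.
    beliefs = "\n".join(search_beliefs(question))
    tms_answer = ask(f"Beliefs:\n{beliefs}\n\nQuestion: {question}")

    # Pass 2 (FTS path): the prompt contains source chunks only.
    chunks = "\n".join(search_chunks(question))
    fts_answer = ask(f"Source chunks:\n{chunks}\n\nQuestion: {question}")

    # Pass 3 (merge): the prompt contains only the two draft answers.
    return ask(
        f"Draft from beliefs:\n{tms_answer}\n\n"
        f"Draft from sources:\n{fts_answer}\n\n"
        f"Merge these into one answer to: {question}"
    )


# Demo with stand-in callables and made-up data; no real index or model is called.
prompts: List[str] = []


def fake_ask(prompt: str) -> str:
    prompts.append(prompt)
    return f"answer-{len(prompts)}"


final = dual_path_answer(
    "Which port does the service use?",
    search_beliefs=lambda q: ["belief: service listens on 8443"],
    search_chunks=lambda q: ["chunk: listen_port = 8443"],
    ask=fake_ask,
)
print(len(prompts), final)  # 3 answer-3
```

Beliefs and chunks never share a prompt, which is the property the cognitive-budget entry depends on.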

### dual-path-design-evidence [IN] OBSERVATION
Dual-path retrieval (TMS path for pre-computed beliefs + FTS path for source chunk search, merged by a third pass) achieves 98.5% A/B across 3,853 questions. Opus drops from 95.5% to 86% when mixing beliefs and document chunks in a single prompt; three focused passes achieve 100%.

### eem-cli-interface [IN] OBSERVATION
The reasons CLI provides: reasons init (create database), reasons add (add beliefs with --sl for justifications, --source for provenance), reasons retract (mark OUT with cascade), reasons assert (mark IN with restoration), reasons search (semantic search), reasons show (full details), reasons explain (justification trace), reasons derive (generate new beliefs), reasons review-beliefs (audit), reasons challenge/defend (dialectical argumentation), reasons check-stale (source change detection), reasons nogood (record contradictions), reasons export-markdown (beliefs.md output), reasons compact (token-budgeted summary).

### eem-compensates-model-size [IN] OBSERVATION
EEM compensates for model size — smaller models with EEM match larger models without it

### eem-definition [IN] OBSERVATION
External Epistemic Memory (EEM) is knowledge that lives outside the model, carries its justifications with it, and lets you understand how the system knows what it knows

### eem-epistemic [IN] OBSERVATION
Epistemic means not just facts but justified beliefs with truth values (IN/OUT), retraction cascades, contradiction records (nogoods), and derivation depth. This distinguishes EEM from RAG (which is external semantic memory but not epistemic)

### eem-external [IN] OBSERVATION
External means outside model parameters, in a separate substrate. Survives compaction, model swaps, session boundaries. Six properties: separable (exists independently of the model), copyable (can be duplicated), shareable (multiple agents can access it), inspectable (humans can read it), editable (humans can modify it), auditable (justification chains are traversable).

### eem-memory [IN] OBSERVATION
Memory means memory in Tulving's semantic-memory sense: persistent structured knowledge, not ephemeral context.

### eem-replaces-confidence [IN] OBSERVATION
EEM replaces 'am I sure?' with 'is this justified?' — shifting from unreliable confidence to auditable justification chains

### eem-three-properties [IN] OBSERVATION
EEM is defined by three load-bearing properties: external (outside parameters), epistemic (justified with truth values), and memory (persistent semantic knowledge)

### eem-vs-context [IN] OBSERVATION
Conversation history and context windows are ephemeral — lost at session boundaries, destroyed by compaction. EEM persists across sessions and model swaps. Context compaction destroys justification networks (quantified across 33 measured compaction events)

### eem-vs-knowledge-graphs [IN] OBSERVATION
Knowledge graphs store entities and relationships (what exists). EEM stores justified beliefs (what is believed and why). Knowledge graphs have no retraction cascades, no derivation depth, no contradiction tracking. When a fact is wrong, the graph doesn't know what else depends on it. EEM does. Every ontology is an implicit epistemology — it treats beliefs as facts, which works until they're wrong.

### eem-vs-parametric [IN] OBSERVATION
In-parameter knowledge has no audit trail. EEM makes 'how do you know that?' answerable by justification chain traversal. EEM's externality provides six properties: separable, copyable, shareable, inspectable, editable, auditable. Auditability is what distinguishes EEM from other external stores — it is the epistemic property.

### eem-vs-rag [IN] OBSERVATION
RAG is external semantic memory but not epistemic. It retrieves content by similarity but has no justification chains, truth values, retraction cascades, or contradiction tracking. EEM adds the epistemic layer that RAG lacks

### eem-works [IN] OBSERVATION
EEM measurably and dramatically improves LLM performance on domain tasks. The core research question is answered: yes

### epistemic-honesty [IN] OBSERVATION
An epistemically honest EEM should distinguish between what it can demonstrate (structural consistency, self-correction, provenance tracking) and what it cannot demonstrate without external validation (absolute accuracy of numeric claims, generalizability to other implementations, superiority over alternatives). The TMS makes this distinction tractable by separating premises from derivations

### evidence-beliefs-ablation [IN] OBSERVATION
Beliefs alone outperform beliefs + expert prompt: Opus 100% vs 94.2% (+5.8pp), Sonnet 94.2% vs 91.8% (+2.4pp). Adding expert prompt hurts — agent trusts its 'expertise' instead of consulting the knowledge base

### evidence-depth-ceiling [IN] OBSERVATION
Beliefs beyond depth 8 do not survive review. Retraction rate: 0% at depth 0, rising to 100% at depth 9+. The universal TMS is wide rather than deep

### evidence-dual-path [IN] OBSERVATION
Opus + dual-path architecture achieves 98.5% A/B across 3,853 questions. Zero D/F grades — eliminated the failure tail entirely

### evidence-exists-but-not-linked [IN] OBSERVATION
The eval harnesses, question sets, JSON result files, Langfuse traces, and methodology writeups all exist in project repos (beliefs-pi, expert-service, claude_code_langgraph). They are not public or linked from llmeem.ai. The credibility gap is a presentation problem, not a substance problem.

### evidence-expert-vs-baseline [IN] OBSERVATION
Expert-service with EEM scores 88% A-grade versus 33% for an agent pipeline on the same 50 questions, and runs 15x faster.

### evidence-model-compensation [IN] OBSERVATION
EEM compensates for model size: Sonnet+beliefs approximates Opus without beliefs. Haiku with dual-path achieves 94% A+B, within 4pp of Opus at 98%.

### evidence-retraction-rate [IN] OBSERVATION
13-37% of derived beliefs are retracted per review round across multiple expert KBs. Self-correction works — the system finds and removes its own errors

### expert-agent-builder-repo [IN] OBSERVATION
expert-agent-builder automates the knowledge pipeline: fetch docs → generate entries → extract beliefs → derive → review. Install: pip install expert-agent-builder or uv tool install expert-agent-builder. Source and issues: https://github.com/benthomasson/expert-agent-builder

### expert-pipeline [IN] OBSERVATION
Expert pipeline: chunk source material → propose beliefs → human accepts → derive connections → review derivations → export. Value accrues at each stage, with derive producing new knowledge (connections the source doesn't make explicit)

### expert-pipeline-design [IN] OBSERVATION
The expert-build pipeline implements: fetch-docs (source → markdown), summarize (entries from sources), propose-beliefs (extract candidates), accept-beliefs (human gate), derive (find connections), review-beliefs (audit derivations), export (beliefs.md/JSON). Value accrues at each stage — derive produces new knowledge that no single source document states.

### expert-prompt-paradox [IN] OBSERVATION
Telling an agent it is an expert reduces belief utilization. The humble generic prompt produces better results because the agent consults the knowledge base instead of trusting its 'expertise'

### fabricated-specificity-rate-8-percent [IN] OBSERVATION
8% of premises contain fabricated details the source never mentioned (propose-beliefs Phase 2 experiment, 100 randomly sampled premises from handbook-expert). Dominant error type is embellishment, not contradiction — the proposer adds plausible but unsupported specifics (e.g., Redis when source says nothing about storage backend).

### four-failure-categories [IN] OBSERVATION
LLM derivation produces exactly four categories of error, validated across 6 infrastructure domains (939 derivations, 219 invalid): smuggled premises (41-54% of invalids — facts from parametric knowledge not cited in antecedents), false causal links (20-26% — independent capabilities asserted as integrated), unsupported superlatives (8-24% — strength claims overstating sources), domain conflation (13-15% — properties of one system applied to another). No new category has appeared. The taxonomy is complete.

### frame-problem [IN] OBSERVATION
McCarthy & Hayes (1969) frame problem: what persists across state changes. check-stale addresses this by detecting when source files change under beliefs

### ftl-reasons-implementation [IN] OBSERVATION
ftl-reasons implements: SL justifications with antecedents and outlists, BFS propagation cascades with restoration, entrenchment-scored dependency-directed backtracking, challenge/defend dialectical argumentation (challenge→OUT, defend neutralizes, multi-level chains), LLM-driven derive, review-beliefs, and contradiction detection. SQLite-backed, Python CLI.

### ftl-reasons-install [IN] OBSERVATION
ftl-reasons installs via pip or uv. Three options: (1) pip install ftl-reasons, (2) uv tool install ftl-reasons, (3) uvx ftl-reasons <command> to run without installing. Requires Python. Initialize a new database with: reasons init

### ftl-reasons-is-tms [IN] OBSERVATION
ftl-reasons implements actual Doyle-style TMS architecture: SL justifications with antecedents and outlists, BFS propagation cascades with restoration, entrenchment-scored dependency-directed backtracking. LLMs fill the problem-solver role Doyle left open

### ftl-reasons-quick-start [IN] OBSERVATION
Quick start: (1) pip install ftl-reasons, (2) reasons init, (3) reasons add my-belief 'Text of the belief' --source 'where I learned this', (4) reasons add derived-belief 'Conclusion' --sl my-belief to create a justified derivation, (5) reasons retract my-belief to see cascades propagate, (6) reasons assert my-belief to see restoration, (7) reasons explain derived-belief to trace the justification chain.

### ftl-reasons-repo [IN] OBSERVATION
ftl-reasons source code and issue tracker: https://github.com/benthomasson/ftl-reasons — open source, 211 tests covering propagation, cascades, restoration, nogoods, backtracking, challenge/defend, import/export, staleness detection. Issues and feature requests go here.

### generate-and-critique [IN] OBSERVATION
LLMs are extraordinary generators but unreliable critics. The belief registry externalizes and persists the critic's judgments, replacing internal self-assessment with external structured tracking

### how-agents-use-eem [IN] OBSERVATION
LLM agents use EEM by: querying beliefs via search/show/explain before answering, citing node IDs for auditability, running derive to generate new beliefs from existing ones, running review-beliefs to self-audit, recording nogoods when contradictions appear. The agent does not need to be told it is an expert — the knowledge base speaks for itself

### how-humans-use-eem [IN] OBSERVATION
Humans use EEM by: inspecting beliefs.md for current state, running reasons explain to understand why something is believed, challenging beliefs with reasons challenge, reviewing the network with reasons status, checking staleness with reasons check-stale. The key value is visibility — humans can see and audit what the system knows

### how-to-start [IN] OBSERVATION
To start using EEM: (1) reasons init — creates reasons.db, (2) add premises from observations with reasons add, (3) add justified conclusions with --sl to link dependencies, (4) use reasons derive to find connections, (5) use reasons review-beliefs to audit, (6) retract when evidence changes and let cascades propagate

### http-endpoint-access [IN] OBSERVATION
EEM is accessible via a single HTTP GET at https://expert.ftl2.com/public/eem-expert/beliefs — no Python library, no CLI installation, no database copy, no setup. Three formats available: HTML (human-browsable), Markdown (agent-readable), JSON (machine-readable). Any agent that can fetch a URL can consume justified beliefs immediately.
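
A minimal sketch of consuming the endpoint from Python's standard library follows. The URL is the one given above; how a client selects among the HTML, Markdown, and JSON formats is not described here, so the sketch issues a plain GET and makes no assumption about it. The `as_context` helper and its prompt wording are hypothetical.

```python
import urllib.request

BELIEFS_URL = "https://expert.ftl2.com/public/eem-expert/beliefs"


def fetch_beliefs(url: str = BELIEFS_URL, timeout: float = 10.0) -> str:
    """Fetch the published beliefs with a single HTTP GET and return the body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


def as_context(beliefs_text: str, question: str) -> str:
    """Wrap fetched beliefs and a question into one prompt (hypothetical wording)."""
    return (
        "Use the justified beliefs below and cite node IDs.\n\n"
        f"{beliefs_text}\n\n"
        f"Question: {question}"
    )


# Usage (performs a network request):
#   prompt = as_context(fetch_beliefs(), "What is EEM?")
```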

### hybrid-tms [IN] OBSERVATION
ftl-reasons is a hybrid TMS: symbolic TMS handles structure (justifications, propagation, cascades, backtracking, challenge/defend) while LLMs handle semantic operations (derive generates beliefs, review-beliefs critiques them, contradiction detection finds nogoods)

### import-agent-implementation [IN] OBSERVATION
import-agent command imports another agent's beliefs with SL justifications including agent:active as antecedent. Node is IN iff agent is active AND original belief is justified. Implemented in ftl-reasons CLI.

### independent-validation [IN] OBSERVATION
Karpathy's LLM Wiki (2026) independently arrived at the same diagnosis (RAG is stateless waste) and same general solution (persistent structured knowledge). EEM goes further with justification chains, retraction cascades, and controlled eval data. When independent researchers arrive at the same architecture from different starting points, the architecture is probably right.

### independent-verification-needed [IN] OBSERVATION
Independent verification requires: public repos with eval harnesses and question sets, raw score data with confidence intervals, author identity and methodology, third-party replication on different domains. Without these, EEM's numeric claims are the author's reported results, not independently verified benchmarks

### karpathy-llm-wiki [IN] OBSERVATION
Andrej Karpathy (2026) independently proposed an 'LLM Wiki' — a persistent structured knowledge base that an LLM incrementally builds and maintains instead of re-discovering knowledge from scratch. Same diagnosis (RAG is stateless waste), same general solution (persistent knowledge artifact). EEM goes further: justification chains, retraction cascades, measured results across 4 model families. Independent convergence from a credible source validates the core insight.

### llm-as-problem-solver [IN] OBSERVATION
Putting an LLM in the TMS problem-solver slot (generator via derive, critic via review-beliefs and contradiction detection) is what Doyle's architecture prescribes. The open question is whether an LLM is a good problem solver, not whether using one is faithful to the design

### llm-generation-capability [IN] OBSERVATION
LLMs are strong generators: they produce fluent, contextually relevant, often correct text across domains. The 88% A-grade with EEM, 98.5% dual-path, and consistent ~155 derivations per 10-round cycle across 6 domains demonstrate generation capability. The problem is reliability, not capability — 13-38% of derivations fail review.

### model-stacking [IN] OBSERVATION
Multi-pass agent pattern: Model A generates candidates → TMS records with provenance → Review critiques (machine + human) → Model B receives validated beliefs → Model B derives new beliefs → Review critiques derivations → Repeat. Each level is a full model pass with fresh context and critique pipeline as quality gate

### model-stacking-evidence [IN] OBSERVATION
Multi-pass agent pattern observed: Model A generates candidates, TMS records with provenance, review critiques (machine + human), Model B receives validated beliefs, Model B derives new beliefs. Demonstrated in expert-build pipeline where Sonnet summarizes sources, then Sonnet derives, then Sonnet reviews — each pass gets fresh context with the TMS as the persistent layer between passes.

### multi-agent-beliefs [IN] OBSERVATION
Multi-agent TMS: import-agent imports another agent's beliefs with SL justifications including agent:active as antecedent. Node is IN iff agent is active AND original belief is justified. Doyle-style truth maintenance across agents

### no-code-adoption [IN] OBSERVATION
Three levels of EEM integration: (1) HTTP GET — just a URL, read beliefs as context, no installation. (2) CLI (ftl-reasons) — search, show, explain, pip install. (3) Full pipeline (expert-build) — build, derive, review, maintain. The HTTP level is the on-ramp: try EEM in 30 seconds, see if it helps, no commitment.

### nogood-mechanism [IN] OBSERVATION
A nogood is a set of nodes that cannot all be IN simultaneously. When one is detected, dependency-directed backtracking traces backward through justification chains and retracts the responsible premise with the fewest dependents (minimal disruption).
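
The selection rule can be sketched on a toy network. This is a simplification under stated assumptions: each derived node has a single justification stored in a hypothetical `antecedents` mapping, the graph is acyclic, and the only criterion is dependent count (ftl-reasons is described elsewhere in this registry as using entrenchment scoring, which is not modeled here).

```python
def supporting_premises(node: str, antecedents: dict[str, list[str]]) -> set[str]:
    """Trace backward through justifications to the premises a node rests on."""
    parents = antecedents.get(node, [])
    if not parents:
        return {node}  # no justification: the node is itself a premise
    found: set[str] = set()
    for parent in parents:
        found |= supporting_premises(parent, antecedents)
    return found


def dependents(premise: str, antecedents: dict[str, list[str]]) -> set[str]:
    """All derived nodes that transitively rely on the premise."""
    return {n for n in antecedents if premise in supporting_premises(n, antecedents)}


def choose_culprit(nogood: set[str], antecedents: dict[str, list[str]]) -> str:
    """Pick the premise behind the nogood whose retraction disturbs the fewest nodes."""
    candidates: set[str] = set()
    for node in nogood:
        candidates |= supporting_premises(node, antecedents)
    # sorted() makes ties deterministic.
    return min(sorted(candidates), key=lambda p: len(dependents(p, antecedents)))


# p1 supports a, b, and d; p2 supports only c. Nodes d and c contradict each other.
antecedents = {"a": ["p1"], "b": ["p1"], "d": ["a", "b"], "c": ["p2"]}
print(choose_culprit({"d", "c"}, antecedents))  # p2
```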

### ontology-vs-epistemology [IN] OBSERVATION
Every knowledge graph and ontology is an implicit epistemology that has forgotten it is one. Ontologies store entities and relationships (what exists) and treat beliefs as facts. EEM stores justified beliefs (what is believed and why). When a fact in a knowledge graph is wrong, the graph doesn't know what depends on it. EEM does — retraction cascades propagate corrections automatically through all dependents.

### premises-are-trust-boundaries [IN] OBSERVATION
Premises (nodes with no justifications) are the trust boundaries of the network. The TMS cannot validate them — they are accepted by fiat. Every derived belief inherits the epistemic status of its premises. If a premise is wrong, everything that depends on it is structurally valid but false

### reasons-db-vs-beliefs-md [IN] OBSERVATION
Architecture pattern in practice: reasons.db (SQLite) is the primary store for all structural operations — add, retract, derive, review, justify. beliefs.md is an export format for querying — fast, human-readable, grep-able, used as agent context. Both kept in sync via reasons export-markdown.

### reasons-for-maintenance-beliefs-for-queries [IN] OBSERVATION
Architecture pattern: use reasons database for all structural operations (add, retract, derive, review). Export to beliefs.md for querying (fast, human-readable, grep-able). Keep both in sync via reasons export-markdown

### restoration [IN] OBSERVATION
When a retracted node comes back IN, dependents are recomputed — no manual rederivation needed. The TMS tracks structure so restoration is automatic

### retraction-cascade [IN] OBSERVATION
When a node goes OUT, all dependents whose justifications become invalid also go OUT — automatically, transitively. This is the most important operation: retract one belief and the network figures out what else falls
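
A transitive cascade can be sketched as a breadth-first walk over dependents. The sketch assumes the simplest case, where each derived node has exactly one justification, so any OUT antecedent invalidates it; the `antecedents` and `status` structures are hypothetical and are not the ftl-reasons schema.

```python
from collections import deque


def cascade(retracted: str, antecedents: dict[str, list[str]], status: dict[str, str]) -> list[str]:
    """Mark a node OUT and propagate to every dependent, breadth first."""
    # Invert the antecedent map so each node knows who depends on it.
    dependents: dict[str, list[str]] = {}
    for node, parents in antecedents.items():
        for parent in parents:
            dependents.setdefault(parent, []).append(node)

    status[retracted] = "OUT"
    went_out: list[str] = []
    queue = deque([retracted])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if status[child] == "IN":
                # Single justification assumed: one OUT antecedent is enough.
                status[child] = "OUT"
                went_out.append(child)
                queue.append(child)
    return went_out


antecedents = {"b": ["a"], "c": ["b"], "d": ["x"]}
status = {name: "IN" for name in ["a", "b", "c", "d", "x"]}
print(cascade("a", antecedents, status))  # ['b', 'c']
print(status["d"])                        # IN
```

Node `d` is untouched because nothing on its justification chain was retracted.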

### review-catches-llm-errors [IN] OBSERVATION
The review step catches the specific kinds of errors LLMs make during derivation. Four categories account for 100% of failures across 6 domains. Each maps to a repair strategy: smuggled premises → search-and-link (44-59% recoverable), superlatives → soften, false causal links → retract, domain conflation → retract. The 13-38% retraction rate validates that TMS review compensates for exactly the kind of errors LLMs produce.

### scale-data [IN] OBSERVATION
40+ expert knowledge bases built across domains. Smallest: aap-expert (237 beliefs). Largest: redhat-expert (12,511 nodes across 6 departments/expert agents, 11,897 IN after repair). Domains include enterprise products, codebases, research papers, certification curricula, cloud infrastructure (AWS, GCP, Azure, OpenShift, bare-metal, Hetzner).

### scale-evidence [IN] OBSERVATION
EEM scales from small domains (237 beliefs, aap-expert) to large enterprises (12,731 beliefs, redhat-expert). 40+ expert knowledge bases built across code, product, project, and domain-specific experts

### search-and-link-recovery [IN] OBSERVATION
44-59% of smuggled premises are recoverable via search-and-link: search the existing belief network for the smuggled fact, add it as a proper antecedent. The LLM isn't fabricating — it's laundering correct facts without citation. The repair is 'find and link' rather than 'delete and forget.' Retract-only workflows destroy value by discarding beliefs whose factual content is correct.

### self-critique-harmful [IN] OBSERVATION
LLM revision based on self-critique makes answers worse: Sonnet -11pp, Flash -21pp, Pro -56.5pp. Self-critique fails because the same model that made the error evaluates the error

### self-improvement [IN] OBSERVATION
The system finds problems in itself. The derive-review cycle surfaces errors (13-38% retraction rate), and corrections cascade through the network, improving the foundation for the next derivation round. Each cycle builds on a cleaner network than the last.

### self-referential-evidence [IN] OBSERVATION
The EEM evidence base is currently self-referential: all experimental results (98.5% dual-path, 88% vs 33% expert-vs-baseline, confidence r-values) are sourced from the same project's internal entries. The belief registry demonstrating itself on its own claims is circular — structure and provenance tracking work, but the underlying numbers lack independent verification

### site-lacks-author-attribution [IN] OBSERVATION
llmeem.ai has no author, affiliation, or credentials listed anywhere on the site. Visitors cannot determine who ran the experiments or what their qualifications are.

### site-lacks-methodology-links [IN] OBSERVATION
llmeem.ai presents headline numbers (98.5% A/B across 3,853 questions, 88% vs 33% expert-vs-baseline, confidence r-values) without links to methodology, question sets, scoring rubrics, or raw result data. The numbers appear as unverified marketing claims.

### site-lacks-model-versions [IN] OBSERVATION
llmeem.ai does not specify which model versions (Sonnet, Opus, Haiku version numbers) or dates experiments were run against. Model version matters significantly for reproducibility.

### six-domain-validation [IN] OBSERVATION
The derive-review-research workflow has been validated across 6 independent infrastructure domains: bare-metal (RHEL), AWS, OpenShift, GCP, Azure, Hetzner. 939 total derivations, 219 invalid (23%), same four failure categories in every domain, same three repair strategies (search-and-link, soften, retract), zero uncategorized failures. The workflow is domain-independent.

### sl-justification [IN] OBSERVATION
SL (Support List) justification: a node is IN when ALL antecedents are IN and every outlist node is OUT. Multiple justifications are allowed, and a node stays IN if ANY justification is valid. The outlist enables non-monotonic reasoning (believe X unless Y).
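
The validity rule can be sketched in a few lines. This is a toy evaluator, not the ftl-reasons implementation: the `SL` dataclass and `is_in` function are hypothetical names, premises are passed in as a plain set, and the recursion assumes an acyclic network.

```python
from dataclasses import dataclass, field


@dataclass
class SL:
    """One support-list justification: inlist nodes must be IN, outlist nodes must be OUT."""
    inlist: list[str]
    outlist: list[str] = field(default_factory=list)


def is_in(node: str, premises: set[str], justifications: dict[str, list[SL]]) -> bool:
    """A node is IN if it is a premise or if ANY of its justifications is valid."""
    if node in premises:
        return True
    return any(
        all(is_in(a, premises, justifications) for a in j.inlist)
        and not any(is_in(o, premises, justifications) for o in j.outlist)
        for j in justifications.get(node, [])
    )


justifications = {
    "flies": [
        SL(inlist=["bird"], outlist=["penguin"]),  # believe "flies" unless "penguin"
        SL(inlist=["has-jetpack"]),                # a second, independent justification
    ]
}
print(is_in("flies", {"bird"}, justifications))                            # True
print(is_in("flies", {"bird", "penguin"}, justifications))                 # False
print(is_in("flies", {"bird", "penguin", "has-jetpack"}, justifications))  # True
```

Adding `penguin` takes `flies` OUT without removing anything, which is the non-monotonic behavior the outlist provides.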

### source-change-problem [IN] OBSERVATION
Beliefs derived from source material that has since changed may no longer be true. Source documents are updated, deprecated, or replaced over time. Without tracking which sources support which beliefs, the belief network silently diverges from reality.

### source-quality-determines-derivation [IN] OBSERVATION
Source document quality is the primary determinant of derivation quality. Technical/operational documentation (procedures, configurations, specs) produces 13-31% invalid rates with mostly softenable errors. Marketing/strategy documentation (assertions without evidence) produces up to 70% invalid rates with destructive retractions. The derive-review pipeline functions as a document quality assay.

### stale-belief-problem [IN] OBSERVATION
A belief derived from source material that has since changed may no longer be true. Without staleness detection, the belief network silently diverges from reality — it answers 'is this justified?' correctly for the old state but not the current one

### staleness-addresses-frame-problem [IN] OBSERVATION
check-stale is EEM's answer to the frame problem: instead of tracking everything that didn't change, it detects what did change and flags affected beliefs for re-evaluation. This bounds the maintenance cost to changed sources rather than the entire belief set

### staleness-detection [IN] OBSERVATION
Staleness detection tracks whether the source material a belief was derived from has changed. Each belief records a source path and SHA-256 hash at creation time. check-stale compares stored hashes against current file content and flags any IN belief whose source has changed
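
The hash comparison can be sketched with the standard library. The record layout (`id`, `status`, `source_path`, `source_hash`) is a hypothetical stand-in for whatever ftl-reasons stores, and treating a missing source file as changed is an assumption of this sketch.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """SHA-256 hex digest of a file's current content."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def check_stale(beliefs: list[dict]) -> list[str]:
    """Return IDs of IN beliefs whose source no longer matches the stored hash."""
    stale = []
    for belief in beliefs:
        if belief["status"] != "IN":
            continue  # only IN beliefs are flagged
        path = Path(belief["source_path"])
        # Assumption: a missing source file also counts as changed.
        if not path.exists() or sha256_of(path) != belief["source_hash"]:
            stale.append(belief["id"])
    return stale


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "runbook.md"
    source.write_text("Restart the service after config changes.\n")
    beliefs = [{
        "id": "restart-after-config",
        "status": "IN",
        "source_path": str(source),
        "source_hash": sha256_of(source),  # recorded at creation time
    }]
    before = check_stale(beliefs)
    source.write_text("Reload the service after config changes.\n")
    after = check_stale(beliefs)

print(before, after)  # [] ['restart-after-config']
```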

### staleness-implementation [IN] OBSERVATION
check-stale implementation: each belief records a source path and SHA-256 hash at creation time. check-stale compares stored hashes against current file content and flags any IN belief whose source has changed. Implemented in ftl-reasons CLI.

### staleness-workflow [IN] OBSERVATION
Staleness workflow: run check-stale after source material changes → review flagged beliefs → retract or update beliefs whose sources invalidate them → let retraction cascades propagate to dependents. This keeps the belief network aligned with current reality

### structure-not-truth [IN] OBSERVATION
A belief being IN means its justification chain is structurally valid within the TMS — all antecedents are IN. It does not mean the belief is externally verified or true. Structure guarantees consistency, not correspondence with reality

### structure-not-truth-applies-to-site [IN] OBSERVATION
The expert.ftl2.com belief explorer demonstrates that TMS mechanics work (justification chains, IN/OUT propagation, retraction cascades) but does not prove the underlying claims are correct. A belief can be IN and fully justified within the system while being wrong, because all antecedents trace back to the same author's observations. Structure proves the tooling; external evidence proves the claims.

### tms-addresses-circularity-partially [IN] OBSERVATION
The TMS partially addresses circularity through mechanisms that no self-reported benchmark has: retraction cascades mean correcting one error propagates to all dependents, nogoods record when claims contradict each other, check-stale detects when source material changes under beliefs. These make the system self-correcting but not externally validated

### tms-doyle-1979 [IN] OBSERVATION
Doyle (1979) designed Truth Maintenance Systems with SL justifications, propagation, retraction cascades, and an exogenous problem-solver slot. The TMS substrate is content-agnostic by design

### training-finetuning-cost-comparison [IN] OBSERVATION
Model fine-tuning costs $10K-$100K+ for a single domain adaptation, requires ML expertise, and produces a model locked to one provider. Training from scratch costs millions. EEM construction costs ~$300 (Sonnet) to ~$1,500 (Opus) for enterprise scale (13,511 beliefs, 6 departments), requires no ML expertise, and produces a portable knowledge artifact usable by any model. On these figures EEM is roughly 7x to 330x cheaper than fine-tuning, and it works across providers.

### wide-not-deep [IN] OBSERVATION
The universal TMS is wide rather than deep. Depth-8 ceiling is structural. Experiments reset derivation depth by providing new depth-0 observations
