Critics & Arbitration
The Spec Closure System validates LLM-generated specifications before they cross the Compiler Wall. Three agents, 12 rules, max 3 iterations.
When are critics needed? Only during LLM-assisted spec authoring. If you write your .dmx specification directly, the critics system is bypassed entirely — the spec goes straight to the Lark parser.
Three Agents
Converts natural language requirements into .dmx specifications. Implements targeted repairs identified by the Critic agent.
- Generates initial .dmx draft from natural language
- Implements repairs using specified JSONPaths only
- Temperature locked at zero for consistency
- Receives full DMX language reference as system context
Evaluates specifications against the Spec Closure Contract (SCC) — 12 validation rules. Produces closure scores and identifies blocking issues.
- Evaluates against 12 SCC validation rules
- Produces closure score (0-100%)
- Identifies BLOCK and WARN issues with JSONPaths
- Token compression: 180K → 15K tokens input optimization
Validates critic evaluations through deterministic checks before LLM inference. Reduces token consumption ~70%.
- 5 objective checks run deterministically (<10ms)
- Auto-disagree triggers when checks contradict critic
- LLM inference only when deterministic checks pass
- No authority to override — advisory role only
Closure Loop
The spec closure loop runs a maximum of 3 iterations. Each iteration must improve the closure score (monotonic guarantee). If convergence is not reached, the system escalates to human review.
Generate
LLM-A generates initial .dmx draft from requirements
Critique
LLM-B evaluates against 12 SCC rules, produces closure score
Converge?
Score >= 95% and no blocking issues = CONVERGED. Otherwise continue.
Repair
LLM-A applies path-locked repairs (only specified JSONPaths modified)
Monotonic Check
Verify closure score improved. If not, escalate to human.
Triangulate
LLM-C validates critic output (optional). Max 3 iterations total.
Max 3 iterations. Score must be monotonically non-decreasing. Regression = human escalation.
Design Principles
Fail-Closed
Uncertain findings escalate to humans rather than auto-resolving. The system never guesses.
Path Locking
Repairs only modify the specific JSONPaths identified by the critic. No side-effect edits.
Monotonic Closure
Each iteration must improve or maintain the closure score. Regression triggers escalation.
No Authority for LLM-C
The triangulation verifier is advisory only. It cannot override the critic or approve specs.
Evidence-First
Every issue citation includes the specific rule violated, the JSONPath, and the evidence.
Deterministic Bypass
LLM-C's 5 objective checks run in under 10ms. LLM inference is only invoked when needed.