Ambiguous, inconsistent, and underspecified natural-language software requirements pose a critical risk, especially in safety-critical domains. These defects can propagate into formal models and implemented code, leading to unsafe behavior. The VERIMED system, a neurosymbolic pipeline detailed in a recent arXiv publication, demonstrates how large language models (LLMs), augmented with an SMT solver, can effectively audit these requirements.
Ambiguity as a Formalizable Signal
VERIMED tackles requirement ambiguity by translating natural language into formal logic. The key innovation lies in exploiting stochastic variation: multiple independent formalizations of the same requirement are generated, and when these formalizations are SMT-inequivalent, that disagreement signals ambiguity. Bidirectional SMT equivalence checking then turns the disagreement into a concrete, solver-checkable test, precisely identifying requirements that admit multiple plausible interpretations. In effect, a qualitative problem becomes a quantifiable one, enabling more robust LLM requirement auditing.
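The disagreement-as-signal idea can be sketched in a few lines. This is a hypothetical toy, not VERIMED's implementation: each "formalization" is a Boolean predicate over a small finite domain, so equivalence can be decided by exhaustive enumeration standing in for an SMT solver; the function names and the example requirement are invented for illustration.

```python
from itertools import product

def equivalent(f, g, domain):
    """Decide f == g by checking every point of a finite domain
    (a toy substitute for bidirectional SMT equivalence checking)."""
    return all(f(*x) == g(*x) for x in domain)

def ambiguity_signal(formalizations, domain):
    """Pairwise-compare independently generated formalizations.

    Any inequivalent pair is evidence that the natural-language
    requirement admits multiple interpretations."""
    disagreements = []
    for i in range(len(formalizations)):
        for j in range(i + 1, len(formalizations)):
            if not equivalent(formalizations[i], formalizations[j], domain):
                disagreements.append((i, j))
    return disagreements

# Requirement: "The alarm sounds when pressure is high and the pump is on."
# Two plausible readings: strict conjunction vs. either-condition trigger.
reading_a = lambda high, pump_on: high and pump_on
reading_b = lambda high, pump_on: high or pump_on

domain = list(product([False, True], repeat=2))
print(ambiguity_signal([reading_a, reading_b], domain))  # -> [(0, 1)]
```

A real pipeline would emit SMT-LIB formulas and ask a solver whether each direction of the implication holds; the enumeration here plays that role only because the domain is tiny.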
Granular Feedback Drives Verified Accuracy
The effectiveness of symbolic feedback is directly tied to its granularity. In a counterexample-guided repair process on a hemodialysis question-answering benchmark, VERIMED's approach raised verified accuracy from 55.4% to 98.5%. The concrete SMT counterexamples the solver derives from the LLM's formalizations enable targeted, highly effective correction of software specifications, demonstrating the power of LLM requirement auditing when coupled with precise, actionable feedback.
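The repair loop can be illustrated with another hedged sketch under the same finite-domain assumption. In the real system the next candidate would come from re-prompting the LLM with the solver's counterexample; here a fixed list of candidate predicates stands in for that step, and all names are hypothetical.

```python
from itertools import product

def find_counterexample(candidate, reference, domain):
    """Return a concrete input where the two predicates disagree,
    or None if they agree everywhere (toy stand-in for an SMT query)."""
    for x in domain:
        if candidate(*x) != reference(*x):
            return x
    return None

def repair_loop(candidates, reference, domain):
    """Try candidate formalizations in order, guided by counterexamples.

    Returns (round_number, None) once a candidate verifies, or
    (None, last_counterexample) if every candidate fails."""
    cex = None
    for round_no, candidate in enumerate(candidates):
        cex = find_counterexample(candidate, reference, domain)
        if cex is None:
            return round_no, None  # verified equivalent to the reference
        print(f"round {round_no}: counterexample {cex}")
    return None, cex

# Intended meaning vs. two candidate formalizations; the second models
# a repair produced after seeing the first round's counterexample.
reference = lambda high, pump_on: high and pump_on
attempts = [lambda high, pump_on: high or pump_on,   # wrong reading
            lambda high, pump_on: high and pump_on]  # repaired reading

domain = list(product([False, True], repeat=2))
print(repair_loop(attempts, reference, domain))  # -> (1, None)
```

The granularity point shows up directly: the counterexample is a specific input assignment, so the repair prompt can say exactly which scenario the formalization gets wrong rather than merely reporting "inequivalent".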