Ambiguous, inconsistent, and underspecified natural-language software requirements pose a critical risk, especially in safety-critical domains. These defects can propagate into formal models and implemented code, leading to unsafe behavior. The VERIMED system, a neurosymbolic pipeline detailed in a recent arXiv publication, demonstrates how large language models (LLMs), augmented with an SMT solver, can effectively audit these requirements.
Ambiguity as a Formalizable Signal
VERIMED tackles requirement ambiguity by translating natural language into formal logic. The key innovation lies in exploiting stochastic variation: multiple independent formalizations of the same requirement are generated, and when these formalizations are SMT-inequivalent, that disagreement signals ambiguity. Bidirectional SMT equivalence checking then turns the disagreement into a concrete, solver-checkable test, precisely identifying requirements that admit multiple plausible interpretations. In effect, a qualitative problem becomes a quantifiable one, enabling more robust LLM requirement auditing.
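The disagreement-as-signal idea can be sketched in a few lines. This is a hypothetical toy, not VERIMED's implementation: each "formalization" is a Boolean predicate over a small finite domain, so equivalence can be decided by exhaustive enumeration standing in for an SMT solver; the function names and the example requirement are invented for illustration.

```python
from itertools import product

def equivalent(f, g, domain):
    """Decide f == g by checking every point of a finite domain
    (a toy substitute for bidirectional SMT equivalence checking)."""
    return all(f(*x) == g(*x) for x in domain)

def ambiguity_signal(formalizations, domain):
    """Pairwise-compare independently generated formalizations.

    Any inequivalent pair is evidence that the natural-language
    requirement admits multiple interpretations."""
    disagreements = []
    for i in range(len(formalizations)):
        for j in range(i + 1, len(formalizations)):
            if not equivalent(formalizations[i], formalizations[j], domain):
                disagreements.append((i, j))
    return disagreements

# Requirement: "The alarm sounds when pressure is high and the pump is on."
# Two plausible readings: strict conjunction vs. either-condition trigger.
reading_a = lambda high, pump_on: high and pump_on
reading_b = lambda high, pump_on: high or pump_on

domain = list(product([False, True], repeat=2))
print(ambiguity_signal([reading_a, reading_b], domain))  # -> [(0, 1)]
```

A real pipeline would emit SMT-LIB formulas and ask a solver whether each direction of the implication holds; the enumeration here plays that role only because the domain is tiny.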
Granular Feedback Drives Verified Accuracy
The effectiveness of symbolic feedback is directly tied to its granularity. In a counterexample-guided repair process on a hemodialysis question-answering benchmark, VERIMED's approach raised verified accuracy from 55.4% to 98.5%. The concrete SMT counterexamples the solver derives from the LLM's formalizations enable targeted, highly effective correction of software specifications, demonstrating the power of LLM requirement auditing when coupled with precise, actionable feedback.
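The repair loop can be illustrated with another hedged sketch under the same finite-domain assumption. In the real system the next candidate would come from re-prompting the LLM with the solver's counterexample; here a fixed list of candidate predicates stands in for that step, and all names are hypothetical.

```python
from itertools import product

def find_counterexample(candidate, reference, domain):
    """Return a concrete input where the two predicates disagree,
    or None if they agree everywhere (toy stand-in for an SMT query)."""
    for x in domain:
        if candidate(*x) != reference(*x):
            return x
    return None

def repair_loop(candidates, reference, domain):
    """Try candidate formalizations in order, guided by counterexamples.

    Returns (round_number, None) once a candidate verifies, or
    (None, last_counterexample) if every candidate fails."""
    cex = None
    for round_no, candidate in enumerate(candidates):
        cex = find_counterexample(candidate, reference, domain)
        if cex is None:
            return round_no, None  # verified equivalent to the reference
        print(f"round {round_no}: counterexample {cex}")
    return None, cex

# Intended meaning vs. two candidate formalizations; the second models
# a repair produced after seeing the first round's counterexample.
reference = lambda high, pump_on: high and pump_on
attempts = [lambda high, pump_on: high or pump_on,   # wrong reading
            lambda high, pump_on: high and pump_on]  # repaired reading

domain = list(product([False, True], repeat=2))
print(repair_loop(attempts, reference, domain))  # -> (1, None)
```

The granularity point shows up directly: the counterexample is a specific input assignment, so the repair prompt can say exactly which scenario the formalization gets wrong rather than merely reporting "inequivalent".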