Existing safeguards for tool-using language agents—schema validators, policy filters, provenance checks—fail to guarantee that a chosen action will have a predictable, identifiable causal effect. In confounded environments, an action appearing optimal in observational data can paradoxically decrease utility when executed. This fundamental gap hinders the reliability of AI agents performing state-changing tasks.
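As a minimal illustration of how this reversal arises (the scenario, variable names, and numbers below are hypothetical, not drawn from the paper), the following Python sketch simulates logs produced by a biased historical policy: the action correlates with a hidden confounder and appears beneficial observationally, even though its true interventional effect is negative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical setup: an unobserved confounder Z (e.g., customer intent)
# drives both the logged action choice and the outcome.
z = rng.binomial(1, 0.5, n)                        # hidden confounder
a = rng.binomial(1, np.where(z == 1, 0.8, 0.2))    # historical policy favors A=1 when Z=1
r = z * 1.0 - 0.2 * a + rng.normal(0, 0.1, n)      # true causal effect of A is -0.2

# Observational comparison: A looks beneficial only because it correlates with Z.
obs_gap = r[a == 1].mean() - r[a == 0].mean()

# Interventional comparison: randomizing A (simulating do(A)) reveals the harm.
a_rand = rng.binomial(1, 0.5, n)
r_rand = z * 1.0 - 0.2 * a_rand + rng.normal(0, 0.1, n)
int_gap = r_rand[a_rand == 1].mean() - r_rand[a_rand == 0].mean()

print(f"observational estimate of effect: {obs_gap:+.2f}")  # ~ +0.40 (looks optimal)
print(f"interventional (true) effect:     {int_gap:+.2f}")  # ~ -0.20 (decreases utility)
```

The gap between the two printed numbers is exactly the failure mode described above: a validator that only inspects the action itself cannot detect it.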
Beyond Action Validity: The Primacy of Causal Effect Identifiability
The core insight of this arXiv paper is that current tool-use verification asks whether an action can be performed, not whether it should be performed given its causal consequences. Even a sophisticated LLM's ability to generate a syntactically correct tool call does not guarantee that executing it constitutes a causally sound intervention. The distinction matters most in workflows where confounding variables make observational evidence about an action's value misleading.
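To make the distinction concrete, here is a hedged sketch of a two-stage gate; the tool name apply_discount, the variable customer_intent, and the hand-declared causal graph are illustrative assumptions, and the identifiability check is a simplified proxy rather than the paper's mechanism (a full treatment would use the backdoor criterion or do-calculus).

```python
def schema_valid(call: dict) -> bool:
    """Structural check only: confirms the call *can* be executed."""
    args = call.get("args", {})
    return call.get("tool") == "apply_discount" and isinstance(args.get("percent"), int)

def ancestors(graph: dict, node: str) -> set:
    """All ancestors of `node` in a DAG given as {child: {parents}}."""
    seen, stack = set(), list(graph.get(node, set()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(graph.get(parent, set()))
    return seen

def effect_identifiable(graph: dict, observed: set, action: str, outcome: str) -> bool:
    """Conservative proxy: every common ancestor of action and outcome
    (a potential confounder) must be observed so it can be adjusted for."""
    confounders = ancestors(graph, action) & ancestors(graph, outcome)
    return confounders <= observed

call = {"tool": "apply_discount", "args": {"percent": 10}}

# Declared causal graph: customer_intent -> apply_discount, customer_intent -> revenue.
graph = {"apply_discount": {"customer_intent"},
         "revenue": {"customer_intent", "apply_discount"}}
observed = {"order_history"}  # the confounder customer_intent is never logged

print(schema_valid(call))                                                 # True: the call can run
print(effect_identifiable(graph, observed, "apply_discount", "revenue"))  # False: effect not identifiable
```

The first check passes while the second fails, which is precisely the gap between an action being valid and its effect being identifiable.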