Vision-language models (VLMs) have shown promise in automating medical image interpretation, particularly for complex scans like CT. However, current VLMs typically hand clinicians a finished report with no view into how the model reached its conclusions, leaving them as passive recipients. This lack of transparency hinders validation, refinement, and ultimately trust in AI-driven medical diagnostics.
Unlocking Transparency with Agentic Reasoning
To bridge this gap, the researchers introduce RadAgent, a tool-using AI agent designed for stepwise and interpretable CT report generation. Unlike monolithic VLMs, RadAgent produces reports through a structured, iterative process in which each step and tool interaction is logged. This creates a fully inspectable reasoning trace, allowing clinicians to examine how each reported finding was derived and to build confidence in the AI's output. This approach marks a significant departure from the opaque nature of previous VLM-based solutions, offering a path towards more reliable AI in radiology.
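The idea of logging every tool interaction into an inspectable trace can be illustrated with a minimal Python sketch. It is not RadAgent's implementation: the `ReasoningTrace` class and the `segment_organ` and `classify_finding` tools are hypothetical stand-ins, chosen only to show how each call, its arguments, and its result might be recorded in order.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class TraceStep:
    """One logged tool interaction."""
    step: int
    tool: str
    arguments: dict[str, Any]
    result: Any


@dataclass
class ReasoningTrace:
    """Ordered log of every tool call made while building a report."""
    steps: list[TraceStep] = field(default_factory=list)

    def call(self, tool_name: str, tool: Callable[..., Any], **arguments: Any) -> Any:
        # Run the tool, then record what was called, with what, and what came back.
        result = tool(**arguments)
        self.steps.append(TraceStep(len(self.steps) + 1, tool_name, arguments, result))
        return result


# Hypothetical stand-in tools (assumptions, not RadAgent's real tool set).
def segment_organ(organ: str) -> dict[str, Any]:
    return {"organ": organ, "mask_id": f"{organ}-mask"}


def classify_finding(mask_id: str, finding: str) -> dict[str, Any]:
    return {"finding": finding, "present": True, "source": mask_id}


trace = ReasoningTrace()
seg = trace.call("segment_organ", segment_organ, organ="lung")
trace.call("classify_finding", classify_finding,
           mask_id=seg["mask_id"], finding="nodule")

# A reviewer can replay the trace to see how the finding was derived.
for s in trace.steps:
    print(s.step, s.tool, s.arguments, "->", s.result)
```

Because the second step's arguments reference the output of the first, a reader of the trace can follow a reported finding back to the intermediate result it was based on.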
Quantifiable Gains in Accuracy and Robustness
The experimental results show RadAgent outperforming its 3D VLM counterpart, CT-Chat. In clinical accuracy, the system achieved a 6.0-point improvement in macro-F1 (36.4% relative) and a 5.4-point improvement in micro-F1 (19.6% relative). Crucially, RadAgent also demonstrated a 24.7-point increase (41.9% relative) in robustness under adversarial conditions. Furthermore, RadAgent scores 37.0% on faithfulness, a capability entirely absent in CT-Chat, underscoring the value of its interpretable, agentic framework.
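Each result above is reported both as an absolute gain in points and as a relative gain in percent, and the two are linked by relative = absolute / baseline. The short Python sketch below inverts that relation to estimate the baseline scores the figures imply. These baselines are derived here from the rounded numbers in the text, not reported values, so they should be read as approximations; the function name `implied_baseline` is illustrative.

```python
def implied_baseline(abs_gain_points: float, rel_gain_percent: float) -> float:
    """Baseline score implied by an absolute gain and its relative gain.

    Uses relative = absolute / baseline, so baseline = absolute / relative.
    """
    return abs_gain_points / (rel_gain_percent / 100.0)


# (absolute gain in points, relative gain in percent), as quoted in the text.
reported = {
    "macro-F1": (6.0, 36.4),
    "micro-F1": (5.4, 19.6),
    "robustness": (24.7, 41.9),
}

for name, (abs_gain, rel_gain) in reported.items():
    base = implied_baseline(abs_gain, rel_gain)
    # Both numbers are estimates: the inputs were already rounded.
    print(f"{name}: implied baseline ~{base:.1f}, implied new score ~{base + abs_gain:.1f}")
```

The same absolute gain means more against a low baseline, which is why the 6.0-point macro-F1 gain corresponds to a larger relative improvement than the 5.4-point micro-F1 gain.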