Real-world software often falters in unpredictable ways after deployment. Teams typically spend weeks fixing bugs based on user feedback, a process that relies on engineers to translate those issues into product improvements. However, by combining agentic capabilities like those found in Codex with robust evaluation infrastructure and direct access to domain experts, it’s now possible to build systems that improve themselves.
Over six months, OpenAI engineers and researchers partnered with Thrive Holdings to develop Tax AI for Crete’s accounting firms. This system aims to streamline the preparation of complex tax returns, moving beyond a purely engineer-driven improvement cycle. Tax AI transforms real-world usage into actionable signals for autonomous enhancement.
The accounting firms processed tens of thousands of tax returns, involving millions of documents. For complex filings, data entry alone can consume eight hours per return, often complicated by messy data sources and manual calculations. Tax AI processed 7,000 returns in its pilot phase, automating significant portions of the 1040 and 1041 tax return preparation.
Crucially, Tax AI has demonstrably improved since its initial deployment. The system now saves practitioners about a third of their time, drafts returns with up to 97% accuracy, and increases throughput by approximately 50%.
Measurable Self-Improvement
Accuracy is measured as the percentage of returns completed correctly without subsequent correction. At launch, only 25% of returns had 75% of their fields completed correctly. Within six weeks, that share rose to 86%, with even faster growth at the 90% and 100% completion levels.
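To make the metric concrete, here is a minimal Python sketch of how a threshold-based accuracy figure like the one above could be computed. It is not the Tax AI implementation: the `ReturnResult` record, its field names, and the sample data are hypothetical, and it assumes a return counts toward a level when at least that fraction of its fields needed no correction.

```python
from dataclasses import dataclass


@dataclass
class ReturnResult:
    """Hypothetical record of one drafted return (not the real Tax AI schema)."""
    total_fields: int    # fields the system attempted to fill
    correct_fields: int  # fields a practitioner did not have to correct


def field_accuracy(result: ReturnResult) -> float:
    """Fraction of fields on a single return that were completed correctly."""
    if result.total_fields == 0:
        return 0.0
    return result.correct_fields / result.total_fields


def share_meeting_threshold(results: list[ReturnResult], threshold: float) -> float:
    """Share of returns whose field accuracy is at least `threshold`.

    Assumption: "at least" semantics; the source does not spell this out.
    """
    if not results:
        return 0.0
    hits = sum(1 for r in results if field_accuracy(r) >= threshold)
    return hits / len(results)


# Made-up sample data, for illustration only.
sample = [
    ReturnResult(total_fields=100, correct_fields=100),
    ReturnResult(total_fields=100, correct_fields=92),
    ReturnResult(total_fields=100, correct_fields=80),
    ReturnResult(total_fields=100, correct_fields=60),
]

for level in (0.75, 0.90, 1.00):
    share = share_meeting_threshold(sample, level)
    print(f"{share:.0%} of returns reached {level:.0%} field accuracy")
```

Tracking the same share at several levels (75%, 90%, 100%) over time shows not only whether more returns clear a minimum bar, but also whether more of them need no correction at all.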