Box Unveils GPT-5.5 with Enhanced AI Capabilities

Box presents its evaluation of GPT-5.5, reporting significant improvements in accuracy and multi-step reasoning for enterprise applications.

Yashodha Bhavnani, Head of AI Products at Box, speaking about GPT-5.5.
Image credit: Box · OpenAI YouTube

Box has presented its evaluation of GPT-5.5, OpenAI's new model, in a video published on OpenAI's YouTube channel. Yashodha Bhavnani, Head of AI Products at Box, walked through the findings, highlighting the model's improved accuracy and reasoning, particularly on complex tasks relevant to enterprise clients. The results position Box to integrate the model further into its content cloud platform, with the aim of streamlining knowledge work and raising productivity for its users.

GPT-5.5: A Leap in AI Performance

The introduction of GPT-5.5 marks a significant milestone for Box's AI initiatives. Bhavnani described her astonishment on first reviewing the evaluation results: "When I saw the eval results come back, I said, is this true? Because it was such a big leap from what we've seen in the past." The presentation compared GPT-5.4 and GPT-5.5 on Box's full dataset and on several industry subsets, with GPT-5.5 ahead on most of them.


The full discussion can be found on OpenAI's YouTube channel.

Introducing GPT-5.5 with Box - OpenAI Youtube

Enhanced Accuracy and Reasoning

Box's internal evaluation showed GPT-5.5 outperforming GPT-5.4 on most measures. On the full dataset, GPT-5.5 achieved an accuracy of 77%, compared to 67% for GPT-5.4, a gain of 10 percentage points. Results on the industry-specific subsets varied. In Healthcare, GPT-5.5 scored 78%, up from 69%. The Public Sector subset came in at 72% with GPT-5.5 versus 67% for GPT-5.4, and Media & Entertainment at 70% compared to 67%. In Financial Services, GPT-5.5 is reported at 83% against 84% for GPT-5.4; as stated, these figures show no gain and may reflect a transcription error. Bhavnani emphasized that GPT-5.5 excels in multi-step reasoning, a critical capability for complex enterprise tasks.
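As a quick check on the figures above, the following minimal sketch computes the gap between the two models in percentage points for each subset. The numbers are copied as reported in this article; the dictionary layout and the `point_deltas` helper are illustrative assumptions, not part of Box's evaluation tooling. Note that the Financial Services pair, taken at face value, yields a negative gap.

```python
# Accuracy figures in percent, as reported in the article: (GPT-5.4, GPT-5.5).
# The data structure and helper name are hypothetical, for illustration only.
reported = {
    "Full dataset": (67, 77),
    "Financial Services": (84, 83),
    "Healthcare": (69, 78),
    "Public Sector": (67, 72),
    "Media & Entertainment": (67, 70),
}


def point_deltas(scores):
    """Return the GPT-5.5 minus GPT-5.4 gap, in percentage points, per subset."""
    return {name: new - old for name, (old, new) in scores.items()}


deltas = point_deltas(reported)
for name, delta in deltas.items():
    print(f"{name}: {delta:+d} points")
```

Expressing the gaps in percentage points rather than relative percent keeps the comparison consistent with how the article quotes the results.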

Box Agent and Use Cases

The video also provided a glimpse into the practical application of GPT-5.5 through the Box Agent. A demonstration showed the agent analyzing the connection between 'Project Heritage' mentioned in an engineering roadmap and the performance of the enterprise customer segment in a churn summary. The agent was able to extract detailed information, including NovisPay's move to a new cloud-native stack, the status of its core settlement engine, and the allocation of engineering capacity. It also identified clear correlations between Project Heritage, stability issues in the legacy settlement engine, and enterprise customer churn. Bhavnani summarized the impact, stating, "GPT-5.5 takes the edge off of knowledge work and makes knowledge workers more efficient." This suggests that the model will enable users to produce higher quality and more accurate work by automating complex analytical tasks.

The agent's analysis concluded that Project Heritage is a plausible driver of the sharp deterioration in the enterprise segment's churn performance. According to the roadmap, NovisPay has moved 70% of traffic to its new cloud-native stack, but the core settlement engine remains on a legacy mainframe instance called Project Heritage. The roadmap also allocates 45% of Q2 2025 engineering capacity to 'Heritage' hotfixing and patching, far more than any other listed initiative, which points to significant ongoing investment in maintaining legacy systems. The enterprise segment shows the clearest churn stress: enterprise churn rose from 1% in Q1 to 15% in Q4, a 14 percentage-point increase that puts the Q4 rate at 15 times the Q1 rate. This is material because Enterprise has only 12 active users but the highest ARPU at $5,000 per month, making each lost account disproportionately valuable.
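The churn arithmetic in this paragraph can be reproduced with a short sketch. The inputs are the figures quoted from the demo; the variable names are hypothetical, and treating the 12 "active users" as 12 paying accounts, each billed at the stated ARPU, is an assumption made here for illustration.

```python
# Figures quoted in the demo analysis (churn rates in percent).
q1_churn_pct = 1
q4_churn_pct = 15

# Assumption: the 12 "active users" are 12 enterprise accounts at the stated ARPU.
enterprise_accounts = 12
arpu_usd_per_month = 5_000

# Absolute change, in percentage points.
point_increase = q4_churn_pct - q1_churn_pct

# Q4 rate as a multiple of the Q1 rate.
rate_multiple = q4_churn_pct / q1_churn_pct

# Relative increase over Q1, in percent (a 15x multiple is a 1400% increase).
relative_increase_pct = (rate_multiple - 1) * 100

# Segment revenue per month, and the share one account represents.
monthly_revenue_usd = enterprise_accounts * arpu_usd_per_month
share_per_account_pct = 100 / enterprise_accounts

print(f"Change: {point_increase} percentage points")
print(f"Q4 rate is {rate_multiple:.0f}x the Q1 rate (+{relative_increase_pct:.0f}%)")
print(f"Enterprise revenue: ${monthly_revenue_usd:,} per month")
print(f"Each account is {share_per_account_pct:.1f}% of segment revenue")
```

Under these assumptions, a single lost account removes roughly one twelfth of the segment's monthly revenue, which is why the analysis treats the churn spike as material despite the small customer count.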

The strongest connecting evidence comes from a market intelligence report, which states that NovisPay's largest enterprise client, Global Retail Corp, has reportedly issued an RFP for a new provider, citing stability issues in the legacy settlement engine. Since the engineering roadmap identifies the legacy settlement engine as Project Heritage, this directly links Project Heritage's technical instability to Enterprise customer risk. In short, Project Heritage is not just an internal technical debt issue; it appears to be affecting enterprise retention. The enterprise segment's Q4 churn spike aligns with external market intelligence that a major enterprise customer is seeking alternatives because of stability issues in the same settlement engine that the roadmap says still has not been migrated off the mainframe.

These improvements matter because the model can take on much of the heavy lifting that analyses of this kind currently require. Bhavnani expressed excitement about bringing these capabilities to Box customers, stating, "I'm incredibly excited by what GPT-5.5 will bring to the Box Agent as well as our customers." The enhanced model is expected to improve the efficiency and accuracy of knowledge workers, allowing them to focus on higher-value tasks.

© 2026 StartupHub.ai. All rights reserved.