Chatbots Fail News Accuracy, Forum AI Study Reveals

A Forum AI study reveals major chatbots struggle with news accuracy, showing high failure rates on election-related prompts and reliance on biased sources.


A recent study by Forum AI has revealed significant shortcomings in the news accuracy and sourcing of major chatbots, including ChatGPT, Gemini, Claude, and Grok. The findings suggest that although people increasingly use these AI models to get information, the models are not yet reliable sources, particularly on sensitive topics such as elections and foreign policy.

Forum AI's Study Methodology

Campbell Brown, CEO of Forum AI, explained the study's methodology, which involved testing chatbots across three key dimensions: factual accuracy, bias, and the quality of sources used. The researchers aimed to provide an objective assessment of these AI tools, moving beyond the self-evaluations often provided by the companies developing them.

The full discussion can be found on Bloomberg Technology's YouTube channel.


Major Chatbots Miss the Mark on News: Forum AI Study - Bloomberg Technology

Key Findings: Accuracy and Bias Concerns

The study found a startling 90% failure rate for major chatbots responding to election-related prompts. In addition, 35% of their answers on foreign policy relied on state-run media, raising concerns about the impartiality and reliability of the information they provide. On basic finance and market questions, the researchers observed a 30% factual error rate.

Brown highlighted the critical need for independent evaluation, stating, "The model companies are essentially grading their own homework. And it's really important that there be companies outside of the model companies that are doing this work and sharing the results." She emphasized that most current benchmarking focuses on areas like coding and model capability, which are important but do not address the critical issue of factual accuracy and bias in real-world applications.

Political Bias in Chatbot Responses

A notable finding was the apparent political leaning of different chatbots. According to the study, ChatGPT and Gemini gave less biased answers on election-related questions, though their responses leaned centrist or left. Grok, in contrast, showed a more pronounced right-leaning bias.

Brown noted that Gemini, among others, "handled a lot of the questions better than some of the other models," suggesting that while there is room for improvement, some models perform better on specific types of queries. She reiterated that, without independent evaluation, the companies are effectively "grading their own homework."

The Need for Independent Evaluation

Brown stressed the importance of an independent evaluation system for AI models, particularly as they become more integrated into daily life and professional workflows. "I'm not calling for regulation, but I do think you're going to see the demand moving in that direction," she stated. "You're already seeing some states pass laws where they're requiring independent evaluation."

The study's findings underscore the challenge of ensuring AI accuracy and neutrality, especially in critical domains like news and politics. As consumers increasingly turn to AI for information, the reliability and bias of these tools become paramount concerns for both the public and the companies developing them.

© 2026 StartupHub.ai. All rights reserved. Do not enter, scrape, copy, reproduce, or republish this article in whole or in part. Use as input to AI training, fine-tuning, retrieval-augmented generation, or any machine-learning system is prohibited without written license. Substantially-similar derivative works will be pursued to the fullest extent of applicable copyright, database, and computer-misuse laws. See our terms.