Copilot CLI adds a 'Rubber Duck' reviewer

GitHub Copilot CLI's new 'Rubber Duck' feature uses a second AI model to review code plans, aiming to catch errors and improve performance on complex tasks.

GitHub Blog

GitHub is enhancing its AI coding assistant with a built-in second opinion. The latest experimental feature for GitHub Copilot CLI, dubbed 'Rubber Duck', leverages a different AI model family to scrutinize the primary agent's plans and outputs.

This approach aims to mitigate the inherent biases of a single model reviewing its own work. By introducing an independent reviewer, GitHub seeks to catch critical errors that might otherwise compound through the development process.

Catching Confident Mistakes

Traditional AI coding agents follow a linear process: assess, plan, implement, test, and iterate. Decisions made early on, particularly during the planning phase, can lock in errors or inefficiencies that every later step builds on. A model reviewing its own work is still constrained by its own training data and blind spots.

Rubber Duck acts as a focused review agent, powered by a model from a different AI family than the one orchestrating the task. For instance, when a Claude model orchestrates, Rubber Duck might use GPT-5.4 for review. This cross-family perspective is designed to surface overlooked details, questionable assumptions, and potential edge cases.
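The cross-family idea can be sketched in a few lines of Python. This is an illustrative sketch only, not Copilot CLI's implementation: the names `REVIEWER_FOR_FAMILY`, `model_family`, and `pick_reviewer` are hypothetical, as is the convention that a model's family is the part of its name before the first hyphen. The only pairing included is the one the article mentions (a Claude orchestrator reviewed by GPT-5.4).

```python
from typing import Optional

# Hypothetical lookup table: the article only mentions a Claude
# orchestrator paired with GPT-5.4 as reviewer, so that is the sole entry.
REVIEWER_FOR_FAMILY = {"claude": "gpt-5.4"}


def model_family(model_name: str) -> str:
    """Assumed convention: the family is the text before the first hyphen."""
    return model_name.strip().lower().split("-", 1)[0]


def pick_reviewer(orchestrator: str) -> Optional[str]:
    """Return a reviewer model from a different family, or None if no pairing is known."""
    family = model_family(orchestrator)
    reviewer = REVIEWER_FOR_FAMILY.get(family)
    if reviewer is None:
        return None
    # Guard against a misconfigured table that pairs a family with itself:
    # a same-family reviewer would share the orchestrator's blind spots.
    if model_family(reviewer) == family:
        return None
    return reviewer


if __name__ == "__main__":
    print(pick_reviewer("claude-sonnet"))
    print(pick_reviewer("gpt-5.4"))
```

The same-family guard is the point of the sketch: the review is only useful if the reviewer does not share the orchestrator's training lineage.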

Performance Gains on Complex Tasks

Evaluations on the SWE-Bench Pro benchmark, which comprises difficult real-world coding problems, show significant improvements. Claude Sonnet paired with Rubber Duck (GPT-5.4) closed 74.7% of the performance gap between Sonnet running alone and the more powerful Claude Opus model.
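To make the gap figure concrete, here is a minimal sketch of how a "share of the gap closed" number is typically computed. The article does not publish the underlying resolution rates, so the scores below are hypothetical placeholders chosen only to reproduce a 74.7% result; the function name `gap_closed` is likewise an assumption.

```python
def gap_closed(baseline: float, assisted: float, ceiling: float) -> float:
    """Fraction of the baseline-to-ceiling gap recovered by the assisted setup."""
    gap = ceiling - baseline
    if gap <= 0:
        raise ValueError("ceiling must be strictly greater than baseline")
    return (assisted - baseline) / gap


if __name__ == "__main__":
    # HYPOTHETICAL resolution rates (percent), not figures from the benchmark.
    sonnet_alone = 40.0
    opus_alone = 50.0
    sonnet_with_duck = 47.47
    print(f"{gap_closed(sonnet_alone, sonnet_with_duck, opus_alone):.1%}")  # prints 74.7%
```

A value of 0 would mean the reviewer added nothing over the baseline, and 1 would mean the assisted setup matched the stronger model.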

The benefit is most pronounced in complex scenarios spanning multiple files and requiring numerous steps. On these challenging tasks, the Sonnet-Rubber Duck combination showed a 3.8% higher resolution rate than Sonnet alone, increasing to 4.8% for the most difficult problems.

Specific examples include catching an architectural flaw that would prevent jobs from running, identifying a loop that silently overwrote data, and flagging cross-file conflicts where a crucial Redis key was no longer being written.

Automated and On-Demand Critiques

GitHub Copilot can invoke Rubber Duck automatically at key checkpoints, for example after drafting a plan or completing a complex implementation. It can also call the reviewer reactively if the agent gets stuck.

Users can also manually request a critique at any point. Copilot then reasons over the feedback and shows what it changed and why.

Limiting invocation to these moments targets feedback where it is likely to matter most and keeps disruption to the developer's workflow low. The feature builds on the existing infrastructure that GitHub Copilot CLI uses for other subagents.
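The invocation moments described above (after a plan is drafted, after a complex implementation, when the agent is stuck, or on user request) can be expressed as a small decision function. This is a hedged sketch, not how Copilot CLI decides: `AgentState`, `Trigger`, `review_trigger`, and both thresholds are hypothetical, and the article does not say how "complex" or "stuck" are measured.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Trigger(Enum):
    USER_REQUEST = "user requested a critique"
    AGENT_STUCK = "agent appears stuck"
    COMPLEX_IMPLEMENTATION = "complex implementation completed"
    PLAN_DRAFTED = "plan drafted"


@dataclass
class AgentState:
    # All fields are illustrative stand-ins for whatever the real agent tracks.
    plan_drafted: bool = False
    implementation_done: bool = False
    files_touched: int = 0
    failed_attempts: int = 0
    user_requested: bool = False


def review_trigger(
    state: AgentState,
    complex_file_threshold: int = 3,  # assumed cutoff for "complex"
    stuck_threshold: int = 2,         # assumed cutoff for "stuck"
) -> Optional[Trigger]:
    """Return the reason to call the reviewer now, or None to keep working."""
    if state.user_requested:
        return Trigger.USER_REQUEST  # manual requests always win
    if state.failed_attempts >= stuck_threshold:
        return Trigger.AGENT_STUCK  # reactive path
    if state.implementation_done and state.files_touched >= complex_file_threshold:
        return Trigger.COMPLEX_IMPLEMENTATION  # proactive, after the work
    if state.plan_drafted and not state.implementation_done:
        return Trigger.PLAN_DRAFTED  # proactive, before the work
    return None


if __name__ == "__main__":
    print(review_trigger(AgentState(plan_drafted=True)))
    print(review_trigger(AgentState()))
```

Returning `None` in the common case reflects the design goal stated above: the reviewer is called only at a few high-value moments rather than on every step.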

Getting Started

Rubber Duck is available now in experimental mode. Users can access it via the /experimental command within GitHub Copilot CLI. It functions when a Claude model is selected and GPT-5.4 is enabled.

The feature is particularly useful for complex refactors, high-stakes tasks, checks on test coverage, and any situation where a second opinion on a plan is wanted before committing resources.

© 2026 StartupHub.ai. All rights reserved.