AI Scientist Now Published in Nature

An ambitious project to automate the entire machine learning research lifecycle has reached a significant milestone. The AI Scientist, an agent powered by foundation models, is now formally documented in a paper published in the journal Nature. The work is a collaboration between researchers at Sakana AI, the University of British Columbia, the Vector Institute, and the University of Oxford.

First introduced as a preprint, The AI Scientist demonstrated its capability to generate novel ideas, conduct experiments, and write research papers autonomously. A subsequent iteration, AI Scientist-v2, achieved the historic feat of producing an AI-generated paper that successfully passed a rigorous human peer-review process, a key step in the journey toward automated scientific discovery.

Under the Hood: From Idea to Publication

The Nature paper details the system's architecture. Starting from a broad research direction, the AI autonomously generates novel research hypotheses, searches and synthesizes relevant literature, and then designs, programs, and executes experiments. This process uses a parallelized agentic tree search, and a vision-capable foundation model provides feedback on the generated figures.
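To make the idea of a parallelized tree search concrete, here is a minimal sketch of a best-first search that expands several candidate nodes per round in parallel. It is not the system's actual implementation: the function `tree_search`, its parameters `rounds` and `width`, and the toy `toy_expand` and `toy_score` helpers are all hypothetical. In a real agent, a node would stand for an experiment's code state, expansion would be a model proposing variations, and the score would come from running the experiment.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def tree_search(root, expand, score, rounds=3, width=2):
    """Best-first search that expands the `width` best nodes per round in parallel.

    `expand(node)` returns a list of child nodes; `score(node)` returns a
    number where higher is better. Both are supplied by the caller.
    """
    counter = 0  # tie-breaker so the heap never has to compare nodes directly
    # heapq is a min-heap, so scores are stored negated.
    frontier = [(-score(root), counter, root)]
    best_node, best_score = root, score(root)

    with ThreadPoolExecutor(max_workers=width) as pool:
        for _ in range(rounds):
            if not frontier:
                break
            # Take up to `width` of the most promising nodes.
            batch = [heapq.heappop(frontier)[2]
                     for _ in range(min(width, len(frontier)))]
            # Expand them concurrently; pool.map keeps the input order.
            for children in pool.map(expand, batch):
                for child in children:
                    counter += 1
                    child_score = score(child)
                    heapq.heappush(frontier, (-child_score, counter, child))
                    if child_score > best_score:
                        best_node, best_score = child, child_score
    return best_node, best_score


# Toy stand-ins (assumptions for illustration only): nodes are integers,
# each node has two "variations", and the goal is to get close to 10.
def toy_expand(n):
    return [n + 1, n * 2]


def toy_score(n):
    return -abs(n - 10)


if __name__ == "__main__":
    print(tree_search(1, toy_expand, toy_score, rounds=3, width=2))
```

The design choice worth noting is that the frontier is shared across rounds, so a branch that looked weak early can still be revisited if the stronger branches stop improving.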

The system's evolution has been marked by two distinct phases. Initially, it was given a starting code template and proved that end-to-end automation of machine learning research was possible. Later, the system was granted more freedom to explore AI research topics, culminating in a paper submitted to the ICLR 2025 ICBINB workshop. This AI-generated manuscript received an average score of 6.33, above the average acceptance threshold for human submissions and higher than 55% of human-authored papers, showing that an AI-generated paper can pass human peer review.

The Automated Reviewer and Scaling Laws

To scale the evaluation of AI-generated science, the researchers developed an Automated Reviewer. This tool, prompted to act as an Area Chair, ensembles five independent reviews to make a final decision. Benchmarking against thousands of human decisions from the OpenReview dataset revealed that the Automated Reviewer matches human performance, achieving a balanced accuracy of 69% and an F1-score exceeding inter-human agreement in some experiments.
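The two quantities in this paragraph, an ensembled decision and balanced accuracy, can be illustrated with a short sketch. Balanced accuracy is the mean of the true-positive rate and the true-negative rate, which is why it is a fairer measure than plain accuracy when most submissions are rejected. The aggregation rule below (mean score against a threshold) is an assumption chosen for simplicity; the article describes a model prompted as an Area Chair making the final call, which this sketch does not reproduce. The names `ensemble_decision`, `balanced_accuracy`, and the `accept_threshold` value of 6.0 are hypothetical.

```python
from statistics import mean


def ensemble_decision(review_scores, accept_threshold=6.0):
    """Combine independent review scores into one accept (True) / reject (False).

    Assumption: a simple mean-vs-threshold rule stands in for the
    Area Chair style meta-review described in the article.
    """
    if not review_scores:
        raise ValueError("need at least one review score")
    return mean(review_scores) >= accept_threshold


def balanced_accuracy(y_true, y_pred):
    """Mean of true-positive rate and true-negative rate for binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    positives = sum(1 for t in y_true if t)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present in y_true")
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    true_neg = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    return 0.5 * (true_pos / positives + true_neg / negatives)


if __name__ == "__main__":
    # Five made-up review scores for one paper (illustrative data only).
    print(ensemble_decision([7, 6, 6, 5, 7]))
    # Made-up human decisions vs. reviewer decisions for four papers.
    humans = [True, True, False, False]
    reviewer = [True, False, False, False]
    print(balanced_accuracy(humans, reviewer))
```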

Crucially, this reviewer enabled the discovery of a clear scaling law: as the underlying foundation models improve, the quality of generated papers increases correspondingly. This suggests future versions of The AI Scientist will become substantially more capable as compute costs decrease and model capabilities continue to advance.

Limitations and the Future of AI Science

Despite these breakthroughs, The AI Scientist is still in its early stages. The system occasionally produces naive ideas, struggles with deep methodological rigor, and is susceptible to hallucinations such as inaccurate citations. However, the rapid progress in machine learning suggests that capabilities, once demonstrated, often go on to surpass human performance surprisingly quickly as models scale and improve.

While currently limited to computational experiments, the playbook published in Nature is expected to be adapted to other domains, potentially catalyzing scientific advances. The researchers emphasize responsible development, including watermarking AI-generated papers and recommending community norms for AI research.

This publication marks a new era where AI agents can accelerate the pace of scientific discovery, potentially enabling solutions to global challenges.
