"I think the bias-variance trade-off is an incredible misnomer. There doesn't actually have to be a trade-off." This provocative statement from Professor Andrew Wilson of NYU, articulated during his interview with MLST, encapsulates a fundamental challenge to decades of machine learning orthodoxy. Wilson, a distinguished figure in AI research, spoke with the interviewer about the prevailing misconceptions surrounding model complexity, generalization, and the very nature of artificial intelligence, arguing that many deeply held beliefs are not only wrong but actively hinder progress.
The prevailing wisdom in machine learning has long dictated a cautious approach to model complexity. The classic bias-variance trade-off posits a delicate balancing act: a model that is too simple (high bias) will underfit, failing to capture the underlying patterns, while one that is too complex (high variance) will overfit, memorizing the noise in the training data and performing poorly on unseen examples. For years this trade-off has guided model design, teaching practitioners to fear overly expressive models. The textbook picture is easy to reproduce, as in the sketch below.
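As a concrete illustration (my own, not from the interview), here is a minimal Python sketch of that classical picture: polynomial regression on noisy samples of a sine, where test error traces the familiar U-shape as the degree grows. The noise level, sample sizes, and degrees are arbitrary choices for demonstration.

```python
# Classical bias-variance picture: polynomial fits of increasing degree.
# Settings below are illustrative assumptions, not from the interview.
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def true_fn(x):
    return np.sin(2 * np.pi * x)

# Small noisy training set; a large held-out set to estimate test error.
x_train = rng.uniform(0, 1, 15)
y_train = true_fn(x_train) + 0.3 * rng.normal(size=x_train.shape)
x_test = rng.uniform(0, 1, 500)
y_test = true_fn(x_test) + 0.3 * rng.normal(size=x_test.shape)

for degree in [1, 3, 5, 9, 14]:
    p = Polynomial.fit(x_train, y_train, deg=degree)  # least-squares fit
    train_mse = np.mean((p(x_train) - y_train) ** 2)
    test_mse = np.mean((p(x_test) - y_test) ** 2)
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")

# Typical result: train MSE falls monotonically, while test MSE dips and then
# climbs as high-degree polynomials chase noise -- the classical trade-off.
```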
Professor Wilson argues that this foundational belief is fundamentally flawed. He points to phenomena like "double descent" and "benign overfitting": test error first traces the familiar U-shape, but once a model grows past the interpolation threshold, where it fits the training data exactly, generalization surprisingly *improves* again rather than collapsing. This counterintuitive behavior suggests that massive, overparameterized models are not inherently prone to catastrophic overfitting. Instead, they exhibit a "simplicity bias," effectively acting like an automatic Occam's razor. The solution to overfitting, paradoxically, might be to make the model even larger, not smaller. The sketch below shows the effect in miniature.
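Double descent can be reproduced in the style of Belkin et al.'s random-features experiments. The following is a hedged sketch, assuming random ReLU features and relying on the fact that `np.linalg.lstsq` returns the *minimum-norm* solution for underdetermined systems; none of the specific settings come from Wilson.

```python
# Double descent with random ReLU features and minimum-norm least squares.
# An illustrative sketch; exact numbers vary with the random seed.
import numpy as np

rng = np.random.default_rng(1)
n_train, n_test, noise = 40, 1000, 0.2

def target(x):
    return np.sin(3 * x)

x_train = rng.uniform(-1, 1, n_train)
y_train = target(x_train) + noise * rng.normal(size=n_train)
x_test = rng.uniform(-1, 1, n_test)
y_test = target(x_test)  # clean targets make the descent easier to see

def relu_features(x, W, b):
    # One random ReLU feature per column: max(0, w*x + b).
    return np.maximum(0.0, np.outer(x, W) + b)

for n_features in [5, 20, 40, 80, 400, 2000]:
    W = rng.normal(size=n_features)
    b = rng.normal(size=n_features)
    Phi_train = relu_features(x_train, W, b)
    Phi_test = relu_features(x_test, W, b)
    # Past n_features > n_train, lstsq picks the interpolating fit with the
    # smallest weight norm -- an implicit simplicity bias.
    w, *_ = np.linalg.lstsq(Phi_train, y_train, rcond=None)
    test_mse = np.mean((Phi_test @ w - y_test) ** 2)
    print(f"features={n_features:5d}  test MSE={test_mse:.3f}")

# Test error typically spikes near features == n_train (the interpolation
# threshold) and then *falls again* as the model is made much larger.
```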
He asserts that parameter counting is a "very bad proxy for model complexity." What truly matters, Wilson contends, are the properties of the "induced distribution over functions": which functions the model is likely to represent before it sees any data. A model's capacity shouldn't be judged by its raw parameter count, but by its inherent preferences for certain types of solutions. This perspective allows for models that are both incredibly expressive and flexible, yet simultaneously strongly biased toward simple, generalizable solutions, as the sketch below suggests.
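One way to see the idea, offered purely as an illustrative assumption and not as Wilson's own experiment, is to sample from the distribution over functions induced by randomly initialized one-hidden-layer networks. With the standard 1/sqrt(width) output scaling, a hundredfold increase in parameters barely changes the character of the sampled functions; the "roughness" statistic below is an ad hoc proxy for functional complexity.

```python
# Sample functions f(x) = V @ tanh(W x + b) from a random-weight prior and
# compare networks of very different sizes. Illustrative settings only.
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(-3, 3, 200)

def sample_prior_functions(width, n_samples=3):
    """Draw n_samples functions from a one-hidden-layer network prior."""
    W = rng.normal(size=(n_samples, width, 1))
    b = rng.normal(size=(n_samples, width, 1))
    V = rng.normal(size=(n_samples, 1, width)) / np.sqrt(width)  # standard scaling
    h = np.tanh(W @ x[None, None, :] + b)   # (n_samples, width, len(x))
    return (V @ h)[:, 0, :]                 # (n_samples, len(x))

for width in [100, 10_000]:
    fs = sample_prior_functions(width)
    # Ad hoc roughness proxy: mean squared increment of each sampled function.
    roughness = np.mean(np.diff(fs, axis=1) ** 2)
    print(f"width={width:6d}  params={3 * width:7d}  roughness={roughness:.2e}")

# The 100x-larger model induces functions of essentially the same smoothness:
# the parameter count exploded, but the functional complexity did not.
```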
Wilson's core philosophy champions building models that honestly reflect our beliefs about the world. We believe the world is complex, so our models should be expressive enough to capture that complexity. Yet we also believe in Occam's razor: the simplest explanation consistent with the evidence is usually the best. Bayesian methods, which marginalize over parameters rather than committing to a single setting, naturally embody this duality. The marginal likelihood rewards models that concentrate probability on the observed data rather than spreading it thinly over every dataset they could explain, so it acts as an "automatic Occam's razor" even within highly flexible model classes. This approach yields models that perform well across varying data regimes, from small to large datasets, without requiring constant manual intervention or a re-evaluation of fundamental assumptions. The toy calculation below makes the mechanism concrete.
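The mechanism can be made concrete with a toy Bayesian linear regression, a standard textbook construction rather than anything specific from the interview: marginalizing the weights in closed form yields an evidence `p(y | degree)` that automatically penalizes both too-simple and too-complex polynomial models.

```python
# Automatic Occam's razor: the marginal likelihood (evidence) of Bayesian
# linear regression with polynomial features. Textbook sketch, assumed values.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(3)
sigma, alpha = 0.2, 1.0          # noise std and prior precision (assumed known)

x = rng.uniform(-1, 1, 30)
y = 0.5 * x - 1.2 * x**3 + sigma * rng.normal(size=x.shape)  # true degree: 3

def log_evidence(x, y, degree):
    """log p(y) with w ~ N(0, I/alpha): y ~ N(0, Phi Phi^T / alpha + sigma^2 I)."""
    Phi = np.vander(x, degree + 1, increasing=True)  # features 1, x, ..., x^degree
    cov = Phi @ Phi.T / alpha + sigma**2 * np.eye(len(x))
    return multivariate_normal(mean=np.zeros(len(x)), cov=cov).logpdf(y)

for degree in range(9):
    print(f"degree={degree}  log evidence={log_evidence(x, y, degree):8.2f}")

# Complex models *can* fit the data better, but they spread prior probability
# over many possible datasets; the evidence typically peaks near the true
# degree (3), penalizing over-complex models with no explicit regularizer.
```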
Challenging such entrenched conventional wisdom is never easy. Wilson acknowledges the inherent resistance, stating, "We should always be trying to do that because otherwise we're just preaching to the choir... if you're not changing anyone's beliefs about anything, then maybe it doesn't make a difference." He posits that much progress in AI has been "stalled... by just getting stuck on misconceptions." By re-evaluating the bias-variance trade-off, understanding the benefits of scale, and embracing a Bayesian perspective, researchers can unlock new avenues for building more robust, adaptive, and truly intelligent systems.

