The conventional wisdom treats pretrained models as mere starting points for iterative adaptation. However, a new perspective from Yulu Gan and Phillip Isola, detailed on arXiv, reframes the outcome of pretraining not as a single weight vector but as a distribution rich with task-specific experts.
The Dense Landscape of Large Model Experts
In small models, specialized solutions are rare in weight space, and finding them requires structured optimization such as gradient descent. The researchers observed a notable shift in large, well-pretrained models: the density of these task experts increases dramatically. Diverse, task-improving specialists are not outliers; they populate a substantial fraction of the neighborhood around the pretrained weights, which suggests that even unstructured search, such as sampling random perturbations of the weights, can surface them (a minimal sketch of this idea follows below). This insight fundamentally alters how we approach adapting these powerful models.
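To make the density claim concrete, here is one way to probe it: sample random Gaussian perturbations around a trained base point and count how many improve a downstream loss. The sketch below is a minimal toy version of that measurement; the small MLP, the synthetic pretraining and downstream objectives, and the perturbation scale sigma are all illustrative assumptions, not the authors' experimental protocol.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in: a small MLP "pretrained" on one synthetic objective, then
# probed on a related downstream task. In the paper's setting the base model
# would be a large pretrained network; everything here is illustrative.
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
x = torch.randn(256, 16)
y_pretrain = torch.randn(256, 1)                  # "pretraining" targets
y_task = y_pretrain + 0.3 * torch.randn(256, 1)   # related downstream task

# Brief "pretraining" so the base point is a trained optimum, not random init.
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(300):
    opt.zero_grad()
    nn.functional.mse_loss(model(x), y_pretrain).backward()
    opt.step()

def task_loss():
    """Downstream-task loss of the model's current weights."""
    with torch.no_grad():
        return nn.functional.mse_loss(model(x), y_task).item()

base = torch.nn.utils.parameters_to_vector(model.parameters()).detach().clone()
base_loss = task_loss()

def loss_after_perturbation(sigma):
    """Task loss after an isotropic Gaussian perturbation of scale sigma."""
    torch.nn.utils.vector_to_parameters(
        base + sigma * torch.randn_like(base), model.parameters())
    loss = task_loss()
    torch.nn.utils.vector_to_parameters(base, model.parameters())  # restore
    return loss

# Fraction of random neighbors that beat the base weights on the task:
# a crude proxy for the "density of task experts" around the pretrained point.
n, sigma = 200, 0.01
hits = sum(loss_after_perturbation(sigma) < base_loss for _ in range(n))
print(f"{hits}/{n} random perturbations improved the downstream loss")
```

If the dense-expert picture holds, this improving fraction should be substantial around a large, well-pretrained model, rather than the vanishing fraction expected around a small model's optimum.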