The increasing autonomy of Large Language Models (LLMs) within multi-agent ecosystems necessitates robust minimax training. Standard approaches, however, falter when non-linear policies create extreme local curvature, leading to instability. Existing remedies, such as enforcing global Jacobian bounds, are overly conservative: they stifle necessary sensitivity and incur a significant 'Price of Robustness.' This work introduces Adversarially-Aligned Jacobian Regularization (AAJR), a trajectory-aligned method that controls sensitivity specifically along adversarial ascent directions.
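For context, the global Jacobian bounds the paper criticizes are typically enforced by penalizing the Jacobian's norm in every direction at once. The PyTorch sketch below is illustrative only; the function names and the Hutchinson-style estimator are our assumptions, not the paper's construction:

```python
import torch

def global_jacobian_penalty(policy, x, n_proj=1):
    """Hutchinson-style estimate of the squared Frobenius norm of the
    input-output Jacobian, averaged over the batch (batch-first tensors
    assumed). Penalizing this bounds sensitivity in every direction at
    once, which is what makes the global approach conservative."""
    x = x.detach().requires_grad_(True)
    y = policy(x)
    penalty = x.new_zeros(())
    for _ in range(n_proj):
        u = torch.randn_like(y)  # random output-space probe
        # One backward pass yields the vector-Jacobian product u^T J.
        (vjp,) = torch.autograd.grad(
            y, x, grad_outputs=u, create_graph=True, retain_graph=True
        )
        # E_u ||u^T J||^2 = ||J||_F^2 for standard Gaussian u.
        penalty = penalty + vjp.flatten(1).pow(2).sum(dim=1).mean()
    return penalty / n_proj
```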
Beyond Conservative Global Constraints
AAJR shifts from global sensitivity limits to a targeted constraint: sensitivity is controlled only along adversarial ascent directions. Under mild conditions, this admits a strictly larger policy class than global constraints do; intuitively, a policy may remain sharply sensitive in directions the adversary never exploits, which a global bound would needlessly forbid. This structural relaxation yields a weakly smaller approximation gap and less nominal performance degradation, directly addressing the main limitation of global regularization; the arXiv preprint develops these claims formally.
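The preprint does not ship code, but a minimal PyTorch sketch of the trajectory-aligned idea might look as follows, assuming the adversarial ascent direction is the input gradient of the adversary's objective (the paper's exact construction may differ, and all names here are illustrative):

```python
import torch

def aajr_penalty(policy, adversary_objective, x, eps=1e-12):
    """Directional (trajectory-aligned) sensitivity penalty: restrict the
    policy's Jacobian only along the adversary's ascent direction at x.
    Batch-first tensors assumed; `adversary_objective` maps x to the
    adversary's scalar loss (summed over the batch here)."""
    x = x.detach().requires_grad_(True)
    # Adversarial ascent direction: input gradient of the adversary's objective.
    (v,) = torch.autograd.grad(adversary_objective(x).sum(), x)
    v = v / (v.flatten(1).norm(dim=1).view(-1, *[1] * (v.dim() - 1)) + eps)
    # Jacobian-vector product J v: the policy's directional derivative along v.
    _, jvp = torch.autograd.functional.jvp(policy, (x,), (v,), create_graph=True)
    # Penalize sensitivity only along v, leaving other directions unconstrained.
    return jvp.flatten(1).pow(2).sum(dim=1).mean()
```

In training, one would add `lam * aajr_penalty(policy, adv_obj, x)` to the task loss, with `lam` trading robustness against nominal performance; `lam`, `adv_obj`, and the choice of ascent direction are placeholders, not quantities specified by the paper.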