Christina Baek @ ICML
8/ We prove that in 2-layer deep linear nets, SAM's Jacobian term regularizes the norm of the last-layer weights and the intermediate features. Empirically, this also happens in deeper networks. And if we explicitly run SGD with a penalty on the feature/weight norms, we see a big boost in performance!
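A minimal sketch of the "SGD + explicit feature/weight norm penalty" idea on a toy 2-layer deep linear net. All names, dimensions, and the penalty coefficient here are illustrative assumptions, not the thread's actual setup: the loss is MSE plus `0.5 * lam * (||W2||_F^2 + mean ||h||^2)`, where `h = W1 @ x` are the features and `W2` the last-layer weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-layer deep linear net: pred = W2 @ W1 @ x.
# Hypothetical dimensions and data (not from the paper).
d_in, d_hid, d_out, n = 5, 8, 3, 64
X = rng.normal(size=(d_in, n))
W_true = rng.normal(size=(d_out, d_in))
Y = W_true @ X                       # linear teacher targets

W1 = 0.1 * rng.normal(size=(d_hid, d_in))
W2 = 0.1 * rng.normal(size=(d_out, d_hid))
lam, lr = 1e-3, 0.05                 # assumed penalty weight and step size

def loss(W1, W2):
    h = W1 @ X                       # features
    err = W2 @ h - Y
    mse = 0.5 * np.mean(np.sum(err**2, axis=0))
    # Explicit norm penalty on last-layer weights and features.
    reg = 0.5 * lam * (np.sum(W2**2) + np.mean(np.sum(h**2, axis=0)))
    return mse + reg

loss0 = loss(W1, W2)
for _ in range(200):
    h = W1 @ X
    err = W2 @ h - Y
    dW2 = err @ h.T / n + lam * W2       # MSE grad + weight-norm grad
    dh = W2.T @ err / n + lam * h / n    # backprop grad + feature-norm grad
    dW1 = dh @ X.T
    W1 -= lr * dW1
    W2 -= lr * dW2

print(loss0, loss(W1, W2))  # regularized loss should drop after training
```

This is plain full-batch gradient descent on the penalized objective, standing in for the SGD + norm-regularization experiment described above.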