Re-basin via implicit Sinkhorn differentiation
Fidel A. G. Peña
Heitor Medeiros
Thomas Dubail
Masih Aminbeidokhti
Eric Granger
Marco Pedersoli
CVPR 2023
[GitHub]
[Paper]



(a) The loss landscape for the polynomial approximation task. θA and θB are solutions found by SGD. LMC suggests that permuting the hidden units of θB yields πP(θB), which is functionally equivalent to θB before the permutation yet has no barrier along its linear interpolation with θA. (b) Comparison of the cost value along the linear path before (naive) and after re-basin. In both figures, the dashed line corresponds to the original (naive) path between models, and the solid line to the path and corresponding loss after the proposed Sinkhorn re-basin.
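To make the permutation symmetry concrete, here is a minimal PyTorch sketch (not the paper's code) showing that permuting the hidden units of a one-hidden-layer MLP, i.e., applying πP, leaves the network function unchanged:

```python
import torch

torch.manual_seed(0)
d_in, d_h, d_out = 3, 10, 1

# Parameters of model B: theta_B = (W1, b1, W2, b2).
W1, b1 = torch.randn(d_h, d_in), torch.randn(d_h)
W2, b2 = torch.randn(d_out, d_h), torch.randn(d_out)

def mlp(x, W1, b1, W2, b2):
    return torch.relu(x @ W1.T + b1) @ W2.T + b2

# A random hard permutation P of the 10 hidden units.
P = torch.eye(d_h)[torch.randperm(d_h)]

# pi_P(theta_B): permute the rows of (W1, b1) and the columns of W2.
W1p, b1p, W2p = P @ W1, P @ b1, W2 @ P.T

# The permuted model computes exactly the same function.
x = torch.randn(5, d_in)
assert torch.allclose(mlp(x, W1, b1, W2, b2),
                      mlp(x, W1p, b1p, W2p, b2), atol=1e-5)
```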



Abstract

The recent emergence of new algorithms for permuting models into functionally equivalent regions of the solution space has shed some light on the complexity of error surfaces and on promising properties like mode connectivity. However, finding the right permutation is challenging, and current optimization techniques are not differentiable, which makes them difficult to integrate into gradient-based optimization and often leads to sub-optimal solutions. In this paper, we propose a Sinkhorn re-basin network with the ability to obtain the transportation plan that best suits a given objective. Unlike the current state of the art, our method is differentiable and, therefore, easy to adapt to any task within the deep learning domain. Furthermore, we show the advantage of our re-basin method by proposing a new cost function that enables incremental learning by exploiting the linear mode connectivity property. The benefit of our method is compared against similar approaches from the literature under several conditions, for both optimal transport finding and linear mode connectivity. The effectiveness of our continual learning method based on re-basin is also shown on several common benchmark datasets, providing experimental results that are competitive with state-of-the-art results from the literature.
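As a rough illustration of the core idea, the sketch below (illustrative names, not the authors' implementation) relaxes a hard permutation into a doubly stochastic transportation plan with the Sinkhorn operator, so that the matching itself can receive gradients. Note that the paper differentiates the Sinkhorn operator implicitly, whereas this sketch simply unrolls the normalization iterations:

```python
import torch

def sinkhorn(log_alpha, n_iters=20):
    # Alternate row and column normalization in log space; the result
    # approaches a doubly stochastic matrix as n_iters grows.
    for _ in range(n_iters):
        log_alpha = log_alpha - torch.logsumexp(log_alpha, dim=1, keepdim=True)
        log_alpha = log_alpha - torch.logsumexp(log_alpha, dim=0, keepdim=True)
    return log_alpha.exp()

# Trainable scores that parametrize one layer's 10x10 transportation plan.
scores = torch.nn.Parameter(torch.randn(10, 10))
P_soft = sinkhorn(scores / 0.1)  # low temperature -> close to a hard permutation
cost = (P_soft * torch.randn(10, 10)).sum()  # stand-in for a task-specific cost
cost.backward()  # gradients reach `scores` through the Sinkhorn iterations
```

Because the whole pipeline is differentiable, the transportation plan can be optimized by gradient descent against any cost, which is what distinguishes this approach from non-differentiable matching heuristics.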


Try our code



Paper and Supplementary Material

Fidel A. G. Peña, Heitor Medeiros, Thomas Dubail, Masih Aminbeidokhti, Eric Granger, Marco Pedersoli.

Re-basin via implicit Sinkhorn differentiation.
In CVPR, 2023.

(hosted on arXiv)

[BibTeX]

Results on Different Tasks

- Model Alignment




Estimated permutation matrices via Weight Matching (WM) and the proposed Sinkhorn re-basin. Pi denotes the expected 10 × 10 permutation matrix, with ones shown in black and zeros in white. The estimated permutation matrix P̂i shows correct matchings as blue squares and mismatches in red and yellow. The permutation matrices Pi ∈ R^(10×10) correspond to the transportation plans of layer i, with each layer containing 10 neurons. These matrices are the actual permutation matrices from the experiment with random initialization and 2 hidden layers.
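For intuition on how a soft transportation plan becomes a hard permutation like those visualized above, the following sketch (a hypothetical example using SciPy's Hungarian solver, not the paper's procedure) rounds a noisy 10 × 10 plan and counts correct matchings against a ground-truth Pi:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
n = 10
P_true = np.eye(n)[rng.permutation(n)]  # ground-truth permutation Pi

# Pretend this soft plan came out of the re-basin: ground truth plus noise.
P_soft = P_true + 0.3 * rng.random((n, n))

# Hungarian rounding: pick the hard permutation with maximum total mass.
rows, cols = linear_sum_assignment(-P_soft)
P_hat = np.zeros((n, n))
P_hat[rows, cols] = 1.0

matches = int((P_hat * P_true).sum())
print(f"{matches}/{n} neurons matched correctly")
```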




- Linear Mode Connectivity (LMC)



Example of linear mode connectivity achieved by WM, STE, and our Sinkhorn re-basin with the CL2, CMid, and CRnd costs, for a neural network with two hidden layers. Accuracy and loss are shown for the MNIST and CIFAR-10 classification tasks, while only the L2 loss is shown for the regression tasks. For MNIST, we include a zoomed-in view of the loss and accuracy for easier comparison.
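Curves like these can be reproduced by evaluating the loss along the linear path θ(t) = (1 − t)·θA + t·θB after re-basin. A minimal sketch follows, where the model, data loader, and loss function are assumptions rather than the paper's code:

```python
import torch

def interpolate_state(sd_a, sd_b, t):
    # Elementwise linear interpolation of two state dicts
    # from identically shaped models.
    return {k: (1 - t) * sd_a[k] + t * sd_b[k] for k in sd_a}

@torch.no_grad()
def lmc_curve(model, sd_a, sd_b, loss_fn, loader, steps=21):
    losses = []
    for t in torch.linspace(0, 1, steps).tolist():
        model.load_state_dict(interpolate_state(sd_a, sd_b, t))
        total, count = 0.0, 0
        for x, y in loader:
            total += loss_fn(model(x), y).item() * len(x)  # loss_fn returns a mean
            count += len(x)
        losses.append(total / count)
    return losses  # flat curve -> no barrier; a bump -> a barrier remains
```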







Related Work


Samuel K Ainsworth, Jonathan Hayase, and Siddhartha Srinivasa. Git Re-Basin: Merging Models modulo Permutation Symmetries. arXiv preprint arXiv:2209.04836, 2022.

Aditya Kumar Akash, Sixu Li, and Nicolas García Trillos. Wasserstein Barycenter-based Model Fusion and Linear Mode Connectivity of Neural Networks. arXiv preprint arXiv:2210.06671, 2022.

Frederik Benzing, Simon Schug, Robert Meier, Johannes von Oswald, Yassir Akram, Nicolas Zucchet, Laurence Aitchison, and Angelika Steger. Random initialisations performing above chance and how to find them. arXiv preprint arXiv:2209.07509, 2022.

Rahim Entezari, Hanie Sedghi, Olga Saukh, and Behnam Neyshabur. The Role of Permutation Invariance in Linear Mode Connectivity of Neural Networks. In International Conference on Learning Representations, 2022.




Acknowledgements

This work was supported by Distech Controls Inc., the Natural Sciences and Engineering Research Council of Canada, and the Digital Research Alliance of Canada.