SyncBatchNorm for libtorch, anyone?

I’m looking for a way to implement the behavior of nn.SyncBatchNorm in libtorch. Does anybody have any ideas/suggestions on how to do this?

Without synchronizing batch statistics, running distributed training with libtorch is a no-go (assuming the network has BatchNorm layers), as I have learned the hard way! Others must have run into the same issue, and hopefully implemented a custom layer to solve it? I'm not even hoping for a ready-to-use solution, but maybe some ideas on how to put it together. Thanks!
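
To make the question more concrete, here is a rough sketch of the kind of forward pass I imagine: each rank computes its local per-channel sums, all-reduces them through a c10d process group, and then normalizes with the global statistics. This is just my guess at an approach, not a working implementation; the `sync_batch_norm_forward` function is something I made up, and the c10d header path and exact `allreduce` signature seem to vary between libtorch versions.

```cpp
#include <torch/torch.h>
// Assumed header location; older/newer libtorch versions place c10d differently.
#include <torch/csrc/distributed/c10d/ProcessGroup.hpp>

// Hypothetical synchronized batch-norm forward for 4-D input (N, C, H, W).
// Each rank computes local sums, all-reduces them, and normalizes with the
// resulting global per-channel statistics.
torch::Tensor sync_batch_norm_forward(
    const torch::Tensor& input,
    const torch::Tensor& weight,
    const torch::Tensor& bias,
    c10::intrusive_ptr<c10d::ProcessGroup> pg,  // assumed: an initialized process group
    double eps = 1e-5) {
  const auto C = input.size(1);
  const auto dims = std::vector<int64_t>{0, 2, 3};  // reduce over N, H, W
  const auto local_count = static_cast<double>(
      input.size(0) * input.size(2) * input.size(3));

  // Local per-channel sum and sum of squares, plus the local element count.
  auto sum = input.sum(dims);
  auto sqsum = (input * input).sum(dims);
  auto count = torch::full({1}, local_count, sum.options());

  // Pack everything into one buffer and all-reduce it (default op is SUM).
  auto stats = torch::cat({sum, sqsum, count});
  std::vector<torch::Tensor> bufs{stats};
  pg->allreduce(bufs)->wait();

  // Unpack and compute the global mean and (biased) variance per channel.
  const auto total = stats[2 * C].item<double>();
  auto mean = stats.slice(0, 0, C) / total;
  auto var = stats.slice(0, C, 2 * C) / total - mean * mean;

  // Normalize, broadcasting over the channel dimension.
  auto inv_std = torch::rsqrt(var + eps);
  auto out = (input - mean.view({1, -1, 1, 1})) * inv_std.view({1, -1, 1, 1});
  return out * weight.view({1, -1, 1, 1}) + bias.view({1, -1, 1, 1});
}
```

I realize this only covers the forward pass: the backward pass would presumably need its own all-reduce of the gradient statistics, and the running_mean/running_var buffers would have to be updated from the global statistics as well. That's exactly the part I'm unsure how to wire up cleanly in libtorch, so any pointers are welcome.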