Does group norm require multi-GPU synchronization?

I know batch norm benefits from it. I’m guessing group norm would benefit from it as well?
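To make the question concrete, here is a minimal pure-Python sketch of how I understand group norm: statistics are computed per sample, over groups of channels, with no learned scale or shift. The `group_norm` helper and the toy data are my own illustration, not any library's API. If this understanding is right, a sample's output should not depend on what else is in the batch, and so presumably not on what sits on other GPUs either.

```python
from statistics import fmean, pvariance

def group_norm(batch, num_groups, eps=1e-5):
    """Hypothetical helper: normalize each sample over groups of channels.

    batch: list of samples; each sample is a list of channels;
    each channel is a flat list of floats.
    """
    out = []
    for sample in batch:
        channels_per_group = len(sample) // num_groups
        normed = []
        for g in range(num_groups):
            group = sample[g * channels_per_group:(g + 1) * channels_per_group]
            # Statistics come from this sample's group only, never from other samples.
            values = [v for channel in group for v in channel]
            mean = fmean(values)
            var = pvariance(values, mu=mean)
            normed.extend(
                [[(v - mean) / (var + eps) ** 0.5 for v in channel] for channel in group]
            )
        out.append(normed)
    return out

# Toy data (made up): 4 channels with 2 values each.
sample_a = [[1.0, 2.0], [3.0, 4.0], [5.0, 7.0], [9.0, 11.0]]
sample_b = [[-50.0, 0.0], [25.0, 8.0], [100.0, -3.0], [0.5, 0.25]]

alone = group_norm([sample_a], num_groups=2)[0]
in_batch = group_norm([sample_a, sample_b], num_groups=2)[0]

# Compare sample_a normalized alone vs. alongside a very different sample.
same = alone == in_batch
print(same)
```

Batch norm would fail this check, since its statistics are taken across the batch dimension, which is why synchronizing them across GPUs can help it.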

Also, are there best practices for the per-GPU batch size with group norm? For example, I have seen a suggestion that batch norm works better when the number of samples per GPU is greater than 4.
