What is the difference between fan-in and fan-out mode? I know it says that one preserves the magnitude of the variance and the other the magnitude of the gradient. Isn't it always better to preserve the gradient? How would the mode affect the model's performance?
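To make the question concrete, here is a minimal sketch of what the two modes appear to compute. It assumes the question is about the `mode` argument of He/Kaiming-style initializers (for example `torch.nn.init.kaiming_normal_` in PyTorch), which the post does not state explicitly. The helper names `fans` and `he_std` are hypothetical and written only for illustration, and the weight layout `(out, in, *kernel)` is also an assumption.

```python
import math

def fans(shape):
    """Return (fan_in, fan_out) for a weight laid out as (out, in, *kernel).

    Assumption: this layout matches the common convention for linear and
    convolutional weights; other frameworks may order the axes differently.
    """
    receptive = math.prod(shape[2:])  # 1 for a plain linear layer
    return shape[1] * receptive, shape[0] * receptive

def he_std(shape, mode="fan_in", gain=math.sqrt(2.0)):
    """Standard deviation of a He-style normal init: gain / sqrt(fan).

    gain = sqrt(2) is the usual choice for ReLU; it is an assumption here.
    """
    fan_in, fan_out = fans(shape)
    fan = fan_in if mode == "fan_in" else fan_out
    return gain / math.sqrt(fan)

# Hypothetical layer that narrows from 1024 inputs to 256 outputs.
shape = (256, 1024)
std_in = he_std(shape, "fan_in")    # scaled by the number of inputs
std_out = he_std(shape, "fan_out")  # scaled by the number of outputs
print(std_in, std_out)
```

Under these assumptions the two modes only differ when a layer changes width: for a square weight such as `(128, 128)` they give the same standard deviation, so the choice would matter mainly for layers where the input and output sizes are far apart.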