Dear Expert,
I’m using a CNN with Conv2d and Linear layers for a regression problem, with MSE as the loss function.
For both the Conv2d and the Linear layers I’m using the xavier_uniform initialisation, with Adam as the optimizer.
Is this initialisation strategy appropriate in the context of an MSE loss? If I am right, the Xavier method was introduced for classification problems.
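
For reference, this is how I understand the xavier_uniform rule: weights are drawn from U(-a, a) with a = gain * sqrt(6 / (fan_in + fan_out)). Below is a minimal plain-Python sketch of that bound. The helper names and the layer sizes are made up for illustration and are not the ones from my actual network.

```python
import math

def xavier_uniform_bound(fan_in, fan_out, gain=1.0):
    # Weights are drawn from U(-bound, bound).
    return gain * math.sqrt(6.0 / (fan_in + fan_out))

def conv2d_fans(in_channels, out_channels, kernel_size):
    # Assumes a square kernel: each fan is channels * kernel_size * kernel_size.
    receptive_field = kernel_size * kernel_size
    return in_channels * receptive_field, out_channels * receptive_field

# Hypothetical layer sizes, chosen only for illustration.
conv_bound = xavier_uniform_bound(*conv2d_fans(3, 16, 3))
linear_bound = xavier_uniform_bound(128, 1)
print(conv_bound, linear_bound)
```

As far as I can tell, only the layer shapes and the gain enter this formula, and the loss function does not appear in it.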