I couldn’t find any info on this online: what are the default filters used by Conv2d?
I think filters are initialized with random values and they are updated via backpropagation.
Wait, so the filter values are updated, meaning that they likely change every epoch?
As per my intuition, they should update every time we call loss.backward().
Am I heading in the right direction, experts?
loss.backward() calculates all gradients in the current computation graph (including the filter weight gradients), but it does not change the weights itself. The weights are only updated when you call optimizer.step() (provided you passed these parameters to the optimizer beforehand) or when you update the weights manually using the gradients.
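To make the split between the two calls concrete, here is a minimal sketch. The layer sizes, the dummy input, the squared-output loss and the SGD learning rate are arbitrary choices for illustration, not anything specific to your model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy setup: one conv layer and plain SGD (sizes and lr are arbitrary).
conv = nn.Conv2d(in_channels=1, out_channels=2, kernel_size=3)
optimizer = torch.optim.SGD(conv.parameters(), lr=0.1)

x = torch.randn(4, 1, 8, 8)            # dummy batch
before = conv.weight.detach().clone()  # snapshot of the initial filters

loss = conv(x).pow(2).mean()           # dummy loss, just to get gradients
loss.backward()

# backward() only fills .grad; the filter values are still the same.
has_grad = conv.weight.grad is not None
unchanged_after_backward = torch.equal(conv.weight.detach(), before)

# step() is what actually modifies the filters.
optimizer.step()
changed_after_step = not torch.equal(conv.weight.detach(), before)

optimizer.zero_grad()                  # clear gradients before the next iteration

print(has_grad, unchanged_after_backward, changed_after_step)
```

In a normal training loop this backward/step/zero_grad sequence runs once per batch, so the filters change every iteration rather than once per epoch.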
@modeler in fact the weights change in every iteration if the gradients are non-zero.
What is the reasoning for a randomized initialization as opposed to some non-zero constant initialization?
I think symmetry breaking might be one reason, although the problem of equal outputs won’t be as serious as in linear layers.
E.g. if you initialize a linear layer with some constant weight, each output unit will have the same value. Later in the backward pass this can produce the same weight update for every unit. The units thus cannot learn anything “new”, and you end up with a whole layer of cloned neurons.
Random initialization breaks this symmetry.
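A small sketch of the cloned-neuron effect, assuming a single linear layer, a constant fill value of 0.5, and a loss that treats all output units the same way (all of these are picked just for the demonstration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

x = torch.randn(5, 3)  # dummy batch: 5 samples, 3 features

# Linear layer with constant weights and zero bias (0.5 is an arbitrary constant).
layer = nn.Linear(3, 4)
with torch.no_grad():
    layer.weight.fill_(0.5)
    layer.bias.fill_(0.0)

out = layer(x)
# Every output unit computes exactly the same value for each sample.
same_outputs = torch.allclose(out, out[:, :1].expand_as(out))

# A loss that is symmetric in the output units (assumption for this demo).
out.pow(2).sum().backward()
grad = layer.weight.grad
# Every unit (row of the weight matrix) receives the same gradient,
# so the rows stay identical after any number of updates.
same_grads = torch.allclose(grad, grad[:1].expand_as(grad))

# With the default random initialization the units already differ.
random_layer = nn.Linear(3, 4)
out_random = random_layer(x)
differs_with_random_init = not torch.allclose(
    out_random, out_random[:, :1].expand_as(out_random)
)

print(same_outputs, same_grads, differs_with_random_init)
```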
Also, I think another reason might be that your constant values might bias your model towards a particular solution, which might be useful, if you know what you are doing.
In addition to Peter’s spot-on comments about symmetry breaking, there is the lottery ticket hypothesis: roughly speaking, the theory that NNs (overparametrised by traditional standards) are “looking in many places of the parameter landscape, thereby picking up some useful ones”.
Weight initialization in particular is something that has been identified as fairly important, and I can recommend spending some thought on it. PyTorch inherits its initializations mostly from Torch, and they might not always reflect the latest advice on how to do it. Most stock modules have a method reset_parameters that implements the default (e.g. run ??torch.nn.Conv2d.reset_parameters to see the source in IPython/Jupyter).
In contrast to the weights, the bias can, in my experience, often just be zeroed.
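If you want to override the default, a rough sketch could look like the following. The choice of Kaiming-normal initialization and the helper name init_conv are my own assumptions for illustration, not a recommendation from this thread; the bias is simply zeroed as described above.

```python
import torch
import torch.nn as nn

def init_conv(module):
    # Hypothetical helper: re-initialize conv layers only.
    if isinstance(module, nn.Conv2d):
        # Kaiming-normal is one common choice for ReLU networks (assumption).
        nn.init.kaiming_normal_(module.weight, mode="fan_out", nonlinearity="relu")
        if module.bias is not None:
            nn.init.zeros_(module.bias)  # bias just zeroed

# Toy model; channel counts and kernel sizes are arbitrary.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.ReLU(),
    nn.Conv2d(8, 16, kernel_size=3),
)

# Module.apply calls the function on every submodule.
model.apply(init_conv)

first_conv = model[0]
print(first_conv.weight.std().item(), first_conv.bias.abs().sum().item())
```

Calling first_conv.reset_parameters() afterwards would put the layer back to PyTorch's default initialization.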