When a module is used multiple times, for example in a siamese network, are the gradients averaged?
```python
[...]
output1 = net(image1)
output2 = net(image2)
loss = ...
loss.backward()
[...]
```
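The gradients are summed (i.e. accumulated), not averaged. Each use of `net` contributes its own gradient, and `loss.backward()` adds these contributions into the same `.grad` tensor of every shared parameter.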
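Here is a minimal sketch you can run to check this. It assumes a toy `nn.Linear(3, 1)` stands in for `net` and uses a hypothetical loss, `output1.sum() + output2.sum()`, in place of the elided `loss = ...`. The sketch compares the gradient from the two-branch pass with the gradients computed separately for each branch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for `net`: one layer whose weights are shared by both branches.
net = nn.Linear(3, 1, bias=False)

image1 = torch.randn(4, 3)
image2 = torch.randn(4, 3)

# Siamese-style pass: the same module is applied twice, then one backward().
net.zero_grad()
output1 = net(image1)
output2 = net(image2)
loss = output1.sum() + output2.sum()  # hypothetical loss, for illustration only
loss.backward()
shared_grad = net.weight.grad.clone()

# Gradient contributed by each branch on its own.
net.zero_grad()
net(image1).sum().backward()
grad1 = net.weight.grad.clone()

net.zero_grad()
net(image2).sum().backward()
grad2 = net.weight.grad.clone()

# The shared-weight gradient equals the sum of the per-branch gradients, not their mean.
print(torch.allclose(shared_grad, grad1 + grad2))
print(torch.allclose(shared_grad, (grad1 + grad2) / 2))
```

If you want an average instead, one option is to scale the loss yourself before calling `backward()`, for example by dividing it by the number of times the module is used.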