Appropriate method for weight sharing in convolutional layers?

Thanks @ptrblck .
Since the gradients from each forward pass are summed, would a reasonable way to average the update across the different forward passes be to register a backward hook that divides the gradient by the number of forward passes the shared weights are used in?
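For concreteness, this is roughly what I have in mind (a minimal sketch; `num_passes`, the shapes, and the hook placement are placeholders, not my actual model):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

num_passes = 3  # placeholder: how many forward passes share the weight
shared_weight = nn.Parameter(torch.randn(16, 3, 3, 3))

# Tensor hook on the shared parameter: autograd sums the gradient
# contributions from the separate forward passes, and the hook then
# divides that sum so the effective update is the average.
shared_weight.register_hook(lambda grad: grad / num_passes)

x = torch.randn(2, 3, 8, 8)
loss = sum(F.conv2d(x, shared_weight, padding=1).sum() for _ in range(num_passes))
loss.backward()
print(shared_weight.grad.shape)  # averaged gradient for the shared weight
```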

I see that in this topic, @albanD recommended sharing weights by .clone()ing the same weight into the weight attribute of different modules on each forward pass. Are there reasons that approach would be better or more memory efficient than calling F.conv2d directly and passing in the shared weight tensor?
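To make sure I'm comparing the right things, here is a minimal sketch of the two styles as I understand them (the shapes, names, and the `del`-then-assign step are my assumptions, not necessarily the exact pattern from that topic):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

shared_weight = nn.Parameter(torch.randn(16, 3, 3, 3))

# Style A: purely functional, passing the shared tensor to F.conv2d directly.
def forward_functional(x):
    return F.conv2d(x, shared_weight, padding=1) + F.conv2d(x, shared_weight, padding=1)

# Style B (my understanding of the clone approach): copy the shared weight
# into an nn.Conv2d's .weight attribute on every forward pass.
conv = nn.Conv2d(3, 16, 3, padding=1, bias=False)
del conv.weight  # drop the module's own Parameter so a plain tensor can be assigned

def forward_clone(x):
    conv.weight = shared_weight.clone()  # re-cloned each forward; grads flow back to shared_weight
    return conv(x) + conv(x)

x = torch.randn(2, 3, 8, 8)
forward_functional(x).sum().backward()
print(shared_weight.grad.norm())

shared_weight.grad = None
forward_clone(x).sum().backward()
print(shared_weight.grad.norm())
```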