I would appreciate it if you could give me a detailed explanation of what affine does in nn.Conv2d() or nn.BatchNorm2d().
It is just scale & shift: y = x*w + b. For batch norm it is applied channel-wise to the normalized input, i.e.: x[B,C,H,W] * w[1,C,1,1] + b[1,C,1,1].
nn.Conv2d has no affine argument, as its kernel and bias parameters already do scale & shift implicitly.
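If you want to check this yourself, here is a minimal sketch, assuming PyTorch is installed. The tensor sizes and the weight/bias values are arbitrary, picked only for illustration. It compares a BatchNorm2d with affine=True against an affine=False layer followed by the manual channel-wise scale & shift:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(4, 3, 5, 5)  # [B, C, H, W], arbitrary illustrative sizes

# affine=True (the default): learnable per-channel weight and bias, shape [C]
bn = nn.BatchNorm2d(3, affine=True)
with torch.no_grad():
    # Arbitrary values so the scale & shift is visible (defaults are 1 and 0)
    bn.weight.copy_(torch.tensor([1.0, 2.0, 0.5]))
    bn.bias.copy_(torch.tensor([0.0, -1.0, 3.0]))

# affine=False: normalization only, weight and bias are None
bn_plain = nn.BatchNorm2d(3, affine=False)

# Both modules are in training mode here, so both normalize with batch statistics
y = bn(x)
x_hat = bn_plain(x)

# Manual scale & shift: x_hat[B,C,H,W] * w[1,C,1,1] + b[1,C,1,1]
manual = x_hat * bn.weight.view(1, -1, 1, 1) + bn.bias.view(1, -1, 1, 1)

matches = torch.allclose(y, manual, atol=1e-5)
print(matches)
```

So affine=True only adds the two per-channel parameters on top of the normalization; with affine=False the layer has nothing to learn.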
Thank you for responding!