What is the proper arrangement for creating a convolutional block?

Hi, I want to learn how to create a custom convolutional block in PyTorch. Should I make it like this

nn.Conv2d(in_channels, out_channels, kernel_size),
nn.ReLU(),
nn.BatchNorm2d(out_channels),

or like this

nn.Conv2d(in_channels, out_channels, kernel_size),
nn.BatchNorm2d(out_channels),
nn.ReLU(),

Which one is correct, and could you explain why it should be arranged that way?

ResNet uses the latter (Conv → BN → ReLU), and while people have offered explanations for why, the main reason seems to be that it works well in practice.
That said, with the bias in BN initialized to 0, roughly 50% of the inputs to the ReLU are > 0 and 50% are < 0 (if their distribution is somewhat symmetric), so the ReLU “does something” and the outputs don’t all vanish; the sketch below checks this empirically.
Also note that quite often this order appears in the context of a residual block, where an output of 0 means the skip connection’s term is passed forward unchanged, which might not be a bad thing to have happen reasonably often.
But again, if the other ordering worked better, people would use it and invent stories about why that way is best.
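To make this concrete, here is a minimal sketch of the Conv → BN → ReLU ordering with a quick empirical check of the 50/50 split (the channel counts and input shape are my own illustration, not anything from your question):

import torch
import torch.nn as nn

block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # Conv
    nn.BatchNorm2d(16),                          # BN (bias starts at 0)
    nn.ReLU(),                                   # ReLU
)

x = torch.randn(8, 3, 32, 32)          # a random batch
pre_relu = block[1](block[0](x))       # what the ReLU will see
print((pre_relu > 0).float().mean())   # roughly 0.5 in training mode

And here is how the same ordering typically sits inside a residual block (again just illustrative, modeled on ResNet's basic block without downsampling):

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # if bn2's output is 0, the ReLU just passes the skip term x forward
        return self.relu(out + x)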

Best regards

Thomas

Thank you for your response, but I’m confused about why, when the bias in Batch Normalization (BN) is initialized to 0, half of the inputs to the ReLU function will be greater than 0 and the other half will be less than 0. Also, what is meant by bias here? Is it the same as the weight parameters?

I apologize for asking so many questions; I don’t know much yet and am just starting to learn about CNNs.

Oh, that was sloppy. If you have a distribution whose median approximately matches the mean, and batch norm sets the mean to 0, then half the inputs are < 0. The bias here is the parameter added at the end of batch norm (β in the formula y = γ · x̂ + β); it is a separate learnable parameter from the weight (the scale γ).
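You can see both parameters directly in PyTorch (a minimal sketch; the channel count and input shape are arbitrary): nn.BatchNorm2d stores γ as .weight (initialized to 1) and β as .bias (initialized to 0), and in training mode it normalizes each channel to mean 0 before applying them:

import torch
import torch.nn as nn

bn = nn.BatchNorm2d(16)
print(bn.weight[:3])  # gamma (scale), initialized to 1
print(bn.bias[:3])    # beta (bias), initialized to 0

# In training mode BN normalizes each channel of the batch to mean 0,
# then applies y = gamma * x_hat + beta. With beta = 0 the mean stays 0,
# so roughly half of the output values are negative.
x = torch.randn(8, 16, 32, 32)
y = bn(x)
print(y.mean(dim=(0, 2, 3))[:3])  # approximately 0 per channel
print((y < 0).float().mean())     # approximately 0.5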

Best regards

Thomas