I found this topic, and one sentence in the reply says:

batchnorm layer does not require its output to be able to perform the backward pass.

I don’t understand this. Why don’t we need the output here?
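To check this concretely, here is a small sketch of my own (not from the linked thread). The idea is that BatchNorm's backward uses the saved input and batch statistics rather than the output tensor, so modifying the output in place does not break backprop:

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(1)
inp = torch.randn(4, 1, requires_grad=True)

out = bn(inp)
out += 1              # modify BatchNorm's output in place
out.sum().backward()  # still works: backward uses the saved input
                      # and batch statistics, not `out` itself
print(inp.grad.shape)
```

This runs without the usual "modified by an inplace operation" error, which is consistent with the quoted sentence.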

Also, I tried some experiments:

```python
# layer definitions
self.l1 = nn.Linear(2, 1)
self.l2 = nn.Linear(1, 1)
self.l3 = nn.Linear(1, 1)
self.bn1 = nn.BatchNorm1d(1)
```
```python
# forward V1.0
x1 = self.l1(input)
x2 = self.l2(x1)
x = self.bn1(x2)
x += self.l3(x1)
```
```python
# forward V2.0
x1 = self.l1(input)
x = self.l2(x1)
x += self.l3(x1)
```

Both forward passes run, so I am also curious why V2.0 works too.
It seems there is no difference between the BatchNorm and Linear layers here.

BatchNorm is only responsible for normalizing the activations from the previous layer, so that those activations have mean 0 and std 1.

In V1.0, `x = self.bn1(x2)` normalizes x2 and assigns the normalized value to x.
In V2.0 you simply skip the normalization; otherwise the code is the same.
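A quick sketch of that normalization (assuming training mode, where the batch statistics are used, and the layer's default affine parameters weight=1, bias=0 at initialization):

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(1)          # affine weight=1, bias=0 at init
x = torch.randn(8, 1) * 5 + 3   # batch with mean ~3, std ~5

y = bn(x)  # training mode: normalize over the batch
print(y.mean().item())               # close to 0
print(y.std(unbiased=False).item())  # close to 1
```

In eval mode the layer would instead use its running statistics, so the output would not be exactly normalized over this particular batch.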

But I don’t think I have a complete explanation for the second question.

My question about V1.0 and V2.0 is: why does V2.0 work?
It seems to use an in-place operation, which should raise an error at runtime (as mentioned in the same link).
But in my case both versions work, and I am curious why.

The in-place operation in your case does not make much of a difference. Look at it from this perspective: you have only one in-place operation and no other operations depend on it, so there is little chance of a conflict. Errors with in-place operations generally occur when multiple variables reference the same memory and we then update one of them in place, leaving the others inconsistent.
As a simple example: if x needs the value of y during backprop, but we have already updated y in place, that becomes a problem for x.
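A minimal sketch of both situations (the variable names are my own, not from the thread). The first in-place update is safe because nothing in the backward pass needs the overwritten value; the second fails because `exp` saves its output for backward:

```python
import torch

# Case 1: in-place add whose old value is not needed for backward -- works.
a = torch.randn(3, requires_grad=True)
b = a * 2           # grad w.r.t. a is the constant 2; b itself is not saved
b += 1              # safe: backward of the add doesn't need b's old value
b.sum().backward()  # runs without error; a.grad == [2., 2., 2.]

# Case 2: modifying a tensor that backward still needs -- fails.
x = torch.randn(3, requires_grad=True)
y = x.exp()         # backward of exp needs y (d/dx e^x = e^x = y)
y += 1              # overwrites the value exp saved for backward
try:
    y.sum().backward()
except RuntimeError as e:
    print("RuntimeError:", e)  # "... modified by an inplace operation"
```

Your `x += self.l3(x1)` is like case 1: no later operation depends on the pre-update value of `x`, so autograd has nothing to complain about.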

Also check the docs, which explain in-place operations in some detail.

I see.