Is there a reason why residual blocks are written with high verbosity?

Hi there.

I was trying to implement a simple residual block, and I was looking at other people's code.

One thing that really caught my eye is that the residual block, while in theory being x = f(x) + x, is usually implemented like this:

def forward(self, x):
     identity = x
     x = f(x) + identity
     return x

I was wondering why most people don't write this in the simpler form:

def forward(self, x):
     x += f(x)
     return x

The question might sound trivial or superficial, but I suspect there is a deeper reason than just semantics at work here.

Both forms return the same values, but the second one modifies the input tensor in place and returns that same tensor object, so the caller's x is overwritten as well. The first form leaves the input untouched and returns a new tensor, which is what we want.
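Here is a small sketch of that difference, in case it helps. The function f below is a hypothetical stand-in for whatever layers the block wraps (here it just doubles its input), and the two forward functions are free functions rather than module methods to keep it short. The last part shows one further consequence that I believe is relevant: autograd rejects an in-place update on a leaf tensor that requires grad.

```python
import torch


def f(x):
    # Hypothetical stand-in for the block's inner layers (assumption).
    return 2 * x


def forward_out_of_place(x):
    identity = x
    x = f(x) + identity  # builds a new tensor and rebinds the local name only
    return x


def forward_in_place(x):
    x += f(x)  # writes the result into the tensor the caller passed in
    return x


a = torch.ones(3)
out_a = forward_out_of_place(a)

b = torch.ones(3)
out_b = forward_in_place(b)

print(torch.equal(out_a, out_b))  # both forms compute the same values
print(a)                          # the caller's tensor is unchanged
print(b)                          # the caller's tensor was overwritten
print(out_b is b)                 # the in-place form returns the input object

# In-place updates on a leaf tensor that requires grad are refused by autograd.
w = torch.ones(3, requires_grad=True)
raised = False
try:
    forward_in_place(w)
except RuntimeError as err:
    raised = True
    print("autograd refused the in-place update:", err)
```

Inside a real network the input to a block is usually not a leaf tensor, so the in-place form may run without an error, but it still overwrites an activation that other code (or the backward pass) might need, which is why the explicit form is the safer habit.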
