I’m trying to understand how shortcut connections work in neural network models (e.g., ResNet) implemented in PyTorch. I am experimenting with the following ResNet block, found in a GitHub repository:
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, in_planes, planes, stride=1,
                 shortcut_enabled=True, device='cuda'):
        super(BasicBlock, self).__init__()
        self.shortcut_enabled = shortcut_enabled
        # Two 3x3 convolutions; only the first may downsample via stride.
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3,
                               stride=stride, padding=1, bias=False,
                               device=device)
        self.bn1 = nn.BatchNorm2d(planes, device=device)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,
                               stride=1, padding=1, bias=False,
                               device=device)
        self.bn2 = nn.BatchNorm2d(planes, device=device)
        if self.shortcut_enabled:
            # Identity shortcut by default; a 1x1 projection when the
            # spatial size or the channel count changes.
            self.shortcut = nn.Sequential()
            if stride != 1 or in_planes != self.expansion * planes:
                self.shortcut = nn.Sequential(
                    nn.Conv2d(in_planes, self.expansion * planes,
                              kernel_size=1, stride=stride, bias=False,
                              device=device),
                    nn.BatchNorm2d(self.expansion * planes, device=device))

    def forward(self, x, verbose=0):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        if self.shortcut_enabled:
            out += self.shortcut(x)  # in-place addition of the shortcut
        out = F.relu(out)
        return out
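For reference, here is the kind of sanity check I ran (my own snippet, not from the repository; I pass device='cpu' so it runs without a GPU):

import torch

block = BasicBlock(in_planes=16, planes=32, stride=2, device='cpu')
x = torch.randn(4, 16, 8, 8)
out = block(x)        # forward pass, including the in-place addition
out.sum().backward()  # backward pass raises no in-place error
print(out.shape)      # torch.Size([4, 32, 4, 4])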
As the snippet above shows, this block of code runs without any problem, in both training and evaluation mode. However, since it performs an in-place operation on a tensor, i.e. out += self.shortcut(x), I expected an error like "one of the variables needed for gradient computation has been modified by an inplace operation", which I have seen in other, similar models in the past.
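For instance, here is a minimal sketch of my own (not from the repository) that does raise that error, because sigmoid saves its output tensor for the backward pass and the in-place addition then modifies it:

import torch

x = torch.randn(3, requires_grad=True)
y = torch.sigmoid(x)  # sigmoid saves its output y for the backward pass
y += 1                # in-place modification of the saved tensor
y.sum().backward()    # RuntimeError: one of the variables needed for
                      # gradient computation has been modified by an
                      # inplace operation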
I would like to understand when an in-place operation is allowed in PyTorch and when it is not. Why can this ResNet block use an in-place operation and still train correctly?
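For comparison, here is an analogous sketch of my own that runs fine, which makes me suspect the rule depends on whether the modified tensor's value is saved for the backward pass:

import torch

x = torch.randn(3, requires_grad=True)
y = x * 2           # the backward of this mul only needs the constant 2
y += 1              # in-place update of y is allowed: y's value is not
                    # needed by any backward computation
y.sum().backward()  # runs without error
print(x.grad)       # tensor([2., 2., 2.])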