I’ve tried to chain ReLU and Dropout, both in place:
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Net(torch.nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            self.conv = nn.Conv2d(3, 1, 1)
            self.relu = nn.ReLU(inplace = True)
            self.dropout = nn.Dropout(inplace = True)

        def forward(self, x):
            # In-place dropout overwrites the ReLU output that ReLU's backward needs
            return self.dropout(self.relu(self.conv(x))).sum()

    model = Net()
    model.cuda()
    model.train()
    model(torch.autograd.Variable(torch.FloatTensor(1, 3, 16, 16).cuda().uniform_())).backward()
This fails with:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
It seems one could still compute the gradient of ReLU even if Dropout is applied in place afterwards: dropout just multiplies each element by a non-negative number (0 or 1/(1-p)), so kept elements retain the ReLU gating mask, and dropped elements receive a zero gradient anyway.
One can of course write a simple module that does both in a combined way, but I was wondering what your thoughts are on expressing this in PyTorch (say, by disabling dirty checking if a module is marked with some special attribute), and on the possibility of fusion when the JIT arrives?
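For reference, the combined module could look roughly like the sketch below. The names `ReLUDropoutInplaceFn` and `ReLUDropoutInplace` are made up for illustration and are not part of PyTorch. The sketch assumes `0 <= p < 1` and that the input is not a leaf tensor requiring grad (for example, it is a conv output). It saves only the final output and recovers the gate as `output > 0`, which is the observation above.

```python
import torch
import torch.nn as nn


class ReLUDropoutInplaceFn(torch.autograd.Function):
    """Hypothetical fused in-place ReLU + dropout (illustration only)."""

    @staticmethod
    def forward(ctx, x, p, training):
        x.relu_()
        scale = 1.0
        if training and p > 0.0:
            scale = 1.0 / (1.0 - p)
            # mask is 0 for dropped elements and 1/(1-p) for kept ones
            mask = torch.empty_like(x).bernoulli_(1.0 - p).mul_(scale)
            x.mul_(mask)
        ctx.scale = scale
        ctx.mark_dirty(x)          # tell autograd the input was modified in place
        ctx.save_for_backward(x)   # only the final output is kept for backward
        return x

    @staticmethod
    def backward(ctx, grad_out):
        (out,) = ctx.saved_tensors
        # out > 0 exactly where ReLU was active AND the element was kept
        gate = (out > 0).to(grad_out.dtype)
        # no gradients for the non-tensor arguments p and training
        return grad_out * gate * ctx.scale, None, None


class ReLUDropoutInplace(nn.Module):
    """Hypothetical module wrapper; assumes 0 <= p < 1."""

    def __init__(self, p=0.5):
        super().__init__()
        if not 0.0 <= p < 1.0:
            raise ValueError("p must be in [0, 1)")
        self.p = p

    def forward(self, x):
        return ReLUDropoutInplaceFn.apply(x, self.p, self.training)
```

In `Net` above, this would replace the `relu` / `dropout` pair, e.g. `self.act = ReLUDropoutInplace(p=0.5)` and `return self.act(self.conv(x)).sum()`. It does not answer the question of whether autograd itself could allow the chained version, it only works around the check.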