I’m trying to implement a custom version of ReLU that requires a bit more logic. It looks something like this:
import torch
from torch.nn import Parameter

class ReLU(torch.nn.Module):
    def __init__(self, in_features):
        super(ReLU, self).__init__()
        # Parameterization of the special ReLU layer
        self.lambdas = Parameter(torch.rand(in_features))

    def forward(self, x):
        # compute upper and lower bounds on the input rows
        # (row_bound is a helper defined elsewhere that returns (l, u))
        bounds = map(row_bound, x)
        _, epsilon_id = x.shape
        # loop over rows
        for i, (l, u), lmb in zip(range(x.shape[0]), bounds, self.lambdas):
            # if the upper bound is <= 0 the ReLU returns 0
            if u <= 0:
                # inplace op
                x[i] = x[i] * 0
            # if the bounds cross 0 the ReLU implements some custom logic
            elif l < 0:
                # append a fresh column for the new noise term
                x = torch.nn.ZeroPad2d((0, 1))(x)
                # inplace op
                x[i] = x[i] * lmb
                if lmb >= u / (u - l):
                    # inplace op
                    x[i, epsilon_id] = -l * lmb / 2
                else:
                    # inplace op
                    x[i, epsilon_id] = u * (1 - lmb)
                # inplace op
                x[i, 0] = x[i, 0] + x[i, epsilon_id]
                epsilon_id += 1
            # if neither branch ran, the lower bound is >= 0 and the ReLU is the identity
        return x
The input x comes in as an n-by-m tensor. The forward pass first computes upper and lower bounds for each row; based on those bounds it either zeroes that row, leaves it unchanged, or applies the extra logic driven by the layer parameters lambdas.
The issue is that I don’t know how to modify individual rows of x without performing in-place operations, and with those operations I get errors during backpropagation when trying to learn the layer parameters:
x[i] = x[i] * lmb
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [6]] is at version 3; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
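For what it’s worth, the error is easy to reproduce in isolation. Any op that saves its output for the backward pass (exp here) fails the same way once that output is later written to in place, because the write bumps the tensor’s version counter:

```python
import torch

w = torch.rand(3, requires_grad=True)
x = torch.ones(4, 3)

y = torch.exp(x * w)   # exp saves its output y for the backward pass
y[0] = y[0] * 2.0      # in-place row write bumps y's version counter

try:
    y.sum().backward()
    print("no error")
except RuntimeError as e:
    # "one of the variables needed for gradient computation has been
    # modified by an inplace operation ..."
    print("RuntimeError raised")
```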
My one attempt to fix this was to make whatever gets passed into the model, i.e. x, require gradients. This seems unnecessary, since I never want to compute gradients of x, and it doesn’t help anyway: I get the following error, likely because I’m directly manipulating trainable parameters
RuntimeError: leaf variable has been moved into the graph interior
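One direction I experimented with is avoiding row writes entirely: the zero / identity / scale cases can be expressed with boolean masks and torch.where, so x is never mutated and gradients flow to lambdas. What I can’t see is how to fit the epsilon-column bookkeeping from the crossing case into this pattern. A sketch, with my real bound computation replaced by a row min/max stand-in and lambdas treated as one value per row:

```python
import torch

def relu_rows_out_of_place(x, lambdas):
    # stand-in per-row bounds (my real code uses row_bound instead)
    l = x.min(dim=1).values
    u = x.max(dim=1).values

    zero = (u <= 0).unsqueeze(1)              # rows the ReLU zeroes
    cross = ((l < 0) & (u > 0)).unsqueeze(1)  # rows needing the custom logic

    scaled = x * lambdas.unsqueeze(1)         # fresh tensor, x untouched
    # pick per row: zeros, the scaled row, or the original row
    return torch.where(zero, torch.zeros_like(x),
                       torch.where(cross, scaled, x))
```

Calling `.backward()` on a sum of this output raises no in-place error and populates lambdas.grad, which is exactly the behaviour I want from the full layer.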
My only other thought is that the logic is too complex for autograd and I need to implement this ReLU as a torch.autograd.Function with a proper backward method, but I don’t want to go there if I don’t have to.
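For reference, my understanding of what that route would look like — a skeleton for a toy per-row scaling, not my actual crossing-case logic, with RowScale just an illustrative name:

```python
import torch

class RowScale(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambdas):
        # x: (n, m), lambdas: (n,) -- scale row i of x by lambdas[i]
        ctx.save_for_backward(x, lambdas)
        return x * lambdas.unsqueeze(1)

    @staticmethod
    def backward(ctx, grad_out):
        x, lambdas = ctx.saved_tensors
        grad_x = grad_out * lambdas.unsqueeze(1)      # d(out)/d(x)
        grad_lambdas = (grad_out * x).sum(dim=1)      # d(out)/d(lambdas)
        return grad_x, grad_lambdas
```

This passes torch.autograd.gradcheck for the toy case, but hand-deriving the backward for the full branching logic is what I’m hoping to avoid.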
If anyone has some insight on how to solve this problem, that would be much appreciated. Thanks!