Class method not being called?

When I run this class and call the method ALIGNet.forw(x), I get an error saying:

AttributeError: 'Net' object has no attribute 'forw'

Are there any suggestions as to why this is happening?

class Net(nn.Module):
    def __init__(self, grid_size):
        super().__init__()
        self.conv = get_conv(grid_size).to(DEVICE)
        self.flatten = nn.Flatten().to(DEVICE)
        self.linear1 = nn.Sequential(nn.Linear(80, 20), nn.ReLU()).to(DEVICE)
        self.linear2 = nn.Linear(20, 2*grid_size*grid_size).to(DEVICE)
        self.linear2.bias = nn.Parameter(init_grid(grid_size).view(-1)).to(DEVICE)
        self.linear2.weight.data.fill_(float(0))
        self.grid_offset_x = torch.tensor(float(0), requires_grad=True).to(DEVICE)
        self.grid_offset_y = torch.tensor(float(0), requires_grad=True).to(DEVICE)
        self.grid_offset_x = nn.Parameter(self.grid_offset_x)
        self.grid_offset_y = nn.Parameter(self.grid_offset_y)

    def forw(self, x):
        print(f'X gradient: {get_tensor_info(x)}')
        x = self.conv(x)
        x = self.flatten(x)
        x = self.linear1(x)
        x = self.linear2(x)
        print(f'X gradient: {get_tensor_info(x)}')
        return x

The custom forw method works for me using this minimal code snippet:

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(1, 1)

    def forw(self, x):
        x = self.lin(x)
        return x

model = Net()
x = torch.randn(1, 1)
out = model.forw(x)  # calling the custom method directly works here

(I had to remove the parts of your model that were undefined.) However, I would generally not recommend this approach; implement the standard forward method instead, so that you can call the model directly and so that e.g. hooks keep working properly.
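For example, renaming forw to forward lets you call the model instance directly, which routes the call through __call__ so registered hooks are triggered (a minimal sketch based on the snippet above):

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(1, 1)

    def forward(self, x):  # standard method name instead of forw
        return self.lin(x)

model = Net()
x = torch.randn(1, 1)
out = model(x)  # equivalent to calling forward, but also runs registered hooks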

PS: you can post code snippets by wrapping them in three backticks ```, which makes debugging easier.

Thanks a lot! I think the issue had to do with the model not being called properly. I also renamed forw to the proper forward method so that hooks aren't bypassed.
I have one more issue with the model: it turns out that none of the x tensors in the forward method retains a gradient after calling backward(), which is presumably why my network is performing so poorly. Do you notice anything in the above model that would cause such issues? Or could pre-processing of the input images (before the forward() call) influence the computational graph in any way? Truly appreciate your help @ptrblck

No, the intermediate activations won’t retain the gradient unless you explicitly ask for it:

import torch
import torch.nn as nn

lin1 = nn.Linear(1, 1)
lin2 = nn.Linear(1, 1)

# fails
x = torch.randn(1, 1)
y1 = lin1(x)
y2 = lin2(y1)

y2.backward()

print(y1.grad)
> UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations.

# works
x = torch.randn(1, 1)
y1 = lin1(x)
y1.retain_grad()
y2 = lin2(y1)

y2.backward()
print(y1.grad)
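Applied to your model, you could call retain_grad() on the intermediate activations inside forward if you want to inspect their gradients after the backward pass (a sketch; only the relevant part of forward is shown):

def forward(self, x):
    x = self.conv(x)
    x = self.flatten(x)
    x = self.linear1(x)
    x.retain_grad()  # keep the gradient of this non-leaf activation after backward()
    x = self.linear2(x)
    x.retain_grad()
    return x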

Yes: don't use the .data attribute, and don't call to() on the nn.Parameter, as it won't be a leaf node anymore; call it on the tensor instead:

self.linear2.bias = nn.Parameter(init_grid(grid_size).view(-1).to(DEVICE))

The better way would be to remove all to() operations from the __init__ and call model.to() once it’s initialized.
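A sketch of what the __init__ could look like after these changes (get_conv, init_grid, grid_size, and DEVICE are the names from your code):

class Net(nn.Module):
    def __init__(self, grid_size):
        super().__init__()
        self.conv = get_conv(grid_size)
        self.flatten = nn.Flatten()
        self.linear1 = nn.Sequential(nn.Linear(80, 20), nn.ReLU())
        self.linear2 = nn.Linear(20, 2 * grid_size * grid_size)
        # set the initial bias without calling to() on the Parameter
        self.linear2.bias = nn.Parameter(init_grid(grid_size).view(-1))
        # zero the weight without touching .data
        nn.init.zeros_(self.linear2.weight)
        self.grid_offset_x = nn.Parameter(torch.tensor(0.))
        self.grid_offset_y = nn.Parameter(torch.tensor(0.))

model = Net(grid_size).to(DEVICE)  # move all parameters to the device in one call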

Thanks a lot for your help. I fixed the part you suggested and I am still getting the same errors. I do have a pretty complex loss function, which involves calculating the loss after applying cumulative sum, upsampling, and warping operations to the output of the network. Since backward() tracks gradients from the endpoint of the loss function, would it be possible that a problem in one of these post-processing operations causes x in the forward pass to lose its gradients?

This could happen, yes.
To check it:

  • print the .grad attribute of all parameters before the first backward operation: all should show None
  • perform the forward and backward pass
  • print the .grad attribute again and make sure all used parameters show a valid gradient

If that's the case, the computation graph doesn't seem to be detached.
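Something along these lines, assuming model, criterion, x, and target are already defined in your training script:

# before the first backward pass every .grad should be None
for name, param in model.named_parameters():
    print(name, param.grad)

out = model(x)
loss = criterion(out, target)
loss.backward()

# afterwards every used parameter should show a valid (non-None) gradient
for name, param in model.named_parameters():
    print(name, param.grad)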

Yep, I'll try that and see if it works. Thanks a lot, really.