Param-=alpha*param.grad

Hello all,
I have several models that quit working after an update (I think that was the catalyst). All of the models are CNN- and MLP-based, with code very similar to the examples on the site - just different sizes and shapes for the layers and different optimizers. For some reason, all of them started failing with the same error:

unsupported operand type(s) for *: 'float' and 'NoneType'

So it looks like param.grad is None. This previously worked fine, but now I am getting the error. I recently did a system upgrade that included moving to Python 3.8.5. Before I go tarzaning around my models, I thought I would ask you all about it.
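
For reference, here is a minimal sketch (toy layer, not my actual model) of what I think is happening - a freshly built layer keeps param.grad as None until backward() runs, and multiplying a float by None gives exactly this error:

import torch
import torch.nn as nn

alpha = 0.01
layer = nn.Linear(4, 2)              # toy stand-in for one of my models

param = next(layer.parameters())
print(param.grad)                    # None - no backward() has run yet

try:
    with torch.no_grad():
        param -= alpha * param.grad  # alpha * None
except TypeError as e:
    print(e)                         # unsupported operand type(s) for *: 'float' and 'NoneType'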

Thanks

I haven’t seen this error popping up after a specific PyTorch update.
The optimizer should also skip all gradients with a None value, so I assume the error is not raised by the optimizer.
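
As an illustration (toy model, not your code), freezing a parameter so that its .grad stays None shows the difference - optimizer.step() skips it, while a manual loop over all parameters would hit the None:

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 1))
model[0].weight.requires_grad_(False)   # this weight will keep .grad == None

loss = model(torch.randn(4, 2)).sum()
loss.backward()

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
optimizer.step()                        # works: parameters with .grad == None are skipped

for name, p in model.named_parameters():
    print(name, p.grad is None)         # "0.weight True" - a manual p -= lr * p.grad would fail here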

Thanks for getting back. It is a little weird - I have dozens of models, all running on the same datasets, and I have never had a problem with them before. I even have ensembles of models with no prior problems. I am trying not to tunnel-vision on it, but the updates were substantial - system, applications, Python, etc.

So, if it is NOT the updates, any suggestions on where to look? Just for fun, here is a code snippet:

for epoch in range(epochs):
    for i, data in enumerate(train_loader,1):
        inputs, targets = data
        
        if use_cuda:             #Use this for all other optimizers
            y_pred=net_model(inputs)
            loss=loss_fn(y_pred, targets)
            time=str(datetime.now())  
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                   
        with torch.no_grad():
            counter=0
            for param in net_model.parameters():
                param-=alpha*param.grad

    loss_array[epoch]=loss.item()    #Un-comment these 6 lines if NOT using LBFGS
    if epoch%100==0:
        print(datetime.now(),"Loss for epoch:", epoch, loss.item())
    if loss.item() < 1e-8:
        print(datetime.now(),"Loss for epoch:", epoch, loss.item())
        break

Minor update - I ran models without specifying an optimizer and I get the same error.

I’m unsure why you are manipulating the parameters manually if you are already using an optimizer, but this would at least explain why this operation could fail (as said before, the optimizer would just skip such parameters).

Your code snippet works using this simple example:

import torch
import torch.nn as nn

net_model = nn.Linear(1, 1)
loss_fn = nn.MSELoss()
inputs = torch.randn(1, 1)
targets = torch.randn(1, 1)

optimizer = torch.optim.SGD(net_model.parameters(), lr=1e-3)

y_pred=net_model(inputs)
loss=loss_fn(y_pred, targets)

optimizer.zero_grad()
loss.backward()
optimizer.step()
alpha = 0.1         
with torch.no_grad():
    counter=0
    for param in net_model.parameters():
        param-=alpha*param.grad

so could you post an executable code snippet to debug the issue?
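
As an aside, if the manual update is intentional, guarding it against None gradients (which is essentially what the optimizers do internally) would avoid the error; a minimal sketch reusing the variable names from the example above:

with torch.no_grad():
    for param in net_model.parameters():
        if param.grad is not None:       # skip parameters that never received a gradient
            param -= alpha * param.grad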

Well, OK. First, let me say that I appreciate your help with this. Second - I got it working. What the update did was kill my CUDA install. Looking at the code, I realized that I had "if use_cuda" but no else statement, because I was lazy.
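
In other words, with CUDA gone, use_cuda was False, the if branch never ran, so loss.backward() was never called and every param.grad was still None when the manual update line executed. A stripped-down sketch of that failure mode (toy layer, hard-coded flag):

import torch
import torch.nn as nn

use_cuda = False                 # effectively what the broken CUDA install gave me
net_model = nn.Linear(3, 1)      # toy stand-in for the real model
alpha = 0.01

if use_cuda:
    pass                         # forward / backward / optimizer.step() lived only in this branch

with torch.no_grad():
    for param in net_model.parameters():
        print(param.grad)        # None - backward() never ran, so alpha * param.grad would raise the TypeError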

But, your comment about individual parameters has me wondering if I am doing this the best way. So here is the current working code.

use_cuda=torch.cuda.is_available()
device=torch.device("cuda:0" if use_cuda else "cpu")
x=X_train
print("Shape of x",x.size())
y=Y_labels
print("Shape of y",y.size())
print()
loss_array=torch.zeros(epochs,1)
tick=datetime.now()

loss_fn=torch.nn.MSELoss(reduction='sum')
optimizer=torch.optim.Adadelta(net_model.parameters(),lr=1.0,rho=0.9,eps=1e-06,weight_decay=0)

for epoch in range(epochs):
    for i, data in enumerate(train_loader,1):
        inputs, targets = data
        #print(inputs.size(), targets.size())
        #print(epoch, i, data[1])
        
        if use_cuda:             #Use this for all other optimizers
            #print("CUDAing")
            y_pred=net_model(inputs)
            loss=loss_fn(y_pred, targets)
            time=str(datetime.now())  
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            #print('Here')
        else:
            #print("NOT CUDAing")
            y_pred=net_model(inputs)
            loss=loss_fn(y_pred, targets)
            time=str(datetime.now())  
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            #print('Here')                
            
        with torch.no_grad():
            #counter=0
            for param in net_model.parameters():
                param-=alpha*param.grad

    loss_array[epoch]=loss.item()    #Un-comment these 6 lines if NOT using LBFGS
    if epoch%100==0:
        print(datetime.now(),"Loss for epoch:", epoch, loss.item())
    if loss.item() < 1e-8:
        print(datetime.now(),"Loss for epoch:", epoch, loss.item())
        break

Thanks again.

I’m not familiar with your use case, but the optimizer already updates all passed parameters in the optimizer.step() method. You are currently subtracting alpha * param.grad from each parameter a second time, after the parameters have already been updated.
Could you explain your use case a bit more, as I might misunderstand it?
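
For reference, without the extra subtraction the inner loop would usually just look like this (a sketch based on your snippet and variable names):

for epoch in range(epochs):
    for i, data in enumerate(train_loader, 1):
        inputs, targets = data

        y_pred = net_model(inputs)
        loss = loss_fn(y_pred, targets)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()         # Adadelta already applies the parameter update here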

You are being very kind :slight_smile: I think we have an artifact of an earlier build. Before I discovered PyTorch, I used to code everything manually, so this is probably something I simply didn’t take out or forgot about. I have been so focused on what we could get done that I may have lost sight of some basic principles. Time to RTFM, and I guess I know what I am doing with the other models…

I ran the current model without the lines you were wondering about and it seems to run fine.

BTW, the application deals with packet traffic on communication networks; we are seeing what we can do with machine learning to solve some problems there.

And again, I appreciate the time and the clarity - if you are ever heading to Rochester I’d love to buy you a beer or coffee by way of thanks.

Bruce

Ha, thanks! I’ll let you know once I’m in the area. :wink: