[SOLVED] Loss.backward hangs

Hi.

I just installed PyTorch on Linux Mint 18.3, with CUDA 9.0.

I am running a simple program (no CUDA in it).

import torch
print(torch.__version__)
import numpy as np
import matplotlib.pyplot as plt

from torch.autograd import Variable

# Training Data
def get_data():
    train_X = np.asarray([3.3,4.4,5.5,6.71,6.93,4.168,9.779,6.182,7.59,2.167,
                         7.042,10.791,5.313,7.997,5.654,9.27,3.1])
    train_Y = np.asarray([1.7,2.76,2.09,3.19,1.694,1.573,3.366,2.596,2.53,1.221,
                         2.827,3.465,1.65,2.904,2.42,2.94,1.3])
    dtype = torch.FloatTensor
    X = Variable(torch.from_numpy(train_X).type(dtype),requires_grad=False).view(17,1)
    y = Variable(torch.from_numpy(train_Y).type(dtype),requires_grad=False)
    return X,y

def plot_variable(x,y,z='',**kwargs):
    l = []
    for a in [x,y]:
        if isinstance(a, Variable):  # isinstance also matches plain Tensors on PyTorch >= 0.4
            l.append(a.data.numpy())
    plt.plot(l[0],l[1],z,**kwargs)

def get_weights():
    w = Variable(torch.randn(1),requires_grad = True)
    b = Variable(torch.randn(1),requires_grad=True)
    return w,b

def simple_network(x):
    y_pred = torch.matmul(x,w)+b
    return y_pred

def loss_fn(y,y_pred):
    loss = (y_pred-y).pow(2).sum()
    for param in [w,b]:
        if param.grad is not None:
            param.grad.data.zero_()  # clear the gradient accumulated in the previous iteration
    loss.backward()
    print("Loss.data:", loss.data)
    print("Loss.data[0]:", loss.data[0])
    return loss.data[0]


def optimize(learning_rate):
    w.data -= learning_rate * w.grad.data
    b.data -= learning_rate * b.grad.data

learning_rate = 1e-4
print("Starting")

x,y = get_data()              # x - training data, y - target values
w,b = get_weights()           # w, b - learnable parameters
for i in range(500):
    print("i:", i)
    y_pred = simple_network(x) # function which computes wx + b
    loss = loss_fn(y, y_pred)   # calculates sum of the squared differences of y and y_pred
    if i % 50 == 0: 
        print(loss)
    optimize(learning_rate)    # Adjust w,b to minimize the loss

x_numpy = x.data.numpy()
plot_variable(x,y,'ro')
plot_variable(x,y_pred,label='Fitted line')

loss.backward() hangs.

Thanks.
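
If it helps to narrow things down: when a Python process hangs like this, the standard-library faulthandler module can periodically dump the traceback of every thread, which shows exactly which call is stuck. A minimal sketch (the 60-second timeout is an arbitrary choice):

import faulthandler

# Dump all thread tracebacks to stderr if the program is still running
# after 60 seconds, and keep dumping every 60 seconds after that.
faulthandler.dump_traceback_later(60, repeat=True)

# ... run the training loop; if loss.backward() hangs, the periodic dump
# shows the frame where execution is stuck.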

One approach you can try is to run on a different GPU model, for example switching from a Titan X to a GTX 1080 Ti. That sometimes helps me.

I have no idea why, though.

Sorry, I did not see that there is no CUDA involved… then I have no idea.

I just looked at your code quickly and I don’t see where you zero the gradients.
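
For what it's worth, the posted code does clear the gradients: loss_fn zeroes w.grad and b.grad before calling loss.backward(). The more common pattern is to zero them in the training loop itself, for example through a torch.optim optimizer. A rough sketch of the same fit in that style (not the original code; x and y are assumed to come from get_data() above):

import torch

w = torch.randn(1, requires_grad=True)
b = torch.randn(1, requires_grad=True)
optimizer = torch.optim.SGD([w, b], lr=1e-4)

for i in range(500):
    optimizer.zero_grad()              # clear gradients from the previous step
    y_pred = torch.matmul(x, w) + b    # same model as simple_network
    loss = (y_pred - y).pow(2).sum()   # sum of squared errors
    loss.backward()                    # compute fresh gradients for w and b
    optimizer.step()                   # update the parameters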

Update 2: After I uninstalled PyTorch and then compiled it from source, it worked.
But after a while it hangs at loss.backward() again.

I don't understand this instability.

I was installing PyTorch 0.4. After installing 1.0, it finally works (not everything, since torch.cuda.is_available() returns False). At last.
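
In case it is useful to whoever hits this next, a quick way to check which build actually got installed and whether the GPU is visible:

import torch

print(torch.__version__)          # installed PyTorch version, e.g. 1.0.0
print(torch.version.cuda)         # CUDA version the build was compiled against; None for CPU-only builds
print(torch.cuda.is_available())  # False means PyTorch cannot see a usable GPU / driver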