The .grad attribute of a Tensor that is not a leaf Tensor is being accessed

Hi! I’m new to PyTorch and I’m learning linear regression with it.
Below is my Jupyter notebook for building the model.
In In[11] (the last code block shown below), I encountered the following problems:

  1. The .grad attribute of a Tensor that is not a leaf Tensor is being accessed…
  2. TypeError: unsupported operand type(s) for *: ‘float’ and ‘NoneType’ in line 17 of In[9].

I’ve been reading the PyTorch docs and threads about similar problems, but I’m still confused about what’s happening.
Any tips on how I can solve these problems?

import libs

In[1]

import torch
import random
import matplotlib.pyplot as plt
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

generate data and labels

Generate data with $y = xw + b + \epsilon$.
Assuming there are $n$ samples and each sample has $m$ dimensions,
$x$ is an $n \times m$ matrix, $w$ is an $m \times 1$ vector, and $b$ and $\epsilon$ are scalars.
In[2]

def generate_data(real_w, real_b, num_of_samples, num_of_demensions_per_sample):
    """
    To generate data with y = xw + b + ϵ.
 
    real_w: weight <- torch.tensor
        real_w.shape = Size([1, num_of_demensions_per_sample]) or Size([num_of_demensions_per_sample, ])
    real_b: bias <- torch.tensor or int
    num_of_samples <- int
    num_of_demensions_per_sample <- int

    return: x, y -> (torch.tensor, torch.tensor)
        x.shape = Size([num_of_samples, num_of_demensions_per_sample])
        y.shape = Size([num_of_samples, 1])
    """
    real_w = real_w.reshape((-1, 1)) # reshape() is not in-place, so assign the result back
    x = torch.normal(0, 1, (num_of_samples, num_of_demensions_per_sample)).to(device)
    # generate an n * m matrix, elements satisfy standard normal distribution
    y = (torch.matmul(x, real_w) + real_b).to(device) # calculate y without random noise
    y += torch.normal(0, 0.01, y.shape).to(device) # add random noise (in normal distribution) to y
    return x, y.reshape((-1, 1)) # return the samples together with their labels

In[3]

real_w = torch.tensor([3.2, 6.4, -1.5, 5]).to(device)
real_b = torch.tensor([4.7]).to(device)
num_of_samples=1000
num_of_demensions_per_sample=4
x, y = generate_data(real_w, real_b, num_of_samples, num_of_demensions_per_sample)
print(f"real_w.T = {real_w.T}, real_b = {real_b}")
print(f"x.shape = {x.shape}, y.shape = {y.shape}.")

data iter

receive x, y, then return iterable batches of data
In[4]

def data_iter(x, y, batch_size):
    """
    Receives x and y, then yields iterable batches of data.

    x: examples and their dimensions <- torch.tensor
    y: examples' labels <- torch.tensor
        x.shape = Size([num_of_samples, num_of_demensions_per_sample])
        y.shape = Size([num_of_samples, 1])
    batch_size: size of batch <- int

    yield: (x_batch, y_batch) -> (torch.tensor, torch.tensor)
        x_batch.shape = Size([batch_size, num_of_demensions_per_sample])
        y_batch.shape = Size([batch_size, 1])
    """
    num_of_examples = len(x)
    indexes = list(range(num_of_examples)) # generate list of indexes (from 0 to num_of_examples-1)
    random.shuffle(indexes)
    for i in range(0, num_of_examples, batch_size):
        batch_indexes = torch.tensor(indexes[i: min(i + batch_size, num_of_examples)])
        yield x[batch_indexes], y[batch_indexes]

In[5]

# test
batch_size = 10
for sample, label in data_iter(x, y, batch_size):
    print(sample)
    print(label)
    break

build model

In[6]

# params
w = torch.normal(0, 0.01, size=(num_of_demensions_per_sample, 1), requires_grad=True).to(device)
b = torch.zeros(1, requires_grad=True).to(device)

In[7]

# model: y = xw + b
def estimated(x, w, b):
    """
    returns y according to y = xw + b

    x <- torch.tensor
        x.shape = Size([a_certain_number, num_of_demensions_per_sample])
    w <- torch.tensor
        w.shape = Size([num_of_demensions_per_sample, 1])
    b <- torch.tensor
        b.shape = Size([1, ])

    return: y -> torch.tensor
        y.shape = Size([a_certain_number, 1])
    """
    return (torch.matmul(x, w) + b).to(device)

loss function

In[8]

def squared_loss(estimation, label):
    """
    loss function between estimated value and truth

    estimation <- torch.tensor
        estimation.shape = Size([a_certain_number, 1])
    label <- torch.tensor
        label.shape == estimation.shape

    return: l(estimation, label) -> torch.tensor
        l.shape = estimation.shape
    """
    return (0.5 * (estimation - label.reshape_as(estimation)) ** 2).to(device)

In[9]

# optimizer
def msgd(params, lr, batch_size):
    """
    mini-batch stochastic gradient descent

    params: [w, b] <- [torch.tensor, torch.tensor]
        w <- torch.tensor
            w.shape = Size([num_of_demensions_per_sample, 1])
        b <- torch.tensor
            b.shape = Size([1, ])
    lr: learning rate
        0 < lr < 1
    batch_size <- int
    """
    with torch.no_grad(): # no need for grad when updating params
        for param in params:
            param -= lr * param.grad / batch_size # TODO: PROBLEM OCCURRED
            param.grad.zero_()

train

In[10]

num_of_epochs = 10
net = estimated
loss = squared_loss
lr = 0.02

In[11]

for epoch in range(num_of_epochs):
    # train
    for example, label in data_iter(x, y, batch_size):
        l = loss(net(example, w, b), label)
        l.sum().backward()
        msgd([w, b], lr, batch_size)
    # estimate
    with torch.no_grad():
        train_loss = loss(net(x, w, b), y)
        print(f"epoch {epoch + 1:3d}, loss = {float(train_loss.mean()):.2f}")

Hi,
Regarding your error, I don’t see any problem in the code you provided. Can you double-check whether the error still exists after restarting your Jupyter notebook? You can also try it on Google Colab and share it here.

BTW, PyTorch provides a much simpler and more elegant way to do linear regression. The following code is an example.

import torch.nn as nn
from torch.optim import SGD

net = nn.Linear(4, 1, bias=True)
# Build your model; by default the weights and bias are initialized with Kaiming initialization,
# see the code at https://github.com/pytorch/pytorch/blob/e6a3154519721eec323b227ea8dfbd05409cfe37/torch/nn/modules/linear.py#L88

# you can also initialize the weights manually with nn.init
optimizer = SGD(net.parameters(), lr=0.02)

for epoch in range(num_of_epochs):
    # train
    for example, label in data_iter(x, y, batch_size):
        optimizer.zero_grad()                   # clear the gradients before backward()
        output = net(example)                   # forward pass
        train_loss = loss(output, label).sum()
        train_loss.backward()                   # backward pass to accumulate gradients
        optimizer.step()                        # update the model parameters

    # estimate
    with torch.no_grad():
        test_loss = loss(net(x), y)
        print(f"epoch {epoch + 1:3d}, loss = {float(test_loss.mean()):.2f}")


Hi,
Your advice really helped me a lot!
I removed all .to(device) and tried again locally and it works!
Then I ran it on Colab: the CPU version executes smoothly, but the GPU version still has the same problem.
In fact, this is my first time using PyTorch with a GPU, so maybe there’s something wrong with my code (especially for variables on the GPU), but now I realize this error may be linked to how variables are placed on the CPU and GPU, or something else.

And thank you for the idea of using torch.nn and torch.optim. To tell you the truth, I tried it before writing this notebook and it worked. But I want to implement everything from scratch to make sure I really understand the principles. Personally, I believe it’s essential for beginners to see how the network works. Although modern deep learning frameworks can do a lot of the work almost automatically, implementing things from scratch ensures that I really know what I’m doing.

.to is a computation if it changes the device, so your parameter tensors will not be leaves. Replacing .to(device) with a device=device keyword argument in the factory functions (torch.normal and torch.zeros) should make it work on the GPU as well.
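
For example, something along these lines (a minimal sketch, using the names from your In[6]):

# non-leaf: .to(device) is an operation, so this w is the result of a computation
# and its .grad stays None after backward() (hence the warning and the NoneType error)
w = torch.normal(0, 0.01, size=(num_of_demensions_per_sample, 1), requires_grad=True).to(device)

# leaf: create the tensors directly on the target device instead
w = torch.normal(0, 0.01, size=(num_of_demensions_per_sample, 1), device=device, requires_grad=True)
b = torch.zeros(1, device=device, requires_grad=True)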

Best regards

Thomas


It works! Thanks a lot!
I replaced .to(device) with the device=device keyword argument and it works.
Now I’ve learned that .to(device) is used for device conversion, while device=device specifies the device on which a tensor is or will be allocated.
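
A small sketch that illustrates the difference (assuming device is the CUDA device from In[1]):

a = torch.zeros(1, device=device, requires_grad=True)  # created on the device -> leaf tensor
c = torch.zeros(1, requires_grad=True).to(device)      # .to() changes the device -> non-leaf tensor
print(a.is_leaf, c.is_leaf)  # True False; after backward(), a.grad is populated while c.grad stays None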
You really helped me a lot!
