Understanding how torch.nn.Module works

Thanks, this is exactly what happened. In this case, it would probably be better to replace

weights = Variable(torch.zeros((2, 1)).float(), requires_grad=True)
x.mm(weights) 

by something like

weights = Variable(torch.zeros((2,)).float(), requires_grad=True)
x.mv(weights) 
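A quick shape check shows why (a minimal sketch, assuming x is the (6, 2) input matrix from the example):

import torch

x = torch.ones(6, 2)       # stand-in for the (6, 2) input matrix
w_col = torch.zeros(2, 1)  # weights as a column matrix
w_vec = torch.zeros(2)     # weights as a 1-D vector

print(x.mm(w_col).shape)   # torch.Size([6, 1]) -- 2-D column output
print(x.mv(w_vec).shape)   # torch.Size([6])    -- 1-D output, same shape as y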

Thanks a lot for your posts. I have learned a lot.

Would you please let me know which version of PyTorch you used? Your code produced one warning and one error on my machine. In particular, loss = torch.mean((net_input - y)**2) does not work as expected due to broadcasting.
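For reference, net_input has shape (6, 1) while y has shape (6,), so the subtraction silently broadcasts to a (6, 6) matrix. A minimal snippet illustrating this:

import torch

net_input = torch.zeros(6, 1)  # shape of x.mm(weights)
y = torch.zeros(6)             # shape of the target vector

print((net_input - y).shape)   # torch.Size([6, 6]) -- not the intended (6,)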

I have fixed these issues. I am new to PyTorch, so I am wondering whether this indicates that PyTorch's portability is poor, or whether I may have missed something.

Thanks a lot!

ZL

Glad that this was useful, but please note that this particular code example is ~1.5 years old (it might have been written for the very first version of PyTorch, 0.1).

In PyTorch 0.4, it’s not necessary to wrap everything into a “Variable” anymore. Below, I updated the code accordingly:

OLD:

from torch.autograd import Variable
import torch


x = Variable(torch.Tensor([[1.0, 1.0], 
                           [1.0, 2.1], 
                           [1.0, 3.6], 
                           [1.0, 4.2], 
                           [1.0, 6.0], 
                           [1.0, 7.0]]))
y = Variable(torch.Tensor([1.0, 2.1, 3.6, 4.2, 6.0, 7.0]))
weights = Variable(torch.zeros(2, 1), requires_grad=True)


for i in range(5000):

    net_input = x.mm(weights)
    loss = torch.mean((net_input - y)**2)
    loss.backward()
    weights.data.add_(-0.0001 * weights.grad.data)
    
    if loss.data[0] < 1e-3:
        break

print('n_iter', i)
print(loss.data[0])

NEW:

import torch


x = torch.tensor([[1.0, 1.0], 
                  [1.0, 2.1], 
                  [1.0, 3.6], 
                  [1.0, 4.2], 
                  [1.0, 6.0], 
                  [1.0, 7.0]], requires_grad=True)
y = torch.tensor([1.0, 2.1, 3.6, 4.2, 6.0, 7.0], requires_grad=True)
# (requires_grad=True is only strictly needed on weights, not on x and y)
weights = torch.zeros(2, 1, requires_grad=True)


for i in range(5000):

    net_input = x.mm(weights)
    # flatten the (6, 1) output to (6,) so the subtraction with y
    # does not broadcast to a (6, 6) matrix
    loss = torch.mean((net_input.view(-1) - y)**2)
    loss.backward()
    weights.data.add_(-0.0001 * weights.grad.data)
    # reset the gradient; otherwise it accumulates across iterations
    weights.grad.data.zero_()
    
    if loss.data < 1e-3:
        break

print('n_iter', i)
print(loss.data)

DIFF between OLD and NEW: the torch.autograd.Variable import and the Variable(...) wrappers are gone, tensors are created directly with torch.tensor(..., requires_grad=True), and the 0-dimensional loss is read as loss.data instead of loss.data[0].

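Since this thread is about torch.nn.Module, here is, for comparison, the same regression written with the nn API (a minimal sketch; nn.Linear and an SGD optimizer stand in for the manual weight updates above):

import torch
import torch.nn as nn

x = torch.tensor([[1.0, 1.0],
                  [1.0, 2.1],
                  [1.0, 3.6],
                  [1.0, 4.2],
                  [1.0, 6.0],
                  [1.0, 7.0]])
y = torch.tensor([1.0, 2.1, 3.6, 4.2, 6.0, 7.0]).view(-1, 1)

model = nn.Linear(2, 1, bias=False)  # the module owns the weights
optimizer = torch.optim.SGD(model.parameters(), lr=0.0001)
criterion = nn.MSELoss()

for i in range(5000):
    optimizer.zero_grad()            # reset accumulated gradients
    loss = criterion(model(x), y)    # forward pass + loss
    loss.backward()                  # compute gradients
    optimizer.step()                 # update the parameters
    if loss.item() < 1e-3:
        break

print('n_iter', i)
print(loss.item())

Here optimizer.zero_grad() handles the gradient reset that the manual version has to do by hand.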

x = torch.tensor([[1.0, 1.0],
                  [1.0, 2.1],
                  [1.0, 3.6],
                  [1.0, 4.2],
                  [1.0, 6.0],
                  [1.0, 7.0]], requires_grad=True).cuda()
y = torch.tensor([1.0, 2.1, 3.6, 4.2, 6.0, 7.0], requires_grad=True).cuda()
weights = torch.zeros(2, 1, requires_grad=True).cuda()

for i in range(5000):

    net_input = x.mm(weights)
    print(net_input)
    loss = torch.mean((net_input - y) ** 2)
    loss.backward()
    weights.data.add_(-0.0001 * weights.grad.data)

    if loss.data < 1e-3:
        break

print('n_iter', i)
print(loss.data)

I tried to run this simple example on the GPU, but when I moved all the tensors to CUDA with .cuda(), the weights were never updated and the code failed with an error: AttributeError: 'NoneType' object has no attribute 'data'.

Can anyone help?

Set the device inside the tensor creation:

device = 'cuda'
x = torch.tensor([[1.0, 1.0],
                  [1.0, 2.1],
                  [1.0, 3.6],
                  [1.0, 4.2],
                  [1.0, 6.0],
                  [1.0, 7.0]], requires_grad=True, device=device)
y = torch.tensor([1.0, 2.1, 3.6, 4.2, 6.0, 7.0], requires_grad=True, device=device)
weights = torch.zeros(2, 1, requires_grad=True, device=device)  # a leaf tensor, created directly on the GPU

net_input = x.mm(weights)
print(net_input)
loss = torch.mean((net_input - y) ** 2)
loss.backward()
weights.data.add_(-0.0001 * weights.grad.data)  # weights.grad is populated now

The cuda() call returns a new, non-leaf tensor with the same content. Gradients are only accumulated in leaf tensors, so the grad attribute of the .cuda() result stays None, which is why weights.grad.data raises the AttributeError.
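You can see this directly by checking is_leaf (a minimal check, assuming a CUDA device is available):

import torch

w_cpu = torch.ones(2, 1, requires_grad=True)  # leaf tensor
w_gpu = w_cpu.cuda()                          # result of an operation -> non-leaf

print(w_cpu.is_leaf, w_gpu.is_leaf)           # True False

loss = (w_gpu ** 2).sum()
loss.backward()
print(w_cpu.grad)   # populated: the gradient flows back to the CPU leaf
print(w_gpu.grad)   # None: non-leaf tensors do not accumulate gradients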
