Gradient is always zero when using a from_numpy tensor

I have a net that seems to work properly when I pass it a tensor created with torch.rand. However, when I pass it real data coming from a C++ library I wrote, all of the gradients come back as 0. Here’s what the code looks like:

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

import libmachine

class Net(nn.Module):
	def __init__(self):
		super(Net, self).__init__()

		self.conv1 = nn.Conv1d(libmachine.ENCODED_CARD_SIZE, libmachine.MAX_CARDS, 1)
		self.conv2 = nn.Conv1d(libmachine.MAX_CARDS, libmachine.MAX_CARDS * 2, 1)
		self.fc1 = nn.Linear(200, 1)
		self.fc2 = nn.Linear(100, 1)

	def forward(self, x):
		x = torch.unsqueeze(x, 2)
		x = self.conv1(x)
		x = F.relu(x)
		x = self.conv2(x)
		x = F.relu(x)
		x = torch.flatten(x, 1)
		x = self.fc1(x)
		x = torch.squeeze(x, 1)
		x = self.fc2(x)
		x = torch.sigmoid(x)
		return x

net = Net()
loss = nn.L1Loss()
opt = optim.SGD(net.parameters(), lr = 10.0e6)
opt.zero_grad()
print([p.grad for p in net.parameters()][0])

m = libmachine.Machine()
m.new_game()
options = m.options()
# score = net(torch.from_numpy(m.encode_board(options[0])).float())
score = net(torch.rand(libmachine.MAX_CARDS, libmachine.ENCODED_CARD_SIZE))
target = torch.tensor([1.0])
print("score: ", score, "target: ", target)
out = loss(score, target)
out.backward()
print([p.grad for p in net.parameters()][0]) 

The code as it is works correctly; however, if I switch where the data is coming from:

score = net(torch.from_numpy(m.encode_board(options[0])).float())
# score = net(torch.rand(libmachine.MAX_CARDS, libmachine.ENCODED_CARD_SIZE))

then all the gradients are suddenly 0.
Is there something special I have to do to get an input tensor from NumPy?

Hi @jdoliner,

If you’re passing data in via torch.from_numpy, it’ll have no gradients because its operations aren’t tracked by autograd. Unless I’m missing something?
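
If you want gradients to flow to the input tensor itself, you have to opt it in explicitly. A minimal sketch of what I mean, with a random array standing in for your encode_board output (which I don’t have):

import numpy as np
import torch

arr = np.random.rand(4, 8).astype(np.float32)  # stand-in for m.encode_board(options[0])
x = torch.from_numpy(arr)
print(x.requires_grad)                          # False: tensors built from NumPy don't track gradients by default
x.requires_grad_()                              # opt the tensor into autograd
y = (x * 2).sum()
y.backward()
print(x.grad)                                   # now populated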

No, that seems to match what I’m seeing. So is there an easy way to convert it into a “real” torch tensor so I can get gradients? Googling hasn’t given me a way to do it other than from_numpy.

Thanks for responding.

Can you print out what torch.from_numpy(m.encode_board(options[0])).float() is? Is this the line that fails? And does score = net(torch.rand(libmachine.MAX_CARDS, libmachine.ENCODED_CARD_SIZE)) work?

Printing out torch.from_numpy(m.encode_board(options[0])).float() gives me:

tensor([[ 2.,  0.,  0.,  ...,  0., -1., -1.],
        [ 2.,  1.,  1.,  ...,  0., -1., -1.],
        [ 1.,  2.,  0.,  ...,  0., -1., -1.],
        ...,
        [ 0.,  0.,  0.,  ...,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  ...,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  ...,  0.,  0.,  0.]])

and yes, that’s the line that fails, i.e. it gives me 0 gradients. The commented-out line works fine.

I figured it out: the issue wasn’t that the tensor was coming from NumPy; it was that the values in the tensor were too large and always landed in the saturated ends of the sigmoid. So the gradient was being computed; it just really was 0. Normalizing my input data so that the values were between 0 and 1 fixed it.
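
For anyone else who runs into this, here’s a tiny repro of the saturation. The numbers are made up, not my actual board encoding:

import torch

big = torch.tensor([50.0, 100.0], requires_grad=True)   # magnitudes like my un-normalized input
torch.sigmoid(big).sum().backward()
print(torch.sigmoid(big))   # both values come out as 1.0 -- fully saturated
print(big.grad)             # zeros: sigmoid(x) * (1 - sigmoid(x)) vanishes once the output rounds to 1.0

small = torch.tensor([0.5, 0.9], requires_grad=True)     # values after normalizing to [0, 1]
torch.sigmoid(small).sum().backward()
print(small.grad)           # non-zero gradients, so learning can actually happen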