How to print the computed gradient values for a network


#1

I want to print the gradient values before and after doing backpropagation, but I have no idea how to do it.

If I do loss.grad it gives me None.

Can I get the gradient for each weight in the model (with respect to that weight)?

Sample code:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        
        self.conv11 = nn.Conv2d(3, 64, 3, padding=1)
        self.pool1 = nn.AvgPool2d(2, 2)

        self.conv21 = nn.Conv2d(64, 64*2, 3, padding=1)
        self.pool2 = nn.AvgPool2d(2, 2)

        self.conv52 = nn.Conv2d(64*2, 10, 1)
        self.pool5 = nn.AvgPool2d(8, 8)
        
    def forward(self, x):
        
        x = F.relu(self.conv11(x))
        x = self.pool1(x)

        x = F.relu(self.conv21(x))
        x = self.pool2(x)
        
        x = self.conv52(x)
        x = self.pool5(x)
        
        x = x.view(-1, 10)
        return x
    

net = Net()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)
net.to(device)
inputs = torch.rand(4, 3, 32, 32)
labels = torch.rand(4) * 10 // 5  # random class indices in {0., 1.}
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9)
inputs = inputs.to(device)
labels = labels.to(device)

outputs = net(inputs)

loss = criterion(outputs, labels.long())

print(loss.grad)
loss.backward()
print(loss.grad)

optimizer.step()
 


#2

Before the first backward call, all grad attributes are set to None. After the first backward you should see some gradient values. Thereafter the gradients will be either zero (after optimizer.zero_grad()) or valid values.
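
A minimal sketch of that behaviour, reusing the net, inputs, labels, and criterion defined in post #1 (any leaf parameter works; I just grab the first one):

first_param = next(net.parameters())   # a leaf tensor (the first conv weight)
print(first_param.grad)                # None before the first backward

outputs = net(inputs)
loss = criterion(outputs, labels.long())
loss.backward()

print(first_param.grad.shape)          # now a tensor with the same shape as the weight
print(first_param.grad.abs().sum())    # typically non-zero after backward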


#3

I understand, but why is it not showing the gradient values :confused:
Am I doing something wrong?


# Initialization
net = Net()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
net.to(device)
# defining loss
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9)

# some random inputs and labels
inputs = torch.rand(4, 3, 32, 32)
labels = torch.rand(4) * 10 // 5  # random class indices in {0., 1.}
inputs, labels= inputs.to(device), labels.to(device)

# zero_grad
net.zero_grad()
optimizer.zero_grad()

outputs = net(inputs)
loss = criterion(outputs, labels.long())
print(loss.data)
print(loss.grad)
loss.backward()
print(loss.grad)
optimizer.step()
print(loss.grad)

Output:

tensor(2.3276, device='cuda:0')
None
None
None

(Zenghao Liu) #4

Yes, you can get the gradient for each weight in the model w.r.t. that weight, like this:

print(net.conv11.weight.grad) 
print(net.conv21.bias.grad)

The reason loss.grad gives you None is that “loss” is not in the optimizer; only “net.parameters()” is passed to the optimizer:

optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9)

And “loss” is not a leaf node in the computation graph, so you can’t add it to the optimizer directly, and its .grad attribute is not populated by default.
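
A convenience sketch (not from the original posts): a small loop over named_parameters() prints the gradient, or None, for every weight and bias in the model.

for name, param in net.named_parameters():
    # param.grad is None before the first backward, a tensor afterwards
    grad = param.grad
    print(name, None if grad is None else grad.norm().item())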


#5

Sorry for the misunderstanding. I hadn’t realized you would like to see the gradient of your loss.
In that case, @zhl515 is right, and you would need to use hooks to get the gradients w.r.t. intermediate values (i.e. tensors calculated from leaf variables).
Could you try to add loss.register_hook(lambda grad: print(grad)) before the backward call?
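
In the training snippet from post #3 that placement would look roughly like this (a sketch; the hook just prints the gradient that flows into loss during backward):

outputs = net(inputs)
loss = criterion(outputs, labels.long())

# register the hook before backward; it receives dLoss/dLoss
loss.register_hook(lambda grad: print(grad))

loss.backward()
optimizer.step()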


#6

@ptrblck When I put loss.register_hook(lambda grad: print(grad)) before loss.backward() it gives me tensor(1., device='cuda:0'). Is that what it is supposed to show? With respect to what intermediate value is it computing the gradient?

@zhl515 and @ptrblck
I have a follow up question:

print(net.conv11.weight.grad) 

lets me print the grad values for conv11.weight. If I want to set these gradient values to zero, I thought I could do this:

Temp = net.conv11.weight.grad.clone()
net.conv11.weight.grad = torch.zeros(Temp.size())

but it is throwing

RuntimeError: assigned grad has data of a different type

Can you please let me know your suggestion on that?

Thanks

Update:

I noticed that the second question is solved when I do the following :slight_smile:

net.conv11.weight.grad = torch.zeros(Temp.size()).to(device)
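
A slightly safer variant (my suggestion, not from the thread) is torch.zeros_like, which matches both the dtype and the device of the existing gradient automatically:

net.conv11.weight.grad = torch.zeros_like(net.conv11.weight.grad)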

#7

Yes, as you are calculating dLoss/dLoss = 1. Note that you can also pass this gradient directly to your backward call.
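
To illustrate that last point (a sketch, not from the thread): for a scalar loss, calling backward() with no argument is the same as seeding it with a gradient of 1.

grad_seed = torch.tensor(1.0, device=loss.device)  # dLoss/dLoss
loss.backward(grad_seed)  # equivalent to loss.backward() for a scalar loss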


#8

wow :sunglasses:
so cool!
thanks

The following code made it clearer for me; maybe it helps others too.

import torch


# Define the leaf nodes
a = torch.tensor([4.0])

weights = [torch.tensor([float(i)], requires_grad=True) for i in (2, 5, 9, 7)]

# unpack the weights for nicer assignment
w1, w2, w3, w4 = weights

b = w1 * a
c = w2 * a
d = w3 * b + w4 * c
L = (10 - d)

# each hook prints the gradient flowing into that tensor during backward
L.register_hook(lambda grad: print(grad))
d.register_hook(lambda grad: print(grad))
b.register_hook(lambda grad: print(grad))
c.register_hook(lambda grad: print(grad))
b.register_hook(lambda grad: print(grad))  # b has two hooks, so its gradient prints twice

L.backward()


for index, weight in enumerate(weights, start=1):
    gradient, *_ = weight.grad
    print(f"Gradient of w{index} w.r.t to L: {gradient}")

Output:

tensor([1.])
tensor([-1.])
tensor([-7.])
tensor([-9.])
tensor([-9.])
Gradient of w1 w.r.t to L: -36.0
Gradient of w2 w.r.t to L: -28.0
Gradient of w3 w.r.t to L: -8.0
Gradient of w4 w.r.t to L: -20.0
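
For reference (my addition, not part of the original post), these numbers agree with the chain rule worked by hand: dL/dd = -1, dL/db = dL/dd * w3 = -9, dL/dc = dL/dd * w4 = -7, and then dL/dw1 = dL/db * a = -36, dL/dw2 = dL/dc * a = -28, dL/dw3 = dL/dd * b = -8, dL/dw4 = dL/dd * c = -20.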


(Paul Gureghian) #9

Print the model’s state_dict() keys, then print the value stored under a specific key.
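
A quick sketch of that approach, again reusing the Net from post #1 (note that state_dict() holds the weight values themselves, not their gradients):

print(net.state_dict().keys())            # e.g. odict_keys(['conv11.weight', 'conv11.bias', ...])
print(net.state_dict()['conv11.weight'])  # the values stored under one key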