How to see or check the "output gradient" of each layer in PyTorch with nn.Sequential

Hello~ Guys~!

I am learning PyTorch.

I have a question about how to check the output gradient of each layer in my code.

I want to get the "output gradient squared sum" for the first and second layers!

My code is below

# Import the necessary libs
import numpy as np
import torch
import time

# Loading the Fashion-MNIST dataset
from torchvision import datasets, transforms

# Get GPU Device

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Define a transform to normalize the data
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])
# Download and load the training data
trainset = datasets.FashionMNIST('MNIST_data/', download = True, train = True, transform = transform)
testset = datasets.FashionMNIST('MNIST_data/', download = True, train = False, transform = transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size = 32, shuffle = True, num_workers=4)
testloader = torch.utils.data.DataLoader(testset, batch_size = 32, shuffle = True, num_workers=4)

# Examine a sample
dataiter = iter(trainloader)
images, labels = next(dataiter)

# Define the network architecture
from torch import nn, optim
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(784, 128),
                      nn.Linear(128, 10),
                      nn.LogSoftmax(dim = 1))

# Define the loss
criterion = nn.CrossEntropyLoss()

# Define the optimizer
optimizer = optim.Adam(model.parameters(), lr = 0.001)

# Define the epochs
epochs = 5

train_losses, test_losses = [], []

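# start = time.time()
model.to(device)  # assumed: keep the model on the same device as the batches
for e in range(epochs):
    running_loss = 0
    for images, labels in trainloader:
        # Move the batch to the selected device (assumed from the `device` defined above)
        images = images.to(device)
        labels = labels.to(device)
        # Flatten Fashion-MNIST images into a 784 long vector
        images = images.view(images.shape[0], -1)

        # Training pass (calling the model, not .forward, so that hooks run)
        output = model(images)
        loss = criterion(output, labels)
        # print(loss.grad)

        running_loss += loss.item()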

What I want is to print the squared sum of the gradient between the first layer (nn.Linear(784, 128)) and the second layer (nn.ReLU()), and to print how it changes with every epoch!

Is there a good way to achieve this? Or is there any advice on how to get the gradient of the output between the first and second layers?


Does registering a hook (e.g., torch.Tensor.register_hook, see the PyTorch documentation) work?
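To make the suggestion concrete, here is a minimal sketch of how a tensor hook could record the squared sum of the gradient at each layer's output. It assumes the same layer sizes as the question, uses a random batch in place of Fashion-MNIST, and swaps in nn.NLLLoss since the model already ends in LogSoftmax. The names grad_sq_sums and save_grad_sq_sum are made up for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Stand-in for the model in the question (same layer sizes).
model = nn.Sequential(nn.Linear(784, 128),
                      nn.Linear(128, 10),
                      nn.LogSoftmax(dim=1))
criterion = nn.NLLLoss()  # assumption: paired with LogSoftmax for this sketch

# Random batch instead of Fashion-MNIST (assumption for illustration).
images = torch.randn(32, 784)
labels = torch.randint(0, 10, (32,))

grad_sq_sums = {}  # hypothetical container for the recorded values

def save_grad_sq_sum(name):
    # Build a hook that stores sum(grad ** 2) under `name`.
    def hook(grad):
        grad_sq_sums[name] = grad.pow(2).sum().item()
    return hook

# Run the layers one by one so the intermediate outputs are accessible.
out1 = model[0](images)
out1.register_hook(save_grad_sq_sum("layer1_output"))
out2 = model[1](out1)
out2.register_hook(save_grad_sq_sum("layer2_output"))
log_probs = model[2](out2)

loss = criterion(log_probs, labels)
loss.backward()  # the hooks fire during this call

print(grad_sq_sums)
```

The hooks return nothing, so the gradients themselves are left unchanged. They only observe what flows back through out1 and out2.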

Thank you for the reply. Where do I need to place register_hook?
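One possible placement, as a rough sketch rather than the only answer: attach a forward hook to each layer of the nn.Sequential, and inside it call register_hook on that layer's output. This way the model can still be called as a whole. The example below accumulates the squared gradient sum per epoch. It assumes random batches in place of trainloader and nn.NLLLoss as the criterion. The names epoch_grad_sq, watch_output_grad, and history are hypothetical.

```python
import torch
from torch import nn

torch.manual_seed(0)

model = nn.Sequential(nn.Linear(784, 128),
                      nn.Linear(128, 10),
                      nn.LogSoftmax(dim=1))
criterion = nn.NLLLoss()  # assumption: paired with LogSoftmax for this sketch
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Hypothetical bookkeeping: one running total per watched layer index.
epoch_grad_sq = {0: 0.0, 1: 0.0}

def watch_output_grad(layer_idx):
    # Forward hook: runs on every forward pass and attaches a tensor hook
    # to this layer's output; the tensor hook fires during backward().
    def forward_hook(module, inputs, output):
        def grad_hook(grad):
            epoch_grad_sq[layer_idx] += grad.pow(2).sum().item()
        # Skip when autograd is off (e.g. evaluation under torch.no_grad()).
        if output.requires_grad:
            output.register_hook(grad_hook)
    return forward_hook

for idx in epoch_grad_sq:
    model[idx].register_forward_hook(watch_output_grad(idx))

# Random batches stand in for trainloader (assumption for illustration).
batches = [(torch.randn(32, 784), torch.randint(0, 10, (32,)))
           for _ in range(3)]

history = []  # one snapshot of the totals per epoch
for epoch in range(2):
    for key in epoch_grad_sq:
        epoch_grad_sq[key] = 0.0  # reset the totals at the start of each epoch
    for images, labels in batches:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)  # model(...) triggers the hooks
        loss.backward()
        optimizer.step()
    history.append(dict(epoch_grad_sq))
    print(f"epoch {epoch}: {epoch_grad_sq}")
```

Note that forward hooks only run when the model is called as model(images). Calling model.forward(images) directly bypasses them.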