I am building a machine learning framework and need to understand what gets stored for the backward pass, so I looked into torch.autograd.graph.saved_tensors_hooks for insight. Here is the code I ran:
```python
import torch
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self, module, name=""):
        super().__init__()
        self.module = module
        self.name = name

    def forward(self, *args, **kwargs):
        def pack_hook(tensors):
            print("in forward hook of", self.name, tensors.shape)
            return tensors

        def unpack_hook(tensors):
            print("in backward hook of", self.name, tensors.shape)
            return tensors

        with torch.autograd.graph.saved_tensors_hooks(pack_hook, unpack_hook):
            rst = self.module(*args, **kwargs)
        return rst

net = nn.Sequential(
    MyModule(nn.Linear(3, 5), "m1"),
    MyModule(nn.Linear(5, 7), "m2"),
    MyModule(nn.Linear(7, 9), "m3"),
)
x = torch.randn(2, 3)
x = net(x)
```
And here is the result:
```
in forward hook of m1 torch.Size([2, 3])
in forward hook of m2 torch.Size([5, 7])
in forward hook of m2 torch.Size([2, 5])
in forward hook of m3 torch.Size([7, 9])
in forward hook of m3 torch.Size([2, 7])
```
After inspecting the tensors with shapes [5, 7] and [7, 9], I found that they are actually the transposed weights of m2 and m3, respectively.
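For example, a minimal standalone check along these lines (my own sketch, assuming a Linear(5, 7) whose input already requires grad, similar to m2's input above) reproduces this:

```python
import torch
import torch.nn as nn

# Standalone sketch: confirm that the saved tensor with shape [5, 7]
# is the transposed weight of a Linear(5, 7).
lin = nn.Linear(5, 7)
x = torch.randn(2, 5, requires_grad=True)  # input is part of the autograd graph, like m2's input

def pack_hook(t):
    # Only compare saved tensors that have the transposed-weight shape [5, 7].
    if t.shape == lin.weight.t().shape:
        print("saved tensor equals weight.t():", torch.equal(t, lin.weight.t()))
    return t

def unpack_hook(t):
    return t

with torch.autograd.graph.saved_tensors_hooks(pack_hook, unpack_hook):
    y = lin(x)
# expected output: saved tensor equals weight.t(): True
```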
What confuses me is: why was the weight of m1, whose transpose would have shape [3, 5], not captured by saved_tensors_hooks?