I am building a machine learning framework and need to understand what gets stored for the backward pass, so I looked into torch.autograd.graph.saved_tensors_hooks for insight. Here is the code I ran:
```python
import torch
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self, module, name=""):
        super().__init__()
        self.module = module
        self.name = name

    def forward(self, *args, **kwargs):
        def pack_hook(tensors):
            print("in forward hook of", self.name, tensors.shape)
            return tensors

        def unpack_hook(tensors):
            print("in backward hook of", self.name, tensors.shape)
            return tensors

        with torch.autograd.graph.saved_tensors_hooks(pack_hook, unpack_hook):
            rst = self.module(*args, **kwargs)
        return rst

net = nn.Sequential(
    MyModule(nn.Linear(3, 5), "m1"),
    MyModule(nn.Linear(5, 7), "m2"),
    MyModule(nn.Linear(7, 9), "m3"),
)
x = torch.randn(2, 3)
x = net(x)
```
And here is the result:
```
in forward hook of m1 torch.Size([2, 3])
in forward hook of m2 torch.Size([5, 7])
in forward hook of m2 torch.Size([2, 5])
in forward hook of m3 torch.Size([7, 9])
in forward hook of m3 torch.Size([2, 7])
```
After inspecting the tensors with shapes [5, 7] and [7, 9], I found that they are actually the transposed weights of m2 and m3, respectively.
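For example, a minimal standalone check along these lines (my own sketch, assuming a Linear(5, 7) whose input already requires grad, similar to m2's input above) reproduces this:

```python
import torch
import torch.nn as nn

# Standalone sketch: confirm that the saved tensor with shape [5, 7]
# is the transposed weight of a Linear(5, 7).
lin = nn.Linear(5, 7)
x = torch.randn(2, 5, requires_grad=True)  # input is part of the autograd graph, like m2's input

def pack_hook(t):
    # Only compare saved tensors that have the transposed-weight shape [5, 7].
    if t.shape == lin.weight.t().shape:
        print("saved tensor equals weight.t():", torch.equal(t, lin.weight.t()))
    return t

def unpack_hook(t):
    return t

with torch.autograd.graph.saved_tensors_hooks(pack_hook, unpack_hook):
    y = lin(x)
# expected output: saved tensor equals weight.t(): True
```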
What confuses me is: why was the weight of m1, whose transpose would have shape [3, 5], not captured by saved_tensors_hooks?