Hi all,
My goal is to replicate the forward pass of the model manually, layer by layer, and verify the results.
Setup:
1. I first run the model end-to-end using PyTorch's forward method and save the intermediate layer inputs, weights, and outputs (for both convolutional and fully connected layers) in CSV format in a directory (original)
import torch

# Hooks to save each intermediate layer's input and output
def attach_hooks(model):
    for name, layer in model.named_modules():
        if isinstance(layer, (torch.nn.Conv2d, torch.nn.Linear, torch.nn.ReLU,
                              torch.nn.MaxPool2d, torch.nn.AdaptiveAvgPool2d,
                              torch.nn.Dropout)):
            # name=name binds the current name into the lambda's scope
            layer.register_forward_hook(
                lambda layer, inp, out, name=name: save_activation(name, layer, inp, out)
            )

# Run inference on the model
model = load_pretrained_vgg11_model()
attach_hooks(model)
output = model(input_tensor)
…
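Roughly, save_activation flattens each tensor and writes one CSV per layer and role, along these lines (a sketch with a simplified directory layout; the real script also dumps the weights of Conv2d/Linear layers):

```python
import os
import numpy as np
import torch

def save_activation(name, layer, inp, out, out_dir="original"):
    # Flatten each tensor to 2D (batch, features) and write it as CSV.
    # Sketch only -- weight/bias dumping for Conv2d/Linear is omitted here.
    os.makedirs(out_dir, exist_ok=True)
    for label, t in (("input", inp[0]), ("output", out)):
        arr = t.detach().cpu().numpy().reshape(t.shape[0], -1)
        np.savetxt(os.path.join(out_dir, f"{name}_{label}.csv"),
                   arr, delimiter=",")
```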
2. I created a second script that reads these CSV files, performs the same computations manually for each layer (e.g., convolution, ReLU, pooling, linear layers), and saves the output of each layer in another directory (layer_by_layer)
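As an illustration of one manual layer step, the convolution is recomputed with torch.nn.functional.conv2d from the saved weights and bias (in the real script these come back from CSV and are reshaped; here I use the tensors directly):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
conv = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1)
x = torch.randn(1, 3, 16, 16)

# Manual recomputation of the layer from its weights and bias
weight = conv.weight.detach()
bias = conv.bias.detach()
manual = F.conv2d(x, weight, bias, stride=1, padding=1)

# Reference: the module's own forward pass
reference = conv(x).detach()
print(torch.max(torch.abs(manual - reference)).item())  # should be ~0
```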
3. I compare the outputs from my manual layer-by-layer calculations with the original outputs saved during the initial end-to-end run
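The per-layer comparison itself is just the elementwise maximum absolute difference and maximum relative error, along these lines (a sketch; the eps guard against division by zero is my addition):

```python
import numpy as np

def compare(manual, stored, eps=1e-12):
    # Max absolute difference and max relative error between two
    # flattened layer outputs loaded from CSV.
    diff = np.abs(manual - stored)
    rel = diff / (np.abs(stored) + eps)
    return diff.max(), rel.max()

a = np.array([1.0, 2.0, 3.0])
b = np.array([1.0, 2.5, 3.0])
print(compare(a, b))  # max diff 0.5, max relative error 0.2
```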
The Problem:
- When I compare the output tensors from my manual calculations with the stored output tensors for each layer, there are significant differences, especially in deeper layers.
- The final prediction from my layer-by-layer inference is incorrect. Instead of predicting “beagle” (which was the correct prediction from the original inference), the model predicts “chain” after softmax is applied to the output of the last fully connected layer.
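(Note that softmax is monotonic, so applying it cannot change the argmax; the wrong class must already have the largest logit in the output of classifier.6. A minimal check:)

```python
import torch

logits = torch.tensor([[2.0, -1.0, 0.5]])
probs = torch.softmax(logits, dim=1)
# softmax preserves ordering, so both argmaxes agree
assert probs.argmax(dim=1) == logits.argmax(dim=1)
print(probs.argmax(dim=1).item())  # 0
```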
What I’ve Noticed:
- The maximum difference and relative errors between the computed and stored outputs grow progressively larger in deeper layers. For instance:
  - features.0: maximum difference ~13.91, relative error ~4569017
  - features.3: maximum difference ~45.15, relative error ~475987
  - classifier.6 (the final layer): maximum difference ~18.54, relative error ~219
Questions:
- Why are there significant differences in the intermediate outputs of the layers, and how might this lead to an incorrect final prediction?
- Are there any differences between how PyTorch performs computations in an end-to-end inference compared to my manual layer-by-layer implementation that could explain these discrepancies and the wrong final prediction?
- Has anyone encountered similar issues, and how can I reduce these differences to improve accuracy in the layer-by-layer manual calculations?
Any guidance on resolving these discrepancies or improving the accuracy of my manual calculations would be greatly appreciated!
Thanks in advance!