Different performance in identical architectures

I have re-implemented ResNet-18. I verified that the architecture is correct by printing my model and the actual PyTorch ResNet implementation. However, my ResNet model does not seem to learn. What could be the reason for this, even though printing the model shows that the architectures are identical?

Printing the model via print(model) shows only the created modules, not the forward implementation, and thus not how the modules are actually used.
Any functional calls (e.g. F.relu) used in the forward are also missing from the output.
You could either compare the source code directly (yours vs. the torchvision implementation) or load the same state_dict into both models and compare the output of each layer to narrow down where a difference might be coming from.
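The second suggestion can be sketched with forward hooks. The two toy models below (ModelA/ModelB are hypothetical stand-ins, not ResNet) have identical modules and identical weights, yet ModelB's forward skips the activation; print(model) would show the same output for both, but comparing per-layer outputs reveals the difference:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in models: same modules, but B's forward
# never calls the activation, so print(model) looks identical.
class ModelA(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 4)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.fc(x))

class ModelB(ModelA):
    def forward(self, x):
        return self.fc(x)  # bug: activation never applied

a, b = ModelA(), ModelB()
b.load_state_dict(a.state_dict())  # same weights in both models

# Capture each submodule's output via forward hooks.
outs = {"a": {}, "b": {}}
for tag, model in (("a", a), ("b", b)):
    for name, mod in model.named_modules():
        if name:  # skip the root module itself
            mod.register_forward_hook(
                lambda m, i, o, name=name, tag=tag:
                    outs[tag].__setitem__(name, o.detach()))

x = torch.randn(2, 4)
with torch.no_grad():
    a(x), b(x)

# Compare layer by layer to find the first divergence.
for name in outs["a"]:
    if name not in outs["b"]:
        print(f"{name}: never executed in model B")
    elif torch.allclose(outs["a"][name], outs["b"][name]):
        print(f"{name}: match")
    else:
        print(f"{name}: DIFFERS")
```

With identical weights, the fc outputs match, while the hook on act never fires in ModelB, pinpointing the forward-pass bug.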


It would be much easier to help if you could provide a minimal reproducible example.
Apart from what ptrblck said, you could be missing the proper weight initialization. Check that as well.
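For reference, a minimal sketch of the initialization scheme torchvision's ResNet applies (Kaiming-normal for convolutions, constant 1/0 for batch-norm affine parameters); the helper name init_weights and the toy model are illustrative:

```python
import torch.nn as nn

def init_weights(model: nn.Module) -> None:
    """Initialize conv and batch-norm layers the way
    torchvision's ResNet does in its constructor."""
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            nn.init.kaiming_normal_(m.weight, mode="fan_out",
                                    nonlinearity="relu")
        elif isinstance(m, nn.BatchNorm2d):
            nn.init.constant_(m.weight, 1)
            nn.init.constant_(m.bias, 0)

# Usage on a toy model (stand-in, not an actual ResNet):
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))
init_weights(model)
```

A custom re-implementation that skips this step still trains in principle (PyTorch layers have default initializations), but convergence can differ noticeably from the reference model.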


I just knew the first answer would come from ptrblck!

Thanks for the insight, the problem was in my forward block. Naive of me to not consider that!
