How to measure model loading time layer by layer

I want to profile layer-wise loading time for deep learning models in PyTorch. As shown below, I defined a timestamp list lTime and used it in forward_pre_hooks. However, sum(lTime) never matches the duration of model.cuda(), and the gap is large: for resnet18, the layer-wise loading time I measured was 2.8 ms, versus 3.5 ms for model.cuda(). Could you help me?
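
For comparison, here is a minimal sketch of one way such a measurement could be set up without hooks. It is not the code from this post: the helper names (`time_layerwise_load`, `time_whole_load`, `make_model`) and the small stand-in model are hypothetical, and it assumes that "loading" means moving each leaf module's parameters to the device with `.to(device)`. It synchronizes before and after each timed region, since CUDA copies can be asynchronous, and falls back to the CPU when no GPU is available.

```python
import time

import torch
import torch.nn as nn


def _sync(device):
    # Wait for pending CUDA work so the timer sees completed transfers.
    if device.type == "cuda":
        torch.cuda.synchronize(device)


def time_layerwise_load(model, device):
    """Move each leaf module to `device` on its own and time each move.

    Assumption: parameters registered directly on container modules
    (modules that have children) are not covered by this loop.
    """
    timings = []
    for name, module in model.named_modules():
        if len(list(module.children())) > 0:
            continue  # skip containers, time only leaf layers
        _sync(device)
        start = time.perf_counter()
        module.to(device)  # nn.Module.to moves parameters in place
        _sync(device)
        timings.append((name, time.perf_counter() - start))
    return timings


def time_whole_load(model, device):
    """Time a single model.to(device) call, for comparison."""
    _sync(device)
    start = time.perf_counter()
    model.to(device)
    _sync(device)
    return time.perf_counter() - start


def make_model():
    # Hypothetical stand-in model; swap in resnet18 if torchvision is installed.
    return nn.Sequential(
        nn.Conv2d(3, 16, 3),
        nn.BatchNorm2d(16),
        nn.ReLU(),
        nn.Flatten(),
        nn.Linear(16, 10),
    )


if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    torch.zeros(1, device=device)  # warm-up so context setup is not timed

    per_layer = time_layerwise_load(make_model(), device)
    whole = time_whole_load(make_model(), device)

    for name, seconds in per_layer:
        print(f"layer {name}: {seconds * 1e3:.3f} ms")
    print(f"sum of layers: {sum(s for _, s in per_layer) * 1e3:.3f} ms")
    print(f"whole model:   {whole * 1e3:.3f} ms")
```

Even with this setup the two totals need not agree exactly: each timed region adds its own Python and synchronization overhead, and a single whole-model call walks the module tree only once.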