You would probably have to change the code a bit and skip the creation of the nn.Sequential modules (line of code).
Do you need x for the training, i.e. are you feeding it to another layer, or is it just for debugging purposes?
If the latter is true, you could use hooks to get the activation.
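For example, something along these lines should work as a minimal sketch (the hooked layer and the input shape are just placeholders; pick whichever submodule you need):

```python
import torch
import torchvision.models as models

# dict to store the activations captured by the hooks
activation = {}

def get_activation(name):
    def hook(module, input, output):
        # detach is fine here, since the activation is only used for inspection
        activation[name] = output.detach()
    return hook

model = models.resnet50()
# example: grab the output of bn3 in the first Bottleneck of layer1
model.layer1[0].bn3.register_forward_hook(get_activation('layer1.0.bn3'))

x = torch.randn(1, 3, 224, 224)
out = model(x)
print(activation['layer1.0.bn3'].shape)
```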
Thanks for your reply!
Yes, I need x for training. I've considered hooks, but they don't seem to fit my purpose.
Can you elaborate more about the modification?
Sure. Let’s first see if we can scale down the problem.
Would it be sufficient if we just use Bottleneck modules, or do you need the flexibility of BasicBlock/Bottleneck?
Also, do you need the different ResNet architectures, i.e. resnet18, resnet34, ... or can we focus on just one implementation?
OK.
I want to compute a regularization term using x or out, i.e. torch.norm(x) or torch.norm(out), then sum these norms over all Bottleneck modules and finally add the result to the loss as a regularizer.
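Roughly something like this sketch, where the norms are collected via forward hooks on every Bottleneck (the 1e-4 weight and the fake data/target are just placeholders):

```python
import torch
import torch.nn as nn
import torchvision.models as models
from torchvision.models.resnet import Bottleneck

model = models.resnet50()
criterion = nn.CrossEntropyLoss()

# collect torch.norm(out) of every Bottleneck via forward hooks;
# no detach, since the norms have to stay in the computation graph
reg_terms = []

def norm_hook(module, input, output):
    reg_terms.append(torch.norm(output))

for module in model.modules():
    if isinstance(module, Bottleneck):
        module.register_forward_hook(norm_hook)

data = torch.randn(2, 3, 224, 224)
target = torch.randint(0, 1000, (2,))

reg_terms.clear()                      # reset before every forward pass
output = model(data)
reg_loss = torch.stack(reg_terms).sum()
loss = criterion(output, target) + 1e-4 * reg_loss   # arbitrary weight
loss.backward()
```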
Cool, registering the hook on the shortcut does help. So if I want to pass out any feature map inside the bottleneck, I just register the hook after the specific layer, am I right?
i.e., register_forward_hook on Bottleneck.bn3 would give me the final feature map of the Bottleneck?
But one more question: what if I want this out before this line, i.e. right after computing out += residual?
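One possible way to get that tensor (just a sketch based on torchvision's Bottleneck; the class name and the post_add attribute are made up here) would be to subclass the block and store the value right after the addition:

```python
import torch
from torchvision.models.resnet import Bottleneck, ResNet

class BottleneckPostAdd(Bottleneck):
    """Bottleneck that also keeps the feature map right after the residual addition."""
    def forward(self, x):
        residual = x
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        if self.downsample is not None:
            residual = self.downsample(x)
        out = out + residual
        self.post_add = out            # tensor right after out += residual
        return torch.relu(out)         # non-inplace relu, so post_add is not overwritten

model = ResNet(BottleneckPostAdd, [3, 4, 6, 3])   # resnet50-style layout
x = torch.randn(1, 3, 224, 224)
out = model(x)
post_add_maps = [m.post_add for m in model.modules() if isinstance(m, BottleneckPostAdd)]
```

Note that storing the tensor as a module attribute like this won't survive nn.DataParallel, since the replicas are discarded after the forward pass, so in that setup you would have to collect the tensors differently (e.g. via hooks or by returning them from forward).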
ptrblck, I've run into a troublesome bug.
Since the feature maps of each layer are distributed across different GPUs when using DataParallel, how can I add them up (or apply other manipulations)? I get the error: RuntimeError: arguments are located on different GPUs.
I've searched for this problem but couldn't find a solution.
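One possible workaround (just a sketch, reusing the reg_terms, criterion, output and target names from the sketch above; the target device is an assumption) is to move every collected tensor to a common device before combining them:

```python
import torch

# under nn.DataParallel the hooked activations/norms can live on different GPUs,
# so move them to a common device (e.g. the device of the main loss) before summing
device = torch.device('cuda:0')
reg_loss = torch.stack([t.to(device) for t in reg_terms]).sum()
loss = criterion(output, target) + 1e-4 * reg_loss
```

Alternatively, you could compute the norms inside the model's forward and return them together with the output, so that nn.DataParallel gathers them onto the output device for you.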