An easier way to get the intermediate activations would be to use forward hooks as described here.
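A minimal sketch of the hook approach, assuming a `torchvision` `resnet18` just as a placeholder (swap in your own model and layer name):

```python
import torch
import torchvision.models as models

# dict to store the activations captured by the hooks
activations = {}

def get_activation(name):
    def hook(module, input, output):
        activations[name] = output.detach()
    return hook

model = models.resnet18()
# register the hook on the layer whose output you want to inspect
model.layer4.register_forward_hook(get_activation('layer4'))

x = torch.randn(1, 3, 224, 224)
out = model(x)
print(activations['layer4'].shape)
# torch.Size([1, 512, 7, 7])
```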
I think you are seeing a flattened tensor, since you are only replacing the modules, while the functional API calls in the `forward` (such as `x = x.view(x.size(0), -1)`) are still being used.
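To illustrate with a toy model (the layer shapes are made up):

```python
import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3, padding=1)
        self.fc = nn.Linear(8 * 4 * 4, 10)

    def forward(self, x):
        x = self.conv(x)
        # functional call: not a registered module, so it survives
        # any module replacement and will still flatten the activation
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        return x

model = MyModel()
out = model(torch.randn(1, 3, 4, 4))
```

If you want the flattening to be replaceable as well, you could use an `nn.Flatten()` module instead of the functional call.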