The purpose of the aten::size → prim::ListConstruct → aten::view triplet

hidefromkgb · March 7, 2021, 8:36pm

While exploring a torch.jit.trace() of the standard ResNet50 from torchvision.models, I stumbled upon a peculiar structure at the very end of the network:

       <...>
         |
 ________v_________        ________             _________
|                  |      |        |           |         |
| aten::avg_pool2d |      | Int(0) |           | Int(-1) |
|__________________|      |________|           |_________|
         |                    |                     |
         |               _____v______     __________v__________
         |              |            |   |                     |
         |--------------> aten::size |---> prim::ListConstruct |
         |              |____________|   |_____________________|
         |                                          |
         |               ____________               |
         |              |            |              |
         `--------------> aten::view <--------------'
                        |____________|
                              |
                              v             _________
                       #==============# <--|_weights_|
                       # aten::linear #     _________
                       #==============# <--|_biases__|

What exactly is it needed for?
Why can't aten::avg_pool2d be plugged directly into aten::linear?

At the moment I'm not very well-versed in PyTorch, but in the ML frameworks I'm familiar with, nothing prevents such a direct connection between 2D average pooling and an inner product layer.
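For anyone who wants to reproduce the pattern, here is a minimal sketch that traces two toy heads shaped like the end of a ResNet. The class names ViewHead and FlattenHead, the 8 channels, 3 classes and 4x4 input are hypothetical stand-ins, not torchvision code; the assumption is that the first head flattens with view (as in the trace above) and the second with torch.flatten.

```python
import torch
import torch.nn as nn


class ViewHead(nn.Module):
    """Hypothetical toy head: pool, then flatten with view, then linear."""

    def __init__(self, channels: int = 8, classes: int = 3):
        super().__init__()
        self.pool = nn.AvgPool2d(4)           # 4x4 feature map -> 1x1
        self.fc = nn.Linear(channels, classes)

    def forward(self, x):
        x = self.pool(x)                      # (N, C, 1, 1)
        x = x.view(x.size(0), -1)             # (N, C); traced as size/ListConstruct/view
        return self.fc(x)


class FlattenHead(ViewHead):
    """Same head, but flattening with torch.flatten instead of view."""

    def forward(self, x):
        x = self.pool(x)                      # (N, C, 1, 1)
        x = torch.flatten(x, 1)               # (N, C); traced as aten::flatten
        return self.fc(x)


example = torch.randn(2, 8, 4, 4)             # (N, C, H, W)
view_graph = str(torch.jit.trace(ViewHead(), example).graph)
flat_graph = str(torch.jit.trace(FlattenHead(), example).graph)
print(view_graph)                             # shows the size -> ListConstruct -> view chain
print(flat_graph)                             # shows a single aten::flatten node instead
```

The exact node list printed can differ between PyTorch versions (for example, extra prim::NumToTensor/aten::Int nodes around aten::size), but the size/ListConstruct/view chain comes from the single view call.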

tom · March 8, 2021, 1:51pm

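I think with more recent PyTorch/TorchVision you'd get aten::flatten instead. This is a traced version of torch.view(x, [x.size(0), -1]) (in actual code x.view(x.size(0), -1), since view is a Tensor method), used to go from shape (n, c, h, w) to (n, features).
Keras does this flattening implicitly when you use GlobalAveragePooling2D, because that layer actually removes the spatial dimensions. PyTorch's pooling does not: it keeps them as size-1 dimensions, so the output is (n, c, 1, 1), and aten::linear, which operates on the last dimension, cannot consume that directly.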
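A small sketch of the shapes involved, assuming ResNet50-like sizes (2048 channels on a 7x7 grid for a 224x224 input) and an arbitrary batch of 2. It shows that the pooled tensor still has its two size-1 spatial dimensions, that nn.Linear rejects it, and that the view-based flattening, torch.flatten and a Keras-style global mean over H and W all give the same (n, features) tensor.

```python
import torch
import torch.nn as nn

# Assumed ResNet50-like sizes; the batch size of 2 is arbitrary.
features = torch.randn(2, 2048, 7, 7)         # (n, c, h, w)
pooled = nn.AvgPool2d(7)(features)            # spatial dims kept as size 1
fc = nn.Linear(2048, 1000)

print(pooled.shape)                           # torch.Size([2, 2048, 1, 1])

# nn.Linear multiplies along the LAST dimension, which is 1 here rather
# than 2048, so feeding the pooled tensor in directly raises an error.
try:
    fc(pooled)
    direct_ok = True
except RuntimeError:
    direct_ok = False

flat_view = pooled.view(pooled.size(0), -1)   # what the traced triplet computes
flat_new = torch.flatten(pooled, 1)           # the aten::flatten equivalent
gap = features.mean(dim=(2, 3))               # Keras-style global average pooling
logits = fc(flat_view)                        # now the shapes line up

print(flat_view.shape)                        # torch.Size([2, 2048])
```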

hidefromkgb · March 8, 2021, 2:08pm

torch.view(x, [x.size(0), -1])

Do I understand correctly that this just tells view to keep the batch dimension and merge all the remaining dimensions of x into one?
Damn, this explains so much. Thank you!

tom · March 8, 2021, 3:36pm

Yes, this is exactly what it does.
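A quick sketch confirming that reading on a small made-up tensor: the first dimension is kept, the -1 is inferred as the product of the remaining sizes (3 * 4 * 5 = 60), and each row holds one sample's values in row-major order, which is the same result as torch.flatten(x, 1).

```python
import torch

x = torch.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)  # (N, C, H, W) toy tensor
y = x.view(x.size(0), -1)     # keep N, merge C, H, W; -1 is inferred as 60
print(y.shape)                # torch.Size([2, 60])

# Row i of y is sample i's values laid out in row-major order.
row_matches = torch.equal(y[1], x[1].reshape(-1))
same_as_flatten = torch.equal(y, torch.flatten(x, 1))
```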