If I understand correctly, resnext50_32x4d means a ResNet-50 with 32 groups (cardinality), where each group has 4 output channels (bottleneck width per group). However, the torchvision implementation appears to have 4 groups with 32 output channels each.
Can someone help me confirm this?
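
For reference, here is a minimal sketch of the group arithmetic, so the numbers can be checked without loading the model. It assumes torchvision's bottleneck block computes the grouped 3x3 convolution width as `int(planes * (width_per_group / 64)) * groups`, and that resnext50_32x4d is built with `groups=32` and `width_per_group=4`. The helper name `bottleneck_width` is hypothetical and is not part of torchvision.

```python
def bottleneck_width(planes, groups=32, width_per_group=4):
    """Total channels of the grouped 3x3 conv in one bottleneck block.

    Assumption: mirrors the formula used by torchvision's Bottleneck,
    width = int(planes * (base_width / 64)) * groups.
    """
    return int(planes * (width_per_group / 64.0)) * groups


GROUPS = 32  # assumed cardinality for resnext50_32x4d

# planes for the four residual stages of a ResNet-50-style network
for stage, planes in enumerate([64, 128, 256, 512], start=1):
    width = bottleneck_width(planes, groups=GROUPS)
    per_group = width // GROUPS
    print(f"layer{stage}: conv channels={width}, groups={GROUPS}, "
          f"channels per group={per_group}")
```

Under these assumptions the group count stays at 32 in every stage, and the 4 channels per group apply only to the first stage, doubling in each later stage. If torchvision is installed, the direct check would be to construct the model and read the `groups` attribute of a grouped convolution, for example `model.layer1[0].conv2.groups`.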