Inception Block Implementation

I'm still unsure why the inception module takes some parts from Figure 4 and some from Figure 5 of the paper. From the code, it looks like the InceptionA module has a 5x5 branch as in Figure 4 (the original Inception v1 / GoogLeNet module), while its double-3x3 branch follows Figure 5 (where the 5x5 is factorized into two 3x3s) -- see the sketch below.
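For reference, here is roughly what the InceptionA branches look like as I read torchvision's inception.py (a paraphrase from memory, not a verbatim copy -- channel counts and the BasicConv2d helper should be checked against the source):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BasicConv2d(nn.Module):
    """Conv + BatchNorm + ReLU, as in torchvision's inception.py."""
    def __init__(self, in_channels, out_channels, **kwargs):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, bias=False, **kwargs)
        self.bn = nn.BatchNorm2d(out_channels, eps=0.001)

    def forward(self, x):
        return F.relu(self.bn(self.conv(x)), inplace=True)


class InceptionA(nn.Module):
    def __init__(self, in_channels, pool_features):
        super().__init__()
        self.branch1x1 = BasicConv2d(in_channels, 64, kernel_size=1)
        # 5x5 branch -- this is the Figure 4 (Inception v1 / GoogLeNet) style
        self.branch5x5_1 = BasicConv2d(in_channels, 48, kernel_size=1)
        self.branch5x5_2 = BasicConv2d(48, 64, kernel_size=5, padding=2)
        # double-3x3 branch -- this is the Figure 5 style (5x5 factorized into two 3x3s)
        self.branch3x3dbl_1 = BasicConv2d(in_channels, 64, kernel_size=1)
        self.branch3x3dbl_2 = BasicConv2d(64, 96, kernel_size=3, padding=1)
        self.branch3x3dbl_3 = BasicConv2d(96, 96, kernel_size=3, padding=1)
        self.branch_pool = BasicConv2d(in_channels, pool_features, kernel_size=1)

    def forward(self, x):
        branch1x1 = self.branch1x1(x)
        branch5x5 = self.branch5x5_2(self.branch5x5_1(x))
        branch3x3dbl = self.branch3x3dbl_3(self.branch3x3dbl_2(self.branch3x3dbl_1(x)))
        branch_pool = self.branch_pool(F.avg_pool2d(x, kernel_size=3, stride=1, padding=1))
        return torch.cat([branch1x1, branch5x5, branch3x3dbl, branch_pool], 1)


# e.g. the first A-block: 64 + 64 + 96 + 32 = 256 output channels
out = InceptionA(192, pool_features=32)(torch.randn(1, 192, 35, 35))
print(out.shape)  # torch.Size([1, 256, 35, 35])
```

So a single block really does mix a raw 5x5 branch (Figure 4) with a factorized double-3x3 branch (Figure 5), which is what prompted the question.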

I also found some other discrepancies. For example, Table 1 in the paper specifies a 3x3 / stride=1 conv (input 73x73x64) right after the first pool, but in the code the layer after the first pool, Conv2d_3b_1x1, uses a 1x1 kernel, not a 3x3 -- see the stem sketch below.
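Here is a minimal sketch of the stem as I read it from the code (layer names in the comments match torchvision; the shape annotations are mine, and I've used plain Conv2d instead of torchvision's BasicConv2d for brevity -- please verify against the source):

```python
import torch
import torch.nn as nn

stem = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=2),    # Conv2d_1a_3x3: 299x299x3 -> 149x149x32
    nn.Conv2d(32, 32, kernel_size=3),             # Conv2d_2a_3x3: -> 147x147x32
    nn.Conv2d(32, 64, kernel_size=3, padding=1),  # Conv2d_2b_3x3: -> 147x147x64
    nn.MaxPool2d(kernel_size=3, stride=2),        # 1st pool: -> 73x73x64
    nn.Conv2d(64, 80, kernel_size=1),             # Conv2d_3b_1x1: 1x1 here, not Table 1's 3x3/1
    nn.Conv2d(80, 192, kernel_size=3),            # Conv2d_4a_3x3: -> 71x71x192
    nn.MaxPool2d(kernel_size=3, stride=2),        # -> 35x35x192, same endpoint as Table 1
)
print(stem(torch.randn(1, 3, 299, 299)).shape)  # torch.Size([1, 192, 35, 35])
```

Interestingly, both versions end up at 35x35x192: the code gets there via a 1x1 conv, a 3x3 conv, and a pool, while Table 1 uses two 3x3 convs (the second with stride 2). So the stems are functionally different but shape-compatible, which would be consistent with the code mirroring Google's released checkpoint rather than the table.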

Could someone please explain the idea behind these choices? I'm assuming this inception_v3 has to match Google's implementation, since the pretrained weights were converted directly from Google's. In that case, how do we explain the differences between Google's implementation and the architecture reported in their paper?

cc: @smth