Question re: torch.nn.Dropout2d

I was going through the PyTorch recipe “Defining a Neural Network in PyTorch,” and I didn’t understand what the torch.nn.Dropout2d function does or what its purpose is in Step 2 of the recipe, where it shows how to define and initialize the neural network.

Question 1: The comments say that torch.nn.Dropout2d is “Designed to ensure that adjacent pixels are either all 0s or all active with an input probability”. What “adjacent pixels” is this referring to, and what is the purpose of making them either all 0s or all active? Also, what is the purpose of giving them a probability?

Question 2: I also don’t understand where the “9216” (i.e. the highlighted number) comes from as the first parameter of nn.Linear where self.fc1 is defined. The second convolutional layer outputs 64 channels, but the input to the first fully connected layer has 9216 features, so I don’t see the connection. I’m assuming what I’m missing is whatever torch.nn.Dropout2d is doing, but perhaps not; I’m not sure.

Any guidance on both of my above questions would be so appreciated, thank you!

  1. “Adjacent pixels” refers to the values of an entire channel: nn.Dropout2d zeroes out whole channels at once, while nn.Dropout drops values at random individual positions, as seen here:
import torch
import torch.nn as nn

# input of shape [batch_size, channels, height, width]
x = torch.randn(2, 2, 4, 4)
drop = nn.Dropout2d()
out = drop(x)
print(out)
> tensor([[[[ 0.0000,  0.0000, -0.0000, -0.0000],
            [-0.0000, -0.0000,  0.0000, -0.0000],
            [-0.0000,  0.0000,  0.0000,  0.0000],
            [-0.0000, -0.0000,  0.0000,  0.0000]],

           [[-1.7700,  0.5395,  1.1095,  1.5484],
            [-1.5528, -0.6495,  1.5294,  0.6949],
            [-1.5919,  0.3380,  2.6201,  2.0743],
            [-1.5087,  0.5487, -0.4077,  1.1598]]],


          [[[-5.5042, -0.3527, -0.1202,  1.6333],
            [ 1.9476, -1.1323,  1.2164, -1.9838],
            [ 1.9263,  1.0842, -1.4239, -0.8705],
            [ 2.7384,  1.5202,  2.0018, -1.3804]],

           [[ 0.0000, -0.0000,  0.0000, -0.0000],
            [ 0.0000,  0.0000, -0.0000,  0.0000],
            [ 0.0000,  0.0000,  0.0000,  0.0000],
            [-0.0000,  0.0000,  0.0000,  0.0000]]]])

# nn.Dropout, in contrast, drops individual values rather than whole channels
drop = nn.Dropout()
out = drop(x)
print(out)
> tensor([[[[ 1.3324,  0.0000, -0.0000, -0.0000],
            [-0.9783, -0.0000,  0.0000, -4.8274],
            [-0.0000,  2.1448,  0.1935,  0.0000],
            [-0.6435, -0.0000,  2.8480,  0.2524]],

           [[-0.0000,  0.0000,  0.0000,  1.5484],
            [-1.5528, -0.6495,  0.0000,  0.0000],
            [-0.0000,  0.0000,  2.6201,  0.0000],
            [-0.0000,  0.0000, -0.4077,  0.0000]]],


          [[[-0.0000, -0.3527, -0.1202,  0.0000],
            [ 0.0000, -0.0000,  0.0000, -0.0000],
            [ 0.0000,  1.0842, -1.4239, -0.8705],
            [ 2.7384,  1.5202,  0.0000, -0.0000]],

           [[ 2.2113, -0.5281,  3.7269, -1.7598],
            [ 0.0000,  0.0000, -0.0000,  0.0000],
            [ 3.0427,  0.0000,  0.0000,  0.0000],
            [-0.0000,  0.4729,  0.0000,  0.0000]]]])

The purpose is best explained in the Dropout paper, which describes how dropout helps avoid “co-adaptation” of features. The specified probability is a hyperparameter and gives the drop probability.
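
As a minimal sketch of the probability argument (the p=0.25 below is an arbitrary choice; the default for both nn.Dropout and nn.Dropout2d is 0.5):
import torch
import torch.nn as nn

x = torch.randn(2, 2, 4, 4)

# p is the probability of zeroing each channel; surviving values are scaled by 1 / (1 - p) during training
drop = nn.Dropout2d(p=0.25)

drop.train()   # dropout is only active in training mode
print(drop(x))

drop.eval()    # in eval mode the module is a no-op and returns the input unchanged
print(drop(x))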

  2. The in_features of a linear layer is defined by the number of features of the flattened input activation. Since nn.Conv2d layers output a 4-dimensional tensor in the shape [batch_size, channels, height, width], you would set the in_features of the linear layer to channels*height*width and flatten the activation via nn.Flatten() or manually via x = x.view(x.size(0), -1).
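
To make the 9216 concrete: if the recipe uses the usual MNIST setup (a 1x28x28 input, two 3x3 convolutions with stride 1, and a 2x2 max pool before flattening; I’m going from memory here, so double-check against the recipe), the spatial size shrinks 28 -> 26 -> 24 -> 12, and 64 * 12 * 12 = 9216:
import torch
import torch.nn as nn
import torch.nn.functional as F

conv1 = nn.Conv2d(1, 32, 3, 1)
conv2 = nn.Conv2d(32, 64, 3, 1)

x = torch.randn(1, 1, 28, 28)   # [batch_size, channels, height, width]
x = F.relu(conv1(x))            # -> [1, 32, 26, 26]
x = F.relu(conv2(x))            # -> [1, 64, 24, 24]
x = F.max_pool2d(x, 2)          # -> [1, 64, 12, 12]
x = torch.flatten(x, 1)         # -> [1, 9216]
print(x.shape)                  # torch.Size([1, 9216])

fc1 = nn.Linear(9216, 128)      # in_features = 64 * 12 * 12 = 9216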