Does torch.squeeze() affect the gradients?

I modified the WaveNet architecture for binary classification:

import torch
import torch.nn as nn

class CustomConv(nn.Module):
    def __init__(self):
        super(CustomConv, self).__init__()
        self.conv1 = nn.Conv1d(in_channels=1, out_channels=256, kernel_size=3, padding=1)
        # num_blocks and ConvBlock are defined elsewhere (not shown here)
        self.blocks = self.build_conv_block(num_blocks, 256)
        self.conv2 = nn.Conv1d(in_channels=256, out_channels=128, kernel_size=1)
        self.conv3 = nn.Conv1d(in_channels=128, out_channels=1, kernel_size=1)
        self.act = nn.ReLU()
        self.linear = nn.Linear(2000, 1)
        self.sigmoid = nn.Sigmoid()

    def build_conv_block(self, num_layers, num_channels):
        block = []
        for _ in range(num_layers):
            for i in range(12):
                block.append(ConvBlock(num_channels, 2**i))
        return nn.Sequential(*block)

    def forward(self, x):
        x = self.conv1(x)
        x = self.act(x)
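        # the blocks are called with a tuple; only the second returned element is kept
        _, x = self.blocks((x, 0))
        x = self.act(x)
        x = self.conv2(x)
        x = self.act(x)
        x = self.conv3(x)
        # removes every size-1 dimension (including the batch dimension if batch_size == 1)
        x = torch.squeeze(x)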
        x = self.linear(x)
        x = self.sigmoid(x)

        return x

My model hasn’t been training very well, and I was wondering if torch.squeeze() could have any (negative) effect on the gradients produced during backprop, or any effect on training in general. Is there a better practice for something like this? Any ideas?

Usually you would flatten the activation via

x = x.view(x.size(0), -1)

to create a 2-dimensional tensor out of the conv output (4-dimensional for a 2D conv layer).
torch.squeeze(x) removes every dimension of size 1, so the number of dimensions it removes varies with the input shape.
Assuming that you are using a batch_size>1, at least the channel dimension will be removed.
For a 2D conv output this would yield an activation of [batch_size, h, w], which is then passed to the linear layer.
nn.Linear takes an input of shape [batch_size, *, in_features], where * denotes additional dimensions. The linear layer is applied to each of these additional dimensions, and I’m not sure if that’s what you really want in this case.
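
To make the shape behaviour concrete, here is a minimal sketch. It assumes the last conv layer emits a tensor of shape [batch_size, 1, 2000] (inferred from the Conv1d layers and nn.Linear(2000, 1) in the question); the tensors are random stand-ins, not the real model's activations.

```python
import torch
import torch.nn as nn

linear = nn.Linear(2000, 1)

# Stand-ins for the conv3 output: [batch_size, channels=1, length=2000] (assumed shape).
batch = torch.randn(4, 1, 2000)
single = torch.randn(1, 1, 2000)

# squeeze() drops every size-1 dimension, so the result depends on the batch size.
squeezed_batch_shape = tuple(torch.squeeze(batch).shape)    # (4, 2000)
squeezed_single_shape = tuple(torch.squeeze(single).shape)  # (2000,): batch dim is gone too

# view() keeps the batch dimension regardless of its size.
flat_batch_shape = tuple(batch.view(batch.size(0), -1).shape)     # (4, 2000)
flat_single_shape = tuple(single.view(single.size(0), -1).shape)  # (1, 2000)

# nn.Linear only transforms the last dimension; any extra leading dims are kept.
out_3d_shape = tuple(linear(batch).shape)  # (4, 1, 1)

print(squeezed_batch_shape, squeezed_single_shape)
print(flat_batch_shape, flat_single_shape, out_3d_shape)
```

Under these assumptions the two approaches only differ when batch_size is 1, which can still happen for the last, smaller batch of an epoch.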

Could you replace the squeeze with the view operation and check the results again?

Thanks for your response!

I think you’re right – view is better than squeeze in this case as my batch_size>1. So thank you for pointing this out.

When I retrained the model with view, interestingly enough, I got the same accuracy as what I observed with squeeze. I’m not sure why this was the case (any ideas?), but nonetheless thank you for helping me make my code better reflect my intentions.
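
One possible explanation, sketched below: if the conv3 output has shape [batch_size, 1, 2000] and every batch has more than one sample, squeeze and view hand the same [batch_size, 2000] tensor to the linear layer, so outputs and gradients match. The shape and the random input are assumptions for illustration, not taken from the actual training run.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
linear = nn.Linear(2000, 1)

# Stand-in for the conv3 output with batch_size=4 (assumed shape [4, 1, 2000]).
x = torch.randn(4, 1, 2000, requires_grad=True)

# Path A: squeeze, as in the original forward().
out_squeeze = torch.sigmoid(linear(torch.squeeze(x)))
grad_squeeze, = torch.autograd.grad(out_squeeze.sum(), x)

# Path B: explicit flatten with view.
out_view = torch.sigmoid(linear(x.view(x.size(0), -1)))
grad_view, = torch.autograd.grad(out_view.sum(), x)

# Both paths reshape without changing any values, so results should agree.
same_output = torch.allclose(out_squeeze, out_view)
same_grad = torch.allclose(grad_squeeze, grad_view)
print(same_output, same_grad)
```

Both operations only change the shape of the tensor, so autograd passes the gradient back through them unchanged apart from reshaping it to the input's shape.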

Are you using nn.BCELoss as the criterion?

I would assume the model to work quite differently if you pass a 2-dimensional tensor vs. a 3-dimensional one into nn.Linear.

Yes, I am using nn.BCELoss as the criterion.

Yeah, I’m not sure why the model gives me the same accuracy. Perhaps I made a mistake – let me double check and retrain both models and get back to you if the accuracies vary.