Extremely high output values (1e34) from an untrained network, despite input normalization

Hello, after experimenting with multiple off-the-shelf and written-from-scratch networks, I am starting to feel there is something wrong with my setup, without being able to understand what.
My network:
My network

class MySubPixelCNN(nn.Module):
    def __init__(self, upscale_factor, num_features):
        super(MySubPixelCNN, self).__init__()

        self.relu = nn.ReLU()
        self.conv1 = nn.Conv2d(num_features, 64, kernel_size=5, stride=1, padding=2)
        self.bb1 = nn.BatchNorm2d(64)
        self.conv2 = nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1)
        self.bb2 = nn.BatchNorm2d(64)
        self.conv3 = nn.Conv2d(64, 32, kernel_size=3, stride=1, padding=1)
        self.bb3 = nn.BatchNorm2d(32)
        # PixelShuffle(r) consumes num_features * r**2 channels
        self.conv4 = nn.Conv2d(32, num_features * upscale_factor ** 2, kernel_size=3, stride=1, padding=1)
        self.pixel_shuffle = nn.PixelShuffle(upscale_factor)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bb1(self.relu(x))
        x = self.conv2(x)
        x = self.bb2(self.relu(x))
        x = self.conv3(x)
        x = self.bb3(self.relu(x))
        x = self.conv4(x)
        x = self.pixel_shuffle(x)
        return x

My input images are textures (i.e. not ordinary images), with a simple preprocessing step to bring them into [0, 1].
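For reference, the preprocessing is just a per-texture min-max rescaling, roughly like the following sketch (`to_unit_range` is an illustrative name, not my actual function):

```python
import numpy as np

def to_unit_range(texture: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Linearly rescale an array into [0, 1] via per-texture min-max."""
    t_min, t_max = texture.min(), texture.max()
    # eps guards against division by zero for constant textures
    return (texture - t_min) / (t_max - t_min + eps)

# Example: a texture with an arbitrary value range
tex = np.random.uniform(-50.0, 200.0, size=(64, 64)).astype(np.float32)
scaled = to_unit_range(tex)
```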
Yet, sometimes the output explodes (which makes the network immediately diverge):
model(input).max() > 1e30
As mentioned, I always add batch norm, and since the network is not that deep, I simply cannot understand what could have gone wrong.

Shouldn’t it be num_features * 2 ** upscale_factor?

No, why? Could that be the reason the results vary from 0 to 1e34?
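For context: `nn.PixelShuffle(r)` rearranges a `(N, C * r**2, H, W)` tensor into `(N, C, H*r, W*r)`, so the last conv must produce `num_features * upscale_factor ** 2` channels, not `num_features * 2 ** upscale_factor`. A pure-NumPy sketch of the rearrangement (so it can be run without PyTorch):

```python
import numpy as np

def pixel_shuffle(x: np.ndarray, r: int) -> np.ndarray:
    """NumPy re-implementation of nn.PixelShuffle(r):
    (N, C*r*r, H, W) -> (N, C, H*r, W*r)."""
    n, c_r2, h, w = x.shape
    assert c_r2 % (r * r) == 0, "channel count must be divisible by r**2"
    c = c_r2 // (r * r)
    x = x.reshape(n, c, r, r, h, w)
    # interleave the r x r sub-pixel grid into the spatial dimensions
    x = x.transpose(0, 1, 4, 2, 5, 3)
    return x.reshape(n, c, h * r, w * r)

# num_features = 3, upscale_factor = 2 -> conv4 outputs 3 * 2**2 = 12 channels
x = np.random.rand(1, 12, 8, 8)
y = pixel_shuffle(x, 2)  # shape (1, 3, 16, 16)
```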

I am now 100% certain that it is a problem with CUDA 11: when using another GPU I have with CUDA 10.1, it works fine. I cannot prove it, unfortunately, but I now realize that it started when I installed CUDA 11 and ran `conda install pytorch=10.2`, as previous posts here suggested that is fine. I assume this counts as a bug; how can I pin down the problem and report it?

Could you post some information about this setup, i.e.:

  • which GPU are you using?
  • how did you install CUDA11, and which version exactly?
  • which cudnn version are you using?
  • which PyTorch commit are you using?
  • are you seeing the extremely high loss after a certain number of iterations for random inputs in [0, 1]?
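To gather most of that in one go, something like the following works (a sketch that degrades gracefully if PyTorch or CUDA is unavailable):

```python
import importlib.util

def collect_versions() -> dict:
    """Collect the version info requested above; returns whatever is available."""
    info = {}
    if importlib.util.find_spec("torch") is None:
        info["torch"] = "not installed"
        return info
    import torch
    info["torch"] = torch.__version__              # PyTorch build
    info["cuda"] = torch.version.cuda              # CUDA version PyTorch was built with
    info["cudnn"] = torch.backends.cudnn.version() # cudnn version
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
    return info

print(collect_versions())
```

Alternatively, `python -m torch.utils.collect_env` prints a full environment report suitable for pasting into a bug report.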