I am trying to train a convolutional variational autoencoder on greyscale images, but I am getting poor results or none at all.
I use a discriminator to push the latent-space representation toward a sensible distribution (an adversarial autoencoder setup).
After inspecting the gradients I found that they become NaN after a few batches.
So I tried debugging with torch.autograd.set_detect_anomaly(True),
and I get the error in the title; it is triggered when calling loss.backward().
I checked the input batches and there are no NaN values. My batch size is also greater than 1, so batch normalization shouldn't be the problem.
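For reference, this is roughly how I run those checks. The names `model`, `batch`, and `loss` are placeholders for my actual training objects; only the checks themselves matter:

```python
import torch

# Surface the first backward-pass operation that produces NaN/Inf.
torch.autograd.set_detect_anomaly(True)

def check_batch(batch: torch.Tensor) -> None:
    """Fail fast if the input already contains NaN or Inf values."""
    assert not torch.isnan(batch).any(), "NaN in input batch"
    assert not torch.isinf(batch).any(), "Inf in input batch"

def report_bad_grads(model: torch.nn.Module) -> list[str]:
    """Return the names of parameters whose gradients contain NaN/Inf.
    Call after loss.backward() and before optimizer.step()."""
    bad = []
    for name, param in model.named_parameters():
        if param.grad is not None and not torch.isfinite(param.grad).all():
            bad.append(name)
    return bad
```

Calling `report_bad_grads` every few batches is what showed me the gradients going bad.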
My model architecture (encoder, decoder, discriminator) is:
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [4, 8, 254, 254] 80
BatchNorm2d-2 [4, 8, 254, 254] 16
ReLU-3 [4, 8, 254, 254] 0
Conv2d-4 [4, 8, 127, 127] 584
BatchNorm2d-5 [4, 8, 127, 127] 16
ReLU-6 [4, 8, 127, 127] 0
Conv2d-7 [4, 8, 127, 127] 584
ConvBlock-8 [4, 8, 127, 127] 0
Conv2d-9 [4, 8, 127, 127] 64
ResDownBlock-10 [4, 8, 127, 127] 0
BatchNorm2d-11 [4, 8, 127, 127] 16
ReLU-12 [4, 8, 127, 127] 0
Conv2d-13 [4, 16, 64, 64] 1,168
BatchNorm2d-14 [4, 16, 64, 64] 32
ReLU-15 [4, 16, 64, 64] 0
Conv2d-16 [4, 16, 64, 64] 2,320
ConvBlock-17 [4, 16, 64, 64] 0
Conv2d-18 [4, 16, 64, 64] 128
ResDownBlock-19 [4, 16, 64, 64] 0
Linear-20 [4, 200] 13,107,400
Linear-21 [4, 200] 13,107,400
================================================================
Total params: 26,219,808
Trainable params: 26,219,808
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 1.00
Forward/backward pass size (MB): 96.70
Params size (MB): 100.02
Estimated Total Size (MB): 197.73
----------------------------------------------------------------
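The two 200-unit Linear heads at the end of the encoder produce the latent mean and log-variance. One suspicion I have: exp(logvar) can overflow in the reparameterization or the KL term once logvar grows unbounded, which would explain NaNs appearing only after a few batches. A minimal sketch of a clamped version (the names `mu`/`logvar` and the clamp range are my assumptions, not necessarily the fix):

```python
import torch

def reparameterize(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """z = mu + eps * std, with logvar clamped so exp() stays finite."""
    logvar = logvar.clamp(min=-10.0, max=10.0)
    std = torch.exp(0.5 * logvar)
    eps = torch.randn_like(std)
    return mu + eps * std

def kl_divergence(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """KL(q(z|x) || N(0, I)), averaged over the batch, with the same clamp."""
    logvar = logvar.clamp(min=-10.0, max=10.0)
    return -0.5 * torch.mean(
        torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1)
    )
```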
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Linear-1 [4, 65536] 13,172,736
BatchNorm2d-2 [4, 16, 128, 128] 32
ReLU-3 [4, 16, 128, 128] 0
Conv2d-4 [4, 8, 128, 128] 1,160
BatchNorm2d-5 [4, 8, 128, 128] 16
ReLU-6 [4, 8, 128, 128] 0
Conv2d-7 [4, 8, 128, 128] 584
ConvBlock-8 [4, 8, 128, 128] 0
Conv2d-9 [4, 8, 128, 128] 128
ResUpBlock-10 [4, 8, 128, 128] 0
BatchNorm2d-11 [4, 8, 256, 256] 16
ReLU-12 [4, 8, 256, 256] 0
Conv2d-13 [4, 1, 256, 256] 73
BatchNorm2d-14 [4, 1, 256, 256] 2
ReLU-15 [4, 1, 256, 256] 0
Conv2d-16 [4, 1, 256, 256] 10
ConvBlock-17 [4, 1, 256, 256] 0
Conv2d-18 [4, 1, 256, 256] 8
ResUpBlock-19 [4, 1, 256, 256] 0
Sigmoid-20 [4, 1, 256, 256] 0
================================================================
Total params: 13,174,765
Trainable params: 13,174,765
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 94.00
Params size (MB): 50.26
Estimated Total Size (MB): 144.26
----------------------------------------------------------------
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
BatchNorm1d-1 [4, 200] 400
ReLU-2 [4, 200] 0
Linear-3 [4, 128] 25,728
LinearBlock-4 [4, 128] 0
BatchNorm1d-5 [4, 128] 256
ReLU-6 [4, 128] 0
Linear-7 [4, 1] 129
LinearBlock-8 [4, 1] 0
Sigmoid-9 [4, 1] 0
================================================================
Total params: 26,513
Trainable params: 26,513
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.03
Params size (MB): 0.10
Estimated Total Size (MB): 0.13
----------------------------------------------------------------
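Since the discriminator ends in a Sigmoid, the adversarial loss is presumably BCELoss on its output. BCE takes log(p), which blows up once the sigmoid saturates to exactly 0 or 1 in float32, so another thing I am considering is dropping the final Sigmoid and using BCEWithLogitsLoss, which fuses the sigmoid and the log with the log-sum-exp trick. A sketch under that assumption (the Sequential below mirrors the summary above, minus the Sigmoid, and is not my actual module):

```python
import torch
import torch.nn as nn

# Stand-in for the discriminator from the summary above,
# with the trailing Sigmoid removed so it outputs raw logits.
disc = nn.Sequential(
    nn.BatchNorm1d(200),
    nn.ReLU(),
    nn.Linear(200, 128),
    nn.BatchNorm1d(128),
    nn.ReLU(),
    nn.Linear(128, 1),
)

# Fuses Sigmoid + BCE stably: no log(0) even when the logits saturate.
criterion = nn.BCEWithLogitsLoss()

z = torch.randn(4, 200)    # fake latent batch (batch size > 1 for BatchNorm)
labels = torch.ones(4, 1)  # "real" label for every sample
loss = criterion(disc(z), labels)
loss.backward()
```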