CVAE loss drops to NaN after 38 epochs

Good morning,

I hope everyone is staying safe and isolating as much as possible/required…

I have a problem with a 1D CVAE I am creating. No matter what I do, after 38 or 39 epochs I always get a NaN value in the loss function when using BCEWithLogitsLoss.

I have a feeling it has to do with the loss function calculation, or that I am doing something wrong in setting the model up, but I really cannot figure it out.

Loss Function

def sample(self, eps=None):
    if eps is None:
        eps = torch.randn(1, self.lat_dim)
        print("eps=", eps)
    return self.decode(eps, apply_sigmoid=True)

def loss_fn(model, data):
    mean, logvar = model.encode(data)
    # the lines producing z2, out, logpz and logpx_z were elided in the
    # original post; a standard VAE ELBO reconstruction would be:
    z2 = model.reparameterize(mean, logvar)     # assumed reparameterize method
    out = model.decode(z2, apply_sigmoid=True)  # BCELoss needs probabilities
    # size_average/reduce are deprecated and conflict with reduction='sum'
    criterion = torch.nn.BCELoss(reduction='sum')
    #criterion = torch.nn.BCEWithLogitsLoss(reduction='sum')  # takes raw logits
    logpx_z = -criterion(out, data)
    logpz = log_normal_pdf(z2, 0., 0.)
    logqz_x = log_normal_pdf(z2, mean, logvar)
    loss = -torch.mean(logpx_z + logpz - logqz_x)
    return logvar, mean, loss, out, logqz_x, logpz, logpx_z, z2
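
For reference, my understanding of why the choice of criterion matters here is numerical: sigmoid followed by a hand-written BCE overflows once the logits saturate, while BCEWithLogitsLoss folds the sigmoid into a log-sum-exp and stays finite. A small standalone illustration (not my model code):

import torch
import torch.nn.functional as F

logits = torch.tensor([200.0, -200.0])  # saturated decoder outputs
target = torch.tensor([0.0, 1.0])

p = torch.sigmoid(logits)  # float32 rounds this to exactly [1., 0.]

# a hand-written BCE hits log(0) and overflows
manual = -(target * torch.log(p) + (1 - target) * torch.log(1 - p))
print(manual)  # tensor([inf, inf])

# the logits version stays finite on the same inputs
stable = F.binary_cross_entropy_with_logits(logits, target, reduction='none')
print(stable)  # tensor([200., 200.])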

Model Activation

        data = data.cuda()
        logvar,mean,loss,out,logqz_x,logpz,logpx_z,z2 = loss_fn(model, data)

As a quick aside, the loss starts dropping towards zero but never falls below 300, and it always has a grad_fn=

I have wondered if the problem is in loss.backward() & optimizer.step()…
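
In case it is exploding gradients, one thing I may try is clipping between those two calls, something like this (sketch; model, optimizer and loss are from my training loop):

optimizer.zero_grad()
loss.backward()
# cap the global gradient norm before the update
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
optimizer.step()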

I hope someone can spot the error…

Many thanks & stay safe everyone


Since this issue seems to be reproducible, could you store the model output and target that create the NaN value and check their values?
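
E.g. a check along these lines inside the training loop would work (variable names taken from your snippets above):

# dump the first batch that produces a NaN loss for offline inspection
if torch.isnan(loss).any():
    torch.save({'data': data.detach().cpu(), 'out': out.detach().cpu()},
               'nan_batch.pt')
    raise RuntimeError('NaN loss hit; offending batch saved to nan_batch.pt')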

Hi ptrblck,

I hope you are well.

I have looked at the output from the last three epochs before the failure. The only thing that looks strange is that, in the epoch before the loss becomes NaN, a single value in the mean tensor goes to NaN, though there should be no reason for this.
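
To pin down where that first NaN comes from, I am planning to run with anomaly detection enabled and assert on the encoder outputs each step (rough sketch; the anomaly mode is slow, so debugging only):

# report the op that produced the first NaN during backward
torch.autograd.set_detect_anomaly(True)

# per-step sanity check on the encoder outputs
mean, logvar = model.encode(data)
assert torch.isfinite(mean).all(), "non-finite value in mean"
assert torch.isfinite(logvar).all(), "non-finite value in logvar"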


PS: this is related to my other post about the CUDA runtime, which I am currently looking into.

Just a quick sanity check: make sure your input features are normalized.
You can use this transformation on your features if required:

features = (features - features.min())/(features.max() - features.min())

It will normalize your features so that their minimum and maximum values become 0 and 1, respectively.
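
For example, on a toy tensor:

import torch

features = torch.tensor([2.0, 5.0, 11.0])
features = (features - features.min()) / (features.max() - features.min())
print(features)  # tensor([0.0000, 0.3333, 1.0000])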

Hi Braindotai,

Thanks for this. I have normalised the inputs using:

def normalize(x):
    # scale each row by its own maximum (does not shift the minimum to 0)
    x_normed = x / x.max(1, keepdim=True)[0]
    return x_normed
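
Looking at it again, this only scales each row by its maximum rather than mapping the minimum to 0, so a per-row min-max version closer to your suggestion might be (sketch; the eps is my guard against constant rows):

def minmax_normalize(x, eps=1e-8):
    # per-row min-max scaling to [0, 1]; eps avoids division by zero
    x_min = x.min(1, keepdim=True)[0]
    x_max = x.max(1, keepdim=True)[0]
    return (x - x_min) / (x_max - x_min + eps)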


Is the mean value calculated using this line of code?

    logqz_x=log_normal_pdf(z2, mean, logvar)

If so, could you try to store the activations and upload them so that we could try to reproduce this issue?
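
Something like this would do for a single step (names taken from your loss_fn):

# save the tensors that feed log_normal_pdf for offline inspection
torch.save({'z2': z2.detach().cpu(),
            'mean': mean.detach().cpu(),
            'logvar': logvar.detach().cpu()},
           'activations.pt')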