Binary cross entropy unusual input and target VAE

Hi Team,
I am new to PyTorch, so this might be a silly question, but hopefully someone can help. I have spent the past few days debugging and reading up to try to resolve my problem, but have been unable to.

Set up:

  1. I am building a variational autoencoder (VAE).
  2. Each data sample is the y values of a graph made up of multiple Gaussians; there are 239 points per sample.
  3. I have 45k of these “y data” arrays for training and 5k for testing.
  4. I am preprocessing my data by removing outliers and applying a whitening transformation, which uses the mean and std per dimension calculated on all 50k sets of data.

VAE: I followed the blog post tutorial by Raviraja G

I am using these parameters

BATCH_SIZE = 64     # number of data points in each batch
N_EPOCHS = 20      # times to run the model on complete data
INPUT_DIM = 239    # size of each input
HIDDEN_DIM = 50    # hidden dimension
LATENT_DIM = 5     # latent vector dimension
lr = 1e-7          # learning rate

#import for generalisation, so data is fed in randomly

and this is how I have set up the encoder, decoder and VAE

import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """ This is the encoder part of the VAE. """

    def __init__(self, input_dim, hidden_dim, z_dim):
        super().__init__()

        self.linear = nn.Linear(input_dim, hidden_dim)
        self.mu = nn.Linear(hidden_dim, z_dim)
        self.var = nn.Linear(hidden_dim, z_dim)

    def forward(self, x):
        # x is of shape [batch_size, input_dim]

        hidden = F.relu(self.linear(x))
        # hidden is of shape [batch_size, hidden_dim]
        z_mu = self.mu(hidden)
        # z_mu is of shape [batch_size, latent_dim]
        z_var = self.var(hidden)
        # z_var is of shape [batch_size, latent_dim]; it is used downstream as a log-variance

        return z_mu, z_var

class Decoder(nn.Module):
    """ This is the decoder part of the VAE. """

    def __init__(self, z_dim, hidden_dim, output_dim):
        super().__init__()

        self.linear = nn.Linear(z_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # x is of shape [batch_size, latent_dim]

        hidden = F.relu(self.linear(x))
        # hidden is of shape [batch_size, hidden_dim]

        predicted = torch.sigmoid(self.out(hidden))
        # predicted is of shape [batch_size, output_dim]

        return predicted

class VAE(nn.Module):
    """ This is the VAE, which takes an encoder and a decoder. """

    def __init__(self, enc, dec):
        super().__init__()

        self.enc = enc
        self.dec = dec

    def forward(self, x):
        # encode
        z_mu, z_var = self.enc(x)

        # sample from the distribution having latent parameters z_mu, z_var
        # reparameterize
        std = torch.exp(z_var / 2)
        eps = torch.randn_like(std)
        x_sample = eps.mul(std).add_(z_mu)

        # decode
        predicted = self.dec(x_sample)
        return predicted, z_mu, z_var
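
For reference, the sampling step above only makes sense if z_var is read as a log-variance, since std is computed as exp(z_var / 2). Under that reading (an interpretation of the code, not something stated in the tutorial text quoted here), the reparameterization and the closed-form KL term against a standard normal prior can be written as:

```latex
z = \mu + \sigma \odot \epsilon,
\qquad \sigma = \exp\!\left(\tfrac{1}{2}\log\sigma^{2}\right),
\qquad \epsilon \sim \mathcal{N}(0, I)

D_{\mathrm{KL}}\!\left(\mathcal{N}(\mu, \sigma^{2}) \,\|\, \mathcal{N}(0, I)\right)
  = \frac{1}{2}\sum_{j}\left(\sigma_{j}^{2} + \mu_{j}^{2} - 1 - \log\sigma_{j}^{2}\right)
```

With \log\sigma^2 = z_var and \mu = z_mu, the second line is term by term the expression 0.5 * sum(exp(z_var) + z_mu**2 - 1.0 - z_var) used for kl_loss.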

And I am running training and testing the following way.

def train(model):
    # set the train mode
    model.train()
    # loss of the epoch
    train_loss = 0
    for i, (x, n, f) in enumerate(train_iterator):
        # reshape the data into [batch_size, 239]
        x = x.view(-1, 239).float()

        # update the gradients to zero (optimizer is assumed to be defined at module level)
        optimizer.zero_grad()

        # forward pass
        x_sample, z_mu, z_var = model(x)

        # debug: min/max of the model output, then min/max of the target,
        # printed before the loss so the values are visible even if BCE raises
        print(x_sample.min().item(), x_sample.max().item(), x.min().item(), x.max().item())

        # reconstruction loss (reduction='sum' replaces the deprecated size_average=False)
        recon_loss = F.binary_cross_entropy(x_sample, x, reduction='sum')

        # kl divergence loss
        kl_loss = 0.5 * torch.sum(torch.exp(z_var) + z_mu**2 - 1.0 - z_var)
        # total loss
        loss = recon_loss + kl_loss

        # backward pass
        loss.backward()
        train_loss += loss.item()
        # update the weights
        optimizer.step()

    return train_loss

def test(model):
    # set the evaluation mode
    model.eval()
    # test loss for the data
    test_loss = 0
    # we don't need to track the gradients, since we are not updating the parameters during evaluation / testing
    with torch.no_grad():
        for i, (x, n, f) in enumerate(test_iterator):
            # reshape the data into [batch_size, 239]
            x = x.view(-1, 239).float()

            # forward pass
            x_sample, z_mu, z_var = model(x)

            # debug: min/max of the model output, then min/max of the target
            print(x_sample.min().item(), x_sample.max().item(), x.min().item(), x.max().item())

            # reconstruction loss (reduction='sum' replaces the deprecated size_average=False)
            recon_loss = F.binary_cross_entropy(x_sample, x, reduction='sum')
            # kl divergence loss
            kl_loss = 0.5 * torch.sum(torch.exp(z_var) + z_mu**2 - 1.0 - z_var)

            # total loss
            loss = recon_loss + kl_loss
            test_loss += loss.item()

    return test_loss

train_ll = []
test_ll = []

best_test_loss = float('inf')

for e in range(N_EPOCHS):

    train_loss = train(model)
    test_loss = test(model)

    train_loss /= len(train_dataset)
    test_loss /= len(test_dataset)

    print(f'Epoch {e}, Train Loss: {train_loss:.2f}, Test Loss: {test_loss:.2f}')

    if best_test_loss > test_loss:
        best_test_loss = test_loss
        patience_counter = 1
    else:
        patience_counter += 1

    # stop early once the test loss has not improved for 3 consecutive epochs
    if patience_counter > 3:
        break

During the training/testing stage F.binary_cross_entropy fails, saying the values need to be between 0 and 1. This is where things get odd: before feeding my data into the VAE it appears to be scaled correctly. The target and input values for the BCE are usually in [-1, 3], with an occasional very large or very small value (magnitude around 3000) somewhere in the tensor.
Sometimes it trains without failing and sometimes it fails. When it does not fail, the loss does not really improve, but the VAE kind of works?

And here are the results, blue line is the original data and yellow is output of the same data through the VAE.


  1. I am unsure whether the way I have set up the VAE is the best approach for this problem. What are some other things I could change or test?
  2. Why would the BCE fail only sometimes when nothing has been changed?
  3. For the error I sometimes get from BCE, what could I do to check or debug what is causing it?
  4. How do I improve the VAE? Should I increase the number of epochs, or change something else? I am unsure how to interpret the loss results.

Any suggestions or thoughts would be greatly appreciated. Thank you :)

The raised error most likely points to the model output, which might not be in the required range of [0, 1]. Note that the target should also be in that range, but it seems that you are using values outside of [0, 1]?

  1. Try to add print statements for x_sample and check the min and max values.

  2. Unsure at the moment, but the error is usually raised, if the model output is out of the range.
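
The range check suggested in point 1 could look roughly like the following. This is only a minimal sketch: describe_range is a hypothetical helper that is not part of the original code, and the two tensors are random stand-ins for the decoder output and for a standardized target.

```python
import torch

def describe_range(name, tensor):
    """Print and return (min, max, ok), where ok means finite and inside [0, 1]."""
    t = tensor.detach()
    lo, hi = t.min().item(), t.max().item()
    ok = bool(torch.isfinite(t).all()) and lo >= 0.0 and hi <= 1.0
    print(f"{name}: min={lo:.4f} max={hi:.4f} within [0, 1]: {ok}")
    return lo, hi, ok

torch.manual_seed(0)
# Stand-in for the decoder output: a sigmoid always lands in [0, 1].
x_sample = torch.sigmoid(torch.randn(64, 239))
# Stand-in for a standardized target: zero mean, unit variance, not bounded.
x = torch.randn(64, 239)

sample_stats = describe_range("x_sample", x_sample)
target_stats = describe_range("x", x)
```

Calling the helper on both arguments right before F.binary_cross_entropy shows which of the two violates the range requirement.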

Thanks so much for getting back to me I really appreciate it!
I did what you recommended and here are some of the results…
x_sample is consistently within [0, 1]. x doesn’t appear to be scaled as it should be. Here are some example values from the training stage, printed in this order:
min(x_sample), max(x_sample), min(x), max(x)

If I understand correctly the X values should also be between [0,1] which suggests that my whitening method is incorrect. I am using the following formulas…

y* = (y - mean)/std

where mean is an array of means per input dimension, computed over the entire training set,
and std is an array of standard deviations per input dimension, computed over the entire training set.

If this is correct are there any reasons to why the data wouldn’t scale to the correct range [0, 1]?

The training now runs, but the reconstruction loss is obviously useless at the moment because of the mismatch between x_sample and x.

If you have any suggestion to correct my initial scaling or if something else appears wrong please let me know,
again thank you very much for your time.

You are currently passing x as the target, so I’m not sure why you are normalizing it.
Also, given the binary cross entropy formula, I don’t think the loss calculates what you expect with the current value range.

The posted formula will not normalize the tensor to the range [0, 1], but will standardize it to a zero mean and unit variance.
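
To illustrate the difference, here is a small dependency-free sketch of the posted formula applied to one made-up input dimension. The numbers are invented for illustration only, and the population standard deviation is an assumption, since the thread does not say which variant was used.

```python
import statistics

def standardize(values):
    """Z-score: subtract the mean, then divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

# Toy stand-in for one input dimension across the dataset, with one outlier.
column = [0.10, 0.12, 0.11, 0.09, 0.10, 0.13, 0.11, 5.00]

z = standardize(column)
print("mean of z:", round(statistics.fmean(z), 6))
print("std of z:", round(statistics.pstdev(z), 6))
print("min of z:", round(min(z), 3), "max of z:", round(max(z), 3))
```

The result has zero mean and unit variance, but its minimum is negative and its maximum is above 1, so nothing in this transform confines the values to [0, 1].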

I wouldn’t try to normalize the target, but leave it in the range [0, 1].

Let me know, if I misunderstand your use case or code, please.

hmm ok I will try be a little clearer,
To my understanding, (which might be wrong)
X is the data before going into the encoder and x_sample is the data that has come out of the decoder.

Because I am using my own data, I have done what I believe to be the correct preprocessing: applying the whitening method described above, which, as you corrected me, standardizes my data to zero mean and unit variance before it is fed into the VAE.

Are you suggesting that for BCE to work correctly, my target (The input to the VAE) has to be scaled to the range [0,1]? If this is correct, how would I do this? Should I scale it per piece of data that I have or should I take the min max out of my entire data set and scale by the global min max?

I am unclear on a) what the target values should look like, b) how they should be scaled and what preprocessing should be applied to the target, and c) whether I am correct in thinking that x (target) is the input to the VAE and x_sample (input) is the output of the VAE.

Thanks again for your help

I would try to normalize the complete dataset to values in the range [0, 1] for the input and target.
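
A minimal sketch of one way to do this with a global min and max is shown below. The helper names fit_min_max and apply_min_max are hypothetical, the rows are made-up numbers, and fitting the bounds on the training rows only and clamping unseen values are assumptions on my side rather than requirements.

```python
def fit_min_max(rows):
    """Find the global minimum and maximum over every value in every row."""
    lo = min(min(row) for row in rows)
    hi = max(max(row) for row in rows)
    return lo, hi

def apply_min_max(rows, lo, hi):
    """Map values to [0, 1] using fixed bounds; clamp anything outside them."""
    span = hi - lo
    return [[min(1.0, max(0.0, (v - lo) / span)) for v in row] for row in rows]

train_rows = [[-1.0, 0.5, 3.0], [0.0, 2.0, 1.0]]
test_rows = [[-2.0, 1.0, 4.0]]  # contains values outside the training range

lo, hi = fit_min_max(train_rows)
train_scaled = apply_min_max(train_rows, lo, hi)
test_scaled = apply_min_max(test_rows, lo, hi)
print(train_scaled)
print(test_scaled)
```

Using one global pair of bounds keeps the relative heights of the curves comparable across samples, whereas scaling each sample by its own min and max would discard that information.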
You might standardize the input e.g. in your forward method, but I’m not sure, if this would help or if it could even be harmful.

If you pass a target outside of [0, 1], your loss might get negative, which seems weird to me (also I’m not sure what the target outside of [0, 1] would mean in theory for binary cross entropy).
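
The sign flip can be seen by evaluating the elementwise binary cross entropy formula by hand. The sketch below uses plain Python rather than F.binary_cross_entropy, and the prediction and target values are arbitrary examples.

```python
import math

def bce(prediction, target):
    """Elementwise binary cross entropy: -(t*log(p) + (1-t)*log(1-p))."""
    return -(target * math.log(prediction)
             + (1.0 - target) * math.log(1.0 - prediction))

in_range = bce(0.9, 1.0)      # target is a valid probability
out_of_range = bce(0.9, 3.0)  # target resembling a standardized value of 3
print("target 1.0:", in_range)
print("target 3.0:", out_of_range)
```

With a target of 3.0 the factor (1 - t) becomes negative, so the second term is subtracted instead of added and the result drops below zero.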

Hey Patrick!

Your suggestions helped me figure out what was wrong. I had an extra sigmoid layer in the decoder, so the output was not being returned to the same domain as the input, and hence BCE couldn’t calculate an accurate measure.
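
For anyone landing here later, a rough sketch of that kind of fix follows. LinearOutputDecoder is a hypothetical name, the tensors are random stand-ins, and swapping the reconstruction loss to a summed squared error is an assumption for standardized targets, not something confirmed above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearOutputDecoder(nn.Module):
    """Decoder whose last layer is linear, so outputs are not squashed into [0, 1]."""

    def __init__(self, z_dim, hidden_dim, output_dim):
        super().__init__()
        self.linear = nn.Linear(z_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, output_dim)

    def forward(self, z):
        hidden = F.relu(self.linear(z))
        return self.out(hidden)  # no sigmoid here

torch.manual_seed(0)
decoder = LinearOutputDecoder(z_dim=5, hidden_dim=50, output_dim=239)
z = torch.randn(64, 5)    # stand-in for sampled latent vectors
x = torch.randn(64, 239)  # stand-in for standardized targets

predicted = decoder(z)
# Assumption: squared error replaces BCE once the targets are unbounded.
recon_loss = F.mse_loss(predicted, x, reduction="sum")
print(predicted.shape, recon_loss.item())
```

The alternative that keeps BCE is to leave the sigmoid in place and scale the targets into [0, 1] instead.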

Thank you for your help :)