2D Convolutional VAE loss getting stuck in a local minimum

Ok, I give up.

I am trying to get a 2D convolutional VAE working with non-image data. The data is of size [-1, 1, 264, 132].

The loss function gets stuck at around 430, which means that some of the data is encoded but some is not. I have tried playing with the hyperparameters on Adam, and found that if I drop the LR below 5e-4 the loss becomes NaN. I have tried different loss functions and nothing works (using the standard KLD term the loss still gets stuck, just at different non-zero values; I have sketched that variant below, after my current code). The one I am currently using is:

    import numpy as np
    import torch

    def log_normal_pdf(sample, mean, logvar):
        # log density of a diagonal Gaussian, summed over dim 1
        log2pi = torch.tensor(np.log(2. * np.pi))
        return torch.sum(-0.5 * ((sample - mean)**2. * torch.exp(-logvar) + logvar + log2pi), 1)

    def loss_fn(Gen_model, data, lab):
        mean, logvar = Gen_model.encode(data, lab)
        z2 = Gen_model.reparm(mean, logvar)
        out = Gen_model.decode(z2, lab)
        # element-wise loss (the deprecated size_average/reduce flags replaced by reduction='none')
        criterion = torch.nn.BCEWithLogitsLoss(reduction='none')
        BCE = criterion(out, data)
        logpx_z = -torch.sum(BCE, (1, 2, 3), keepdim=False)
        logpz = log_normal_pdf(z2, torch.tensor(0.), torch.tensor(1.))
        logqz_x = log_normal_pdf(z2, mean, logvar)
        # Monte Carlo estimate of the negative ELBO
        return -torch.mean(logpx_z + logpz - logqz_x), out
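
(For reference, the "standard KLD" variant I mention above looks roughly like this; it is a sketch rather than my exact code, and the dims I flatten over are an assumption about the shapes:)

    def loss_fn_kld(Gen_model, data, lab):
        # same forward pass as above
        mean, logvar = Gen_model.encode(data, lab)
        z2 = Gen_model.reparm(mean, logvar)
        out = Gen_model.decode(z2, lab)
        # reconstruction term: element-wise BCE summed per sample
        bce = torch.nn.functional.binary_cross_entropy_with_logits(
            out, data, reduction='none').sum(dim=(1, 2, 3))
        # closed-form KL(q(z|x) || N(0, I)), summed over all non-batch dims
        kld = -0.5 * torch.sum(1 + logvar - mean.pow(2) - logvar.exp(),
                               dim=tuple(range(1, mean.dim())))
        return torch.mean(bce + kld), out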

I have included the encode and reparameterization functions:

    def encode(self,data,lab):
        data=self.Encoder(data,lab)
        mean, logvar=torch.chunk(data,2,dim=1)
        return mean, logvar

    def reparm(self,  mean, logvar):
        epsilon = torch.randn(mean.size()).to(device)
        z = epsilon*torch.exp(logvar.mul(0.5))+mean
        #print("z=",z.shape)
        return z

I know I have got something wrong (obviously), but I cannot see it. I read somewhere that the input data should be normalised to zero mean and a standard deviation of 1, but that cannot happen here because the BCE loss needs targets in [0, 1].

The raw data (before normalising) ranges from -1000 to +1000. Normalising it to [0, 1] maps the original 0 to 0.5, -1000 to 0 and +1000 to 1, so the increments between values can be very small. Am I right in thinking that this may be causing the issue? If so, would it be better to create an input data set of shape [-1, 3, 264, 132]?
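
(To be concrete, the scaling I am describing is just a linear map, assuming the range really is a fixed ±1000 rather than per-sample:)

    import torch

    # sketch of the [0, 1] scaling described above
    # (assumes the raw values span a fixed -1000..+1000 range)
    x_raw = torch.tensor([-1000.0, 0.0, 1000.0])
    x01 = (x_raw + 1000.0) / 2000.0   # -> tensor([0.0000, 0.5000, 1.0000])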