I am trying to compute the validation loss alongside the training loss while training a simple VAE, but I get the following CUDA error:
<ipython-input-27-5638e9c724e6> in unSupTrain(epoch)
22 recon_batch, mu, sigma = model(data)
23 # Get valloss value
---> 24 vallossitem = elbo(recon_batch, data, mu, sigma).item()
25 # Append loss to history
26 hist_validation += vallossitem
RuntimeError: CUDA error: device-side assert triggered
I suspect the loss function setup is the cause, but I haven’t found a way to compute the training and validation losses separately.
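A device-side assert raised inside a loss call often means a kernel received values it does not accept. For `F.binary_cross_entropy`, the first argument must lie in [0, 1], and the target is normally expected to be in that range as well. The following is a minimal diagnostic sketch, not a fix: `check_bce_inputs` is a hypothetical helper (not part of PyTorch) that copies the tensors to the CPU and reports anything that looks wrong before `elbo` is called.

```python
import torch


def check_bce_inputs(recon_x, x):
    """Return a list of human-readable problems with the BCE inputs.

    Hypothetical helper for debugging only; an empty list means that
    no obvious range or shape problem was found.
    """
    problems = []
    for name, t in (("recon_x", recon_x), ("x", x)):
        # Work on a detached CPU copy so the check itself cannot trip a CUDA assert.
        t = t.detach().float().cpu()
        if t.numel() == 0:
            problems.append(f"{name} is empty")
        elif torch.isnan(t).any().item():
            problems.append(f"{name} contains NaN")
        elif t.min().item() < 0.0 or t.max().item() > 1.0:
            problems.append(
                f"{name} outside [0, 1]: "
                f"min={t.min().item():.3f}, max={t.max().item():.3f}"
            )
    # binary_cross_entropy also needs both tensors to hold the same number of elements.
    if recon_x.numel() != x.numel():
        problems.append(
            f"element count mismatch: recon_x={recon_x.numel()}, x={x.numel()}"
        )
    return problems
```

Calling `print(check_bce_inputs(recon_batch, data))` inside the validation loop, right before `elbo`, would show whether the validation data is scaled differently from the training data. Running the same batch on the CPU, or starting Python with the environment variable `CUDA_LAUNCH_BLOCKING=1`, usually gives a more precise error message than the asynchronous CUDA report.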
Here is the elbo function:
def elbo(recon_x, x, mu, sigma):
    '''Loss function.'''
    # Reshape the input
    x = x.view(-1, INP_SIZE)
    # Binary cross entropy
    RE = F.binary_cross_entropy(recon_x, x, reduction='sum')
    # KL divergence
    KL = F.kl_div(recon_x, x, reduction='sum', log_target=True)
    # Return the loss
    return RE - KL
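For comparison, the KL term of a standard VAE loss is usually the closed-form divergence between the approximate posterior N(mu, sigma^2) and the prior N(0, I), computed from the encoder outputs rather than from `recon_x` and `x`, and it is added to the reconstruction error rather than subtracted. The sketch below is one common formulation under stated assumptions: the second encoder output is treated as the log-variance (called `logvar` here, which is an assumption about the model), and `INP_SIZE = 784` is only a placeholder value. If the model returns a standard deviation instead, `logvar = 2 * sigma.log()` would be the matching conversion.

```python
import torch
import torch.nn.functional as F

INP_SIZE = 784  # placeholder (e.g. 28x28 images); use the model's real input size


def elbo_sketch(recon_x, x, mu, logvar):
    """Negative ELBO for a Bernoulli decoder and a diagonal Gaussian posterior.

    Assumes `logvar` is log(sigma^2) and `recon_x` holds probabilities in [0, 1].
    """
    # Flatten the input to match the decoder output
    x = x.view(-1, INP_SIZE)
    # Reconstruction error, summed over pixels and batch
    RE = F.binary_cross_entropy(recon_x, x, reduction='sum')
    # Closed-form KL( N(mu, sigma^2) || N(0, I) ), summed over latent dims and batch
    KL = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    # Both terms are minimized, so they are added
    return RE + KL
```

With `mu = 0` and `logvar = 0` the KL term vanishes and the loss reduces to the reconstruction error, which makes a quick sanity check possible.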
And here is the training function (only the parts that compute the validation and training losses):
# ---------- Validation -----------
# Don't track gradients
with torch.no_grad():
    # Trace
    print('Running Validation')
    for batch_idx, (data, _) in enumerate(valloader):
        # Convert to cuda if possible
        data = data.to(device=DEVICE)
        # Forward
        recon_batch, mu, sigma = model(data)
        # Get valloss value
        vallossitem = elbo(recon_batch, data, mu, sigma).item()
        # Append loss to history
        hist_validation += vallossitem
        # Round loss
        vallossitem = round(vallossitem / len(data), 3)
# -------- Training -------- #
for batch_idx, (data, _) in enumerate(trainloader):
    # Convert to cuda if possible
    data = data.to(device=DEVICE)
    # Zero grad
    optimizer.zero_grad()
    # Forward
    recon_batch, mu, sigma = model(data)
    # Loss calculation
    trainloss = elbo(recon_batch, data, mu, sigma)
    # Backward
    trainloss.backward()
    # Get trainloss value
    trainlossitem = trainloss.item()
    # Append loss to history
    hist_training += trainlossitem
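One possible way to keep the two losses separate is to route both passes through a single function that only updates the weights when an optimizer is supplied. This is a rough sketch rather than the original training code: `run_epoch` is a hypothetical name, the loader is assumed to yield `(data, label)` pairs as above, and the model is assumed to return `(recon_batch, mu, sigma)`. It also calls `optimizer.step()`, which does not appear in the excerpt above and may simply have been left out of it.

```python
import torch


def run_epoch(model, loader, loss_fn, device, optimizer=None):
    """Return the average per-sample loss over `loader`.

    Trains when `optimizer` is given; otherwise evaluates without gradients.
    """
    training = optimizer is not None
    # Switch layers such as dropout/batch norm to the right mode
    model.train(training)
    total, n_samples = 0.0, 0
    # Gradients are tracked only during training
    with torch.set_grad_enabled(training):
        for data, _ in loader:
            data = data.to(device)
            recon_batch, mu, sigma = model(data)
            loss = loss_fn(recon_batch, data, mu, sigma)
            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            # Accumulate the summed loss and the number of samples seen
            total += loss.item()
            n_samples += data.size(0)
    return total / max(n_samples, 1)
```

A call such as `run_epoch(model, valloader, elbo, DEVICE)` would then give the validation loss, and `run_epoch(model, trainloader, elbo, DEVICE, optimizer=optimizer)` the training loss, each averaged per sample so the two are comparable.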
I would appreciate any suggestions. Thanks in advance!