Hi Ptrblck,

The model has completed its run with

```
# for debugging only; remove once debugged
import os
os.environ['CUDA_LAUNCH_BLOCKING'] = "1"
```

and the result is:

```
31 6912 tensor(298.7733, device='cuda:0', grad_fn=<NegBackward>)
32 6912 tensor(287.6319, device='cuda:0', grad_fn=<NegBackward>)
33 6912 tensor(345.7334, device='cuda:0', grad_fn=<NegBackward>)
34 6912 tensor(299.0026, device='cuda:0', grad_fn=<NegBackward>)
35 6912 tensor(336.5276, device='cuda:0', grad_fn=<NegBackward>)
36 6912 tensor(304.2394, device='cuda:0', grad_fn=<NegBackward>)
37 6912 tensor(274.8873, device='cuda:0', grad_fn=<NegBackward>)
38 6912 tensor(328.6809, device='cuda:0', grad_fn=<NegBackward>)
Traceback (most recent call last):
logvar,mean,loss,out,logqz_x,logpz,logpx_z,z2 = loss_fn(model, data)
XXXXXXX line 286, in loss_fn
logpx_z=-torch.sum(BCE,[1,2],keepdim=False)
RuntimeError: CUDA error: device-side assert triggered
```

The function in question is:

```
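import torch

def loss_fn(model, data):
    mean, logvar = model.encode(data)
    z2 = model.reparm(mean, logvar)
    out = model.decode(z2)
    # The deprecated size_average/reduce flags take precedence over `reduction`;
    # with reduce=False the loss comes back unreduced (one value per element).
    criterion = torch.nn.BCELoss(size_average=False, reduce=False, reduction='sum')
    #criterion = torch.nn.BCEWithLogitsLoss(size_average=False,reduce=False, reduction='sum')
    BCE = criterion(out, data)
    # Sum the per-element loss over dims 1 and 2 to get log p(x|z) per sample.
    logpx_z = -torch.sum(BCE, [1, 2], keepdim=False)
    #logpx_z=-torch.sum(BCE,2,keepdim=True)
    logpz = log_normal_pdf(z2, torch.tensor(0.), torch.tensor(0.))
    logqz_x = log_normal_pdf(z2, mean, logvar)
    # `mean` is reused here for the per-sample ELBO, so the encoder mean is not returned.
    mean = logpx_z + logpz - logqz_x
    loss = -torch.mean(mean)
    return logvar, mean, loss, out, logqz_x, logpz, logpx_z, z2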
```

I really have no idea why this is failing: when I looked at the output for logpz earlier, nothing seemed strange.

The model is running on the CPU:

```
warnings.warn(warning.format(ret))
1 6912 tensor(627.7763, grad_fn=<NegBackward>)
```
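One check that might narrow this down, sketched under the assumption that the assert comes from `BCELoss` receiving values outside [0, 1] or NaNs (a common cause of device-side asserts with this loss, but not confirmed by the traceback above). The helper name `check_bce_inputs` is hypothetical and not part of my model code; the idea would be to call it on `out` and `data` just before `criterion(out, data)`.

```python
import torch

def check_bce_inputs(out, data):
    """Hypothetical helper: list reasons why BCELoss might reject these tensors."""
    problems = []
    if out.shape != data.shape:
        problems.append(f"shape mismatch: out {tuple(out.shape)} vs data {tuple(data.shape)}")
    for name, t in (("out", out), ("data", data)):
        if torch.isnan(t).any():
            # NaNs often appear once logvar or the decoder output blows up
            problems.append(f"{name} contains NaN")
        elif t.min().item() < 0.0 or t.max().item() > 1.0:
            # BCELoss expects probabilities and targets in [0, 1]
            problems.append(
                f"{name} outside [0, 1]: min={t.min().item():.4g}, max={t.max().item():.4g}"
            )
    return problems

# Small CPU example with made-up values
good = torch.tensor([[0.2, 0.9]])
bad = torch.tensor([[0.2, 1.5]])
print(check_bce_inputs(good, good))
print(check_bce_inputs(bad, good))
```

If this reports anything at the iteration where the run dies, the problem is upstream of the `torch.sum` line rather than in it.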

regards,

chaslie