I am training a neural network that recursively applies a learned transformation to its input.
Since, for performance reasons, I do not want backpropagation to propagate all the way back through the recursive applications, I used something like this:
```python
x = input
for i in range(applications):
    output = transform(x)
    x = output.detach()
```
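For reference, here is a minimal self-contained CPU sketch of the loop above, assuming `transform` is an `nn.Module` (an `nn.Linear` stands in for the real network, which is an assumption on my part). It illustrates what the `detach()` does: the gradient only flows through the final application.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the real learned transformation.
transform = nn.Linear(4, 4)

x = torch.randn(2, 4)  # stand-in for `input`
applications = 3
for i in range(applications):
    output = transform(x)
    x = output.detach()  # cut the graph so earlier applications get no gradient

# Only the last application contributes to the gradient, because each
# earlier result was detached before being fed back in.
loss = output.sum()
loss.backward()
```

After `backward()`, `transform.weight.grad` holds gradients from the final application only, and the detached `x` carries no autograd history.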
Using this results in a CUDA error:
cudaEventSynchronize in future::wait: device-side assert triggered
When I add the flag CUDA_LAUNCH_BLOCKING=1 (as advised here), the error changes to:
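For context, I set the flag like this (it has to be set before the first CUDA call; `train.py` is a placeholder name for my script):

```python
import os

# Equivalent to running: CUDA_LAUNCH_BLOCKING=1 python train.py
# Makes kernel launches synchronous so the assert surfaces at the
# actual failing call site instead of a later synchronization point.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```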
after cudaLaunch in triple_chevron_launcher::launch(): device-side assert triggered
Without the detach, no error occurs. The error does not occur at a fixed point in training; the network can train successfully for many batches before it appears.