I am training a neural network that recursively applies a learned transformation to its input.
Since, for performance reasons, I do not want backpropagation to flow all the way back through every recursive application, I use something like this:
    x = input
    for i in range(applications):
        output = transform(x)
        x = output.detach()
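In case it helps to reproduce, here is the pattern in a minimal, self-contained form; the nn.Linear transform, the tensor sizes, and the loss are stand-ins, not my actual model:

    import torch
    import torch.nn as nn

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Stand-ins for my actual setup: the real transform is a larger learned module.
    transform = nn.Linear(16, 16).to(device)
    optimizer = torch.optim.SGD(transform.parameters(), lr=1e-3)

    x = torch.randn(8, 16, device=device)
    applications = 5

    for i in range(applications):
        output = transform(x)
        # Cut the graph so gradients do not flow back through earlier applications.
        x = output.detach()

    # Only the last application is part of the graph when backpropagating.
    loss = output.sum()
    loss.backward()
    optimizer.step()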
Using this results in a CUDA error:
cudaEventSynchronize in future::wait: device-side assert triggered
When I add the flag CUDA_LAUNCH_BLOCKING=1 (as advised here), the error changes to:
after cudaLaunch in triple_chevron_launcher::launch(): device-side assert triggered
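For completeness, this is how I set the flag (in Python, before torch is imported, since I believe it has to be set before CUDA is initialized to take effect):

    import os

    # Set before importing torch so CUDA picks it up when it initializes.
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

    import torch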
Without the detach, no error occurs. The error does not appear at a fixed point: the network can train successfully for many batches before it happens.