cuBLAS runtime error in C++ PyTorch code

I am trying to port ResNet to the PyTorch C++ API and train it on MNIST. The training code is taken from the PyTorch example linked in the documentation.

While running my code, I receive one of two errors when calling loss.backward(). My understanding is that these errors are probably red herrings and that there is some underlying problem in my network that I haven’t discovered.

When I run my code directly, I receive:

terminate called after throwing an instance of 'std::runtime_error'
  what():  cublas runtime error : the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:258

When I step through my code with a debugger, I receive:

THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCTensorMath.cu line=24 error=59 : device-side assert triggered
terminate called after throwing an instance of 'std::runtime_error'
  what():  cuda runtime error (59) : device-side assert triggered at /pytorch/aten/src/THC/generic/THCTensorMath.cu:24

To debug this, it’s recommended that I call torch.cuda.synchronize() so that asynchronous CUDA errors are reported at the operation that caused them, rather than being deferred to a later operation.

My question is: how can I call torch.cuda.synchronize() (or its equivalent) from C++? I can see the underlying call here, but I don’t know how to invoke it from C++ code.
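
To make it concrete, something like the sketch below is what I’m hoping for. This is only my guess: it assumes that cudaDeviceSynchronize() from the CUDA runtime is the call that torch.cuda.synchronize() ultimately makes, and that it’s acceptable to include cuda_runtime.h and link against the CUDA runtime directly; the helper name sync_and_check is just something I made up.

// Sketch only, not necessarily the idiomatic libtorch way.
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

// Block until all queued GPU work has finished and report any error here,
// instead of letting it surface at a later, unrelated call.
void sync_and_check(const char* where) {
  cudaError_t err = cudaDeviceSynchronize();
  if (err != cudaSuccess) {
    std::fprintf(stderr, "CUDA error after %s: %s\n",
                 where, cudaGetErrorString(err));
    std::abort();
  }
}

// Intended usage inside the training loop:
//   loss.backward();
//   sync_and_check("loss.backward()");

The idea is to call it right after the suspect operation (loss.backward() in my case) so the failure is pinned to that point.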

If you’re interested in a repro, I have made the code available at: https://gist.github.com/JoshVarty/143aa35c0efc25d29d18ac523fbb597c

It’s possible I’m doing something very obviously wrong, as I’m not very familiar with PyTorch or C++.

Use the environment variable CUDA_LAUNCH_BLOCKING=1 when you start the program. It forces CUDA kernel launches to run synchronously, so the device-side assert is reported at the call that actually triggered it rather than at a later, unrelated operation.
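
For example, launch it as CUDA_LAUNCH_BLOCKING=1 ./your_binary (substitute your own binary name). If setting it in the shell is awkward under the debugger, a sketch of setting it from inside the program is below; this assumes a POSIX setenv and that nothing has initialized CUDA before this point, since the variable is read when the CUDA runtime starts up.

// Sketch: must run before the first CUDA / libtorch GPU call in the process.
#include <cstdlib>

int main() {
  // Make kernel launches synchronous so device-side asserts are reported
  // at the op that triggered them (assumes POSIX setenv is available).
  setenv("CUDA_LAUNCH_BLOCKING", "1", /*overwrite=*/1);

  // ... model construction and training loop go here ...
  return 0;
}

With blocking launches the failure should point at the real failing op rather than at the cuBLAS call.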