RuntimeError: CUDA runtime error (59): device-side assert triggered during loss.backward()

I am getting the following error:

RuntimeError                              Traceback (most recent call last)
<ipython-input-10-3370ce850bee> in <module>()
     87 
     88 # train the model
---> 89 model_scratch = train(30, loader_scratch, model_scratch, optimizer_scratch, criterion_scratch, use_cuda, '/data/model_scratch.pt')
     90 
     91 # load the model that got the best validation accuracy

<ipython-input-10-3370ce850bee> in train(n_epochs, loaders, model, optimizer, criterion, use_cuda, save_path)
     38             loss = criterion(output, target)
     39             # backward pass: compute gradient of the loss with respect to model parameters
---> 40             loss.backward()
     41             # perform a single optimization step (parameter update)
     42             optimizer.step()

/opt/conda/lib/python3.6/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
     91                 products. Defaults to ``False``.
     92         """
---> 93         torch.autograd.backward(self, gradient, retain_graph, create_graph)
     94 
     95     def register_hook(self, hook):

/opt/conda/lib/python3.6/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
     87     Variable._execution_engine.run_backward(
     88         tensors, grad_tensors, retain_graph, create_graph,
---> 89         allow_unreachable=True)  # allow_unreachable flag
     90 
     91 

/opt/conda/lib/python3.6/site-packages/torch/autograd/function.py in apply(self, *args)
     74 
     75     def apply(self, *args):
---> 76         return self._forward_cls.backward(self, *args)
     77 
     78 

/opt/conda/lib/python3.6/site-packages/torch/nn/_functions/dropout.py in backward(ctx, grad_output)
     47     def backward(ctx, grad_output):
     48         if ctx.p > 0 and ctx.train:
---> 49             return grad_output * ctx.noise, None, None, None
     50         else:
     51             return grad_output, None, None, None

RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1524584710464/work/aten/src/THC/generated/../generic/THCTensorMathPointwise.cu:331

My model is on cuda(), and so are the data and target:

        for batch_idx, (data, target) in enumerate(loaders['train']):
            # move tensors to GPU if CUDA is available
            if use_cuda:
                data, target = data.to(device), target.to(device)
            # clear the gradients of all optimized variables
            optimizer.zero_grad()
            # forward pass: compute predicted outputs by passing inputs to the model
            output = model(data)
            print("OUTPUT____", output)
            print("TARGET____", target)
            # calculate the batch loss
            criterion = nn.CrossEntropyLoss()
            loss = criterion(output, target)
            # backward pass: compute gradient of the loss with respect to model parameters
            loss.backward()  #### HERE happens the error
            # perform a single optimization step (parameter update)
            optimizer.step()
            # update training loss
            train_loss += loss.item()*data.size(0)

The error happens at loss.backward().

Output and target shapes:

OUTPUT____ torch.Size([1, 10])
TARGET____ torch.Size([1])

Could you run the code on the CPU and check the error message? It is usually much clearer there, because CUDA asserts are raised asynchronously and the stack trace can point at the wrong line.
If it runs fine on the CPU, could you also post which GPU you are using and your CUDA version?
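In the meantime, a very common cause of this device-side assert with `nn.CrossEntropyLoss` is a target label outside the valid range `[0, num_classes - 1]`. A minimal sketch of the failure on the CPU, plus a sanity check you could run on each batch (assuming 10 classes, matching the `[1, 10]` output shape above):

```python
import torch
import torch.nn as nn

num_classes = 10  # matches the [1, 10] output shape above

criterion = nn.CrossEntropyLoss()
output = torch.randn(1, num_classes)  # fake logits
bad_target = torch.tensor([10])       # out of range: valid labels are 0..9

# On the CPU this raises immediately with a readable message,
# instead of the delayed device-side assert you see on the GPU.
try:
    criterion(output, bad_target)
    raised = False
except (IndexError, RuntimeError) as e:
    raised = True
    print("CPU error:", e)

# A quick sanity check to run on each training batch:
def check_targets(target, num_classes):
    assert target.min().item() >= 0, "negative class label"
    assert target.max().item() < num_classes, "label >= num_classes"

check_targets(torch.tensor([0, 3, 9]), num_classes)  # passes silently
```

You can also set the environment variable `CUDA_LAUNCH_BLOCKING=1` before launching the script to make CUDA calls synchronous, so the GPU stack trace points at the actual failing operation.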