"invalid configuration" error when I try to run on GPU

My code works fine on the CPU, but when I move everything to GPU and try to run it, I get an error on this line:
print(ctx["input"])

The shape of ctx["input"] is torch.Size([1, 50]) and its dtype is torch.float32.

The error says: “RuntimeError: CUDA error: invalid configuration argument”.

Actually, the error appears whenever I try to do anything with ctx["input"].

How can I fix this?

This is the entire traceback: dpaste/qDRq (Python)

Could you rerun your code via CUDA_LAUNCH_BLOCKING=1 python script.py args and post the stack trace here?
The error points to a wrong kernel launch config. Are you using any custom CUDA code in your application? If not, which PyTorch and CUDA version as well as GPU are you using?

If I’m in Google Colab, can I just put !CUDA_LAUNCH_BLOCKING=1 at the beginning of the cell?

I think the root of the problem is in this module:

class MessageMaker(torch.nn.Module):
    def __init__(self, embed_size, hidden_size):
        super().__init__()
        # Node and edge representations are concatenated along the feature dim.
        node_edge_concat_size = embed_size * 2
        self.rnn = torch.nn.LSTM(node_edge_concat_size, hidden_size)

    def forward(self, edges):
        node_reps, edge_reps = edges.src["rep"], edges.data["rep"]
        inputs = torch.cat([node_reps, edge_reps], dim=1)
        inputs = inputs.unsqueeze(0)  # Since we're processing only 1 seq element at a time.
        rnn_state = edges.src["sum_incoming_hidden_and_cell"]
        # Slicing along dim 1 yields non-contiguous views, so .contiguous() is needed before the LSTM call.
        hiddens, cells = rnn_state[:, 0].contiguous(), rnn_state[:, 1].contiguous()
        hiddens, cells = hiddens.unsqueeze(0), cells.unsqueeze(0)  # Since it's only 1 layer+direction.
        outputs, (updated_hiddens, updated_cells) = self.rnn(inputs, (hiddens, cells))
        updated_hiddens, updated_cells = updated_hiddens.squeeze(0), updated_cells.squeeze(0)
        updated_rnn_state = torch.stack([updated_hiddens, updated_cells], dim=1)
        new_features = {"hidden_and_cell": updated_rnn_state}

        return new_features
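For context, here is a minimal CPU-only sketch of the same call pattern, using plain tensors and hypothetical sizes in place of the DGL edge batch. It also shows why the sliced states need .contiguous():

```python
import torch

# Hypothetical sizes, standing in for the real model's dimensions.
embed_size, hidden_size, num_edges = 8, 8, 5

rnn = torch.nn.LSTM(embed_size * 2, hidden_size)

node_reps = torch.randn(num_edges, embed_size)
edge_reps = torch.randn(num_edges, embed_size)
inputs = torch.cat([node_reps, edge_reps], dim=1).unsqueeze(0)  # (1, num_edges, 2 * embed_size)

rnn_state = torch.randn(num_edges, 2, hidden_size)
print(rnn_state[:, 0].is_contiguous())  # False: slicing dim 1 yields a strided view

hiddens = rnn_state[:, 0].contiguous().unsqueeze(0)  # (1, num_edges, hidden_size)
cells = rnn_state[:, 1].contiguous().unsqueeze(0)

outputs, (new_h, new_c) = rnn(inputs, (hiddens, cells))
print(new_h.shape)  # torch.Size([1, 5, 8])
```

The num_edges dimension plays the role of the LSTM's batch dimension here, with a sequence length of 1, matching the unsqueeze(0) calls in the module.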

After moving to the GPU I got the error “rnn: hx is not contiguous”, so I added .contiguous() to the rnn_state slices in forward() to fix it. Could the CUDA error have something to do with that?

It could potentially be related. Could you post an executable code snippet that raises the initial error using this module?

It could work if you set this environment variable before importing any other library that might initialize the CUDA context. As it’s often not straightforward to set it properly in a Jupyter notebook, I usually recommend running the script in a terminal instead.
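As a sketch of one workaround: in Colab a `!VAR=1` line runs in a separate subshell and doesn’t affect the notebook’s Python process, but the variable can be set from Python itself, as long as it happens in the very first cell, before torch is imported:

```python
import os

# Must run before the first `import torch` (or anything else that initializes
# CUDA); once the CUDA context exists, the variable is ignored.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # only import torch *after* the variable is set
```

If the kernel has already imported torch in an earlier cell, restarting the runtime before running this cell is required for the variable to take effect.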

I don’t have a personal GPU so I have to use a Colab notebook. Will try setting it at the beginning.