I am trying to run a script in a Google Colab notebook that uses CUDA, but I get the following error when I initialize my neural ODE network:
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
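The message suggests passing CUDA_LAUNCH_BLOCKING=1 so that the traceback points at the call that actually failed. A minimal sketch of how that could be set from inside a notebook, assuming the cell runs before anything in the process has touched the GPU (if CUDA has already been initialized, the runtime would need a restart first):

```python
import os

# Assumption: this runs before the first CUDA call in the process,
# ideally before `import torch`. The variable is read when CUDA is
# initialized, so setting it later has no effect on the current session.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

print("CUDA_LAUNCH_BLOCKING =", os.environ["CUDA_LAUNCH_BLOCKING"])
```

With synchronous launches the reported line is more likely to be the real source of the error, at the cost of slower execution, so this is only meant for debugging.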
Here is where the error is occurring:
    # Some code above

    # Set data type to doubles
    torch.set_default_tensor_type(torch.DoubleTensor)

    # Set the device
    device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

    # Some code below

    class NeuralModel(nn.Module):
        """
        A simple neural ODE with nlayers fully connected internal and ninternal
        internal variables

        The network should account for
        (stress, internal_state, strain_rate, T) = 3 + nstate inputs
        and have (stress, internal_state) = 1 + nstate outputs

        Args:
            w_in: (n_features, n_inputs) size tensor containing the weights
                for the input layer
            b_in: (n_features,) size tensor containing the biases for the
                input layer
            w_hid: (n_features, n_features, n_layers) size tensor containing
                the weights for each of n_layers hidden layers
            b_hid: (n_features, n_layers) size tensor containing the biases
                for each of n_layers hidden layers
            w_out: (n_outputs, n_features) size tensor containing the weights
                for the output layer
            b_out: (n_outputs,) size tensor containing the biases for the
                output layer
            activation (optional): the activation function to use in the
                hidden layers; default is ReLU
            out_activation (optional): the activation function to use in the
                output layer; default is Sigmoid
        """
        def __init__(self, w_in, b_in, w_hid, b_hid, w_out, b_out, erate, T, time,
                     activation = nn.Sigmoid()):
            super().__init__()
            self.w_in = w_in
            self.b_in = b_in
            self.w_hid = w_hid
            self.b_hid = b_hid
            self.w_out = w_out
            self.b_out = b_out
            self.activation = activation

            # Check that the number of output features is exactly 2 less than
            # the number of input features
            if self.w_in.shape[1] - self.w_out.shape[0] != 2:
                raise ValueError("The number of input features must be exactly 2 "
                                 "greater than the number of output features")

            self.model = self.network_factory()
            self.initialize_weights()
            self.model.nsize = self.w_out.shape[0]

            self.d0 = torch.zeros((1000,)).to(device)

            self.force1_interp = utility.ArbitraryBatchTimeSeriesInterpolator(time, erate)
            self.force2_interp = utility.ArbitraryBatchTimeSeriesInterpolator(time, T)

        def network_factory(self):
            '''
            Simple factory function to create the network
            '''
            layers = []
            layers.append(nn.Linear(self.w_in.shape[1], self.w_in.shape[0]))
            layers.append(self.activation)
            for i in range(self.w_hid.shape[2]):
                layers.append(nn.Linear(self.w_hid.shape[0], self.w_hid.shape[1]))
                layers.append(self.activation)
            layers.append(nn.Linear(self.w_out.shape[1], self.w_out.shape[0]))
            return nn.Sequential(*layers)

    # Redacted some other code
Specifically, the traceback points to this line:
    self.d0 = torch.zeros((1000,)).to(device)
The device is set to 'cuda:0' earlier in the script, as shown above.
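One way to narrow this down is to compare the requested device index against the number of GPUs the process can actually see, since "invalid device ordinal" generally means the index does not correspond to a visible CUDA device. The sketch below is only a diagnostic under that assumption; `resolve_device` is a hypothetical helper written for illustration and is not part of PyTorch.

```python
import os


def resolve_device(requested: str, cuda_available: bool, device_count: int) -> str:
    """Return `requested` if it names a usable device, otherwise 'cpu'.

    Hypothetical helper: `requested` is a string such as 'cuda:0', 'cuda' or 'cpu'.
    """
    if not requested.startswith("cuda") or not cuda_available:
        return "cpu"
    # A bare 'cuda' carries no index; treat it as ordinal 0 for this check.
    _, _, index = requested.partition(":")
    ordinal = int(index) if index else 0
    # The ordinal is only valid if it is smaller than the visible GPU count.
    if 0 <= ordinal < device_count:
        return f"cuda:{ordinal}"
    return "cpu"


try:
    import torch
    available = torch.cuda.is_available()
    count = torch.cuda.device_count()
except ImportError:
    # Keeps the sketch runnable on a machine without PyTorch installed.
    available, count = False, 0

# CUDA_VISIBLE_DEVICES restricts and renumbers the GPUs the process can see.
print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES"))
print("cuda available:", available, "| visible GPUs:", count)
print("resolved device:", resolve_device("cuda:0", available, count))
```

If the reported GPU count is 0, or CUDA_VISIBLE_DEVICES is set to something unexpected, the problem likely lies in the runtime configuration (for example, the Colab runtime type) and not in the model code itself.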
- This is the first time I am encountering this error, so what is the issue here?
- How can I resolve this issue?
- How can I prevent this issue from occurring again in the future?
Thanks a lot, and I appreciate the help.