I am trying to run a script that uses CUDA in a Google Colab notebook. However, I get the following error when I try to initialize my neural ODE network:
```
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```
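The message suggests passing `CUDA_LAUNCH_BLOCKING=1` so that the stack trace points at the call that actually failed. A minimal sketch of doing that in a notebook, assuming it runs as the very first cell after a runtime restart (the variable is generally only honored if it is set before CUDA is initialized). `cuda_env_summary` is a hypothetical helper that also reports the environment variables that control which GPUs are visible and how they are numbered:

```python
import os

# Assumption: this cell runs before torch initializes CUDA (e.g. right after
# "Restart runtime"); setting it after CUDA is already in use may have no effect.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


def cuda_env_summary() -> dict:
    """Hypothetical helper: collect env vars that affect CUDA device visibility/numbering."""
    keys = ("CUDA_LAUNCH_BLOCKING", "CUDA_VISIBLE_DEVICES", "CUDA_DEVICE_ORDER")
    return {key: os.environ.get(key) for key in keys}


print(cuda_env_summary())
```

An unexpected value for `CUDA_VISIBLE_DEVICES` in that output would be one thing worth checking, since it changes which physical GPU `cuda:0` refers to.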
Here is the code where the error occurs:
```python
import torch
import torch.nn as nn
import utility  # project-specific module providing ArbitraryBatchTimeSeriesInterpolator

# Some code above

# Set data type to doubles
torch.set_default_tensor_type(torch.DoubleTensor)

# Set the device
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# Some code below

class NeuralModel(nn.Module):
    """
    A simple neural ODE with nlayers fully connected internal layers and ninternal internal variables
    The network should account for (stress, internal_state, strain_rate, T) = 3 + nstate inputs
    and have (stress, internal_state) = 1 + nstate outputs

    Args:
        w_in: (n_features, n_inputs) size tensor containing the weights for the input layer
        b_in: (n_features,) size tensor containing the biases for the input layer
        w_hid: (n_features, n_features, n_layers) size tensor containing the weights for each of n_layers hidden layers
        b_hid: (n_features, n_layers) size tensor containing the biases for each of n_layers hidden layers
        w_out: (n_outputs, n_features) size tensor containing the weights for the output layer
        b_out: (n_outputs,) size tensor containing the biases for the output layer
        activation (optional): the activation function to use in the hidden layers; default is Sigmoid
    """
    def __init__(self, w_in, b_in, w_hid, b_hid, w_out, b_out, erate, T, time, activation=nn.Sigmoid()):
        super().__init__()
        self.w_in = w_in
        self.b_in = b_in
        self.w_hid = w_hid
        self.b_hid = b_hid
        self.w_out = w_out
        self.b_out = b_out
        self.activation = activation

        # Check that the number of output features is exactly 2 less than the number of input features
        if self.w_in.shape[1] - self.w_out.shape[0] != 2:
            raise ValueError("The number of input features must be exactly 2 greater than the number of output features")

        self.model = self.network_factory()
        self.initialize_weights()
        self.model.nsize = self.w_out.shape[0]
        self.d0 = torch.zeros((1000,)).to(device)  # <-- the error is raised here
        self.force1_interp = utility.ArbitraryBatchTimeSeriesInterpolator(time, erate)
        self.force2_interp = utility.ArbitraryBatchTimeSeriesInterpolator(time, T)

    def network_factory(self):
        '''
        Simple factory function to create the network
        '''
        layers = []
        layers.append(nn.Linear(self.w_in.shape[1], self.w_in.shape[0]))
        layers.append(self.activation)
        for i in range(self.w_hid.shape[2]):
            layers.append(nn.Linear(self.w_hid.shape[1], self.w_hid.shape[0]))
            layers.append(self.activation)
        layers.append(nn.Linear(self.w_out.shape[1], self.w_out.shape[0]))
        return nn.Sequential(*layers)

    # Redacted some other code
```
Specifically, the error points to the line `self.d0 = torch.zeros((1000,)).to(device)`, and my device is set to `'cuda:0'`.
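Since `invalid device ordinal` generally means the requested GPU index does not correspond to a device the CUDA runtime can see, one defensive pattern is to validate the index against `torch.cuda.device_count()` before building a `cuda:N` device. A sketch under that assumption; `resolve_device_string` and `get_device` are hypothetical helper names, not part of PyTorch:

```python
def resolve_device_string(index: int, cuda_available: bool, device_count: int) -> str:
    """Map a requested GPU index to a device string, validating the ordinal first.

    Falls back to CPU when no CUDA device is visible, and fails early with a clear
    message when the index is out of range instead of deferring to a CUDA error.
    """
    if not cuda_available or device_count == 0:
        return "cpu"
    if not 0 <= index < device_count:
        raise ValueError(
            f"cuda:{index} requested but only {device_count} CUDA device(s) are visible"
        )
    return f"cuda:{index}"


def get_device(index: int = 0):
    # torch is imported here so the pure helper above can be exercised without it
    import torch

    return torch.device(
        resolve_device_string(index, torch.cuda.is_available(), torch.cuda.device_count())
    )
```

With this, `device = get_device(0)` would stand in for the `torch.device(...)` line, and a tensor such as `self.d0` could be allocated directly on it with `torch.zeros(1000, device=device)`. Because CUDA errors can be reported asynchronously, this check only rules out a bad ordinal at device-selection time; it cannot rule out a failure that actually originated in an earlier CUDA call.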
- This is the first time I have encountered this error. What is causing it?
- How can I resolve this issue?
- How can I prevent this issue from occurring again in the future?
Thanks for the help.