Hello,
I am using this in my training function:
for epoch in range(num_epochs):
    train_mean_loss = 0
    train_mean_acc = 0
    rand_var = 0
    for i, (train_input, train_label) in enumerate(train_dataloader):
        if train_input.device != device_available:
            print("Train Data wasn't on cuda but now is.")
            train_input = train_input.to(device_available)
            train_label = train_label.to(device_available)
        .
        .
        .
When I run the training loop a second time (for example, after an error interrupted the first run), I get an illegal memory access error:
RuntimeError: CUDA error: an illegal memory access was encountered
I am confused about these things:
- Does cuda throw an error if you try to push a tensor to cuda when it's already on cuda?
- I faced the same issue with my model (I had to factory reset the runtime to get it running again). I first created an object of my model class and pushed it to cuda. Then I changed something in the model and tried to push it to cuda again, and I got the same illegal memory access error.
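For the first question, here is a minimal sketch of how it could be checked directly. It assumes `device_available` is built with `torch.device(...)` as shown (the original post does not show how it is defined), and it falls back to CPU when no GPU is present. As far as I understand the documented behaviour, `Tensor.to` returns the tensor itself when it already has the requested device and dtype, and `nn.Module.to` moves the parameters in place and returns the same module, so a repeated move should not raise by itself.

```python
import torch
import torch.nn as nn

# Assumption: device_available is defined roughly like this in the original code.
device_available = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Move a tensor to the device, then "move" it again.
x = torch.randn(2, 3).to(device_available)
y = x.to(device_available)
tensor_move_is_noop = y is x  # the second call hands back the same tensor object

# Same check for a model (nn.Linear is only a stand-in for the real model class).
model = nn.Linear(3, 1).to(device_available)
model_again = model.to(device_available)
model_move_is_noop = model_again is model

# Side note: a device without an index does not compare equal to one with an index,
# so a check like `tensor.device != torch.device("cuda")` is true even for a tensor
# that already lives on cuda:0.
index_mismatch = torch.device("cuda") != torch.device("cuda:0")

print(tensor_move_is_noop, model_move_is_noop, index_mismatch)
```

If this holds, the `if` check in the loop is not needed for safety, and calling `.to(device_available)` unconditionally on every batch would be equivalent.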
I am using Google Colab and a big dataset, so it's very difficult to debug when I have to factory reset my runtime every time I get an error during training.
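One way to make this easier to debug might be to run a single batch on the CPU first, where a bad index or shape usually raises an ordinary Python exception instead of a CUDA error. The sketch below is only an illustration under assumptions: `cpu_dry_run`, the `nn.Linear` model, and the label tensors are hypothetical stand-ins, not the code from this post. It also sets `CUDA_LAUNCH_BLOCKING=1`, which makes kernel launches synchronous so the traceback points nearer to the failing call; it has to be set before CUDA is initialised, so setting it before importing torch is the safe option.

```python
import copy
import os

# Set before CUDA is initialised so that kernel launches run synchronously.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")

import torch
import torch.nn as nn


def cpu_dry_run(model, batch_input, batch_label, loss_fn):
    """Hypothetical helper: run one forward pass and loss on a CPU copy of the model."""
    model_cpu = copy.deepcopy(model).to("cpu")  # leave the original model untouched
    with torch.no_grad():
        output = model_cpu(batch_input.to("cpu"))
        return loss_fn(output, batch_label.to("cpu")).item()


# Stand-in model and data: 4 input features, 3 classes.
model = nn.Linear(4, 3)
inputs = torch.randn(2, 4)
good_labels = torch.tensor([0, 2])
bad_labels = torch.tensor([0, 3])  # 3 is out of range for 3 classes

loss = cpu_dry_run(model, inputs, good_labels, nn.CrossEntropyLoss())

# An out-of-range label is one possible cause of CUDA-side failures; on the CPU
# it surfaces as a regular exception with a readable message.
try:
    cpu_dry_run(model, inputs, bad_labels, nn.CrossEntropyLoss())
    bad_label_error = None
except (IndexError, RuntimeError) as err:
    bad_label_error = err

print(loss, type(bad_label_error).__name__)
```

If the batch passes on the CPU but still fails on the GPU, the cause is probably something else, but this at least rules out the simple cases without using up a Colab runtime.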