Module in a for loop gives a RuntimeError: Unable to find a valid cuDNN algorithm to run convolution

C_J · December 20, 2021, 12:51pm

I combined a traditional UNET module with my own custom function in my forward model. For some reason, I can’t pass the data as a batch tensor, so I have to feed the samples one by one in a for loop.

____________________________________________
optvars = [{'params': custom_var, 'lr':lr_custom}]
optimizer1 = optim.Adam(optvars)
optimizer2 = optim.Adam(UNET.parameters(), lr=lr_UNET)

loss_total = 0
for a in range(batch_size):
    input_data = batch[a, :, :]
    temp_data = custom_function(input_data, custom_var)
    output_data = UNET(temp_data)
    loss_total = loss_total + loss(output_data)

loss_total.backward()  # <----- where the error occurs
custom_var.retain_grad()  # <----- I think I need something like this for the UNET parameters...
optimizer1.step()
optimizer2.step()
____________________________________

I get this error:
RuntimeError: Unable to find a valid cuDNN algorithm to run convolution

It works when batch_size=1, but gives the error when batch_size>=2.
Also, when I don’t use the UNET module in the for loop (only custom_function), it works.
I think it has something to do with calling the module more than once before backpropagation, but I’m not sure what the problem is.
Can anyone help, please?

Thanks!

ptrblck · December 20, 2021, 9:50pm

Which PyTorch version are you using with which CUDA/cuDNN releases?
This error might be raised if no cuDNN workspace can be allocated because memory usage is already high.
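To answer both points at once, something like the following could be run right before the failing call. It is only a sketch: the helper name `environment_report` is made up for illustration, and the memory figures only cover what PyTorch's own allocator holds on the current device.

```python
import torch

def environment_report():
    """Collect version and GPU memory info (hypothetical helper, not a PyTorch API)."""
    report = {
        "torch": torch.__version__,
        "cuda": torch.version.cuda,               # None on CPU-only builds
        "cudnn": torch.backends.cudnn.version(),  # None if cuDNN is unavailable
    }
    if torch.cuda.is_available():
        device = torch.cuda.current_device()
        mib = 2 ** 20
        # Memory currently occupied by tensors vs. held by the caching allocator.
        report["allocated_MiB"] = torch.cuda.memory_allocated(device) / mib
        report["reserved_MiB"] = torch.cuda.memory_reserved(device) / mib
        report["total_MiB"] = torch.cuda.get_device_properties(device).total_memory / mib
    return report

print(environment_report())
```

If the allocated or reserved value is close to the total just before `backward()`, that would be consistent with cuDNN failing to get a workspace.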

C_J · December 21, 2021, 6:07am

I’m using PyTorch 1.9.1 + cu111.
Now it does look like a memory allocation shortage for sure, although the error message doesn’t say so.
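If memory is indeed the limit, one common way to lower the peak is to call `backward()` on each sample's loss inside the loop instead of once on the summed loss. Each sample's graph is then freed right away, while the gradients still accumulate in `.grad`, so the final gradients match those of the summed loss. The snippet below is a minimal sketch and not confirmed as the fix for this thread: the tiny conv net standing in for `UNET`, the scalar `custom_var`, the bodies of `custom_function` and `loss`, the tensor shapes, and the learning rates are all placeholders.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-ins (assumptions): a tiny conv net instead of the real UNET,
# and a single learnable scale instead of the real custom_var.
UNET = nn.Sequential(
    nn.Conv2d(1, 4, 3, padding=1), nn.ReLU(), nn.Conv2d(4, 1, 3, padding=1)
).to(device)
custom_var = torch.ones(1, device=device, requires_grad=True)

def custom_function(x, var):
    # Placeholder for the real custom function.
    return x * var

def loss(output):
    # Placeholder loss.
    return output.pow(2).mean()

optimizer1 = optim.Adam([{'params': [custom_var], 'lr': 1e-3}])
optimizer2 = optim.Adam(UNET.parameters(), lr=1e-3)

batch = torch.randn(4, 16, 16, device=device)
batch_size = batch.shape[0]

optimizer1.zero_grad()
optimizer2.zero_grad()
loss_total = 0.0  # plain float, used for logging only
for a in range(batch_size):
    input_data = batch[a, :, :]
    temp_data = custom_function(input_data, custom_var)
    output_data = UNET(temp_data[None, None])  # add batch and channel dims
    sample_loss = loss(output_data)
    sample_loss.backward()  # frees this sample's graph; grads accumulate in .grad
    loss_total += sample_loss.item()

optimizer1.step()
optimizer2.step()
```

With this layout only one sample's activations are kept alive at a time, at the cost of running `backward()` `batch_size` times.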