Invalid argument error with 2080Ti and CUDA 10; it can be worked around, but another error follows

When I train my network, it raises this error: THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=405 error=11 : invalid argument.

Environment:

  • docker: yes
  • cuda: 10.0
  • python: 3.7
  • pytorch: 1.0
  • cudnn: 7
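
For anyone trying to reproduce this, the environment details above can be read from inside Python rather than from the image description. This is only a sketch: `describe_environment` is a hypothetical helper name, and the GPU fields are filled in only when CUDA is actually available.

```python
import sys
import torch


def describe_environment():
    """Collect the version details that matter for this kind of CUDA error."""
    info = {
        "python": sys.version.split()[0],
        "pytorch": torch.__version__,
        "cuda (build)": torch.version.cuda,        # None on CPU-only builds
        "cudnn": torch.backends.cudnn.version(),   # None if cuDNN is unavailable
        "cuda available": torch.cuda.is_available(),
    }
    if torch.cuda.is_available():
        # Device 0 is assumed here; change the index on multi-GPU machines.
        info["gpu"] = torch.cuda.get_device_name(0)
        info["compute capability"] = torch.cuda.get_device_capability(0)
    return info


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

Posting this output together with the error makes it easier to compare the failing GPU with one that works.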

A simple example:
import torch
from torchvision.models import vgg16
model = vgg16().cuda()
x = torch.zeros((32, 3, 227, 227)).cuda()
model(x)

out:
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=405 error=11 : invalid argument
tensor([[0., 0., 0., …, 0., 0., 0.],
[0., 0., 0., …, 0., 0., 0.],
[0., 0., 0., …, 0., 0., 0.],
…,
[0., 0., 0., …, 0., 0., 0.],
[0., 0., 0., …, 0., 0., 0.],
[0., 0., 0., …, 0., 0., 0.]], device='cuda:0',
grad_fn=)

I found that after this error, if I run it a second time, no error is raised and the output is correct.

model(x)
tensor([[0., 0., 0., …, 0., 0., 0.],
[0., 0., 0., …, 0., 0., 0.],
[0., 0., 0., …, 0., 0., 0.],
…,
[0., 0., 0., …, 0., 0., 0.],
[0., 0., 0., …, 0., 0., 0.],
[0., 0., 0., …, 0., 0., 0.]], device='cuda:0',
grad_fn=)

So if I add a simple example like this at the beginning of my code, the error is raised there and I can ignore it. After that, my training code runs normally.
Crazy… :joy_cat::joy_cat:
But it works!!!
Does anyone know the reason?
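
The workaround could be sketched roughly like this, assuming the message only shows up on the first CUDA forward pass, as observed here. `warm_up` is a hypothetical helper, and the small network is a stand-in for the real model (vgg16 with a (32, 3, 227, 227) input in the example above). The output shown earlier suggests the message is printed rather than raised as a Python exception, so no try/except is used.

```python
import torch
import torch.nn as nn


def warm_up(model, input_shape, device):
    """Run one throwaway forward pass so any first-call CUDA message appears here."""
    dummy = torch.zeros(input_shape, device=device)
    model(dummy)  # result is discarded on purpose
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for queued kernels before continuing


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in network (assumption); replace with the real model.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
).to(device)

warm_up(model, (2, 3, 32, 32), device)

# The real work starts only after the warm-up call.
out = model(torch.zeros(2, 3, 32, 32, device=device))
print(out.shape)
```

This only hides the symptom. It does not explain why the first call fails.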

But then comes the evil…
I found that training works normally, but when I run under 'with torch.no_grad():' with the same net, same weights, and same dataset, the outputs of the conv layers are all NaN.
Actually, I don’t know whether these issues are related… it’s also possible that I hit two…
Has anyone come across the same issue?
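
One way to narrow down where the NaN first appears is to attach forward hooks and record which layers produce NaN under torch.no_grad(). This is a diagnostic sketch only: `find_nan_layers` is a hypothetical helper, the tiny model is a stand-in, and its last layer is poisoned with NaN on purpose just to show what a report looks like.

```python
import torch
import torch.nn as nn


def find_nan_layers(model, x):
    """Return names of leaf modules whose output contains NaN for input x."""
    bad = []
    hooks = []

    def make_hook(name):
        def hook(module, inputs, output):
            if torch.is_tensor(output) and torch.isnan(output).any():
                bad.append(name)
        return hook

    for name, module in model.named_modules():
        if len(list(module.children())) == 0:  # leaf modules only
            hooks.append(module.register_forward_hook(make_hook(name)))
    try:
        with torch.no_grad():
            model(x)
    finally:
        for h in hooks:
            h.remove()  # always detach the hooks again
    return bad


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 4, kernel_size=3, padding=1),
).to(device)
x = torch.zeros(2, 3, 32, 32, device=device)

clean_report = find_nan_layers(model, x)

# Deliberately break the last conv layer to demonstrate the report.
with torch.no_grad():
    model[2].weight.fill_(float("nan"))
poisoned_report = find_nan_layers(model, x)

print("before poisoning:", clean_report)
print("after poisoning:", poisoned_report)
```

The first name in the list is the earliest layer whose output went bad, which is usually the place to start looking.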

Tested on a different GPU with the same docker image and the same code, and everything works fine.
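
Since the problem seems tied to one GPU, a quick check is to run the same weights and input on the CPU and on the suspect device and compare the results. The sketch below assumes a hypothetical helper `max_abs_diff_vs_cpu` and a small stand-in model. On a machine without CUDA it falls back to comparing CPU with CPU, which should give a difference of zero.

```python
import copy
import torch
import torch.nn as nn


def max_abs_diff_vs_cpu(model, x, device):
    """Run identical weights and input on CPU and on `device`; return the largest absolute difference."""
    model_cpu = copy.deepcopy(model).to("cpu").eval()
    model_dev = copy.deepcopy(model).to(device).eval()
    with torch.no_grad():
        ref = model_cpu(x.to("cpu"))
        out = model_dev(x.to(device)).to("cpu")
    return (ref - out).abs().max().item()


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
torch.manual_seed(0)

# Stand-in network (assumption); replace with the real model.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 4, kernel_size=3, padding=1),
)
x = torch.randn(2, 3, 32, 32)

diff = max_abs_diff_vs_cpu(model, x, device)
print(f"max |cpu - {device.type}| = {diff}")
```

Small floating-point differences between CPU and GPU are expected. A value far from zero (or NaN) would point at the device or its libraries rather than at the model code.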