I encounter this problem during training; it occurs within the first 400 iterations. I have tried decreasing the batch_size, but it does not help. I also monitor GPU memory during training, and it looks normal.
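Roughly how I check memory (a minimal sketch; the exact logging in train.py may differ):

```python
import torch

# Per-iteration GPU memory check (sketch; the real logging in train.py may
# differ). memory_allocated() counts tensors currently held on the device;
# max_memory_allocated() is the peak since startup or the last reset.
def log_gpu_memory(step):
    alloc = torch.cuda.memory_allocated() / 1024 ** 2
    peak = torch.cuda.max_memory_allocated() / 1024 ** 2
    print(f"step {step}: {alloc:.1f} MiB allocated, {peak:.1f} MiB peak")
```

The full error and traceback: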
THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCTensorMathPointwise.cu line=124 error=9 : invalid configuration argument
Traceback (most recent call last):
File "train.py", line 669, in <module>
main()
File "train.py", line 665, in main
train(G_model, G_net, G_optimizer, G_pair_dataloader, G_unpair_dataloader, R_model, R_net, R_optimizer, R_pair_dataloader, R_unpair_dataloader)
File "train.py", line 565, in train
tmp_r_lips = generate_lg(G_model, tmp_g_txts, tmp_guide_imgs)
File "train.py", line 320, in generate_lg
G_imgs = G_net(guide_imgs, None, g_txts)
File "/home/WeicongChen/anaconda3/envs/pt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/WeicongChen/codes/DualLip/LipGAN/model_G_att.py", line 322, in forward
text_z, text_h = self.text_encoder(text_inputs)
File "/home/WeicongChen/anaconda3/envs/pt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/WeicongChen/codes/DualLip/LipGAN/model_G_att.py", line 77, in forward
hidden = torch.tanh(self.fc(torch.cat((hidden[-2,:,:], hidden[-1,:,:]), dim = 1)))
RuntimeError: cuda runtime error (9) : invalid configuration argument at /pytorch/aten/src/THC/generic/THCTensorMathPointwise.cu:124
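For context, the failing line (model_G_att.py, line 77) is the usual bidirectional-RNN hidden-state pooling: the final forward and backward hidden states are concatenated and projected. A minimal, self-contained sketch of that pattern, with all sizes made up:

```python
import torch
import torch.nn as nn

# Sketch of the pattern at model_G_att.py line 77 (all dimensions hypothetical):
# a bidirectional GRU whose final forward/backward hidden states are
# concatenated and passed through a linear layer + tanh.
rnn = nn.GRU(input_size=128, hidden_size=256, bidirectional=True).cuda()
fc = nn.Linear(2 * 256, 256).cuda()

x = torch.randn(30, 8, 128, device="cuda")  # (seq_len, batch, input_size)
_, hidden = rnn(x)                          # hidden: (2, batch, 256)
pooled = torch.tanh(fc(torch.cat((hidden[-2, :, :], hidden[-1, :, :]), dim=1)))
print(pooled.shape)                         # torch.Size([8, 256])

# CUDA reports errors asynchronously, so the traceback line may not be the
# kernel that actually failed; rerunning with
#   CUDA_LAUNCH_BLOCKING=1 python train.py
# makes the failure synchronous. Error 9 (invalid configuration argument)
# usually means a kernel was launched with an invalid grid, e.g. from an
# empty or extremely large tensor, so printing text_inputs.shape right
# before this call is a cheap first check.
```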