Hi @ptrblck, I think I found out what went wrong. I had set torch.backends.cudnn.benchmark = True. With this setting, cuDNN searches for the optimal algorithm for each particular configuration. However, in my code the network varies in each iteration, so that search is repeated for every new configuration, which leads to this problem. The fixed-width case is fine because cuDNN can find the optimal algorithm for all 4 widths after a few iterations and then reuse it. Details can be found here
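To illustrate the difference, here is a toy sketch in plain Python. It is not cuDNN's real implementation; the function name count_benchmark_runs and the example widths are made up for illustration. It only assumes that the autotuner benchmarks once per unseen configuration and reuses the result afterwards.

```python
# Toy model only: this is NOT how cuDNN is implemented internally.
# Assumption: one benchmark run per configuration that has not been seen yet.
def count_benchmark_runs(input_widths):
    """Count how often a per-configuration autotuner would have to benchmark."""
    tuned = set()  # configurations that already have a chosen algorithm
    runs = 0
    for width in input_widths:
        if width not in tuned:
            tuned.add(width)  # benchmark once, then reuse the result
            runs += 1
    return runs


# 100 iterations cycling through 4 fixed widths (hypothetical values)
fixed = [32, 64, 128, 256] * 25
# 100 iterations where every width is new (hypothetical values)
varying = list(range(32, 132))

print(count_benchmark_runs(fixed))    # 4
print(count_benchmark_runs(varying))  # 100
```

With fixed widths the search cost is paid only 4 times, while with a configuration that changes every iteration it is paid every time. In that case, setting torch.backends.cudnn.benchmark = False is probably the safer choice.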
Thanks a lot for your help!