Strange CUDA illegal memory access error

When I train my module on cropped data (256x512), it fails with a CUDA illegal memory access error.


But when I use a larger crop (372x512), the error disappears.
It seems that smaller inputs trigger this error. Could this be a bug?

Could you update PyTorch to the latest nightly release and check whether you still hit this issue, in case you are using an older release?
If so, could you post a minimal, executable code snippet to reproduce the issue as well as the output of python -m torch.utils.collect_env, please?
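
While preparing the snippet, it may also help to rerun it with CUDA_LAUNCH_BLOCKING=1, since CUDA errors are reported asynchronously and the stack trace can otherwise point at an unrelated call. Below is a rough sketch of a small launcher that does this, using only the standard library. The helper names (collect_env_command, blocking_env, run_repro) and the script path you pass in are placeholders of mine, not part of PyTorch.

```python
import os
import subprocess
import sys


def collect_env_command():
    """Command that prints the environment report requested above."""
    return [sys.executable, "-m", "torch.utils.collect_env"]


def blocking_env(base_env=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base_env is None else base_env)
    # With CUDA_LAUNCH_BLOCKING=1 each kernel launch is synchronous, so the
    # error is raised at the failing call instead of at a later CUDA call.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_repro(script_path):
    """Run the repro script (placeholder path) with blocking launches."""
    return subprocess.run(
        [sys.executable, script_path],
        env=blocking_env(),
        capture_output=True,
        text=True,
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (assumed file names): python launcher.py repro.py
    result = run_repro(sys.argv[1])
    print(result.stdout)
    print(result.stderr, file=sys.stderr)
```

Running the same script once with the 256x512 crop and once with the 372x512 crop should show whether the failing call is the same in both cases. Expect the run to be slower with blocking launches enabled.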