torch.nn.functional.pad seems to have a maximum input length

Hi!
To reproduce:

import torch
from torch.nn.functional import pad
pad(torch.randn(4, 65535, 20).cuda(), (0, 0), 'reflect')  # works
pad(torch.randn(4, 65536, 20).cuda(), (0, 0), 'reflect')  # fails with a CUDA error

Traceback:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/lib/python3.9/site-packages/torch/nn/functional.py", line 4369, in _pad
    return torch._C._nn.reflection_pad1d(input, pad)
RuntimeError: CUDA error: invalid configuration argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Environment:
CUDA 11.3
PyTorch 1.11

Thanks for reporting the issue! I can reproduce it, and it looks like the kernel launch itself is failing. We'll take a look at it.
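
In case it helps while this is open: the 65535 → 65536 boundary matches CUDA's maximum grid size of 65535 along the y and z dimensions, so my guess (unconfirmed) is that the launch puts the channel count into one of those grid dimensions. If that's right, splitting dim 1 into chunks of at most 65535 should keep each launch valid. pad_reflect_chunked below is a hypothetical helper, not part of PyTorch; it is safe because reflect padding of the last dimension treats every (batch, channel) row independently.

import torch
from torch.nn.functional import pad

def pad_reflect_chunked(x, padding, chunk=65535):
    # Hypothetical workaround: pad at most `chunk` channels per kernel
    # launch (assuming the channel count is what overflows the grid),
    # then concatenate the padded chunks back along dim 1.
    parts = [pad(p, padding, 'reflect') for p in x.split(chunk, dim=1)]
    return torch.cat(parts, dim=1)

out = pad_reflect_chunked(torch.randn(4, 65536, 20).cuda(), (1, 1))
print(out.shape)  # torch.Size([4, 65536, 22])

Alternatively, running the pad on CPU with pad(x.cpu(), ...).cuda() sidesteps the launch entirely, at the cost of a device round-trip.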