Hi,
When I feed an input with batch size N > 65535 into a Conv3d layer, the output for every sample after index 65535 is clearly incorrect. This happens only on a P100 GPU, not on the CPU or on other GPUs. Based on my earlier tests it may also happen on a P4 GPU, but I no longer have one at hand to test again. It affects only Conv3d, not Conv1d or Conv2d.
Code to reproduce (run in Colab):
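import torch
from torch import nn

# 1x1x1 convolution without bias: each output element depends on one input element
net = nn.Conv3d(1, 1, 1, bias=False)
# batch size 70000 exceeds 65535
input = torch.rand(70000, 1, 2, 2, 2)
# reference result computed on the CPU, then moved to the GPU for comparison
out_cpu = net(input).cuda()
net.cuda()
out_gpu = net(input.cuda())
# squared error per sample, summed over all non-batch dimensions
error = torch.sum((out_cpu - out_gpu).detach()**2, dim=(1,2,3,4))
print(error[65500:65600])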
The output on P100 is below. The error is zero for indices 65500 to 65535 and nonzero from index 65536 onward:
tensor([0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.2917, 0.4172, 0.7399, 0.4553, 0.3345, 0.8831, 0.3979, 0.5914, 0.5503,
0.5707, 0.5280, 0.7113, 0.5516, 0.8872, 0.6879, 0.4335, 0.7914, 0.4365,
0.2578, 0.2922, 0.1646, 0.7618, 0.5094, 0.3610, 0.6823, 0.8531, 0.6192,
0.3508, 0.2554, 0.9788, 0.3178, 0.6107, 0.2074, 1.0488, 0.2410, 0.1997,
0.8527, 0.5012, 0.3539, 0.6145, 0.4775, 0.5919, 0.7322, 0.6376, 0.5392,
0.6394, 0.5922, 0.6976, 0.4430, 0.4933, 0.5123, 0.3211, 0.2196, 0.6387,
0.2673, 0.1693, 0.2910, 0.4832, 0.3100, 0.4031, 0.3633, 0.5821, 0.4544,
0.2899], device='cuda:0')
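A possible workaround, not a fix for the underlying problem, is to run the layer on slices of the batch and concatenate the results. The sketch below assumes that batches of at most 65535 samples are computed correctly, which is only inferred from the output above and is not a documented limit. The helper name `conv_in_chunks` and the constant `MAX_BATCH` are made up for this illustration.

```python
import torch
from torch import nn

# Assumption: 65535 is a safe batch size, taken from the threshold reported above.
MAX_BATCH = 65535


def conv_in_chunks(layer, x, max_batch=MAX_BATCH):
    """Apply `layer` to `x` in slices of at most `max_batch` samples along dim 0."""
    if x.shape[0] <= max_batch:
        return layer(x)
    # torch.split returns views of x, so the input itself is not copied
    outputs = [layer(chunk) for chunk in torch.split(x, max_batch, dim=0)]
    return torch.cat(outputs, dim=0)


# Usage: falls back to the CPU when no GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"
net = nn.Conv3d(1, 1, 1, bias=False).to(device)
x = torch.rand(70000, 1, 2, 2, 2, device=device)
with torch.no_grad():
    out = conv_in_chunks(net, x)
print(out.shape)
```

Because a convolution treats every sample in the batch independently, splitting along the batch dimension should not change the result, so comparing `out` against the CPU output is a quick way to check whether the workaround helps on the affected GPU.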