I ran some quick tests with the following script:
import timeit
setup = """
import torch
import torch.nn.functional as F
device = "cuda"
x = torch.rand(200, 3, 200, device=device)
weight = torch.rand(256, 3, 3, device=device)
eps = 1e-5
"""
t1 = "F.conv1d(x, weight) + eps"
t2 = "F.conv1d(x, weight, torch.tensor(eps, device=device).expand(len(weight)))"
t3 = "F.conv1d(x, weight, torch.full((len(weight), ), eps, device=device))"
number = 100
print("%d ms" % round(1000 * timeit.timeit(stmt=t1, setup=setup, number=number)))
print("%d ms" % round(1000 * timeit.timeit(stmt=t2, setup=setup, number=number)))
print("%d ms" % round(1000 * timeit.timeit(stmt=t3, setup=setup, number=number)))
It turns out that the first variant is the fastest on the GPU but the slowest on the CPU. Moreover, the second variant throws a cuDNN error, presumably because expand produces a stride-0 (non-contiguous) bias tensor that cuDNN rejects.
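As an aside, CUDA kernels launch asynchronously, so a plain timeit loop like the one above can end up measuring mostly launch overhead rather than kernel time. Here is a sketch of a synchronized variant of the same benchmark (it falls back to the CPU when CUDA isn't available, so the numbers are only meaningful on a GPU machine):

```python
import timeit
import torch
import torch.nn.functional as F

# Fall back to CPU so the snippet also runs on machines without CUDA
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.rand(200, 3, 200, device=device)
weight = torch.rand(256, 3, 3, device=device)
eps = 1e-5

def bench(fn, number=100):
    # Synchronize before and after so we time the kernels themselves,
    # not just the asynchronous launches
    if device == "cuda":
        torch.cuda.synchronize()
    start = timeit.default_timer()
    for _ in range(number):
        fn()
    if device == "cuda":
        torch.cuda.synchronize()
    return 1000 * (timeit.default_timer() - start)

print("%d ms" % round(bench(lambda: F.conv1d(x, weight) + eps)))
print("%d ms" % round(bench(
    lambda: F.conv1d(x, weight, torch.full((len(weight),), eps, device=device)))))
```

Both expressions compute the same result, so the comparison is purely about speed.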
As for the 2nd question, I guess it calls contiguous() internally; here's a little snippet to confirm that:
import torch
import torch.nn.functional as F
device = "cuda"
x = torch.rand(200, 3, 200, device=device)
weight = torch.rand(256, 3, 3, device=device)
sample = torch.rand(512, device=device)
contiguous_bias = sample[::2].contiguous()
not_contiguous_bias = sample[::2]  # strided view, not contiguous
res1 = F.conv1d(x, weight, contiguous_bias)
res2 = F.conv1d(x, weight, not_contiguous_bias)
print(res1.equal(res2))  # True
EDIT: I'm not sure I understood the 2nd question correctly. The snippet above shows that even if you pass a non-contiguous bias to a convolution, things will work just fine. But if you are interested in a bias-less convolution, your question is probably more about the summation kernel, which I believe makes a contiguous call before running.
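To illustrate that last point, here is a small sketch showing that a reduction over a non-contiguous view gives the same result as over its contiguous copy, so any contiguous() call that happens is an internal detail, not something the caller needs to do (the specific shapes are just for illustration):

```python
import torch

# A strided (non-contiguous) view and its contiguous copy
sample = torch.rand(512)
view = sample[::2]          # every other element, stride 2
copy = view.contiguous()    # same values, compact memory layout

print(view.is_contiguous())  # False
print(copy.is_contiguous())  # True
# The reduction agrees either way
print(torch.allclose(view.sum(), copy.sum()))  # True
```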