I’m trying to optimize inference latency, and I found a strange overhead in a single conv layer. Its latency still fits a y = kx + b relationship (with x the number of output channels; see the fit sketch after the table), but the constant term b is unacceptably large. How can I reduce such a b? Thanks in advance!
The profiling results are as follows (latency in seconds):
| [filter, filter, input_channels, output_channels] | Latency (s) |
| --- | --- |
| [3, 3, 16, 8] | 0.024658 |
| [3, 3, 16, 16] | 0.032011 |
| [3, 3, 16, 32] | 0.031948 |
| [3, 3, 16, 64] | 0.037025 |
| [3, 3, 16, 128] | 0.049538 |
| [3, 3, 16, 256] | 0.062251 |
| [3, 3, 16, 512] | 0.105888 |
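To make the y = kx + b claim concrete, here is a minimal least-squares fit over the table above, taking x to be the output-channel count and the latencies to be seconds (values copied verbatim from the table):

```python
import numpy as np

# Output channels (x) and measured latencies in seconds (y), from the table above.
x = np.array([8, 16, 32, 64, 128, 256, 512], dtype=np.float64)
y = np.array([0.024658, 0.032011, 0.031948, 0.037025,
              0.049538, 0.062251, 0.105888])

# Least-squares fit of y = k*x + b; b is the constant per-call overhead.
k, b = np.polyfit(x, y, deg=1)
print(f"k = {k:.6e} s/channel, b = {b:.6f} s")
```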
The profiling code is as follows:
```python
import datetime

import numpy as np
import torch
import torch.nn.functional as F

for i in range(7):
    # Kernel shape: [kH, kW, in_channels, out_channels], out_channels = 8..512
    shape = [3, 3, 16, 2 ** (i + 3)]
    kernel_value = np.random.rand(*shape).astype(np.float32)
    # F.conv2d expects weights as [out_channels, in_channels, kH, kW]
    kernel = torch.as_tensor(np.transpose(kernel_value, (3, 2, 0, 1)))
    input_value = np.random.rand(1, 16, 32, 32).astype(np.float32)
    x = torch.as_tensor(input_value)
    before = datetime.datetime.now()
    for j in range(100):
        if j == 50:  # restart the timer halfway, so the first 50 runs act as warm-up
            before = datetime.datetime.now()
        tmp = F.conv2d(x, weight=kernel, bias=None, stride=1, padding=(3 - 1) // 2)
    after = datetime.datetime.now()
    interval = after - before
    print(str(shape) + "\t" + str(interval.total_seconds()))
```
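In case my timing method itself inflates b, here is an alternative measurement sketch with torch.utils.benchmark (assuming a PyTorch version that ships it), which handles warm-up and per-call averaging internally; the tensor shapes below are just the largest case from the table:

```python
import torch
import torch.utils.benchmark as benchmark

x = torch.randn(1, 16, 32, 32)
kernel = torch.randn(512, 16, 3, 3)  # [out_channels, in_channels, kH, kW]

t = benchmark.Timer(
    stmt="F.conv2d(x, weight=kernel, bias=None, stride=1, padding=1)",
    setup="import torch.nn.functional as F",
    globals={"x": x, "kernel": kernel},
)
print(t.timeit(100))  # mean per-call time over 100 runs, after warm-up
```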