I was implementing low-rank approximation to reduce the number of convolution filters (https://arxiv.org/pdf/1405.3866.pdf).
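For context, the scheme in that paper expresses a large bank of filters as linear combinations of a small set of basis filters. Here is a minimal NumPy sketch of that factorization; the shapes (64 filters, rank 8, 5x5x3 kernels) mirror my test code, but the SVD-based construction is my own illustration, not the paper's exact algorithm:

```python
import numpy as np

# Hypothetical illustration: approximate a bank of 64 filters of shape
# (3, 5, 5) as linear combinations of 8 basis filters, W ~= A @ B.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 3 * 5 * 5))  # 64 filters, flattened to rows

# Truncated SVD gives the best rank-8 factorization in the Frobenius norm.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
rank = 8
A = U[:, :rank] * s[:rank]  # (64, 8) mixing matrix
B = Vt[:rank]               # (8, 75): 8 basis filters, each reshapable to (3, 5, 5)

W_approx = A @ B
err = np.linalg.norm(W - W_approx) / np.linalg.norm(W)
print(A.shape, B.shape, err)
```

Convolving with the 8 basis filters and then mixing the resulting feature maps with A is exactly what Part 2 below times.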
Below is a simple script to test the performance.
Part 1
import torch
import time
import numpy as np
img = torch.autograd.Variable(torch.rand(1, 3, 416, 416))
filters = torch.autograd.Variable(torch.rand(64, 3, 5, 5))
avg = 0
for x in xrange(10):
    num_ops = 0
    st = time.clock()
    conv = torch.nn.functional.conv2d(img, filters, padding=1)
    # rough count: output elements x total filter weights
    num_ops = np.prod(conv.size()) * np.prod(filters.size())
    # print conv.size()
    avg += time.clock() - st
print avg / 10.0, 'Average conv operation time'
print 'Number of operation', num_ops
Part 2
print "================================================================="
filters = torch.autograd.Variable(torch.rand(8, 3, 5, 5))
A = torch.autograd.Variable(torch.rand(64, 8))
avg = 0
core_avg = 0
for x in xrange(10):
    num_ops = 0
    st = time.clock()
    core_st = time.clock()
    conv = torch.nn.functional.conv2d(img, filters, padding=1)
    num_ops = np.prod(conv.size()) * np.prod(filters.size())
    core_avg += time.clock() - core_st
    conv = conv.view(8, -1)
    core_st = time.clock()
    num_ops += np.prod(conv.size()) * A.size()[0]
    conv = torch.mm(A, conv)
    core_avg += time.clock() - core_st
    # print conv.view(result.size()).size()
    avg += time.clock() - st
print 'Number of reduced operation', num_ops
print avg / 10.0, 'Average reduced operation time'
print core_avg / 10.0, 'Average reduced core operation time'
Output
0.0859299 Average conv operation time
Number of operation 52652851200
=================================================================
Number of reduced operation 910455552
0.0904023 Average reduced operation time
0.0897171 Average reduced core operation time
Part 1 does a plain convolution and Part 2 does the low-rank approximation.
I see that the number of operations is lower for the low-rank approximation, but there is no reduction in the time taken. What is wrong in my implementation?
Is conv2d better optimized than matrix multiplication?
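One methodological aside on my own timing code: time.clock() measures CPU time and is deprecated in Python 3; time.perf_counter() plus a few warm-up iterations usually gives more stable wall-clock numbers. A hypothetical harness (the bench helper and the matmul workload here are my own illustration, not the code above):

```python
import time
import numpy as np

def bench(fn, warmup=2, reps=10):
    # Warm-up runs keep one-time allocation/cache effects out of the average.
    for _ in range(warmup):
        fn()
    start = time.perf_counter()  # wall-clock, monotonic
    for _ in range(reps):
        fn()
    return (time.perf_counter() - start) / reps

a = np.random.rand(64, 512)
b = np.random.rand(512, 512)
t = bench(lambda: a @ b)
print('avg matmul time: %.6f s' % t)
```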