Autograd slowdown when using multiple weight matrices with F.linear

Hi to everyone,

I’m new to PyTorch and I’m running into a problem I can’t resolve.
I’m working on a task where, for a given batch of data X, I have to apply a different linear transformation (a different weight matrix) to each sample.

This example shows what I’m doing:

import torch
from torch.nn import functional as F

X = torch.rand(5, 10)     # batch of 5 samples, 10 features each
W = torch.rand(5, 6, 10)  # one (6, 10) weight matrix per sample
output = torch.empty(5, 6)
for i in range(5):
    output[i] = F.linear(X[i], W[i], None)
# ...
# loss computation and backward
# ...

But using torch.utils.bottleneck I’ve seen that with this approach, once the loss has been computed, the backward pass is very slow:

--------------------------------------------------------------------------------
  cProfile output
--------------------------------------------------------------------------------
         3597927 function calls (3587509 primitive calls) in 41.697 seconds

   Ordered by: internal time
   List reduced from 4237 to 15 due to restriction <15>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   36.319   36.319   36.319   36.319 {method 'run_backward' of 'torch._C._EngineBase' objects}
        1    1.183    1.183    2.032    2.032 .\utilsGSCN.py:447(get_neighborhoods)
        1    0.724    0.724    1.722    1.722 .\utilsGSCN.py:250(forward)
    32768    0.444    0.000    0.444    0.000 {method 'matmul' of 'torch._C._TensorBase' objects}
    32774    0.272    0.000    0.899    0.000 C:\Python\Python386\lib\site-packages\torch\nn\functional.py:1669(linear)
   129540    0.268    0.000    0.300    0.000 C:\Python\Python386\lib\site-packages\networkx\classes\graph.py:820(add_edge)
   259080    0.264    0.000    0.264    0.000 {method 'item' of 'torch._C._TensorBase' objects}
        9    0.175    0.019    0.175    0.019 {method 'array_from_header' of 'scipy.io.matlab.mio5_utils.VarReader5' objects}
     4793    0.143    0.000    0.143    0.000 {built-in method nt.stat}
    32774    0.140    0.000    0.140    0.000 {method 't' of 'torch._C._TensorBase' objects}
        4    0.101    0.025    0.133    0.033 C:\Python\Python386\lib\site-packages\networkx\classes\graph.py:513(add_nodes_from)
        3    0.099    0.033    0.132    0.044 C:\Python\Python386\lib\site-packages\networkx\classes\graph.py:884(add_edges_from)
      679    0.094    0.000    0.094    0.000 {built-in method io.open_code}
        3    0.075    0.025    0.075    0.025 {method 'unbind' of 'torch._C._TensorBase' objects}
   145924    0.075    0.000    0.114    0.000 C:\Python\Python386\lib\site-packages\networkx\algorithms\traversal\breadth_first_search.py:14(generic_bfs_edges)

I suppose the for loop inflates Autograd’s graph (each iteration adds its own node, and the consequence is the slowdown of the backward pass), so I wanted to ask: is there a way to perform this kind of operation faster, without the for loop (ideally with a single call instead of repeated calls to F.linear)?

Thanks in advance!

Hi,

Sure, you can use matmul to do a batch of matrix-matrix multiplications:

import torch
from torch.nn import functional as F
X = torch.rand(5, 10)
W = torch.rand(5, 6, 10)
output = torch.empty(5, 6)
for i in range(5):
    output[i] = F.linear(X[i], W[i], None)

# Batched version: treat each X[i] as a (1, 10) matrix and multiply by W[i].T in one call.
fast_output = torch.matmul(X.unsqueeze(1), W.transpose(-1, -2)).squeeze(1)
print((output - fast_output).abs().max())  # should be ~0
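As a side note (not from the original reply, just an equivalent formulation): the same batched contraction can also be written with `torch.einsum` or `torch.bmm`, which some people find more readable. A minimal sketch, assuming the same shapes as in the question:

```python
import torch
from torch.nn import functional as F

X = torch.rand(5, 10)     # batch of 5 samples, 10 features each
W = torch.rand(5, 6, 10)  # one (6, 10) weight matrix per sample

# Reference: the original per-sample loop.
loop_output = torch.stack([F.linear(X[i], W[i], None) for i in range(5)])

# einsum: contract X's feature dim with the last dim of each W[i].
einsum_output = torch.einsum('bi,boi->bo', X, W)

# bmm: treat each X[i] as a (1, 10) matrix, batch-multiply by W[i].T.
bmm_output = torch.bmm(X.unsqueeze(1), W.transpose(-1, -2)).squeeze(1)

print(torch.allclose(loop_output, einsum_output, atol=1e-6))
print(torch.allclose(loop_output, bmm_output, atol=1e-6))
```

All three produce the same result; the batched forms build a single node in the autograd graph instead of one per sample, which is what makes backward fast.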

It worked!!!

Thanks a lot! 🙂