I would like to feed this input through a neural network.

The neural network takes input of shape (num_features, num_observations) and produces (num_outputs) outputs, giving me (num_samples, num_symbols, num_outputs) when I apply it along an axis:

import torch

def apply_along_axis(function, x, axis: int = 0):
    # Apply `function` to each slice of `x` along `axis`, then restack.
    return torch.stack([
        function(x_i) for x_i in torch.unbind(x, dim=axis)
    ], dim=axis)

This operation occurs very often in my network, and I just discovered that 42% of the time is spent in apply_along_axis.

I wanted to know if there is a better way to do this? (The for loop seems to be quite slow.)

Is there a possibility to create identical copies of the same "function" in the network and update all the weights automatically when grad is invoked?

(I may have a similar question about a more complex situation if there is a good suggestion for this issue.)

I do not know of any functionality built into pytorch similar to your apply_along_axis(). And even if there were, it would still impose
a performance penalty, as it would still be breaking up what might
have been a single, larger tensor operation into many smaller, axis-wise
tensor operations (absent some kind of hypothetical JIT compilation).

As a general rule, if you find yourself looping over a tensor, you should
see if you can recast your computation into pure (that is, loop-free)
tensor operations. Sometimes you can and sometimes you can't.

Note that you can sometimes realize a net performance gain by getting
rid of loops, even if your loop-free approach is of higher computational
complexity.

Here is a simplistic, contrived example that illustrates replacing apply_along_axis() with a single pytorch tensor operation:
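A minimal sketch of that idea (the per-slice "function" here, a fixed matrix-vector product, is an illustrative assumption, not the poster's actual model):

```python
import torch

torch.manual_seed(0)

# A toy "function" applied to each slice: multiply by a fixed weight matrix.
weight = torch.randn(5, 3)

def f(x_i):                      # x_i has shape (3,)
    return weight @ x_i          # result has shape (5,)

def apply_along_axis(function, x, axis: int = 0):
    return torch.stack([
        function(x_i) for x_i in torch.unbind(x, dim=axis)
    ], dim=axis)

x = torch.randn(10, 3)

# Loop version: ten separate matrix-vector products.
loop_result = apply_along_axis(f, x, axis=0)

# Loop-free version: one batched matrix multiply does all ten at once.
batched_result = x @ weight.T    # shape (10, 5)

print(torch.allclose(loop_result, batched_result, atol=1e-6))  # True
```

The single matmul gives the backend one large, contiguous operation to optimize instead of ten small ones plus a stack.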

You suggest that your use case involves having the function you apply
be an entire neural-network model.

Although various model layers do have constraints on the shapes they
expect, the basic building blocks accept (and sometimes require) an
arbitrary batch dimension. This suggests that reworking your model so
that you don't need apply_along_axis could be plausible.

Two building-block examples: Linear accepts an arbitrary number
of leading "batch" dimensions, so that's likely to be easy. On the other
hand, Conv2d requires a tensor of exactly four dimensions, but its
leading dimension is an arbitrary batch dimension, so you can use view() (or reshape()) to repackage multiple "batch" dimensions
into a single batch dimension. Thus:
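A sketch of both techniques (all sizes here are made up for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

num_samples, num_symbols = 4, 7

# Linear broadcasts over any number of leading "batch" dimensions,
# so no reshaping is needed at all.
linear = nn.Linear(16, 8)
x = torch.randn(num_samples, num_symbols, 16)
print(linear(x).shape)           # torch.Size([4, 7, 8])

# Conv2d wants exactly (batch, channels, height, width), so fold the two
# leading "batch" dimensions into one, apply the conv, then unfold.
conv = nn.Conv2d(3, 6, kernel_size=3, padding=1)
y = torch.randn(num_samples, num_symbols, 3, 32, 32)
out = conv(y.view(num_samples * num_symbols, 3, 32, 32))
out = out.view(num_samples, num_symbols, 6, 32, 32)
print(out.shape)                 # torch.Size([4, 7, 6, 32, 32])
```

Both paths run as one large tensor operation over the whole batch, with no per-slice python loop.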

Whether or not you would be able to push these kinds of techniques
through an entire model will depend on the model's details, but there
are certainly some realistic models where you could.