Suppose I have a tensor like [[1,2,3],[4,5,6],[7,8,9]]. I want to apply some function to each sub-tensor in it (for example, each row) to process it. Is there a generic way to do this?

I have defined a new “convolution” method, in which a kernel slides over the image to extract features. The per-window computation is complex enough that I couldn’t find a suitable native PyTorch operation for it, so I had to split the image into small patches and compute them one at a time.

The usual way is to use a for loop, but that is slow. Is it possible to parallelize this process, so that multiple regions are computed at the same time? GPUs are said to be good at parallelism …
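
To make the question concrete, here is a minimal sketch of one possible direction, not a definitive solution. It assumes the per-window computation can be written with tensor operations that accept a leading batch of patches. The function `patch_op` is a hypothetical stand-in for the custom operation, and `conv_loop` and `conv_batched` are illustrative names. The sketch uses `Tensor.unfold` to expose every k x k window as a view, so that all windows are processed in one call instead of a Python loop.

```python
import torch


def patch_op(patches: torch.Tensor) -> torch.Tensor:
    """Hypothetical per-patch computation (placeholder for the custom op).

    Accepts patches of shape (..., k, k) and returns one value per patch,
    so the same code handles a single patch or a whole batch of patches.
    """
    return (patches ** 2).sum(dim=(-2, -1)).sqrt()


def conv_loop(image: torch.Tensor, k: int) -> torch.Tensor:
    """Baseline: visit every k x k window with nested Python loops."""
    h, w = image.shape
    out = torch.empty(h - k + 1, w - k + 1, dtype=image.dtype)
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            out[i, j] = patch_op(image[i:i + k, j:j + k])
    return out


def conv_batched(image: torch.Tensor, k: int) -> torch.Tensor:
    """Process all windows in one call.

    Unfolding along both spatial dimensions gives a view of shape
    (H-k+1, W-k+1, k, k), where windows[i, j] == image[i:i+k, j:j+k].
    """
    windows = image.unfold(0, k, 1).unfold(1, k, 1)
    return patch_op(windows)


image = torch.arange(1.0, 10.0).reshape(3, 3)  # the 3 x 3 example tensor
looped = conv_loop(image, 2)
batched = conv_batched(image, 2)
print(torch.allclose(looped, batched))
```

If the tensor is moved to a GPU first (for example with `image.to("cuda")`, assuming one is available), the batched version runs as a handful of large kernels rather than many small ones. Whether this applies depends on whether the real computation can be expressed on batched patches; if it needs per-patch control flow, this sketch would not carry over directly.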