Hi, is there an efficient way to do a strided sum in PyTorch, particularly when the number of elements that fall under each stride varies but is specified up front? For example:

a = torch.tensor([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
stride = torch.tensor([3, 3, 4])
result = torch.strided_sum(a, stride)  # hypothetical function, something like this

That is, I want to sum the first 3 elements, the next 3, and the last 4, so the resulting tensor would be tensor([6, 15, 34]).

You can use scatter_add_ to accumulate these values into a new tensor.
Assuming you have already calculated the stride tensor, you would need to create an index tensor from it that maps each element of a to its output slot, as seen here:

a = torch.tensor([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
stride = torch.tensor([3, 3, 4])
idx = torch.tensor(sum([[i]*s for i, s in zip(range(stride.size(0)), stride)], []))
out = torch.zeros(stride.size(0), dtype=a.dtype).scatter_add_(0, idx, a)
print(out)
> tensor([ 6, 15, 34])
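
If the Python-level list construction becomes a bottleneck for long inputs, the index tensor could probably be built with torch.repeat_interleave instead. The following is only a sketch under some assumptions: the helper name strided_sum is made up for illustration (it is not a PyTorch function), a is 1-D, both tensors live on the same device, and the entries of stride sum to a.numel().

```python
import torch


def strided_sum(a, stride):
    # Hypothetical helper, not part of PyTorch.
    # Segment id for every element of a, e.g. [0, 0, 0, 1, 1, 1, 2, 2, 2, 2].
    idx = torch.repeat_interleave(
        torch.arange(stride.size(0), device=stride.device), stride
    )
    # One output slot per segment; scatter_add_ adds a[k] into out[idx[k]].
    out = torch.zeros(stride.size(0), dtype=a.dtype, device=a.device)
    return out.scatter_add_(0, idx, a)


a = torch.tensor([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
stride = torch.tensor([3, 3, 4])
print(strided_sum(a, stride))  # tensor([ 6, 15, 34])
```

This keeps the same scatter_add_ accumulation as above and only changes how idx is created.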

I’m running into a strange error with CUDA and PyTorch when I try to use torch_scatter.

RuntimeError: nvrtc: error: failed to open libnvrtc-builtins.so.11.1.
Make sure that libnvrtc-builtins.so.11.1 is installed correctly.
nvrtc compilation failed:

Would you happen to know why this is happening? I’m running PyTorch 1.8.1+cu111.

This seems to be a known issue with the CUDA 11.1 pip wheels as described here, so you might need to use the conda binaries, a source build, or the CUDA 10.2 pip wheels.
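
To check which build is actually installed before switching binaries, something like the following might help. It is only a quick diagnostic sketch and does not fix the nvrtc error itself; the variable names are illustrative.

```python
import torch

# Version string of the installed build, e.g. ending in "+cu111" for a CUDA 11.1 pip wheel.
torch_version = torch.__version__
# CUDA version the binaries were built against (None for CPU-only builds).
cuda_version = torch.version.cuda
# Whether PyTorch can see a usable GPU at runtime.
cuda_available = torch.cuda.is_available()

print(torch_version, cuda_version, cuda_available)
```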