Efficient indexing of tensors

This is more for my own curiosity and "OCD" than anything! We're doing some heatmap regression, part of which is non-maximum suppression of the predicted output. We have numerous instances (peaks) per output, so we can't just apply .max() to the tensor. We also have volumetric data, so efficiency is a concern, though PyTorch seems pretty quick :)
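For reference, one common way to do multi-peak NMS on a volume without touching individual elements is the max-pool trick: a voxel is a local maximum if it equals the maximum of its neighbourhood. The sketch below assumes a 5-D heatmap of shape (B, C, D, H, W); the function name `nms_3d` and the `kernel_size`/`threshold` defaults are hypothetical choices, not taken from the original code.

```python
import torch
import torch.nn.functional as F


def nms_3d(heatmap: torch.Tensor, kernel_size: int = 3, threshold: float = 0.5) -> torch.Tensor:
    """Hypothetical helper: return an (N, 5) tensor of [batch, channel, z, y, x]
    indices for voxels that are local maxima above `threshold`.

    heatmap: float tensor of shape (B, C, D, H, W).
    kernel_size: odd neighbourhood size (assumption: 3 voxels per axis).
    """
    pad = kernel_size // 2
    # Stride-1 max pooling with "same" padding gives every voxel the maximum
    # of its kernel_size^3 neighbourhood (padding is treated as -inf).
    pooled = F.max_pool3d(heatmap, kernel_size=kernel_size, stride=1, padding=pad)
    # A voxel survives if pooling did not change it (it is the local max)
    # and it is above the threshold (removes the flat background).
    peaks = (heatmap == pooled) & (heatmap > threshold)
    # nonzero() returns the coordinates of all surviving voxels at once.
    return peaks.nonzero()


# Usage: two separate peaks plus a weaker neighbour that should be suppressed.
hm = torch.zeros(1, 1, 8, 8, 8)
hm[0, 0, 2, 3, 4] = 0.9
hm[0, 0, 6, 6, 1] = 0.8
hm[0, 0, 6, 6, 2] = 0.7  # adjacent to the 0.8 peak, so it is not a local max
peaks = nms_3d(hm)
print(peaks.tolist())  # [[0, 0, 2, 3, 4], [0, 0, 6, 6, 1]]
```

Everything runs as a few tensor ops, so it also works unchanged on a CUDA tensor. One caveat: a flat plateau of equal values above the threshold produces several "peaks", so ties may need extra handling.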

I'm porting the code from Lua Torch, where I used tensor:data() to obtain an FFI pointer, and it was extremely fast. My current PyTorch implementation accesses the underlying storage directly; I just wondered whether that is the most efficient way of doing it. Within reason, it's pretty quick already, so I probably wouldn't want to write a lot of C code!
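If raw element access is still needed, a rough PyTorch analogue of the FFI pointer is `Tensor.data_ptr()` (the address of the first element), and for CPU tensors `Tensor.numpy()` returns an ndarray that shares memory with the tensor, so nothing is copied. A minimal sketch, assuming a contiguous CPU float tensor standing in for the heatmap:

```python
import torch

# Stand-in for a (hypothetical) predicted heatmap: contiguous, on the CPU.
t = torch.zeros(4, 5, 6)

# Zero-copy NumPy view: arr and t refer to the same memory.
arr = t.numpy()

# Raw address of the first element, roughly comparable to the pointer
# that Lua Torch's tensor:data() handed to FFI code.
addr = t.data_ptr()

# A write through the NumPy view is visible in the tensor.
arr[1, 2, 3] = 7.0
print(t[1, 2, 3].item())  # 7.0
```

Note that `.numpy()` only works on CPU tensors that do not require grad (use `.detach().cpu()` first otherwise), and `data_ptr()` is only meaningful as a flat array for a `.contiguous()` tensor. Even with zero-copy access, a Python-level loop over elements is generally much slower than vectorized tensor ops such as the max-pool comparison, so the pointer route mostly pays off when handing the buffer to compiled code.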