Accessors versus TensorIterators

What is the difference between Accessors and TensorIterators? And what is the use case of each?
I have read about TensorIterators in these:

And about accessors and packed accessors towards the end here:

Basically, if I want to loop over a tensor's elements and carry out a simple computation, which one should I use? Does the choice between Accessors (CPU tensors) / PackedAccessors (CUDA tensors) and TensorIterators depend on the device of the tensor?

The TensorIterator can be used to execute e.g. elementwise kernels, as seen in the first link, while accessors are used inside a kernel to index the tensor's data directly instead of working with the raw pointer, strides, and indices.

Is there any preference as to which one is better or faster? Is an accessor as fast as indexing the raw pointer? Also, when should one use a TensorIterator on a CUDA tensor instead of writing a CUDA kernel from scratch?

These objects are used for different use cases as already mentioned.
If you want to write an elementwise kernel, you could use the TensorIterator and let it iterate over all elements of your tensor. On the other hand, if you want to index a tensor manually and apply an arbitrary operation, the accessor can be used.
I don’t think you would see a huge difference between using the accessor vs. manually indexing the pointer.
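To illustrate why the two should perform about the same, here is a minimal sketch in plain C++ (standard library only, not the PyTorch API). `Accessor2D`, `sum_accessor`, and `sum_manual` are hypothetical names made up for this example. The assumption is that an accessor is essentially a thin view that bundles the data pointer with sizes and strides, so an indexed access boils down to the same offset arithmetic you would write by hand.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a 2-D accessor: a non-owning view that keeps
// the data pointer, sizes, and strides (in elements, not bytes) together.
struct Accessor2D {
  const float* data;
  std::array<std::int64_t, 2> sizes;
  std::array<std::int64_t, 2> strides;

  // Indexed access: the same offset computation as manual indexing.
  float operator()(std::int64_t i, std::int64_t j) const {
    assert(i >= 0 && i < sizes[0] && j >= 0 && j < sizes[1]);
    return data[i * strides[0] + j * strides[1]];
  }
};

// Sum all elements through the accessor-like view.
float sum_accessor(const Accessor2D& a) {
  float total = 0.0f;
  for (std::int64_t i = 0; i < a.sizes[0]; ++i) {
    for (std::int64_t j = 0; j < a.sizes[1]; ++j) {
      total += a(i, j);
    }
  }
  return total;
}

// Sum all elements with explicit pointer, size, and stride bookkeeping.
float sum_manual(const float* data,
                 const std::array<std::int64_t, 2>& sizes,
                 const std::array<std::int64_t, 2>& strides) {
  float total = 0.0f;
  for (std::int64_t i = 0; i < sizes[0]; ++i) {
    for (std::int64_t j = 0; j < sizes[1]; ++j) {
      total += data[i * strides[0] + j * strides[1]];
    }
  }
  return total;
}
```

Both functions visit the same memory locations in the same order; the view only saves you from passing sizes and strides around separately. Swapping the two strides (and sizes) gives a transposed view of the same buffer without copying.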

Specifically, I would like to reimplement scipy.ndimage.find_objects using PyTorch, only on the CPU for now, since it is a basic building block for cellular analysis in biomedical imaging: extract the cells from the image using an already built mask, then extract measurements from them.

They use a NumPy iterator object in the base C code to iterate over all entries of the NumPy array and record the smallest and largest index of each object along each dimension in an array pointer called regions, given that the number of objects is known in advance. The base C code can be found here:

And its numpy iterator struct is found here:

Do you recommend implementing it using an Accessor or a TensorIterator?
Also, if I write the code with the torch C++ API as best I can, could you please help me correct/debug/improve it and add it as a new function to the torch C++ API?
I would really appreciate it since it will help me build a new torch api for biomedical imaging applications.
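For reference, the bounding-box pass described above could be sketched roughly as follows. This is plain C++ with the standard library only, not the SciPy implementation and not the torch C++ API. `Box` and `find_objects_2d` are hypothetical names, and the sketch assumes a contiguous, row-major 2-D label image where 0 is background and labels run from 1 to `max_label`. It records inclusive minimum and maximum indices per label.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type: inclusive bounding box of one labeled object.
struct Box {
  bool found;  // false if the label never occurs in the image
  std::int64_t row_min, row_max;
  std::int64_t col_min, col_max;
};

// Assumes `labels` is a contiguous row-major (rows x cols) image,
// 0 is background, and valid labels are 1..max_label.
// Entry k of the result belongs to label k + 1.
std::vector<Box> find_objects_2d(const std::vector<std::int32_t>& labels,
                                 std::int64_t rows, std::int64_t cols,
                                 std::int32_t max_label) {
  assert(rows >= 0 && cols >= 0 && max_label >= 0);
  assert(static_cast<std::int64_t>(labels.size()) == rows * cols);

  std::vector<Box> boxes(static_cast<std::size_t>(max_label),
                         Box{false, 0, 0, 0, 0});

  for (std::int64_t i = 0; i < rows; ++i) {
    for (std::int64_t j = 0; j < cols; ++j) {
      const std::int32_t label =
          labels[static_cast<std::size_t>(i * cols + j)];
      if (label <= 0 || label > max_label) continue;  // skip background

      Box& b = boxes[static_cast<std::size_t>(label - 1)];
      if (!b.found) {
        // First pixel of this object: the box is a single point.
        b = Box{true, i, i, j, j};
      } else {
        // Grow the box to include the current pixel.
        b.row_min = std::min(b.row_min, i);
        b.row_max = std::max(b.row_max, i);
        b.col_min = std::min(b.col_min, j);
        b.col_max = std::max(b.col_max, j);
      }
    }
  }
  return boxes;
}
```

Since every pixel is visited once and indexed by its (row, column) position, this shape of loop maps more naturally onto manual indexing (or an accessor) than onto an elementwise kernel. Generalizing to N dimensions would need per-dimension min/max arrays instead of the fixed row/column fields.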

@ptrblck I already started a thread that is slowly coming to life: