Atomic operations with PackedAccessor32 in a CUDA kernel?

I have a CUDA kernel for raycasting in which threads along one grid dimension compare a computed minimum against the “current minimum” value and replace it if it is smaller.

The CUDA kernel is blazing fast, and I’m really happy it works, but I realized there is a race condition in it, and it is evident in the graphical output. I’m wondering if PyTorch has any atomic operations I could use on my PackedAccessor32 object?
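
To be concrete, the update is essentially an unguarded read-compare-write. A minimal sketch of the pattern (all names here, such as `depth_min` and `samples`, are placeholders, not my actual kernel):

```cpp
#include <torch/extension.h>

// Placeholder sketch of the racy per-pixel minimum update.
__global__ void racy_min_kernel(
    torch::PackedTensorAccessor32<float, 2, torch::RestrictPtrTraits> depth_min,       // [H, W]
    const torch::PackedTensorAccessor32<float, 3, torch::RestrictPtrTraits> samples) { // [H, W, S]
  const int x = blockIdx.x * blockDim.x + threadIdx.x;
  const int y = blockIdx.y * blockDim.y + threadIdx.y;
  const int s = blockIdx.z * blockDim.z + threadIdx.z;
  if (y >= depth_min.size(0) || x >= depth_min.size(1) || s >= samples.size(2)) {
    return;
  }

  const float candidate = samples[y][x][s];
  // Race: two threads handling the same pixel can both read the old minimum,
  // both pass the comparison, and the larger candidate can be written last.
  if (candidate < depth_min[y][x]) {
    depth_min[y][x] = candidate;
  }
}
```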

I would assume you could use atomicMin directly, as it’s also used, e.g., in ATen/native/transformers/cuda/mem_eff_attention/attention_scaling_coefs_updater.h.
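
Note that atomicMin only has integer overloads; for a float buffer a common approach is to emulate it with an atomicCAS loop. A rough sketch using the same placeholder names as the snippet above (not tested against your kernel):

```cpp
#include <torch/extension.h>

// atomicMin has overloads for int / unsigned int / (unsigned) long long, but not
// for float, so a float minimum is usually emulated with an atomicCAS loop.
__device__ float atomicMinFloat(float* addr, float value) {
  int* addr_as_int = reinterpret_cast<int*>(addr);
  int old = *addr_as_int;
  while (__int_as_float(old) > value) {
    const int assumed = old;
    old = atomicCAS(addr_as_int, assumed, __float_as_int(value));
    if (old == assumed) {
      break;  // our candidate is now the stored minimum
    }
  }
  return __int_as_float(old);
}

// Same placeholder kernel as above, with the read-compare-write replaced by an atomic.
__global__ void atomic_min_kernel(
    torch::PackedTensorAccessor32<float, 2, torch::RestrictPtrTraits> depth_min,       // [H, W]
    const torch::PackedTensorAccessor32<float, 3, torch::RestrictPtrTraits> samples) { // [H, W, S]
  const int x = blockIdx.x * blockDim.x + threadIdx.x;
  const int y = blockIdx.y * blockDim.y + threadIdx.y;
  const int s = blockIdx.z * blockDim.z + threadIdx.z;
  if (y >= depth_min.size(0) || x >= depth_min.size(1) || s >= samples.size(2)) {
    return;
  }

  // The accessor returns a reference into global memory, so its address can be
  // handed straight to the atomic.
  atomicMinFloat(&depth_min[y][x], samples[y][x][s]);
}
```

If the minimum buffer holds integers instead, you can call atomicMin on the accessor element directly, e.g. `atomicMin(&depth_min[y][x], candidate)`.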

Thank you @ptrblck, I will try that out soon and see how it goes!