Avoid manual allocation of device memory?

I am currently implementing my own CUDA extension for PyTorch. I am wondering whether I should avoid manually allocating memory on the GPU (e.g., via `cudaMalloc`) and instead allocate through ATen, so that PyTorch's memory manager can handle the allocation efficiently?
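For context, here is a minimal sketch of the two approaches I am weighing inside the extension. The function names and the buffer size `n` are just illustrative; the kernel launch is elided:

```cpp
#include <torch/extension.h>
#include <cuda_runtime.h>

void manual_version(int64_t n) {
  // Option A: manual allocation, bypassing PyTorch's allocator.
  float* buf = nullptr;
  cudaMalloc(&buf, n * sizeof(float));
  // ... launch kernel on buf ...
  cudaFree(buf);
}

torch::Tensor aten_version(int64_t n) {
  // Option B: allocate through ATen, so PyTorch's CUDA caching
  // allocator can reuse freed blocks, and the tensor's lifetime
  // manages the memory automatically.
  auto out = torch::empty(
      {n},
      torch::TensorOptions().dtype(torch::kFloat32).device(torch::kCUDA));
  // ... launch kernel on out.data_ptr<float>() ...
  return out;
}
```

My understanding is that Option B also makes the result directly usable as a tensor on the Python side, but I am unsure whether there are cases where Option A is still preferable.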

I would also appreciate any pointers to where I can read up on PyTorch internals, such as the memory management!

Best regards,