More CUDA extension examples

I followed the official example to practice using ATen and CUDA, and implemented a small demo. However, the demo feels too trivial to be useful. Are there more examples available? I would like to learn more about ATen’s API and how to use cuBLAS.