Why do we not need to write CUDA when defining new layers in PyTorch?

Most speed-efficient libraries, like Caffe and MXNet, require us to write CUDA when defining new layers. They do allow us to write layers in Python too, but those layers will be slower.

Does PyTorch handle the CUDA side for us, or is it like other libraries where newly defined layers are slower when written in Python? If the latter, PyTorch would be slow whenever we want to define new layers.

Call model.cuda() and all layers in the model will be executed on the GPU. A layer defined in Python is composed of built-in tensor operations, each of which already has a CUDA kernel, so it runs on the GPU without any custom CUDA code.
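A minimal sketch of this idea: the custom layer below (the name `SquareActivation` is my own, for illustration) is written purely in Python, yet it runs on the GPU when one is available, because every operation inside it dispatches to an existing CUDA kernel. The code falls back to CPU when no GPU is present.

```python
import torch
import torch.nn as nn


class SquareActivation(nn.Module):
    """Hypothetical custom layer written purely in Python: squares its input.

    Because it is built from existing tensor ops, PyTorch dispatches to the
    corresponding CUDA kernels automatically when the tensors live on the GPU.
    """

    def forward(self, x):
        return x * x  # elementwise op; a CUDA kernel already exists for it


model = nn.Sequential(nn.Linear(4, 8), SquareActivation(), nn.Linear(8, 2))

# Move every parameter (and hence the whole model) to the GPU if available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Inputs must live on the same device as the model's parameters.
out = model(torch.randn(3, 4, device=device))
print(out.shape)  # torch.Size([3, 2])
```

No per-layer CUDA code was needed; `model.to(device)` (or `model.cuda()`) moves the parameters, and the Python-defined forward pass executes on whatever device the tensors are on.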