Create nn.Module directly on device

Is it possible to create a module directly on a specific CUDA device?
Normally I create mod = nn.Linear(in_channels, out_channels) and get a module on the CPU. Then I have to push the module to a CUDA device with mod.to("cuda:0"). How can I create a linear layer directly on a CUDA device, without the host-to-device copy? In my case I do not need this module on the CPU at all and would like to save time by instantiating it directly on a specific device.
More generally, the task is to create a dummy module that does not store its parameter tensors. Instead, the dummy module will re-allocate its parameter tensors in forward() and discard them afterwards. This will let me profile a module's wall-clock execution time even when my full model does not fit in GPU memory.
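Something like this sketch is what I have in mind (the class name and details are mine, just to illustrate the idea):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LazyParamLinear(nn.Module):
    """Dummy linear layer that stores no parameter tensors.

    Weights are allocated on the target device inside forward() and
    freed once the call returns, so only one layer's parameters live
    in GPU memory at a time. (Illustrative sketch, not a built-in.)
    """

    def __init__(self, in_features, out_features, device="cuda:0"):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.device = device

    def forward(self, x):
        # Throwaway, uninitialized parameters allocated per call;
        # fine for timing, useless for actual training/inference.
        weight = torch.empty(self.out_features, self.in_features,
                             device=self.device)
        bias = torch.empty(self.out_features, device=self.device)
        return F.linear(x, weight, bias)
```

Since the tensors are never registered as nn.Parameter, the module reports no parameters and holds no state between calls.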
Thanks!

I don't know if there is a proper method for the built-in nn.Modules, but if you are only concerned about profiling or debugging, you could create a custom module and create the parameters directly on the device.
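A minimal sketch of that idea, mirroring nn.Linear but taking a device argument (class name and init scheme are my own choices, copied from what nn.Linear does by default):

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class DeviceLinear(nn.Module):
    """Linear layer whose parameters are created directly on `device`.

    Illustrative sketch: like nn.Linear, but the parameter tensors are
    allocated on the target device from the start, so no CPU-to-GPU
    transfer is needed after construction.
    """

    def __init__(self, in_features, out_features, device="cpu"):
        super().__init__()
        self.weight = nn.Parameter(
            torch.empty(out_features, in_features, device=device))
        self.bias = nn.Parameter(
            torch.empty(out_features, device=device))
        # Same initialization scheme nn.Linear uses.
        nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        bound = 1 / math.sqrt(in_features)
        nn.init.uniform_(self.bias, -bound, bound)

    def forward(self, x):
        return F.linear(x, self.weight, self.bias)
```

Passing device="cuda:0" would then allocate and initialize the parameters on the GPU directly.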

Thanks for the reply! Yes, it seems I need to copy all the modules like Linear into my code and add a constructor argument specifying which device to create the module's parameters on. Not as elegant as I wish it were :slight_smile: