How do modules like `nn.Conv2d` allocate memory?

I’m working through the tutorials and wanted to understand how `nn.Conv2d` is created. It looks like we can pass a `device` argument to the constructor, and I’ve checked that doing so creates the layer’s tensors on the GPU (rather than on the CPU, which is the default). What I’m wondering is: if I wanted the layer to hold an empty (uninitialized) tensor instead of random values, how would I do that? By default, the tensor seems to be filled with random values.

A follow-up question: if I use the `load_state_dict` method to load a model into GPU-allocated tensors, I’m assuming the method just overwrites the randomly initialized values?

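PyTorch layers first allocate their parameters as empty (uninitialized) tensors and then call `self.reset_parameters()` to initialize them. For the convolution layers, the weights are drawn from a Kaiming uniform distribution. See the `__init__` of `_ConvNd` in the torch.nn.modules.conv source (PyTorch 2.1 documentation).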
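A minimal sketch of that allocate-then-initialize behavior, run on the CPU so it works without a GPU. The layer sizes are arbitrary, and the `1 / sqrt(fan_in)` bound is taken from the `Conv2d` documentation for the default `groups=1`; treat it as an illustration rather than a guarantee across versions.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)

# Constructing the layer allocates weight and bias, then runs reset_parameters().
conv = nn.Conv2d(3, 8, kernel_size=3)
first_draw = conv.weight.detach().clone()
ptr_before = conv.weight.data_ptr()

# Calling reset_parameters() again re-draws the values into the same memory.
conv.reset_parameters()
second_draw = conv.weight.detach().clone()

# fan_in = in_channels * kernel_height * kernel_width = 3 * 3 * 3 = 27
fan_in = conv.in_channels * conv.kernel_size[0] * conv.kernel_size[1]
bound = 1.0 / math.sqrt(fan_in)

values_changed = not torch.equal(first_draw, second_draw)
same_memory = conv.weight.data_ptr() == ptr_before
# Small tolerance for float32 rounding at the edge of the interval.
within_bound = conv.weight.abs().max().item() <= bound + 1e-6
```

Here `values_changed`, `same_memory`, and `within_bound` should all come out true: the second call replaces the values in place instead of allocating a new tensor.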

As to your second question, that is correct. The previous model values are overwritten by the saved state.
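To see the overwrite directly, here is a small sketch that uses two CPU layers, with `source` standing in for a saved checkpoint (both names are made up for illustration). With the default arguments, `load_state_dict` copies the values into the existing parameter tensors, so a parameter that was allocated on the GPU should stay there.

```python
import torch
import torch.nn as nn

source = nn.Conv2d(3, 8, kernel_size=3)  # stand-in for a saved model
target = nn.Conv2d(3, 8, kernel_size=3)  # randomly initialized, different values

ptr_before = target.weight.data_ptr()
device_before = target.weight.device

# Copy the saved values over the randomly initialized ones.
target.load_state_dict(source.state_dict())

values_match = torch.equal(target.weight, source.weight)
copied_in_place = target.weight.data_ptr() == ptr_before
device_unchanged = target.weight.device == device_before
```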


To skip parameter initialization, and so avoid wasting compute on values you will overwrite anyway, you could use `torch.nn.utils.skip_init`.
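A rough sketch of how that could look. `saved_state` is a stand-in for a checkpoint you would normally get from `torch.load(...)`, and `device="cpu"` is used only so the example runs anywhere; you would pass `"cuda"` to allocate on the GPU.

```python
import torch
import torch.nn as nn
from torch.nn.utils import skip_init

# Stand-in for a checkpoint; in practice this would come from torch.load(...).
saved_state = nn.Conv2d(3, 8, kernel_size=3).state_dict()

# Allocate the layer without running reset_parameters().
# The parameter values are uninitialized memory at this point.
conv = skip_init(nn.Conv2d, 3, 8, kernel_size=3, device="cpu")

# Fill the uninitialized parameters from the saved state.
conv.load_state_dict(saved_state)

loaded_ok = torch.equal(conv.weight, saved_state["weight"])
```

Until `load_state_dict` runs, the contents of `conv.weight` are whatever happened to be in memory, so the layer should not be used for a forward pass before loading.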