Why are weight parameters in my model not moved to device when doing model.to(device)?

Hi,
I initialized a model weight and bias parameter in init to learn while training but during training these weights are not moved to device cuda on performing .to(device) while the rest of the parameters moved. Please see sample code snipped

class myModule(nn.Module):
    def __init__(self, in_dim, hid_dim, out_dim):
        super(myModule, self).__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hid_dim),
            nn.ReLU(),
            nn.Linear(hid_dim, hid_dim // 2),
            nn.BatchNorm1d(hid_dim // 2),
            nn.ReLU(),
            nn.Dropout(p=0.2),
            nn.Linear(hid_dim // 2, hid_dim // 3),
            nn.BatchNorm1d(hid_dim // 3),
            nn.ReLU(),
            nn.Dropout(p=0.2),
            nn.Linear(hid_dim // 3, out_dim),
        )
        # self.linear = nn.Linear(out_dim, 1, bias=False)
        self.S = torch.randn((out_dim, out_dim), requires_grad=True)
        self.b = torch.randn(1, requires_grad=True)

    def forward(self, inp1, inp2):
        x = self.encoder(inp1)
        y = self.encoder(inp2)
        # out = torch.pow((out1 - out2), 2)
        out = (torch.matmul(x, y.T).diag()
               - torch.matmul(torch.matmul(x, self.S), x.T).diag()
               - torch.matmul(torch.matmul(y, self.S), y.T).diag()
               + self.b)
        return out

In the above module, I am getting an error at the out = torch.matmul line saying self.S and self.b are not on the CUDA device.

Shouldn’t doing model.to(device) transfer all model parameters to cuda?
Thanks for your help

You need to either wrap them in nn.Parameter (for learnable tensors) or use self.register_buffer (for non-learnable state) to let the model know that they are part of its state.
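To illustrate: only tensors registered as parameters or buffers show up in the module's state and get moved by model.to(device). A minimal sketch (the class and attribute names are just for illustration):

```python
import torch
import torch.nn as nn

class TinyModule(nn.Module):
    def __init__(self, dim):
        super().__init__()
        # nn.Parameter: registered, trainable, moved by .to(device)
        self.S = nn.Parameter(torch.randn(dim, dim))
        # register_buffer: registered, not trainable, moved by .to(device)
        self.register_buffer("scale", torch.ones(dim))
        # plain tensor attribute: invisible to .to(device) and state_dict
        self.hidden = torch.randn(dim)

m = TinyModule(4)
print("S" in dict(m.named_parameters()))   # True
print("scale" in m.state_dict())           # True (buffers are saved too)
print("hidden" in m.state_dict())          # False (plain attribute, not moved)
```

Because self.hidden is a plain tensor, m.to("cuda") would leave it on the CPU, which is exactly the error you are seeing with self.S and self.b.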

Best regards

Thomas


Thanks Tom for your suggestion. Updated as follows:

        S = torch.randn((out_dim, out_dim))
        self.S = nn.Parameter(S)  # requires_grad=True is the default for nn.Parameter

        b = torch.randn(1)
        self.b = nn.Parameter(b)

It is working, but performance is very slow on a Tesla K80 (Colab), not even faster than running on a quad-core CPU (Mac Pro). Any reason you can think of?

Hard to say beyond the obvious: make your model larger or increase the batch size so the GPU is actually kept busy.
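One common pitfall when comparing GPU against CPU: CUDA kernels launch asynchronously, so naive wall-clock timing can measure launch overhead rather than compute. A hedged sketch of how one might time it (the helper and shapes are illustrative, not from your model):

```python
import time
import torch

def timed(fn, device, iters=10):
    # CUDA ops run asynchronously; synchronize before and after timing
    # so perf_counter measures actual compute, not just kernel launches
    if device.type == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if device.type == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.randn(256, 512, device=device)
w = torch.randn(512, 512, device=device)
avg = timed(lambda: x @ w, device)
print(f"avg matmul time on {device}: {avg * 1e6:.1f} us")
```

If timing still shows the K80 losing, check that the per-batch work is large enough: with small hidden sizes and batches, data-transfer and launch overhead can dominate, and a CPU can indeed come out ahead.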
