# Optimiser expects CPU when adding custom layer, runs fine without it

Hello, I understand that most errors of the form "expected device cpu but got cuda", or vice versa, arise from not properly pushing tensors/models to the right devices. But I seem to have run into something different. I am trying to add a custom layer to a simple GCN net.

I have checked that all tensors and model parameters are on the GPU; however, whenever I add the layer to the network, the optimizer breaks with the following error:

```
File "driver_crf_gcn.py", line 35, in train
    optimizer.step()
    return func(*args, **kwargs)
File "/home/cmb-05/qbio/raktimmi/anaconda3/lib/python3.7/site-packages/torch/optim/adam.py", line 96, in step
RuntimeError: expected device cpu but got device cuda:0
```


I am quite stuck at this point, as I am not very experienced; if anyone can tell me what's happening, it would be a great help.

Here’s what my custom layer looks like:

```python
class CRFLayer(MessagePassing):
    def __init__(self):
        self.register_parameter('log_alpha', self.log_alpha)
        self.register_parameter('log_beta', self.log_beta)
        self.register_parameter('logsigmasq', self.logsigmasq)
        self.sigmasq = torch.exp(self.logsigmasq).to(device)  # using the same process for the > 0 constraint as alpha and beta, for consistency
        self.alpha = torch.exp(self.log_alpha).to(device)
        self.beta = torch.exp(self.log_beta).to(device)
        self.gij = None  # temporary memoisation of gij for each protein, to avoid repeat computation;
                         # will be of size (E x 1) when assigned in the first call to propagate

    def forward(self, x, edge_index):  # edge_index has shape [2, E]
        b = x.clone()
        x = self.propagate(edge_index, size=(x.size(0), x.size(0)), x=x, b=b)
        self.gij = None  # reset gij to None for the next training graph
        return x

    def message(self, x_j, b_i, b_j):
        '''For each edge (i,j) in E, compute g_ij (only at the start).'''
        if self.gij is None:  # size (E x 1)
            self.gij = torch.exp(torch.nn.functional.cosine_similarity(b_i, b_j, dim=-1) / self.sigmasq).to(device)
        '''For each edge (i,j) in E, compute g_ij * x_j.'''
        gijxj = self.gij.view(-1, 1) * x_j                       # size (E x 2): g_ij * x_j for each edge (i,j)
        ret = torch.cat((self.gij.view(-1, 1), gijxj), dim=1)    # size (E x 3): message (g_ij, g_ij * x_j) for each edge (i,j)
        return ret

    def update(self, aggr_out, b, x):  # aggr_out has size (V x 3)
        '''For each vertex i in V, g_ij aggregated over its neighbours j, required in the denominator of the update equation.'''
        gij_aggregated_over_j = aggr_out[:, 0]     # size (V x 1): sum_{j in N(i)} g_ij for all i
        '''For each vertex i in V, g_ij * x_j aggregated over its neighbours j, required in the numerator of the update equation.'''
        gijxj_aggregated_over_j = aggr_out[:, 1:]  # size (V x 2): sum_{j in N(i)} g_ij * x_j for all i

        x = self.alpha * b + self.beta * gijxj_aggregated_over_j
        x = x / ((self.alpha + self.beta * gij_aggregated_over_j).view(-1, 1) + EPS)
        return x
```



Here’s the network. It runs perfectly if I remove the new layer, and it also runs perfectly on CPU, with or without the layer.


```python
class GCNNet(nn.Module):
    def __init__(self, dataset):
        super(GCNNet, self).__init__()
        self.conv1 = GCNConv(dataset.num_features, 2)
        self.crf = CRFLayer()
        self.conv2 = GCNConv(2, dataset.num_classes)
        self.crf_loss = CGNF_Loss()

    def forward(self, data, y, A, E):
        x, edge_index = data.x, data.edge_index
        x = F.relu(self.conv1(x, edge_index))
        x = F.dropout(x, training=self.training)
        x = self.crf(x, edge_index)
        x = self.conv2(x, edge_index)
        x = F.log_softmax(x, dim=1)
        loss = self.crf_loss(x, y, edge_index, E)
        return x, loss
```



OK, so I found the problem. It was the way I had been transforming the parameters in the `__init__` step of the custom layer. Using the registered `log_alpha`, `log_beta`, and `logsigmasq` parameters directly, instead of exponentiating them in `__init__`, solved the issue.
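For anyone hitting the same error: exponentiating the parameters in `__init__` takes a one-time snapshot into plain tensor attributes. `nn.Module.to()` only moves registered parameters and buffers, not plain tensor attributes, so those snapshots stay behind on the original device (and never see optimizer updates either). A minimal sketch of the contrast and the fix, with hypothetical names and placeholder math (the real layer subclasses `MessagePassing`):

```python
import torch
import torch.nn as nn

class CRFLayerFixed(nn.Module):  # the real layer subclasses MessagePassing
    def __init__(self):
        super().__init__()
        # Register only the unconstrained log-parameters.
        self.log_alpha = nn.Parameter(torch.zeros(1))
        self.log_beta = nn.Parameter(torch.zeros(1))
        self.logsigmasq = nn.Parameter(torch.zeros(1))
        # BROKEN version, kept for contrast: a snapshot taken once at init
        # time. It is a plain tensor attribute, not a Parameter, so .to()
        # will not move it and the optimizer will never update it.
        self.alpha_snapshot = torch.exp(self.log_alpha)

    def forward(self, x):
        # FIX: exponentiate where the values are used, so the positive
        # versions are recomputed from the live parameters on every forward
        # pass and sit on whatever device the module has been moved to.
        alpha = torch.exp(self.log_alpha)
        beta = torch.exp(self.log_beta)
        sigmasq = torch.exp(self.logsigmasq)
        return alpha * x + beta * sigmasq  # placeholder for the real update

layer = CRFLayerFixed()
# Only registered parameters are visible to the optimizer:
print(sorted(n for n, _ in layer.named_parameters()))
# -> ['log_alpha', 'log_beta', 'logsigmasq']; alpha_snapshot is absent.
# .to() converts parameters but leaves the plain attribute untouched:
converted = CRFLayerFixed().to(torch.float64)
print(converted.log_alpha.dtype, converted.alpha_snapshot.dtype)
# -> torch.float64 torch.float32
```

The same asymmetry explains the CUDA error: after `model.to(device)` the registered log-parameters live on `cuda:0` while the init-time snapshots stay on CPU, and the optimizer step mixes the two.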