I’m trying to train a very simple fully connected (FC) neural network. I have 2 GPUs on my machine, and I’ve followed this tutorial to make the code use both of them: Data Parallel
For some reason, I’m unable to train the model. The error I keep getting is:
RuntimeError: cudnn RNN backward can only be called in training mode
The solution seemed trivial: set the model to training mode before the forward call. But that doesn’t fix the issue at all. I’ve tried many different ways to actually set the model to train mode, and none of them worked; one of the variants I tried is sketched below.
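For example (just a sketch of one attempt, using the same model and training loop shown below), I forced train mode immediately before the forward pass:

# one variant I tried: explicitly enable train mode right before the forward call
model.train()
predictions = model(inputs)
loss = criterion(predictions, scores.view(br, 1))
loss.backward(retain_graph=True)

The same RuntimeError is raised on loss.backward() no matter where I put the train() call.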
Here’s my code:
Class
import torch
import torch.nn as nn

class ABC(nn.Module):
    def __init__(self, inp_dim_size, hid_dim_size, out_size):
        super(ABC, self).__init__()
        self.inp_dim_size = inp_dim_size
        self.hid_dim_size = hid_dim_size
        self.out_size = out_size
        self.seq_layer = nn.Sequential(
            nn.Linear(self.inp_dim_size, self.hid_dim_size),
            nn.ELU(),
            nn.Dropout(0.4),
            nn.Linear(self.hid_dim_size, self.hid_dim_size // 2),
            nn.ELU(),
            nn.Dropout(0.4),
            nn.Linear(self.hid_dim_size // 2, self.hid_dim_size // 2),
            nn.ELU(),
            nn.Dropout(0.3),
            nn.Linear(self.hid_dim_size // 2, self.out_size)
        )

    def forward(self, X_batch):
        output_scores = self.seq_layer(X_batch)
        return output_scores
Train code:
for epoch in range(num_epochs):  # loop over the dataset multiple times
    rl, ns = 0.0, 0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data
        inputs = inputs.to(device)
        scores = labels.to(device)
        br, _, _ = scores.shape

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        predictions = model(inputs)
        loss = criterion(predictions, scores.view(br, 1))
        loss.backward(retain_graph=True)
        optimizer.step()
        ...
model = ABC(103, 51, 1)
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
    model.to(device)
    model = model.train()
else:
    model = model.to(device)
    model = model.train()
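Another variant I tried (again just a sketch): since nn.DataParallel keeps the wrapped model in its module attribute, I set train mode on both the wrapper and the underlying module:

# another attempt: put the DataParallel wrapper and the inner model in train mode
model.train()
if isinstance(model, nn.DataParallel):
    model.module.train()

Still the same error on the backward pass.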
It’s been pretty frustrating trying to solve a seemingly easy issue without any results. Any input would be highly appreciated. TIA!