I am training a sequence-to-sequence LSTM model. The problem is that during the training loop, the loss calculation fails with the error: “Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!”.
So it seems that computing the scores from the model produces a tensor on the CPU instead of the GPU. I can work around this by changing the line scores = model(data, targets) to scores = model(data, targets).to(device), but that seems like an unnecessary round trip of the tensor from the GPU to the CPU and back to the GPU.
model = Seq2SeqPF(encoder_net, decoder_net).to(device)

load_from_checkpoint = False
if load_from_checkpoint:
    load_checkpoint(torch.load(os.path.join(CHECKPOINT_DIRECTORY, CHECKPOINT_NAME)), model, device)

for epoch in range(EPOCHS):
    print(f"Epoch: {epoch + 1}/{EPOCHS}")
    kbar = pkbar.Kbar(target=batches_per_epoch, width=8)

    if epoch % 5 == 0:
        checkpoint = {'state_dict': model.state_dict(),
                      'optimizer': optimizer.state_dict()}
        save_checkpoint(checkpoint, CHECKPOINT_DIRECTORY, CHECKPOINT_NAME)

    for batch_idx, (data, targets) in enumerate(train_loader):
        data = data.to(device=device)
        targets = targets.to(device=device)

        # forward pass and compute error
        scores = model(data, targets)
        loss = criterion(scores, targets)  # <--- GENERATES ERROR ABOUT CPU AND GPU

        # backward pass and apply gradients
        optimizer.zero_grad()
        loss.backward()

        # gradient descent step
        optimizer.step()

        kbar.update(batch_idx, values=[("loss", loss)])
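In case it's useful, this is the kind of device check I've been running right before the loss line (a minimal debugging sketch, reusing the variable names from the loop above):

print("model params:", next(model.parameters()).device)  # reports cuda:0
print("data:", data.device, "targets:", targets.device)  # both report cuda:0
print("scores:", scores.device)                          # this is the one that comes back as cpu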
The code seems standard. I did push the model itself to the GPU, and encoder_net and decoder_net are also networks that are pushed to the GPU before the model that wraps them is pushed to the GPU.
Any suggestions on the right way to handle this?
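For what it's worth, my current guess is that something inside Seq2SeqPF's forward allocates a tensor without a device argument, for example the buffer that collects the decoder's per-step scores. The sketch below is hypothetical (Seq2SeqSketch, the shapes, and the decoder loop are stand-ins, not my actual code), but it shows the kind of line I'm now hunting for:

import torch
import torch.nn as nn

class Seq2SeqSketch(nn.Module):
    # Hypothetical stand-in for Seq2SeqPF, only to illustrate the suspect allocation.
    def __init__(self, encoder, decoder, target_vocab_size):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder
        self.target_vocab_size = target_vocab_size

    def forward(self, source, target):
        target_len, batch_size = target.shape[0], target.shape[1]
        # Suspect line: torch.zeros with no device argument allocates on the CPU
        # (unless the default device was changed), even though every submodule is on CUDA.
        outputs = torch.zeros(target_len, batch_size, self.target_vocab_size)
        # Presumably the fix would be to allocate on the input's device instead:
        # outputs = torch.zeros(target_len, batch_size, self.target_vocab_size,
        #                       device=source.device)
        hidden, cell = self.encoder(source)
        for t in range(target_len):
            step_scores, hidden, cell = self.decoder(target[t], hidden, cell)
            # The assignment silently copies the CUDA scores into the CPU buffer,
            # so nothing fails until the CPU outputs meet the CUDA targets in the loss.
            outputs[t] = step_scores
        return outputs

If that is indeed the pattern, is creating the buffer with device=source.device the idiomatic fix, rather than calling .to(device) on the scores after the fact?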