Build a neural composer using RNN

@ptrblck That did the trick, thanks! Yes, I changed that earlier and the problem hasn’t appeared since.

  1. I’ve tried both MSELoss and BCEWithLogitsLoss, which you suggested earlier, as loss functions, but all the values seem to be more or less the same - do you have any suggestions for other loss functions that might work better for a multi-label problem like this?

  2. Since I need to output 0s (key not pressed) and 1s (key pressed), can I just apply a threshold to the last output like this?

    outputs = torch.where(outputs > 0.5, torch.ones(1, requires_grad=True, device=device), torch.zeros(1, requires_grad=True, device=device))
    
  3. What is the correct way to train everything on the GPU?

    device = torch.device('cuda')     # Default CUDA device
    

I’ve followed the docs and moved the model with .to(device), and I create any new tensors with the device argument (torch.randn(…, device=device)). However, the Task Manager shows GPU usage of only 0-1 % during training. Even though training does run faster than without .to(device), it slows down as training goes on - and the GPU usage drops to 0 %. Any idea why?

  1. nn.BCEWithLogitsLoss should work if you pass raw logits, i.e. without a sigmoid on the last layer. If your model is not training at all, try to overfit it on a small data sample (e.g. just 10 samples) and check whether the loss decreases to approximately zero.
    If that’s not the case, we would need to look for other possible bugs in the training procedure or the model. However, if your model does learn on the small sample, you could carefully scale it up and see at which point the training breaks. Maybe your labels are highly imbalanced, so the loss function would need some weighting (see the first sketch below).

  2. This code would work if outputs had already been passed through a sigmoid.
    It would be a bit easier to just use outputs = (outputs > 0.5).float() (see the second sketch below).

  3. That should be correct. If some tensors are still on the CPU, you’ll get an error, so such mistakes are easy to catch.
    I’m not sure how reliable the Windows Task Manager is at showing GPU activity, but your code might be bottlenecked by preprocessing / data loading. Have a look at the ImageNet example to see how to time the data loading.
    Based on the code you’ve shared here, it looks like you are not using multiple workers in your DataLoader. You can enable multi-process data loading by setting num_workers (see the DataLoader sketch below).
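
To make point 1 a bit more concrete, here is a minimal sketch of the overfit test with nn.BCEWithLogitsLoss on raw logits. The model, the tensor shapes (88 keys), and the pos_weight value are placeholders, not taken from your actual code:

    import torch
    import torch.nn as nn

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    # Hypothetical tiny sample: 10 sequences, 88 piano keys (shapes are assumptions)
    inputs = torch.randn(10, 88, device=device)
    targets = torch.randint(0, 2, (10, 88), device=device).float()

    model = nn.Linear(88, 88).to(device)                 # stand-in for the actual RNN
    # pos_weight up-weights the rarer "key pressed" class if the labels are imbalanced
    pos_weight = torch.full((88,), 5.0, device=device)
    criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

    # Overfit test: on this tiny sample the loss should drop towards zero
    for epoch in range(500):
        optimizer.zero_grad()
        logits = model(inputs)                           # raw logits, no sigmoid here
        loss = criterion(logits, targets)
        loss.backward()
        optimizer.step()
    print(loss.item())

If the loss stays high even here, the bug is most likely in the model or the training loop rather than in the data.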
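
For point 2, a tiny sketch of how the thresholding could look at inference time (the logits tensor is just a stand-in for your model output):

    import torch

    # Hypothetical raw model output for one time step (assuming 88 keys)
    logits = torch.randn(1, 88)

    probs = torch.sigmoid(logits)          # apply the sigmoid only at inference; training uses raw logits
    pressed = (probs > 0.5).float()        # 1.0 = key pressed, 0.0 = key not pressed

Keep the thresholding out of the loss computation, since the hard 0/1 step is not differentiable.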
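
For point 3, a rough sketch of a DataLoader with num_workers and per-batch device transfers; the dataset and model here are placeholders for your piano-roll data and RNN:

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    # Placeholder tensors standing in for the actual dataset
    data = torch.randn(1000, 88)
    labels = torch.randint(0, 2, (1000, 88)).float()
    dataset = TensorDataset(data, labels)

    # num_workers > 0 loads batches in background processes;
    # pin_memory=True speeds up host-to-GPU copies
    loader = DataLoader(dataset, batch_size=64, shuffle=True,
                        num_workers=4, pin_memory=True)

    model = torch.nn.Linear(88, 88).to(device)           # stand-in for the actual model

    for data_batch, target_batch in loader:
        # move each batch to the GPU inside the training loop
        data_batch = data_batch.to(device, non_blocking=True)
        target_batch = target_batch.to(device, non_blocking=True)
        # ... forward pass, loss, backward, optimizer step ...

If the GPU utilization still drops while the workers are busy, the data loading is probably the bottleneck.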