My training code looks overly complicated.
I searched for a good training example, but legacy code from versions before 0.4 adds a lot of noise to my search results.
I created this example to explain what I mean:
import torch
import torch.nn as nn
import torch.optim as optim

# m (the model) and dl (the DataLoader) are defined earlier in my script
# set the mini-batch size, the optimizer, and the loss function
bs = 512
opt = optim.Adam(m.parameters(), lr=0.0001)
loss_fn = nn.NLLLoss()
# set the number of epochs
num_epochs = 1000
# get an iterator over the DataLoader
it = iter(dl)
# grab the first mini-batch
mb, yt = next(it)
# train the model for num_epochs
for epoch in range(num_epochs):
    
    # all good while the DataLoader still yields full batches, but at some point
    # next() returns fewer than bs examples, and the call after that raises StopIteration
    if mb.shape[0] == bs:  # bs = 512
        # split the mini-batch along dim 1 into separate input tensors for the model
        tup = torch.unbind(mb, dim=1)
        # forward pass to calculate the prediction
        y_hat = m(*tup)
        # loss evaluation
        loss = loss_fn(y_hat, yt)
        # Backward and optimize
        opt.zero_grad()
        loss.backward()
        # update params
        opt.step()
        
        # grab the next mini-batch
        mb, yt = next(it)
        
    else:
        # the remaining batch is smaller than bs: restart the iterator and take a fresh first batch
        it = iter(dl)
        mb, yt = next(it)
    if (epoch+1) % 50 == 0:
        print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch+1, num_epochs, loss.item()))
This looks so complicated, since I need to keep track of how much of the DataLoader remains: at some point next() returns fewer than bs examples, and the call after that raises StopIteration.
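To make that concrete, here is a minimal, self-contained sketch of the behavior; the toy TensorDataset and its shapes are made up by me as a stand-in for my real data, only DataLoader and TensorDataset are actual PyTorch classes:

import torch
from torch.utils.data import DataLoader, TensorDataset

# toy data: 1000 examples, so a batch size of 512 gives batches of 512 and 488
ds = TensorDataset(torch.randn(1000, 3, 10), torch.randint(0, 2, (1000,)))
dl = DataLoader(ds, batch_size=512)

it = iter(dl)
mb, yt = next(it)
print(mb.shape[0])   # 512 -> a full batch
mb, yt = next(it)
print(mb.shape[0])   # 488 -> the last, smaller batch
try:
    next(it)
except StopIteration:
    print('iterator exhausted')   # one more next() raises StopIteration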
OK, I could set the DataLoader's shuffle=True. In that case I could use mb, yt = next(iter(dl)) all the time.
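For reference, this is roughly the simplification I have in mind; it is just a sketch that reuses m, dl, opt, loss_fn, bs and num_epochs from above and assumes dl was constructed with shuffle=True:

# assuming dl was built with batch_size=bs and shuffle=True, a fresh iterator each
# step yields a full, randomly drawn first batch (as long as the dataset has >= bs examples)
for epoch in range(num_epochs):
    mb, yt = next(iter(dl))
    y_hat = m(*torch.unbind(mb, dim=1))
    loss = loss_fn(y_hat, yt)
    opt.zero_grad()
    loss.backward()
    opt.step()
    if (epoch + 1) % 50 == 0:
        print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch + 1, num_epochs, loss.item()))

I realize that with this version each "epoch" only ever sees one mini-batch, which is part of why I am asking for feedback.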
Any feedback on my training approach would be very helpful at this point.