Hi, I’m trying to port a model architecture from Keras to PyTorch, but the problem is that there is no progress in training. It’s as if no actual learning is happening.
import math
import torch
from tqdm import tqdm

###################
# train the model #
###################
model.train()
steps_train = math.ceil(loaders['train_size'] / batch_size)
print(f"***** training steps: {steps_train} *****")
for batch_idx in tqdm(range(steps_train)):
    data, true_map, true_binary = next(loaders['train'])
    # move to GPU
    if use_cuda:
        data, true_binary, true_map = data.cuda(), true_binary.cuda(), true_map.cuda()
    ## find the loss and update the model parameters accordingly
    # clear the gradients of all optimized variables
    optimizer.zero_grad()
    with torch.set_grad_enabled(True):  # redundant in a training loop, but harmless
        # forward pass: compute predicted outputs by passing inputs to the model
        output_map, output_binary = model(data)
        # calculate the batch loss
        loss = criterion(output_binary, output_map, true_binary, true_map)
        # backward pass: compute gradient of the loss with respect to model parameters
        loss.backward()
        # perform a single optimization step (parameter update)
        optimizer.step()
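To see whether gradients are flowing at all, here is a small diagnostic I can drop in right after loss.backward() (just a debugging sketch I added; it is not part of the original Keras code):

# diagnostic: total gradient norm over the parameters of the network in the loop
total_norm = 0.0
for p in model.parameters():
    if p.grad is not None:
        total_norm += p.grad.norm().item() ** 2
total_norm = total_norm ** 0.5
print(f"grad norm: {total_norm:.3e}")  # a value stuck near zero would explain the flat loss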
This is my custom loss function (the criterion in the training loop above is bound to it):
def my_loss(output_binary, output_map, true_binary, true_map):
    loss_binary = torch.mean((output_binary - true_binary) ** 2)
    loss_map = torch.mean((output_map - true_map) ** 2)
    return 0.5 * loss_binary + 0.5 * loss_map

criterion = my_loss  # used as criterion in the training loop above
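As far as I can tell, this is just the average of two mean-squared-error terms, so it should match torch.nn.MSELoss with its default 'mean' reduction. Here is the quick self-contained check I used to convince myself the loss itself is fine (the tensor shapes are made up just for the test):

import torch
import torch.nn as nn

mse = nn.MSELoss()  # default reduction='mean', same as torch.mean((a - b) ** 2)

out_bin, tgt_bin = torch.rand(4, 1), torch.rand(4, 1)
out_map, tgt_map = torch.rand(4, 1, 32, 32), torch.rand(4, 1, 32, 32)

reference = 0.5 * mse(out_bin, tgt_bin) + 0.5 * mse(out_map, tgt_map)
assert torch.allclose(my_loss(out_bin, out_map, tgt_bin, tgt_map), reference)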
And this is how I build the optimizer:

from torch import optim

optimizer = optim.Adam(denseNet.parameters())
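One thing I noticed while writing this up: the training loop calls model(data), but the optimizer is built from denseNet.parameters(). If those names point to two different objects, Adam would be updating weights the forward pass never uses, which would look exactly like "no actual training". A minimal sanity check (assuming both names are supposed to refer to the same network):

# if this fails, the optimizer and the forward pass use different networks
assert model is denseNet, "optimizer and forward pass disagree on the model"

# safer: always build the optimizer from the model actually used in the loop
optimizer = optim.Adam(model.parameters())  # Adam's default lr is 1e-3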
Is this part of the code legit, or have I made a rookie mistake here?