Input tensor created on GPU and not moving to CPU despite forcing it with .cpu()

I trained a model on GPU with DataParallel.

I saved the best model to disk.

    full_state = {'epoch': epoch,
                  'arch': args.arch,
                  'model_state_dict': model.state_dict(),
                  'best_val_acc1': best_model_val_acc1,
                  'optimizer_state_dict': optimizer.state_dict()}
    torch.save(full_state, best_model_file_path)

After all epochs, I loaded the best model saved as above, using the code below:

    if os.path.isfile(model_file):
        print("\n=> loading checkpoint '{}'".format(model_file))
        checkpoint = torch.load(model_file)
        model.load_state_dict(checkpoint['model_state_dict'])
        print("=> loaded checkpoint '{}' (epoch {})".format(model_file, checkpoint['epoch']))
    else:
        print("=> no checkpoint found at '{}'".format(model_file))
        return

I am passing mode="cpu" to the function below.

Nevertheless, I am receiving the following error:

RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same

How can this be possible if I am using .cpu() to send my inputs to CPU?

I do not even understand why my input tensor ends up on the GPU in the first place.

import time

import torch
from tqdm import tqdm


def compute_total_inference_time(model, val_loader, mode):

    total_inference_time = 0

    if mode == "cpu":
        model.cpu()
    elif mode == "gpu":
        model.cuda()

    # switch to evaluate mode
    model.eval()

    with torch.no_grad():

        for input_tensor, _ in tqdm(val_loader):

            if mode == "cpu":
                input_tensor = input_tensor.cpu()
            elif mode == "gpu":
                input_tensor = input_tensor.cuda()

            # compute output
            initial_time = time.time()
            _ = model(input_tensor)
            final_time = time.time()

            instance_inference_time = final_time - initial_time
            total_inference_time += instance_inference_time

    return total_inference_time

I guess you saved the model with CUDA tensors, so torch.load() will load it onto the GPU, and when you then feed it the CPU tensors, the error is raised because the tensor types are incompatible.

Basically, Torch’s load & save functions don’t do any alterations to the model, to keep things simple: CPU models are loaded onto the CPU, GPU models onto the GPU.

So in your case, there are 2 options if you want to run inference on the CPU after training a model on the GPU:

  1. You can call model.cpu() before you save it. This way, it will automatically be loaded onto the CPU (this is especially useful if you later want to load the model on hardware that doesn’t support CUDA, e.g., most MacBooks).

  2. After you load the model, simply call model.cpu() and then proceed with the code you have (see the sketch below).
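
For option 2, a minimal sketch reusing the variable names from your loading code; map_location is an optional extra (not required for this to work) that tells torch.load to remap the saved GPU storages to the CPU right away:

    checkpoint = torch.load(model_file, map_location='cpu')  # remap saved GPU storages to CPU
    model.load_state_dict(checkpoint['model_state_dict'])
    model.cpu()   # move the module's parameters to the CPU after loading
    model.eval()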


If @rasbt’s suggestions don’t help, could you try assigning the model to its CPU version?
I’m not sure if this is needed in 0.4.0, but it might be worth a try.

You mean

model = model.cpu()

? That’s how I typically do it out of habit, but I read somewhere that it should be an in-place operation. The assignment certainly doesn’t hurt, though.

Yeah, I mean this line of code.
I can’t currently test it, and I know it was an in-place operation in the old versions.
Now I’m wondering if it’s just a call to tensor.to, which would make the assignment necessary.

I’ve checked it, and you are right.
.cuda(), .cpu() and .to still seem to be in-place operations on nn.Modules.

import torch
import torch.nn as nn

model = nn.Linear(10, 1)
model.to('cuda')
print(model.weight.type())
> torch.cuda.FloatTensor

model.cpu()
print(model.weight.type())
> torch.FloatTensor

model.cuda()
print(model.weight.type())
> torch.cuda.FloatTensor

For tensors you would still need an assignment.
Sorry for the confusion.
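
To illustrate the tensor case (continuing the example above; the call returns a new tensor and leaves the original untouched):

t = torch.randn(3, device='cuda')
t.cpu()                # returns a new CPU tensor, t itself is unchanged
print(t.type())
> torch.cuda.FloatTensor

t = t.cpu()            # the assignment is needed for tensors
print(t.type())
> torch.FloatTensor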

I would still use the assignment, but that shouldn’t cause the error as you said.


I removed the DataParallel, and the code worked just fine.

I know that this should not be necessary, but it worked. Since I do not need to train on more than one GPU simultaneously, it is fine for me.

I was already moving the model and the inputs between CPU and GPU appropriately.
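
In case it helps anyone else, here is a minimal sketch of an alternative that keeps DataParallel for training but drops the wrapper before the CPU timing run (assuming the model is still wrapped in nn.DataParallel at that point):

    # nn.DataParallel scatters its inputs to the GPUs in forward(),
    # so unwrap it before running the model on the CPU
    if isinstance(model, torch.nn.DataParallel):
        model = model.module
    model = model.cpu()
    cpu_time = compute_total_inference_time(model, val_loader, mode="cpu")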

Thank you for the help.