Saving/loading problem

Hi guys, I have a problem with loading a model.
I train the model with a batch size of 32 in the training DataLoader, then make predictions with a batch size of 1 in the testing DataLoader, and everything works fine. But if I save this model and then load it with

model_loaded.load_state_dict(torch.load(f=MODEL_SAVE_PATH))

it no longer lets me predict. I get this error:

ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 80])

I can’t understand what went wrong during loading.

Could you please confirm whether or not you are calling model.eval() before running inference after loading the model?

No, it looks like this:

y_pred_list = []
with torch.inference_mode():
    for X_batch in test_loader:
        X_batch = X_batch.to(device)
        y_test_pred = model_loaded(X_batch)
        _, y_pred_tags = torch.max(y_test_pred, dim=1)
        y_pred_list.append(y_pred_tags.cpu().numpy())

Could you please try putting model_loaded.eval() outside the inference_mode context manager and post here if the error persists?
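Without model.eval(), BatchNorm layers stay in training mode and try to compute batch statistics, which fails for a single-sample batch and matches the error above. A minimal sketch of what I mean, using the names from your snippet:

model_loaded.eval()  # switch BatchNorm/Dropout layers to evaluation behavior

with torch.inference_mode():
    ...  # your existing prediction loop goes here unchanged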

You gave me a hint, thank you so much. I already did it and there is no error anymore.
There is another problem: the data is very big and inference with this code takes too much time.
Do you have any idea how to speed it up? I think the problem is in looping over and appending every batch. Thanks in advance.

Do you mean the error in the first post is gone with eval()?

I would think using multiple GPUs could help in general, but I personally cannot help with this part, sorry.
Since you are already running in inference mode, you don’t have to worry about expensive gradient computations.

Didn’t quite get you here, could you please elaborate?

  1. Yes, with model.eval() everything started to work well.
  2. It is an idea, I will check.
  3. Well, as I understand this code, it takes every batch, makes a prediction (in my case the output is a tensor of 4 values), then takes the max of these 4 values and appends the result to a list. So I was wondering whether there is any way to optimize this loop?

Please post an executable snippet of the part you’d like help with. I can then check if there’s scope for optimisation.

I don’t really know how to post an executable snippet. :smiling_face_with_tear:
Should I post the whole model?
Don’t worry, it is OK. I can wait for the inference.

You are synchronizing your code by moving the data to the CPU and transforming it into a numpy array:

y_pred_list.append(y_pred_tags.cpu().numpy())

If you want to avoid this sync, store the detached CUDA tensors directly in the list, assuming you have enough GPU memory:

y_pred_list.append(y_pred_tags.detach())

The .detach() call might not be needed if this tensor was already created in a no_grad() block, but it also doesn’t hurt to be explicit here.
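For reference, a minimal sketch of the full loop with that change, reusing the model_loaded, test_loader, and device names from the earlier snippet. Predictions stay on the GPU during the loop, and there is a single transfer to the CPU at the end:

model_loaded.eval()

y_pred_list = []
with torch.inference_mode():
    for X_batch in test_loader:
        X_batch = X_batch.to(device)
        y_test_pred = model_loaded(X_batch)
        # argmax over the class dimension; the result stays on the GPU,
        # so no device-to-host sync happens inside the loop
        y_pred_tags = torch.argmax(y_test_pred, dim=1)
        y_pred_list.append(y_pred_tags)

# one concatenation and one transfer to the CPU at the end
y_pred = torch.cat(y_pred_list).cpu().numpy()

This assumes all predictions fit in GPU memory; if they don’t, you can keep appending CPU tensors, but the per-batch sync will remain.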