I’m currently trying to fine-tune an image-captioning model and I’m getting this error:
ValueError: Expected input batch_size (3) to match target batch_size (27).
I’m fairly sure the problem is the loss function, but I’m new to PyTorch and don’t know how to configure it correctly.
The model:
I’m so sorry, I posted a draft by accident. At the start of the question I said something about the batch_size; I’ve solved that issue, and now the error is “not enough values to unpack (expected 3, got 2)”.
Sure:
UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
return img, torch.tensor(tokenized_captions)
Traceback (most recent call last):
File "/pytorch-tutorial/tutorials/03-advanced/image_captioning/finetuneme.py", line 159, in
main(args)
File "/home/finetuneme.py", line 105, in main
for i, (images, captions, lengths) in enumerate(dataloader):
^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: not enough values to unpack (expected 3, got 2)
Your dataloader object is only returning 2 items, instead of the 3 (images, captions, lengths) that you’ve unpacked in your for-loop. I’d print out the contents of the dataloader, check what it’s iterating over, and go from there.
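For example, something along these lines (a minimal sketch; dataloader is the object from your traceback, everything else is illustrative):

    # Pull a single batch and inspect what the DataLoader actually yields.
    batch = next(iter(dataloader))
    print(len(batch))  # 2 here, while the loop tries to unpack 3 names
    for item in batch:
        # Tensors have a .shape; anything else (e.g. a list of captions) won't.
        print(type(item), getattr(item, "shape", None))

If you need lengths as a third item, the usual route is a custom collate_fn that returns (images, captions, lengths), which is what the tutorial this script is based on appears to do.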
Check the shapes of the outputs and captions tensors; they probably differ. Perhaps the flatten should be applied over a specific dim rather than the entire tensor.
For the nn.CrossEntropyLoss(), shouldn’t the input tensors be the same shape? If you do outputs.squeeze(0), the shapes passed to the loss function are [1, 3, 9956] and [1, 3, 9] respectively, which aren’t the same.
Perhaps you need to map the outputs tensor to a reduced shape and then pass it to the loss function?
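Concretely, here’s a minimal sketch of what “reduced shape” means, assuming outputs is [batch, seq_len, vocab_size] and captions is [batch, seq_len] of token indices (all numbers below are illustrative):

    import torch
    import torch.nn as nn

    criterion = nn.CrossEntropyLoss()

    outputs = torch.randn(1, 3, 9956)          # [batch, seq_len, vocab_size]
    captions = torch.randint(0, 9956, (1, 3))  # [batch, seq_len], dtype torch.long

    # Flatten the batch and sequence dims together, keeping the class dim intact:
    loss = criterion(outputs.reshape(-1, 9956), captions.reshape(-1))

Nothing is lost in the flatten: every (batch, timestep) position keeps its own row of vocabulary scores; the rows are just re-indexed so the loss sees [N, num_classes] against [N].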
Yes, I’m sure they need to be the same shape, but I don’t fully understand how to do that. If I map the output to the same shape as the caption, doesn’t it lose its value?
I removed the flatten because the size already matched the outputs: loss = criterion(outputs.squeeze(0), captions)
But now the error changes to: ValueError: Expected input batch_size (3) to match target batch_size (1). If I remove the squeeze(0), it’s still the same error.
EDIT:
I forgot that I had separated the outputs from the reduced outputs. I’ve fixed it, and now the error is:
Error:
RuntimeError: Expected floating point type for target with class probabilities, got Long
line 119, in main
loss = criterion(reduced_outputs, captions)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The two tensors you pass to your loss function have different dtypes (one is torch.float32, the other torch.long). You need to cast them to the same type (torch.float32), via .to(dtype=torch.float32).
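For reference, nn.CrossEntropyLoss accepts two target conventions, and the required dtype follows from which one your shapes imply (a minimal sketch with illustrative shapes):

    import torch
    import torch.nn as nn

    criterion = nn.CrossEntropyLoss()
    logits = torch.randn(3, 9956)  # [N, num_classes], always floating point

    # 1) Class-index targets: shape [N], dtype torch.long
    target_idx = torch.randint(0, 9956, (3,))
    loss = criterion(logits, target_idx)

    # 2) Class-probability targets: shape [N, num_classes], floating dtype
    #    (this is the branch your error message comes from)
    target_prob = torch.softmax(torch.randn(3, 9956), dim=-1).to(dtype=torch.float32)
    loss = criterion(logits, target_prob)

The .to(dtype=torch.float32) cast only applies to the second form; if captions hold token indices, the more common fix is the first form, keeping them torch.long and flattened to [N].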