Facing ValueError: Expected input batch_size (1) to match target batch_size (4)

I am trying to develop a skip-gram model of word2vec with the help of PyTorch; however, while training I am facing the above error. Please guide me on what I am doing wrong. Here is my model:

import torch
import torch.nn as nn

class Skipgram(torch.nn.Module):
    def __init__(self, vocab_size, embedding_dimensions, window_size):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, embedding_dimensions)
        self.linear1 = nn.Linear(embedding_dimensions, 128)
        self.activation_1 = nn.ReLU()
        self.linear2 = nn.Linear(128, window_size * vocab_size)
        self.activation_2 = nn.LogSoftmax(dim=-1)

    def forward(self, inputs):
        embeds = self.embeddings(inputs)
        embeds_1 = sum(embeds).view(1, -1)
        out = self.linear1(embeds_1)
        out = self.activation_1(out)
        out = self.linear2(out)
        out = self.activation_2(out)
        return out

    def get_context_embedddings(self, target):
        # nn.Embedding expects a LongTensor of indices, not a Python list
        target = torch.tensor([word_to_num[w] for w in target], dtype=torch.long)
        return self.embeddings(target).view(1, -1)

model = Skipgram(vocab_size, embedding_dim, window_size)

Here is the training code:

for epoch in range(50):
    total_loss = 0

    for center_word, target in data_1:
        center_vector = torch.tensor([word_to_num[center_word]], dtype=torch.long)
        y_val = model(center_vector)
        idxs = torch.tensor([word_to_num[w] for w in target], dtype=torch.long)
        total_loss += criterion(y_val, idxs)

    # Updating gradients and parameters
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()

Here is the error that I am facing:


ValueError                                Traceback (most recent call last)
<ipython-input> in <module>
      6     y_val = model(center_vector)
      7     idxs = torch.tensor([word_to_num[w] for w in target], dtype=torch.long)
----> 8     total_loss += criterion(y_val, idxs)
      9
     10

~\Anaconda3\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

~\Anaconda3\lib\site-packages\torch\nn\modules\loss.py in forward(self, input, target)
   1046         assert self.weight is None or isinstance(self.weight, Tensor)
   1047         return F.cross_entropy(input, target, weight=self.weight,
-> 1048                                ignore_index=self.ignore_index, reduction=self.reduction)
   1049
   1050

~\Anaconda3\lib\site-packages\torch\nn\functional.py in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction)
   2691     if size_average is not None or reduce is not None:
   2692         reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 2693     return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
   2694
   2695

~\Anaconda3\lib\site-packages\torch\nn\functional.py in nll_loss(input, target, weight, size_average, ignore_index, reduce, reduction)
   2383     if input.size(0) != target.size(0):
   2384         raise ValueError(
-> 2385             "Expected input batch_size ({}) to match target batch_size ({}).".format(input.size(0), target.size(0))
   2386         )
   2387     if dim == 2:

ValueError: Expected input batch_size (1) to match target batch_size (4).
Please tell me what I am doing wrong here. I would be grateful for your help.

I guess your view operations might change the batch size, which could yield this error, so you should check:

embeds_1 = sum(embeds).view(1,-1)

and

return self.embeddings(target).view(1,-1)

which both flatten the tensor so that it would have a batch size of 1.
Usually you would keep the batch size equal and flatten the other dimensions via:

tensor = tensor.view(tensor.size(0), -1)
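
For example, a quick sketch with made-up sizes (4 rows of 125 features each, purely for illustration) shows the difference between the two calls:

import torch

x = torch.randn(4, 125)            # e.g. 4 samples with 125 features each

flat_all = x.view(1, -1)           # [1, 500]: batch size collapsed to 1
flat_keep = x.view(x.size(0), -1)  # [4, 125]: batch size of 4 preserved

print(flat_all.shape, flat_keep.shape)

The second form is usually what you want before a linear layer, since the criterion compares the batch dimension of the model output against the batch dimension of the target.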

Does the dim argument of the LogSoftmax function also affect the dimensions? For example, the final line of the model definition:

self.activation_2 = nn.LogSoftmax(dim=-1)

Does this also affect the shape of the tensor?

No, nn.LogSoftmax won't change the shape of the tensor; you can verify this by printing the shape of the input and comparing it to the shape of the output.
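
For instance, a quick check along these lines (the shapes here are made up for illustration):

import torch
import torch.nn as nn

log_softmax = nn.LogSoftmax(dim=-1)
x = torch.randn(1, 500)
out = log_softmax(x)
print(x.shape, out.shape)  # both torch.Size([1, 500]): only the values change, not the shape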

Now I am facing the following error:

RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x1 and 500x128)

Using tensor.view(tensor.size(0), -1) is also not doing any good.

It now seems that some layers have the wrong shape, so you would have to check the number of features of the properly flattened tensor and make sure that the following linear layer uses the same number of input features.
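
As a minimal sketch of what that check could look like (the sizes here are toy values, not taken from your code): keep the batch dimension when flattening, and make sure the in_features of the following linear layer matches the flattened feature count:

import torch
import torch.nn as nn

vocab_size, embedding_dim = 100, 500     # toy values for illustration

embeddings = nn.Embedding(vocab_size, embedding_dim)
linear1 = nn.Linear(embedding_dim, 128)  # in_features == features of the flattened input

inputs = torch.tensor([1, 5, 7], dtype=torch.long)  # batch of 3 word indices
embeds = embeddings(inputs)                         # [3, 500]
embeds = embeds.view(embeds.size(0), -1)            # still [3, 500]: batch dim kept
out = linear1(embeds)                               # [3, 128]: shapes now match
print(out.shape)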