Expected target size mismatch with nll_loss and cross_entropy

I am new to PyTorch and am learning from many different examples and sources, so please excuse any glaring inconsistencies. I am trying to implement an autoencoder (AE) for the Fashion-MNIST dataset and then feed the output of the trained bottleneck layer to an MLP classifier. The AE itself gave me little trouble, but adding the classifier is where I am running into issues.

My MLP output layer has 10 nodes (one per class), and the Fashion-MNIST labels are integers by default rather than one-hot codes. From what I have read so far, this should be no problem for nll_loss or cross_entropy (which, to my understanding, calls nll_loss). In my case, however, I get the following error when computing the cross_entropy loss between my 10-dim predictions and the integer target labels:

Traceback (most recent call last):
  File "sae_fmnist.py", line 194, in <module>
    test_SAE(model, test_set, test_bs)	# see initial perf with random weights
  File "sae_fmnist.py", line 142, in test_SAE
    test_loss_c += F.cross_entropy(pred, target)
  File "/home/ryan/.local/lib/python3.5/site-packages/torch/nn/functional.py", line 2009, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/home/ryan/.local/lib/python3.5/site-packages/torch/nn/functional.py", line 1848, in nll_loss
    out_size, target.size()))
ValueError: Expected target size (1, 10), got torch.Size([1])

My model is defined as follows:

class Autoencoder(nn.Module):
	def __init__(self):
		super().__init__()	# required before registering submodules
		self.encoder = nn.Sequential(
			nn.Linear(in_features=28*28, out_features=500),
			nn.Linear(in_features=500, out_features=200),
			nn.Linear(in_features=200, out_features=20))
		self.decoder = nn.Sequential(
			nn.Linear(in_features=20, out_features=200),
			nn.Linear(in_features=200, out_features=500),
			nn.Linear(in_features=500, out_features=28*28))
		self.classifier = nn.Sequential(
			nn.Linear(in_features=20, out_features=10))

	# define forward function
	def forward(self, x):
		encoded = self.encoder(x)
		decoded = self.decoder(encoded)
		pred = self.classifier(encoded)
		return decoded, pred

And the erroneous function is here:

def test_SAE(model, dataset, batch_size):
	model.eval()	# puts model in evaluation mode
	test_loss_d = 0
	test_loss_c = 0
	num_correct = 0
	num_digits = 10
	test_loader = torch.utils.data.DataLoader(dataset)
	with torch.no_grad():
		for data, target in test_loader:
			data = data.flatten(start_dim=2)
			output, pred = model(data)
			criterion_d = torch.nn.MSELoss(size_average=False)
			#criterion_c = torch.nn.CrossEntropyLoss(size_average=False)
			test_loss_d += criterion_d(output, data).item()
			#test_loss_c += criterion_c(pred, target).item()
			test_loss_c += F.cross_entropy(pred, target)
			#p_class = pred.data.max(1, keepdim=True)[1]
			#num_correct += p_class.eq(target.data.view_as(p_class)).sum()
			num_correct += pred.argmax(dim=1).eq(target).sum().item()
	test_loss_d /= len(test_loader.dataset)
	test_loss_c /= len(test_loader.dataset)
	print('\nTest set: Avg. loss: {:.4f}, {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
		test_loss_d, test_loss_c, num_correct, len(test_loader.dataset),
		100. * num_correct / len(test_loader.dataset)))

Some things of note:
- I have left in a few commented-out attempts at other methods, which also produced similar errors.
- The dual output of the network is based on this AE: Autoencoder and Classification inside the same model
- I call test once before any training, which is why the error shows up in my test function. I expect the same error in my similar training function, and I assume it has the same cause.

Any suggestions would be appreciated. Thanks!

Try changing data = data.flatten(start_dim=2) to data = data.view(data.size(0), -1).
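To make the difference between the two calls concrete, here is a minimal sketch. It assumes a batch shaped like the tensors a Fashion-MNIST DataLoader yields, (batch, 1, 28, 28); the random tensor is only a stand-in for real images.

```python
import torch

# Stand-in for one batch of images: (batch, channels, height, width).
data = torch.randn(4, 1, 28, 28)

# flatten(start_dim=2) keeps the channel axis, so the result is still 3-D.
kept_channel = data.flatten(start_dim=2)

# view(data.size(0), -1) folds everything after the batch axis into one axis.
flat = data.view(data.size(0), -1)

print(kept_channel.shape)  # torch.Size([4, 1, 784])
print(flat.shape)          # torch.Size([4, 784])

# flatten(start_dim=1) produces the same 2-D tensor as the view call.
same = torch.equal(flat, data.flatten(start_dim=1))
```

Both versions keep all 784 pixel values per image; only the number of axes differs.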

I use start_dim=2 based on the dimensions of the MNIST data.

The data is brought in as follows:

# Fashion-MNIST dataset
train_set = torchvision.datasets.FashionMNIST(
	root = './data/FashionMNIST',
	train = True,
	download = True,
	transform = transforms.Compose([

test_set = torchvision.datasets.FashionMNIST(
	root = './data/FashionMNIST',
	train = False,
	download = True,
	transform = transforms.Compose([

Then performing this:

test_loader = torch.utils.data.DataLoader(dataset)
for data, target in test_loader:
	print("Init shape: {}, nreshape: {}, length: {}, target shape: {}".format(
		data.shape, data.flatten(start_dim=2).shape, len(test_loader.dataset), target.shape))

Yields this:

Init shape: torch.Size([1, 1, 28, 28]), nreshape: torch.Size([1, 1, 784]), length: 10000, target shape: torch.Size([1])

Do you still think start_dim=1 would help?

Try the solution in my latest updated reply.

I think the issue is that the extra channel dimension in your input tensor is retained in the output, so your model output has shape (n_examples, 1, 10), which makes the loss function expect a target of shape (n_examples, 10). If you reshape your output to (n_examples, 10), the loss will accept a 1-D tensor of integer labels, one per example.
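A small sketch that reproduces the mismatch and the fix, under some assumptions: a single nn.Linear stands in for the classifier head, and a random tensor stands in for the bottleneck output. The names encoded and pred_2d are made up for this illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

classifier = nn.Linear(20, 10)      # stand-in for the classifier head
encoded = torch.randn(1, 1, 20)     # bottleneck output that still has the channel axis
target = torch.tensor([3])          # one integer class label, shape (1,)

pred = classifier(encoded)          # shape (1, 1, 10): the channel axis survives

# With a 3-D prediction the loss rejects the 1-D target.
try:
    F.cross_entropy(pred, target)
    raised = False
except (ValueError, RuntimeError) as err:
    raised = True
    print(err)

# Dropping the extra axis gives (1, 10), which pairs with a target of shape (1,).
pred_2d = pred.view(pred.size(0), -1)
loss = F.cross_entropy(pred_2d, target)
```

Reshaping the input before the forward pass, as suggested earlier in the thread, has the same effect and also keeps the decoder output 2-D.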

Ok, I see. Wouldn’t I need something closer to data = data.flatten(start_dim=2).view(data.size(0), -1) to ensure I keep all 784 dimensions?

No, just try the fix I mentioned.

The extra dummy channel dimension makes the loss function think you are giving it multi-dimensional outputs, which you really aren’t. See the doc for more details.
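For reference, a sketch of what a genuinely multi-dimensional call looks like. With class-index targets, cross_entropy reads an input of shape (N, C, d1) as N samples, C classes, and d1 positions per sample, and expects a target of shape (N, d1). The tensors below are random placeholders.

```python
import torch
import torch.nn.functional as F

# Genuinely multi-dimensional case: 2 samples, 10 classes, 5 positions per sample.
logits = torch.randn(2, 10, 5)            # (N, C, d1)
labels = torch.randint(0, 10, (2, 5))     # (N, d1): one class index per position
loss = F.cross_entropy(logits, labels)    # scalar, averaged over all 2 * 5 positions

# The shape from this thread, read with the same rule:
pred_shape = (1, 1, 10)                   # N=1, C=1, d1=10
expected_target_shape = (pred_shape[0],) + pred_shape[2:]
print(expected_target_shape)              # (1, 10)
```

That is why the error message asks for a target of size (1, 10): the 10 class scores were interpreted as 10 positions of a one-class problem.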

Looks like that did the trick. I will look into the .view() documentation at some point, but right now I have training to do… Thank you so much!
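For anyone landing here later, a minimal sketch of an evaluation loop with the fix applied. Several things are assumptions rather than the poster's code: TinySAE is a shrunken, hypothetical stand-in for the Autoencoder above, evaluate is an illustrative name, random tensors replace Fashion-MNIST, batch_size is actually passed to the DataLoader, and reduction='sum' is used in place of the deprecated size_average=False.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset


class TinySAE(nn.Module):
    """Hypothetical, shrunken stand-in for the Autoencoder in the question."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(28 * 28, 20)
        self.decoder = nn.Linear(20, 28 * 28)
        self.classifier = nn.Linear(20, 10)

    def forward(self, x):
        encoded = self.encoder(x)
        return self.decoder(encoded), self.classifier(encoded)


def evaluate(model, dataset, batch_size):
    model.eval()
    loss_d = 0.0
    loss_c = 0.0
    num_correct = 0
    loader = DataLoader(dataset, batch_size=batch_size)
    with torch.no_grad():
        for data, target in loader:
            data = data.view(data.size(0), -1)   # (batch, 784), no channel axis
            output, pred = model(data)           # pred has shape (batch, 10)
            # Sum the losses per batch so dividing by n below averages per sample.
            loss_d += F.mse_loss(output, data, reduction='sum').item()
            loss_c += F.cross_entropy(pred, target, reduction='sum').item()
            num_correct += pred.argmax(dim=1).eq(target).sum().item()
    n = len(loader.dataset)
    return loss_d / n, loss_c / n, num_correct / n


# Random tensors in place of Fashion-MNIST, just to exercise the shapes.
fake_images = torch.rand(8, 1, 28, 28)
fake_labels = torch.randint(0, 10, (8,))
avg_d, avg_c, acc = evaluate(TinySAE(), TensorDataset(fake_images, fake_labels), batch_size=4)
```

Calling .item() on each loss keeps the running totals as plain Python floats, so they format cleanly in a print statement.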