CrossEntropyLoss - Expected object of type torch.LongTensor

I tried to implement image classification on photos of buildings in my neighborhood. There are two classes of images: panel and modern. The images have shape:

torch.Size([36, 3, 150, 200])

The labels have shape:

torch.Size([36, 2])

If you need, I can provide a link to download the images.
My code is:

import torch
import matplotlib.pyplot as plt
import numpy as np
from os import listdir
from sklearn.model_selection import train_test_split

def loadImages(path):
    imagesList = listdir(path)
    loadedImages = []
    for image in imagesList:
        loadedImages.append(plt.imread(path + image))   
    return np.array(loadedImages)
panel = loadImages('./photo_small/panel/') / 255    # scale pixel values to [0, 1]
modern = loadImages('./photo_small/modern/') / 255
photo_up = np.concatenate((panel, modern), axis=0)
photo = photo_up.swapaxes(3, 1).swapaxes(3, 2)       # NHWC -> NCHW
label_first = np.concatenate((np.zeros(20), np.ones(20)), axis=0)
label_second = np.concatenate((np.ones(20), np.zeros(20)), axis=0)
label_almost = np.vstack((label_first, label_second))
label = label_almost.swapaxes(1, 0)                  # one-hot style labels, shape (40, 2)
X_train, X_test, y_train, y_test = train_test_split(photo, label, test_size=0.1, random_state=42)
X_train_torch = torch.from_numpy(X_train).float()
X_test_torch = torch.from_numpy(X_test).float()
y_train_torch = torch.from_numpy(y_train).float()
y_test_torch = torch.from_numpy(y_test).float()

class Flatten(torch.nn.Module):
    def forward(self, x):
        return x.view(x.size()[0], -1)

model = torch.nn.Sequential(
        torch.nn.Conv2d(3, 64, kernel_size=(3, 3)),
        torch.nn.ReLU(),
        torch.nn.Conv2d(64, 64, kernel_size=(3, 3)),
        torch.nn.ReLU(),
        torch.nn.MaxPool2d(kernel_size=(2, 2)),
        torch.nn.Dropout(0.25),
        Flatten(),
        torch.nn.Linear(457856, 128),
        torch.nn.ReLU(),
        torch.nn.Linear(128, 2),
        torch.nn.Softmax(dim=0)
        )
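# Side note (not part of my original script): a quick sanity check of where the
# 457856 input features of the first Linear layer come from, assuming 150x200 images:
# 150x200 -> conv3x3 -> 148x198 -> conv3x3 -> 146x196 -> maxpool 2x2 -> 73x98,
# and 64 channels * 73 * 98 = 457856.
with torch.no_grad():
    check = torch.zeros(1, 3, 150, 200)
    for layer in list(model)[:7]:   # conv / relu / conv / relu / pool / dropout / flatten
        check = layer(check)
    print(check.shape)              # torch.Size([1, 457856])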
loss_fn = torch.nn.CrossEntropyLoss()
learning_rate = 0.01
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for t in range(5000):
    y_pred = model(X_train_torch)
    loss = loss_fn(y_pred, y_train_torch)
    print(t, loss.item())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

When I execute the code I get: “RuntimeError: Expected object of type torch.LongTensor but found type torch.FloatTensor for argument #2 ‘target’”.

Traceback:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-38-c8423f8403d7> in <module>
      3 for t in range(5000):
      4     y_pred = model(X_train_torch)
----> 5     loss = loss_fn(y_pred, y_train_torch)
      6     print(t, loss.item())
      7     optimizer.zero_grad()

c:\program files\python36\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
    475             result = self._slow_forward(*input, **kwargs)
    476         else:
--> 477             result = self.forward(*input, **kwargs)
    478         for hook in self._forward_hooks.values():
    479             hook_result = hook(self, input, result)

c:\program files\python36\lib\site-packages\torch\nn\modules\loss.py in forward(self, input, target)
    860     def forward(self, input, target):
    861         return F.cross_entropy(input, target, weight=self.weight,
--> 862                                ignore_index=self.ignore_index, reduction=self.reduction)
    863 
    864 

c:\program files\python36\lib\site-packages\torch\nn\functional.py in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction)
   1548     if size_average is not None or reduce is not None:
   1549         reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 1550     return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
   1551 
   1552 

c:\program files\python36\lib\site-packages\torch\nn\functional.py in nll_loss(input, target, weight, size_average, ignore_index, reduce, reduction)
   1405                          .format(input.size(0), target.size(0)))
   1406     if dim == 2:
-> 1407         return torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
   1408     elif dim == 4:
   1409         return torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)

RuntimeError: Expected object of type torch.LongTensor but found type torch.FloatTensor for argument #2 'target'

When I change the tensor to Long, the following error appears: “RuntimeError: thnn_conv2d_forward is not implemented for type torch.LongTensor”.

However, if I change the loss function to L1Loss, the model starts to work, but the loss does not decrease, which is not the main problem for me right now.
Maybe I messed up the labels (y_train_torch).
Any help would be greatly appreciated.
Long live PyTorch!

Recent update: I tried the CIFAR architecture; the problem with the LongTensor is the same.
There are two things I suspect:

  1. The Flatten code is incorrect.
  2. The labels are incorrect, because in the CIFAR example the labels are 1D with values 0-9. I tried the same with my net, but PyTorch expected the 36-by-2 shape I had.

You can laugh, but I continue to struggle:
y_pred (the output, in other words) doesn’t sum to 1:

tensor([[0.0290, 0.0283],
        [0.0276, 0.0276],
        [0.0281, 0.0279],
        [0.0285, 0.0275],
        [0.0276, 0.0276],
        [0.0277, 0.0282],
        [0.0274, 0.0274],
        [0.0277, 0.0281],
        [0.0276, 0.0280],
        [0.0278, 0.0276],
        [0.0278, 0.0277],
        [0.0276, 0.0277],
        [0.0275, 0.0279],
        [0.0272, 0.0271],
        [0.0273, 0.0281],
        [0.0280, 0.0271],
        [0.0277, 0.0276],
        [0.0277, 0.0280],
        [0.0289, 0.0281],
        [0.0283, 0.0281],
        [0.0282, 0.0269],
        [0.0277, 0.0274],
        [0.0274, 0.0277],
        [0.0280, 0.0276],
        [0.0286, 0.0277],
        [0.0268, 0.0279],
        [0.0275, 0.0280],
        [0.0277, 0.0285],
        [0.0276, 0.0284],
        [0.0273, 0.0279],
        [0.0273, 0.0284],
        [0.0281, 0.0274],
        [0.0279, 0.0279],
        [0.0276, 0.0276],
        [0.0274, 0.0276],
        [0.0278, 0.0276]], grad_fn=<SoftmaxBackward>)

What is the problem? torch.nn.Softmax(dim=0) seemed right to me, since I want each row to sum to 1.
Why isn’t it giving me one?

Recent update:
The correct setting is dim=1 in Softmax.
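
A minimal check (separate from my model) that shows the difference:

import torch

x = torch.randn(4, 2)                          # 4 samples, 2 classes
print(torch.nn.Softmax(dim=0)(x).sum(dim=1))   # rows do NOT sum to 1 (dim=0 normalizes each column)
print(torch.nn.Softmax(dim=1)(x).sum(dim=1))   # tensor([1., 1., 1., 1.])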

There are a few small mistakes in your code.

In a vanilla classification use case, your target should be a LongTensor containing the class indices in the range [0, num_classes-1].
Since your target seems to be one-hot encoded, you could get the class indices with:

y_train_torch = torch.from_numpy(y_train).argmax(dim=1)

This will also cast your target to torch.long, which is needed in nn.CrossEntropyLoss.

Since you are using nn.CrossEntropyLoss, you should pass raw logits into the loss function instead of the softmax probabilities. Just remove the last softmax layer and you should be good to go!
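
A minimal sketch of how the pieces then fit together (your existing model, with the final Softmax removed):

y_train_torch = torch.from_numpy(y_train).argmax(dim=1)  # class indices, dtype torch.long

loss_fn = torch.nn.CrossEntropyLoss()   # applies log_softmax + nll_loss internally
logits = model(X_train_torch)           # raw logits, shape [36, 2]
loss = loss_fn(logits, y_train_torch)   # target shape [36], values in {0, 1}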


Epoch 38, and my predictions are still far from the labels. I will keep trying for sure, but TensorFlow and Keras give me pretty good accuracy by epoch 20. I deleted the Softmax as recommended. What’s wrong with my model?
The first 20 labels should be 0 and the remaining 16 should be 1.

38 0.6869633197784424
tensor([[ 0.0988, -0.1205],
        [ 0.0988, -0.1205],
        [ 0.0988, -0.1205],
        [ 0.0988, -0.1205],
        [ 0.0988, -0.1205],
        [ 0.0988, -0.1205],
        [ 0.0988, -0.1205],
        [ 0.0988, -0.1205],
        [ 0.0988, -0.1205],
        [ 0.0988, -0.1205],
        [ 0.0988, -0.1205],
        [ 0.0988, -0.1205],
        [ 0.0988, -0.1205],
        [ 0.0988, -0.1205],
        [ 0.0988, -0.1205],
        [ 0.0988, -0.1205],
        [ 0.0988, -0.1205],
        [ 0.0988, -0.1205],
        [ 0.0988, -0.1205],
        [ 0.0988, -0.1205],
        [ 0.0988, -0.1205],
        [ 0.0988, -0.1205],
        [ 0.0988, -0.1205],
        [ 0.0988, -0.1205],
        [ 0.0988, -0.1205],
        [ 0.0988, -0.1205],
        [ 0.0988, -0.1205],
        [ 0.0988, -0.1205],
        [ 0.0988, -0.1205],
        [ 0.0988, -0.1205],
        [ 0.0988, -0.1205],
        [ 0.0988, -0.1205],
        [ 0.0988, -0.1205],
        [ 0.0988, -0.1205],
        [ 0.0988, -0.1205],
        [ 0.0988, -0.1205]], grad_fn=<ThAddmmBackward>)

I corrected several mistakes in my model and made it work with CrossEntropyLoss, but I don’t get good accuracy, and my output is far from the labels.

Labels shape:

torch.Size([36])

Labels content:

tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

My model:

model = torch.nn.Sequential(
        torch.nn.Conv2d(3, 64, kernel_size=(3, 3)),
        torch.nn.ReLU(),
        torch.nn.Conv2d(64, 64, kernel_size=(3, 3)),
        torch.nn.ReLU(),
        torch.nn.MaxPool2d(kernel_size=(2, 2)),
        torch.nn.Dropout(0.25),
        Flatten(),
        torch.nn.Linear(457856, 128),
        torch.nn.ReLU(),
        torch.nn.Linear(128, 2)
        )

Are you using the same architecture for your TF/Keras model?
Could you post it just for the sake of debugging?

It looks like your model gets stuck and just predicts constant logits.
Could you try to lower your learning rate a bit and see if something changes?
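
For example, something like this (the exact value is just an illustration):

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)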

Sure:
Keras

model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu'))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax'))

Metrics Keras:

Train on 36 samples, validate on 4 samples
Epoch 1/40
36/36 [==============================] - 11s 311ms/step - loss: 1.8301 - acc: 0.3889 - val_loss: 12.0227 - val_acc: 0.2500
Epoch 2/40
36/36 [==============================] - 6s 173ms/step - loss: 7.4835 - acc: 0.4722 - val_loss: 12.0227 - val_acc: 0.2500
Epoch 3/40
36/36 [==============================] - 6s 170ms/step - loss: 6.9669 - acc: 0.5278 - val_loss: 12.0227 - val_acc: 0.2500
Epoch 4/40
36/36 [==============================] - 6s 169ms/step - loss: 5.1954 - acc: 0.5278 - val_loss: 4.0076 - val_acc: 0.7500
Epoch 5/40
36/36 [==============================] - 7s 182ms/step - loss: 7.3551 - acc: 0.4722 - val_loss: 3.4672 - val_acc: 0.7500
Epoch 6/40
36/36 [==============================] - 7s 184ms/step - loss: 6.0102 - acc: 0.4722 - val_loss: 1.2699 - val_acc: 0.2500
Epoch 7/40
36/36 [==============================] - 6s 177ms/step - loss: 0.8226 - acc: 0.6944 - val_loss: 0.4165 - val_acc: 0.7500
Epoch 8/40
36/36 [==============================] - 6s 178ms/step - loss: 0.7371 - acc: 0.6389 - val_loss: 0.6003 - val_acc: 0.7500
Epoch 9/40
36/36 [==============================] - 6s 178ms/step - loss: 0.2003 - acc: 0.9444 - val_loss: 0.9687 - val_acc: 0.2500
Epoch 10/40
36/36 [==============================] - 7s 186ms/step - loss: 0.2384 - acc: 0.9167 - val_loss: 0.6743 - val_acc: 0.5000
Epoch 11/40
36/36 [==============================] - 6s 173ms/step - loss: 0.1137 - acc: 0.9722 - val_loss: 0.1711 - val_acc: 1.0000
Epoch 12/40
36/36 [==============================] - 6s 168ms/step - loss: 0.0440 - acc: 1.0000 - val_loss: 0.9161 - val_acc: 0.5000
Epoch 13/40
36/36 [==============================] - 6s 172ms/step - loss: 0.0463 - acc: 1.0000 - val_loss: 0.1495 - val_acc: 1.0000

TensorFlow:

nn = tf.layers.conv2d(X_train_placeholder, 32, kernel_size=(3, 3), activation='relu')
nn = tf.layers.conv2d(nn, 64, kernel_size=(3, 3), activation='relu')
nn = tf.layers.max_pooling2d(nn, pool_size=(2, 2), strides=2)
nn = tf.layers.dropout(nn, 0.25)
nn = tf.layers.flatten(nn)
nn = tf.layers.dense(nn, 128, activation='relu')
nn = tf.layers.dropout(nn, 0.5)
nn = tf.layers.dense(nn, 2)

Metrics TF:

Currently on step 0
Loss:  0.7080002
Training accuracy is:
0.5277778
Validation accuracy is:
0.25


Currently on step 2
Loss:  0.8956334
Training accuracy is:
0.4722222
Validation accuracy is:
0.75


Currently on step 4
Loss:  2.839597
Training accuracy is:
0.4722222
Validation accuracy is:
0.75


Currently on step 6
Loss:  0.9275551
Training accuracy is:
0.5555556
Validation accuracy is:
0.25


Currently on step 8
Loss:  0.81815374
Training accuracy is:
0.5277778
Validation accuracy is:
0.25


Currently on step 10
Loss:  0.97793007
Training accuracy is:
0.5277778
Validation accuracy is:
0.25


Currently on step 12
Loss:  0.80247444
Training accuracy is:
0.5277778
Validation accuracy is:
0.25


Currently on step 14
Loss:  0.600952
Training accuracy is:
0.9444444
Validation accuracy is:
1.0

Thanks for the code!
There are some small differences. While your TF/Keras models use 32 kernels in the first conv layer, you are using 64. Just change the second argument in your first conv layer to 32.
Also, your reference models use an additional dropout layer before the last linear layer.
Could you change this and try to train your model again?
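
For reference, a sketch of the adjusted model (the flattened size stays 457856, since the second conv layer still outputs 64 channels):

model = torch.nn.Sequential(
        torch.nn.Conv2d(3, 32, kernel_size=(3, 3)),   # 32 kernels, as in the Keras model
        torch.nn.ReLU(),
        torch.nn.Conv2d(32, 64, kernel_size=(3, 3)),
        torch.nn.ReLU(),
        torch.nn.MaxPool2d(kernel_size=(2, 2)),
        torch.nn.Dropout(0.25),
        Flatten(),
        torch.nn.Linear(457856, 128),
        torch.nn.ReLU(),
        torch.nn.Dropout(0.5),                        # extra dropout, as in the Keras model
        torch.nn.Linear(128, 2)
        )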

Also, PyTorch initializes the conv and linear weights with kaiming_uniform and the bias with a uniform distribution using fan_in by default.
If you want to copy Keras’ initialization you could use the following code:

import torch.nn as nn

def weight_init(m):
    if isinstance(m, nn.Conv2d) or isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight, gain=nn.init.calculate_gain('relu'))
        nn.init.zeros_(m.bias)

model.apply(weight_init)

Lowering the learning rate to match the TF learning rate helped, but it took 20 epochs in PyTorch and the accuracy is still not the best.
Metrics PyTorch:
Loss:

0 0.6992619037628174
1 1.378086805343628
2 1.0771313905715942
3 0.904154360294342
4 0.5980193614959717
5 0.7647961378097534
6 0.6887813806533813
7 0.5120381712913513
8 0.5252910852432251
9 0.5683923959732056
10 0.48663294315338135
11 0.402211457490921
12 0.3984984755516052
13 0.396999716758728
14 0.35266047716140747
15 0.301003634929657
16 0.2879875898361206
17 0.28108495473861694
18 0.2524130642414093
19 0.2120990753173828

Accuracy on training data and test data:

Accuracy: 100.0%
Accuracy: 25.0%

I am still a PyTorch believer: don’t go for performance, make it second on your priority list.
Ease of use first. Calculating the shapes after convolution, what a headache!
Performance second.
Until now PyTorch is the best because of reinforcement learning.
And step by step I am moving forward. Can’t wait for my first reinforcement learning project: LSTM ahead, and then reinforcement learning.
Very happy. PyTorch made me feel happy!

Good to hear PyTorch makes you happy! :wink:
How big is your validation dataset? Based on its accuracy it looks like you are only using 4 samples.
If that’s the case, you could try training a bit longer and see if the validation accuracy jumps to 100% by chance.
Note that such a small validation dataset doesn’t give you a good signal of the performance of your model, in case I’m right about the size.

The dataset is small, only 40 samples. I took photos of two different types of buildings in my neighborhood: panel buildings (I don’t know if that’s the correct English translation) and modern ones. I know I should have taken more, but that’s what I have. The big plus is that it’s unique content, not another MNIST example.

I was just wondering, since your validation accuracy only seems to take the values [25%, 50%, 75%, 100%], which made it look like you are using just 4 samples.

Sure, a unique dataset makes things more interesting!
Good luck with your buildings data! :wink:


But this initialization is different from kernel_initializer=‘he_uniform’ in Keras, isn’t it?

Yes, he_uniform corresponds to kaiming_uniform in PyTorch. I think I was referring to the default Keras initialization based on their docs, which is glorot_uniform (so xavier_uniform in PyTorch).


What arguments do we need to pass to kaiming_uniform_ to initialize with this particular method?

The docs give the arguments as:

        tensor – an n-dimensional torch.Tensor

        a – the negative slope of the rectifier used after this layer (only used with 'leaky_relu')

        mode – either 'fan_in' (default) or 'fan_out'. Choosing 'fan_in' preserves the magnitude of the variance of the weights in the forward pass. Choosing 'fan_out' preserves the magnitudes in the backwards pass.

        nonlinearity – the non-linear function (nn.functional name), recommended to use only with 'relu' or 'leaky_relu' (default).
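
So for a ReLU network, my guess at mirroring Keras’ he_uniform would be something like this (a sketch; fan_in with the relu gain gives a bound of sqrt(6 / fan_in), the same as he_uniform):

import torch.nn as nn

def he_uniform_init(m):
    # sketch: mimic Keras' he_uniform (uniform with limit sqrt(6 / fan_in)), zero biases
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.kaiming_uniform_(m.weight, mode='fan_in', nonlinearity='relu')
        nn.init.zeros_(m.bias)

model.apply(he_uniform_init)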