Why does the minimal PyTorch tutorial not make the MNIST images one-hot for logistic regression?

I was looking at a logistic regression tutorial and I noticed that the MNIST images were not one-hot. Does anyone know why?

import numpy as np

import torch
from torch.autograd import Variable
from torch import optim

from data_util import load_mnist

def build_model(input_dim, output_dim):
    # We don't need the softmax layer here since CrossEntropyLoss already
    # uses it internally.
    model = torch.nn.Sequential()
    model.add_module("linear",
                     torch.nn.Linear(input_dim, output_dim, bias=False))
    return model

def train(model, loss, optimizer, x_val, y_val):
    x = Variable(x_val, requires_grad=False)
    y = Variable(y_val, requires_grad=False)

    # Reset gradient
    optimizer.zero_grad()

    # Forward
    fx = model.forward(x)
    output = loss.forward(fx, y)

    # Backward
    output.backward()

    # Update parameters
    optimizer.step()

    # Return the scalar loss as a Python float
    return output.item()

def predict(model, x_val):
    x = Variable(x_val, requires_grad=False)
    output = model.forward(x)
    return output.data.numpy().argmax(axis=1)

def main():
    trX, teX, trY, teY = load_mnist(onehot=False)
    trX = torch.from_numpy(trX).float()
    teX = torch.from_numpy(teX).float()
    trY = torch.from_numpy(trY).long()

    n_examples, n_features = trX.size()
    n_classes = 10
    model = build_model(n_features, n_classes)
    loss = torch.nn.CrossEntropyLoss(size_average=True)
    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    batch_size = 100

    for i in range(100):
        cost = 0.
        num_batches = n_examples // batch_size
        for k in range(num_batches):
            start, end = k * batch_size, (k + 1) * batch_size
            cost += train(model, loss, optimizer,
                          trX[start:end], trY[start:end])
        predY = predict(model, teX)
        print("Epoch %d, cost = %f, acc = %.2f%%"
              % (i + 1, cost / num_batches, 100. * np.mean(predY == teY)))

if __name__ == "__main__":
    main()

Do you mean that the labels are not one-hot? Storing class indices is faster and more memory-efficient when computing losses like NLL and cross entropy.
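To give a rough sense of the storage difference, here is a minimal sketch. The labels are random stand-ins rather than the real MNIST labels, and `num_bytes` is a hypothetical helper introduced only for this illustration.

```python
import torch
import torch.nn.functional as F

# MNIST training-set size and number of digit classes.
n_examples, n_classes = 60000, 10

# Random stand-in labels stored as class indices: one int64 per example.
index_labels = torch.randint(0, n_classes, (n_examples,))

# The same labels expanded to one-hot rows of float32.
one_hot_labels = F.one_hot(index_labels, num_classes=n_classes).float()

def num_bytes(t):
    """Hypothetical helper: raw storage size of a tensor in bytes."""
    return t.element_size() * t.nelement()

print("indices:", num_bytes(index_labels), "bytes")
print("one-hot:", num_bytes(one_hot_labels), "bytes")
```

The one-hot version stores `n_classes` values per example instead of one, so the gap grows with the number of classes.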

But everything should work fine? Given the indices, will CrossEntropyLoss figure out what to do? I just want to make sure there are no bugs.

According to the docs CrossEntropyLoss takes predictions of shape (N, classes) and targets of shape (N).

So yes, CrossEntropyLoss is built to work with indices.

As for bugs, if the code runs without throwing any errors and the predictions get to be reasonably good, then your code is probably mostly bug free.
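If you want a quick sanity check of those shapes, something like the sketch below should do. The batch size, logits and target values here are made up for illustration and are not taken from the tutorial.

```python
import torch

torch.manual_seed(0)

n_examples, n_classes = 4, 10

# Input: raw, unnormalized scores (logits) from the model, shape (N, C).
logits = torch.randn(n_examples, n_classes)

# Target: one class index per example, shape (N,), dtype int64 (long).
targets = torch.tensor([3, 0, 9, 1])

# With the default settings the loss is averaged over the batch,
# so the result is a single scalar.
loss_fn = torch.nn.CrossEntropyLoss()
loss = loss_fn(logits, targets)

print(logits.shape, targets.shape, loss.shape)
```

This is also why the tutorial calls `.long()` on `trY`: the targets are integer class indices, not floats.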

Interesting. What really confuses me is the difference between Input and Target. Is Input supposed to be the output of our model, i.e. the predictions? What confused me is that one has shape (N, C) while the other has shape (N). Do you know what's going on?

Yes, the input of the CrossEntropyLoss is the output of your model.

CrossEntropyLoss gives exactly the same result as it would if it expected a one-hot (N, C) target. It's just more efficient to store and carry around the index of the class, and let the loss function use that index to pick out the right class internally, which is equivalent to working with a one-hot target.
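To see the equivalence for yourself, here is a small sketch that compares the index-based loss against a hand-written one-hot version. The logits and targets are arbitrary, and the manual computation is only an illustration of the math, not how PyTorch implements it internally.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

logits = torch.randn(5, 10)               # (N, C) model outputs
targets = torch.tensor([1, 4, 4, 0, 7])   # (N,) class indices

# Loss computed directly from class indices.
loss_from_indices = F.cross_entropy(logits, targets)

# The same loss written out with an explicit one-hot target:
# mean over the batch of -sum_c onehot[c] * log_softmax(logits)[c].
one_hot = F.one_hot(targets, num_classes=10).float()
log_probs = F.log_softmax(logits, dim=1)
loss_from_one_hot = -(one_hot * log_probs).sum(dim=1).mean()

print(torch.allclose(loss_from_indices, loss_from_one_hot))
```

Each one-hot row has a single 1, so the sum keeps only the log-probability of the true class, which is exactly what the index selects.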
