DataLoader and CrossEntropyLoss

When I use a dataset directly (not a DataLoader) as follows:

import torch
from torch import nn
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

# Set the hyperparameters for data creation
NUM_CLASSES = 2
NUM_FEATURES = 2
RANDOM_SEED = 42


X_blob, y_blob = make_blobs(n_samples=1000,
    n_features=NUM_FEATURES, # X features
    centers=NUM_CLASSES, # y labels 
    cluster_std=1.5, # give the clusters a little shake up (try changing this to 1.0, the default)
    random_state=RANDOM_SEED
)

# 2. Turn data into tensors
X_blob = torch.from_numpy(X_blob).type(torch.float)
y_blob = torch.from_numpy(y_blob).type(torch.LongTensor)
print(X_blob[:5], y_blob[:5])

# 3. Split into train and test sets
X_blob_train, X_blob_test, y_blob_train, y_blob_test = train_test_split(X_blob,
    y_blob,
    test_size=0.2,
    random_state=RANDOM_SEED
)

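model_4 isn’t shown here, but as a sketch assume it is just a small multi-class classifier along these lines:

# Sketch of model_4 (the original definition isn't shown) -- a small
# two-layer classifier for the blob data
model_4 = nn.Sequential(
    nn.Linear(in_features=NUM_FEATURES, out_features=8),
    nn.ReLU(),
    nn.Linear(in_features=8, out_features=NUM_CLASSES)
)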
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(params = model_4.parameters())

When I train it, everything works, but I had to make sure the labels were of LongTensor type.
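
The training step itself is the usual pattern (a sketch reusing loss_fn, optimizer and the blob tensors from above):

model_4.train()
y_logits = model_4(X_blob_train)        # raw logits, shape [800, NUM_CLASSES], dtype float32
loss = loss_fn(y_logits, y_blob_train)  # works because y_blob_train is a LongTensor (int64)
optimizer.zero_grad()
loss.backward()
optimizer.step()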

However, when I use a DataLoader for the FashionMNIST dataset, I no longer had to convert the label type to LongTensor:

from torchvision import datasets
from torchvision.transforms import ToTensor

# Setup training data
train_data = datasets.FashionMNIST(
    root="data", # where to download data to?
    train=True, # get training data
    download=True, # download data if it doesn't exist on disk
    transform=ToTensor(), # images come as PIL format, we want to turn into Torch tensors
    target_transform=None # you can transform labels as well
)

# Setup testing data
test_data = datasets.FashionMNIST(
    root="data",
    train=False, # get test data
    download=True,
    transform=ToTensor()
)

class_names = train_data.classes
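
The DataLoaders are set up in the usual way (a sketch; the batch size here is an arbitrary choice), and the labels already come out of a batch as int64:

from torch.utils.data import DataLoader

BATCH_SIZE = 32  # arbitrary choice for this sketch

train_dataloader = DataLoader(train_data, batch_size=BATCH_SIZE, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=BATCH_SIZE, shuffle=False)

X_batch, y_batch = next(iter(train_dataloader))
print(y_batch.dtype)  # torch.int64 -- already a LongTensor, no conversion needed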

from torch import nn
class FashionMNISTModelV0(nn.Module):
    def __init__(self, input_shape: int, hidden_units: int, output_shape: int):
        super().__init__()
        self.layer_stack = nn.Sequential(
            nn.Flatten(), # neural networks like their inputs in vector form
            nn.Linear(in_features=input_shape, out_features=hidden_units), # in_features = number of features in a data sample (784 pixels)
            nn.Linear(in_features=hidden_units, out_features=output_shape)
        )
    
    def forward(self, x):
        return self.layer_stack(x)

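Instantiating the model and computing the loss on one batch (a sketch using the train_dataloader from above) works without converting the labels:

model_0 = FashionMNISTModelV0(input_shape=784,  # 28*28 pixels per image
                              hidden_units=10,
                              output_shape=len(class_names))

loss_fn = nn.CrossEntropyLoss()
X_batch, y_batch = next(iter(train_dataloader))
loss = loss_fn(model_0(X_batch), y_batch)  # y_batch is int64, so no conversion is needed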

I don’t know what the question is, but in case you are wondering why you didn’t need to explicitly transform the labels: FashionMNIST already returns the right dtypes for multi-class classification.

My question is that the labels for FashionMNIST are integers, yet with the non-DataLoader approach I got an error until I made sure the labels were a LongTensor.

I’m confused about which type I need to use going forward with CrossEntropyLoss, and in which situations.

If I leave the labels as just a float, then I get this error:

RuntimeError: "nll_loss_forward_reduce_cuda_kernel_2d_index" not implemented for 'Float'

But if I convert the labels to LongTensor it works.

I don’t see how it works with FashionMNIST, because even though the labels are not floats, they are an integer type and not a LongTensor.

If I convert the labels to int, then I get:

RuntimeError: "nll_loss_forward_reduce_cuda_kernel_2d_index" not implemented for 'Int'

Which is why I am confused about why the FashionMNIST labels are int and it still works.
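
To reproduce the three cases side by side with random logits (a sketch; the last two lines raise the RuntimeErrors above):

logits = torch.randn(8, 3)             # [batch, num_classes]
targets = torch.randint(0, 3, (8,))    # int64 by default

loss_fn = nn.CrossEntropyLoss()
loss_fn(logits, targets)               # int64 / LongTensor: works
loss_fn(logits, targets.int())         # int32 / IntTensor: RuntimeError
loss_fn(logits, targets.float())       # float32 with class-index shape: RuntimeError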

PyTorch will use int64 as the default dtype when creating a tensor from a Python int:

a, b = train_data[0]
torch.tensor(b).dtype
# torch.int64

The target is expected to be a LongTensor containing class indices in the default use case, and it can be a floating point tensor in case you are working with soft targets. Check the docs, which explain it in more detail.
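
As a sketch of both cases (soft targets must have the same shape as the input and need a reasonably recent PyTorch version):

logits = torch.randn(4, 3)
loss_fn = nn.CrossEntropyLoss()

# Default case: class indices, dtype int64 (LongTensor), shape [batch]
hard_targets = torch.tensor([0, 2, 1, 0])
print(loss_fn(logits, hard_targets))

# Soft targets: class probabilities, floating point, shape [batch, num_classes]
soft_targets = torch.tensor([[0.9, 0.1, 0.0],
                             [0.0, 0.2, 0.8],
                             [0.1, 0.8, 0.1],
                             [1.0, 0.0, 0.0]])
print(loss_fn(logits, soft_targets))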

It’s expected to be a LongTensor which makes sense.

Let me see if I understand your explanation. PyTorch will take a Python int as int64.
When I test

y_blob = torch.from_numpy(y_blob).type(torch.int64)

and I try

y_blob_train.type()

it returns LongTensor rather than int.

So you’re saying that because PyTorch turns a Python int into int64, PyTorch internally will “equate” that to a LongTensor? I am assuming this happens in the DataLoader.

int64 represents a LongTensor, yes:

x = torch.tensor(1)
print(x.dtype)
# torch.int64
print(x.type())
# torch.LongTensor

The tensor creation is done in the default collate_fn of the DataLoader.

collate_fn takes the list of targets and creates a tensor from it, and since each label is a Python int, PyTorch will use int64, which is the same as a LongTensor.
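
A quick way to see this (assuming a recent PyTorch, 1.11+, where default_collate is importable from torch.utils.data):

from torch.utils.data import default_collate

batch = [(torch.randn(1, 28, 28), 5),   # (image, int label), like one FashionMNIST sample
         (torch.randn(1, 28, 28), 9)]
images, labels = default_collate(batch)
print(labels.dtype)   # torch.int64
print(labels.type())  # torch.LongTensor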