How to make the labels in the dataset start from 0?

leejunu · October 15, 2022, 7:14pm

Hello. I am a student learning PyTorch.

I wanted to use the CrossEntropyLoss function, but I got an error because the labels of the dataset I want to use start from 1.
How can I make the labels start from 0?

srishti-git1110 · October 15, 2022, 7:26pm

You could subtract 1 from the column containing the labels beforehand. Or are you looking for something else?

leejunu · October 15, 2022, 7:43pm

My English is not good, so I don’t know how to say it…

The dataset I want to use is this : Flowers102 — Torchvision 0.13 documentation
In that dataset, 102 kinds of flowers are numbered from 1 to 102. I would like to change this to be numbered from 0 to 101.
The error message is: IndexError: Target 102 is out of bounds.

from torch.utils import data
from torchvision import datasets, transforms

# Random augmentations followed by conversion to a tensor
transform = transforms.Compose([
    transforms.RandomRotation(30),
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
trainset = datasets.Flowers102(
    root='./.data/',
    split='train',
    download=True,
    transform=transform,
)
train_loader = data.DataLoader(
    dataset=trainset,
    batch_size=64,
    shuffle=True,
)

# Attempt to shift the labels of each batch down by 1
for batch in train_loader:
    batch[1] = batch[1] - 1

I tried subtracting 1 in this way, but for some reason it doesn’t seem to work.
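
A note on why the loop has no lasting effect: each batch is a new object built by the DataLoader, so reassigning its label entry changes only that temporary batch. The dataset itself still returns the original labels the next time it is read. Below is a minimal sketch of one alternative, assuming the labels really are numbered 1 to 102: wrap the dataset so every label is shifted as it is read. `ShiftedLabels` and the toy `samples` list are hypothetical names used only for illustration.

```python
class ShiftedLabels:
    """Wrap an (input, label) dataset and subtract a fixed offset from each label."""

    def __init__(self, dataset, offset=1):
        self.dataset = dataset
        self.offset = offset

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, index):
        sample, label = self.dataset[index]
        return sample, label - self.offset


# Toy stand-in for a dataset whose labels run from 1 to 3 (hypothetical data).
samples = [("img_a", 1), ("img_b", 2), ("img_c", 3)]
shifted = ShiftedLabels(samples, offset=1)
shifted_labels = [shifted[i][1] for i in range(len(shifted))]
```

With torchvision, the same effect should be available by passing `target_transform=lambda y: y - 1` when constructing `datasets.Flowers102`, or by wrapping `trainset` in `ShiftedLabels` before handing it to the `DataLoader`. Either is only appropriate if the installed version really returns 1-based labels.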

srishti-git1110 · October 16, 2022, 4:06am

As far as I can tell, the labels already start from 0. Could you please post a minimal code example that reproduces your error?

This works for me:

import torch
import torch.nn as nn
import torchvision.transforms as transforms
from torchvision import datasets

transform = transforms.Compose([
    transforms.RandomRotation(30),
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
trainset = datasets.Flowers102(
    root='./.data/',
    split='train',
    download=True,
    transform=transform,
)
train_loader = torch.utils.data.DataLoader(
    dataset=trainset,
    batch_size=64,
    shuffle=True,
    drop_last=True,  # keep every batch at exactly 64 samples to match the random logits
)
loss_fn = nn.CrossEntropyLoss()
for data in train_loader:
    # Random logits for 64 samples and 102 classes; data[1] holds the labels
    loss = loss_fn(torch.randn(64, 102), data[1])
    print(loss)

leejunu · October 16, 2022, 4:35am

This is the minimal code that reproduces my error:

from torchvision import datasets, transforms, utils
from torch.utils import data
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import matplotlib.pyplot as plt
import numpy as np

USE_CUDA = torch.cuda.is_available()
DEVICE = torch.device("cuda" if USE_CUDA else "cpu")

transform = transforms.Compose([
        transforms.RandomRotation(30),
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
    ])
trainset = datasets.Flowers102(
    root = './.data/',
    split = 'train',
    download = True,
    transform = transform
)

epochs = 10
batch_size = 64

train_loader = data.DataLoader(
    dataset = trainset,
    batch_size  = batch_size,
    shuffle=True
)

class Net(nn.Module):
    def __init__(self):
        super(Net,self).__init__()
        self.fc1 = nn.Linear(150528,9408)
        self.fc2 = nn.Linear(9408,1176)
        self.fc3 = nn.Linear(1176,102)
    def forward(self, x):
        x = x.view(-1,150528)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

model = Net().to(DEVICE)
optimizer = optim.SGD(model.parameters(), lr=0.01)

def train(model, train_loader, optimizer):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(DEVICE), target.to(DEVICE)
        optimizer.zero_grad()
        output = model(data)
        loss = F.cross_entropy(output, target)
        loss.backward()
        optimizer.step()  

train(model, train_loader, optimizer)

srishti-git1110 · October 16, 2022, 6:51am

Your code runs fine and doesn’t produce any error on my end.
Could you please try restarting your notebook to see if it still produces the error?

leejunu · October 16, 2022, 7:35am

I restarted my notebook and tried running it again, but I still get the error.

The error message is:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Input In [7], in <cell line: 1>()
----> 1 train(model, train_loader, optimizer)

Input In [6], in train(model, train_loader, optimizer)
      5 optimizer.zero_grad()
      6 output = model(data)
----> 7 loss = F.cross_entropy(output, target)
      8 loss.backward()
      9 optimizer.step()

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\functional.py:2996, in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction, label_smoothing)
   2994 if size_average is not None or reduce is not None:
   2995     reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 2996 return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)

IndexError: Target 102 is out of bounds.
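
The message means that a label equal to 102 reached the loss while the model produces only 102 scores per sample, so the valid class indices are 0 to 101. The following is a minimal pure-Python sketch of that bounds check for a single sample. It is not PyTorch's implementation, and `cross_entropy_single` is a hypothetical helper written only for illustration.

```python
import math


def cross_entropy_single(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    num_classes = len(logits)
    if not 0 <= target < num_classes:
        raise IndexError(f"Target {target} is out of bounds.")
    # log-sum-exp with the maximum subtracted for numerical stability
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[target]


# 102 equal scores, standing in for the output of a 102-class model
logits = [0.0] * 102
loss_ok = cross_entropy_single(logits, 101)  # highest valid index
try:
    cross_entropy_single(logits, 102)        # one past the last class
    out_of_bounds = False
except IndexError:
    out_of_bounds = True
```

Here index 101 is accepted and index 102 is rejected, which matches the situation in the traceback: a 1-based label of 102 has no corresponding output.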

srishti-git1110 · October 16, 2022, 7:54am

I see. This shouldn’t be the case as I even tried printing the labels for each batch in train_loader and there are no 102s, and the labels start from 0.
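
The label check mentioned here can be sketched as a small helper that reports the smallest and largest label in a dataset. This is only an illustration under the assumption that the dataset yields `(input, label)` pairs; `label_range` and the toy data are hypothetical.

```python
def label_range(dataset):
    """Return (min_label, max_label) over an iterable of (input, label) pairs."""
    labels = [int(label) for _, label in dataset]
    return min(labels), max(labels)


# Toy stand-in for a dataset with 1-based labels (hypothetical data).
toy = [("x", 1), ("y", 102), ("z", 57)]
lowest, highest = label_range(toy)
```

Calling `label_range(trainset)` on the real dataset reads every image once, so it may take a while. A result of (1, 102) would confirm 1-based labels, while (0, 101) would rule them out.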

Could you please post the versions of torch and torchvision you are using, like so:

import torch
import torchvision

print(torch.__version__)  # 1.12.1+cu113
print(torchvision.__version__)  # 0.13.1+cu113

leejunu · October 16, 2022, 8:07am

torch version: 1.11.0+cpu
torchvision version: 0.12.0+cpu

srishti-git1110 · October 16, 2022, 8:11am

I am not sure, but I suspect it has something to do with the versions. Please update to the latest versions, restart your notebook, and feel free to post again if the error persists.

For me, your code runs error-free on both CPU and GPU devices.

leejunu · October 17, 2022, 4:25am

After the update, I ran it again and it worked fine. Thanks for your help.