Pandas warning when writing a custom dataset

dmerinodel · March 5, 2024, 12:22pm

Hi all

I’m coding a custom (but simple) dataset for training an MLP with PyTorch. My data are numeric vectors belonging to two classes, so the whole dataset is a pandas.DataFrame whose first column is 0 or 1 (the class label) and whose remaining columns are the numeric components of the vectors. Since each row represents one data point, I think a natural PyTorch dataset class would be:

import pandas as pd
import torch
from torch.utils.data import Dataset

class MatrixDataset(Dataset):
    def __init__(self, data: pd.DataFrame, device):
        self.data = data
        self.device = device

    def __len__(self):
        return self.data.shape[0]

    def __getitem__(self, ind):
        # Column 0 holds the class label, the remaining columns hold the features
        row = self.data.iloc[ind]
        x = torch.tensor(row.iloc[1:], dtype=torch.float32, device=self.device)
        y = torch.tensor(row.iloc[0], dtype=torch.float32, device=self.device)
        return x, y

This Dataset actually works fine, but I get the following warning:

/path_to_my_script/script.py: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
x = torch.tensor(row.iloc[1:], dtype=torch.float32, device=self.device)

After searching for this warning, I still don’t understand why it is triggered. Many people say the warning is self-explanatory (‘use the .iloc method’), but I’m already using it. Furthermore, I tried to reproduce the warning in the Python interpreter in a terminal, and it isn’t triggered there.

Any help would be appreciated. Thanks in advance!

Deneme13 · March 19, 2024, 10:52am

Do you get that warning from all of the iloc calls, or only from the x = line? Is it about iloc[1:]?

dmerinodel · March 19, 2024, 11:49am

It comes only from that line. I ‘patched’ it by passing .values instead of the actual Series.
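
For anyone who lands here with the same warning, below is a minimal sketch of that patch. The likely cause (an assumption, not confirmed in this thread) is that torch.tensor walks the Series as a generic sequence and indexes it with plain integers (ser[0], ser[1], ...), which is exactly the positional lookup that pandas deprecates; a NumPy array has no labels, so nothing is ambiguous. The sketch uses .to_numpy(), which the pandas documentation recommends over .values. The toy DataFrame, its column names, and the "cpu" default for device are made up for illustration.

```python
import numpy as np
import pandas as pd
import torch
from torch.utils.data import Dataset


class MatrixDataset(Dataset):
    def __init__(self, data: pd.DataFrame, device="cpu"):
        self.data = data
        self.device = device

    def __len__(self):
        return self.data.shape[0]

    def __getitem__(self, ind):
        row = self.data.iloc[ind]
        # Hand torch a plain NumPy array instead of the Series, so torch
        # never indexes the Series with integer keys.
        features = row.iloc[1:].to_numpy(dtype=np.float32)
        x = torch.tensor(features, dtype=torch.float32, device=self.device)
        # The label is a single scalar, converted explicitly to a Python float.
        y = torch.tensor(float(row.iloc[0]), dtype=torch.float32, device=self.device)
        return x, y


# Hypothetical toy data: first column is the class, the rest are features.
frame = pd.DataFrame({"label": [0, 1], "f1": [0.5, 1.5], "f2": [2.0, 3.0]})
dataset = MatrixDataset(frame)
x, y = dataset[1]
```

If the DataFrame fits in memory, another option is to convert it to tensors once in __init__ and index those in __getitem__, which also avoids building a Series on every access.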