Hi, I am running into a slightly odd problem when using a DataLoader (wrapped in a PyTorch Lightning DataModule). I'm trying to train a VGG network on the TinyImageNet dataset. I have reorganized the validation set to have the same file structure as the training set. If I load my dataset like this:
def setup(self, stage=None):
    if stage == "fit" or stage is None:
        t = transforms.Compose(self.augment + self.normalize)
        self.df_train = datasets.ImageFolder(os.path.join(self.data_dir, 'train'), transform=t)
        t = transforms.Compose(self.normalize)
        self.df_val = datasets.ImageFolder(os.path.join(self.data_dir, 'val'), transform=t)
the training does not converge (i.e., the loss jumps to a high value during the first epoch and stays there, and validation accuracy remains at chance level the entire time). However, if I change it to:
def setup(self, stage=None):
    if stage == "fit" or stage is None:
        t = transforms.Compose(self.augment + self.normalize)
        ds_full = datasets.ImageFolder(os.path.join(self.data_dir, 'train'), transform=t)
        # Pass the training dataset through random_split (this should be a no-op, no?)
        self.df_train, _ = td.random_split(ds_full, [100000, 0])
        t = transforms.Compose(self.normalize)
        self.df_val = datasets.ImageFolder(os.path.join(self.data_dir, 'val'), transform=t)
(with no other changes to the code, hyperparameters, etc.), the training and validation losses go down and the accuracies go up as I would expect. Versions I am using:
>>> import torch
>>> torch.__version__
'1.9.1+cu102'
>>> import pytorch_lightning as pl
>>> pl.__version__
'1.4.9'
Does anyone have any idea what is going wrong here? If this is a genuine issue, I’m happy to file a bug report. I just want to rule out the possibility that I am doing something wrong.
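For context on the "no-op" comment above: as far as I can tell, torch.utils.data.random_split does not return the dataset unchanged; it wraps it in Subset objects built over a shuffled permutation of all indices, so a [100000, 0] split keeps every sample but re-indexes them. Here is a rough pure-Python sketch of that behavior (the Subset and random_split_sketch names below just mimic the torch ones; this is not the actual torch implementation):

```python
import random

class Subset:
    """Minimal stand-in for torch.utils.data.Subset: a view into a base dataset
    through an explicit index list."""
    def __init__(self, dataset, indices):
        self.dataset = dataset
        self.indices = indices

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, i):
        return self.dataset[self.indices[i]]

def random_split_sketch(dataset, lengths, seed=0):
    # Like torch's random_split: shuffle ALL indices once, then slice the
    # permutation into consecutive chunks of the requested lengths.
    perm = list(range(sum(lengths)))
    random.Random(seed).shuffle(perm)
    out, offset = [], 0
    for n in lengths:
        out.append(Subset(dataset, perm[offset:offset + n]))
        offset += n
    return out

data = list(range(10))
train, empty = random_split_sketch(data, [10, 0])
print(len(train), len(empty))  # -> 10 0
# Every sample is still present, but iteration order follows the permutation:
print([train[i] for i in range(len(train))])
```

So a [100000, 0] split is equivalent in content but not in index order, which is the only difference between the two setups I can see.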