Dataset transform strange behavior


(Philippe) #1

Hello,

I hope someone here can help me with this; I am a bit clueless about what is causing my problem.

My setup is the following:
I load all my validation and training data into the same dataset, since I am validating with n random splits, and loading the data into separate datasets would require me to reload the data with every new validation split, which is absolutely not feasible for my application. Keeping everything in one dataset lets me hold the 15 GB of samples in RAM for the whole training, so I only have to load the data once at the beginning of training.

Of course, I want to maintain different transforms for training and evaluation. I do this by composing two transform pipelines; which one is called in __getitem__ depends on the mode of the dataset (I added dataset.train() and dataset.eval() as methods to my dataset class). Now, even though I use np.copy() to make a copy of each sample before it goes from RAM into the transforms, I encounter the problem that images are altered even in evaluation mode. Here is an example:

dataset.eval()
original_img = dataset[idx]   # evaluation transform only

dataset.train()
augmented_img = dataset[idx]  # training augmentation applied

This gives the expected output: the first image is unaltered, the second image is augmented. But if I now do

dataset.eval()
eval_img = dataset[idx]       # expected to match the first image again

again, I get the same image as after the training transform. If I repeat this process a few times, it becomes apparent that the train transforms are applied to the image again and again, at some point altering it beyond recognition.
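For reference, here is a minimal sketch of the kind of setup described above, showing the behavior one would expect from it. The class name InMemoryDataset and the stand-in transforms flip_augment and identity are hypothetical and not the actual code from this post; in PyTorch the class would typically subclass torch.utils.data.Dataset, which is left out here so the sketch only needs NumPy.

```python
import numpy as np


class InMemoryDataset:
    """Map-style dataset holding all samples in RAM, with a train/eval switch."""

    def __init__(self, samples, train_transform, eval_transform):
        self.samples = samples  # loaded once, meant to stay untouched
        self.train_transform = train_transform
        self.eval_transform = eval_transform
        self.training = True

    def train(self):
        self.training = True

    def eval(self):
        self.training = False

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        # Copy first so that no transform can write into the stored sample.
        img = np.copy(self.samples[idx])
        transform = self.train_transform if self.training else self.eval_transform
        return transform(img)


def flip_augment(img):
    # Stand-in training augmentation (assumption): horizontal flip.
    return img[:, ::-1]


def identity(img):
    # Stand-in evaluation transform (assumption): leave the image unchanged.
    return img


samples = np.arange(2 * 4 * 4, dtype=np.float32).reshape(2, 4, 4)
dataset = InMemoryDataset(samples, flip_augment, identity)

dataset.eval()
first_eval = dataset[0]
dataset.train()
augmented = dataset[0]
dataset.eval()
second_eval = dataset[0]

# With a proper copy in __getitem__, both eval images should be identical.
print(np.array_equal(first_eval, second_eval))
```

If the two evaluation images differ in a setup like this, something other than the mode switch is modifying the stored samples.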

So apparently, for some reason, __getitem__ does not seem to load the image from the original samples that I stored in RAM.
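One way to test this suspicion is to snapshot the stored data and compare it after a transform has run. The sketch below is only an illustration of one possible mechanism, not a confirmed diagnosis of this case: add_offset_inplace is a hypothetical augmentation that writes into its input, and the array shapes are made up.

```python
import numpy as np


def add_offset_inplace(img):
    # Hypothetical augmentation that modifies its input in place.
    img += 1.0
    return img


stored = np.zeros((2, 3, 3), dtype=np.float32)
snapshot = stored.copy()

# Integer indexing of a NumPy array returns a view, so an in-place
# transform applied to it writes straight into the stored data.
add_offset_inplace(stored[0])
mutated_without_copy = not np.array_equal(stored, snapshot)

# Restore the data, then repeat with an explicit copy.
stored[...] = snapshot
add_offset_inplace(np.copy(stored[0]))
mutated_with_copy = not np.array_equal(stored, snapshot)

print(mutated_without_copy, mutated_with_copy)
```

Note that np.copy makes a shallow copy: for arrays of dtype object it does not copy the contained elements, so a check like this is still worth running even when np.copy is already in place.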

Does anyone have an idea how this problem may come about? I know that my setup is unusual, but it is highly advantageous given the very long loading time of my samples: when evaluating with e.g. 50 random splits, I only have to load the dataset once, not 50 times.
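The random splits themselves can be expressed as index arrays over the single in-memory dataset, so nothing has to be reloaded between splits. This is a rough sketch under assumptions: the function name random_split_indices, the validation fraction, and the sample count are illustrative only. In PyTorch, index arrays like these could be handed to torch.utils.data.Subset or SubsetRandomSampler.

```python
import numpy as np


def random_split_indices(n_samples, val_fraction, seed):
    """Return (train_idx, val_idx) for one random split of range(n_samples)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    n_val = int(round(n_samples * val_fraction))
    return perm[n_val:], perm[:n_val]


n_samples = 10  # illustrative size, not the real dataset
# One (train_idx, val_idx) pair per split; the samples stay loaded once.
splits = [random_split_indices(n_samples, 0.2, seed) for seed in range(3)]

for train_idx, val_idx in splits:
    print(sorted(val_idx.tolist()))
```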

Any help would be appreciated.


(Philippe) #2

Solved (more or less). I will leave this here in case anyone has a similar problem.

The behaviour was caused by Jupyter Notebook. I cannot really determine what exactly the problem was: I cleared all variables carefully, but in proper IDEs this behaviour does not occur, so I guess it is some Jupyter Notebook thing where something is not cleared out properly, perhaps to increase efficiency when it is reused.