I have been working on a PyTorch implementation of FlowNet, as it will be useful to me and is good practice with the framework. (Convergence is still WIP.)
However, there have been some issues I had to solve to make it fit my workflow. So I created this topic to discuss possible improvements either in the dataset interface or in my own workflow, which I like but which may be far from perfect.
As discussed here, transform functions currently do not support coherent transformations between input and target, because the random parameters (such as whether to flip, or the crop coordinates) are created within the function call. This is not a problem for classification, but it is for all kinds of geometric problems, from bounding-box localisation to flow/depth estimation.
To address this issue, I created co_transforms that take both input and target as arguments. That way you can create your own transformation that keeps your target coherent with your augmented input. See it here.
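For illustration, here is a sketch of what such a co_transform can look like for flow estimation (the class name and the `p` argument are mine, not taken from the actual code): a random horizontal flip only stays coherent if the flow map is flipped too and its horizontal component is negated.

```python
import random
import numpy as np

class CoRandomHorizontalFlip:
    """Flip input images and the target flow map together.

    The random decision is taken once, so input and target stay coherent.
    """
    def __init__(self, p=0.5):
        self.p = p

    def __call__(self, inputs, target):
        if random.random() < self.p:
            # flip every input image left-right
            inputs = [np.fliplr(img).copy() for img in inputs]
            # flip the flow map too, and negate its horizontal (u) component,
            # since left-right motion is reversed after the flip
            target = np.fliplr(target).copy()
            target[:, :, 0] *= -1
        return inputs, target
```

A Compose-style wrapper then only has to pass both arguments through each co_transform in turn.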
The big problem to my mind is that the order in which we call co_transforms relative to the input and target transforms is ambiguous. Do we call the co_transforms first or last? If called last, all co_transforms must deal with PyTorch tensors, while if called first, they will have to deal with different kinds of data structures.
In my code, you will see here that I chose to call them first, but before that I hard-coded a numpy conversion for images (calling imread from scipy instead of PIL's load) so as to always deal with numpy arrays. This was to me the only way, because target_transforms end with the ToTensor() function (which I slightly modified into ArrayToTensor() to get the correct CxHxW conversion). What's more, dealing with PIL operations and numpy arrays at the same time is very risky, as h and w come in reverse order between PIL and array functions.
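The layout change behind that ArrayToTensor() modification is just a transpose. A minimal sketch (assuming the array is already numpy; in the real transform you would wrap the result with torch.from_numpy, which I leave out here):

```python
import numpy as np

class ArrayToTensor:
    """Convert an HxWxC numpy image to the CxHxW layout torch expects.

    Sketch only: the actual transform would return
    torch.from_numpy(...) on the transposed array.
    """
    def __call__(self, array):
        assert array.ndim == 3, "expected an HxWxC image array"
        # HxWxC -> CxHxW; ascontiguousarray makes the memory layout match
        # the new axis order, which torch.from_numpy requires
        return np.ascontiguousarray(array.transpose(2, 0, 1))
```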
I also decided to try a split dataset. As suggested by some, the simplest way is to manually glob files and make two different datasets from lists of image paths. But I think a split dataset is a common need when you are developing your own dataset, so I tried to get it all in a unified class with a split parameter, applied here to the Flying Chairs dataset, which is just a folder of image pairs with associated flo files. (dataset code) To move from train dataset to test dataset, you just have to call dataset.train() before calling the data loader, the same way you do when putting the network in train or eval mode.
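A toy sketch of such a switchable dataset could look like this (class and attribute names are mine, and the `split` ratio is a simplification of what the actual code does):

```python
import random

class SplitDataset:
    """Hold both splits in one class; switch with train()/eval(),
    mirroring the module train/eval interface."""
    def __init__(self, samples, split=0.9, seed=0):
        samples = list(samples)
        random.Random(seed).shuffle(samples)
        cut = int(len(samples) * split)
        self.train_samples = samples[:cut]
        self.test_samples = samples[cut:]
        self.training = True

    def train(self):
        self.training = True
        return self

    def eval(self):
        self.training = False
        return self

    def _current(self):
        return self.train_samples if self.training else self.test_samples

    def __len__(self):
        # length depends on the current mode, hence the sampler issue below
        return len(self._current())

    def __getitem__(self, index):
        return self._current()[index]
```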
DataLoader and split dataset
The main problem with this dataset is that DataLoader is not particularly suited for dynamically changing datasets, especially regarding samplers. There are currently two samplers, sequential and random, and both get their dataset's length at creation. Does it really save CPU load not to read it from the dataset's
__len__ function each time? I thus created my own sampler that does not assume a fixed dataset length, which allows it to change, here. I also wanted to control my epoch size, in order to run inference on the validation set more often, which is useful for rapidly changing networks. In this code you will see that sample selection is random but without replacement, to make sure we go through the whole dataset before having a chance to select the same sample twice; this matters because the official random sampler is reinitialized when an epoch restarts.
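The without-replacement idea across short epochs can be sketched like this (a simplification of the torch.utils.data sampler interface; the names are mine): the shuffled permutation survives across epochs, so every sample is visited once before any repeats even when epoch_size is smaller than the dataset.

```python
import random

class EpochSampler:
    """Sampler with a fixed epoch size, drawing without replacement.

    The leftover permutation is kept between epochs instead of being
    reinitialized, and the dataset length is re-read on each refill,
    so the dataset may grow or shrink between epochs.
    """
    def __init__(self, dataset, epoch_size):
        self.dataset = dataset
        self.epoch_size = epoch_size
        self._perm = []

    def __iter__(self):
        for _ in range(self.epoch_size):
            if not self._perm:
                # re-read len(dataset) here, so a changed length is picked up
                self._perm = list(range(len(self.dataset)))
                random.shuffle(self._perm)
            yield self._perm.pop()

    def __len__(self):
        return self.epoch_size
```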
Difference between dataset and Data Loader
The main issue I had to face, and for which I could not find a good workaround, is the data augmentation parameter. Basically, data augmentation transformations are applied in the dataset's
__getitem__ function, so when I call
dataset.eval() I shut down the
co_transforms, which is not necessarily what I want. So you should either have 2x3 sets of transformations, one set of three (transform, target_transform and co_transform) for test and one for train, or deal with it somewhere else.
And I think it makes more sense to deal with it in data loaders; that way transformations are independent from dataset sampling. What I usually did in torch7 was to test inference on the train set every now and then, without data augmentation and with the network set to eval, to see how much we overfit. This makes sense when data augmentation is e.g. adding noise or blur to the input: you should be able to easily get train samples without it.
So to my mind, the dataset would be the class where you decide where to take samples from, and the data loader the one deciding whether or not to apply specific data augmentation routines.
dataset = datasets.foo(data, split)
train_loader = torch.utils.data.DataLoader(dataset, transforms=augmentation_transforms)  # hypothetical transforms argument
test_loader = torch.utils.data.DataLoader(dataset, transforms=None)
dataset.train()
enumerate(train_loader)  # batches from train set, with data augmentation
dataset.eval()
enumerate(test_loader)   # batches from test set, without data augmentation
dataset.train()
enumerate(test_loader)   # batches from train set, without data augmentation
The result would be that all graphical operations should be done directly on tensors (as current samplers expect lists of tensors), but that was already the case with the torch image toolbox. What's more, the CUDA solutions discussed here could maybe be done with CudaTensor?
I also found problems regarding the numpy conversion from
CxHxW, commented in the source. Dealing with tensors from beginning to end could improve data loading times.
A last problem is that graphics functions would then live in a module that is not vision (because data loaders come from pytorch/utils). I think vision was separated from the rest because it involved PIL operations that are unnecessary for other problems such as text embedding. But if we work with tensors, these transforms don't have to be graphics-related and could be anything, as long as tensors are given as output.
I hope my suggestions regarding dataset handling were not too naive; feel free to suggest better ways of doing it (which may well be the already existing one!).
If some of these ideas seem good to you, I'd be happy to contribute.
Sum up of ideas:
- split dataset
- dynamic samplers
- random sampling without replacement for epoch size < dataset size
- attach transformations to data loaders
- add tensor-based image loading and transformations to avoid the numpy HxWxC to CxHxW conversion