Hello, I’m currently writing a Dataset class for some images.
One question arose when I tried to create my own transform: why is the transform implemented in the Dataset?
Wouldn’t it be more beneficial to put it inside the network, so we could have different transforms for different networks? (Some networks need normalization and some don’t, or they need different kinds of normalization.)
Just curious about this design choice.
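For context, this is the pattern I’m referring to — a minimal sketch of a custom Dataset (the class name and arguments here are just illustrative) where the transform is stored in the Dataset and applied per sample:

```python
import torch
from torch.utils.data import Dataset

class ImageDataset(Dataset):
    # Typical design under discussion: the transform lives inside the Dataset
    # and is applied to every sample in __getitem__.
    def __init__(self, images, labels, transform=None):
        self.images = images
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        img = self.images[idx]
        if self.transform is not None:
            img = self.transform(img)
        return img, self.labels[idx]
```

So the transform is fixed per Dataset instance, not per network consuming it.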
I don’t really understand the question. Transforms are usually applied to the original data, often to reduce its dimensionality.
Working with full-HD (1080p) images is not the same as working with 120p images: the amount of data, the processing time, and so on are totally different.
Transforms are also used to normalize data or to perform data augmentation (by flipping images, for example).
But the most important part is that these transformations are non-differentiable operations, so they can never be part of a NN: you wouldn’t be able to backpropagate the error through them.
In fact, activation functions are “kind of” non-linear transformations, but they are differentiable.
Sorry for not giving enough context.
This is a problem when using multiple pretrained models, where the data transformation depends on the datasets the original networks were trained on, not on the current dataset you are using.
Right now, I am trying to put the same data through two networks trained on different datasets (hence with different normalization), and this becomes a problem that would be solved if the transformation were not inside the dataset.
But I can see how non-differentiability could be a huge issue here, and my use case probably isn’t the most common one.
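For this specific use case, one workaround sketch (the `NetWithNorm` wrapper below is hypothetical, not a built-in) is to attach each network’s normalization statistics to the network itself as buffers, so the same un-normalized data can be fed to both models:

```python
import torch
import torch.nn as nn

class NetWithNorm(nn.Module):
    # Hypothetical wrapper: each network carries the normalization stats
    # of the dataset it was pretrained on, applied inside forward().
    def __init__(self, backbone, mean, std):
        super().__init__()
        self.backbone = backbone
        # buffers move with the module (.to(device), state_dict) but aren't trained
        self.register_buffer("mean", torch.tensor(mean).view(1, -1, 1, 1))
        self.register_buffer("std", torch.tensor(std).view(1, -1, 1, 1))

    def forward(self, x):
        return self.backbone((x - self.mean) / self.std)

# usage: two models with different per-channel normalization, same raw input
net_a = NetWithNorm(nn.Identity(), mean=[0.5], std=[0.5])
net_b = NetWithNorm(nn.Identity(), mean=[0.0], std=[1.0])
```

The normalization itself is differentiable, so keeping it in `forward()` is fine; it’s the augmentations (random crops, flips, etc.) that don’t fit there.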
Well, even if that’s the case, those operations usually take a lot of time. I was using a dataset in which I picked random samples to create the ground truth in an unsupervised way. The forward-backward pass took 3 s, and the preprocessing step of picking 2 samples and building the ground truth took another 3 s. In the end, doing that preprocessing online was very time-consuming, and I had to precompute it beforehand.
What I mean is that all those operations are very time-consuming and slow down the training process. That’s the main reason why they aren’t implemented inside the network.
In addition, you are thinking of very simple cases, like feeding a net images that just need a few operations such as cropping or rearranging, but there are much more complex nets for which you can’t generalize those transformations, or for which they are computationally very expensive.
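The precompute-beforehand workaround I mentioned can be sketched roughly like this (the `CachedDataset` class is just an illustrative name, not a library API):

```python
from torch.utils.data import Dataset

class CachedDataset(Dataset):
    # Hypothetical sketch: run an expensive preprocess once per sample,
    # before training, and serve the cached results during training.
    def __init__(self, base, preprocess):
        self.samples = [preprocess(s) for s in base]  # paid once, offline

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.samples[idx]  # no per-epoch preprocessing cost
```

The trade-off is memory (or disk, if you serialize the cache) against the per-iteration preprocessing time, and you lose the randomness of online augmentation.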
I agree with @Chong_Toby. I believe it would be a much better design decision to put transforms in the torch.utils.data.DataLoader class. Here’s my experience with this.
It is very common to apply different data augmentation to the training and test sets (see ResNet, VGG, etc.). Now, I have a set of images with labels and create my own Dataset class. For the sake of modularity and flexibility, I would much rather use the torch.utils.data.random_split() function to split my dataset into training and validation sets than split it manually.
However, this means that I cannot apply different transformations to the split datasets, because the transforms are applied to the original combined dataset. If they were applied as part of the DataLoader instead, this issue would be fixed.
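A workaround I’ve seen for this situation is to split the raw (untransformed) dataset first and then wrap each resulting Subset with its own transform. A sketch, using a hypothetical `TransformedDataset` wrapper:

```python
import torch
from torch.utils.data import Dataset, TensorDataset, random_split

class TransformedDataset(Dataset):
    # Hypothetical wrapper: applies a transform on top of an already-split
    # Subset, so train and validation can get different augmentations.
    def __init__(self, base, transform=None):
        self.base = base
        self.transform = transform

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        x, y = self.base[idx]
        if self.transform is not None:
            x = self.transform(x)
        return x, y

# usage: split the raw dataset, then attach per-split transforms
base = TensorDataset(torch.arange(10.0).view(10, 1), torch.arange(10))
train_raw, val_raw = random_split(base, [8, 2])
train_ds = TransformedDataset(train_raw, transform=lambda x: x * 2)  # "augmented"
val_ds = TransformedDataset(val_raw)  # no augmentation
```

This keeps random_split() usable while still giving each split its own pipeline, without touching the DataLoader.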
Bumping this, as I also wonder why transforms are not applied at the DataLoader level.
The way I see it, the Dataset should reflect the original data, whereas the DataLoader reflects, well, the loaded data just before training.
How is the same original data intended to be used with different augmentations?
I think the experimental TorchData pipelines might be interesting for you, as they might provide more flexibility for your use cases. Tutorials can be found here, and any feedback on this API is more than welcome.