We have a dataset of texts, where one item is one text.
We also have three transform pipelines: one for X1, one for X2 and one for Y.
Question: what is the “correct” way to implement transforms?
- Inside the `Dataset`, so that each item returns tensors? But then it is not reversible if I want to see the original text.
- Outside the `Dataset` and before the `DataLoader` (my current implementation).
- Or inside the `collate_fn` of the `DataLoader`. Then each item is converted to X1, X2 and Y at batch level. Somehow, `collate_fn` doesn't seem to me like the right place for such operations.
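
To make the `collate_fn` option concrete, here is a minimal sketch of what I mean. The three `transform_*` functions are made-up placeholders rather than my real pipelines, and `TextDataset` and `collate_texts` are just illustrative names.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class TextDataset(Dataset):
    """Holds raw texts only; no transforms are applied here."""

    def __init__(self, texts):
        self.texts = list(texts)

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        return self.texts[idx]


# Hypothetical stand-ins for the three real pipelines.
def transform_x1(text):
    return torch.tensor([float(len(text))])          # length in characters


def transform_x2(text):
    return torch.tensor([float(len(text.split()))])  # number of words


def transform_y(text):
    return torch.tensor(int(text.endswith("!")))     # toy label


def collate_texts(batch):
    """Turn a list of raw strings into batched X1, X2 and Y tensors."""
    x1 = torch.stack([transform_x1(t) for t in batch])
    x2 = torch.stack([transform_x2(t) for t in batch])
    y = torch.stack([transform_y(t) for t in batch])
    # The raw texts are returned as well, so they stay inspectable.
    return x1, x2, y, batch


dataset = TextDataset(["hello world", "great!", "a b c"])
loader = DataLoader(dataset, batch_size=2, collate_fn=collate_texts)
```

With this layout the transforms only run while the loader is being iterated, and the original strings travel alongside the tensors in each batch.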
Ideally, I need transforms to run only when training starts.
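
For comparison, the first option could perhaps be made lazy and reversible like this. It is only a sketch: `LazyTextDataset`, its `raw` method and the toy lambdas are hypothetical names I am using for illustration.

```python
import torch
from torch.utils.data import Dataset


class LazyTextDataset(Dataset):
    """Stores raw texts and applies the transforms only on item access."""

    def __init__(self, texts, x1_fn, x2_fn, y_fn):
        self.texts = list(texts)
        self.x1_fn, self.x2_fn, self.y_fn = x1_fn, x2_fn, y_fn

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        # Nothing is transformed until an item is actually requested,
        # for example by a DataLoader once training starts.
        text = self.texts[idx]
        return self.x1_fn(text), self.x2_fn(text), self.y_fn(text)

    def raw(self, idx):
        """Return the untransformed text for the given index."""
        return self.texts[idx]


# Toy placeholder transforms, not the real pipelines.
lazy_dataset = LazyTextDataset(
    ["hello world", "great!"],
    x1_fn=lambda t: torch.tensor([float(len(t))]),
    x2_fn=lambda t: torch.tensor([float(len(t.split()))]),
    y_fn=lambda t: torch.tensor(int(t.endswith("!"))),
)
x1, x2, y = lazy_dataset[1]
original = lazy_dataset.raw(1)
```

Here the original text is still reachable through the index, even though each item returns tensors.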
I will appreciate your ideas.