Hello everyone.
I’m working on a lip reading task using PyTorch.
I want to use my own lip database as the training set. The directory layout is:
cut/ speaker1 / utterance 1
cut/ speaker1 / utterance 2
…
cut/ speaker30 / utterance 49
cut/ speaker30 / utterance 50
Each utterance directory (1 to 50) contains the lip images 1.jpg to 40.jpg.
In this case, is torchvision.datasets.ImageFolder the right choice?
Yes, if each utterance corresponds to a different class and you would like to classify each image separately.
However, if you would like to classify a “sequence” of these images into one of the classes, I would write a custom Dataset and use, e.g., a sliding window approach.
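To make the sliding window idea a bit more concrete, here is a rough sketch, not a definitive implementation. It assumes the layout from the question (`root/speaker/utterance/N.jpg`) and, as a further assumption, that the k-th utterance directory of every speaker belongs to class k. The names `natural_key`, `build_windows`, `LipSequenceDataset` and `load_with_pil` are made up for this example. The class is a plain map-style dataset (`__len__` and `__getitem__`), which is the protocol `torch.utils.data.DataLoader` expects.

```python
import os
import re
from typing import Callable, List, Optional, Tuple


def natural_key(name: str) -> list:
    # re.split with a capture group alternates text and digit runs;
    # odd positions are the digit runs, so "2.jpg" sorts before "10.jpg".
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


def list_sorted(path: str) -> List[str]:
    return sorted(os.listdir(path), key=natural_key)


def build_windows(root: str, window: int, stride: int = 1) -> List[Tuple[List[str], int]]:
    """Index every run of `window` consecutive frames as one sample."""
    samples = []
    for speaker in list_sorted(root):
        speaker_dir = os.path.join(root, speaker)
        if not os.path.isdir(speaker_dir):
            continue
        utterances = [u for u in list_sorted(speaker_dir)
                      if os.path.isdir(os.path.join(speaker_dir, u))]
        # Assumption: the k-th utterance directory of every speaker is class k.
        for label, utt in enumerate(utterances):
            utt_dir = os.path.join(speaker_dir, utt)
            frames = [os.path.join(utt_dir, f) for f in list_sorted(utt_dir)
                      if f.lower().endswith(".jpg")]
            # Utterances with fewer than `window` frames yield no samples.
            for start in range(0, len(frames) - window + 1, stride):
                samples.append((frames[start:start + window], label))
    return samples


def load_with_pil(path: str):
    # Imported lazily so the indexing code above works without Pillow.
    from PIL import Image
    with Image.open(path) as img:
        return img.convert("RGB")


class LipSequenceDataset:
    """Map-style dataset returning (list of `window` frames, class label)."""

    def __init__(self, root: str, window: int = 5, stride: int = 1,
                 load_frame: Optional[Callable] = None):
        self.samples = build_windows(root, window, stride)
        self.load_frame = load_frame or load_with_pil

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, idx: int):
        paths, label = self.samples[idx]
        return [self.load_frame(p) for p in paths], label
```

The default loader returns PIL images, which the default `DataLoader` collate function cannot batch. In practice you would pass a `load_frame` that returns tensors and stack the frames of each window with `torch.stack`, giving one `[window, C, H, W]` tensor per sample.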
You can find a small example for a comparable use case here. In your case, you shouldn’t use the sampler, since you are not dealing with different persons.
Let me know if you can adapt this code for your use case.