API differences of torchvision and torchtext

pietz · December 20, 2019, 8:24am

Getting started with torchtext, I noticed a few API choices that seem to head in a different direction from their torchvision counterparts. I understand that these are very different types of problems based on very different types of data. Still, wouldn’t it be cleaner if the classes had a similar structure? On one hand we have transforms and data loaders; on the other we use fields and iterators.

More specifically, chapter 5 of this article makes a solid argument for wrapping the iterator in another class so we don’t need to change the code of our torchtext training loop every time the field names change. I believe this change would essentially recreate the API of the DataLoader class, making this part of the torchtext and torchvision libraries quite similar. I like this. It feels consistent.
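
To make that concrete, here is a minimal sketch of the kind of wrapper I mean. The class name `BatchWrapper` and the `x_field` / `y_field` parameters are my own placeholders, not part of torchtext, and a plain list of objects stands in for a real torchtext iterator. It assumes each batch exposes its fields as attributes.

```python
from types import SimpleNamespace


class BatchWrapper:
    """Hypothetical adapter: turns attribute-style batches into (x, y) tuples."""

    def __init__(self, iterator, x_field, y_field):
        self.iterator = iterator  # any iterable of batch objects
        self.x_field = x_field    # attribute name holding the inputs
        self.y_field = y_field    # attribute name holding the targets

    def __iter__(self):
        # Look the fields up by name so the training loop never has to.
        for batch in self.iterator:
            yield getattr(batch, self.x_field), getattr(batch, self.y_field)

    def __len__(self):
        return len(self.iterator)


# Stand-in for a torchtext iterator (assumption: batches have named attributes).
fake_iterator = [
    SimpleNamespace(text=[1, 2, 3], label=0),
    SimpleNamespace(text=[4, 5], label=1),
]

loader = BatchWrapper(fake_iterator, x_field="text", y_field="label")

# The loop now has the same shape as one written against a DataLoader.
pairs = []
for x, y in loader:
    pairs.append((x, y))
```

With something like this, the training loop only ever sees `for x, y in loader`, which is the same shape as a loop over a DataLoader, and the field names live in one place.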

Are there any particular reasons why the two APIs seem to drift apart, or did it just happen over time?
Would you agree that a unified API between torchvision and torchtext has its benefits?
Is this something someone is working on?
Am I overlooking something?

Thanks,
pietz