API differences between torchvision and torchtext

While getting started with torchtext, I noticed a few API choices that seem to head in a different direction from their torchvision counterparts. I understand that these libraries address very different types of problems based on very different types of data. Still, wouldn’t it be cleaner if the classes had a similar structure? On the one hand we have transforms and data loaders; on the other we use fields and iterators.

More specifically, chapter 5 of this article makes a solid argument for wrapping the iterator in another class, so that we don’t need to change the code of our torchtext training loop every time the field names change. I believe this change would essentially recreate the API of the DataLoader class, making this part of the torchtext and torchvision libraries quite similar. I like this. It feels consistent.
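
To make the idea concrete, here is a minimal sketch of the kind of wrapper I have in mind. It is not the article's code and not part of either library: the class name `BatchWrapper` and the parameters `x_field` and `y_field` are hypothetical, and the torchtext iterator is replaced by a plain list of attribute-style batch objects so the snippet runs on its own.

```python
from types import SimpleNamespace


class BatchWrapper:
    """Hypothetical wrapper: turns attribute-style batches into (x, y) pairs."""

    def __init__(self, batches, x_field, y_field):
        self.batches = batches    # any re-iterable of batch objects
        self.x_field = x_field    # name of the attribute holding the inputs
        self.y_field = y_field    # name of the attribute holding the targets

    def __iter__(self):
        # Look the fields up by name, so the training loop never has to
        # know what they are called.
        for batch in self.batches:
            yield getattr(batch, self.x_field), getattr(batch, self.y_field)

    def __len__(self):
        return len(self.batches)


# Stand-in for a torchtext iterator (an assumption made for illustration):
# each batch exposes its fields as attributes.
fake_iterator = [
    SimpleNamespace(text=[[1, 2], [3, 4]], label=[0, 1]),
    SimpleNamespace(text=[[5, 6], [7, 8]], label=[1, 0]),
]

loader = BatchWrapper(fake_iterator, "text", "label")

# The loop now has the same shape as one written against a DataLoader.
seen = []
for x, y in loader:
    seen.append((x, y))
```

With something like this, the training loop only ever sees `(x, y)` tuples, which is the shape of loop I would write with torchvision.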

  • Is there a particular reason why the two APIs seem to drift apart, or did it just happen over time?
  • Would you agree that a unified API between torchvision and torchtext has its benefits?
  • Is this something someone is working on?
  • Am I overlooking something?