I am trying to load files using the
HTTPReaderIterDataPipe from pytorch/data.
How do I handle exceptions (e.g. timeouts) while iterating through the URLs?
I would like to skip the URL causing problems and just move on to the next one. This seems to be impossible when using the functional form.
Is there a DataPipe (catching exceptions and moving on) exactly for this purpose or is my understanding of how to do exception handling when using DataPipes incorrect?
Thanks for using TorchData. While `timeout` is accepted as an argument, I don't think there is a built-in way within `HTTPReader` to handle exceptions in a bespoke way.

If you would like to skip the URL causing problems, you can consider using a `.filter` prior to `HTTPReader` to check whether a connection can be established.
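For example, a reachability check could be written as a plain predicate and passed to `.filter`. The helper below is a stdlib-only sketch; the functional-form usage in the trailing comment is an assumption about the TorchData API (`IterableWrapper`, `.load_from_http()`), not a guarantee.

```python
import urllib.error
import urllib.request

def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if a HEAD request to `url` succeeds within `timeout`."""
    try:
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=timeout):
            return True
    except (urllib.error.URLError, TimeoutError, ValueError):
        # URLError covers DNS/connection failures; ValueError covers
        # malformed URLs; TimeoutError covers socket timeouts.
        return False

# Sketch of the functional form (names assumed from the torchdata docs):
# dp = IterableWrapper(urls).filter(is_reachable).load_from_http()
```

Note that this costs one extra request per URL and a URL can still fail between the check and the actual read, so it narrows the problem rather than solving it.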
I don't think that will fully address your issue, so other options are:
1. Build on top of `HTTPReader` but override its exception handling (probably by rewriting its iteration logic)
2. Write a new DataPipe that can catch exceptions coming from a source DataPipe
   - I think catching the exception is feasible; I'm less sure about resuming the DataPipe/iterator after an exception is raised
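To illustrate the idea behind option 2 (and the resumption caveat), here is a framework-free sketch. The names (`map_skip_errors`, `fetch`) are hypothetical; a real DataPipe version would subclass `IterDataPipe`. The key design point is that the `try/except` wraps the per-item work rather than `next()` on the source iterator, because many iterators cannot be resumed once they have raised.

```python
from typing import Callable, Iterable, Iterator, Tuple, Type, TypeVar

T = TypeVar("T")
R = TypeVar("R")

def map_skip_errors(
    fn: Callable[[T], R],
    items: Iterable[T],
    exceptions: Tuple[Type[BaseException], ...] = (Exception,),
) -> Iterator[R]:
    """Apply `fn` to each item, skipping items whose processing raises
    one of `exceptions` instead of aborting the whole iteration."""
    for item in items:
        try:
            yield fn(item)
        except exceptions:
            continue  # skip the failing item and move on

def fetch(url: str) -> str:
    # Stand-in for an HTTP request; a real version would hit the network.
    if "bad" in url:
        raise TimeoutError(url)
    return f"payload from {url}"

urls = ["http://a.example", "http://bad.example", "http://c.example"]
results = list(map_skip_errors(fetch, urls, exceptions=(TimeoutError,)))
# → ["payload from http://a.example", "payload from http://c.example"]
```

If the source DataPipe itself raises while producing items (rather than while processing them), this pattern cannot recover, which is exactly the resumption concern above.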
We would accept a PR for option 2 if you have a good implementation. Happy to discuss further.
Option 1 is what I currently use. Wouldn't it be better to just add this functionality (as an option) to `HTTPReader`? I have opened an issue about this (Modify exception handling of online Datapipes · Issue #963 · pytorch/data · GitHub).
Since I personally don't think option 2 is that useful (for me at least) if continuing after an exception is impossible, I won't be working on it at the moment. If you think otherwise, please let me know; I would be happy to help.
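For reference, the shape of option 1 can be sketched without TorchData at all. The class below is an illustrative stand-in using only the stdlib; in practice you would subclass `HTTPReaderIterDataPipe` and override its `__iter__`, and the class name here is invented.

```python
import urllib.error
import urllib.request
from typing import Iterator, List, Tuple

class SkippingHttpReader:
    """Illustrative stand-in for an HTTPReader variant that skips URLs
    raising connection errors or timeouts instead of aborting."""

    def __init__(self, urls: List[str], timeout: float = 5.0) -> None:
        self.urls = urls
        self.timeout = timeout

    def __iter__(self) -> Iterator[Tuple[str, bytes]]:
        for url in self.urls:
            try:
                with urllib.request.urlopen(url, timeout=self.timeout) as r:
                    yield url, r.read()
            except (urllib.error.URLError, TimeoutError, ValueError):
                continue  # skip the problematic URL and move on
```

If the change proposed in the issue were accepted upstream, a skip-on-error flag on `HTTPReader` itself would make a subclass like this unnecessary.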