Hi, my dataset is a tensorflow_dataset source, which unfortunately cannot be serialized (see image below). If I use “spawn” or “forkserver” as my torch distributed start method, I get the same error as in the image when each process attempts to retrieve batches from the dataset.
If I use “fork”, the data loading works, but then I cannot send the tensors to CUDA (because CUDA does not work with the “fork” start method).
Does anyone know how to run torch distributed with a dataset that cannot be serialized?
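
A possible direction, as a rough sketch only: with “spawn”, the arguments handed to each process are pickled, so one option may be to pass nothing but picklable values (such as the rank) and build the dataset inside each process. The example below uses only the Python standard library. `UnpicklableSource`, `build_source`, `read_shard`, and `launch` are hypothetical names, and the class is just a stand-in for a source that cannot be pickled, not the real tensorflow_dataset object.

```python
import multiprocessing as mp
import threading


class UnpicklableSource:
    """Hypothetical stand-in for a data source that cannot be pickled."""

    def __init__(self, size):
        self._lock = threading.Lock()  # thread locks cannot be pickled
        self._items = list(range(size))

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        with self._lock:
            return self._items[index]


def build_source():
    """Factory called inside each process, so the source itself is never pickled."""
    return UnpicklableSource(size=8)


def read_shard(rank, world_size):
    """Per-process entry point: build the source locally, then read this rank's items."""
    source = build_source()
    return [source[i] for i in range(rank, len(source), world_size)]


def launch(world_size=2):
    """Run one "spawn" process per rank; only ints and lists cross the process boundary."""
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=world_size) as pool:
        args = [(rank, world_size) for rank in range(world_size)]
        return pool.starmap(read_shard, args)
```

`launch()` has to be called under an `if __name__ == "__main__":` guard, since “spawn” re-imports the main module in every child. With torch, the analogous step would presumably be to construct the dataset inside the function passed to `torch.multiprocessing.spawn` rather than passing the dataset object as an argument; whether that works with this particular dataset is an assumption that has not been verified here.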