Greetings,
I have been working on a project to implement distributed training and data parallelization for a deep learning object detection model. I have been using PySpark, PyTorch, and SparkTorch, but unfortunately the SparkTorch setup has not been working properly.
Could anyone provide resources or insights into these questions?
- Is the use of SparkTorch (or similar tools such as TensorFlowOnSpark or Horovod) actually necessary for distributed training in this task?
- I have successfully loaded the necessary files from HDFS as a PySpark DataFrame. Could I potentially bypass SparkTorch and feed this dataset directly into a PyTorch deep learning model?
- If the above is feasible, could you guide me through converting a PySpark DataFrame into a format that PyTorch can consume?
- If we can indeed bypass SparkTorch, would the training data still be parallelized, and would the training itself still be distributed?
- In this scenario, what role would SparkTorch (or similar tools) typically play?
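To make the third question concrete, here is a minimal sketch of the conversion I have in mind, assuming I first collect the Spark DataFrame to the driver with `toPandas()`. The pandas DataFrame below stands in for that collected result, and the column names (`feature_a`, `feature_b`, `label`) are invented purely for illustration:

```python
import numpy as np
import pandas as pd
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for `spark_df.toPandas()`, i.e. the Spark DataFrame collected
# onto the driver. Columns here are hypothetical.
pdf = pd.DataFrame({
    "feature_a": [0.1, 0.2, 0.3, 0.4],
    "feature_b": [1.0, 2.0, 3.0, 4.0],
    "label":     [0, 1, 0, 1],
})

# Convert the numeric columns to PyTorch tensors via NumPy.
features = torch.from_numpy(
    pdf[["feature_a", "feature_b"]].to_numpy(dtype=np.float32))
labels = torch.from_numpy(pdf["label"].to_numpy(dtype=np.int64))

# Wrap everything in a Dataset/DataLoader so a standard
# single-node PyTorch training loop can consume it in batches.
dataset = TensorDataset(features, labels)
loader = DataLoader(dataset, batch_size=2, shuffle=True)

for batch_features, batch_labels in loader:
    # Each batch: features of shape (2, 2), labels of shape (2,)
    print(batch_features.shape, batch_labels.shape)
```

My concern with this approach (and the reason for my last two questions) is that `toPandas()` pulls the entire dataset onto the driver, so the training loop itself would no longer be distributed.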
For additional context, I initially attempted to use the latest version of PySpark to take advantage of the built-in TorchDistributor (`pyspark.ml.torch.distributor`, available since PySpark 3.4). However, due to certain constraints, I am currently working with PySpark 2.4 and PyTorch.