I have a training script. When I run it on my local machine, loading the data for one epoch takes around 30 minutes, but when I run the same script on a much more powerful server inside Docker, loading takes around 5 hours.
I don’t have much experience with Docker. I used the following Dockerfile to build the image:
RUN apt-get update && apt-get install -y ffmpeg
RUN pip install pandas
RUN pip install scikit-video
RUN pip install ffmpeg-python
RUN pip install scikit-learn
RUN pip install opencv-python
RUN pip install tqdm
RUN pip install torchsummary
RUN pip install tensorboardX
CMD ["python", "./train_classifier.py"]
and I run the Docker container with the following command:
I checked the shared memory of the server and it has over 50% free during training.
In addition, setting the number of workers in the training script to 0 makes no noticeable difference in the loading time.
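Since changing the number of workers makes no difference, one way to check whether raw disk I/O inside the container is the bottleneck (independent of the training framework) is to time a sequential read directly. The probe below is a minimal sketch; the scratch path and file size are hypothetical, and on a real machine you would point it at a file in your actual dataset directory instead:

```python
import os
import time

def time_read(path, block_size=1 << 20):
    """Read a file sequentially and return throughput in MB/s."""
    start = time.time()
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(block_size):
            total += len(chunk)
    elapsed = time.time() - start
    return total / 1e6 / elapsed if elapsed else float("inf")

# Write a scratch file and measure read speed (hypothetical path/size).
scratch = "/tmp/io_probe.bin"
with open(scratch, "wb") as f:
    f.write(os.urandom(50 * 1024 * 1024))  # 50 MB test file
print(f"{time_read(scratch):.0f} MB/s")
os.remove(scratch)
```

Running this both on the host and inside the container on the same data location would show whether the slowdown comes from the storage path (e.g. a slow bind mount or network filesystem) rather than from the Python code.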
Please let me know if you have any suggestions to solve the problem.
Is that 5 hours including the container build, or has the script execution itself slowed down that much? It is good practice to pin the version of every package; the build process will also be much faster.
If the slowdown is only in the script itself, can you show the relevant code — how do you load the data?
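Pinned versions could look like the sketch below. The version numbers are only illustrative — pin whichever versions match your working local setup (`pip freeze` on your local machine will tell you):

```Dockerfile
# Versions shown are placeholders; use the ones from your local environment.
RUN apt-get update && apt-get install -y --no-install-recommends ffmpeg \
    && rm -rf /var/lib/apt/lists/*
RUN pip install \
    pandas==1.5.3 \
    scikit-video==1.1.11 \
    ffmpeg-python==0.2.0 \
    scikit-learn==1.2.2 \
    opencv-python==4.7.0.72 \
    tqdm==4.65.0 \
    torchsummary==1.5.1 \
    tensorboardX==2.6
CMD ["python", "./train_classifier.py"]
```

Grouping the installs into one `RUN` also keeps the image smaller and makes Docker's layer cache more predictable across rebuilds.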