Data loading takes a very long time with Docker compared to local PyTorch without Docker

Hello

I have a training script. When I run it on my local machine, loading the data for one epoch takes around 30 minutes, but when I run the same script on a much more powerful server inside Docker, loading takes around 5 hours.

I don’t have much experience with Docker. I used the following Dockerfile to build the image.

FROM nvcr.io/nvidia/pytorch:21.08-py3

ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y ffmpeg
RUN pip install pandas scikit-video ffmpeg-python scikit-learn opencv-python tqdm torchsummary tensorboardX
CMD ["python", "./train_classifier.py"]

and I run the Docker container with the following command:

sudo docker run --ipc=host -it --rm --gpus=device=0 --name train_container --network=host -v /home/h/data/:/workspace train_image:3.0 bash

I checked the shared memory of the server and it has over 50% free space during training.
In addition, setting the number of workers in the training script to 0 makes no noticeable difference in the loading time.
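One way to narrow this down is to time the disk reads separately from the augmentations inside the container. A minimal sketch with the standard library only; `read_sample` and `augment` are hypothetical placeholders to be replaced by the actual loading and augmentation code from `train_classifier.py`:

```python
import time

# Hypothetical stand-ins for the real dataset steps: swap in the actual
# per-sample read and augmentation functions from train_classifier.py.
def read_sample(i):
    return bytes(1000)  # placeholder for reading one clip from disk

def augment(sample):
    return sample[::-1]  # placeholder for the augmentation pipeline

read_time = aug_time = 0.0
for i in range(1000):
    t0 = time.perf_counter()
    sample = read_sample(i)
    t1 = time.perf_counter()
    augment(sample)
    t2 = time.perf_counter()
    read_time += t1 - t0
    aug_time += t2 - t1

print(f"read: {read_time:.3f}s, augment: {aug_time:.3f}s")
```

Running this both inside and outside the container on the same data directory should show whether the extra 4.5 hours is spent in I/O or in CPU-bound augmentation.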

If you have any suggestions for solving this problem, please let me know.

Your help is much appreciated.

Hello,
Do those 5 hours include the container build, or has the script execution itself slowed down that much? It is good practice to pin the version of every package; that also makes the build process faster and more reproducible.
If the slowdown is only in the code, can you show what happens in the code, i.e. how you load the data?
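For example, pinning versions in the Dockerfile could look like the following (the version numbers are illustrative placeholders, not recommendations):

```dockerfile
# Pinned versions make the image reproducible and let Docker's layer cache
# skip re-resolving dependencies on rebuilds. Versions below are examples only.
RUN pip install pandas==1.3.2 scikit-learn==0.24.2 opencv-python==4.5.3.56
```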

Thanks @Michal_Bogacz for your help.

The script execution just became extremely slow. The 5 hours is the time to load the data and apply some augmentations only, NOT the time to build the container image.

I am trying to re-run the training from the following paper:

This is the data loader code from the paper:

I suspect it could be a limitation of the SSD read speed, or maybe the CPU speed. I don't know exactly, but it could be that Docker imposes some restrictions.
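To test the SSD suspicion directly, you could run a rough sequential-read benchmark inside and outside the container. A stdlib-only sketch; note that reading a file just after writing it mostly hits the page cache, so for a realistic number point `path` at a large existing file on the bind-mounted data directory (or use a file bigger than RAM):

```python
import os
import tempfile
import time

# Write a test file, then time a sequential read of it. Results are an
# upper bound because of the page cache; see the note above.
size_mb = 256
chunk = b"\0" * (1024 * 1024)
path = os.path.join(tempfile.gettempdir(), "io_bench.bin")

with open(path, "wb") as f:
    for _ in range(size_mb):
        f.write(chunk)
    f.flush()
    os.fsync(f.fileno())

start = time.perf_counter()
with open(path, "rb") as f:
    while f.read(1024 * 1024):
        pass
elapsed = time.perf_counter() - start
print(f"read {size_mb} MB in {elapsed:.2f}s (~{size_mb / elapsed:.0f} MB/s)")
os.remove(path)
```

If the in-container number is far lower than the bare-metal one on the same path, the bind mount or storage driver is the bottleneck rather than PyTorch.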

Many Thanks :slight_smile:

Did you test the data loading on the server without Docker, since you seem to think this issue is Docker-related?