I use PyTorch with a DataLoader that has num_workers set greater than 0, inside a Docker container on a remote server with GPUs, and got the error "DataLoader worker (pid xxx) is killed by signal: Bus error.". I read "Shm error in docker" and now know that I need to start a new container with the --shm-size option in the docker run command (that is the one I use). The question is: how do I make all the programs and data in my current container available in a new container? The volume is huge, around 500 GB. Or is there another way to expand the shm size without making a new container?
You don’t need to rebuild the image, but can use:
docker run -it --ipc=host your_image_name
or specify --shm-size (e.g. --shm-size=8g) in the docker run command.
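Either way, you can verify the setting from inside the running container; /dev/shm reflects the shared memory size (Docker’s default is 64M):

# run this inside the container to check the shm size
df -h /dev/shm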
@prtbick Thank you for your reply. I am a beginner with Docker and didn’t know that docker run could be applied to a built container. When I want to use the GPUs on my server, which is better, --ipc=host or --shm-size? Or doesn’t it matter?
Thank you in advance.
Hiroshi
I ran the following command and got the error message below. What is wrong?
$ docker run -it --gpus all --shm-size=1g my_existing_container_name bash
Unable to find image 'my_existing_container_name' locally
docker: Error response from daemon: pull access denied for my_existing_container_name, repository does not exist or may require 'docker login': denied: requested access to the resource is denied.
See 'docker run --help'.
I’m using --ipc=host, while others prefer --shm-size, so you might stick to what works for you.
It seems that my_existing_container_name is not recognized. Are you sure you’ve tagged the container with this name? During your build you can create a tag using:
docker build -t container_name .
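You can also check what actually exists on your machine, since containers and images are listed separately (standard Docker commands):

# list all containers, running and stopped, with their names
docker ps -a
# list locally available images with repository and tag
docker images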
Thank you for your comment on --ipc=host or --shm-size.
Regarding your suggestion
$ docker build -t container_name .
I am confused. According to the docker build command documentation, isn’t container_name an image rather than a container?
Let me explain the history leading up to this point.
Firstly, my administrator built an image, original_image, from the NVIDIA hub for us, and I created my_existing_container_name from that image with
$ docker run -it --gpus all --name my_existing_container_name original_image bash
I then worked in this container for a while, creating many new applications and data, and eventually found that the container needs more shared memory to run a specific program I wrote. That is what prompted this thread.
What I think I must do from now on is (the full sequence is sketched after this list):
- Run docker commit to make a new image from the current my_existing_container_name:
docker commit my_existing_container_name myrepository:new_image
where myrepository and new_image are a repository and tag I can choose arbitrarily.
- Run docker run to create my_new_container_name, with a bigger shared memory, from myrepository:new_image:
docker run -it --gpus all --ipc=host --name my_new_container_name myrepository:new_image bash
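Put together, this is what I have in mind (names as above; the stop step is my assumption, to retire the old container first):

# snapshot the current container as a new image
docker commit my_existing_container_name myrepository:new_image
# stop the old container before starting its replacement
docker stop my_existing_container_name
# start a fresh container from the snapshot, sharing the host's shm
docker run -it --gpus all --ipc=host --name my_new_container_name myrepository:new_image bash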
Are these the right things to do?
Thank you.
P.S. Even if the above are the right things to do, I have another problem. The disk seems too short of free space to do the commit, because my_existing_container_name has grown too big; it reports that I am running out of disk space. I am trying to free up disk space by deleting unnecessary files in the container or on the host.
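I also found that Docker itself can report and reclaim space (standard commands; I will check what they would delete before running them):

# show how much space images, containers and volumes use
docker system df
# remove dangling images that no container references
docker image prune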
You can commit your container if you want to store your changes (e.g. your code).
I usually don’t save the container but mount a folder to store my source code files; your workflow might be different, of course.
Otherwise, you could simply kill it and restart it with --ipc=host.
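The mounting approach looks roughly like this (a sketch; /home/hiroshi/work and /workspace are placeholder paths):

# keep code and data on the host and share them with the container
docker run -it --gpus all --ipc=host -v /home/hiroshi/work:/workspace --name my_new_container_name myrepository:new_image bash

Everything written under /workspace inside the container then lives in /home/hiroshi/work on the host and survives removal of the container.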
Thank you for your comments. I understand the way you work: you share directories and files on the host disk with the container via the docker run -v option.
Thank you!