Compiling PyTorch with CUDA 9.1 inside a Docker container


I’m trying to compile PyTorch inside a Docker container. The build succeeds, but when I use the resulting image to train a model, it crashes with the stack trace below (this does not happen when I install the prebuilt binary cu91/torch-0.3.1-cp36-cp36m-linux_x86_64.whl):

    model = model.cuda(cuda_device)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/", line 216, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/", line 146, in _apply
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/", line 146, in _apply
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/", line 123, in _apply
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/", line 111, in flatten_parameters
    params = rnn.get_parameters(fn, handle, fn.weight_buf)
  File "/opt/conda/lib/python3.6/site-packages/torch/backends/cudnn/", line 165, in get_parameters
    assert == filter_dim_a[0]
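Since the failing assert lives under torch/backends/cudnn/, my current suspicion is a cuDNN version mismatch between what the build saw and what the container ships at run time. Here is a small helper I put together to read the version macros out of cudnn.h so I can compare them (the function name and the sample header text are mine, not from PyTorch):

```python
import re

def cudnn_version_from_header(header_text):
    """Extract (major, minor, patchlevel) from the #define lines in cudnn.h."""
    vals = {}
    for key in ("MAJOR", "MINOR", "PATCHLEVEL"):
        match = re.search(r"#define\s+CUDNN_%s\s+(\d+)" % key, header_text)
        vals[key] = int(match.group(1))
    return (vals["MAJOR"], vals["MINOR"], vals["PATCHLEVEL"])

# Sample of the defines as they appear in /usr/include/cudnn.h
# in the cudnn7 base image (values here are illustrative):
sample = """
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 5
"""
print(cudnn_version_from_header(sample))  # (7, 0, 5)
```

At run time the same triple can be compared against what the built torch reports via `torch.backends.cudnn.version()`, which packs it into a single integer (e.g. 7005).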

I’ve copied my Dockerfile below for reference:

FROM nvidia/cuda:9.1-cudnn7-devel-ubuntu16.04

RUN apt-get update && apt-get install -y --no-install-recommends \
		build-essential \
		cmake \
		git \
		curl \
		vim \
		ca-certificates \
		libjpeg-dev \
		libpng-dev && \
	rm -rf /var/lib/apt/lists/*

RUN curl -o ~/ -O  && \
	chmod +x ~/ && \
	~/ -b -p /opt/conda && \
	rm ~/ && \
	/opt/conda/bin/conda install numpy pyyaml scipy ipython mkl && \
	/opt/conda/bin/conda install -c soumith magma-cuda91 && \
	/opt/conda/bin/conda clean -ya
ENV PATH /opt/conda/bin:$PATH

RUN git clone --recursive --single-branch -b v0.3.1
WORKDIR /opt/pytorch
RUN CMAKE_PREFIX_PATH="$(dirname $(which conda))/../" pip install -v .
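One thing I notice is that I’m not pinning the GPU architectures: during `docker build` no GPU is visible, so the build can’t probe the hardware and falls back to a default set of compute capabilities. A variant of the final build step that pins them explicitly would look like the fragment below; the arch values are guesses for my cards, and I’m not certain v0.3.1’s setup honors `TORCH_CUDA_ARCH_LIST`, so treat this as a sketch:

```dockerfile
# Same build step, but with the target compute capabilities pinned, since
# `docker build` runs without a GPU and the build cannot detect the hardware.
# The arch list is an assumption for my cards -- adjust to the target GPUs.
ENV TORCH_CUDA_ARCH_LIST="5.2 6.0 6.1 7.0+PTX"
RUN CMAKE_PREFIX_PATH="$(dirname $(which conda))/../" pip install -v .
```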

Can anyone help me?