cuda() hangs Python

Trying to run any computation on the GPU hangs execution for a long time.
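A minimal sketch for reproducing and timing the hang. The helper name time_first_cuda_call is hypothetical, and the sketch assumes the delay shows up on the first CUDA call in the process; it returns None when PyTorch or a CUDA device is not available.

```python
import time


def time_first_cuda_call():
    """Time the first CUDA tensor transfer in this process.

    Returns a dict with the elapsed seconds and some device details,
    or None if PyTorch or a CUDA device is unavailable.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None

    start = time.time()
    x = torch.ones(1).cuda()      # first CUDA call: context creation happens here
    torch.cuda.synchronize()      # wait until the GPU work has actually finished
    elapsed = time.time() - start

    return {
        "seconds": elapsed,
        "device": torch.cuda.get_device_name(0),
        "capability": torch.cuda.get_device_capability(0),
        "cuda_version": torch.version.cuda,  # CUDA version PyTorch was built with
    }


if __name__ == "__main__":
    print(time_first_cuda_call())
```

Running this on both machines (V100 and P100) would give comparable numbers for the first-call delay, along with the compute capability and CUDA build version on each.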

I’m using PyTorch 0.4.0a0 compiled from source (git clone --recursive https://github.com/pytorch/pytorch && cd pytorch && python setup.py install) inside the NVIDIA Docker image (9.0-cudnn7-devel-ubuntu16.04) on a Tesla V100, driver version 384.111.

The same setup works fine on a Tesla P100 with driver 390.12, though.

Any ideas? In the meantime, I’m trying to rebuild PyTorch inside a CUDA 9.1 Docker image.