0.3 _cudnn_convolution_ deterministic flag

(Danlu Chan) #1


I am maintaining the efficient DenseNet codebase. I found that PyTorch 0.3 changed the API a little: the _cudnn_convolution_* functions now require a deterministic flag for the conv op. I made the changes accordingly; see this branch. It runs under 0.3, but the result is wrong (the error rate stays at ~0.9 throughout training).

Checking out the pytorch0.3 branch and running CUDA_VISIBLE_DEVICES=0 python demo.py --efficient True --data ./data should reproduce the problem.

Can anyone help on this issue?


(Edward Z Yang) #2

I am going to take a look.

(Edward Z Yang) #3

What’s the last known good version of PyTorch for which CUDA_VISIBLE_DEVICES=0 python demo.py --efficient True --data ./data on master converges, and how quickly does it converge? It is possible I have messed up but master doesn’t seem to converge for me on 0.2 (e02f7bf8a344b04f51527f233bd4727b6ebb1ebe) either.

(Danlu Chan) #4

Thanks for your help!

PyTorch 0.1.12 should pass all the tests.

According to my experiments, PyTorch 0.2 works well in terms of the final accuracy, but it does not pass one of the tests.

(Edward Z Yang) #5

That’s interesting. I’ve run 167 epochs of the 1 GPU test on PyTorch 0.2, but the error is still at 0.896. (Do you mean something different in terms of final accuracy?) I did confirm that it converges on 0.1.12. So it sounds like the behavior changed at some point between 0.1 and 0.2.

(Danlu Chan) #6

I also believe something changed between 0.1.12 and 0.2, since we cannot pass all the tests after upgrading to PyTorch 0.2. And I think you’re right: I just created a new conda environment and installed 0.2 from scratch, and it did not converge. My bad!

However, I asked Trevor Killeen about changes to the convnd API between 0.1.12 and 0.2, and he replied:

I don’t think that the CuDNN calls should have changed but I don’t have as much context to the switch from running Conv in Python to the C++ autograd implementation. Someone in one of the above channels can probably provide better help.

(Edward Z Yang) #7

Yeah, I’m not too familiar with the 0.2 release so I’ll have to do some digging. Would be worth figuring this out though.


Has there been any progress on this? Any recommendation on where to start looking to hunt down this bug?

(Danlu Chan) #9

Any updates? I also have time to investigate now.

(Edward Z Yang) #10

Sorry, I haven’t had a chance to attempt a bisect between 0.1 and 0.2 to see what might have changed. An easier thing to check that might be illuminating is to see if the problem repros (1) with cuDNN turned off (torch.backends.cudnn.enabled = False) and (2) with CPU.
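For reference, the bisect mentioned above amounts to a binary search over an ordered list of commits. The following is only a minimal sketch: the commit names and the fails_to_converge check are hypothetical stand-ins (in practice the check would build each PyTorch commit and run demo.py), and it assumes the oldest commit is good, the newest is bad, and the behavior flips exactly once.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_bad() is True.

    Assumes commits[0] is known good, commits[-1] is known bad,
    and every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return commits[hi]


# Hypothetical history: ten commits, with the regression introduced at "c6".
commits = [f"c{i}" for i in range(10)]


def fails_to_converge(commit: str) -> bool:
    # Stand-in for "build this commit, train, and see if the error stays ~0.9".
    return int(commit[1:]) >= 6


culprit = first_bad(commits, fails_to_converge)
print(culprit)
```

With real commits, git bisect performs the same search and keeps track of the good/bad bounds for you.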

(Matthew Kleinsmith) #11

torch.backends.cudnn.enabled = True:

  • Stuck at 0.89; 80 epochs.

torch.backends.cudnn.enabled = False:

Base environment:

  • nvidia-docker, Ubuntu 14.04, CUDA 8.0, cuDNN 6, PyTorch 0.2.0

docker run --runtime=nvidia --init -it --rm --ipc=host mwksmith/efficient_densenet_pytorch

Here’s the Dockerfile. I uploaded the Docker image to Docker Hub at mwksmith/efficient_densenet_pytorch.

Environment for cudnn = True:
Titan X Pascal, Driver Version: 384.111

CUDA_VISIBLE_DEVICES=1 python demo.py

Environment for cudnn = False:

1080 Ti, Driver Version: 387.34, different machine

I put torch.backends.cudnn.enabled = False after the import statements in demo.py.


CUDA_VISIBLE_DEVICES=0 python demo.py
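A smaller, self-contained way to compare the same three configurations (CPU, CUDA without cuDNN, CUDA with cuDNN) is to run one convolution under each and look at the difference from the CPU result. This is only a sketch written against a recent PyTorch, not the 0.2 API used in the thread; the layer sizes and the conv_output helper are made up for illustration and are not part of demo.py.

```python
import torch
import torch.nn as nn


def conv_output(device: str, cudnn_enabled: bool) -> torch.Tensor:
    """Run one fixed convolution on `device` and return the result on CPU."""
    torch.backends.cudnn.enabled = cudnn_enabled
    torch.manual_seed(0)  # same weights and input for every configuration
    conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)  # hypothetical layer
    x = torch.randn(2, 3, 16, 16)  # created on CPU so the values match
    with torch.no_grad():
        return conv.to(device)(x.to(device)).cpu()


reference = conv_output("cpu", cudnn_enabled=False)

configs = [("cpu", False)]
if torch.cuda.is_available():
    configs += [("cuda", False), ("cuda", True)]

max_diff = {}
for device, cudnn_enabled in configs:
    out = conv_output(device, cudnn_enabled)
    max_diff[(device, cudnn_enabled)] = (out - reference).abs().max().item()
    print(device, "cudnn" if cudnn_enabled else "no cudnn", max_diff[(device, cudnn_enabled)])
```

Small floating-point differences between backends are expected; a large gap in one configuration would point at that code path.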

@Danlu_Chan, @nikostr, @ezyang

(Edward Z Yang) #12

I still plan on looking into this, but at this point I am seriously considering just reimplementing efficient DenseNet from scratch on PyTorch HEAD and seeing if it works or not (major porting seems necessary, since a lot of the internal APIs DenseNet is using have been moved around and are no longer available).

(Danlu Chan) #13

Thanks for your hard work!
I am curious how you would implement this. Roughly, I have one question: whichever approach you take, I guess you would need to manually allocate the memory for several of the ops. I am afraid this really violates the design of PyTorch…

(Edward Z Yang) #14

Yeah. My thinking is that we might be able to do it if we expose enough out= variants of operators.
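To illustrate what an out= variant buys you, here is a rough sketch using torch.cat, which does accept out= in current PyTorch. The shapes and the shared buffer are hypothetical, and this is not the efficient DenseNet implementation; it only shows a concatenation writing into preallocated memory instead of allocating a new tensor.

```python
import torch

# Hypothetical feature maps from two earlier layers (N, C, H, W).
a = torch.randn(2, 4, 8, 8)
b = torch.randn(2, 6, 8, 8)

# Preallocated buffer that could be reused across layers.
shared = torch.empty(2, 10, 8, 8)
ptr_before = shared.data_ptr()

# out= variants do not take part in autograd, so this runs under no_grad;
# a real implementation would have to handle the backward pass itself.
with torch.no_grad():
    torch.cat([a, b], dim=1, out=shared)

# The result lives in the same memory, and matches an ordinary cat.
same_storage = shared.data_ptr() == ptr_before
matches = torch.equal(shared, torch.cat([a, b], dim=1))
print(same_storage, matches)
```

The buffer must already have the right shape; otherwise PyTorch may resize it, which defeats the point of sharing the allocation.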


Very interested in the outcome of this effort.
Currently using “conventional” DenseNet and would like to increase model size.
Thank you all involved for your work.