I maintain the efficient DenseNet codebase. I found that PyTorch 0.3 changed the API slightly: the _cudnn_convolution_* functions now require a deterministic flag for the conv op. I made the changes accordingly; see this branch. It runs under 0.3, but the result is wrong (the error rate stays at ~0.9 the whole time).
Checking out the pytorch0.3 branch and running CUDA_VISIBLE_DEVICES=0 python demo.py --efficient True --data ./data should reproduce the problem.
What’s the last known good version of PyTorch for which CUDA_VISIBLE_DEVICES=0 python demo.py --efficient True --data ./data on master converges, and how quickly does it converge? It’s possible I’ve messed something up, but master doesn’t seem to converge for me on 0.2 (e02f7bf8a344b04f51527f233bd4727b6ebb1ebe) either.
That’s interesting. I’ve run 167 epochs of the 1 GPU test on PyTorch 0.2, but the error is still at 0.896. (Do you mean something different in terms of final accuracy?) I did confirm that it converges on 0.1.12. So it sounds like the behavior changed at some point between 0.1 and 0.2.
I also believe something changed between 0.1.12 and 0.2, since we can no longer pass all the tests after upgrading to PyTorch 0.2. And I think you’re right. I just created a new conda environment and installed 0.2 from scratch, and it did not converge. My bad!
However, I asked Trevor Killeen about the convnd API between 0.1.12 and 0.2, and he replied:
I don’t think that the CuDNN calls should have changed but I don’t have as much context to the switch from running Conv in Python to the C++ autograd implementation. Someone in one of the above channels can probably provide better help.
Sorry, I haven’t had a chance to attempt a bisect between 0.1 and 0.2 to see what might have changed. An easier thing to check that might be illuminating is to see if the problem repros (1) with cuDNN turned off (torch.backends.cudnn.enabled = False) and (2) with CPU.
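For anyone who wants to try the cuDNN-off and CPU checks on a single op before running the whole demo, here is a minimal sketch. It is not taken from the DenseNet repo: conv_outputs and max_abs_diff are hypothetical helper names, the tensor shapes are arbitrary, and it is written against a recent PyTorch rather than 0.2/0.3. It runs one conv forward/backward pass on the CPU, on the GPU with cuDNN disabled, and on the GPU with cuDNN enabled, then prints how far each GPU result is from the CPU one.

```python
import torch
import torch.nn.functional as F


def conv_outputs(device, use_cudnn):
    """Run one conv2d forward/backward pass; return (output, input grad, weight grad) on the CPU."""
    # Generate inputs on the CPU with a fixed seed so every configuration sees the same data.
    gen = torch.Generator().manual_seed(0)
    x = torch.randn(2, 3, 8, 8, generator=gen).to(device).requires_grad_()
    w = torch.randn(4, 3, 3, 3, generator=gen).to(device).requires_grad_()
    upstream = torch.randn(2, 4, 8, 8, generator=gen).to(device)

    previous = torch.backends.cudnn.enabled
    torch.backends.cudnn.enabled = use_cudnn  # has no effect on CPU tensors
    try:
        y = F.conv2d(x, w, padding=1)
        y.backward(upstream)
    finally:
        torch.backends.cudnn.enabled = previous  # restore the global setting
    return y.detach().cpu(), x.grad.cpu(), w.grad.cpu()


def max_abs_diff(a, b):
    """Largest elementwise absolute difference across paired tensors."""
    return max((p - q).abs().max().item() for p, q in zip(a, b))


reference = conv_outputs("cpu", use_cudnn=False)
if torch.cuda.is_available():
    for use_cudnn in (False, True):
        result = conv_outputs("cuda", use_cudnn)
        diff = max_abs_diff(reference, result)
        print(f"cuda, cudnn={use_cudnn}: max abs diff vs CPU = {diff:.2e}")
else:
    print("CUDA not available; only the CPU reference was computed")
```

Small floating-point differences between backends are expected; a large gap in only one configuration would point at that code path. For the full demo, the equivalent check is setting torch.backends.cudnn.enabled = False before the model is built.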
I still plan on looking into this, but at this point I am seriously considering just reimplementing efficient DenseNet from scratch on PyTorch HEAD and seeing whether it works (major porting seems necessary, since a lot of the internal APIs DenseNet is using have been moved around and are no longer available).
Thanks for your hard work!
I am curious how you would implement this. I have one rough question: no matter which approach you take, I guess you would need to manually allocate the memory for several ops. I am afraid this really violates the design of PyTorch…
Very interested in the outcome of this effort.
Currently using “conventional” DenseNet and would like to increase model size.
Thank you all involved for your work.