In my long history in commercial software I’ve never seen a major product go GA with nearly every feature listed as being in Beta or Prototype state. Even what they call the “main API” for the new PyTorch 2.0 is listed as Beta.
GA, yes, but then the announcement lists 16 features. Three are just perf improvements, only one is listed as Stable, and the other 12 are of either Beta or Prototype quality.
I am not complaining, as I’m used to being on the bleeding edge of pre-release stuff. I’m just saying something doesn’t look right. If I were releasing PostgreSQL with a list of features like full outer joins, data partitioning, message aggregation, x, y, and z, and saw “beta” or “prototype” next to most of them in the release notes, I can only imagine how our customers would react.
I would argue the main difference is an open-source project release vs. commercial software, which may well have stricter requirements.
That being said, I’m only pointing to the “Stable” tag, so I’m mostly the messenger here.
OK. But this is a major version release. I was expecting a number of new “GA”-quality features, even in open-source software like PostgreSQL and so many others. I get the feeling that 2.0 was rushed out the door to hit the desired release time frame. Perception is important.
Sadly, the huge perf boost many people in the Stable Diffusion community were waiting for somehow got lost, even though it was in the 2.0 nightly for over a month. The nightly build was pulling in cuDNN 8.7, but the GA release, which is built or bundled differently, still has cuDNN 8.5 as a dependency.
One of two things happened. Either there was a conscious decision not to include the 8.7 fix in the GA, OR, because the two are built and packaged differently, it opened the door to mistakes: whoever fixed the problem didn’t realize they also had to make a separate change to the “small binary” GA build so it would match the “large binary” wheel. If the latter happened, that is even more reason to make the change we are discussing on GitHub.
The CUDA 11.7 wheels use cuDNN==8.5 while the CUDA 11.8 wheels use cuDNN==8.7, which was the same case for the nightly release.
Update your 2.0.0 release to 2.0.0+cu118 and the updated cuDNN version should be used again:
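A hedged sketch of that upgrade; the URL below is the standard pytorch.org wheel index for CUDA 11.8 builds (verify it matches your platform before running):

```shell
# Reinstall torch from the cu118 wheel index so the cuDNN 8.7 build is pulled in.
# This replaces the default PyPI (+cu117) wheel with the pytorch.org (+cu118) one.
pip install torch==2.0.0+cu118 --index-url https://download.pytorch.org/whl/cu118
```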
None of this happened; we used the cuDNN version matching the CUDA runtime. I don’t decide which wheels are pushed to PyPI (and are thus the “small” binaries) vs. which ones are uploaded to pytorch.org (as the “large” binary), and we are discussing these releases with Meta. Our current agreement is to host one “stable” version (currently +cu117) on PyPI and to release an updated “large” binary (currently +cu118). The nightlies run even further ahead, and I will start updating them to CUDA 12.x soon.
Once we have had the “future” version in the nightly for a few weeks/months without seeing any major issues, we will update the packages in the next “stable” release.
E.g. for PyTorch 2.1.0 the binaries using CUDA 11.8 might be the default (small) ones while the ones using CUDA 12.x might be hosted on pytorch.org (this is all work in progress, so pure speculation from my side as I can give recommendations but don’t decide these things).
Thanks. I didn’t realize the cuDNN 8.7 fix was specific to the cu118 wheels AND that the default install source for “pip install torch” only has the cu117 version. So I guess I have to install using the extra index URL to reference the developer download site. I’ll let people know, as they were expecting Torch 2.0 to have this fix. Now I know there are two different ones, cu117 and cu118. Also, cu117 has both the small wheel and the large wheel.
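For anyone else checking which build they actually ended up with, a minimal sketch that reads the bundled CUDA and cuDNN versions from an installed torch (it degrades gracefully if torch isn’t present; the exact version strings shown in comments are examples, not guarantees):

```python
def torch_build_info():
    """Return the CUDA/cuDNN versions bundled with the installed torch wheel,
    or None fields if torch is not installed in this environment."""
    try:
        import torch
        return {
            "torch": torch.__version__,                # e.g. "2.0.0+cu118"
            "cuda": torch.version.cuda,                # e.g. "11.8" for +cu118 wheels
            "cudnn": torch.backends.cudnn.version(),   # e.g. 8700 for cuDNN 8.7
        }
    except ImportError:
        return {"torch": None, "cuda": None, "cudnn": None}

print(torch_build_info())
```

If the `cudnn` field still reports 85xx after installing, you are on a cu117 (cuDNN 8.5) wheel.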
It really seems to me that the large wheel is intended for nightly-build developers. I just checked and found the reason the torch shared libraries differ so much in size between the small and large wheels: the large ones carry “debug_info”, and they also link things like the NCCL library directly into libtorch_cuda.so. But a spot check of several common functions in the large and small wheels shows the function sizes are identical, so I don’t have to worry that they weren’t optimized the same way.
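The check described above can be sketched with standard binutils. A toy library is used here so the commands are self-contained; the same `readelf`/`nm` invocations apply to `libtorch_cuda.so` unpacked from the small and large wheels (those paths are assumptions, not from the actual wheel layout):

```shell
# Build the same code twice: once without debug info ("small"), once with -g ("large").
cat > demo.c <<'EOF'
int add(int a, int b) { return a + b; }
EOF
cc -O2 -fPIC -shared -o libdemo_small.so demo.c
cc -O2 -g -fPIC -shared -o libdemo_large.so demo.c

# The large build carries .debug_* sections that account for the size difference.
readelf -S libdemo_large.so | grep -c debug

# The compiled functions themselves have identical sizes in both builds,
# so both were optimized the same way.
nm -S --defined-only libdemo_small.so | grep ' add$'
nm -S --defined-only libdemo_large.so | grep ' add$'
```

`strip --strip-debug` on the large build would bring the two file sizes back in line without touching the code sections.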