Welcome to the 1st issue of PyTorch Weekly, a weekly newsletter covering development in the PyTorch AI development platform. You can subscribe to the newsletter at pytorchweekly@freelists.org or Lingcc/pytorchweekly (github.com).
News and articles from around the web and events
- Hidet is introduced on the PyTorch blog as a deep learning compiler for efficient model serving. Triton and Hidet Script both allow tensor program developers to easily handle the tile-based programming model. Compared to Triton, Hidet Script simplifies tensor programming by handling the manipulation of fine-grained computation and memory resources (e.g., warps, shared memory).
- TorchBench is introduced by Yueming Hao and colleagues from Meta Platforms, Inc. TorchBench is a novel benchmark suite for studying the performance of the PyTorch software stack; it has been used to identify GPU performance inefficiencies in PyTorch and has also been integrated into the PyTorch continuous integration system.
- Towards Data Science published an amazing article, Build your own Transformer from scratch using Pytorch, written by Arjun Sarkar. It teaches the reader to build a transformer model step by step in PyTorch (a compressed sketch appears after this list).
- The latest PyTorch 2.0 Ask the Engineers Q&A Series brought TorchRL by Vincent Moens and Shashank Prasanna from Meta.
- Zachary DeVito contributed a post to the PyTorch Forum about Fast combined C++/Python/TorchScript/Inductor tracebacks.
- David Stutz proposed a way of Loading and Saving PyTorch Models Without Knowing the Architecture in Advance (see the sketch after this list).
- Want to check the differences between PyTorch and JAX? Check out JAX vs. PyTorch: Differences and Similarities [2023] (a tiny illustration follows this list).
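For readers who want a taste of the transformer article, here is a compressed sketch of a single encoder block, with arbitrary sizes; it leans on nn.MultiheadAttention for brevity, whereas the article builds attention itself from scratch.

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """One transformer encoder block: self-attention + feed-forward."""
    def __init__(self, d_model: int = 512, n_heads: int = 8, d_ff: int = 2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x)   # self-attention over the sequence
        x = self.norm1(x + attn_out)       # residual connection + layer norm
        return self.norm2(x + self.ff(x))  # position-wise feed-forward sublayer

block = EncoderBlock()
out = block(torch.randn(2, 16, 512))  # (batch, sequence, d_model)
```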
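On David Stutz's post: a common pattern for this problem is to store the constructor arguments next to the state dict so the model can be rebuilt at load time. This is a minimal sketch of that idea under hypothetical names (MLP, ckpt.pt), not necessarily the exact method from the post.

```python
import torch
import torch.nn as nn

class MLP(nn.Module):  # hypothetical example model
    def __init__(self, in_dim: int, hidden: int, out_dim: int):
        super().__init__()
        # remember the constructor arguments so the model can be rebuilt later
        self.kwargs = dict(in_dim=in_dim, hidden=hidden, out_dim=out_dim)
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, out_dim)
        )

    def forward(self, x):
        return self.net(x)

model = MLP(784, 256, 10)
torch.save(
    {"cls": MLP, "kwargs": model.kwargs, "state": model.state_dict()}, "ckpt.pt"
)

ckpt = torch.load("ckpt.pt")
restored = ckpt["cls"](**ckpt["kwargs"])  # rebuild without hard-coding the class
restored.load_state_dict(ckpt["state"])
```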
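One difference such comparisons typically highlight, shown here as a tiny illustration rather than a summary of the article: PyTorch computes gradients imperatively through autograd, while JAX differentiates pure functions.

```python
import torch
import jax

# PyTorch: gradients attach to tensors via an implicit tape.
x = torch.tensor(3.0, requires_grad=True)
(x ** 2).backward()
print(x.grad)  # tensor(6.)

# JAX: gradients come from transforming a pure function.
f = lambda x: x ** 2
print(jax.grad(f)(3.0))  # 6.0
```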
On the forums and mailing lists
- The Run PyTorch on Multiple GPUs thread became active again after SM2023 tried to fine-tune the GPT-2 model on multiple GPUs. Running a model on multiple GPUs is not easy to handle, especially for load balancing and parallel optimizations; newcomers are always recommended to go through the Multi-GPU examples tutorial (a minimal sketch follows this list). Thanks to ptrblck.
- According to Would pytorch for cuda 11.6 work when cuda is actually 12.0, the PyTorch binaries currently ship directly with their own CUDA, cuDNN, cuBLAS, etc., and use 11.7 or 11.8 by default. Only when PyTorch is built from source will it use the locally installed CUDA toolkit. You are recommended to use the official install method (the snippet after this list shows how to check the bundled versions).
- Result reproducibility is always a headache for ML training. The thread Different training results on different machines has lasted for more than two years discussing this, and the PyTorch doc Reproducibility also notes that PyTorch does not guarantee completely reproducible results. The thread added a newly found difference between Windows and Linux that can cause unreproducible results: os.listdir and glob.glob produce an ordered list by default on Windows, whereas Linux returns the files in arbitrary order, which leads to different results (see the sketch after this list).
- JOROZCO proposed a way to convert a PyTorch model to ONNX format (a minimal export sketch follows this list).
- How to fix “CUDA error: device-side assert triggered” error? introduced CUDA_LAUNCH_BLOCKING=1 to disable asynchronous kernel launches (see the snippet after this list).
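On the multi-GPU thread: a minimal sketch of the data-parallel path covered in the Multi-GPU examples tutorial, with nn.Linear standing in for GPT-2 or any other module. nn.DataParallel is the simplest entry point; DistributedDataParallel is generally preferred for real workloads.

```python
import torch
import torch.nn as nn

model = nn.Linear(512, 10)  # stand-in for any nn.Module
if torch.cuda.device_count() > 1:
    # replicate the module on each GPU and split every batch across them
    model = nn.DataParallel(model)
model = model.to("cuda:0")

x = torch.randn(64, 512, device="cuda:0")
out = model(x)  # rows of the batch are scattered across the visible GPUs
```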
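On the CUDA version question: the versions bundled with a PyTorch binary can be checked directly from Python, independent of whatever toolkit is installed system-wide.

```python
import torch

print(torch.__version__)               # wheel version, e.g. 2.0.1+cu118
print(torch.version.cuda)              # CUDA version the binary was built with
print(torch.backends.cudnn.version())  # bundled cuDNN version
print(torch.cuda.is_available())       # whether a usable GPU was found
```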
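On the reproducibility thread: a minimal sketch of the usual mitigations, combining the seeding steps from the Reproducibility doc with an explicit sort of directory listings so data order does not depend on the operating system. The "data" directory name is hypothetical.

```python
import os
import random
import numpy as np
import torch

def seed_everything(seed: int = 0) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # seeds CPU and all CUDA devices
    torch.use_deterministic_algorithms(True)  # raise on nondeterministic ops

seed_everything(42)

# Sort listings explicitly: Windows happens to return an ordered list,
# Linux does not, so relying on the default order is not portable.
files = sorted(os.listdir("data"))
```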
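On the ONNX conversion post: a minimal sketch of the standard torch.onnx.export path; resnet18 stands in for any module and the output file name is arbitrary.

```python
import torch
import torchvision

model = torchvision.models.resnet18(weights=None).eval()
dummy = torch.randn(1, 3, 224, 224)  # example input used to trace the graph
torch.onnx.export(
    model,
    dummy,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}},  # allow a variable batch size
)
```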
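On the device-side assert question: because CUDA kernels launch asynchronously, the error often surfaces at an unrelated line, and CUDA_LAUNCH_BLOCKING=1 makes launches synchronous so the traceback points at the failing op.

```python
# Either export it in the shell before running:
#   CUDA_LAUNCH_BLOCKING=1 python train.py
# or set it in Python before CUDA is initialized:
import os

os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # the import (and all CUDA work) must come after the variable is set
```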
PyTorch commit highlights
- The PyTorch main development branch changed from master to main.
- The CUDA 12.1 build is enabled again on Windows.
- Plenty of Dynamo and Triton bug fixes and improvements, such as: add support for serializing real tensor data in the after-AOT minifier, basic Dynamo support for traceable collectives, and introduce FXGraphExtractor into torch.onnx.dynamo_export.
- Dan Dale fixed a CPU offload performance issue for ShardedGradScaler. The performance analysis behind this work is amazing.
- Related changes to remove CUDA 11.6 support.
- Improved the debug method for after-AOT accuracy debugging.
- Improved new architecture support: making FSDP device-agnostic for custom backends that implement CUDA semantics, and a new hook for the MTIA architecture.
- Optimized EMA implementation (a generic EMA sketch follows this list).
- Updated CUTLASS to v3.1.
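For context on the EMA commit: a generic sketch of an exponential moving average over model weights. This illustrates the technique itself, not the optimized implementation that landed.

```python
import copy
import torch

@torch.no_grad()
def ema_update(ema_model, model, decay: float = 0.999):
    # ema = decay * ema + (1 - decay) * current, parameter by parameter
    for ema_p, p in zip(ema_model.parameters(), model.parameters()):
        ema_p.mul_(decay).add_(p, alpha=1.0 - decay)

model = torch.nn.Linear(8, 2)
ema_model = copy.deepcopy(model)  # shadow copy updated alongside training
ema_update(ema_model, model)
```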
Other project and company weekly update highlights
- Modular AI announced its two initial products. The first is what it calls the fastest unified AI inference engine in the world. The second is Mojo, a new programming language for all AI developers.