CUDA Graphs with FSDP 2 and flex attention? | 0 replies | 162 views | October 3, 2025
Module 'torch.library' has no attribute 'custom_op' | 6 replies | 2470 views | September 24, 2025
How many patterns does torch compile match and replace for now? | 0 replies | 129 views | September 24, 2025
PyTorch Error: Compiler: cl is not found | 4 replies | 4047 views | September 23, 2025
Torch.compile + flex_attention with custom score_mod fails (FlexibleLayout / NoValidChoicesError) | 0 replies | 215 views | September 19, 2025
Are there any related works on "megakernel"? | 0 replies | 110 views | September 17, 2025
In Dynamo+AOTAutograd, why run Faketensor through the code multiple times? | 0 replies | 80 views | September 15, 2025
Better understanding why AOTAutograd decomposes `fused_rms_norm_backward` for CUDA, but not for Meta tensors | 0 replies | 71 views | September 15, 2025
Compile: `DataLoader.__setattr__` should not be traced | 2 replies | 94 views | September 13, 2025
Torch.compile error on backward | 0 replies | 68 views | September 12, 2025
Do I need to rerun torch.compile between training and inference? | 1 reply | 119 views | September 9, 2025
Autofunctionalize v2 / Reinplace | 0 replies | 92 views | September 9, 2025
Torch.compile does not work after first graph break | 0 replies | 65 views | September 5, 2025
How to understand the files generated by TORCH_COMPILE_DEBUG | 0 replies | 80 views | September 4, 2025
Dynamo logs generated by torch.distributed | 5 replies | 169 views | September 3, 2025
Reducing compiled call overhead | 0 replies | 137 views | August 29, 2025
How can we reliably figure out what the parameters of a generated PTX kernel are with torch.compile()? | 0 replies | 61 views | August 25, 2025
C++/Cuda / aot.compile and cuda graph | 2 replies | 628 views | August 23, 2025
Training with flex attention is extremely slow due to torch.compile settings | 0 replies | 501 views | August 22, 2025
Is it possible that node's name can change during torch.compile? | 0 replies | 53 views | August 22, 2025
Torch.compile Numpy code throws mean() arguments error | 0 replies | 145 views | August 21, 2025
Using RNG Generator in torch.compile | 2 replies | 311 views | August 20, 2025
How to turn off inlining / force materialization in TorchInductor during torch.compile? | 0 replies | 49 views | August 16, 2025
Some questions about torch.compile | 0 replies | 68 views | August 15, 2025
`torch.compile` (w/ Torch Inductor) benchmarks/models for Multi GPU | 0 replies | 154 views | August 7, 2025
CUDA Graph Error with Residual Connections in `torch.compile` (RuntimeError: accessing tensor output of CUDAGraphs) | 0 replies | 232 views | August 1, 2025
The CUDA kernel produces different results when running in CUDA Graph mode compared to non-CUDA Graph mode | 0 replies | 71 views | July 30, 2025
Skip dynamo when using 'cudagraph' backend | 0 replies | 58 views | July 28, 2025
Getting the fx graph of submodules, instead of 'call_module' nodes? | 1 reply | 819 views | July 24, 2025
Using megacache vs saving /tmp/torchinductor_root | 0 replies | 253 views | July 21, 2025