| Topic | Replies | Views | Activity |
|---|---|---|---|
| Scaled_dot_product_attention higher head num cost much more memory | 1 | 16 | November 28, 2024 |
| CUDA memory allocation for result tensor | 0 | 40 | November 26, 2024 |
| Compile and vmap in custom op with quantile | 0 | 25 | November 25, 2024 |
| Compiling vmapped custom op | 5 | 109 | November 25, 2024 |
| Closures are being gc'd and causing failures to compile | 1 | 40 | November 24, 2024 |
| Why does the inductor reduction Triton Codegen use the Welford algorithm instead of the Naive? | 1 | 28 | November 20, 2024 |
| Image_process.postprocess slow after torch.compile | 0 | 54 | November 18, 2024 |
| Compiling a method other than forward | 2 | 41 | November 19, 2024 |
| Error module torchvision in CUDA 11.4 | 2 | 78 | November 19, 2024 |
| Increased memory footprint with custom kernel and all reduce | 2 | 44 | November 18, 2024 |
| The forward graphs captured by torch.export and aot_export_module are different | 2 | 56 | November 17, 2024 |
| Dynamic slicing torch.export | 2 | 110 | November 16, 2024 |
| Discrepancies Between Compiled and Non-Compiled Models with Convolutional Layers in PyTorch | 1 | 49 | November 16, 2024 |
| Any chance to preserve some ops while decomposing PT2E model? | 1 | 24 | November 16, 2024 |
| Multiple compiled versions of the same model | 2 | 67 | November 16, 2024 |
| Torch.compile - what is the best scope of compilation? | 7 | 2462 | November 16, 2024 |
| Dynamo Trace with Parameter Lifting | 1 | 65 | November 16, 2024 |
| AOTInductor autograd support | 1 | 40 | November 16, 2024 |
| Is it possible to ignore part of the code for torch compile | 4 | 92 | November 16, 2024 |
| Inconsistent Results with torch.compile on Identical Environments and GPUs | 3 | 87 | November 8, 2024 |
| Torch compile with forward-mode automatic differentiation | 2 | 27 | November 6, 2024 |
| Profiling torch.compile CUDA code | 3 | 107 | November 5, 2024 |
| Handling LSTM states as model's inputs/outputs using fx.symbolic_trace | 0 | 14 | November 5, 2024 |
| How to limit torch.compile to CPU only? | 3 | 171 | November 1, 2024 |
| Does max-autotune is useful for A5000? | 0 | 19 | October 31, 2024 |
| What caused matrix inputs to no longer be transposed in PyTorch 2.5? | 0 | 36 | October 29, 2024 |
| Torch.fx.symbolic_trace with multiple GPUs | 1 | 50 | October 29, 2024 |
| CUDA 12.6 and torch.cuda.is_available return false | 7 | 1661 | October 28, 2024 |
| Slow convolutions in triton | 1 | 411 | October 25, 2024 |
| Flex attention - gaps in profiler | 0 | 102 | October 22, 2024 |