| Topic | Replies | Views | Activity |
|---|---|---|---|
| Composite RoPE backward gives a large ToCopyBackward0 in profiling trace | 3 | 79 | December 27, 2024 |
| Extra GPU usage on custom Qwen2-VL | 0 | 230 | October 28, 2024 |
| What's the best way to resize tensors for alignment purpose? | 3 | 75 | October 23, 2024 |
| How to liberate CUDA Memory succesfully? | 1 | 219 | December 11, 2024 |
| Is Intel® Iris Xe Graphics compatible with torch==2.6.0? | 0 | 162 | February 11, 2025 |
| Flex attention benchmarking | 0 | 185 | November 2, 2024 |
| Torch 2.7.0 Segmentation fault on import on linux | 3 | 78 | June 6, 2025 |
| Extremely slow training, high single CPU usage | 1 | 195 | January 5, 2025 |
| Cutlass kernel causes no grad in backward | 1 | 171 | December 12, 2024 |
| Python kernel crashes with GPU | 1 | 179 | September 28, 2024 |
| Why does autograd.backward go one edge further than `inputs`? | 3 | 51 | July 8, 2025 |
| Subprocess groups w/ DeviceMesh Blocking | 2 | 133 | May 3, 2025 |
| Is it possible to keep a chunk of continuous gpu memory (e.g. 20G) in DDP mode for gradient synchronization? | 5 | 63 | December 3, 2024 |
| Efficiently Combine Two Tensors Based on a Boolean Mask | 1 | 107 | November 18, 2024 |
| Inconsistent Output for Identical Inputs When Using Linear Projection with Different squence length | 4 | 75 | June 16, 2025 |
| Autograd about CUDAExtension | 4 | 119 | May 13, 2025 |
| Question about default allowed globals | 0 | 155 | May 3, 2025 |
| Differentiating With Respect to Learning Rate | 3 | 170 | February 12, 2025 |
| DDP with imbalanced loss values | 2 | 112 | May 17, 2025 |
| Checking what autodiff is differentiating | 1 | 147 | November 28, 2024 |
| Forward function not being compiled by default | 1 | 178 | December 17, 2024 |
| How to add custom operators to export_for_training? | 1 | 101 | December 12, 2024 |
| Which activation function for multi-class classification gives true probability? | 1 | 106 | November 17, 2024 |
| Tensor and torch problem | 8 | 46 | April 18, 2025 |
| Model works on CPU but breaks when passed to cuda | 3 | 109 | March 9, 2025 |
| Gradcheck fails for custom activation function | 3 | 97 | January 26, 2025 |
| Contrastive Learning Loss Function | 0 | 208 | December 13, 2024 |
| Are batchnorm buffers handled automatically by DataParallel? | 4 | 169 | December 9, 2024 |
| Accessing state_dicts in c++ | 4 | 77 | November 11, 2024 |
| Pytorch Build Error | 2 | 95 | July 24, 2025 |
| Torch.distributed.all_reduce causes memory trashing | 2 | 95 | January 6, 2025 |
| Request for Support in Integrating the RISC-V Vector ISA Extension | 2 | 95 | October 31, 2024 |
| v2.Compose([v2.ToImage(), v2.ToDtype(torch.float32, scale=True)]) return Image instead of Tensor? | 1 | 115 | March 26, 2025 |
| Cpp torch.sparse usage? | 1 | 201 | December 26, 2024 |
| Run backward without accumulating gradients | 1 | 178 | October 25, 2024 |
| Setting max_split_size_mb | 0 | 171 | January 30, 2025 |
| Gemma 3 throws RuntimeError CUDA misaligned address | 1 | 99 | June 3, 2025 |
| Can you do bf16 x bf16 -> FP32 matmul? | 1 | 100 | February 7, 2025 |
| Embedded Python can't import torch in a C++ project | 1 | 107 | December 1, 2024 |
| Understanding "mask" dtype from TransformerEncoder forward | 1 | 206 | September 18, 2024 |
| Model loss not decreasing even after increasing learning rate | 3 | 79 | July 22, 2025 |
| Can't run forward pass of WaveRNN model due to unsuccessful GPU RAM allocation | 3 | 103 | May 24, 2025 |
| Bug on running TorchScript on H100 | 3 | 101 | April 13, 2025 |
| Bad reconstruction with LSTM | 3 | 175 | February 5, 2025 |
| Is it possible to speedup the value assignment process in cuda tensor? | 3 | 212 | December 2, 2024 |
| Add custom distributed backend | 2 | 91 | June 3, 2025 |
| LSTM model does not change predictions | 2 | 94 | January 6, 2025 |
| Increase computational cost while repeating use the trained CNN models | 6 | 65 | April 22, 2025 |
| New AI TTS System with Emotion Control | 0 | 207 | January 16, 2025 |
| Debugging slow TorchDynamo Cache Lookup | 1 | 114 | March 18, 2025 |