Right way to insert QuantStub and DeQuantStub in eager mode quantization | 6 | 75 | April 12, 2025
Making autograd saved tensors hooks specific to certain arguments | 7 | 107 | April 14, 2025
Torch.cuda.max_memory_allocated() raise torch._dynamo.exc.BackendCompilerFailed | 1 | 205 | October 21, 2024
TorchRL cpu-only installation | 4 | 270 | February 28, 2025
Dino training- input image | 4 | 296 | December 9, 2024
Compiled matmul is slower than vanilla matmul | 1 | 87 | April 24, 2025
Errors when running Llama3 8b on executorch QNN backend | 3 | 315 | December 2, 2024
RuntimeError: Unexpected error from cudaGetDeviceCount(); Error 802: system not yet initialized | 1 | 146 | April 22, 2025
Run inference manually on a pytorch model layer by layer | 1 | 124 | September 12, 2024
Custom pruning method for Global Unstructured Pruning | 2 | 124 | December 6, 2024
Initializing tensor inside custom loss fn causes cuda memory err | 5 | 122 | March 12, 2025
Torch.compile execution slower (or at par) with eager execution | 0 | 178 | October 20, 2024
Torch.compile: Generated Triton kernel seems wrong | 4 | 148 | February 4, 2025
Torch not enabled with CUDA | 6 | 67 | June 6, 2025
Cuda is not working | 6 | 259 | September 20, 2024
Model performs well on both validation and testing dataset, when saved and load, performs poor on the same dataset | 6 | 70 | August 4, 2024
How to remove backpropagation for specific tokens from the output of a transformer decoder only? | 7 | 86 | January 6, 2025
Slow convolutions on CPU with autocast | 2 | 232 | December 14, 2024
Issues with custom torch.autograd.Function and custom jvp method | 2 | 109 | August 13, 2024
How to save torch.compile so we don't need to re-compile | 1 | 127 | January 26, 2025
Question about quantized model save & load | 5 | 120 | April 18, 2025
Global structure pruning in pytorch | 5 | 269 | September 30, 2024
Conflict between `dataclass` and `nn.Module` | 4 | 178 | December 21, 2024
Optimize training data after training stage | 4 | 275 | October 14, 2024
Compiling stack of Conv3d increases runtime with redundant format conversions | 4 | 100 | August 7, 2024
Summation of a tensor is giving -inf as output even though no element has -inf value in it | 3 | 89 | September 12, 2024
PyTorch compatible GPU rental? | 2 | 125 | March 26, 2025
How should we use Single GPU for validation while doing multigpu training using DDP | 2 | 98 | January 10, 2025
DDP training get slower than first few iteration | 2 | 231 | November 1, 2024
Computing the trace of the jacobian of the score function | 2 | 114 | September 13, 2024
Inconsistencies between PyTorch and NumPy when performing 32-bit floating-point sums | 4 | 76 | November 13, 2024
Torch version + cuda stable compatibility details: | 2 | 92 | February 25, 2025
Segmentation fault (core dumped) in torch 2.1.0 | 3 | 101 | July 1, 2025
Need help setting up mask rcnn | 8 | 90 | April 28, 2025
QAT model is not performing as expected when compared to the original model | 7 | 81 | April 9, 2025
Reduce time to first kernel when using CUDA graphs | 1 | 196 | December 19, 2024
What is accumulating in /tmp/torchinductor_{USER}/triton? | 1 | 168 | October 7, 2024
How to resume training when using DataLoader persistent_workers | 1 | 236 | August 26, 2024
Why is a tensor non-contiguous on one machine, but contiguous on another? | 2 | 94 | August 1, 2024
Can someone help me understand how to use a .pt file? | 2 | 100 | July 16, 2024
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation when using ReLU(inplace=False) | 4 | 72 | November 27, 2024
Can't build 2.3.1 from source | 3 | 118 | April 3, 2025
Use Executorch's Module Extension API on arm64 Mac | 3 | 237 | January 8, 2025
How to Implement Flash Attention in a Pre-Trained BERT Model on custom dataset? | 0 | 154 | January 8, 2025
What caused this in-place modification error? | 3 | 97 | August 4, 2024
Stacked images look different converted from Tensor in dataloader | 3 | 232 | July 12, 2024
Partial cuda graphs are slowere than original model? | 5 | 194 | December 12, 2024
Loss.backward() not updating model parameters | 5 | 83 | November 20, 2024
Why kernels different streams can't in parallel | 1 | 112 | January 20, 2025
Error in PyInstaller executable with Pytorch | 0 | 236 | July 12, 2024