| FSDP OOM when forwarding 7B model on 16k context length text | 0 | 63 | March 14, 2025 |
| Full finetune, LoRA and feature extraction take the same amount of memory and time to train | 0 | 65 | March 14, 2025 |
| Pytorch OCR models for deploying to ESP32? | 0 | 185 | March 12, 2025 |
| How to train two independent networks | 2 | 140 | March 4, 2025 |
| How to implement skip-gram or CBOW in pytorch | 10 | 12159 | March 4, 2025 |
| How to handle last batch in LSTM hidden state | 8 | 6443 | February 22, 2025 |
| Slow attention when using kvCache | 1 | 109 | February 21, 2025 |
| Why facing "CUDA error: device-side assert triggered" while training LSTM model? | 5 | 110 | February 14, 2025 |
| LSTM for classification (fraud detection) over several lines of text | 0 | 165 | February 7, 2025 |
| Importing torchtext | 1 | 440 | February 3, 2025 |
| Left / right side padding | 0 | 59 | February 1, 2025 |
| Feed a model with cumulative sum of sampled classified sequences | 0 | 52 | January 30, 2025 |
| TransformerDecoder masks shape error using model.eval() | 3 | 330 | January 27, 2025 |
| What is the right way to structure `input` and `label` while fine-tuning decoder only model | 0 | 50 | January 27, 2025 |
| combining TEXT.build_vocab with BERT Embedding | 0 | 98 | January 27, 2025 |
| Multi-node, Multi-gpu training | 0 | 128 | January 24, 2025 |
| Why my Traing accuracy remains constant | 2 | 210 | January 20, 2025 |
| My Accuracy remains constant | 1 | 115 | January 18, 2025 |
| Getting NaN training and validation loss when training BERT model on pytorch | 2 | 240 | January 17, 2025 |
| How to properly apply causal mask for next char prediction in MLP | 1 | 547 | January 10, 2025 |
| Documents as parametric memory | 0 | 124 | January 11, 2025 |
| Need help with Recurrent lstms | 0 | 54 | January 10, 2025 |
| Embedding a float into a vector for transformer models | 1 | 213 | January 7, 2025 |
| Building a Model for Multi-Output Embedding Generation: Seeking Advice and Insights | 0 | 89 | January 4, 2025 |
| Is the code correct for character level generation in lstm? | 12 | 1604 | December 27, 2024 |
| Correct way to batch custom masks in SDPA | 0 | 89 | December 12, 2024 |
| Weight Decay for tied weights (embedding and linear layers) | 1 | 1369 | December 10, 2024 |
| RuntimeError: CUDA error: device-side assert triggered CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect | 10 | 116006 | December 7, 2024 |
| Model performance decrease to nearly 1/4 when loading a checkpoint, but works fine for "simpler" data and in-script | 5 | 1886 | December 6, 2024 |
| Help Needed: Transformer Model Repeating Last Token During Inference | 3 | 867 | December 5, 2024 |