| Topic | Replies | Views | Activity |
|---|---|---|---|
| About the nlp category | 2 | 3193 | November 30, 2022 |
| How can I successfully fine-tune a pruned LLM? | 6 | 23 | July 9, 2025 |
| Very slow training with nn.Embedding | 2 | 25 | July 8, 2025 |
| Multi-Head Self Attention in Transformer is permutation-invariant or equivariant how to see it in practice? | 4 | 49 | July 3, 2025 |
| Inconsistent Output for Identical Inputs When Using Linear Projection with Different sequence length | 4 | 63 | June 16, 2025 |
| Using a bidirectional nn.GRU Gated Recurrent Unit understand forwarding process | 0 | 11 | June 12, 2025 |
| Gemma 3 throws RuntimeError CUDA misaligned address | 1 | 56 | June 3, 2025 |
| A Simple LSTM stuck into label flipping | 4 | 45 | May 15, 2025 |
| Flex Attention for full score_mod matrix | 0 | 68 | April 28, 2025 |
| Right vs Left Padding | 7 | 10153 | April 17, 2025 |
| What's a good replacement for torchtext? | 2 | 685 | April 2, 2025 |
| (When using multiple GPUs) RuntimeError: NCCL Error 1: unhandled cuda error (run with NCCL_DEBUG=INFO for details) | 6 | 2131 | March 19, 2025 |
| Transformer Stuck in Local Minima Occasionally | 0 | 51 | March 18, 2025 |
| Initial D_KL loss is high and going down really slow | 0 | 24 | March 15, 2025 |
| FSDP OOM when forwarding 7B model on 16k context length text | 0 | 29 | March 14, 2025 |
| Full finetune, LoRA and feature extraction take the same amount of memory and time to train | 0 | 28 | March 14, 2025 |
| Pytorch OCR models for deploying to ESP32? | 0 | 102 | March 12, 2025 |
| How to train two independent networks | 2 | 67 | March 4, 2025 |
| How to implement skip-gram or CBOW in pytorch | 10 | 11837 | March 4, 2025 |
| How to handle last batch in LSTM hidden state | 8 | 6304 | February 22, 2025 |
| Slow attention when using kvCache | 1 | 63 | February 21, 2025 |
| Why facing "CUDA error: device-side assert triggered" while training LSTM model? | 5 | 43 | February 14, 2025 |
| LSTM for classification (fraud detection) over several lines of text | 0 | 131 | February 7, 2025 |
| Importing torchtext | 1 | 302 | February 3, 2025 |
| Left / right side padding | 0 | 26 | February 1, 2025 |
| Feed a model with cumulative sum of sampled classified sequences | 0 | 26 | January 30, 2025 |
| TransformerDecoder masks shape error using model.eval() | 3 | 220 | January 27, 2025 |
| What is the right way to structure `input` and `label` while fine-tuning decoder only model | 0 | 20 | January 27, 2025 |
| combining TEXT.build_vocab with BERT Embedding | 0 | 60 | January 27, 2025 |
| Multi-node, Multi-gpu training | 0 | 92 | January 24, 2025 |