Topic | Replies | Views | Activity
About the nlp category | 2 | 3196 | November 30, 2022
Left padded transformer input with causal mask | 1 | 670 | July 21, 2025
Training freezes without any reason | 5 | 1745 | July 21, 2025
Tensorflow-esque bucket by sequence length | 27 | 16578 | July 21, 2025
Obtaining outputs and attention weights from intermediate Transformer layers | 8 | 7123 | July 17, 2025
How can I successfully fine-tune a pruned LLM? | 6 | 74 | July 9, 2025
Cannot reproduce BERT training results despite following all reproducibility guidelines | 3 | 768 | July 16, 2025
Very slow training with nn.Embedding | 2 | 46 | July 8, 2025
Is Multi-Head Self Attention in Transformer permutation-invariant or equivariant, and how to see it in practice? | 4 | 60 | July 3, 2025
Inconsistent Output for Identical Inputs When Using Linear Projection with Different Sequence Length | 4 | 68 | June 16, 2025
Using a bidirectional nn.GRU (Gated Recurrent Unit): understanding the forwarding process | 0 | 11 | June 12, 2025
Gemma 3 throws RuntimeError CUDA misaligned address | 1 | 75 | June 3, 2025
A Simple LSTM stuck in label flipping | 4 | 47 | May 15, 2025
Flex Attention for full score_mod matrix | 0 | 78 | April 28, 2025
Right vs Left Padding | 7 | 10330 | April 17, 2025
What's a good replacement for torchtext? | 2 | 755 | April 2, 2025
(When using multiple GPUs) RuntimeError: NCCL Error 1: unhandled cuda error (run with NCCL_DEBUG=INFO for details) | 6 | 2207 | March 19, 2025
Transformer Stuck in Local Minima Occasionally | 0 | 55 | March 18, 2025
Initial D_KL loss is high and going down really slowly | 0 | 25 | March 15, 2025
FSDP OOM when forwarding 7B model on 16k context length text | 0 | 32 | March 14, 2025
Full finetune, LoRA and feature extraction take the same amount of memory and time to train | 0 | 32 | March 14, 2025
PyTorch OCR models for deploying to ESP32? | 0 | 108 | March 12, 2025
How to train two independent networks | 2 | 71 | March 4, 2025
How to implement skip-gram or CBOW in PyTorch | 10 | 11860 | March 4, 2025
How to handle last batch in LSTM hidden state | 8 | 6307 | February 22, 2025
Slow attention when using kvCache | 1 | 66 | February 21, 2025
Why am I facing "CUDA error: device-side assert triggered" while training LSTM model? | 5 | 45 | February 14, 2025
LSTM for classification (fraud detection) over several lines of text | 0 | 134 | February 7, 2025
Importing torchtext | 1 | 327 | February 3, 2025
Left / right side padding | 0 | 26 | February 1, 2025