Latest nlp topics - PyTorch Forums

Topic	Replies	Views	Activity
About the nlp category	2	3235	November 30, 2022
Multihead Attention in_proj is initialized inconsistently	3	35	October 29, 2025
How to Implement Flash Attention in a Pre-Trained BERT Model on custom dataset?	1	203	October 29, 2025
Utilizing GPUs for a machine-translation model	0	9	October 14, 2025
RuntimeError: The size of tensor a (2) must match the size of tensor b (0) at non-singleton dimension 1	0	21	October 6, 2025
RNN isn't learning, unsure what I'm doing wrong	15	186	August 21, 2025
A question for batch-training RNN	3	40	August 19, 2025
What are some common reasons that loss may increase towards the end of an epoch?	2	39	August 18, 2025
nn.Transformer explaination	39	19053	August 18, 2025
Tensorflow-esque bucket by sequence length	28	16803	August 6, 2025
Left padded transformer input with causal mask	1	725	July 21, 2025
Freezing training without no reason	5	1845	July 21, 2025
Obtaining outputs and attention weights from intermediate Transformer layers	8	7301	July 17, 2025
How can I successfully fine-tune a pruned LLM?	6	143	July 9, 2025
Cannot reproduce BERT training results despite following all reproducibility guideness	3	786	July 16, 2025
Very slow training with nn.Embedding	2	71	July 8, 2025
Multi-Head Self Attention in Transformer is permutation-invariant or equivariant how to see it in practice?	4	96	July 3, 2025
Inconsistent Output for Identical Inputs When Using Linear Projection with Different squence length	4	79	June 16, 2025
Using a bidirectional nn.GRU Gated Recurrent Unit understand forwarding process	0	18	June 12, 2025
Gemma 3 throws RuntimeError CUDA misaligned address	1	116	June 3, 2025
A Simple LSTM stuck into label flipping	4	72	May 15, 2025
Flex Attention for full score_mod matrix	0	95	April 28, 2025
Right vs Left Padding	7	10936	April 17, 2025
What's a good replacement for torchtext?	2	995	April 2, 2025
(When using multiple GPUs) RuntimeError: NCCL Error 1: unhandled cuda error (run with NCCL_DEBUG=INFO for details)	6	2404	March 19, 2025
Transformer Stuck in Local Minima Occasionally	0	63	March 18, 2025
Initial D_KL loss is high and going down really slow	0	33	March 15, 2025
FSDP OOM when forwarding 7B model on 16k context length text	0	48	March 14, 2025
Full finetune, LoRA and feature extraction take the same amount of memory and time to train	0	39	March 14, 2025
Pytorch OCR models for deploying to ESP32?	0	127	March 12, 2025