| Topic | Replies | Views | Activity |
|---|---|---|---|
| About the nlp category | 2 | 3042 | November 30, 2022 |
| Correct way to batch custom masks in SDPA | 0 | 2 | December 12, 2024 |
| Weight Decay for tied weights (embedding and linear layers) | 1 | 824 | December 10, 2024 |
| RuntimeError: CUDA error: device-side assert triggered CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect | 10 | 104015 | December 7, 2024 |
| Model performance decrease to nearly 1/4 when loading a checkpoint, but works fine for "simpler" data and in-script | 5 | 1531 | December 6, 2024 |
| Embedding a float into a vector for transformer models | 0 | 16 | December 5, 2024 |
| Help Needed: Transformer Model Repeating Last Token During Inference | 3 | 103 | December 5, 2024 |
| Understanding logits in GPT2 | 0 | 9 | December 5, 2024 |
| Flex_attention returning logits | 0 | 5 | December 4, 2024 |
| Unable to import torchtext (from torchtext.datasets import IMDB from torchtext.vocab import vocab) | 4 | 891 | December 1, 2024 |
| How does one set the pad token correctly (not to eos) during fine-tuning to avoid model not predicting EOS? | 0 | 40 | November 29, 2024 |
| How to compute the Validation loss | 2 | 16 | November 24, 2024 |
| Computation of nn.Linear and nn.Embedding | 1 | 21 | November 22, 2024 |
| Log softmax probabilities all equal in rnn decoder because pointer network scores are all < -90.0 | 0 | 15 | November 19, 2024 |
| How to correct TypeError: zip argument #1 must support iteration training in multiple GPU | 6 | 1099 | November 13, 2024 |
| Training starting again in sampling code | 3 | 18 | November 9, 2024 |
| AutoModelForCausalLM dataset process | 1 | 31 | November 9, 2024 |
| Can someone explain the benefits of Batches? | 2 | 46 | November 8, 2024 |
| Teacher forcing ratio | 0 | 27 | November 8, 2024 |
| Search in documents | 2 | 22 | November 7, 2024 |
| Torch using two GPUs with NV link | 8 | 86 | November 5, 2024 |
| Could not get the file at http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/training.tar.gz. [RequestException] None | 6 | 1925 | October 29, 2024 |
| Regarding Scaled Dot Product Attention | 4 | 95 | October 25, 2024 |
| Memory Leak with a simple code | 3 | 42 | October 22, 2024 |
| Build Auto Tagging System | 7 | 40 | October 22, 2024 |
| Why transformer model is predicting only one random word repetatively in every iteration | 1 | 30 | October 19, 2024 |
| LogSoftmax vs Softmax | 26 | 54103 | October 15, 2024 |
| Why transformer model is behaving like this? | 1 | 27 | October 14, 2024 |
| Variable length time series data | 1 | 12 | October 12, 2024 |
| I want to eliminate the accumulation of memory usage during the learning loop | 0 | 17 | October 7, 2024 |