ValueError: Expected input batch_size (8) to match target batch_size (280) | 1 | 221 | October 22, 2023
Why is memory bandwidth peaked at 250 GB/sec whereas Nvidia A100 has peak of 1.9 TB/sec | 0 | 227 | October 21, 2023
Why is memory bandwidth peaked at 250 GB/sec whereas Nvidia A100 has peak of 1.9 TB/sec | 0 | 223 | October 21, 2023
LLAMA: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select) | 0 | 459 | October 19, 2023
I am seeing an error message in my nn.Transformer model? | 2 | 419 | October 17, 2023
Transformers (Hugging Face) module calls layer norm on embedding multiple times | 0 | 266 | October 12, 2023
CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when using roberta | 4 | 660 | October 11, 2023
Model using too much memory when initialising | 13 | 410 | September 30, 2023
Is there a way to implement RoPE around `nn.MultiheadAttention` somehow? | 6 | 3797 | September 28, 2023
Issues with Bart model finetuning | 0 | 243 | September 26, 2023
Question about Temporally Fully Connected LSTM in PyTorch | 0 | 221 | September 23, 2023
Broadcast a tensor | 2 | 357 | September 22, 2023
ValueError: Cannot find backend for cpu in flash_attn/ops/triton/rotary.py | 1 | 734 | September 21, 2023
Doubt regarding Implementation of Hierarchical Attention Network | 0 | 220 | September 18, 2023
Bidirectional LSTM batched padded sequence | 3 | 1157 | September 15, 2023
Coupling Forget Gate and Input Gate of LSTM | 3 | 966 | September 13, 2023
PyTorch Chatbot Tutorial: Loss not decreasing as expected in MPS device (Apple Silicon) | 0 | 298 | September 12, 2023
CrossEntropyLoss getting value > 1 | 8 | 650 | September 12, 2023
How to know OOV of NLP models with subword tokenization | 2 | 487 | September 12, 2023
Forward_pre Hook in Quantized Model | 0 | 223 | September 11, 2023
How to code out an LSTM text generator | 0 | 225 | September 10, 2023
RuntimeError: Expected target size [2, 30000], got [2] | 7 | 725 | September 11, 2023
How to convert TensorFlow Multi-head attention to PyTorch equivalent? | 0 | 429 | September 10, 2023
Understand Memory Usage of PyTorch Tensors for Inference | 0 | 370 | September 8, 2023
Contrastive sentence embeddings loss implementation | 0 | 276 | September 8, 2023
Can NLP augmentation affect the performance of certain categories of data? | 3 | 375 | September 8, 2023
Quantized weights of transformer | 0 | 269 | September 7, 2023
TypeError: expected Tensor as element 0 in argument 0, but got str | 0 | 443 | September 5, 2023
Could not get the file at http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/training.tar.gz. [RequestException] None | 5 | 1577 | September 4, 2023
Model overfits and does not improve on validation accuracy | 1 | 268 | September 4, 2023