I would like to benchmark several PyTorch transformer models on the
AG_NEWS dataset (torchtext.datasets), using Google Colab Pro with
15 GB of GPU memory available.
While BERT, DistilBERT, Electra, …, XLM-RoBERTa and their tokenizers work without a problem, both GPT-2 and XLM
do not fit in memory.
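
For reference, a minimal sketch of roughly what my setup looks like (simplified; the model name, batch size, and sequence length below are placeholders, not my exact settings):

```python
import torch
from torchtext.datasets import AG_NEWS
from transformers import AutoTokenizer, AutoModelForSequenceClassification

device = torch.device("cuda")

# GPT-2 has no padding token by default, so reuse EOS for padding.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=4)
model.config.pad_token_id = tokenizer.pad_token_id
model.to(device)

# AG_NEWS yields (label, text) pairs with labels 1..4.
train_iter = AG_NEWS(split="train")

texts, labels = [], []
for label, text in train_iter:
    texts.append(text)
    labels.append(label - 1)          # shift labels to 0..3
    if len(texts) == 16:              # placeholder batch size
        break

enc = tokenizer(texts, padding=True, truncation=True,
                max_length=128, return_tensors="pt").to(device)
labels = torch.tensor(labels, device=device)

outputs = model(**enc, labels=labels)  # forward pass with classification loss
outputs.loss.backward()                # backward pass
```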
Since 15 GB seems to be a lot, I wonder whether
- it is a coding problem on my side or both models are just very large
- there is a lightweight version of GPT-2 and XLM