I am training a BERT model for sentiment analysis with about 80k training examples, but I get an out-of-memory error for batch sizes of 128, 256 and above.
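For context, below is a simplified sketch of my training setup (the texts and labels are placeholders, not my real data, and the exact hyperparameters differ); the batch_size passed to the DataLoader is the value I change between 128 and 256:

import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import BertTokenizer, BertForSequenceClassification

# Placeholder data standing in for the ~80k real training examples
texts = ["sample review text"] * 80000
labels = torch.zeros(80000, dtype=torch.long)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2).cuda()

# Tokenize to fixed-length sequences and wrap in a DataLoader
enc = tokenizer(texts, padding="max_length", truncation=True, max_length=512, return_tensors="pt")
dataset = TensorDataset(enc["input_ids"], enc["attention_mask"], labels)
loader = DataLoader(dataset, batch_size=128, shuffle=True)  # 128 / 256 both trigger the OOM

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for input_ids, attention_mask, batch_labels in loader:
    optimizer.zero_grad()
    outputs = model(input_ids=input_ids.cuda(),
                    attention_mask=attention_mask.cuda(),
                    labels=batch_labels.cuda())
    loss = outputs[0]   # loss is the first element of the model output
    loss.backward()
    optimizer.step()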
Here is the stack trace:
/usr/local/lib/python3.6/dist-packages/transformers/modeling_bert.py in forward(self, hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask, output_attentions)
337 encoder_hidden_states,
338 encoder_attention_mask,
--> 339 output_attentions,
340 )
341 attention_output = self.output(self_outputs[0], hidden_states)
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
/usr/local/lib/python3.6/dist-packages/transformers/modeling_bert.py in forward(self, hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask, output_attentions)
257 # Take the dot product between "query" and "key" to get the raw attention scores.
258 attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
--> 259 attention_scores = attention_scores / math.sqrt(self.attention_head_size)
260 if attention_mask is not None:
261 # Apply the attention mask is (precomputed for all layers in BertModel forward() function)
RuntimeError: CUDA out of memory. Tried to allocate 844.00 MiB (GPU 0; 15.90 GiB total capacity; 14.36 GiB already allocated; 377.88 MiB free; 14.63 GiB reserved in total by PyTorch)
Can someone please suggest how to resolve this?
I am using a Colab GPU; is there any limit on the size of the training data for a GPU with 15 GB of memory?
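For reference, this is roughly how I am checking the GPU memory in the Colab session (a small torch.cuda sketch; the printed numbers should correspond to the 15.90 GiB total capacity reported in the error message):

import torch

# Report total device memory and what PyTorch has currently allocated/reserved
props = torch.cuda.get_device_properties(0)
print(f"Total GPU memory:      {props.total_memory / 1024**3:.2f} GiB")
print(f"Allocated by PyTorch:  {torch.cuda.memory_allocated(0) / 1024**3:.2f} GiB")
print(f"Reserved by PyTorch:   {torch.cuda.memory_reserved(0) / 1024**3:.2f} GiB")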
Thanks