I’m trying to reproduce the CodeT5 fine-tuning results (GitHub: salesforce/CodeT5, the official repo for the code-aware pre-trained encoder-decoder model).
The script being used is:
python3 /home/ubuntu/CodeT5/run_gen.py \
--task summarize \
--sub_task python \
--summary_dir /home/ubuntu/CodeT5/summary \
--cache_path /home/ubuntu/CodeT5/cache \
--data_dir /home/ubuntu/CodeT5/data \
--res_dir /home/ubuntu/CodeT5/res \
--output_dir /home/ubuntu/CodeT5/output \
--save_last_checkpoints \
--always_save_model \
--do_eval_bleu \
--model_name_or_path='Salesforce/codet5-base-multi-sum' \
--tokenizer_name='Salesforce/codet5-base-multi-sum' \
--train_filename /home/ubuntu/CodeT5/data/summarize/python/train.jsonl \
--dev_filename /home/ubuntu/CodeT5/data/summarize/python/valid.jsonl \
--test_filename /home/ubuntu/CodeT5/data/summarize/python/test.jsonl \
--do_train \
--do_eval \
--do_test \
--save_steps=500 \
--log_steps=100 \
--local_rank=-1
Running it leads to the following error:
Traceback (most recent call last):
File "/home/ubuntu/CodeT5/run_gen.py", line 387, in <module>
main()
File "/home/ubuntu/CodeT5/run_gen.py", line 234, in main
outputs = model(input_ids=source_ids, attention_mask=source_mask,
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 1561, in forward
encoder_outputs = self.encoder(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 998, in forward
layer_outputs = layer_module(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 639, in forward
self_attention_outputs = self.layer[0](
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 546, in forward
attention_output = self.SelfAttention(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 483, in forward
scores = torch.matmul(
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
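Since the error is reported asynchronously, the `matmul` frame in the traceback may not be where the assert actually fired. A standard first step (general CUDA debugging, not specific to CodeT5) is to re-run with synchronous kernel launches so the traceback lands on the failing op. A minimal sketch, using a placeholder command to show that the variable reaches the child process:

```shell
# With CUDA_LAUNCH_BLOCKING=1, kernel launches become blocking, so the Python
# traceback points at the op that actually asserted rather than a later call.
# In practice, swap the demo command below for the full run_gen.py invocation.
CUDA_LAUNCH_BLOCKING=1 python3 -c 'import os; print(os.environ.get("CUDA_LAUNCH_BLOCKING"))'
```

This slows training down noticeably, so it is only worth enabling while hunting the assert.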
My guess is that something fishy is going on with the `source_ids`, but I haven’t been able to figure out what.

UPD: a simple test shows that `source_ids` has shape `torch.Size([8, 64])`, while `target_ids` also has `torch.Size([8, 64])`.
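For what it’s worth, this particular device-side assert is often an embedding lookup with an index outside the table, e.g. a token id `>= vocab_size` or a label-padding value like `-100` leaking into the input ids; the shapes being correct would not rule that out. A minimal sanity check I would run on a batch before it reaches the GPU (pure Python over `source_ids.tolist()`; `32100` here is an assumed vocab size, whatever `model.config.vocab_size` actually reports is what should be used):

```python
def find_out_of_range_ids(batch_ids, vocab_size):
    """Return (row, col, id) triples for ids outside the valid range [0, vocab_size)."""
    bad = []
    for row, seq in enumerate(batch_ids):
        for col, tok in enumerate(seq):
            if tok < 0 or tok >= vocab_size:
                bad.append((row, col, tok))
    return bad

# Toy batch: one id equal to vocab_size and one stray label pad (-100),
# both of which would trigger a device-side assert in the embedding kernel.
batch = [
    [1, 5, 32099, 2],      # all ids valid for a 32100-entry vocab
    [1, 32100, -100, 2],   # 32100 and -100 are both invalid as input ids
]
print(find_out_of_range_ids(batch, 32100))
# → [(1, 1, 32100), (1, 2, -100)]
```

If this flags anything in `source_ids`, the failure is an out-of-bounds embedding index rather than anything in the attention math, and the `matmul` in the traceback is just where the asynchronous error happened to surface.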