Combined pre-trained model starts strong, but the baseline catches up quickly

Hi, I’d like to ask a quick question. Suppose my baseline uses a model pre-trained on task B to perform task A. If I attach a module pre-trained on task C to the task-B module and fine-tune the combination on task A, its validation accuracy is initially higher than the baseline’s, but the baseline catches up within a few epochs and the final performance is no better.
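For reference, here is a minimal sketch of the setup I mean, with hypothetical stand-ins for the actual modules (in practice `module_b` and `module_c` would be loaded from checkpoints trained on tasks B and C; the layer sizes and head are just placeholders):

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the pre-trained modules; in practice these
# would be loaded from checkpoints trained on tasks B and C.
module_b = nn.Sequential(nn.Linear(32, 64), nn.ReLU())  # "pre-trained" on task B
module_c = nn.Sequential(nn.Linear(64, 64), nn.ReLU())  # "pre-trained" on task C

class CombinedModel(nn.Module):
    """Attach the task-C module on top of the task-B module,
    then add a fresh classification head for the target task A."""
    def __init__(self, b, c, num_classes=10):
        super().__init__()
        self.b = b
        self.c = c
        self.head = nn.Linear(64, num_classes)  # randomly initialised for task A

    def forward(self, x):
        return self.head(self.c(self.b(x)))

model = CombinedModel(module_b, module_c)

# Fine-tune everything on task A; a low learning rate is typical so the
# pre-trained features are not destroyed in the first few updates.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

x = torch.randn(8, 32)             # dummy batch standing in for task-A data
y = torch.randint(0, 10, (8,))
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```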
Why does the combined model do better at first?
And in general, is it unrealistic to expect a good final result from combining modules pre-trained on two different tasks and fine-tuning them on a third?