Fine-tuning a pre-trained ViT as a backbone

I am fine-tuning `vit_small_patch16_224` for a downstream task (learning rate 0.001, with a step scheduler that decays it by a factor of 0.1). Does anyone know why the loss curve looks like this?
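For reference, a minimal sketch of the setup as I understand it, in PyTorch. The stand-in model, the optimizer choice (SGD), and the scheduler `step_size` are assumptions on my part, since the post only gives the learning rate and the 0.1 decay factor; in the real setup the backbone would come from timm's `vit_small_patch16_224`:

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import StepLR

# Stand-in for the real backbone; the actual setup would use something like
# timm.create_model("vit_small_patch16_224", pretrained=True, num_classes=...)
model = nn.Linear(384, 10)  # 384 = embed dim of vit_small

# SGD is an assumption; the post does not say which optimizer is used
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)

# step_size=30 is an assumption; the post only gives the decay factor 0.1
scheduler = StepLR(optimizer, step_size=30, gamma=0.1)

lrs = []
for epoch in range(90):
    # ... forward pass, loss, backward, optimizer.step() would go here ...
    lrs.append(optimizer.param_groups[0]["lr"])
    scheduler.step()

# The learning rate drops by 10x every step_size epochs:
print(lrs[0], lrs[30], lrs[60])
```

Each 10x drop in learning rate typically produces a visible kink in the loss curve, so it would help to mark the scheduler's step epochs on the plot when diagnosing it.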