Is this issue an overfitting problem?

I am classifying 13 different kinds of characters, such as English, Chinese, and so on. The training set has about 8,000 images, the dev set has 1,300 images, and the test set has 6,500 images (500 per class).

I am using transfer learning. I tried vgg16, vgg16_bn, resnet_50, lstm, and vgg16_bn.features + SPP + vgg16_bn.classification. All of them reach high accuracy on the training set (over 96%), but accuracy on the dev set is only 60%~80%, and on the test set it is below 45%.
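To make the numbers concrete, here is a minimal sketch of how I am comparing the splits. The values are just the rounded accuracies quoted above (the training figure is a lower bound and the test figure an upper bound), and the names `reported` and `accuracy_gap` are hypothetical helpers for this post, not part of my training code.

```python
# Accuracies quoted in this post. The dev set only has a range,
# so both ends are kept.
reported = {
    "train": 0.96,     # "over 96%", so a lower bound
    "dev_low": 0.60,
    "dev_high": 0.80,
    "test": 0.45,      # "below 45%", so an upper bound
}


def accuracy_gap(a, b):
    """Return how far accuracy `a` is ahead of accuracy `b`."""
    return round(a - b, 4)


# Gap between training and dev accuracy (smallest and largest case).
train_dev_gap_min = accuracy_gap(reported["train"], reported["dev_high"])
train_dev_gap_max = accuracy_gap(reported["train"], reported["dev_low"])

# Gap between the worst dev accuracy and the test upper bound.
dev_test_gap_min = accuracy_gap(reported["dev_low"], reported["test"])

# Chance level for 13 equally sized classes, for reference.
chance = round(1 / 13, 4)

print(train_dev_gap_min, train_dev_gap_max, dev_test_gap_min, chance)
```

As far as I understand (this is my assumption, not something I have verified), a large train/dev gap is the usual sign of overfitting, while a further drop from dev to test may mean the test images are distributed differently from the training and dev images.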

So my question is: is this overfitting, or something else? And what should I do to improve the accuracy?

Also, I am planning to do data augmentation. Is it necessary? If so, what kind of augmentation should I do? Thanks so much.
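To show the kind of augmentation I have in mind, here is a rough, library-free sketch that makes randomly shifted copies of an image. It assumes the image is a 2D list of pixel values; `random_shift` and `augment` are hypothetical names I made up for illustration. In the real pipeline I would probably use something like `torchvision.transforms.RandomAffine` or `ColorJitter` instead. I left out horizontal flips on purpose, since I suspect mirroring could change what a character looks like.

```python
import random


def random_shift(image, max_shift, fill=0, rng=random):
    """Shift a 2D image (list of rows) by a random offset, padding with `fill`."""
    height, width = len(image), len(image[0])
    dy = rng.randint(-max_shift, max_shift)  # vertical offset, inclusive bounds
    dx = rng.randint(-max_shift, max_shift)  # horizontal offset, inclusive bounds
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            src_y, src_x = y - dy, x - dx
            # Copy only pixels whose source lies inside the original image.
            if 0 <= src_y < height and 0 <= src_x < width:
                shifted[y][x] = image[src_y][src_x]
    return shifted


def augment(image, copies, max_shift=1, seed=0):
    """Return `copies` randomly shifted versions of `image` (input is not modified)."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    return [random_shift(image, max_shift, rng=rng) for _ in range(copies)]


# Tiny 3x3 "image" with a single bright pixel in the centre.
sample = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
augmented = augment(sample, copies=5)
print(len(augmented))
```

Each copy keeps the original size, so it could be fed to the same network without changing the input layer.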