Thank you for your advice!
In addition, I found that resizing the images and then using the pretrained resnet18 achieves about 94% accuracy on CIFAR-10. But when I keep the original image size and rely on an adaptive pooling layer instead, the best performance is only 84%. Why are the two results so different?
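
To make the comparison concrete, here is a rough sketch of how much spatial resolution is left before the pooling layer in each setup. It assumes the standard torchvision ResNet-18 layout (a 7x7 stride-2 conv, a 3x3 stride-2 max pool, then three stride-2 stages) and assumes the resized images are 224x224, which the post does not state. The helper names are made up for illustration, and this may be only one of several factors behind the gap.

```python
def conv_out(n, kernel, stride, padding):
    """Output side length of a conv/pool layer (floor-mode formula)."""
    return (n + 2 * padding - kernel) // stride + 1


# Layers of the standard ResNet-18 that reduce spatial size:
# (name, kernel, stride, padding). layer1 has stride 1 and is omitted.
RESNET18_DOWNSAMPLING = [
    ("conv1", 7, 2, 3),
    ("maxpool", 3, 2, 1),
    ("layer2", 3, 2, 1),
    ("layer3", 3, 2, 1),
    ("layer4", 3, 2, 1),
]


def feature_map_size(input_size):
    """Side length of the feature map that reaches the final pooling layer."""
    n = input_size
    for _name, kernel, stride, padding in RESNET18_DOWNSAMPLING:
        n = conv_out(n, kernel, stride, padding)
    return n


# 32 is the native CIFAR-10 size; 224 is an assumed resize target.
for size in (32, 224):
    side = feature_map_size(size)
    print(f"{size}x{size} input -> {side}x{side} feature map before pooling")
```

Under these assumptions a 32x32 input is already down to a 1x1 map before the pooling layer, while a 224x224 input still has a 7x7 map, so the pretrained filters see very different scales in the two cases.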