ConvNeXt training in PyTorch 10-20x slower than TF

Colab link

I am trying to build ConvNeXt Tiny models and fine-tune them on CIFAR-100 in both PyTorch and TensorFlow, but I cannot resolve this problem: the PyTorch implementation either crashes when I set the batch size too large, or trains far slower than the TensorFlow one, and by far I mean 10-20 times slower. I've tested it on multiple versions and platforms, and nothing helps. Please help me figure out what might be causing this, as I am out of ideas.