I am fine-tuning a pretrained ViT on CIFAR-100 (images resized to 224×224). Training starts out well, with decreasing loss and decent accuracy, but then the loss suddenly goes to NaN and accuracy drops to random-guess level.

I used the Adam optimizer with a learning rate of 0.0001. With a learning rate of 0.001 a similar issue occurs, just a few epochs earlier.

This is my code:

```
import torch
import torch.nn as nn
import torchvision

# Load an ImageNet-pretrained ViT-B/16
model = torchvision.models.vit_b_16(weights='IMAGENET1K_V1')
model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)

for epoch in range(total_epochs):
    model.train()
    train_loss = 0
    correct = 0
    for i, (x, y) in enumerate(trainloader):
        optimizer.zero_grad()
        x = x.to(device)
        y = y.to(device)
        outputs = model(x)
        loss = criterion(outputs, y)
        loss.backward()
        optimizer.step()
        # Track running loss and accuracy
        train_loss += loss.item()
        _, predicted_train = outputs.max(1)
        correct += predicted_train.eq(y).sum().item()
```
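To pinpoint the exact batch where things blow up, I could add a check like the following inside the loop (a minimal sketch; `loss_is_finite` is a helper name I made up, and `torch.isfinite` flags both NaN and ±inf):

```
import torch

def loss_is_finite(loss: torch.Tensor) -> bool:
    # False if the scalar loss is NaN or +/-inf, True otherwise
    return bool(torch.isfinite(loss).item())
```

Calling this on `loss` right after `criterion(outputs, y)` and breaking out of the loop when it returns `False` would at least tell me whether the NaN appears abruptly on one batch or the loss diverges gradually first.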

With a learning rate of 0.0001, NaN values appear after about 25 epochs; with 0.001, after about 5 epochs.
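One mitigation I'm considering is clipping the global gradient norm before each optimizer step, so a single oversized update can't blow up the weights. A sketch of what I mean (the `clipped_step` wrapper and `max_norm=1.0` are assumptions of mine, not tuned values):

```
import torch

def clipped_step(model: torch.nn.Module,
                 optimizer: torch.optim.Optimizer,
                 max_norm: float = 1.0) -> float:
    # Rescale all gradients so their global L2 norm is at most max_norm,
    # then apply the update. Returns the pre-clip norm for logging.
    total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()
    return float(total_norm)
```

In my loop this would replace the bare `optimizer.step()` after `loss.backward()`; logging the returned norm would also show whether gradients are actually spiking before the NaN appears.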