Dear programmers,
I am very new to PyTorch and have little programming experience. I have built a network, and my training loop is as follows:
Epoch_num = 5
for e in range(Epoch_num):
    train_loss = 0
    model.train()
    for idx, data in tqdm(enumerate(train_loader)):
        x, y_true = data
        if torch.cuda.is_available():
            x, y_true = x.cuda(), y_true.cuda()
        # forward
        out = model(x)
        out = F.log_softmax(out, dim=1)  # (b, n, h, w)
        loss = criterion(out, y_true)
        # backward
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        train_loss += loss.item()
        label_pred = out.max(dim=1)[1].data.cpu()
        label_true = y_true.unsqueeze(1).data.cpu()
        acc = get_accuracy(label_true, label_pred)
    print("Epoch {}/{}, Loss: {:.3f}, Accuracy: {:.3f}".format(e + 1, Epoch_num, train_loss, acc))
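In case it matters, `get_accuracy` is my own helper; roughly, it just measures per-pixel agreement between the prediction and the label, something like this minimal sketch (the shapes follow the code above: `label_true` is `(b, 1, h, w)` and `label_pred` is `(b, h, w)`):

```python
import torch

def get_accuracy(label_true, label_pred):
    # label_true: (b, 1, h, w) ground-truth class indices
    # label_pred: (b, h, w) argmax predictions
    # Returns the fraction of pixels whose predicted class matches the label.
    matches = label_pred == label_true.squeeze(1)
    return matches.float().mean().item()
```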
The training output looks like this:
Epoch 1/5, Loss: 765.190, Accuracy: 0.513
0it [00:00, ?it/s]
1it [00:00, 6.00it/s]
2it [00:00, 6.06it/s]
3it [00:00, 6.12it/s]
...
1103it [03:00, 6.20it/s]
1104it [03:00, 6.22it/s]
Epoch 2/5, Loss: 765.112, Accuracy: 0.514
...
1104it [02:55, 6.29it/s]
Epoch 3/5, Loss: 764.840, Accuracy: 0.535
...
1104it [03:00, 6.10it/s]
Epoch 4/5, Loss: 761.322, Accuracy: 0.704
As you can see, the loss does decrease while the accuracy increases, but the loss is still very high. Could you please help me check what is wrong with my implementation? Also, how can I modify the code so that tqdm does not print a progress line for every mini-batch during training?
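For the display issue, I wondered whether passing tqdm's `disable` flag (or `leave=False` / a larger `mininterval`) is the right approach. Here is a small self-contained sketch of what I had in mind, using a stand-in list in place of my `train_loader`:

```python
from tqdm import tqdm

# Stand-in for train_loader so this snippet runs on its own.
train_loader = [([0.0], 0), ([1.0], 1), ([2.0], 0)]

seen = []
# disable=True suppresses the progress output entirely; alternatively,
# leave=False erases the bar when the loop finishes, and mininterval
# raises the minimum time between refreshes.
for idx, data in tqdm(enumerate(train_loader), disable=True):
    seen.append(idx)

print(seen)  # the loop itself behaves exactly as before
```

Would this be the recommended way, or is there a better pattern?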
Thank you very much for your time and guidance.