import torch
from torch import nn
from torch.optim import Adam
from tensorboardX import SummaryWriter

device = "cuda" if torch.cuda.is_available() else "cpu"

net = Model()  # user-defined model
net.to(device)
loss_fn = nn.BCELoss()  # or nn.MSELoss()
optimizer = Adam(params=net.parameters(), lr=0.0001, weight_decay=0.5)
writer = SummaryWriter("logs")

for epoch in range(50):
    for i, (x_batch, y_batch) in enumerate(train_loader):
        y_pred = net(x_batch.to(device))
        loss = loss_fn(y_pred, y_batch.to(device))
        writer.add_scalar("loss/train", loss, global_step=epoch * len(train_loader) + i)
        writer.flush()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

writer.close()
I want to see the loss change during training rather than after training finishes.
First I trained the model on a Titan Xp. When I checked the event file during training, its size was 0, and the TensorBoard web page showed "No scalar data was found". When I terminated the training, the event file grew to 100K and I could see the loss in TensorBoard.
Then I trained the model on the CPU and everything worked fine. Can anyone tell me what the problem is?
Hello,
I have met this problem before. In my case, writer.flush() solved it, but sometimes, even though several add_scalar calls have been made, the results still cannot be visualized.
Could the problem be the loss argument to add_scalar, since it is a torch.Tensor?
Maybe using loss.item() solves your issue?
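A minimal sketch of what the suggestion means (the loss value here is a made-up stand-in for a real training loss):

```python
import torch

# A scalar tensor, standing in for the training loss.
loss = torch.tensor(0.25, requires_grad=True)

# .item() extracts the value as a plain Python float, detached from
# the autograd graph, which is safer to pass to add_scalar than the
# tensor itself (and avoids keeping the graph alive via the logger).
value = loss.item()
print(type(value).__name__)  # -> float

# Inside the training loop this would read (writer assumed to exist):
# writer.add_scalar("loss/train", loss.item(), global_step=step)
```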
Well, I don't think we can do much to make this work for now. A lot of people seem to be hitting the same problem; see: https://github.com/lanpa/tensorboardX/pull/451
Thanks for your patient reply. I changed from tensorboardX import SummaryWriter to from torch.utils.tensorboard import SummaryWriter and it works now. That's so weird.
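For anyone landing here later, a minimal sketch of the working setup with PyTorch's built-in writer (the log directory name and the fake loss values are assumptions for illustration); the flush_secs argument controls how often pending events are written to disk, so a small value makes scalars visible in TensorBoard sooner:

```python
from torch.utils.tensorboard import SummaryWriter

# flush_secs (default 120) sets how often pending events are flushed
# to disk; lowering it makes scalars show up during training.
writer = SummaryWriter("logs", flush_secs=10)

for step in range(5):
    fake_loss = 1.0 / (step + 1)  # stand-in for a real training loss
    writer.add_scalar("loss/train", fake_loss, global_step=step)

writer.close()  # close() also flushes any remaining events
```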