Change ProgressBar to simpler output for an ASCII log file

Hello,
I’ve got some code that uses Ignite, although I have never used it before (I have only practiced plain PyTorch).
The code is using


pbar = ProgressBar(ascii=True)
pbar.attach(trainer, metric_names=monitoring_metrics)

@trainer.on(Events.EPOCH_COMPLETED)
def print_times(engine):
    pbar.log_message(
        f"Epoch {engine.state.epoch} done. Time per batch: {timer.value():.3f}[s]"
    )
    timer.reset()

and this is what I get in my ASCII output log file:

Epoch [1/100]: [2/312]   1%|          , total_loss=6.12 [00:01<05:13]
Epoch [1/100]: [4/312]   1%|1         , total_loss=6.14 [00:03<05:09]
Epoch [1/100]: [6/312]   2%|1         , total_loss=6.14 [00:05<05:07]
Epoch [1/100]: [8/312]   3%|2         , total_loss=6.13 [00:07<05:05]
Epoch [1/100]: [10/312]   3%|3         , total_loss=6.1 [00:09<05:04]
Epoch [1/100]: [12/312]   4%|3         , total_loss=6.1 [00:11<05:02] 
Epoch [1/100]: [13/312]   4%|4         , total_loss=6.07 [00:13<05:00]
Epoch [1/100]: [15/312]   5%|4         , total_loss=6.04 [00:15<04:58]
Epoch [1/100]: [17/312]   5%|5         , total_loss=6.02 [00:17<04:57]
...
Epoch [1/100]: [310/312]  99%|#########9, total_loss=4 [05:11<00:02]
Epoch [1/100]: [310/312]  99%|#########9, total_loValidation Results - Epoch: 1 total_loss: 3.93
Epoch 1 done. Time per batch: 1.008[s]
Epoc12]   0%|           [00:00<?]                                   
Epoch [2/100]: [2/312]   1%|          , total_loss=3.9 [00:02<05:13]
Epoch [2/100]: [4/312]   1%|1         , total_loss=3.91 [00:04<05:11]
Epoch [2/100]: [6/312]   2%|1         , total_loss=3.94 [00:06<05:09]

So, I wonder if, instead of the ProgressBar, I could simply have, at the end of each epoch (training & validation phases), the resulting total_loss(es) (and, by the way, the learning rate of the optimizer).

Thanks.

Hi @Jean-Eric_Campagne,

It is unclear to me whether you would like to remove the progress bar entirely or to have fewer lines, like:

Epoch [1/100]: [2/312]   1%|          , total_loss=6.12 [00:01<05:13]
Epoch [1/100]: [155/312]   50%|######          , total_loss=6.12 [00:01<05:13]
Epoch [1/100]: [310/312]  99%|#########9, total_loss=4 [05:11<00:02]

or just a single line per epoch with a list of all total_losses and learning rates?
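If it is the latter, here is a minimal sketch of a per-epoch summary with no progress bar at all. It assumes a RunningAverage metric named "total_loss" is attached to the trainer (i.e. the trainer output is the loss value, as with create_supervised_trainer) and that your optimizer is called opt:

from ignite.engine import Events
from ignite.metrics import RunningAverage

# Assumption: average the per-iteration loss so that engine.state.metrics["total_loss"] is available
RunningAverage(output_transform=lambda loss: loss).attach(trainer, "total_loss")

@trainer.on(Events.EPOCH_COMPLETED)
def log_epoch_summary(engine):
    # Current learning rate of the (assumed) optimizer `opt`
    lr = opt.param_groups[0]["lr"]
    print(
        f"Epoch [{engine.state.epoch}/{engine.state.max_epochs}] "
        f"total_loss={engine.state.metrics['total_loss']:.4f}  lr={lr:.5f}"
    )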

By the way, you can also ask questions and discuss on our Discord server, PyTorch-Ignite, to have a quicker conversation.

You can also give this logger (introduced in v0.5.0) a try:

import logging

import torch
import torch.nn as nn
import torch.optim as optim

from ignite.engine import create_supervised_trainer, Events
from ignite.handlers.fbresearch_logger import FBResearchLogger
from ignite.utils import setup_logger


model = nn.Linear(10, 5)
opt = optim.SGD(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

data = [(torch.rand(4, 10), torch.randint(0, 5, size=(4, ))) for _ in range(100)]

trainer = create_supervised_trainer(model, opt, criterion)

logger = setup_logger("trainer", level=logging.INFO)
logger = FBResearchLogger(logger=logger)
logger.attach(trainer, name="Train", every=20, optimizer=opt)

trainer.run(data, max_epochs=4)

Output:



2024-04-22 11:54:03,407 trainer INFO: Train: start epoch [1/4]
2024-04-22 11:54:03,424 trainer INFO: Epoch [1/4]  [20/100]:  ETA: 0:00:00  lr: 0.00100  Iter time: 0.0007 s  Data prep time: 0.0000 s  
2024-04-22 11:54:03,440 trainer INFO: Epoch [1/4]  [40/100]:  ETA: 0:00:00  lr: 0.00100  Iter time: 0.0007 s  Data prep time: 0.0000 s  
2024-04-22 11:54:03,455 trainer INFO: Epoch [1/4]  [60/100]:  ETA: 0:00:00  lr: 0.00100  Iter time: 0.0007 s  Data prep time: 0.0000 s  
2024-04-22 11:54:03,471 trainer INFO: Epoch [1/4]  [80/100]:  ETA: 0:00:00  lr: 0.00100  Iter time: 0.0007 s  Data prep time: 0.0000 s  
2024-04-22 11:54:03,487 trainer INFO: Epoch [1/4]  [100/100]:  ETA: 0:00:00  lr: 0.00100  Iter time: 0.0008 s  Data prep time: 0.0000 s  
2024-04-22 11:54:03,488 trainer INFO: Train: Epoch [1/4]  Total time: 0:00:00  (0.0008 s / it)
2024-04-22 11:54:03,494 trainer INFO: Train: start epoch [2/4]
2024-04-22 11:54:03,510 trainer INFO: Epoch [2/4]  [19/100]:  ETA: 0:00:00  lr: 0.00100  Iter time: 0.0007 s  Data prep time: 0.0000 s  
2024-04-22 11:54:03,526 trainer INFO: Epoch [2/4]  [39/100]:  ETA: 0:00:00  lr: 0.00100  Iter time: 0.0008 s  Data prep time: 0.0000 s  
2024-04-22 11:54:03,542 trainer INFO: Epoch [2/4]  [59/100]:  ETA: 0:00:00  lr: 0.00100  Iter time: 0.0008 s  Data prep time: 0.0000 s  
2024-04-22 11:54:03,558 trainer INFO: Epoch [2/4]  [79/100]:  ETA: 0:00:00  lr: 0.00100  Iter time: 0.0008 s  Data prep time: 0.0000 s  
2024-04-22 11:54:03,577 trainer INFO: Epoch [2/4]  [99/100]:  ETA: 0:00:00  lr: 0.00100  Iter time: 0.0008 s  Data prep time: 0.0000 s  
2024-04-22 11:54:03,578 trainer INFO: Train: Epoch [2/4]  Total time: 0:00:00  (0.0008 s / it)
2024-04-22 11:54:03,581 trainer INFO: Train: start epoch [3/4]
2024-04-22 11:54:03,599 trainer INFO: Epoch [3/4]  [18/100]:  ETA: 0:00:00  lr: 0.00100  Iter time: 0.0008 s  Data prep time: 0.0000 s  
2024-04-22 11:54:03,615 trainer INFO: Epoch [3/4]  [38/100]:  ETA: 0:00:00  lr: 0.00100  Iter time: 0.0008 s  Data prep time: 0.0000 s  
2024-04-22 11:54:03,632 trainer INFO: Epoch [3/4]  [58/100]:  ETA: 0:00:00  lr: 0.00100  Iter time: 0.0008 s  Data prep time: 0.0000 s  
2024-04-22 11:54:03,648 trainer INFO: Epoch [3/4]  [78/100]:  ETA: 0:00:00  lr: 0.00100  Iter time: 0.0008 s  Data prep time: 0.0000 s  
2024-04-22 11:54:03,665 trainer INFO: Epoch [3/4]  [98/100]:  ETA: 0:00:00  lr: 0.00100  Iter time: 0.0008 s  Data prep time: 0.0000 s  
2024-04-22 11:54:03,667 trainer INFO: Train: Epoch [3/4]  Total time: 0:00:00  (0.0009 s / it)
2024-04-22 11:54:03,670 trainer INFO: Train: start epoch [4/4]
2024-04-22 11:54:03,688 trainer INFO: Epoch [4/4]  [17/100]:  ETA: 0:00:00  lr: 0.00100  Iter time: 0.0007 s  Data prep time: 0.0000 s  
2024-04-22 11:54:03,706 trainer INFO: Epoch [4/4]  [37/100]:  ETA: 0:00:00  lr: 0.00100  Iter time: 0.0008 s  Data prep time: 0.0000 s  
2024-04-22 11:54:03,721 trainer INFO: Epoch [4/4]  [57/100]:  ETA: 0:00:00  lr: 0.00100  Iter time: 0.0008 s  Data prep time: 0.0000 s  
2024-04-22 11:54:03,737 trainer INFO: Epoch [4/4]  [77/100]:  ETA: 0:00:00  lr: 0.00100  Iter time: 0.0008 s  Data prep time: 0.0000 s  
2024-04-22 11:54:03,752 trainer INFO: Epoch [4/4]  [97/100]:  ETA: 0:00:00  lr: 0.00100  Iter time: 0.0008 s  Data prep time: 0.0000 s  
2024-04-22 11:54:03,754 trainer INFO: Train: Epoch [4/4]  Total time: 0:00:00  (0.0008 s / it)
2024-04-22 11:54:03,756 trainer INFO: Train: run completed  Total time: 0:00:00

Nice,

  1. I am not sure that I will use the Discord social network…
  2. For your first question, I would choose “have fewer lines like”.
  3. What would be the net benefit of FBResearchLogger?

JE

What would be the net benefit of FBResearchLogger?

I’d say it is better suited than the progress bar for logging when stdout is captured into a file. It also reports other important info like the ETA, lr, time taken for data prep, and time taken for the training step. I’ll update the above example to show the total loss as well:

import logging

import torch
import torch.nn as nn
import torch.optim as optim

from ignite.engine import create_supervised_trainer, Events
from ignite.handlers.fbresearch_logger import FBResearchLogger
from ignite.utils import setup_logger


model = nn.Linear(10, 5)
opt = optim.SGD(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

data = [(torch.rand(4, 10), torch.randint(0, 5, size=(4, ))) for _ in range(100)]

trainer = create_supervised_trainer(
    model, opt, criterion, output_transform=lambda x, y, y_pred, loss: {"total_loss": loss.item()}
)

logger = setup_logger("trainer", level=logging.INFO)
logger = FBResearchLogger(logger=logger, show_output=True)
logger.attach(trainer, name="Train", every=20, optimizer=opt)

trainer.run(data, max_epochs=4)

Output:

2024-04-22 12:05:47,843 trainer INFO: Train: start epoch [1/4]
2024-04-22 12:05:47,861 trainer INFO: Epoch [1/4]  [20/100]:  ETA: 0:00:00  lr: 0.00100  total_loss: 1.5999  Iter time: 0.0008 s  Data prep time: 0.0000 s  
2024-04-22 12:05:47,877 trainer INFO: Epoch [1/4]  [40/100]:  ETA: 0:00:00  lr: 0.00100  total_loss: 1.9297  Iter time: 0.0008 s  Data prep time: 0.0000 s  
2024-04-22 12:05:47,893 trainer INFO: Epoch [1/4]  [60/100]:  ETA: 0:00:00  lr: 0.00100  total_loss: 1.9985  Iter time: 0.0008 s  Data prep time: 0.0000 s  
2024-04-22 12:05:47,910 trainer INFO: Epoch [1/4]  [80/100]:  ETA: 0:00:00  lr: 0.00100  total_loss: 1.9785  Iter time: 0.0008 s  Data prep time: 0.0000 s  
2024-04-22 12:05:47,925 trainer INFO: Epoch [1/4]  [100/100]:  ETA: 0:00:00  lr: 0.00100  total_loss: 1.6211  Iter time: 0.0008 s  Data prep time: 0.0000 s  
2024-04-22 12:05:47,927 trainer INFO: Train: Epoch [1/4]  Total time: 0:00:00  (0.0008 s / it)
2024-04-22 12:05:47,930 trainer INFO: Train: start epoch [2/4]
2024-04-22 12:05:47,949 trainer INFO: Epoch [2/4]  [19/100]:  ETA: 0:00:00  lr: 0.00100  total_loss: 1.5981  Iter time: 0.0009 s  Data prep time: 0.0000 s  
2024-04-22 12:05:47,965 trainer INFO: Epoch [2/4]  [39/100]:  ETA: 0:00:00  lr: 0.00100  total_loss: 1.9013  Iter time: 0.0008 s  Data prep time: 0.0000 s  
2024-04-22 12:05:47,981 trainer INFO: Epoch [2/4]  [59/100]:  ETA: 0:00:00  lr: 0.00100  total_loss: 1.9811  Iter time: 0.0008 s  Data prep time: 0.0000 s  
2024-04-22 12:05:47,997 trainer INFO: Epoch [2/4]  [79/100]:  ETA: 0:00:00  lr: 0.00100  total_loss: 1.9434  Iter time: 0.0008 s  Data prep time: 0.0000 s  
2024-04-22 12:05:48,016 trainer INFO: Epoch [2/4]  [99/100]:  ETA: 0:00:00  lr: 0.00100  total_loss: 1.6116  Iter time: 0.0008 s  Data prep time: 0.0000 s  
2024-04-22 12:05:48,017 trainer INFO: Train: Epoch [2/4]  Total time: 0:00:00  (0.0009 s / it)
2024-04-22 12:05:48,020 trainer INFO: Train: start epoch [3/4]
2024-04-22 12:05:48,038 trainer INFO: Epoch [3/4]  [18/100]:  ETA: 0:00:00  lr: 0.00100  total_loss: 1.5972  Iter time: 0.0008 s  Data prep time: 0.0000 s  
2024-04-22 12:05:48,055 trainer INFO: Epoch [3/4]  [38/100]:  ETA: 0:00:00  lr: 0.00100  total_loss: 1.8753  Iter time: 0.0008 s  Data prep time: 0.0000 s  
2024-04-22 12:05:48,076 trainer INFO: Epoch [3/4]  [58/100]:  ETA: 0:00:00  lr: 0.00100  total_loss: 1.9657  Iter time: 0.0009 s  Data prep time: 0.0000 s  
2024-04-22 12:05:48,092 trainer INFO: Epoch [3/4]  [78/100]:  ETA: 0:00:00  lr: 0.00100  total_loss: 1.9112  Iter time: 0.0008 s  Data prep time: 0.0000 s  
2024-04-22 12:05:48,108 trainer INFO: Epoch [3/4]  [98/100]:  ETA: 0:00:00  lr: 0.00100  total_loss: 1.6035  Iter time: 0.0008 s  Data prep time: 0.0000 s  
2024-04-22 12:05:48,109 trainer INFO: Train: Epoch [3/4]  Total time: 0:00:00  (0.0009 s / it)
2024-04-22 12:05:48,112 trainer INFO: Train: start epoch [4/4]
2024-04-22 12:05:48,129 trainer INFO: Epoch [4/4]  [17/100]:  ETA: 0:00:00  lr: 0.00100  total_loss: 1.5969  Iter time: 0.0008 s  Data prep time: 0.0000 s  
2024-04-22 12:05:48,145 trainer INFO: Epoch [4/4]  [37/100]:  ETA: 0:00:00  lr: 0.00100  total_loss: 1.8516  Iter time: 0.0008 s  Data prep time: 0.0000 s  
2024-04-22 12:05:48,161 trainer INFO: Epoch [4/4]  [57/100]:  ETA: 0:00:00  lr: 0.00100  total_loss: 1.9521  Iter time: 0.0008 s  Data prep time: 0.0000 s  
2024-04-22 12:05:48,181 trainer INFO: Epoch [4/4]  [77/100]:  ETA: 0:00:00  lr: 0.00100  total_loss: 1.8816  Iter time: 0.0008 s  Data prep time: 0.0000 s  
2024-04-22 12:05:48,205 trainer INFO: Epoch [4/4]  [97/100]:  ETA: 0:00:00  lr: 0.00100  total_loss: 1.5966  Iter time: 0.0009 s  Data prep time: 0.0000 s  
2024-04-22 12:05:48,207 trainer INFO: Train: Epoch [4/4]  Total time: 0:00:00  (0.0009 s / it)
2024-04-22 12:05:48,209 trainer INFO: Train: run completed  Total time: 0:00:00

That sounds good :slight_smile:
Now, the code has several handlers:

@trainer.on(Events.STARTED)
def resume_training(engine):
    ....

@trainer.on(Events.STARTED)
def init(engine):
    model.train()
    ...

@trainer.on(Events.EPOCH_COMPLETED)
def evaluate(engine):
    evaluator.run(test_loader)
    ....

and finally

timer = Timer(average=True)
timer.attach(
    trainer,
    start=Events.EPOCH_STARTED,
    resume=Events.ITERATION_STARTED,
    pause=Events.ITERATION_COMPLETED,
    step=Events.ITERATION_COMPLETED,
)

@trainer.on(Events.EPOCH_COMPLETED)
def print_times(engine):
    pbar.log_message(
        f"Epoch {engine.state.epoch} done. Time per batch: {timer.value():.3f}[s]"
    )
    timer.reset()

trainer.run(train_loader, epochs)

I’m not sure how to rewrite them to use the code you propose.

If you still want to use ProgressBar, here is how to reduce the number of printed lines:

import torch
import torch.nn as nn
import torch.optim as optim

from ignite.engine import create_supervised_trainer, Events
from ignite.handlers import ProgressBar


model = nn.Linear(10, 5)
opt = optim.SGD(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

data = [(torch.rand(4, 10), torch.randint(0, 5, size=(4, ))) for _ in range(100)]

trainer = create_supervised_trainer(model, opt, criterion)


pbar = ProgressBar(ascii=True, persist=True)
pbar.attach(trainer, output_transform=lambda x: {"total_loss": x}, event_name=Events.ITERATION_COMPLETED(every=20))

trainer.run(data, max_epochs=4)

I’m not sure how to rewrite them to use the code you propose.

There is almost nothing to change:

You keep this:

@trainer.on(Events.STARTED)
def resume_training(engine):
    ....

@trainer.on(Events.STARTED)
def init(engine):
    model.train()
    ...

@trainer.on(Events.EPOCH_COMPLETED)
def evaluate(engine):
    evaluator.run(test_loader)
    ....

and you can either remove the print_times(engine) handler or replace pbar.log_message with the logger:

timer = Timer(average=True)
timer.attach(
    trainer,
    start=Events.EPOCH_STARTED,
    resume=Events.ITERATION_STARTED,
    pause=Events.ITERATION_COMPLETED,
    step=Events.ITERATION_COMPLETED,
)

logger = setup_logger("trainer", level=logging.INFO)
fbrlogger = FBResearchLogger(logger=logger, show_output=True)
fbrlogger.attach(trainer, name="Train", every=20, optimizer=opt)

@trainer.on(Events.EPOCH_COMPLETED)
def print_times(engine):
    logger.info(
        f"Epoch {engine.state.epoch} done. Time per batch: {timer.value():.3f}[s]"
    )
    timer.reset()

trainer.run(train_loader, epochs)
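
For completeness, the snippet above assumes the corresponding imports are already in scope (Timer lives in ignite.handlers):

import logging

from ignite.engine import Events
from ignite.handlers import Timer
from ignite.handlers.fbresearch_logger import FBResearchLogger
from ignite.utils import setup_logger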

Thanks very much. I will try it ASAP.

Sure, sounds good. Do not hesitate to ping me if something is unclear.