This works, but it shows the elapsed time every 3 epochs, and I suppose the file is overwritten every 3 epochs. What I want is to compute the average time just once, for the first epoch, and then stop the timer. I don’t want to divide the timer value by 3, and I don’t want to generate one file per epoch. One option is to run my code with just one epoch, but I am running several models and I would like to generate this just once every time I train.
How can I achieve this?
PS: the documentation shows an example doing this per iteration, but I am unable to translate that example to per-epoch timing. I believe the documentation should be extended with a per-epoch example as well.
Another option is to use Timer directly (without step, so there is no averaging):

from ignite.handlers import Timer

timer = Timer()

@trainer.on(Events.STARTED)
def start_timer():
    # reset the timer at the beginning of the run
    timer.reset()

@trainer.on(Events.EPOCH_STARTED(every=3))
def log_timer():
    # use value() to get the elapsed time in seconds
    print("timer of 3 epochs :", timer.value())
    # reset the timer for the next 3 epochs
    timer.reset()
One last point I would like to mention: we recently added epoch timers in trainer.state.times. That is another option which should help you measure epoch times:
@trainer.on(Events.EPOCH_COMPLETED)
def print_timers():
    # trainer.state.times is a dict
    print("times=", trainer.state.times)
If you need more support, don’t hesitate to ask; it would be a pleasure to help.
Hi all, I am sorry to bump an old conversation, but my question is related to the OP’s question.
I’m trying to measure the average wall-clock time per training epoch with Ignite’s Timer. I then want to log it to wandb when the training is completed or terminated (e.g. due to a Ctrl+C exception).
However, the logged time-per-epoch seems unreasonably low (like 0.01, while I was expecting something like 2-4 seconds). Am I doing something wrong in the definition of the Timer?
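For reference, what a step-based Timer with average=True reports is simply the total elapsed time divided by the number of steps taken. A stdlib-only sketch of that bookkeeping (no Ignite involved; the epoch's work is simulated with time.sleep, and the class name is made up for illustration):

```python
import time

class AverageEpochTimer:
    """Tracks average wall-clock time per completed epoch (illustrative sketch)."""

    def __init__(self):
        self.total = 0.0
        self.steps = 0
        self._t0 = None

    def epoch_started(self):
        self._t0 = time.perf_counter()

    def epoch_completed(self):
        self.total += time.perf_counter() - self._t0
        self.steps += 1

    def value(self):
        # average seconds per epoch, i.e. what an averaging timer would report
        return self.total / max(self.steps, 1)

timer = AverageEpochTimer()
for epoch in range(3):
    timer.epoch_started()
    time.sleep(0.01)  # stand-in for one epoch of training work
    timer.epoch_completed()

print(f"avg epoch time: {timer.value():.4f} s")
```

If the reported value is far below the expected per-epoch duration, either the step count is higher than expected or the timer was reset between step and read, which is what turned out to happen below.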
@gnufabio it is hard to see what can go wrong without some code.
It may depend on whether you do some work in EPOCH_COMPLETED or EPOCH_STARTED handlers, and whether you attached the timer before attaching those time-consuming handlers.
Maybe, since you average the times, you get something less expected?
Alternatively, you can try to use engine.state.times[Events.EPOCH_COMPLETED.name] (see State — PyTorch-Ignite v0.5.1 Documentation) for that.
I think I found the problem: it was due to log_timer being called for both Events.TERMINATE and Events.COMPLETED. timer.value() was low because I was resetting the timer after Events.TERMINATE was fired. I solved it by removing the log_timer handler after its first call.
That being said, I am not sure why Events.TERMINATE and Events.COMPLETED are both fired, since in the official documentation they seem to be mutually exclusive.
Just as a reference, I have a bunch of other handlers attached to the engine:
- a handler to run the validation loop at the end of each epoch
- a ModelCheckpoint used to save the checkpoint and restore the weights when the trainer fires the TERMINATE or COMPLETED event
- an EarlyStopping fired on the evaluator’s Events.COMPLETED
- a handler to catch Ctrl+C and gracefully terminate the training
Glad you could find the problem and hope it works now as expected. Feel free to ask other questions here or on our discord.
That being said, I am not sure why Events.TERMINATE and Events.COMPLETED are both fired, since in the official documentation they seem to be mutually exclusive.
There was a question of whether we should skip Events.COMPLETED when terminate() is called, and it was decided to keep it. Maybe it could be a feature to add a flag, terminate(skip_complete_event=True), to skip Events.COMPLETED.
I think that would make things more intuitive and flexible. While from the Engine’s docs it is clear that trainer.terminate() fires both Events.COMPLETED and Events.TERMINATE, in the Events docs there is a table that looks misleading…