for epoch in range(num_epochs):
    batch = 0
    for _ in range(num_epoch_repeats):
        for data in self.train_data_loader:
            losses = self.train_step(data, global_step=step_id)
            # ...
I do not understand the necessity of this loop, "for _ in range(num_epoch_repeats):". Couldn't you just increase num_epochs instead?
Are there any other reasons for this kind of implementation?
You should look at how the authors use the outer loop. Most likely they perform some periodic actions there and changed the epoch numbering to match (e.g. a validation epoch every 10 training epochs, with num_epochs counting the validation epochs). Not a very good idea overall, IMHO.
Yeah, I checked. It seems the learning rate is updated (self.lr_scheduler.step()) once every self.num_epoch_repeats passes over the data. But this would be the same as counting plain epochs and calling self.lr_scheduler.step() every required number of epochs, right? (Rough sketch of what I mean below, after the code.)
for epoch in range(self.num_epochs):
    self.writer.add_scalar(
        "lr", self.optim.param_groups[0]["lr"], global_step=step_id
    )
    batch = 0
    # note: `batch` is not reset between the num_epoch_repeats passes below
    for _ in range(self.num_epoch_repeats):
        for data in self.train_data_loader:
            losses = self.train_step(data, global_step=step_id)
            loss_str = fmt_loss_str(losses)
            # optimizer step every accu_grad batches (gradient accumulation),
            # or when the batch counter hits num_total_batches - 1
            if (
                batch == self.num_total_batches - 1
                or batch % self.accu_grad == self.accu_grad - 1
            ):
                self.optim.step()
                self.optim.zero_grad()
            self.post_batch(epoch, batch)
            step_id += 1
            batch += 1
            progress.update(1)
    # scheduler stepped once per outer epoch, i.e. once every num_epoch_repeats passes
    if self.lr_scheduler is not None:
        self.lr_scheduler.step()
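Something like this is what I had in mind, just as a rough sketch reusing the names from the snippet above (not the authors' code; it leaves out logging, post_batch and gradient accumulation):

# count plain passes over the data and step the scheduler every
# num_epoch_repeats passes instead of once per renamed "epoch"
for epoch in range(self.num_epochs * self.num_epoch_repeats):
    for data in self.train_data_loader:
        losses = self.train_step(data, global_step=step_id)
        step_id += 1
    if self.lr_scheduler is not None and (epoch + 1) % self.num_epoch_repeats == 0:
        self.lr_scheduler.step()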
It seems they also do it for gradient accumulation, as it interacts a bit poorly with the data loader when the dataset_size : batch_size * accu_grad ratio is low: because batch is only reset at the start of the outer epoch, accumulation groups can span several passes over the data instead of being cut short at the end of every small pass.
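For illustration only, a toy counter that mimics just the step condition from the snippet (made-up numbers, not code from the repo):

# Count how many batches each optimizer step accumulates when the dataset is small.
def step_sizes(batches_per_pass, accu_grad, repeats, reset_each_pass):
    """Return the number of accumulated batches behind each optimizer step."""
    sizes = []
    acc = 0
    batch = 0
    # total batches before the "last batch" branch fires
    total = batches_per_pass if reset_each_pass else batches_per_pass * repeats
    for _ in range(repeats):
        if reset_each_pass:
            batch = 0
        for _ in range(batches_per_pass):
            acc += 1
            if batch == total - 1 or batch % accu_grad == accu_grad - 1:
                sizes.append(acc)
                acc = 0
            batch += 1
    return sizes

# 10 batches per pass, accumulate over 4 batches, 3 passes inside one "epoch":
print(step_sizes(10, 4, 3, reset_each_pass=True))   # [4, 4, 2, 4, 4, 2, 4, 4, 2]
print(step_sizes(10, 4, 3, reset_each_pass=False))  # [4, 4, 4, 4, 4, 4, 4, 2]

With the counter reset every pass, the last accumulation group of each pass is truncated; counting across the repeats (as in the snippet) truncates only once, at the end of the outer epoch.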