Should I log the loss to TensorBoard every epoch or every iteration? And why doesn't the loss decrease continuously?
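- It depends on your use case and how many loss values you want to plot. Per-iteration logging gives a fine-grained but noisy curve. Per-epoch logging (e.g. the mean loss over all iterations of the epoch) gives fewer, smoother points.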
- Some mini-batches might contain "hard" samples, which can increase the loss for that iteration. In general, (mini-batch) SGD does not guarantee that the loss decreases at every iteration, because each update uses a noisy gradient estimate computed from only a small subset of the data.
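Below is a minimal sketch of both points. It uses only the Python standard library rather than a real framework: mini-batch SGD fits a single weight on a toy 1-D linear regression (y ≈ 3x), where about 10% of the samples get large noise so they act as "hard" samples. The functions `make_data` and `train` are hypothetical helpers. `iter_log` and `epoch_log` stand in for the values you would send to TensorBoard.

```python
import random


def make_data(n, seed=0):
    """Toy 1-D regression data y = 3x + noise; ~10% of points are noisy "hard" samples."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        noise_std = 2.0 if rng.random() < 0.1 else 0.1
        data.append((x, 3.0 * x + rng.gauss(0.0, noise_std)))
    return data


def train(data, epochs=5, batch_size=8, lr=0.1, seed=0):
    """Mini-batch SGD on the single weight w of the model y_hat = w * x.

    Returns the final weight plus two logs: one loss per iteration (mini-batch)
    and one mean loss per epoch.
    """
    rng = random.Random(seed)
    order = list(data)  # copy so the caller's list is not reshuffled
    w = 0.0
    step = 0
    iter_log = []   # (global_step, batch loss): what per-iteration logging would plot
    epoch_log = []  # (epoch, mean batch loss): what per-epoch logging would plot
    for epoch in range(epochs):
        rng.shuffle(order)
        batch_losses = []
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            # Mean squared error on this batch and its gradient w.r.t. w,
            # both evaluated before the update (as in a typical training loop).
            loss = sum((w * x - y) ** 2 for x, y in batch) / len(batch)
            grad = sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= lr * grad
            iter_log.append((step, loss))
            batch_losses.append(loss)
            step += 1
        epoch_log.append((epoch, sum(batch_losses) / len(batch_losses)))
    return w, iter_log, epoch_log


if __name__ == "__main__":
    w, iter_log, epoch_log = train(make_data(200))
    losses = [loss for _, loss in iter_log]
    increases = sum(b > a for a, b in zip(losses, losses[1:]))
    print(f"learned w = {w:.3f}")
    print(f"iterations where the loss went up: {increases} of {len(losses) - 1}")
    for epoch, mean_loss in epoch_log:
        print(f"epoch {epoch}: mean loss {mean_loss:.4f}")
```

In this toy setup the per-iteration losses go up on many steps, while the per-epoch mean still ends lower than it started. With PyTorch, the two logs would map onto `SummaryWriter.add_scalar(tag, scalar_value, global_step)` from `torch.utils.tensorboard`. For example, you could call `writer.add_scalar("loss/iter", loss, step)` inside the inner loop and `writer.add_scalar("loss/epoch", mean_loss, epoch)` once per epoch. The tag names here are arbitrary choices.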