I am training a multi-label classifier with BCE loss on time-series data, using an expanding-window (walk-forward) validation scheme: I train on the first t data points and predict t+1, then retrain from scratch on the first t+1 points and predict t+2, and so on.
As you can see, the batch size grows at each iteration, and the training loss grows roughly logarithmically with it. The way I understand it, the model becomes harder to fit, with the average error growing like log(1 + N*eps). This is undesirable: it is effectively like increasing the learning rate at every iteration, and it makes training losses across iterations not readily comparable.
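Here is a minimal sketch of the setup (I'm using PyTorch; the linear model, random data, and hyperparameters below are placeholders, not my real configuration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
T, F, C = 100, 8, 5                          # time steps, features, labels
X = torch.randn(T, F)
Y = (torch.rand(T, C) > 0.5).float()         # multi-label targets

loss_fn = nn.BCEWithLogitsLoss()             # default reduction='mean'

for t in range(10, T):
    model = nn.Linear(F, C)                  # retrain from scratch each step
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(50):
        opt.zero_grad()
        loss = loss_fn(model(X[:t]), Y[:t])  # full batch of the first t points
        loss.backward()
        opt.step()
    # predict the next step; the fitted training loss tends to rise as t grows
    with torch.no_grad():
        pred = torch.sigmoid(model(X[t]))
```

The pattern I observe is that `loss` at the end of each retraining creeps upward as t increases, even though nothing else about the training changes.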
Is there a way to scale the BCE loss to prevent this effect? I am using the default mean reduction.