Hi,
I am wondering whether the order in which the gradients are computed is deterministic if the model is fixed.
And if we start a distributed training job, is that order the same on all of the compute nodes?
Thanks!