During training I need to compute a loss in the middle of the model and immediately call backward() on it to adjust parameters. But with multiple GPUs the batch is split across devices, so from outside the model I cannot compute that intermediate loss and backpropagate it.
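A common workaround is to compute the intermediate loss *inside* `forward()`, so each replica under `nn.DataParallel` evaluates it on its own shard; the wrapper then gathers one scalar per device, and you average them before calling backward. Below is a minimal sketch of that pattern; the model, layer sizes, and `target_mid` tensor are all hypothetical placeholders:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MidLossModel(nn.Module):
    """Hypothetical model that returns an intermediate loss from forward()."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(8, 4)
        self.head = nn.Linear(4, 2)

    def forward(self, x, target_mid):
        h = self.encoder(x)
        # Compute the mid-model loss here, on this replica's shard of the batch.
        mid_loss = F.mse_loss(h, target_mid)
        # unsqueeze(0) gives the scalar a batch dim so DataParallel can gather
        # one value per GPU along dim 0.
        return self.head(h), mid_loss.unsqueeze(0)

model = MidLossModel()
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)  # replicates the module, splits the batch

x = torch.randn(16, 8)
target_mid = torch.zeros(16, 4)  # placeholder target for the intermediate loss

out, mid_loss = model(x, target_mid)
# mid_loss holds one scalar per device; average them, then backward immediately.
mid_loss.mean().backward()
```

Note that `DistributedDataParallel` needs more care here: every returned output that participates in a backward pass must be used, or you have to set `find_unused_parameters=True`.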