How to perform backward pass only on sample with minimum loss and improve performance?

This thread, Disconnected Gradient in Pytorch, might be helpful. It discusses why you would do what @rasbt described, and/or why you would detach at each iteration of the loop.
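
For illustration, here is a minimal sketch of one way to backprop only through the minimum-loss sample: do a cheap forward pass under `torch.no_grad()` to find that sample, then run forward + backward on it alone so the autograd graph covers just one example. The model, criterion, and batch below are hypothetical placeholders, not code from the linked thread.

```python
import torch
import torch.nn as nn

# Hypothetical model, loss, and batch, purely for illustration.
model = nn.Linear(10, 1)
criterion = nn.MSELoss(reduction="none")  # per-sample losses
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(32, 10)
y = torch.randn(32, 1)

# 1) Cheap forward pass without building a graph, just to find
#    the index of the sample with the minimum loss.
with torch.no_grad():
    losses = criterion(model(x), y).view(-1)
    min_idx = torch.argmin(losses).item()

# 2) Forward + backward on that single sample only, so the graph
#    (and the backward pass) covers just one example.
optimizer.zero_grad()
loss = criterion(model(x[min_idx:min_idx + 1]), y[min_idx:min_idx + 1]).mean()
loss.backward()
optimizer.step()
```

If instead you compute all losses in one differentiable forward pass, calling `.backward()` on only the minimum loss still works (the other samples simply contribute no gradient), but the two-pass version above avoids building and traversing the graph for the whole batch.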