How to perform backward pass only on sample with minimum loss and improve performance?

This thread, Disconnected Gradient in Pytorch, might be helpful. It discusses why you would do what @rasbt described, and/or why you would detach at each iteration of the loop.
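
For illustration, here is a minimal sketch of one way to backprop only through the minimum-loss sample: do a cheap forward pass under `torch.no_grad()` to find that sample, then run forward + backward on it alone so the autograd graph covers just one example. The model, criterion, and batch below are hypothetical placeholders, not code from the linked thread.

```python
import torch
import torch.nn as nn

# Hypothetical model, loss, and batch, purely for illustration.
model = nn.Linear(10, 1)
criterion = nn.MSELoss(reduction="none")  # per-sample losses
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(32, 10)
y = torch.randn(32, 1)

# 1) Cheap forward pass without building a graph, just to find
#    the index of the sample with the minimum loss.
with torch.no_grad():
    losses = criterion(model(x), y).view(-1)
    min_idx = torch.argmin(losses).item()

# 2) Forward + backward on that single sample only, so the graph
#    (and the backward pass) covers just one example.
optimizer.zero_grad()
loss = criterion(model(x[min_idx:min_idx + 1]), y[min_idx:min_idx + 1]).mean()
loss.backward()
optimizer.step()
```

If instead you compute all losses in one differentiable forward pass, calling `.backward()` on only the minimum loss still works (the other samples simply contribute no gradient), but the two-pass version above avoids building and traversing the graph for the whole batch.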