Understanding loss.backward() and CPU usage

Using PyTorch’s profiling tools revealed that the CPU usage was actually happening right before my call to backward(), not inside it (see Model() uses GPU but backwards() doesn't - #3 by neoncube).
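A minimal sketch of how this kind of profiling can be set up with `torch.profiler`. It assumes a hypothetical toy model, random data, and a cross-entropy loss as stand-ins for the original `Model()`. Each phase (forward, loss, backward) is wrapped in `record_function` so that time spent just before `backward()` shows up under its own label instead of being lumped into backward. CUDA kernels launch asynchronously, so naive wall-clock timing around `loss.backward()` can attribute time to the wrong phase; the profiler makes the split more visible.

```python
import torch
import torch.nn as nn
from torch.profiler import profile, record_function, ProfilerActivity

# Hypothetical toy model and data, standing in for the poster's Model()
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).to(device)
inputs = torch.randn(64, 512, device=device)
targets = torch.randint(0, 10, (64,), device=device)
loss_fn = nn.CrossEntropyLoss()

# Always record CPU activity; add CUDA activity only when a GPU is present
activities = [ProfilerActivity.CPU]
if device == "cuda":
    activities.append(ProfilerActivity.CUDA)

with profile(activities=activities) as prof:
    # Label each phase so its cost appears as a separate row in the report
    with record_function("forward"):
        outputs = model(inputs)
    with record_function("loss"):
        loss = loss_fn(outputs, targets)
    with record_function("backward"):
        loss.backward()

# Sort by self CPU time to see which operations dominate host-side cost
print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10))
```

Comparing the `forward`, `loss`, and `backward` rows (and the individual operators listed under them) shows whether the CPU time lands before the backward pass or inside it.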