Weird behavior observed with Autograd

Ravi_Tej_Akella · September 28, 2019, 10:57pm

@ptrblck I am trying to run the TRPO algorithm with a batch size of 15000. Autograd takes 27 seconds to backpropagate through the full batch of 15000, but only 19 seconds when I loop over the batch and compute the individual gradients sequentially (batch size 1). What I find strange is that parallelism hurts wall-clock time here (I agree that the sequential approach is space inefficient).
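For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of how the two timings could be measured side by side. It is not the TRPO code from the post: the network, the squared-output loss, the batch size of 256, and the helper names `flat_grad`, `batched`, `sequential`, and `timed` are all assumptions made up for illustration.

```python
import time

import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the policy network; sizes are illustrative only.
model = nn.Sequential(nn.Linear(8, 32), nn.Tanh(), nn.Linear(32, 2))
batch = torch.randn(256, 8)  # far smaller than 15000, to keep the sketch quick


def flat_grad(loss):
    """Return the gradient of `loss` w.r.t. all parameters as one flat vector."""
    grads = torch.autograd.grad(loss, list(model.parameters()))
    return torch.cat([g.reshape(-1) for g in grads])


def batched():
    # One forward/backward pass over the whole batch.
    return flat_grad(model(batch).pow(2).sum())


def sequential():
    # One forward/backward pass per sample, gradients summed by hand.
    total = None
    for x in batch:
        g = flat_grad(model(x.unsqueeze(0)).pow(2).sum())
        total = g if total is None else total + g
    return total


def timed(fn):
    # On a GPU, torch.cuda.synchronize() would be needed around the timer.
    start = time.perf_counter()
    out = fn()
    return out, time.perf_counter() - start


g_batched, t_batched = timed(batched)
g_sequential, t_sequential = timed(sequential)
print(f"batched: {t_batched:.4f}s  sequential: {t_sequential:.4f}s")
```

Because the loss is a sum over samples, both paths should produce the same gradient up to floating-point error, so any difference in timing comes from how the work is scheduled rather than from what is computed.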

dejanbatanjac · September 29, 2019, 12:44am

What kind of batch norms have you implemented?