Thanks for the input – your suggestions to remove the explicit copy and wrap the tensor in a Variable largely account for the speed-up, but I encountered something quite strange during inference that I think warrants a new post here: [pytorch0.3.1] Forward pass takes 10x longer for every 2nd batch inference