Training with batch_size = 1: all outputs are identical and the model trains poorly

I thought it might be the same issue as this thread (Outputs from a simple DNN are always the same whatever the input is), but model.state_dict() shows the weights and biases are all on a similar scale, so they don't look degenerate. @ptrblck could you lend a hand?
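For reference, this is the kind of minimal check I mean by "all outputs are the same" — two different inputs producing (nearly) identical outputs. The model below is just a placeholder MLP, not my actual network:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder model standing in for the real DNN
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
model.eval()

# Two distinct batch_size = 1 inputs
x1 = torch.randn(1, 10)
x2 = torch.randn(1, 10)

with torch.no_grad():
    y1 = model(x1)
    y2 = model(x2)

# If this prints True, the model is collapsing different inputs
# to the same output (the symptom described above)
print(torch.allclose(y1, y2))
```

With a freshly initialized model like this one, the outputs differ; in my training runs they end up identical across inputs.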