Comparing the Keras and PyTorch docs for RMSprop, it seems that PyTorch’s default lr (0.01) is 10x as large as Keras’s (0.001). Do you have some other code that changes lr somewhere?
keras: https://keras.io/optimizers/#rmsprop
pytorch: http://pytorch.org/docs/0.2.0/optim.html#torch.optim.RMSprop
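
To get a feel for how much that difference matters, here is a rough pure-Python sketch of a single plain RMSprop step (no momentum, not centered) on one scalar parameter. The helper names `rmsprop_step` and `first_step_size` and the toy gradient value are made up for illustration. `alpha` and `eps` are set to PyTorch's documented defaults and held fixed, so only lr varies.

```python
import math

# Default learning rates as listed in the two docs linked above.
KERAS_DEFAULT_LR = 0.001
PYTORCH_DEFAULT_LR = 0.01


def rmsprop_step(param, grad, square_avg, lr, alpha=0.99, eps=1e-8):
    """One scalar RMSprop update (hypothetical helper, not a library call).

    Returns the updated parameter and the updated running average of
    squared gradients.
    """
    square_avg = alpha * square_avg + (1 - alpha) * grad * grad
    param = param - lr * grad / (math.sqrt(square_avg) + eps)
    return param, square_avg


def first_step_size(lr, grad=0.5):
    """Size of the very first update, starting from param=0 and an empty average."""
    new_param, _ = rmsprop_step(param=0.0, grad=grad, square_avg=0.0, lr=lr)
    return abs(new_param)


# Same gradient, same alpha/eps: only the default lr differs.
ratio = first_step_size(PYTORCH_DEFAULT_LR) / first_step_size(KERAS_DEFAULT_LR)
print(f"step size ratio (PyTorch default lr / Keras default lr): {ratio:.1f}")
```

With everything else equal, the update scales linearly with lr, so the defaults alone give 10x larger steps on the PyTorch side. If nothing else in your code touches lr, passing it explicitly, e.g. `torch.optim.RMSprop(model.parameters(), lr=0.001)` (where `model` stands for your own network), should put the two setups on the same footing for this parameter.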