What should I do if I want to optimise different modules in one network (module) with different learning rates?

If I put all the layers in one module, is it possible to train different layers with different learning rates?

Yes, it is; you should use parameter groups when defining your optimizer.
Take a look at the section “Per-parameter options” in http://pytorch.org/docs/master/optim.html
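For example, here is a minimal sketch (the `Net` class, layer names, and learning rates are made up for illustration):

```python
import torch
import torch.nn as nn
import torch.optim as optim

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.base = nn.Linear(10, 20)  # hypothetical "backbone" layer
        self.head = nn.Linear(20, 2)   # hypothetical output layer

    def forward(self, x):
        return self.head(torch.relu(self.base(x)))

model = Net()

# One parameter group per submodule, each with its own learning rate
optimizer = optim.SGD(
    [
        {"params": model.base.parameters(), "lr": 1e-3},
        {"params": model.head.parameters(), "lr": 1e-2},
    ],
    momentum=0.9,  # applies to every group that doesn't override it
)
```

Any option you don't set inside a group (like `momentum` above) falls back to the default you pass to the optimizer itself.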