Suggestions for tuning hyperparameters

I am using the CASIA-WebFace dataset for training. The model is a 64-layer architecture built from conv and PReLU layers, followed by 2 FC layers.
I am having a hard time tuning the model so that it learns.

First I tried a 22-layer model. It gives results if I train it on 100k images instead of the full 900k, but it fails when I use more than 100k images.

Learning rate and momentum are the only two hyperparameters.
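
To be concrete about what those two knobs control, this is a minimal sketch of one update step, assuming plain SGD with momentum. The helper name `sgd_momentum_step`, the default values, and the toy quadratic are only illustrative, not my actual training setup.

```python
def sgd_momentum_step(weights, grads, velocity, lr=0.01, momentum=0.9):
    """One SGD-with-momentum update on plain Python lists (hypothetical helper)."""
    # The velocity keeps a decaying sum of past gradients, scaled by the learning rate.
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    # The weights move along the velocity rather than along the raw gradient.
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


# Toy example (assumption, not the face model): minimise f(w) = w^2, gradient 2w.
w, v = [1.0], [0.0]
for _ in range(100):
    w, v = sgd_momentum_step(w, [2.0 * w[0]], v, lr=0.01, momentum=0.9)
```

With momentum 0.9 the step size in a steady gradient direction is roughly 10x the raw learning rate, so the two values interact and probably need to be tuned together rather than one at a time.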