How to adjust training hyperparameters to train a multi-GPU project on fewer cards or even a single card?

Hi all,

I have a PyTorch DDP project that trains well on an 8-GPU machine, but it sometimes fails on a 4-GPU machine (here "fails" means the training loss decreases as normal, but the validation result is poor), and in a single-GPU environment it always fails. The per-GPU batch size cannot be increased any further, so I cannot compensate for the missing cards with a larger batch size. Which hyperparameters can I adjust so that training works well on a single card or on fewer cards (e.g., 2 or 3 cards)? Will a larger learning rate and more warmup steps help?