Loss decreases slowly with a larger batch size when accumulating gradients

I’m training an encoder-decoder model with 2 GRUs. The training algorithm is roughly as follows:

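from torch.nn import NLLLoss
from torch.optim import Adam

criterion = NLLLoss(...)
opt = Adam(...)
for epoch in range(epochs):
    opt.zero_grad()  # start each epoch with empty gradients
    for batch in split_to_batches(dataset, batch_size):
        ... # preparing data
        predicted = model(batch_encoder_inputs, batch_decoder_inputs)
        loss = criterion(predicted, batch_targets)
        loss.backward()  # gradients add up across all batches
    opt.step()  # one weight update per epoch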

So the loss gradient is accumulated over all batches, and the weights are updated once per epoch.
When I trained the network with batch_size = 200, it was fine and the loss decreased at a reasonable rate. But when I increased batch_size to 400, it became really hard to get the loss to decrease at all. Only when I set the learning rate to something like 0.0001 did it start to decrease, and even then very slowly. I also tried gradient clipping (thresholds from 4 to 8) before opt.step, with no luck.
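
The setup described above can be sketched as a small self-contained script. This is only an illustration under stated assumptions: the tiny linear model, the random data, the learning rate, and the helper name train_one_epoch are placeholders and not the actual GRU encoder-decoder or its data. Clipping is shown by norm, which may differ from the clipping used in the real code.

```python
import torch
from torch import nn
from torch.optim import Adam

torch.manual_seed(0)

# Toy stand-in for the GRU encoder-decoder (assumption, not the real model).
model = nn.Sequential(nn.Linear(8, 5), nn.LogSoftmax(dim=1))
criterion = nn.NLLLoss()  # default reduction averages within a batch
opt = Adam(model.parameters(), lr=1e-3)

# Random placeholder data: 1000 samples, 8 features, 5 classes.
inputs = torch.randn(1000, 8)
targets = torch.randint(0, 5, (1000,))
batch_size = 200


def train_one_epoch(max_norm=4.0):
    """Accumulate gradients over every batch, clip, then update once."""
    opt.zero_grad()
    total_loss = 0.0
    for start in range(0, len(inputs), batch_size):
        x = inputs[start:start + batch_size]
        y = targets[start:start + batch_size]
        loss = criterion(model(x), y)
        loss.backward()  # adds this batch's gradient to .grad
        total_loss += loss.item()
    # Clip the accumulated gradient just before the single update.
    grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    opt.step()
    return total_loss, float(grad_norm)


epoch_loss, grad_norm = train_one_epoch()
```

In this sketch the accumulated gradient is a sum of per-batch averages, so its scale depends on how many batches make up an epoch; changing batch_size changes that count.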

Why is this happening? I expected a larger batch_size to improve training performance. Am I wrong? How can I fix the training algorithm?

Thanks in advance!