Unexpected results with multi-head attention for NMT task

Hello all : )

I have trained multi-head attention (Transformer) models in both R2L and L2R directions for an NMT task, and I have a few questions about the final results.

When I applied multi-pass decoding, precision decreased by two points and recall increased by two points. Why did this happen, and why does precision drop? Is there any way to control this trade-off?
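
To make the trade-off concrete for myself, I think of it roughly like the toy unigram example below (illustrative only, not my actual evaluation script): when decoding produces longer or more exhaustive outputs, recall tends to rise while precision falls.

```python
from collections import Counter

def unigram_scores(hypothesis, reference):
    """Toy unigram precision/recall/F1 between two token lists."""
    hyp_counts = Counter(hypothesis)
    ref_counts = Counter(reference)
    # Clipped matches: each reference token can be matched at most once.
    matches = sum(min(hyp_counts[tok], ref_counts[tok]) for tok in hyp_counts)
    precision = matches / max(len(hypothesis), 1)
    recall = matches / max(len(reference), 1)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# A longer hypothesis tends to raise recall but lower precision.
ref = "the cat sat on the mat".split()
short_hyp = "the cat sat".split()
long_hyp = "the cat sat on the mat in the house".split()
print(unigram_scores(short_hyp, ref))  # high precision, lower recall
print(unigram_scores(long_hyp, ref))   # lower precision, higher recall
```
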
Also, I tried to continue training both models on a new dataset and changed the learning rate from 1e-3 to 1e-1, but the performance did not improve, which is unexpected and confusing.
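
For context, my continued-training setup is roughly equivalent to this simplified PyTorch sketch (a dummy linear layer stands in for the Transformer, and the checkpoint path, batch, and loss are placeholders, not my real training code):

```python
import torch
import torch.nn as nn

# Dummy stand-in for the already-trained Transformer (placeholder model).
model = nn.Linear(512, 512)

# In the real run I resume from the checkpoint of the first training phase;
# here a freshly saved state dict stands in for that checkpoint.
torch.save(model.state_dict(), "checkpoint.pt")
model.load_state_dict(torch.load("checkpoint.pt"))

# Continue training on the new data, but with the raised learning rate.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-1)  # previously 1e-3

criterion = nn.MSELoss()
for step in range(100):
    x = torch.randn(32, 512)       # placeholder batch from the new dataset
    loss = criterion(model(x), x)  # toy objective in place of the NMT loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```
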

Could you explain these two points?