Dropout across multiple GPUs

If I seed each process differently, dropout will behave differently in each process: a unit that is dropped in one process may be kept in another, so after the all-reduce on the gradients there is generally no zeroed-out gradient for that unit, which differs from single-GPU training with dropout. So do we necessarily have to set the same manual seed in every process for a dropout layer to work conceptually the same way across all of them?
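To illustrate what I mean, here is a minimal sketch (outside of any actual DDP setup) assuming each rank simply calls `torch.manual_seed` with its own value; the seed values and the helper `dropout_mask` are just for illustration:

```python
import torch

def dropout_mask(seed: int, p: float = 0.5, n: int = 8) -> torch.Tensor:
    """Return the dropout mask a process seeded with `seed` would sample."""
    torch.manual_seed(seed)          # per-process seed (hypothetical values)
    drop = torch.nn.Dropout(p=p)
    drop.train()                     # dropout is only active in training mode
    out = drop(torch.ones(n))        # kept units are scaled by 1 / (1 - p)
    return (out != 0).int()          # 1 = kept, 0 = dropped

# Different seeds (as if each rank seeded itself differently):
# the masks disagree, so a unit dropped on one rank can be kept on another,
# and the all-reduced gradient for that unit is generally non-zero.
print(dropout_mask(seed=0))
print(dropout_mask(seed=1))

# Same seed on every rank: identical masks, matching single-GPU behaviour.
print(dropout_mask(seed=42))
print(dropout_mask(seed=42))
```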
