Could you post the code for your models?
If you didn’t implement some weight sharing, your models should be completely independent.
Also, could you share your training code, if that’s possible?
Is this behavior reproducible?
Thanks. I found the bug: in my model definition I use separate streams to overlap computation with data transfer, but in practice the streams run in an uncontrolled order. I know TensorFlow puts computation and data transfer on different streams by default to try to overlap them, but it seems hard to achieve that overlap in PyTorch.
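For what it's worth, here is a minimal sketch of how copy/compute overlap can be attempted with explicit streams in PyTorch. The function name `prefetch_and_compute` and the use of `.sum()` as the "computation" are illustrative, not from my actual model; the key pieces are `torch.cuda.Stream`, pinned memory, `non_blocking=True`, and `wait_stream` to order the two streams:

```python
import torch

def prefetch_and_compute(batches):
    """Reduce each batch; on CUDA, copy batch i on a side stream
    while the default stream computes on batch i-1."""
    if not torch.cuda.is_available():
        # CPU fallback: no streams involved, just compute directly.
        return [b.sum() for b in batches]

    device = torch.device("cuda")
    copy_stream = torch.cuda.Stream()  # dedicated H2D-copy stream
    results = []
    prev_gpu = None
    for batch in batches:
        # Pinned host memory is required for a truly async H2D copy.
        with torch.cuda.stream(copy_stream):
            staged = batch.pin_memory().to(device, non_blocking=True)
        # While the copy runs, compute on the previously copied batch
        # (this work is enqueued on the default stream).
        if prev_gpu is not None:
            results.append(prev_gpu.sum())
        # Order the streams: default stream must not read `staged`
        # until copy_stream has finished writing it.
        torch.cuda.current_stream().wait_stream(copy_stream)
        prev_gpu = staged
    results.append(prev_gpu.sum())
    return results
```

Without the `wait_stream` call, the default stream may read the tensor before the copy completes, which matches the "uncontrolled" behavior I saw.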
Anyway, thanks for your response.