I’m not sure you’ll get the best answer about TensorFlow on this discussion board, so I would recommend using their discussion platforms (Stack Overflow and GitHub issues).
That being said, may I ask how you’re optimizing performance using torch.cuda.synchronize() calls?