Suboptimal convergence when compared with TensorFlow model

Thanks for all the useful comments above. I’m bumping this issue after spending a couple of days trying to reproduce CycleGAN-VC. According to this issue, CycleGAN-VC does not seem to be reproducible because of Adam optimizer/sparsity differences:

I’m very unfamiliar with TF, but I will do my best to get a minimal reproducible example with the two frameworks.

I spent a bit of time on it, but I struggled to implement two identical matrix multiplications in TF and PyTorch. Even though I’m loading the PyTorch-initialised weights into TF and feeding the same input, the mean of the feedforward output differs (0.003 vs -0.006).
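One thing worth ruling out when copying weights between the two frameworks is the weight layout: `torch.nn.Linear` stores its weight as `(out_features, in_features)`, while a TF/Keras `Dense` kernel is `(in_features, out_features)`, so the PyTorch weight has to be transposed (not reshaped) before loading. Whether this explains the 0.003 vs -0.006 difference above is only a guess. The sketch below uses plain NumPy to stand in for both layers, and the shapes, seed, and function names are made up for illustration.

```python
import numpy as np

# Hypothetical sizes and seed, chosen only for this illustration.
rng = np.random.default_rng(0)
batch, in_features, out_features = 4, 8, 3

x = rng.standard_normal((batch, in_features)).astype(np.float32)
# Same layout as torch.nn.Linear.weight: (out_features, in_features).
w_torch = rng.standard_normal((out_features, in_features)).astype(np.float32)
b = rng.standard_normal(out_features).astype(np.float32)


def linear_torch_layout(x, weight, bias):
    """Stand-in for torch.nn.Linear: y = x @ weight.T + bias."""
    return x @ weight.T + bias


def dense_tf_layout(x, kernel, bias):
    """Stand-in for a Keras Dense layer: y = x @ kernel + bias."""
    return x @ kernel + bias


y_ref = linear_torch_layout(x, w_torch, b)

# Correct port: transpose the PyTorch weight into the (in, out) kernel layout.
y_transposed = dense_tf_layout(x, w_torch.T, b)

# Easy mistake: reshape gives the right shape but scrambles the entries.
y_reshaped = dense_tf_layout(x, w_torch.reshape(in_features, out_features), b)

max_abs_diff = float(np.max(np.abs(y_ref - y_transposed)))
outputs_match = bool(np.allclose(y_ref, y_transposed, atol=1e-6))
reshaped_matches = bool(np.allclose(y_ref, y_reshaped, atol=1e-6))

print("transposed kernel matches:", outputs_match, "max abs diff:", max_abs_diff)
print("reshaped kernel matches:", reshaped_matches)
print("output means:", float(y_ref.mean()), float(y_reshaped.mean()))
```

In a real comparison it is usually more informative to look at the maximum absolute difference between the two outputs than at their means, since two different outputs can share a similar mean. Small float32 discrepancies are expected between frameworks, but a sign flip in the mean points to something structural such as the layout, a missing bias, or a different input.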
