The same model works on a 2080 Ti but does not work on a 1080 Ti

I am training a seq2seq model for a text-to-speech task. I trained the same model on a 1080 Ti and on a 2080 Ti, both with the same default config from the repository, i.e., the same batch size, the same learning rate, and so on.
The model on the 2080 Ti picks up a meaningful alignment, while the model on the 1080 Ti does not, although its loss decreases normally.

I am quite confused: is the difference between the GPUs causing this problem? I have checked that the PyTorch versions are consistent, and the code also sets a specific random seed. So where is the problem? Has anybody encountered a similar issue?
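
For reference, the seeding I am talking about is along these lines (a minimal sketch; the actual calls in the repository may differ):

```python
import random

import numpy as np
import torch


def set_seed(seed: int = 1234):
    """Seed all common RNG sources. This makes a single run repeatable,
    but does not guarantee identical results across different GPU models."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Optional: trade speed for more deterministic cuDNN kernels.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


set_seed(1234)
```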

Could you explain what “alignment” means in this context?
Is your model not predicting anything useful, while the losses decrease on both GPUs?

Thanks for your reply. When training a seq2seq text-to-speech model, the alignment usually means the roughly diagonal pattern in the attention weights that aligns the output frames with the input text. Once the model picks up this alignment, it can be used to predict; when it fails to, the model does not work normally (it does not predict anything useful).
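
To make that concrete, this is roughly how such an alignment is usually inspected (assuming the model exposes its attention weights as a `[decoder_steps, encoder_steps]` matrix; the helper below is a hypothetical sketch, not code from the repository):

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_alignment(attn: np.ndarray, path: str = "alignment.png"):
    """attn: attention weights of shape [decoder_steps, encoder_steps].
    A model that has picked up the alignment shows a roughly monotonic
    diagonal band; a failed model shows a diffuse or collapsed map."""
    fig, ax = plt.subplots(figsize=(6, 4))
    im = ax.imshow(attn, aspect="auto", origin="lower", interpolation="none")
    fig.colorbar(im, ax=ax)
    ax.set_xlabel("Encoder steps (input text)")
    ax.set_ylabel("Decoder steps (output frames)")
    fig.savefig(path)
    plt.close(fig)
```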

And I just found that the model can in fact be trained normally on the 1080 Ti; it just seems to need more iterations to pick up the alignment than the 2080 Ti. For example, with the same config, the model on the 2080 Ti picks up a meaningful alignment after about 30k iterations, while the model on the 1080 Ti needs about 50k iterations.
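
To compare the two GPUs by something other than eyeballing the plots, I could log a simple sharpness score of the attention matrix during training (a hypothetical helper, not part of the repository):

```python
import numpy as np


def alignment_score(attn: np.ndarray) -> float:
    """Mean of the maximum attention weight per decoder step.
    attn: [decoder_steps, encoder_steps]. Values near 1.0 mean each decoder
    step attends sharply to one encoder position (a crisp diagonal);
    values near 1 / encoder_steps mean the attention is still diffuse."""
    return float(attn.max(axis=1).mean())
```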

How reproducible is this effect, i.e., out of 10 runs with different seeds, how often is the 1080 Ti worse, and by what margin, compared to the 2080 Ti?
Also, I assume you are using the exact same environment?

Only the software environment is the same, which means the same CUDA version, cuDNN version, PyTorch version, NumPy version, and Linux version. But the two models are trained on different devices.
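
For completeness, both machines can dump their versions with something like this (standard PyTorch/NumPy calls):

```python
import platform

import numpy as np
import torch

print("python :", platform.python_version())
print("pytorch:", torch.__version__)
print("cuda   :", torch.version.cuda)
print("cudnn  :", torch.backends.cudnn.version())
print("numpy  :", np.__version__)
print("gpu    :", torch.cuda.get_device_name(0))
```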

I am sorry that I haven’t tried many runs. I have only trained the model with one specific seed, for perhaps 3 runs. In each run, the 2080 Ti seemed “smarter”, learning the alignment at an earlier iteration than the 1080 Ti.