Training a PyTorch model in Google Colab vs on local GPUs results in less accurate models with the same hyperparams. How to fix?

I recently discovered Google Colab and uploaded my PyTorch project for training models that process audio. I got it training models on Google’s TPUs, but I noticed that the resulting models were less accurate than the ones I trained on my local machine. It turns out that the state dict weights and biases have about half the decimal places of the locally trained model. After some searching I read that Colab uses float16 by default instead of float32 precision to increase speed, but since the audio I’m training on is float32, it really needs to train in float32 precision. Is there a way to change this in Colab? Or is there a way to change my PyTorch model to ensure float32 precision is kept? My model uses a stack of 1d convolutional layers, if that matters.

PyTorch uses float32 by default on the CPU and GPU. I’m not deeply familiar with TPUs, but I guess you might be using bfloat16 on them? Could you try calling float() on the model and inputs and check if the TPU run is forcing you to use this format?
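Something along these lines; this is only a minimal sketch with a small Conv1d stack standing in for your actual audio model and a random tensor standing in for your input batch:

```python
import torch
import torch.nn as nn

# Placeholder for the Conv1d stack described in the question,
# not your actual architecture.
model = nn.Sequential(
    nn.Conv1d(1, 16, kernel_size=9, padding=4),
    nn.ReLU(),
    nn.Conv1d(16, 1, kernel_size=9, padding=4),
)

# Explicitly cast the parameters and the inputs to float32 so a
# backend default cannot silently change the compute dtype.
model = model.float()
waveform = torch.randn(8, 1, 16000).float()  # dummy audio batch

out = model(waveform)
print(out.dtype)                        # expect torch.float32
print(next(model.parameters()).dtype)   # expect torch.float32
```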

I went back and looked at the state_dict of both the locally trained GPU model and the cloud TPU model, and they do have the same precision, around 4 decimal places, so what I said in the original question was incorrect.
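One caveat worth noting: the number of printed decimal places only reflects PyTorch’s print settings (the default print precision is 4), not the storage format. A more direct check is to inspect the dtype of each tensor in the state_dict; a rough sketch, assuming a `model` instance like the one above:

```python
import torch

# Compare storage precision directly via dtypes rather than by
# counting printed decimal places.
state_dict = model.state_dict()
dtypes = {name: tensor.dtype for name, tensor in state_dict.items()}
print(set(dtypes.values()))   # e.g. {torch.float32}
```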

After some more reading, it sounds like the built-in bfloat16 type is part of what makes TPUs so fast, and while I don’t understand all the math, I think it can represent the same range of values as float32, so that might not be my issue. I should also note that I’m using a pytorch_lightning module, although I wouldn’t think that would matter. I might try calling float() to see if that changes the output, thanks!
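For the Lightning side, this is roughly what I’d try to pin the run to float32; treat it as a sketch, since the TPU-related Trainer arguments have changed across Lightning versions (older releases use `tpu_cores=8` instead of `accelerator`/`devices`), and the module/datamodule names are placeholders:

```python
import os
import pytorch_lightning as pl

# torch_xla only switches to bfloat16 when this env var is set,
# so check that the Colab runtime isn't setting it to 1.
print("XLA_USE_BF16 =", os.environ.get("XLA_USE_BF16"))

trainer = pl.Trainer(
    accelerator="tpu",   # on older Lightning versions: tpu_cores=8
    devices=8,
    precision=32,        # request full float32 precision explicitly
    max_epochs=10,
)

# trainer.fit(my_lightning_module, datamodule=my_datamodule)
# (placeholders for your own LightningModule / DataModule)
```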

Here’s the reference I found on bfloat16:

While bfloat16 uses the same range as float32, it does not provide the same “step size”.
As I’m not deeply familiar with this numerical format, I don’t know if you would have to adapt your model to it.
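A quick way to see the “same range, coarser step size” point: bfloat16 keeps float32’s 8 exponent bits, but stores only 7 mantissa bits where float32 stores 23, so adjacent representable values are much farther apart. A small illustration:

```python
import torch

# Spacing between adjacent representable values around 1.0.
print(torch.finfo(torch.float32).eps)    # ~1.19e-07
print(torch.finfo(torch.bfloat16).eps)   # 0.0078125

# Small differences below that step size get rounded away.
x = torch.tensor(1.0001)
print(x.to(torch.bfloat16).to(torch.float32))   # prints 1.0
```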