I am training a model that takes as input 16-frame videos plus 16 corresponding photovoltaic power (PV) output values. I would like the PV values to be preserved in float32, but for the sake of training efficiency I am using mixed precision with float16 weights. I tried to convert the float16 to float32 in a way that seemed “local”, but since the linear layer is connected to the rest of the model chain, DeepSpeed throws an error: “found dtype Float but expected Half”. It also won’t let me pass float32 inputs into the float16 layer without conversion. How do I get around this?
This sounds like “raw” FP16 rather than true mixed-precision training, which keeps a master copy of the parameters in FP32 and uses lower-precision activations only for numerically safe operations.
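For comparison, here is a minimal sketch of what true mixed precision looks like with PyTorch's built-in `torch.autocast` / `GradScaler` machinery (the model and shapes here are placeholders, not from your setup). Note that the weights never leave FP32; only selected forward ops run in reduced precision:

```python
import torch
import torch.nn as nn

use_cuda = torch.cuda.is_available()

model = nn.Linear(16, 16)                      # parameters stay float32
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
# GradScaler is only needed (and only enabled) for CUDA FP16.
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)

x = torch.randn(4, 16)
# On GPU autocast uses float16; on CPU it supports bfloat16 instead.
with torch.autocast(device_type="cuda" if use_cuda else "cpu",
                    dtype=torch.float16 if use_cuda else torch.bfloat16):
    out = model(x)                             # runs in reduced precision

loss = out.float().pow(2).mean()               # accumulate loss in float32
scaler.scale(loss).backward()
scaler.step(opt)
scaler.update()

print(model.weight.dtype)  # torch.float32 -- weights never left FP32
```

DeepSpeed's `"fp16": {"enabled": true}` config does the analogous thing internally (FP32 master weights plus loss scaling), which is why it objects when it encounters a manually up-cast layer mid-graph.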
Based on this I assume you are manually casting the whole model with `.half()`. If so, you will need to cast that specific layer back to `.float()`, and also cast its input to float32 (and its output back to float16) at the layer boundary.
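A minimal sketch of that “float32 island” inside an otherwise half-precision model (the `PVHead` name and shapes are illustrative assumptions, not from your code):

```python
import torch
import torch.nn as nn

class PVHead(nn.Module):
    """A regression head kept in float32 while the surrounding model
    runs in float16. The up-cast/down-cast happens inside forward(),
    so the rest of the half-precision chain is unaffected."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)  # stays float32

    def forward(self, x):
        # Up-cast the half-precision activations from the backbone,
        # run the layer in float32, then cast back so downstream
        # half-precision layers (and the loss) still see float16.
        return self.linear(x.float()).half()

head = PVHead(64, 16)            # do NOT call .half() on this module

x_half = torch.randn(2, 64).half()   # stand-in for fp16 backbone activations
out = head(x_half)

print(out.dtype)                 # torch.float16
print(head.linear.weight.dtype)  # torch.float32
```

The key point is that the casts live inside the module's `forward`, so neither the upstream half layers nor DeepSpeed's dtype checks ever see a float32 tensor crossing a float16 boundary unconverted.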