Scaler.step(optimizer) in FP16 or FP32?

When using the recipe for training with AMP and GradScaler, i.e.:

    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()

Is the optimizer step performed in full or half precision? And could this lead to issues with very, very small learning rates?

The model parameters are kept in float32 and are not cast to the lower-precision dtype; autocast only casts the inputs of selected operations to float16 during the forward pass. The gradients stored in param.grad are float32 as well, and scaler.step(optimizer) unscales them before calling optimizer.step(), so the update is applied entirely in float32. Very small learning rates are therefore no more problematic than in a plain float32 training run.
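
A quick way to convince yourself is to print the dtypes directly. This is a minimal sketch; the Linear model, the MSE loss, and the CUDA device are placeholders for illustration, not from the original post:

    import torch

    model = torch.nn.Linear(10, 10).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    scaler = torch.cuda.amp.GradScaler()

    data = torch.randn(8, 10, device="cuda")
    target = torch.randn(8, 10, device="cuda")

    with torch.cuda.amp.autocast():
        output = model(data)                # linear layer runs in float16 under autocast
        loss = torch.nn.functional.mse_loss(output, target)

    print(output.dtype)                     # torch.float16 (autocast casts the op, not the params)
    print(model.weight.dtype)               # torch.float32 (parameters stay in float32)

    scaler.scale(loss).backward()
    print(model.weight.grad.dtype)          # torch.float32 (grads match the param dtype)

    scaler.step(optimizer)                  # unscales the grads, then calls optimizer.step()
    scaler.update()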