Noop flag device error with apex optimizers

I’m getting the following RuntimeError with the apex optimizer FusedSGD (in fact I got it for all apex optimizers). I have no idea what it means; I checked that inputs, targets, loss, and weights all seem to be on the same CUDA device.
Any ideas? Should I open a bug issue on the apex GitHub?

I’m using the EfficientDet model by Ross Wightman (efficientdet-pytorch).
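One way to double-check the device placement beyond inputs and loss is to enumerate every device that holds a model parameter or an optimizer state tensor. A minimal sketch (with a toy model and optimizer standing in for the real ones in the training setup):

```python
import torch

# Hypothetical stand-ins for the real model/optimizer in the training setup
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Collect every device that holds a parameter or optimizer state tensor
devices = {p.device for p in model.parameters()}
for state in optimizer.state.values():
    devices |= {v.device for v in state.values() if torch.is_tensor(v)}

print(devices)  # should contain exactly one device
```

If this set has more than one entry, some tensor was left behind on the wrong device.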

  File "mylib/apps/training/lib/models/detection/effdet/", line 96, in train_one_epoch
  File "mylib/torchenv/lib/python3.6/site-packages/torch/cuda/amp/", line 321, in step
    retval = optimizer.step(*args, **kwargs)
  File "mylib/torchenv/lib/python3.6/site-packages/apex/optimizers/", line 222, in step
  File "mylib/torchenv/lib/python3.6/site-packages/apex/multi_tensor_apply/", line 30, in __call__
RuntimeError: expected noop flag to be on the same device as tensors

Here’s my code to do the forward/backward step:

for images, targets in progress_bar(self.train_loader, parent=mb):
    targets = {k: v.to(device) for k, v in targets.items()}

    with torch.cuda.amp.autocast():
        loss_dict = model(images, targets)

    # Get loss value from dict
    loss = loss_dict["loss"]

    # Scales the loss, and calls backward()
    # to create scaled gradients
    scaler.scale(loss).backward()

    # Unscales gradients and calls
    # or skips optimizer.step()
    scaler.step(optimizer)

    # Updates the scale for next iteration
    scaler.update()

and scaler is defined as scaler = torch.cuda.amp.GradScaler().

Mixing native amp (via torch.cuda.amp) and apex/amp can yield these issues, and we recommend sticking to the native implementation. multi_tensor_apply is already implemented in upstream PyTorch, and more optimizers are being worked on to enable it.
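As a sketch of that recommendation: the same GradScaler loop works unchanged once the apex optimizer is swapped for a native one. Below, a toy linear model and torch.optim.SGD stand in for the real EfficientDet model and FusedSGD; the enabled flags just let the snippet also run on a CPU-only machine.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
use_amp = device.type == "cuda"  # autocast/GradScaler are no-ops when disabled

# Toy stand-ins for the real model and apex FusedSGD
model = torch.nn.Linear(10, 2).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

for _ in range(3):
    images = torch.randn(4, 10, device=device)
    targets = torch.randint(0, 2, (4,), device=device)

    optimizer.zero_grad()
    with torch.cuda.amp.autocast(enabled=use_amp):
        loss = torch.nn.functional.cross_entropy(model(images), targets)

    scaler.scale(loss).backward()  # scaled gradients
    scaler.step(optimizer)         # unscales, then steps (or skips on inf/nan)
    scaler.update()                # adjusts the scale for the next iteration
```

The scaler calls are identical to the native-amp pattern in the question; only the optimizer construction changes.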