Errors using DeepSpeed ZeRO stage 3

To the superstars who have worked on PyTorch and DeepSpeed: I need help. Can anyone help me with the error below?

This is what we tried:
- Set multiple GPUs as the accelerator in pl.Trainer
- Added the DeepSpeed strategy to pl.Trainer
- Updated the optimizer in TemporalFusionTransformer to be compatible with DeepSpeed

Training runs successfully with DeepSpeed stage=2, but returns an error with DeepSpeed stage=3.

This is the actual error:
File "/home/ubuntu/.venv/pl-forecast/lib/python3.10/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 39, in _apply_to_tensors_only
    return outputs.__class__(touched_outputs)
TypeError: output.__new__() missing 7 required positional arguments: 'encoder_attention', 'decoder_attention', 'static_variables', 'encoder_variables', 'decoder_variables', 'decoder_lengths', and 'encoder_lengths'
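The traceback shows DeepSpeed rebuilding a tuple-like output with `outputs.__class__(touched_outputs)`, which calls the output's type with a single list. The missing-argument message suggests the model's forward returns a namedtuple (a type named `output` whose fields include the seven listed names). That is an inference from the error text, not something confirmed here. The stdlib-only sketch below reproduces that failure mode outside DeepSpeed. The `Output` type, its assumed first field `prediction`, and both helper functions are hypothetical stand-ins, not DeepSpeed or pytorch-forecasting code.

```python
from collections import namedtuple

# Hypothetical stand-in for the model's output type. The first field name
# ("prediction") is an assumption; the other seven come from the traceback.
Output = namedtuple(
    "output",
    [
        "prediction",
        "encoder_attention",
        "decoder_attention",
        "static_variables",
        "encoder_variables",
        "decoder_variables",
        "decoder_lengths",
        "encoder_lengths",
    ],
)


def rebuild_like_deepspeed(outputs, touched_outputs):
    # Mirrors the line in the traceback: the container type is called
    # with ONE positional argument, a list of the processed elements.
    return outputs.__class__(touched_outputs)


def rebuild_namedtuple_aware(outputs, touched_outputs):
    # Hypothetical variant: namedtuples take one positional argument per
    # field, so the list is unpacked for them; other containers are unchanged.
    if isinstance(outputs, tuple) and hasattr(outputs, "_fields"):
        return outputs.__class__(*touched_outputs)
    return outputs.__class__(touched_outputs)


out = Output(*range(8))
touched = list(out)  # stand-in for the tensors the hooks would process

# A plain tuple survives the one-argument rebuild...
print(rebuild_like_deepspeed((1, 2), [1, 2]))

# ...but the namedtuple raises the "missing 7 required positional
# arguments" TypeError seen in the post.
err_msg = ""
try:
    rebuild_like_deepspeed(out, touched)
except TypeError as exc:
    err_msg = str(exc)
    print(err_msg)

# Unpacking the list rebuilds the namedtuple correctly.
print(rebuild_namedtuple_aware(out, touched) == out)
```

A plain tuple or list survives the one-argument rebuild because its constructor accepts an iterable, while a namedtuple's `__new__` expects one positional argument per field. This would also be consistent with stage 2 working, since presumably only stage 3's parameter-partitioning hooks pass module outputs through `_apply_to_tensors_only`. The namedtuple-aware helper only illustrates the difference; it is not a tested fix for the actual training run.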