I am trying to run the code below, and it is giving me the error in the subject. Environment: PyTorch nightly build 2.1.0.dev20230801, CUDA 12.1, Python 3.11.4.
import functools

from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import MixedPrecision, ShardingStrategy
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy

# fs_init, TransformerBlock, mixed_precision_dtype, args, and device are
# defined elsewhere in the training script.

model = FSDP(
    model,
    process_group=fs_init.get_data_parallel_group(),
    auto_wrap_policy=functools.partial(
        transformer_auto_wrap_policy,
        transformer_layer_cls=[] if model.is_peft else [TransformerBlock],
    ),
    limit_all_gathers=True,
    use_orig_params=True,
    sync_module_states=True,
    mixed_precision=MixedPrecision(
        param_dtype=mixed_precision_dtype,
        reduce_dtype=mixed_precision_dtype,
        buffer_dtype=mixed_precision_dtype,
    ),
    sharding_strategy={
        "sdp": ShardingStrategy.SHARD_GRAD_OP,
        "ddp": ShardingStrategy.NO_SHARD,
        "fsdp": ShardingStrategy.FULL_SHARD,
    }[args.data_parallel],
    device_id=device,
    # This is the keyword argument that triggers the error.
    ignored_parameters=[param for param in model.parameters() if not param.requires_grad],
)
Can you please help me understand why this is happening? In the docs, ignored_parameters is indeed an argument, so why does the error say it is an unexpected argument?
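
For reference, here is a minimal sketch of how one could check whether the installed nightly's FSDP constructor actually still accepts ignored_parameters or not; it only assumes torch is importable and inspects the signature rather than relying on the docs:

import inspect

from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Print the keyword arguments the FSDP constructor in this build accepts,
# and whether `ignored_parameters` is among them.
params = inspect.signature(FSDP.__init__).parameters
print(sorted(params))
print("accepts ignored_parameters:", "ignored_parameters" in params)

If the printed list does not contain ignored_parameters, the installed nightly has diverged from the documentation I was reading.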