On the HPU device we have an autograd override (C++) implementation of the aten::linear op. To support DTensors, I implemented an op strategy (via register_op_strategy) for linear (fwd and bwd). My implementation hits a ValueError when bias = None.
The ValueError is raised on TensorMeta=None in _wrap_output_spec_tensor_meta() inside torch/distributed/tensor/_sharding_prop.py:
```python
for i, spec in enumerate(output_specs):
    if isinstance(spec, DTensorSpec):
        output_tensor_meta_i = output_tensor_meta[i]
        if not isinstance(output_tensor_meta_i, TensorMeta):
            raise ValueError(
                f"ShardingPropagator error: output {i} does not have an associated TensorMeta"
            )
        spec.tensor_meta = output_tensor_meta_i
```
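To illustrate the check in isolation, here is a minimal standalone sketch. It uses stand-in classes instead of the real DTensorSpec/TensorMeta (so the shapes and names are assumptions, not the actual torch internals), and shows that a DTensorSpec in an output slot whose propagated TensorMeta is None trips the error, while putting None in that slot passes:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class TensorMeta:  # stand-in for the real TensorMeta
    shape: Tuple[int, ...]

@dataclass
class DTensorSpec:  # stand-in for the real DTensorSpec
    tensor_meta: Optional[TensorMeta] = None

def wrap_output_spec_tensor_meta(output_specs, output_tensor_meta):
    # mirrors the check quoted above: every slot that carries a
    # DTensorSpec must have a matching TensorMeta
    for i, spec in enumerate(output_specs):
        if isinstance(spec, DTensorSpec):
            meta = output_tensor_meta[i]
            if not isinstance(meta, TensorMeta):
                raise ValueError(
                    f"ShardingPropagator error: output {i} does not have "
                    "an associated TensorMeta"
                )
            spec.tensor_meta = meta

# Hypothetical bwd outputs (grad_input, grad_weight, grad_bias): the
# grad_bias slot has no TensorMeta because bias=None was passed to linear.
metas = [TensorMeta((4, 8)), TensorMeta((8, 16)), None]

# Strategy that emits a DTensorSpec for the missing grad_bias -> ValueError
try:
    wrap_output_spec_tensor_meta(
        [DTensorSpec(), DTensorSpec(), DTensorSpec()], metas
    )
except ValueError as e:
    print("raises:", e)

# Strategy that emits None for that output slot passes the check
wrap_output_spec_tensor_meta([DTensorSpec(), DTensorSpec(), None], metas)
print("ok")
```

If this mirrors the real behavior, the check itself holds for an optional output; the strategy just has to return None (rather than a DTensorSpec) in the output slot that corresponds to the absent bias/grad_bias.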
My question: does this ValueError check hold in a case where an optional tensor such as bias is not supplied to linear (bias=None)? Note that I hit this case only with the autograd override implementation of linear. Without the autograd override, aten::linear is decomposed into multiple other ops and we never reach this ValueError.