From what I read, if no initializer is passed to get_variable, glorot_uniform_initializer will be used, which I think is equivalent to xavier_uniform_.
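As a quick sanity check on that equivalence, here is a minimal sketch comparing the two sampling bounds. It assumes TensorFlow's documented Glorot uniform limit, sqrt(6 / (fan_in + fan_out)), and the PyTorch formula with gain = 1. The function names and the example shape are made up for illustration.

```python
import math

def glorot_uniform_limit(fan_in, fan_out):
    # Limit documented for TensorFlow's Glorot uniform initializer.
    return math.sqrt(6.0 / (fan_in + fan_out))

def xavier_uniform_bound(fan_in, fan_out, gain=1.0):
    # PyTorch's xavier_uniform_: std from the fans, then bound = sqrt(3) * std.
    std = gain * math.sqrt(2.0 / float(fan_in + fan_out))
    return math.sqrt(3.0) * std

fan_in, fan_out = 300, 128  # hypothetical weight shape
tf_limit = glorot_uniform_limit(fan_in, fan_out)
pt_bound = xavier_uniform_bound(fan_in, fan_out)
print(math.isclose(tf_limit, pt_bound))
```

Both initializers sample uniformly from [-bound, bound], so with the default gain the two should agree for a 2-D weight. The sum fan_in + fan_out is symmetric, so the different weight layouts of the two frameworks do not change the bound.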

Two questions here:

Is that conversion valid?

Should I expect the original embeddings (self.embeddings) to receive gradients during backpropagation and have their values updated? Is that the expected behavior in the TensorFlow version as well? Should I add requires_grad to the embeddings tensor?

Thanks for your response. I don't get the difference here. Why would a tensor with requires_grad not get gradients while a Parameter does? I am not sure I understand requires_grad well, though. Any explanation?

Both will get gradients, but in your initial post you didn't set requires_grad=True when creating the tensor, so self.embeddings wouldn't get any gradients.
In addition, a registered nn.Parameter is automatically pushed to the device when you call model.to() and is also returned by model.parameters(), while a plain tensor is not.
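To make the difference concrete, here is a minimal sketch. The Toy module and its attribute names are hypothetical and only serve to contrast a plain tensor created with requires_grad=True against a registered nn.Parameter.

```python
import torch
import torch.nn as nn

class Toy(nn.Module):  # hypothetical module for illustration
    def __init__(self):
        super().__init__()
        # Plain tensor attribute: tracked by autograd because of
        # requires_grad=True, but not registered with the module.
        self.plain = torch.randn(4, 3, requires_grad=True)
        # nn.Parameter: requires_grad defaults to True and it is registered.
        self.registered = nn.Parameter(torch.randn(4, 3))

    def forward(self, x):
        return (x @ self.plain.t() + x @ self.registered.t()).sum()

model = Toy()
model(torch.randn(2, 3)).backward()

param_names = [name for name, _ in model.named_parameters()]
print(param_names)                        # only the nn.Parameter is listed
print(model.plain.grad is not None)       # the plain tensor still gets a gradient
print(model.registered.grad is not None)
```

Since optimizers are usually built from model.parameters(), the plain tensor here would receive gradients but would not be updated unless it is passed to the optimizer explicitly.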

I don't know how TF initializes the bias, but as the error message says, xavier_uniform cannot be used on parameters with fewer than 2 dimensions (which is the case for your bias parameter).

def _calculate_fan_in_and_fan_out(tensor):
    dimensions = tensor.dim()
    if dimensions < 2:
        raise ValueError("Fan in and fan out can not be computed for tensor with fewer than 2 dimensions")
    num_input_fmaps = tensor.size(1)
    num_output_fmaps = tensor.size(0)
    receptive_field_size = 1
    if tensor.dim() > 2:
        receptive_field_size = tensor[0][0].numel()
    fan_in = num_input_fmaps * receptive_field_size
    fan_out = num_output_fmaps * receptive_field_size
    return fan_in, fan_out
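The dimension check can be reproduced with a small sketch. The nn.Linear layer and its sizes are assumptions chosen only to get a 2-D weight and a 1-D bias.

```python
import torch
import torch.nn as nn

layer = nn.Linear(8, 4)  # hypothetical layer: weight is 2-D, bias is 1-D

nn.init.xavier_uniform_(layer.weight)  # fine: weight has shape (4, 8)

try:
    nn.init.xavier_uniform_(layer.bias)  # bias has shape (4,), so the fans are undefined
    bias_error = None
except ValueError as err:
    bias_error = str(err)

print(bias_error)
```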

Would you then set both values to bias.size(0)?

If that's the case, you could manually apply it to the xavier_uniform method, which is defined as:

fan_in, fan_out = _calculate_fan_in_and_fan_out(tensor)
std = gain * math.sqrt(2.0 / float(fan_in + fan_out))
a = math.sqrt(3.0) * std # Calculate uniform bounds from standard deviation

and replace the call to _calculate_fan_in_and_fan_out by setting both fan_in and fan_out to tensor.size(0).
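A minimal sketch of that manual variant could look like the following. The helper name xavier_uniform_1d_ is hypothetical and not part of torch.nn.init, and using tensor.size(0) for both fans is the assumption discussed above, not a confirmed match for TensorFlow's bias initialization.

```python
import math
import torch

def xavier_uniform_1d_(tensor, gain=1.0):
    # Hypothetical helper, not part of torch.nn.init.
    # Assumption: use tensor.size(0) as both fan_in and fan_out.
    fan_in = fan_out = tensor.size(0)
    std = gain * math.sqrt(2.0 / float(fan_in + fan_out))
    a = math.sqrt(3.0) * std  # uniform bound derived from the standard deviation
    with torch.no_grad():
        # In-place fill without recording the operation in autograd.
        return tensor.uniform_(-a, a)

bias = torch.empty(16, requires_grad=True)
xavier_uniform_1d_(bias)

bound = math.sqrt(3.0) * math.sqrt(2.0 / 32.0)
print(bool(bias.abs().max() <= bound))
```

The torch.no_grad() block mirrors how the built-in initializers modify parameters in place, so the tensor keeps requires_grad=True afterwards.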