TensorFlow get_variable into PyTorch

Ahmed_Abdelaziz · May 30, 2020, 11:39am

Hi all,

I am trying to convert some TensorFlow code into PyTorch. For example, I converted the TensorFlow code below

tf.get_variable("char_embeddings", [len(data.char_dict), data.char_embedding_size]),  char_index)  # [num_sentences, max_sentence_length, max_word_length, emb]

into

class CharEmbeddings(nn.Module):
    def __init__(self, config, data):
        ....
        self.embeddings = nn.init.xavier_uniform_(torch.empty(len(data.char_dict), data.char_embedding_size))
   

    def forward(self, char_index):
        # [num_sentences, max_sentence_length, max_word_length, emb]
        char_emb = self.embeddings[char_index]

From what I read, if no initializer is passed to get_variable here, glorot_uniform_initializer will be used, which I think is equivalent to xavier_uniform_.

Two questions here:

Is that conversion valid?
Should I expect the original embeddings self.embeddings to backpropagate and update its values? Is that the expected behavior from the tensorflow version as well? Should I add requires_grad to the embeddings tensor?

ptrblck · May 31, 2020, 7:35am

Probably yes, but you should compare the default arguments to both methods, as each framework might use other defaults.
self.embeddings is not created as an nn.Parameter, so Autograd won’t calculate the gradients for this tensor. You could use:

emb = torch.empty(len(...))
nn.init.xavier_uniform_(emb)
self.embeddings = nn.Parameter(emb)

instead to create a parameter, which will get gradients.
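A minimal sketch of how the module from the first post might look with this change. The constructor arguments `num_chars` and `emb_size` are stand-ins for `len(data.char_dict)` and `data.char_embedding_size`, and the sizes and index shape are made up for illustration:

```python
import torch
import torch.nn as nn


class CharEmbeddings(nn.Module):
    def __init__(self, num_chars, emb_size):
        super().__init__()
        # Create the tensor, initialize it in place, then register it as a parameter.
        emb = torch.empty(num_chars, emb_size)
        nn.init.xavier_uniform_(emb)
        self.embeddings = nn.Parameter(emb)

    def forward(self, char_index):
        # [num_sentences, max_sentence_length, max_word_length, emb]
        return self.embeddings[char_index]


# Hypothetical sizes: 50 characters, 8-dimensional embeddings.
model = CharEmbeddings(num_chars=50, emb_size=8)
char_index = torch.randint(0, 50, (2, 3, 4))

char_emb = model(char_index)
char_emb.sum().backward()

print(char_emb.shape)                     # torch.Size([2, 3, 4, 8])
print(model.embeddings.grad is not None)  # True
```

After `backward()`, `model.embeddings.grad` is populated, so an optimizer built from `model.parameters()` would update the embeddings.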

Ahmed_Abdelaziz · June 4, 2020, 7:25am

Thanks for your response. I don’t get the difference here. Why would a tensor with requires_grad not get gradients while a Parameter would? I am not sure I understand requires_grad well, though. Any explanation?

Thanks!

ptrblck · June 4, 2020, 7:58am

Both will get gradients, but in your initial post you didn’t set requires_grad=True in the tensor creation, so self.embeddings wouldn’t get any gradients.
In addition, a registered nn.Parameter will automatically be pushed to the device if you call model.to(), and it will also be returned by model.parameters(), while a plain tensor will not.
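A small sketch of that difference. The module `Demo` and its attribute names are hypothetical, and `model.to(torch.float64)` is used here as a stand-in for moving to a device so that the example runs without a GPU:

```python
import torch
import torch.nn as nn


class Demo(nn.Module):
    def __init__(self):
        super().__init__()
        # Plain tensor attribute: gets gradients, but is not registered with the module.
        self.plain = torch.zeros(3, 4, requires_grad=True)
        # Registered parameter: tracked by the module.
        self.param = nn.Parameter(torch.zeros(3, 4))


demo = Demo()

names = [name for name, _ in demo.named_parameters()]
print(names)  # ['param']

# .to() only touches registered parameters and buffers.
demo.to(torch.float64)
print(demo.param.dtype)  # torch.float64
print(demo.plain.dtype)  # torch.float32
```

The plain tensor is skipped in both cases, which also means an optimizer created from `demo.parameters()` would never update it.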

Ahmed_Abdelaziz · June 4, 2020, 8:37am

I got it, thank you!

Ahmed_Abdelaziz · September 16, 2020, 5:36pm

Following up on this question, I see that the model I am migrating from TensorFlow uses get_variable to initialize the network biases:

hidden_bias = tf.get_variable("hidden_bias_{}".format(i), [hidden_size])

So I assumed I could use torch.nn.init.xavier_uniform_ to initialize the network bias in PyTorch too:

torch.nn.init.xavier_uniform_(self.input.bias)

but I got this error:

Fan in and fan out can not be computed for tensor with less than 2 dimensions

Any idea how to mimic the same init as TensorFlow for biases?

ptrblck · September 18, 2020, 4:03am

I don’t know how TF initializes the bias, but as the error message states, xavier_uniform_ cannot be used on parameters with fewer than 2 dimensions (which is the case for your bias parameter).

Ahmed_Abdelaziz · September 18, 2020, 9:01am

Yes, true. However, I looked at the histograms of the biases from TF, and it seems they use fan_in = fan_out.

ptrblck · September 18, 2020, 10:37am

fan_in and fan_out are calculated as:

def _calculate_fan_in_and_fan_out(tensor):
    dimensions = tensor.dim()
    if dimensions < 2:
        raise ValueError("Fan in and fan out can not be computed for tensor with fewer than 2 dimensions")

    num_input_fmaps = tensor.size(1)
    num_output_fmaps = tensor.size(0)
    receptive_field_size = 1
    if tensor.dim() > 2:
        receptive_field_size = tensor[0][0].numel()
    fan_in = num_input_fmaps * receptive_field_size
    fan_out = num_output_fmaps * receptive_field_size

    return fan_in, fan_out

Would you then set both values to bias.size(0)?

If that’s the case, you could manually apply it to the xavier_uniform method, which is defined as:

    fan_in, fan_out = _calculate_fan_in_and_fan_out(tensor)
    std = gain * math.sqrt(2.0 / float(fan_in + fan_out))
    a = math.sqrt(3.0) * std  # Calculate uniform bounds from standard deviation

and replace the call to _calculate_fan_in_and_fan_out by setting both fan_in and fan_out to tensor.size(0).
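A minimal sketch of that suggestion. The helper name `xavier_uniform_1d_` is hypothetical, and the choice `fan_in = fan_out = tensor.size(0)` is the assumption discussed above; whether it reproduces the TF default exactly is not confirmed in this thread:

```python
import math

import torch
import torch.nn as nn


def xavier_uniform_1d_(tensor, gain=1.0):
    """Xavier-style uniform init for a 1D tensor (hypothetical helper)."""
    # Assumption: treat both fans as the length of the 1D tensor.
    fan_in = fan_out = tensor.size(0)
    std = gain * math.sqrt(2.0 / float(fan_in + fan_out))
    a = math.sqrt(3.0) * std  # Calculate uniform bounds from standard deviation
    # nn.init.uniform_ fills the tensor in place without tracking gradients.
    return nn.init.uniform_(tensor, -a, a)


layer = nn.Linear(16, 32)
xavier_uniform_1d_(layer.bias)

# With fan_in = fan_out = n, the bound simplifies to sqrt(3 / n).
bound = math.sqrt(3.0 / layer.bias.size(0))
print(layer.bias.abs().max().item(), bound)
```

Comparing a histogram of `layer.bias` against the TF bias histogram would be one way to check whether the two initializations actually match.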