I am implementing an algorithm by Google (which was written in TensorFlow 1.0) in PyTorch. The TensorFlow implementation defines a classification head on top of a BERT model as follows:
```python
hidden_size = output_layer.shape.as_list()[-1]
output_weights = tf.get_variable(
    "output_weights", [hidden_size],
    initializer=tf.zeros_initializer()
    if config.init_cell_selection_weights_to_zero else _classification_initializer())
output_bias = tf.get_variable(
    "output_bias", shape=(), initializer=tf.zeros_initializer())
```
I want to define the same in PyTorch. I defined it as follows:
```python
class net(nn.Module):
    def __init__(self, config):
        super().__init__()
        # classification head
        if config.init_cell_selection_weights_to_zero:
            self.output_weights = nn.Parameter(torch.zeros(config.hidden_size))
        else:
            self.output_weights = nn.Parameter(torch.empty(config.hidden_size))
            nn.init.normal_(self.output_weights, std=0.02)  # here, a truncated normal is used in the original implementation
        self.output_bias = nn.Parameter(torch.zeros([]))

    def forward(self, ...):
```
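As an aside regarding the truncated-normal comment in the snippet above: if I understand correctly, recent PyTorch versions (1.5+, an assumption on my part) ship `nn.init.trunc_normal_`, which may be a closer match than `nn.init.normal_`. A sketch:

```python
import torch
import torch.nn as nn

# Hypothetical hidden size, just for illustration.
w = torch.empty(768)

# TF's truncated normal discards samples more than two standard
# deviations from the mean, which corresponds to a = -2*std, b = 2*std.
nn.init.trunc_normal_(w, std=0.02, a=-0.04, b=0.04)
```

After this call, all values of `w` lie within two standard deviations of zero.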
In other words, I am using `torch.nn.Parameter` as a counterpart of `tf.get_variable` to define this additional trainable layer. This adds the classification head to the parameters of the model (i.e. they are printed when I print `list(model.parameters())`), but they are not printed when I type `print(model)`.
Is this because they are not registered yet? Do I have to use `register_parameter` in my init function?
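Here is a minimal reproduction of what I am seeing, with a stripped-down hypothetical module (just the two parameters, no BERT backbone):

```python
import torch
import torch.nn as nn

class Head(nn.Module):
    def __init__(self, hidden_size=4):
        super().__init__()
        # Assigning an nn.Parameter as an attribute on a module.
        self.output_weights = nn.Parameter(torch.zeros(hidden_size))
        self.output_bias = nn.Parameter(torch.zeros([]))

model = Head()
# The parameters do show up here ...
print([name for name, _ in model.named_parameters()])
# ... but print(model) only lists child modules, not parameters.
print(model)
```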