Weight Sharing - weights for the next step

Hi, I want to use the same layer many times in my model (the count will change at each batch), and I found this example:

    def forward(self, x):
        """
        For the forward pass of the model, we randomly choose either 0, 1, 2, or 3
        and reuse the middle_linear Module that many times to compute hidden layer
        representations.

        Since each forward pass builds a dynamic computation graph, we can use normal
        Python control-flow operators like loops or conditional statements when
        defining the forward pass of the model.

        Here we also see that it is perfectly safe to reuse the same Module many
        times when defining a computational graph. This is a big improvement from Lua
        Torch, where each Module could be used only once.
        """
        h_relu = self.input_linear(x).clamp(min=0)
        for _ in range(random.randint(0, 3)):
            h_relu = self.middle_linear(h_relu).clamp(min=0)
        y_pred = self.output_linear(h_relu)
        return y_pred
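
For completeness, a minimal module that this forward pass could belong to might look like the following sketch (the layer names are taken from the snippet above; the sizes are just placeholders):

    import random
    import torch.nn as nn

    class DynamicNet(nn.Module):
        def __init__(self, d_in, h, d_out):
            super().__init__()
            self.input_linear = nn.Linear(d_in, h)
            self.middle_linear = nn.Linear(h, h)     # this single layer is reused a variable number of times
            self.output_linear = nn.Linear(h, d_out)

        def forward(self, x):
            h_relu = self.input_linear(x).clamp(min=0)
            for _ in range(random.randint(0, 3)):
                h_relu = self.middle_linear(h_relu).clamp(min=0)
            return self.output_linear(h_relu)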

My question: if I use this code, it will create (let's say) 3 middle layers that all use the same weights, but what about the gradients? During training, a different gradient will be calculated for each of those middle layers, so when backpropagation finishes, their weights will differ from each other at the next step. How will PyTorch calculate the weights of that layer for the next batch? Will it take the average, or the last layer's weights?

Thanks,
Ali

By default, the gradients from each use of the module will be accumulated (summed) into the shared parameters' .grad attributes.
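
A quick way to see this is to reuse a single Linear layer twice and inspect it afterwards: there is only one weight tensor, and backward() sums the contributions from both uses into its .grad. A minimal sketch:

    import torch
    import torch.nn as nn

    lin = nn.Linear(4, 4)          # one module, one weight tensor
    x = torch.randn(2, 4)

    out = lin(lin(x))              # the same module is applied twice
    out.sum().backward()

    print(len(list(lin.parameters())))   # 2 (one weight, one bias) -- parameters are not duplicated per use
    print(lin.weight.grad.shape)         # torch.Size([4, 4]): gradients from both uses are summed here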

That shouldn’t be the case: the parameters are not updated during the backward() call, but only when you call optimizer.step().
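
So there is nothing to average: however many times the layer is applied in the forward pass, there is still exactly one weight tensor, backward() only fills its .grad, and optimizer.step() updates it once. A minimal sketch of one training step (shapes and hyperparameters are placeholders):

    import torch
    import torch.nn as nn

    lin = nn.Linear(4, 4)
    opt = torch.optim.SGD(lin.parameters(), lr=0.1)
    x, y = torch.randn(8, 4), torch.randn(8, 4)

    opt.zero_grad()
    pred = lin(lin(x))                  # layer reused twice, same weights both times
    loss = ((pred - y) ** 2).mean()
    loss.backward()                     # gradients from both uses are summed into lin.weight.grad
    opt.step()                          # the single shared weight is updated once, here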