Variable parts of hidden layers in a network

Hello community,

my question is about a variable output, or parts of a net that can vary. For example, a flag would determine which output (or which part of the net) to use. That means I have different hidden layers, and a flag decides which one is taken. Are there any examples available? Could you provide one?

Here is a small illustration:

best regards!

You can just pass a flag into your model’s forward to choose a certain path:

import torch
import torch.nn as nn


class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.path1 = nn.Linear(5, 10)
        self.path2 = nn.Linear(10, 10)

    def forward(self, x, path):
        # the flag selects which submodule is used in this forward pass
        if path == 'path1':
            x = self.path1(x)
        elif path == 'path2':
            x = self.path2(x)
        else:
            raise ValueError('unknown path: {}'.format(path))
        return x


model = MyModel()
x1 = torch.randn(1, 5)
output1 = model(x1, 'path1')   # uses only self.path1
x2 = torch.randn(1, 10)
output2 = model(x2, 'path2')   # uses only self.path2

Thank you for your answer!

One more question: how does learning work (the backpropagation?)

Very briefly (for the supervised setting):

  1. You do a forward pass; after all the calculations, the final linear layer outputs a vector of shape [1, nc], where nc is the number of classes.
  2. This output gets compared to the true classes, and we get a loss value.
  3. This loss gets backpropagated.
  4. Every parameter in the model gets updated with respect to this loss.
  5. You have a better model! Hurray! (A minimal sketch of these steps follows below.)
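
Here is a minimal sketch of these five steps, reusing the MyModel from above and treating its 10 output features as nc = 10 classes; the batch size, fake labels, and SGD settings are just made-up placeholders:

import torch
import torch.nn as nn

model = MyModel()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 5)                # a batch of 8 samples for path1
target = torch.randint(0, 10, (8,))  # fake ground-truth class indices

output = model(x, 'path1')           # 1. forward pass -> shape [8, nc]
loss = criterion(output, target)     # 2. compare to the true classes -> loss value
optimizer.zero_grad()
loss.backward()                      # 3. backpropagate the loss
optimizer.step()                     # 4. update every parameter w.r.t. this loss
# 5. (hopefully) a slightly better model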

Let’s revisit the magic word: backpropagation.

This is an algorithm that computes, for each model parameter, the gradient of the loss with respect to that parameter.
A gradient is basically a number that has a direction and a magnitude.
The direction is indicated by torch.sign of this value (i.e. whether it is positive or negative),
and moving the model parameter in this direction will increase the loss, scaled by the magnitude.
So, we move in the opposite direction (i.e. negate the gradient) and thus decrease the loss :smile:
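
A tiny sketch of that idea, with a single made-up parameter w and the toy loss (w - 3)**2:

import torch

w = torch.tensor(1.0, requires_grad=True)
loss = (w - 3) ** 2           # loss is 4.0 at w = 1.0
loss.backward()

print(w.grad)                 # tensor(-4.) -> negative sign: increasing w lowers the loss
with torch.no_grad():
    w -= 0.1 * w.grad         # step against the gradient
print(((w - 3) ** 2).item())  # 2.56 -> the loss decreased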

That was not very brief, but I think it’s clear.

Also read this: https://medium.com/@karpathy/yes-you-should-understand-backprop-e2f06eab496b

I didn’t ask what backpropagation is. Your post doesn’t answer my question.

how does learning work (the backpropagation?)

My bad! Seeing the question mark after backprop, I assumed that was your question.

Since the computation graph is created dynamically during the forward pass, only the parameters that were used to calculate the loss will get a valid gradient and be updated.
I.e. if you only use path1 during training, only self.path1 will be updated, while self.path2 keeps its initial values.
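
You can check this directly: run a forward pass through path1 only, backpropagate a dummy loss, and look at the .grad attributes (this just reuses the MyModel from above):

import torch

model = MyModel()
x = torch.randn(1, 5)

out = model(x, 'path1')
out.mean().backward()            # any scalar works as a dummy loss here

print(model.path1.weight.grad)   # populated gradient tensor
print(model.path2.weight.grad)   # None -> path2 was never part of the graph
# an optimizer.step() would therefore only change path1's parameters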
