Create layer with forward pass only

How would one write a pre-emphasis layer in pytorch that is not part of the gradient computation?

Y_t = cX_t + (1−c)X_{t−1}

Are you planning on getting gradients for Y? If so, you can preprocess your data X (do your X -> Y transformation), then wrap Y in a Variable, and then send Y through a net.
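For concreteness, a rough sketch of that suggestion (the coefficient value and the tiny stand-in network are placeholders, not from this thread):

import torch
import torch.nn as nn
from torch.autograd import Variable

c = 0.97                               # example coefficient, arbitrary value
x = torch.randn(4, 1, 100)             # plain tensor: nothing is recorded on it
net = nn.Conv1d(1, 8, kernel_size=3)   # stand-in for the real network

# Y_t = c * X_t + (1 - c) * X_{t-1}, treating X_{-1} as 0
x_prev = torch.cat([torch.zeros_like(x[..., :1]), x[..., :-1]], dim=-1)
y = Variable(c * x + (1 - c) * x_prev)  # autograd starts recording from here

net(y).sum().backward()                 # gradients reach net's weights, not c or x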

@richard Yes, there’s a reconstruction loss on Y to backprop gradients to the network but not to the c parameter. I’m currently doing the pre-processing inside the network’s forward method but I’d like to have it wrapped in a network layer. Is this possible?

As in, you want to create an nn.Module for this?

Yes! From the perspective of back-prop, I’m unclear on what the difference is between having the pre-emphasis happen outside of the network versus inside of it, as in the example below.

import torch
import torch.nn as nn


class Network(nn.Module):
    def __init__(self, c, ch):
        super(Network, self).__init__()
        self.pre_processing = PreEmphasis(c)
        # nn.Conv1d requires a kernel_size; 3 is an arbitrary choice here
        self.convs = nn.Sequential(
            nn.Conv1d(ch[0], ch[1], kernel_size=3),
            nn.Conv1d(ch[1], ch[2], kernel_size=3),
        )

    def forward(self, input):
        x = self.pre_processing(input)
        x = self.convs(x)
        return x


class PreEmphasis(nn.Module):
    def __init__(self, coef):
        super(PreEmphasis, self).__init__()
        self.coef = coef  # plain float, not an nn.Parameter, so it never gets a gradient

    def forward(self, x):
        return self.pre_emph(x, self.coef)

    def pre_emph(self, x, coef):
        # Y_t = coef * X_t + (1 - coef) * X_{t-1} (formula from above), with X_{-1} taken as 0
        shifted = torch.cat([torch.zeros_like(x[..., :1]), x[..., :-1]], dim=-1)
        return coef * x + (1 - coef) * shifted
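For reference, a short usage sketch of the classes above (shapes and channel sizes are arbitrary). Since coef is stored as a plain float rather than an nn.Parameter, backward() never touches it, while the conv weights do receive gradients:

import torch
from torch.autograd import Variable

net = Network(c=0.97, ch=[1, 16, 1])
x = Variable(torch.randn(8, 1, 64))     # batch of 1-channel signals

net(x).sum().backward()                 # stand-in for a real reconstruction loss

print(net.convs[0].weight.grad.shape)   # conv weights received gradients
print(net.pre_processing.coef)          # still just the float 0.97, untouched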

There is no difference with respect to autograd between having the pre-emphasis happen outside vs. inside the network. Autograd records the operations performed on Variables and uses that record to compute gradients in the backward pass.

As I understand it, you’re trying to get gradients on Y and not X. To accomplish that, you’d have to wrap Y in a Variable after X (as a tensor, not a Variable) comes out of the pre-emphasis step.
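To make the first point concrete, here’s a minimal check, assuming the (fixed) Network/PreEmphasis classes from the post above and a recent PyTorch where Tensor and Variable are merged: the conv-weight gradients come out identical whether the pre-emphasis runs inside forward or is applied to the input beforehand.

import torch

torch.manual_seed(0)
net = Network(c=0.97, ch=[1, 4, 1])
x = torch.randn(2, 1, 32)

# pre-emphasis inside the network
net.zero_grad()
net(x).sum().backward()
grads_inside = [p.grad.clone() for p in net.convs.parameters()]

# pre-emphasis applied outside, result fed straight to the convs
net.zero_grad()
net.convs(net.pre_processing(x)).sum().backward()
grads_outside = [p.grad.clone() for p in net.convs.parameters()]

print(all(torch.equal(a, b) for a, b in zip(grads_inside, grads_outside)))  # True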

Is what you suggest the same as setting X.data = pre-processing(X.data) or would autograd record that change on X’s data?

Also, given that the network parameters all come after the pre-processing layer, is it really necessary to wrap Y in a Variable after X comes out as a tensor? Wouldn’t it be the same as having X as a Variable, pre-processing that Variable, and computing the gradients on that X? For doing parameter updates, I mean…

Yes, I was suggesting Y = pre-processing(X.data). X.data is a tensor; autograd only keeps track of operations performed on Variables.
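A quick way to see that, using the Variable API this thread is written against (the coefficient 0.97 and the shapes are arbitrary):

import torch
from torch.autograd import Variable

x = Variable(torch.randn(1, 1, 8), requires_grad=True)

tracked = 0.97 * x          # op on a Variable: recorded by autograd
untracked = 0.97 * x.data   # op on the underlying tensor: not recorded

print(tracked.grad_fn)                        # a MulBackward node
print(getattr(untracked, 'grad_fn', None))    # None: no history, no path back to x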

True, if you’re just doing parameter updates I believe that’ll be the same.