How to translate Conv1D from Keras to PyTorch?

I’m trying to port this network from Keras to PyTorch, and I’m not sure it’s workable. The target model in Keras is as follows:

input_7 (InputLayer)             (None, 200, 76)       0                                            
____________________________________________________________________________________________________
conv_1 (Convolution1D)           (None, 192, 10)       6850        input_7[0][0]                    
____________________________________________________________________________________________________
conv_2 (Convolution1D)           (None, 184, 10)       910         conv_1[0][0]                     
____________________________________________________________________________________________________
conv_3 (Convolution1D)           (None, 174, 20)       2220        conv_2[0][0]                     
____________________________________________________________________________________________________                   

But PyTorch seems to have a different API for Conv1d. I tried this:

nn.Sequential(
            nn.Conv1d(in_channels=200, out_channels=192, kernel_size=10),
            nn.ReLU(),
            nn.Conv1d(in_channels=192, out_channels=184, kernel_size=10),
            nn.ReLU(),
            nn.Conv1d(in_channels=184, out_channels=174, kernel_size=20),
            nn.ReLU(),
        )

which doesn’t seem to work properly.

The kernel_size in Keras Convolution1D doesn’t seem straightforward to map onto Conv1d in PyTorch.

Appreciate any help.

I can’t see any difference between kernel_size in Keras and kernel_size in PyTorch.
Nor can I see any errors in your tiny code snippet.

What errors are you getting? What isn’t working as expected?

Hi, thanks.

So here are the output shapes in PyTorch:

(500L, 192L, 67L)
(500L, 192L, 67L)
(500L, 184L, 58L)
(500L, 184L, 58L)
(500L, 174L, 39L)
(500L, 174L, 39L)

The first dimension is the batch size, so you can ignore it. Each shape appears twice because the ReLU output is printed as well.
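For reference, with no padding and stride 1, each Conv1d shortens the last dimension by kernel_size − 1, i.e. L_out = L_in − kernel_size + 1. A quick sketch checking the lengths printed above (starting from an input length of 76):

```python
def conv1d_out_len(l_in, kernel_size):
    # Output length of Conv1d with no padding, stride 1, dilation 1
    return l_in - kernel_size + 1

print(conv1d_out_len(76, 10))  # 67 after the first conv (kernel_size=10)
print(conv1d_out_len(67, 10))  # 58 after the second conv (kernel_size=10)
print(conv1d_out_len(58, 20))  # 39 after the third conv (kernel_size=20)
```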

What code produces those numbers?

input_7 (InputLayer)             (None, 200, 76)       0                                            
____________________________________________________________________________________________________
conv_1 (Convolution1D)           (None, 192, 10)       6850        input_7[0][0]                    
____________________________________________________________________________________________________
conv_2 (Convolution1D)           (None, 184, 10)       910         conv_1[0][0]                     
____________________________________________________________________________________________________
conv_3 (Convolution1D)           (None, 174, 20)       2220        conv_2[0][0]                     
______________________________________________________________________________

Can you post the keras snippet code which produces this output shape?

@jpeg729 @ashwin.raju93
Sorry for the late response. Here is the Keras code:

x = Input(shape=(200, 76))
h = Convolution1D(10, 9, activation = 'relu', name='conv_1')(x)
h = Convolution1D(10, 9, activation = 'relu', name='conv_2')(h)
h = Convolution1D(20, 11, activation = 'relu', name='conv_3')(h)

model = Model(x, h)
print model.summary()

And the output is:

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
input_6 (InputLayer)             (None, 200, 76)       0                                            
____________________________________________________________________________________________________
conv_1 (Convolution1D)           (None, 192, 10)       6850        input_6[0][0]                    
____________________________________________________________________________________________________
conv_2 (Convolution1D)           (None, 184, 10)       910         conv_1[0][0]                     
____________________________________________________________________________________________________
conv_3 (Convolution1D)           (None, 174, 20)       2220        conv_2[0][0]                     

And the corresponding PyTorch code is:

nn.Sequential(
            nn.Conv1d(in_channels=200, out_channels=192, kernel_size=10),
            nn.ReLU(),
            nn.Conv1d(in_channels=192, out_channels=184, kernel_size=10),
            nn.ReLU(),
            nn.Conv1d(in_channels=184, out_channels=174, kernel_size=20),
            nn.ReLU(),
        )

But the layer output shapes of the PyTorch model are:

(500L, 192L, 67L)
(500L, 192L, 67L)
(500L, 184L, 58L)
(500L, 184L, 58L)
(500L, 174L, 39L)
(500L, 174L, 39L)

Your Keras model defines 10 filters with kernel_size=9 in the first conv layer, while in your PyTorch model you define 192 filters with kernel_size=10. Keras’ filters argument corresponds to PyTorch’s out_channels.

EDIT: It also seems that your Keras input has 76 channels. You should transpose the input to [batch, channels, length] to get comparable results.
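Putting both points together, a PyTorch equivalent of the Keras summary above might look like this (a sketch, assuming the input arrives in the Keras layout [batch, length, channels] and is permuted first):

```python
import torch
import torch.nn as nn

# Keras Convolution1D(filters, kernel_size) maps to
# nn.Conv1d(in_channels, out_channels=filters, kernel_size).
model = nn.Sequential(
    nn.Conv1d(in_channels=76, out_channels=10, kernel_size=9),   # conv_1
    nn.ReLU(),
    nn.Conv1d(in_channels=10, out_channels=10, kernel_size=9),   # conv_2
    nn.ReLU(),
    nn.Conv1d(in_channels=10, out_channels=20, kernel_size=11),  # conv_3
    nn.ReLU(),
)

x = torch.randn(4, 200, 76)      # [batch, length, channels], as in Keras
out = model(x.permute(0, 2, 1))  # permute to [batch, channels, length]
print(out.shape)                 # torch.Size([4, 20, 174])
```

The output length 174 and channel count 20 match the Keras conv_3 output shape (None, 174, 20), and the first layer’s parameter count (76 × 10 × 9 + 10 = 6850) matches the summary as well.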


Thanks for the explanation. Just to double-check:

So in Keras the input data is [batch, timestep/length, filter/channel], while in PyTorch it becomes [batch, filter/channel, timestep/length], right?

If that’s the case, I guess I should do something like transpose((0, 2, 1)).

Yes, if you are using the default settings with the TensorFlow backend.
There is an option, data_format='channels_first', to change this behavior in Keras.
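In PyTorch, the equivalent of that NumPy-style transpose is permute (or transpose, which swaps exactly two dimensions). A minimal sketch:

```python
import torch

x = torch.randn(500, 200, 76)  # [batch, length, channels], the Keras layout
x_pt = x.permute(0, 2, 1)      # [batch, channels, length], the PyTorch layout
print(x_pt.shape)              # torch.Size([500, 76, 200])

# .transpose(1, 2) swaps dims 1 and 2 and is equivalent here:
assert torch.equal(x_pt, x.transpose(1, 2))
```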


The baseline model is a VAE in Keras, and I want to reimplement it in PyTorch. I can fix the input part with this transposition, but I’m not sure how to handle the decoder output.

To be more specific: in the PyTorch implementation the input is [batch, filter/channel, timestep/length]. If I want to follow the structure of the baseline model, whose output is [batch, timestep/length, filter/channel] (in Keras), do you think it’s reasonable to apply this transpose to the output before the loss calculation?
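One way to sketch that idea (a hypothetical example, with made-up shapes and MSE standing in for whatever reconstruction loss the VAE actually uses):

```python
import torch
import torch.nn.functional as F

# decoder_out: [batch, channels, length], as produced by a PyTorch decoder
decoder_out = torch.randn(8, 76, 200)
# target kept in the Keras layout [batch, length, channels]
target = torch.randn(8, 200, 76)

# Permute the decoder output back to the Keras layout before the loss:
loss = F.mse_loss(decoder_out.permute(0, 2, 1), target)
print(loss)  # a scalar tensor
```

Since permute is differentiable, gradients flow through it unchanged, so transposing just before the loss is safe.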

Could you post the complete Keras and PyTorch code, if possible?
If not, a small snippet of the encoder and decoder parts will be sufficient.

The decoder should keep the dimensions it gets from the encoder.
I don’t really understand the issue, but a look at the code will most likely change that. :wink: