Confused About Dimensions

I am relatively new to deep learning. Two weeks ago I began using PyTorch to build a convolutional binary classifier for time series. I have got it to the stage where I can pass time series through individually without error, but now I want to speed things up by using batches.

In my reading I have come across several different terms and symbols for the dimensions of layers and inputs, and I am having a hard time matching them up with each other. The PyTorch docs for Conv1d list (in_channels, out_channels, kernel_size) as arguments and describe the input and output shapes with the symbols N, C and L.
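A minimal sketch of how the two vocabularies seem to line up, assuming the default stride, padding and dilation. The sizes below (batch of 8, 16 filters, kernel width 5) are made-up illustration values, not taken from my network. The constructor arguments describe the layer's weights, while N, C and L describe the tensor that passes through it:

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration.
N, C_in, L_in = 8, 1, 556   # batch size, input channels, series length
C_out, k = 16, 5            # number of filters, kernel width

conv = nn.Conv1d(in_channels=C_in, out_channels=C_out, kernel_size=k)

x = torch.randn(N, C_in, L_in)  # input is laid out as (N, C_in, L_in)
y = conv(x)                     # output is laid out as (N, C_out, L_out)

# With stride=1, padding=0, dilation=1: L_out = L_in - k + 1
print(y.shape)            # torch.Size([8, 16, 552])
print(conv.weight.shape)  # torch.Size([16, 1, 5]); no batch dimension here
```

Note that N does not appear anywhere in the layer's constructor or its weights.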

For a given network:

Input
|
Conv1d(in_channels, out_channels, kernel_size)
|
Conv1d(in_channels, out_channels, kernel_size)
|
Conv1d(in_channels, out_channels, kernel_size)
|
Conv1d(in_channels, out_channels, kernel_size)
|
GlobAvPool()
|
Flattener of some kind e.g. .view(shape1, shape2)
|
Linear(shape1, 1) # 1 output for binary classifier

I have had this structure working for inputs of shape (1, 1, 556), i.e. a single time series 556 samples long. When I change to batches, which numbers change? Are there restrictions on what batch_size can be?
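For concreteness, here is a hedged sketch of the structure above, not my actual module. It assumes 16 filters and kernel sizes 8, 6, 4, 2, and it swaps in `nn.AdaptiveAvgPool1d(1)` for the global average pool and `nn.Flatten()` for the flattener. In this sketch the same layers accept a batch of 1 and a batch of 32, and only the leading dimension of the input and output differs:

```python
import torch
import torch.nn as nn

n_filters = 16  # hypothetical value for illustration

net = nn.Sequential(
    nn.Conv1d(1, n_filters, kernel_size=8), nn.ReLU(),
    nn.Conv1d(n_filters, n_filters, kernel_size=6), nn.ReLU(),
    nn.Conv1d(n_filters, n_filters, kernel_size=4), nn.ReLU(),
    nn.Conv1d(n_filters, n_filters, kernel_size=2), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1),  # (N, n_filters, L) -> (N, n_filters, 1)
    nn.Flatten(),             # (N, n_filters, 1) -> (N, n_filters)
    nn.Linear(n_filters, 1),  # (N, n_filters)    -> (N, 1)
)

single = torch.randn(1, 1, 556)   # one series:  (N=1,  C=1, L=556)
batch = torch.randn(32, 1, 556)   # 32 series:   (N=32, C=1, L=556)

out_single = net(single)
out_batch = net(batch)
print(out_single.shape)  # torch.Size([1, 1])
print(out_batch.shape)   # torch.Size([32, 1])
```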

So far my understanding is based on these terms and definitions:

  • kernel_size: the number of time steps the kernel looks at

  • n_filters: the number of kernels; this seems to be both the in_channels and the out_channels for every conv layer except the first

  • batch_size: how many time series I am feeding in at once

Note that each time series is just a single sequence of numbers. Does that mean one channel, by analogy with an RGB image having 3 channels? Or is the number of channels the batch size?
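A small sketch of how a batch of univariate series could be assembled, assuming every series has the same length. The random data and the name `series_list` are hypothetical. Under the (N, C, L) layout in the Conv1d docs, the batch size and the channel count are separate axes:

```python
import numpy as np
import torch

# Hypothetical data: 32 univariate series, each 556 samples long.
series_list = [np.random.rand(556).astype(np.float32) for _ in range(32)]

batch = torch.from_numpy(np.stack(series_list))  # shape (32, 556):    (N, L)
batch = batch.unsqueeze(1)                       # shape (32, 1, 556): (N, C, L)

print(batch.shape)  # torch.Size([32, 1, 556])
```

Here axis 0 is the batch (32 series) and axis 1 is the single channel.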

Basically I need to understand how these shapes interplay so I can confidently write networks of varying structure and know which numbers go where.

I am wrapping my module in Skorch’s NeuralNetBinaryClassifier(), which has made me much more confused, although it has worked for batch_size = 1.

My code is below for reference but my priority is a conceptual answer to the above, relating to my context. Feel free to also point out anything silly that I have done if you like.

```python
class flattener(nn.Module):
    def forward(self, x):
        return x.view(self.batch_size, 1)
```
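One possible alternative, sketched under the assumption that the pooled tensor has shape (N, n_filters, 1): read the batch size from the incoming tensor instead of storing it on the module. The class name `Flattener` and the sizes are hypothetical:

```python
import torch
import torch.nn as nn


class Flattener(nn.Module):
    """Hypothetical variant: collapse everything after the batch dimension."""

    def forward(self, x):
        # x.size(0) is the batch size of whatever tensor arrives,
        # so nothing about the batch needs to be stored on the module.
        return x.view(x.size(0), -1)


pooled = torch.randn(32, 16, 1)  # e.g. output of a global average pool
flat = Flattener()(pooled)
print(flat.shape)  # torch.Size([32, 16])
```

The built-in `nn.Flatten()` does the same job with its default arguments.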

```python
from collections import OrderedDict

import numpy as np
import torch
import torch.nn as nn


class Net(nn.Module):

    def __init__(self,
                 num_chans = 1,
                 batch_size = 1,
                 conv_layers = 4,
                 num_filters = 16,
                 dropout_prob = 0.1):

        super(Net, self).__init__()

        # Network hyperparameters
        self.num_chans = num_chans    # Number of channels in each input
        self.batch_size = batch_size  # Number of series in each batch
        self.n_convs = conv_layers    # Number of convolutional layers
        self.n_filters = num_filters  # Number of filters in each convolutional layer
        self.dropout_p = dropout_prob # Probability of each activation being zeroed in each dropout layer

        # Get input shape from a training sample
        a_series = train_df.RR_series[0]
        self.input_shape = (1, 1, max(np.shape(a_series)))
        ##print('Input shape: {}\n'.format(input_shape))

        # List of kernel sizes (number of time steps covered) for each convolutional layer
        kernel_sizes = [2] + [4 + (2 * i) for i in range(self.n_convs - 1)]

        # The list is reversed so the layers look for longer features first, then shorter, down to 2 samples wide
        kernel_sizes = kernel_sizes[::-1]

        # First layer
        layers = [('conv1', nn.Conv1d(in_channels = self.input_shape[0],
                                      out_channels = self.n_filters,
                                      kernel_size = kernel_sizes[0])),
                  ('relu1', nn.ReLU()),
                  ('drop1', nn.Dropout(self.dropout_p, inplace = True))]

        # Initialise a list for the length of the output of each convolution
        # Value is initialised based on input length for use in recursive formula: L_out = (L_in - kernel_size) + 1
        conv_out_len = (self.input_shape[2] - kernel_sizes[0]) + 1
        conv_out_shapes = [(1, self.n_filters, conv_out_len)]

        ##print('Convolution: {}\nOutput shape: {}\n'.format(1, conv_out_shapes[0]))

        # Intermediate convolutional layers
        for conv_n in range(2, self.n_convs + 1):

            conv_out_len = (conv_out_len - kernel_sizes[conv_n - 1]) + 1
            conv_out_shapes.append((1, self.n_filters, conv_out_len))
            ##print('Convolution: {}\nOutput shape: {}\n'.format(conv_n, conv_out_shapes[conv_n - 1]))

            layers.append(('conv{}'.format(conv_n), nn.Conv1d(in_channels = self.n_filters,
                                                              out_channels = self.n_filters,
                                                              kernel_size = kernel_sizes[conv_n - 1])))

            layers.append(('relu{}'.format(conv_n), nn.ReLU()))
            layers.append(('drop{}'.format(conv_n), nn.Dropout(self.dropout_p, inplace = True)))

            if conv_n == (self.n_convs): # Final layer

                layers.append(('global_average_pool', nn.AvgPool1d(conv_out_len)))
                layers.append(('flatten', flattener()))

        # Linear output layer
        layers.append(('dense', nn.Linear(in_features = self.n_filters, out_features = 1)))

        layers_dict = OrderedDict(layers)

        # Initialise the weights, sampling from a Kaiming (He) normal distribution
        for conv_n in range(self.n_convs):

            #print(layers_dict['conv' + str(conv_n + 1)])
            torch.nn.init.kaiming_normal_(layers_dict['conv' + str(conv_n + 1)].weight, nonlinearity = 'relu')

        torch.nn.init.kaiming_normal_(layers_dict['dense'].weight, nonlinearity = 'relu')

        # Construct the network
        self.net = nn.Sequential(layers_dict)

        #################################### old stuff ################################
        #pd.DataFrame(layers_dict, index = ['start'])
        #torch.nn.init.kaiming_normal_(self.conv1.weight, nonlinearity = 'relu')
        #torch.nn.init.kaiming_normal_(self.conv2.weight, nonlinearity = 'relu')
        #torch.nn.init.kaiming_normal_(self.conv3.weight, nonlinearity = 'relu')
        #torch.nn.init.kaiming_normal_(self.conv4.weight, nonlinearity = 'relu')
        #torch.nn.init.kaiming_normal_(self.dense.weight, nonlinearity = 'relu')

        # Moving to GPU
        #self.device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
        #print('Device: ', self.device)
        #self.net.to(self.device)

        #print(self.net)

    def forward(self, x):

        x = x.view(1, 1, self.input_shape[2])
        #x = x.to(self.device)
        print('passed okay')
        # These lines are necessary to prevent errors when the loss function is called
        x = torch.where(torch.isnan(x), torch.zeros_like(x), x)
        x = torch.where(torch.isinf(x), torch.zeros_like(x), x)

        return self.net(x)#.view(-1, 1)
```
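As a quick check of the length bookkeeping in `__init__`, here is the same recursive formula, L_out = L_in - kernel_size + 1, evaluated on its own for the default 4 conv layers and an input length of 556. This is a standalone sketch and the variable names are only for illustration:

```python
n_convs = 4      # default number of conv layers in Net
input_len = 556  # length of one time series

# Same kernel-size schedule as in Net.__init__, longest kernel first
kernel_sizes = ([2] + [4 + 2 * i for i in range(n_convs - 1)])[::-1]

lengths = []
length = input_len
for k in kernel_sizes:
    length = length - k + 1  # stride 1, no padding
    lengths.append(length)

print(kernel_sizes)  # [8, 6, 4, 2]
print(lengths)       # [549, 544, 541, 540]
```

None of these lengths involves the batch size; they depend only on the series length and the kernel sizes.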

```python
import torch.optim as optim
from skorch import NeuralNetBinaryClassifier
from skorch.callbacks import LRScheduler

net = NeuralNetBinaryClassifier(module = Net(num_filters = 128, batch_size = 32),
                                criterion = nn.BCEWithLogitsLoss,
                                max_epochs = 100,
                                lr = 0.001,
                                train_split = None,
                                device = 'cuda',
                                optimizer = optim.Adam,
                                callbacks = [LRScheduler(policy = 'ReduceLROnPlateau',
                                                         factor = 0.5,
                                                         patience = 10,
                                                         min_lr = 1e-4)])
```

Apologies for the dodgy formatting, not sure what’s going on there either.
Thanks!