I am relatively new to deep learning; I began using PyTorch two weeks ago to build a convolutional binary classifier for time series. I have got it to the stage where I can pass time series through individually without error, but now I want to speed things up by using batches.

In my reading I have come across a few different terms and symbols used to describe the dimensions of layers and inputs, and I am having a hard time matching them up with each other. The PyTorch docs for Conv1d list (in_channels, out_channels, kernel_size) as arguments and describe the shape symbols N, C and L.
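To check I am reading the docs right, here is a minimal experiment I ran (sizes are arbitrary ones I picked, assuming N is the batch size, C the channels and L the series length):

```python
import torch
import torch.nn as nn

# in_channels = C of the input, out_channels = number of filters (C of the output)
conv = nn.Conv1d(in_channels=1, out_channels=16, kernel_size=5)

x = torch.randn(1, 1, 556)  # (N, C, L): batch of 1, 1 channel, 556 time steps
y = conv(x)
print(y.shape)  # torch.Size([1, 16, 552]), since L_out = L_in - kernel_size + 1
```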

For a given network:

Input
|
Conv1d(in_channels, out_channels, kernel_size)
|
Conv1d(in_channels, out_channels, kernel_size)
|
Conv1d(in_channels, out_channels, kernel_size)
|
Conv1d(in_channels, out_channels, kernel_size)
|
GlobAvPool()
|
Flattener of some kind e.g. .view(shape1, shape2)
|
Linear(shape1, 1) # 1 output for binary classifier

I have had this structure working for inputs of shape (1, 1, 556), i.e. a single time series of length 556 samples. When I change to batches, which numbers change? Are there restrictions on what batch_size can be?
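From a quick experiment on a single Conv1d layer (a sketch with made-up sizes, not my real network), it looks like only the first dimension changes, but I want to confirm this is the general principle:

```python
import torch
import torch.nn as nn

conv = nn.Conv1d(in_channels=1, out_channels=16, kernel_size=5)

# The same layer appears to accept any batch size;
# only the first dimension of the output changes
for n in (1, 8, 32):
    y = conv(torch.randn(n, 1, 556))
    print(y.shape)  # (n, 16, 552)
```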

So far my understanding is based on these terms and definitions:

• kernel_size: the number of timestamps the kernel looks at

• n_filters: the number of kernels; this seems to be both the in_channels and out_channels for most conv layers (but not the first one)

• batch_size: how many time series am I feeding in at once

Note that each time series is just a single sequence of numbers (does that mean one channel, by analogy with an RGB image being 3 channels? Or is the number of channels the batch size?)
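For reference, this is how I am currently reshaping things (a sketch, assuming my one-channel reading is correct, with a made-up batch of 4):

```python
import torch

series = torch.randn(556)    # one raw time series
x = series.view(1, 1, -1)    # (N=1, C=1, L=556): the two 1s are batch and channel
batch = torch.randn(4, 556)  # four series stacked along the first dimension
xb = batch.unsqueeze(1)      # (N=4, C=1, L=556): batch grows, channels stay at 1
print(x.shape, xb.shape)
```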

Basically I need to understand how these shapes interact so I can confidently write networks of varying structure and know which numbers go where.
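Here is my attempt at tracing shapes through a stripped-down version of the network above (a sketch: I have assumed kernel sizes [8, 6, 4, 2], a batch of 32, and left out the dropout for brevity):

```python
import torch
import torch.nn as nn

n_filters = 16
kernels = [8, 6, 4, 2]  # longest features first, down to 2 samples wide

layers, in_ch = [], 1
for k in kernels:
    layers += [nn.Conv1d(in_ch, n_filters, k), nn.ReLU()]
    in_ch = n_filters
convs = nn.Sequential(*layers)

x = torch.randn(32, 1, 556)           # batch of 32 single-channel series
y = convs(x)                          # (32, 16, 540): each conv shrinks L by k - 1
pooled = nn.AvgPool1d(y.size(-1))(y)  # global average pool -> (32, 16, 1)
out = nn.Linear(n_filters, 1)(pooled.flatten(1))  # (32, 1): one logit per series
print(y.shape, out.shape)
```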

I am wrapping my module in Skorch's NeuralNetBinaryClassifier(), which has added to my confusion, but it has worked for batch_size = 1.

My code is below for reference, but my priority is a conceptual answer to the above, related to my context. Feel free to point out anything else silly I have done, too.

```python
class flattener(nn.Module):
    def forward(self, x):
        # Flatten (N, C, 1) -> (N, C), taking the batch size from x itself
        return x.view(x.size(0), -1)
```

```python
class Net(nn.Module):

    def __init__(self,
                 num_chans = 1,
                 batch_size = 1,
                 conv_layers = 4,
                 num_filters = 16,
                 dropout_prob = 0.1):

        super(Net, self).__init__()

        # Network hyperparameters
        self.num_chans = num_chans    # Number of channels in each input
        self.batch_size = batch_size  # Number of series in each batch
        self.n_convs = conv_layers    # Number of convolutional layers
        self.n_filters = num_filters  # Number of filters in each convolutional layer
        self.dropout_p = dropout_prob # Probability of each weight being zeroed in each dropout layer

        # Get input shape from a training sample
        a_series = train_df.RR_series
        self.input_shape = (1, self.num_chans, max(np.shape(a_series)))

        # List of kernel sizes (number of time steps covered) for each convolutional layer
        kernel_sizes = [2] + [4 + (2 * i) for i in range(self.n_convs - 1)]

        # The list is reversed so the layers look for longer features first, then shorter, down to 2 samples wide
        kernel_sizes = kernel_sizes[::-1]

        # First layer
        layers = [('conv1', nn.Conv1d(in_channels = self.num_chans,
                                      out_channels = self.n_filters,
                                      kernel_size = kernel_sizes[0])),
                  ('relu1', nn.ReLU()),
                  ('drop1', nn.Dropout(self.dropout_p, inplace = True))]

        # Initialise a list for the length of the output of each convolution
        # Value is initialised based on input length for use in recursive formula: L_out = (L_in - kernel_size) + 1
        conv_out_len = (self.input_shape[2] - kernel_sizes[0]) + 1
        conv_out_shapes = [(1, self.n_filters, conv_out_len)]

        # Intermediate convolutional layers
        for conv_n in range(2, self.n_convs + 1):

            conv_out_len = (conv_out_len - kernel_sizes[conv_n - 1]) + 1
            conv_out_shapes.append((1, self.n_filters, conv_out_len))

            layers.append(('conv{}'.format(conv_n), nn.Conv1d(in_channels = self.n_filters,
                                                              out_channels = self.n_filters,
                                                              kernel_size = kernel_sizes[conv_n - 1])))
            layers.append(('relu{}'.format(conv_n), nn.ReLU()))
            layers.append(('drop{}'.format(conv_n), nn.Dropout(self.dropout_p, inplace = True)))

            if conv_n == self.n_convs: # Final conv layer: pool over the remaining length, then flatten
                layers.append(('global_average_pool', nn.AvgPool1d(conv_out_len)))
                layers.append(('flatten', flattener()))

        # Linear output layer
        layers.append(('dense', nn.Linear(in_features = self.n_filters, out_features = 1)))

        layers_dict = OrderedDict(layers)

        # Initialise the weights, sampling from a Kaiming He normal distribution
        for conv_n in range(self.n_convs):
            torch.nn.init.kaiming_normal_(layers_dict['conv' + str(conv_n + 1)].weight, nonlinearity = 'relu')

        torch.nn.init.kaiming_normal_(layers_dict['dense'].weight, nonlinearity = 'relu')

        # Construct the network
        self.net = nn.Sequential(layers_dict)

    def forward(self, x):

        x = x.view(-1, self.num_chans, self.input_shape[2])

        # These lines are necessary to prevent errors when the loss function is called
        x = torch.where(torch.isnan(x), torch.zeros_like(x), x)
        x = torch.where(torch.isinf(x), torch.zeros_like(x), x)

        return self.net(x)
```

```python
net = NeuralNetBinaryClassifier(module = Net(num_filters = 128, batch_size = 32),
                                criterion = nn.BCEWithLogitsLoss,
                                max_epochs = 100,
                                lr = 0.001,
                                train_split = None,
                                device = 'cuda')
```