I am relatively new to deep learning; two weeks ago I began using PyTorch to build a convolutional binary classifier for time series. I got it to the stage where I can pass time series through individually without error, but now I want to speed things up by using batches.
In my reading I have come across a few different terms and symbols used to describe the dimensions of layers and inputs, and I am having a hard time matching them up with each other. The PyTorch docs for Conv1d list (in_channels, out_channels, kernel_size) as arguments and describe the input with the algebraic symbols N, C and L.
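To make those symbols concrete, here is a minimal sketch of how I currently understand the mapping (the numbers are just illustrative):

```python
import torch
import torch.nn as nn

# Conv1d input is (N, C, L): N = batch size, C = channels (in_channels), L = series length
conv = nn.Conv1d(in_channels = 1, out_channels = 16, kernel_size = 4)
x = torch.randn(8, 1, 556)  # 8 series, 1 channel each, 556 samples long
print(conv(x).shape)        # torch.Size([8, 16, 553]), i.e. (N, out_channels, L - kernel_size + 1)
```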
For a given network:
```
Input
  |
Conv1d(in_channels, out_channels, kernel_size)
  |
Conv1d(in_channels, out_channels, kernel_size)
  |
Conv1d(in_channels, out_channels, kernel_size)
  |
Conv1d(in_channels, out_channels, kernel_size)
  |
GlobAvgPool()
  |
Flattener of some kind, e.g. .view(shape1, shape2)
  |
Linear(shape1, 1)  # 1 output for binary classifier
```
I have had this structure working for inputs of shape (1, 1, 556), i.e. a single time series of length 556 samples. When I change to batches, which numbers change? Are there restrictions on what batch_size can be?
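My working assumption, which I would like confirmed, is that only N (the first dimension) changes and the layer definitions stay exactly the same; for example, with the four kernel sizes my code generates:

```python
import torch
import torch.nn as nn

convs = nn.Sequential(nn.Conv1d(1, 16, kernel_size = 8),   # (N, 1, 556)  -> (N, 16, 549)
                      nn.Conv1d(16, 16, kernel_size = 6),  # (N, 16, 549) -> (N, 16, 544)
                      nn.Conv1d(16, 16, kernel_size = 4),  # (N, 16, 544) -> (N, 16, 541)
                      nn.Conv1d(16, 16, kernel_size = 2))  # (N, 16, 541) -> (N, 16, 540)

for batch_size in (1, 32, 50):
    print(convs(torch.randn(batch_size, 1, 556)).shape)    # only the first dimension differs
```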
So far my understanding is based on these terms and definitions:
- kernel_size: the number of time-steps the kernel looks at
- n_filters: the number of kernels; this seems to set both in_channels and out_channels for most conv layers (though not the in_channels of the first one)
- batch_size: how many time series I am feeding in at once
Note that each time series is just a single string of values. Does that mean one channel, by analogy with an RGB image being 3 channels? Or is the number of channels the batch size?
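To illustrate the analogy I am trying to draw, with the batch dimension kept separate from the channel dimension:

```python
import torch

rgb_batch    = torch.randn(32, 3, 224, 224)  # Conv2d input: batch of 32 images, 3 colour channels
series_batch = torch.randn(32, 1, 556)       # Conv1d input: batch of 32 series, 1 channel (univariate)
```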
Basically I need to understand how these shapes interplay so I can confidently write networks of varying structure and know which numbers go where.
I am wrapping my module in skorch's NeuralNetBinaryClassifier(), which has made me much more confused, but it has worked for batch_size = 1.
My code is below for reference, but my priority is a conceptual answer to the above, relating to my context. Feel free to also point out anything silly that I have done.
```python
from collections import OrderedDict

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from skorch import NeuralNetBinaryClassifier
from skorch.callbacks import LRScheduler


class flattener(nn.Module):
    def forward(self, x):
        return x.view(self.batch_size, 1)
```
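In case it is relevant: I have also seen flattening written in a shape-agnostic way, as in the sketch below, but I am not sure whether that is what I need after the global average pool:

```python
import torch.nn as nn

class ShapeAgnosticFlattener(nn.Module):
    def forward(self, x):
        # Keep the batch dimension, collapse the rest: (N, n_filters, 1) -> (N, n_filters)
        return x.view(x.size(0), -1)
```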
```python
class Net(nn.Module):
    def __init__(self,
                 num_chans = 1,
                 batch_size = 1,
                 conv_layers = 4,
                 num_filters = 16,
                 dropout_prob = 0.1):
        super(Net, self).__init__()
        # Network hyperparameters
        self.num_chans = num_chans     # Number of channels in each input
        self.batch_size = batch_size   # Number of series in each batch
        self.n_convs = conv_layers     # Number of convolutional layers
        self.n_filters = num_filters   # Number of filters in each convolutional layer
        self.dropout_p = dropout_prob  # Probability of each element being zeroed by each dropout layer
        # Get input shape from a training sample
        a_series = train_df.RR_series[0]
        self.input_shape = (1, 1, max(np.shape(a_series)))
        # List of kernel sizes (number of time-steps covered) for each convolutional layer
        kernel_sizes = [2] + [4 + (2 * i) for i in range(self.n_convs - 1)]
        # The list is reversed so the layers look for longer features first, then shorter, down to 2 samples wide
        kernel_sizes = kernel_sizes[::-1]
        # First layer
        layers = [('conv1', nn.Conv1d(in_channels = self.input_shape[0],
                                      out_channels = self.n_filters,
                                      kernel_size = kernel_sizes[0])),
                  ('relu1', nn.ReLU()),
                  ('drop1', nn.Dropout(self.dropout_p, inplace = True))]
        # Track the length of the output of each convolution,
        # initialised from the input length via the recursive formula: L_out = (L_in - kernel_size) + 1
        conv_out_len = (self.input_shape[2] - kernel_sizes[0]) + 1
        conv_out_shapes = [(1, self.n_filters, conv_out_len)]
        # Intermediate convolutional layers
        for conv_n in range(2, self.n_convs + 1):
            conv_out_len = (conv_out_len - kernel_sizes[conv_n - 1]) + 1
            conv_out_shapes.append((1, self.n_filters, conv_out_len))
            layers.append(('conv{}'.format(conv_n), nn.Conv1d(in_channels = self.n_filters,
                                                              out_channels = self.n_filters,
                                                              kernel_size = kernel_sizes[conv_n - 1])))
            layers.append(('relu{}'.format(conv_n), nn.ReLU()))
            layers.append(('drop{}'.format(conv_n), nn.Dropout(self.dropout_p, inplace = True)))
            if conv_n == self.n_convs:  # Final layer
                layers.append(('global_average_pool', nn.AvgPool1d(conv_out_len)))
                layers.append(('flatten', flattener()))
        # Linear output layer
        layers.append(('dense', nn.Linear(in_features = self.n_filters, out_features = 1)))
        layers_dict = OrderedDict(layers)
        # Initialise the weights, sampling from a Kaiming He normal distribution
        for conv_n in range(self.n_convs):
            torch.nn.init.kaiming_normal_(layers_dict['conv' + str(conv_n + 1)].weight, nonlinearity = 'relu')
        torch.nn.init.kaiming_normal_(layers_dict['dense'].weight, nonlinearity = 'relu')
        # Construct the network
        self.net = nn.Sequential(layers_dict)
    def forward(self, x):
        x = x.view(1, 1, self.input_shape[2])
        print('passed okay')
        # These lines are necessary to prevent errors when the loss function is called
        x = torch.where(torch.isnan(x), torch.zeros_like(x), x)
        x = torch.where(torch.isinf(x), torch.zeros_like(x), x)
        return self.net(x)
```
```python
net = NeuralNetBinaryClassifier(module = Net(num_filters = 128, batch_size = 32),
                                criterion = nn.BCEWithLogitsLoss,
                                max_epochs = 100,
                                lr = 0.001,
                                train_split = None,
                                device = 'cuda',
                                optimizer = optim.Adam,
                                callbacks = [LRScheduler(policy = 'ReduceLROnPlateau',
                                                         factor = 0.5,
                                                         patience = 10,
                                                         min_lr = 1e-4)])
```
Thanks!