Conceptual Confusion about designing an architecture

I’m attempting to build a CNN on sequence data for a simple regression, and I’m unsure about the process of designing it.

I isolated a single data point and changed its dimensions to accommodate the future model (so it also works with batches). My sequences have length 35, so I pass in a [1, 1, 35] tensor. To debug my network as I add layers, I add each layer to my model class’s __init__() and then include it in the forward() call.

This works for the following code…

```python
import torch

class myCNN(torch.nn.Module):
    def __init__(self):
        super().__init__()

        self.conv1d = torch.nn.Conv1d(in_channels=1, out_channels=8, kernel_size=2)
        self.norm = torch.nn.BatchNorm1d(8)
        self.relu = torch.nn.ReLU()
        self.pool = torch.nn.MaxPool1d(kernel_size=2)

        self.fc1 = torch.nn.Linear(136, 1)  # 136 is correct for a SINGLE data point

    def forward(self, x):
        # reshape x
        x = x.unsqueeze(1)  # adds the channel dimension Conv1d expects
        x = self.conv1d(x)
        x = self.norm(x)
        x = self.relu(x)
        x = self.pool(x)
        x = x.view(-1)  # this flattens the activation
        x = self.fc1(x)

        return x
```
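For reference, this is roughly how I run the single data point through it while debugging (dummy values, just to check shapes). I pass a single sequence with a batch dimension; the unsqueeze(1) in forward() then gives the [1, 1, 35] shape that Conv1d expects:

```python
import torch

model = myCNN()

# one dummy sequence of length 35 with a leading batch dimension
x = torch.randn(1, 35)
out = model(x)
print(out.shape)  # check the output shape after each new layer
```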

However, as you can see, I hardcoded the exact input size in the self.fc1 layer. That size is only correct for a single data point; when I change my batch size, the required size grows accordingly.
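Tracing the shapes, I think this is what happens (so the hardcoded 136 only works for a batch size of 1):

```python
# shape trace for a batch of size B (as I understand it):
# input                 [B, 35]
# unsqueeze(1)       -> [B, 1, 35]
# Conv1d(1->8, k=2)  -> [B, 8, 34]
# MaxPool1d(2)       -> [B, 8, 17]
# view(-1)           -> [B * 8 * 17] = [B * 136]   # batch dim gets flattened away
# Linear(136, 1) therefore only works when B == 1
```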

My questions are:

  1. Do I need to pass batch_size into the __init__() call?

  2. Shouldn’t there be a way to make it flexible to batch size, whether it is 1 or 128? This was possible with my linear models.

  3. How can I determine the correct input size (in_features) for the linear layer?

Second question: I’m trying to replicate models that use stacks of convolution/pooling blocks.

I’m still looking for an example that demonstrates how this is done. It seems like I need to know the number of channels after each conv/pooling block to pass into the following block. How can I do this without hardcoding these values?

I hope this makes sense!

thanks

You could treat the use case in the same way variable input shapes are treated in vision models, i.e. by adding an adaptive pooling layer before passing the activation to the linear layer.
This will make sure that the expected number of features stays constant, and you will be able to pass in different sequence lengths.
Also, flatten the activation via x = x.view(x.size(0), -1).
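Something like this sketch (the pooled size of 17 is an arbitrary choice here; just keep the Linear layer’s in_features consistent with it):

```python
import torch

class myCNN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1d = torch.nn.Conv1d(in_channels=1, out_channels=8, kernel_size=2)
        self.norm = torch.nn.BatchNorm1d(8)
        self.relu = torch.nn.ReLU()
        # adaptive pooling always returns a fixed temporal size,
        # regardless of the input sequence length
        self.pool = torch.nn.AdaptiveAvgPool1d(17)
        self.fc1 = torch.nn.Linear(8 * 17, 1)

    def forward(self, x):
        x = x.unsqueeze(1)
        x = self.conv1d(x)
        x = self.norm(x)
        x = self.relu(x)
        x = self.pool(x)
        # keep the batch dimension and flatten the rest
        x = x.view(x.size(0), -1)
        x = self.fc1(x)
        return x

model = myCNN()
out = model(torch.randn(128, 35))  # works for any batch size
print(out.shape)                   # torch.Size([128, 1])
```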

For the stacked blocks, you could also try the lazy modules, e.g. torch.nn.LazyConv1d (and torch.nn.LazyLinear for the final layer), to avoid the need to define the number of input channels or features manually.
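A rough sketch of stacked blocks using the lazy modules (the out_channels per block are arbitrary here; the in_channels and the Linear in_features are inferred on the first forward pass):

```python
import torch

class StackedCNN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # input channels/features are inferred from the first forward pass
        self.block1 = torch.nn.Sequential(
            torch.nn.LazyConv1d(out_channels=8, kernel_size=2),
            torch.nn.ReLU(),
            torch.nn.MaxPool1d(kernel_size=2),
        )
        self.block2 = torch.nn.Sequential(
            torch.nn.LazyConv1d(out_channels=16, kernel_size=2),
            torch.nn.ReLU(),
            torch.nn.MaxPool1d(kernel_size=2),
        )
        self.pool = torch.nn.AdaptiveAvgPool1d(4)
        self.fc = torch.nn.LazyLinear(1)

    def forward(self, x):
        x = x.unsqueeze(1)
        x = self.block1(x)
        x = self.block2(x)
        x = self.pool(x)
        x = x.view(x.size(0), -1)
        return self.fc(x)

model = StackedCNN()
print(model(torch.randn(4, 35)).shape)  # torch.Size([4, 1])
```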