Conv1d on grayscale image. How to improve?

Posting this partially as a working example for people who are struggling, and partially for feedback, as I am brand new to torch.


I have individual images of shape: width x height. There is no channel dimension (i.e. an implicit single channel) because it’s grayscale.
sample.shape==[160, 120]

With a batch size of 5, I get the shape: sample x width x height.
batch.shape==[5, 160, 120] ← width will serve as 1d input channel
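
For anyone following along, here is a minimal check of how Conv1d reads that layout (with a random dummy batch standing in for my real data): the 160 "width" rows act as input channels and the kernel slides along the 120-pixel height.

import torch
import torch.nn as nn

# fake batch of 5 grayscale images, laid out as [batch, width, height]
batch = torch.rand(5, 160, 120)

# Conv1d interprets this as (N, C_in, L): 160 input channels, sequence length 120
conv = nn.Conv1d(in_channels=160, out_channels=56, kernel_size=3, padding=1)
print(conv(batch).shape)  # torch.Size([5, 56, 120])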


def fn_build(): # wrapped just to try init strategies. I know the functional API exists.

    model = nn.Sequential(
        #Conv1d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')
        nn.Conv1d(
            in_channels=160 # running with `in_channels` as the width of the image.
            , out_channels=56 # arbitrary number. treating this as network complexity.
            , kernel_size=3
            , padding=1
        )
        , nn.MaxPool1d(kernel_size=2, stride=2)
        #, nn.BatchNorm1d(56) # stuck at local minimum (0.68) -- note BatchNorm1d only needs num_features; a second positional arg would be eps
        , nn.ReLU() # wasn't learning with tanh
        , nn.Dropout(p=0.4)

        , nn.Conv1d(
            in_channels=56, out_channels=128, # expand to get a more granular feature space.
            kernel_size=3, padding=1
        )
        , nn.MaxPool1d(kernel_size=2, stride=2)
        #, nn.BatchNorm1d(128) # stuck at local minimum (0.68)
        , nn.ReLU() # wasn't learning with tanh
        , nn.Dropout(p=0.4)
        # output here is [5, 128, 30], which flattens to [5, 3840]

        , nn.Flatten()
        , nn.Linear(3840, 3840)
        , nn.BatchNorm1d(3840) # num_features only; the second positional arg of BatchNorm1d is eps
        , nn.ReLU()
        , nn.Dropout(p=0.4)

        , nn.Linear(3840, 1)
        , nn.Sigmoid()
    )

# ------ Tried init but it didn't learn (0.50 flatline) ------
#     def initialize_layerz(m):
#         if type(m) == nn.Linear:
#             nn.init.uniform_(m.weight)
#         if type(m) == nn.Conv1d:
#             nn.init.normal_(m.weight)
            
#     model.apply(initialize_layerz)

    return model
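
For reference, here is the quick shape check I run after building (a sketch with a random dummy batch, not my real data or training loop); it is how I confirmed the 3840 feeding the first Linear layer:

model = fn_build()
dummy = torch.rand(5, 160, 120)  # fake batch: 5 grayscale images
out = model(dummy)
print(out.shape)  # torch.Size([5, 1])
# after the second pool the feature map is [5, 128, 30] -> 128 * 30 = 3840
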
  • Padding is an obvious area for improvement, but I’m more concerned with topology at this stage in learning.
  • Is there a way I can use fewer neurons (3840) in my receiving Linear layer?
  • Would a multidimensional shape for the Linear layer be faster, or even possible?
  • Is there anything obviously wrong with those init lines?
  • All criticism welcome.