Posting this partially as a working example for people who are struggling, and partially for feedback, as I am brand new to torch.
I have individual grayscale images of shape: width x height. There is no channel dimension (i.e. a single implicit channel) because they're grayscale.
With a batch size of 5, I get the shape: samples x width x height.
batch.shape==[5, 160, 120] ← width will serve as 1d input channel
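As a sanity check on where the Linear layer's input size comes from, the per-layer lengths can be traced with the standard output-size formula from the Conv1d/MaxPool1d docs (a quick sketch, not part of the model code):

```python
import math

def out_len(L, kernel_size, stride=1, padding=0, dilation=1):
    # Output-length formula shared by Conv1d and MaxPool1d.
    return math.floor((L + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1)

L = 120                            # height serves as the 1-d "length"
L = out_len(L, 3, padding=1)       # Conv1d(k=3, p=1)  -> 120
L = out_len(L, 2, stride=2)        # MaxPool1d(2, 2)   -> 60
L = out_len(L, 3, padding=1)       # Conv1d(k=3, p=1)  -> 60
L = out_len(L, 2, stride=2)        # MaxPool1d(2, 2)   -> 30
flat = 128 * L                     # channels * length after Flatten
print(flat)                        # 3840
```

So the first Linear layer must accept 128 * 30 = 3840 features.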
```python
import torch.nn as nn

def fn_build():
    # wrapped just to try init strategies. i know the functional api exists.
    model = nn.Sequential(
        # Conv1d(in_channels, out_channels, kernel_size, stride=1, padding=0,
        #        dilation=1, groups=1, bias=True, padding_mode='zeros')
        nn.Conv1d(
            in_channels=160   # running with `in_channels` as the width of the image.
            , out_channels=56 # arbitrary number. treating this as network complexity.
            , kernel_size=3
            , padding=1
        )
        , nn.MaxPool1d(kernel_size=2, stride=2)
        #, nn.BatchNorm1d(56)  # stuck at local minimum (0.68)
        , nn.ReLU()  # wasn't learning with tanh
        , nn.Dropout(p=0.4)

        , nn.Conv1d(
            in_channels=56
            , out_channels=128  # expand to get a more granular feature space.
            , kernel_size=3
            , padding=1
        )
        , nn.MaxPool1d(kernel_size=2, stride=2)
        #, nn.BatchNorm1d(128)  # stuck at local minimum (0.68)
        , nn.ReLU()  # wasn't learning with tanh
        , nn.Dropout(p=0.4)

        # output [5, 128, 30] -> flattened to [5, 3840]
        , nn.Flatten()
        , nn.Linear(3840, 3840)
        , nn.BatchNorm1d(3840)
        , nn.ReLU()
        , nn.Dropout(p=0.4)
        , nn.Linear(3840, 1)
        , nn.Sigmoid()
    )

    # ------ Tried init but it didn't learn (0.50 flatline) ------
    # def initialize_layerz(m):
    #     if type(m) == nn.Linear:
    #         nn.init.uniform_(m.weight)
    #     if type(m) == nn.Conv1d:
    #         nn.init.normal_(m.weight)
    # model.apply(initialize_layerz)
    return model
```
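On the commented-out init: `nn.init.uniform_` defaults to the range [0, 1), so every weight starts positive, which can easily flatline a ReLU network. A more common recipe for ReLU nets (a sketch, not claiming it's the right fix for this model) would be Kaiming init:

```python
import torch
import torch.nn as nn

def init_weights(m):
    # Kaiming init is scaled for ReLU activations; plain uniform_
    # (range [0, 1)) makes every weight positive, which can stall learning.
    if isinstance(m, (nn.Linear, nn.Conv1d)):
        nn.init.kaiming_normal_(m.weight, nonlinearity='relu')
        if m.bias is not None:
            nn.init.zeros_(m.bias)

# usage: model.apply(init_weights)
```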
- Padding is an obvious area for improvement, but I’m more concerned with topology at this stage in learning.
- Is there a way I can use fewer neurons (3840) in my receiving Linear layer?
- Would a multidimensional Linear layer be possible, and would it be faster?
- Is there anything obviously wrong with those init lines?
- All criticism welcome.
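On the question of shrinking the receiving Linear layer: one common trick is to collapse the length dimension with `nn.AdaptiveAvgPool1d` before flattening, so the Linear layer only sees the channel count. A sketch of the idea on the same input shape (layer sizes are this post's, the pooling is the suggested change):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv1d(160, 56, kernel_size=3, padding=1),
    nn.MaxPool1d(kernel_size=2, stride=2),
    nn.ReLU(),
    nn.Conv1d(56, 128, kernel_size=3, padding=1),
    nn.AdaptiveAvgPool1d(1),  # [5, 128, 60] -> [5, 128, 1]
    nn.Flatten(),             # -> [5, 128] instead of [5, 3840]
    nn.Linear(128, 1),        # Linear now needs only 128 inputs
    nn.Sigmoid(),
)

out = model(torch.randn(5, 160, 120))
print(out.shape)  # torch.Size([5, 1])
```

Whether averaging away the spatial dimension loses too much information depends on the task, but it cuts the Linear parameters dramatically.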