Posting this partially as a working example for people who are struggling, and partially for feedback, as I am brand new to torch.
I have individual grayscale images of shape: width x height. There is no channel dimension (i.e. a single implicit channel) because they're grayscale.
With a batch size of 5, I get the shape: samples x width x height.
batch.shape==[5, 160, 120] ← width will serve as 1d input channel
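As a sanity check on where the Linear layer's input size comes from, the per-layer lengths can be traced with the standard output-size formula from the Conv1d/MaxPool1d docs (a quick sketch, not part of the model code):

```python
import math

def out_len(L, kernel_size, stride=1, padding=0, dilation=1):
    # Output-length formula shared by Conv1d and MaxPool1d.
    return math.floor((L + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1)

L = 120                            # height serves as the 1-d "length"
L = out_len(L, 3, padding=1)       # Conv1d(k=3, p=1)  -> 120
L = out_len(L, 2, stride=2)        # MaxPool1d(2, 2)   -> 60
L = out_len(L, 3, padding=1)       # Conv1d(k=3, p=1)  -> 60
L = out_len(L, 2, stride=2)        # MaxPool1d(2, 2)   -> 30
flat = 128 * L                     # channels * length after Flatten
print(flat)                        # 3840
```

So the first Linear layer must accept 128 * 30 = 3840 features.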
```python
import torch.nn as nn

def fn_build():
    # wrapped just to try init strategies. i know the functional api exists.
    model = nn.Sequential(
        # Conv1d(in_channels, out_channels, kernel_size, stride=1, padding=0,
        #        dilation=1, groups=1, bias=True, padding_mode='zeros')
        nn.Conv1d(
            in_channels=160   # running with `in_channels` as the width of the image.
            , out_channels=56 # arbitrary number. treating this as network complexity.
            , kernel_size=3
            , padding=1
        )
        , nn.MaxPool1d(kernel_size=2, stride=2)
        #, nn.BatchNorm1d(56)  # stuck at local minimum (0.68)
        , nn.ReLU()  # wasn't learning with tanh
        , nn.Dropout(p=0.4)

        , nn.Conv1d(
            in_channels=56
            , out_channels=128  # expand to get a more granular feature space.
            , kernel_size=3
            , padding=1
        )
        , nn.MaxPool1d(kernel_size=2, stride=2)
        #, nn.BatchNorm1d(128)  # stuck at local minimum (0.68)
        , nn.ReLU()  # wasn't learning with tanh
        , nn.Dropout(p=0.4)

        # output [5, 128, 30] -> flattened to [5, 3840]
        , nn.Flatten()
        , nn.Linear(3840, 3840)
        , nn.BatchNorm1d(3840)
        , nn.ReLU()
        , nn.Dropout(p=0.4)
        , nn.Linear(3840, 1)
        , nn.Sigmoid()
    )

    # ------ Tried init but it didn't learn (0.50 flatline) ------
    # def initialize_layerz(m):
    #     if type(m) == nn.Linear:
    #         nn.init.uniform_(m.weight)
    #     if type(m) == nn.Conv1d:
    #         nn.init.normal_(m.weight)
    # model.apply(initialize_layerz)
    return model
```
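On the commented-out init: `nn.init.uniform_` defaults to the range [0, 1), so every weight starts positive, which can easily flatline a ReLU network. A more common recipe for ReLU nets (a sketch, not claiming it's the right fix for this model) would be Kaiming init:

```python
import torch
import torch.nn as nn

def init_weights(m):
    # Kaiming init is scaled for ReLU activations; plain uniform_
    # (range [0, 1)) makes every weight positive, which can stall learning.
    if isinstance(m, (nn.Linear, nn.Conv1d)):
        nn.init.kaiming_normal_(m.weight, nonlinearity='relu')
        if m.bias is not None:
            nn.init.zeros_(m.bias)

# usage: model.apply(init_weights)
```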
- Padding is an obvious area for improvement, but I’m more concerned with topology at this stage in learning.
- Is there a way I can use fewer neurons (3840) in my receiving Linear layer?
- Would a multidimensional Linear layer be possible, and would it be faster?
- Is there anything obviously wrong with those init lines?
- All criticism welcome.
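On the question of shrinking the receiving Linear layer: one common trick is to collapse the length dimension with `nn.AdaptiveAvgPool1d` before flattening, so the Linear layer only sees the channel count. A sketch of the idea on the same input shape (layer sizes are this post's, the pooling is the suggested change):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv1d(160, 56, kernel_size=3, padding=1),
    nn.MaxPool1d(kernel_size=2, stride=2),
    nn.ReLU(),
    nn.Conv1d(56, 128, kernel_size=3, padding=1),
    nn.AdaptiveAvgPool1d(1),  # [5, 128, 60] -> [5, 128, 1]
    nn.Flatten(),             # -> [5, 128] instead of [5, 3840]
    nn.Linear(128, 1),        # Linear now needs only 128 inputs
    nn.Sigmoid(),
)

out = model(torch.randn(5, 160, 120))
print(out.shape)  # torch.Size([5, 1])
```

Whether averaging away the spatial dimension loses too much information depends on the task, but it cuts the Linear parameters dramatically.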