Dear community members,
I am kind of stuck - maybe you can help me.
The time series that I would like to binary-classify consist of 3 features, which I will call 1, 2, 3 in the following.
An OK time series contains a linearly increasing feature and a linearly decreasing feature with the same absolute slope; I call this the “X” pattern in the following. The challenge is that this X pattern can occur on any feature combination: 1,2 // 2,3 // 1,3. The third feature may increase linearly with a different slope, stay constant, or show some other time behaviour.
A NOK time series contains no “X” pattern on any combination.
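For concreteness, here is roughly how such data could be generated (the function name `make_sample`, the noise level, and the slope ranges are illustrative assumptions, not part of my actual setup):

```python
import numpy as np

def make_sample(ok=True, n_steps=50, rng=None):
    """Generate one 3-channel series of shape (3, n_steps).

    An OK sample puts a +slope/-slope ("X") pair on a random channel
    combination; the third channel gets an unrelated slope.
    """
    if rng is None:
        rng = np.random.default_rng()
    t = np.linspace(0.0, 1.0, n_steps)
    x = rng.normal(0.0, 0.05, size=(3, n_steps))  # small noise floor
    if ok:
        i, j = rng.choice(3, size=2, replace=False)  # channels carrying the X
        slope = rng.uniform(0.5, 2.0)
        x[i] += slope * t        # linearly increasing
        x[j] += -slope * t       # same slope, decreasing
        k = 3 - i - j            # remaining channel: some different slope
        x[k] += rng.uniform(-0.3, 0.3) * t
    return x
```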
Mathematically this should be solvable with a CNN consisting of 3 filter pairs:
a) linearly increasing on channel 1
b) linearly increasing on channel 1, linearly decreasing on channel 2 (with exactly the opposite slope)
and this for all 3 channel combinations.
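For a single pair, these matched filters can be written down by hand. A small sketch of the argument (the window length and slopes are illustrative):

```python
import torch

T = 20  # hypothetical window length
ramp = torch.linspace(-1.0, 1.0, T)  # zero-mean linear ramp

# filter b) for channels (0, 1): increasing on ch. 0, decreasing on ch. 1
w_b = torch.zeros(3, T)
w_b[0] = ramp
w_b[1] = -ramp
# filter a): same filter with the decreasing row zeroed out
w_a = torch.zeros(3, T)
w_a[0] = ramp

# an X pattern on channels (0, 1): slopes +1 and -1, channel 2 constant
x = torch.zeros(3, T)
x[0] = torch.linspace(0.0, 1.0, T)
x[1] = -torch.linspace(0.0, 1.0, T)

a1 = (w_a * x).sum()  # activation of filter a) -> some strength s
a2 = (w_b * x).sum()  # activation of filter b) -> picks up both ramps
print(a2 / a1)        # -> 2.0, i.e. a_2 = 2*a_1
```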
If an X pattern occurs on channels 1,2, then a) activates with strength a_1 = s, while b) activates with strength a_2 = 2s. The Gaussian nonlinearity exp(-((2*a_1 - a_2 + eps)/(a_1 - a_2 + eps))**2) will then activate strongly if 2*a_1 = a_2, i.e. if an X pattern is present. Here eps is a small number, e.g. 1e-12.
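A quick numerical check of this gate (the width 0.2 is taken from my code below; `gaussian_gate` is just an illustrative name):

```python
import torch

def gaussian_gate(a1, a2, eps=1e-12, width=0.2):
    # peaks where 2*a1 == a2 (numerator ~ 0), i.e. the X-pattern case
    return torch.exp(-(((2 * a1 - a2 + eps) / (a1 - a2 + eps)) / width) ** 2)

s = torch.tensor(1.0)
print(gaussian_gate(s, 2 * s))    # close to 1: X pattern present (a2 = 2*a1)
print(gaussian_gate(s, 0.5 * s))  # close to 0: no X pattern
```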
How can I implement this idea in PyTorch? None of my attempts converge. Here is one, where I let the network learn 3 filters (see b) above) and then constrain the other 3 filters to be a copy of the learned filters with zeroed weights, to enforce structure a) above:
```python
import torch
import torch.nn as nn

def nonlinear(u, v):
    eps = 1e-20
    fac1 = 2 * u - v + eps
    fac2 = u - v + eps
    # keep the width of the Gaussian narrow to have stronger
    # specificity to class 1: 2*u == v (!)
    res = torch.exp(-torch.pow((fac1 / fac2) / 0.2, 2))  # differentiable non-linearity: Gaussian :)
    return res

class CNN(nn.Module):
    def __init__(self, input_size, output_size, n_feature, kernel_sz):
        super(CNN, self).__init__()
        self.n_feature = n_feature
        assert n_feature == 3
        self.conv2a = nn.Conv2d(in_channels=1, out_channels=n_feature,
                                kernel_size=kernel_sz, stride=kernel_sz, bias=False)
        self.conv2b = nn.Conv2d(in_channels=1, out_channels=n_feature,
                                kernel_size=kernel_sz, stride=kernel_sz, bias=False)
        self.sig = nn.Softsign()
        self.relu = nn.ReLU()

    def forward(self, x, verbose=False):
        # tie conv2a to conv2b, then zero the paired rows so that each
        # conv2a filter only sees the single increasing channel (structure a)
        with torch.no_grad():
            self.conv2a.weight.copy_(self.conv2b.weight.detach())
            for ii, tup in enumerate([(1, 2), (0, 2), (0, 1)]):
                self.conv2a.weight[ii, :, tup, :] = 0
        u = self.conv2a(x)  # n_batch x 1 x signals x timesteps -> n_batch x filters x 1 x timesteps/kernel_sz
        v = self.conv2b(x)
        x = nonlinear(u, v)         # pairwise activations
        x = torch.mean(x, 3)        # mean over time windows
        x = torch.max(x, 1).values  # max over filter pairs: is any pair active?
        return x[:, 0]
```
I know this network architecture is very specific, but I still do not understand why SGD has so much trouble finding the optimal filters.
What I also tried is using a single Conv2d with 6 filters and applying the nonlinearity between even and odd filters.
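Sketched, that variant looks like this (the class name `CNN6` and the kernel shape in the usage example are illustrative; `nonlinear` is the Gaussian gate from above, repeated here to keep the snippet self-contained):

```python
import torch
import torch.nn as nn

def nonlinear(u, v, eps=1e-20, width=0.2):
    # Gaussian gate: peaks where 2*u == v
    return torch.exp(-torch.pow(((2 * u - v + eps) / (u - v + eps)) / width, 2))

class CNN6(nn.Module):
    """One Conv2d with 6 filters; the gate is applied between
    even filters (structure a) and odd filters (structure b)."""
    def __init__(self, kernel_sz):
        super().__init__()
        self.conv = nn.Conv2d(1, 6, kernel_size=kernel_sz, stride=kernel_sz, bias=False)

    def forward(self, x):                    # x: n_batch x 1 x 3 signals x timesteps
        y = self.conv(x)                     # n_batch x 6 x 1 x n_windows
        u, v = y[:, 0::2], y[:, 1::2]        # even filters ~ a), odd filters ~ b)
        z = nonlinear(u, v)                  # pairwise Gaussian gate
        z = torch.mean(z, 3)                 # average over time windows
        return torch.max(z, 1).values[:, 0]  # strongest filter pair
```

With a 3-channel input of length 50 and kernel_sz=(3, 10), this maps a batch to one score per sample.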
Regards and happy New Year