Network architecture to find "X" pattern in time series data

Dear community members,

I am kind of stuck - maybe you can help me.
The time series I would like to binary-classify consists of 3 features, which I will call 1, 2, 3 in the following.
An OK sample contains a linearly increasing feature and a linearly decreasing feature with the same absolute slope; I call this the “X” pattern in the following. The challenge is that the X pattern can occur on any channel combination 1,2 // 2,3 // 1,3. The remaining third feature either increases linearly with some different slope, is constant, or has some other time behaviour.
A NOK sample contains no “X” pattern.
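
For illustration, a minimal generator for such samples could look like the sketch below (the slopes, noise level and length are placeholders, not my actual data):

import torch

def make_sample(n_steps=64, x_pattern=True):
    # Illustrative only: one increasing and one decreasing feature with the same
    # absolute slope on a random channel pair, the rest are noisy constants.
    t = torch.linspace(0, 1, n_steps)
    feats = [torch.rand(1) + 0.1 * torch.randn(n_steps) for _ in range(3)]
    if x_pattern:
        i, j = torch.randperm(3)[:2].tolist()  # pick the two "X" channels
        slope = 0.5 + 2.0 * torch.rand(1)
        feats[i] = slope * t    # linearly increasing
        feats[j] = -slope * t   # linearly decreasing with the same slope
    return torch.stack(feats)   # shape: 3 x n_steps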

Mathematically this should be solvable with a CNN consisting of 3 filter pairs:
a) linearly increasing on channel 1
b) linearly increasing on channel 1, linearly decreasing on channel 2 (with the exact opposite slope)
and the analogous pair for each of the 3 channel combinations.
If an X pattern occurs on channels 1,2 then a) activates with strength a_1 = s, while b) activates with strength a_2 = 2s. A Gaussian nonlinearity of the form exp(-((2*a_1 - a_2 + eps)/(a_1 - a_2 + eps))**2) therefore activates strongly exactly when 2*a_1 = a_2, i.e. when an X pattern is present. Here eps is a small number, e.g. 1e-12, to avoid division by zero.
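
As a quick numeric sanity check (purely illustrative values, not my real activations): with a_1 = s and a_2 = 2s the numerator 2*a_1 - a_2 vanishes, so the Gaussian outputs roughly 1, while a mismatched pair drives it towards 0:

import torch

def gaussian_gate(u, v, eps=1e-12, width=0.2):
    # exp(-(((2u - v) / (u - v)) / width)**2): maximal when v = 2*u
    return torch.exp(-(((2 * u - v + eps) / (u - v + eps)) / width) ** 2)

s = torch.tensor(1.5)
print(gaussian_gate(s, 2 * s).item())    # ~1.0 -> X pattern present
print(gaussian_gate(s, 0.7 * s).item())  # ~0.0 -> no matching pair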

How can I implement this idea in PyTorch? None of my attempts converge. Here is one, where I let the network learn 3 filters (see b) above) and constrain the other 3 filters to be copies of the learned ones with two of the three channel rows zeroed out, to enforce structure a) above:

import torch
import torch.nn as nn

def nonlinear(u, v):
    eps = 1e-20
    fac1 = 2*u - v + eps
    fac2 = u - v + eps
    # keep the width of the Gaussian narrow to get stronger specificity to class 1: 2*u = v (!)
    res = torch.exp(-torch.pow((fac1/fac2)/0.2, 2))  # differentiable non-linearity: Gaussian :)
    return res

class CNN(nn.Module):

    def __init__(self, input_size, output_size, n_feature,  kernel_sz):
        super(CNN, self).__init__()
        self.n_feature = n_feature
        assert n_feature == 3
        self.conv2a = nn.Conv2d(in_channels=1, out_channels=n_feature, kernel_size=kernel_sz, stride=kernel_sz[1], bias=False)
        self.conv2b = nn.Conv2d(in_channels=1, out_channels=n_feature, kernel_size=kernel_sz, stride=kernel_sz[1], bias=False)
        self.sig = nn.Softsign()
        self.relu = nn.ReLU()
        
    def forward(self, x, verbose=False):
        tsteps = x.shape[-1]
        
        # conv2a is a constrained copy of conv2b: per filter pair, zero the two
        # channel rows that do not belong to structure a) above. Gradients
        # therefore only flow through conv2b (the v branch).
        with torch.no_grad():
            self.conv2a.weight.copy_(self.conv2b.weight)
            for ii, tup in enumerate([(1,2),(0,2),(0,1)]):
                self.conv2a.weight[ii,:,tup[0],:] = 0
                self.conv2a.weight[ii,:,tup[1],:] = 0
   
        u = self.conv2a(x) # n_batch x channels x signals x timesteps => n_batch x filters x 1 x n_timesteps/kernel_sz
        v = self.conv2b(x)
        x = nonlinear(u,v) # calculate activations
        x = torch.mean(x, 3)  # average over time windows => n_batch x filters x 1
        # max over filter/feature maps => is one filter pair active?
        x = torch.max(x, 1).values # => n_batch x 1
        return x[:,0]

I know this network architecture is very specific, but I still do not understand why SGD has so much trouble finding the optimal filters.
What I also tried is using a single Conv2d with 6 filters and applying the nonlinearity between the even- and odd-indexed filters.
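
Roughly like this (it reuses the nonlinear function from above; the class and parameter names are just for illustration, not my exact code):

class PairCNN(nn.Module):
    # Single Conv2d with 6 filters; the nonlinearity couples filter 2k (role a)
    # with filter 2k+1 (role b) for the 3 channel combinations.
    def __init__(self, kernel_sz):
        super().__init__()
        self.conv = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=kernel_sz,
                              stride=kernel_sz[1], bias=False)

    def forward(self, x):
        out = self.conv(x)                 # n_batch x 6 x 1 x n_timesteps/kernel_sz
        u, v = out[:, 0::2], out[:, 1::2]  # even-indexed filters -> u, odd-indexed -> v
        act = nonlinear(u, v)              # Gaussian gate per filter pair
        act = torch.mean(act, 3)           # average over time windows
        return torch.max(act, 1).values[:, 0]  # is any pair active?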

Regards and happy New Year :slight_smile: