Dear community members,
I am kind of stuck - maybe you can help me.
The time series that I would like to binary-classify consist of 3 features that I will call 1, 2, 3 in the following.
An OK time series contains a linearly increasing feature and a linearly decreasing feature with the same slope (opposite sign). I call this the "X" pattern in the following. The challenge is that this X pattern can occur on any channel combination: 1,2 // 2,3 // 1,3. The third feature may increase linearly with some different slope, be constant, or show some other time behaviour.
A NOK (not-OK) time series contains no "X" pattern.
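To make the setup concrete, here is roughly how such series could be generated (a minimal sketch; the slopes, noise level and the name `make_sample` are placeholders of my own, not the real data):

```python
import torch

def make_sample(ok, n_steps=32, seed=None):
    # Sketch of a 3-channel series as described above; slopes/noise are placeholders.
    g = torch.Generator()
    if seed is not None:
        g.manual_seed(seed)
    x = 0.1 * torch.randn(3, n_steps, generator=g)  # background noise on all channels
    t = torch.arange(n_steps, dtype=torch.float32)
    if ok:
        up, down = torch.randperm(3, generator=g)[:2]  # random channel pair carries the X pattern
        slope = 0.5 + torch.rand(1, generator=g).item()
        x[up] += slope * t    # linearly increasing feature
        x[down] -= slope * t  # linearly decreasing feature, same |slope|
    return x  # shape (3, n_steps); unsqueeze twice for the (batch, 1, 3, T) conv input
```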
Mathematically this should be solvable with a CNN consisting of 3 filter pairs:
a) linearly increasing on channel 1
b) linearly increasing on channel 1, linearly decreasing on channel 2 (with the exact opposite slope)
and this for all 3 channel combinations.
If an X pattern occurs on channels 1,2 then a) activates with strength a_1 = s, while b) activates with strength a_2 = 2s. The Gaussian nonlinearity exp(-((2a_1-a_2+eps)/(a_1-a_2+eps))**2) therefore activates strongly exactly when 2a_1 = a_2, that is, when an X pattern is present. Here eps is a small constant, e.g. 1e-12.
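As a sanity check of this arithmetic, with hand-set filters and toy numbers of my own choosing (not learned ones):

```python
import math
import torch
import torch.nn.functional as F

k = 8                                                      # window length
ramp = torch.arange(k, dtype=torch.float32) - (k - 1) / 2  # zero-mean linear ramp

# filter a): ramp on signal row 0 only; filter b): ramp on row 0, -ramp on row 1
wa = torch.zeros(1, 1, 3, k); wa[0, 0, 0] = ramp
wb = wa.clone(); wb[0, 0, 1] = -ramp

# X pattern on rows 0/1 (slopes +1/-1), row 2 constant
t = torch.arange(k, dtype=torch.float32)
x = torch.stack([t, -t, torch.ones(k)]).reshape(1, 1, 3, k)

a1 = F.conv2d(x, wa).item()  # response of a): s
a2 = F.conv2d(x, wb).item()  # response of b): 2s
eps = 1e-12
gauss = math.exp(-(((2 * a1 - a2 + eps) / (a1 - a2 + eps)) / 0.2) ** 2)
print(a2 / a1, gauss)  # -> 2.0 and ~1.0: the Gaussian fires on the X pattern
```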
How can I implement this idea in PyTorch? None of my attempts converge. Here is one attempt, where I let the network learn 3 filters (b) above) and then constrain the other 3 filters to be copies of the learned ones with two channels zeroed out, to enforce structure a):
```python
def nonlinear(u, v):
    eps = 1e-20
    fac1 = 2 * u - v + eps
    fac2 = u - v + eps
    # keep the width of the Gaussian narrow, to have stronger specificity to class 1: 2*u == v (!)
    res = torch.exp(-torch.pow((fac1 / fac2) / 0.2, 2))  # differentiable non-linearity: Gaussian :)
    return res
```
```python
class CNN(nn.Module):
    def __init__(self, input_size, output_size, n_feature, kernel_sz):
        super(CNN, self).__init__()
        self.n_feature = n_feature
        assert n_feature == 3
        self.conv2a = nn.Conv2d(in_channels=1, out_channels=n_feature,
                                kernel_size=kernel_sz, stride=kernel_sz[1], bias=False)
        self.conv2b = nn.Conv2d(in_channels=1, out_channels=n_feature,
                                kernel_size=kernel_sz, stride=kernel_sz[1], bias=False)

    def forward(self, x, verbose=False):
        # tie conv2a to conv2b, then zero two signal rows per filter to enforce structure a).
        # note: copy_ on a leaf parameter must run under no_grad, so no gradient
        # reaches conv2b through the conv2a path
        with torch.no_grad():
            self.conv2a.weight.copy_(self.conv2b.weight)
            for ii, tup in enumerate([(1, 2), (0, 2), (0, 1)]):
                self.conv2a.weight[ii, :, tup[0], :] = 0
                self.conv2a.weight[ii, :, tup[1], :] = 0
        u = self.conv2a(x)  # (n_batch, 1, n_signals, T) -> (n_batch, n_filters, 1, T / kernel_sz[1])
        v = self.conv2b(x)
        x = nonlinear(u, v)          # calculate pair activations
        x = torch.mean(x, 3)         # average over time windows
        x = torch.max(x, 1).values   # max over feature maps: is any filter pair active? -> (n_batch, 1)
        return x[:, 0]
```
I know this network architecture is very specific; still, I do not understand why SGD has so much trouble finding the optimal filters.
I also tried using a single Conv2d with 6 filters and applying the nonlinearity between the even and odd feature maps.
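Another variant I am considering (a sketch; `PairCNN` is a made-up name and this is untested on real data): instead of copying weights inside `forward`, derive the a-filters from the shared weight with a fixed zero mask and call `F.conv2d` directly, so gradients reach the single trainable weight through both conv paths:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def nonlinear(u, v, eps=1e-20, width=0.2):
    # Gaussian bump that peaks where 2*u == v (X pattern present)
    return torch.exp(-(((2 * u - v + eps) / (u - v + eps)) / width) ** 2)

class PairCNN(nn.Module):
    """One trainable weight for the b-filters; the a-filters reuse it with
    two signal rows zeroed, so gradients flow through both conv paths."""
    def __init__(self, kernel_sz=(3, 8)):
        super().__init__()
        self.convb = nn.Conv2d(1, 3, kernel_size=kernel_sz,
                               stride=kernel_sz[1], bias=False)
        mask = torch.ones_like(self.convb.weight)
        for ii, (r0, r1) in enumerate([(1, 2), (0, 2), (0, 1)]):
            mask[ii, :, r0, :] = 0  # a-filter ii keeps only one signal row
            mask[ii, :, r1, :] = 0
        self.register_buffer("mask", mask)

    def forward(self, x):  # x: (n_batch, 1, 3, T)
        v = self.convb(x)
        u = F.conv2d(x, self.convb.weight * self.mask, stride=self.convb.stride)
        g = nonlinear(u, v)               # pair activations
        g = g.mean(dim=3)                 # average over time windows
        return g.max(dim=1).values[:, 0]  # is any filter pair active?
```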
Regards and happy New Year