I am trying to feed raw vibration data to a GRU layer. The idea behind feeding in the raw data is to let the GRU layer extract the features itself, so that the feature extraction is optimized together with the prediction.
This is the GRU layer I have:
import torch
from torch import nn

class GruPeakFR(nn.Module):
    def __init__(self, num_inputs, num_hiddens=1, sigma=0.01):
        super().__init__()  # required before registering nn.Parameter attributes
        self.num_hiddens = num_hiddens
        init_weight = lambda *shape: nn.Parameter(torch.randn(*shape) * sigma)
        # Calculate the size of the rfft output
        self.rfft_size = num_inputs  # // 2 + 1
        triple = lambda: (init_weight(num_inputs, num_hiddens),
                          init_weight(num_hiddens, num_hiddens),
                          nn.Parameter(torch.zeros(num_hiddens)))  # bias term (zero init assumed)
        self.W_xz, self.W_hz, self.b_z = triple()  # Update gate
        self.W_xr, self.W_hr, self.b_r = triple()  # Reset gate
        self.W_xh, self.W_hh, self.b_h = triple()  # Candidate hidden state
        self.input_dropout = nn.Dropout(0.0)

    def forward(self, inputs, H=None):
        # inputs: (num_steps, batch_size, num_inputs)
        # calculate_the_peak is a helper defined elsewhere; it returns the max of a signal
        if H is None:
            H = torch.zeros((inputs.shape[1], self.num_hiddens), device=inputs.device)
        outputs = []
        for X in inputs:
            # Real part of the FFT of the raw signal at this time step
            fft_x = torch.fft.fft(X).real
            update_gate = (torch.matmul(fft_x, self.W_xz) +
                           torch.matmul(H, self.W_hz) + self.b_z)
            Z_fft = torch.sigmoid(update_gate)
            Z = torch.fft.ifft(Z_fft).real
            Z_peak = calculate_the_peak(Z)
            reset_gate = (torch.matmul(fft_x, self.W_xr) +
                          torch.matmul(H, self.W_hr) + self.b_r)
            R_fft = torch.sigmoid(reset_gate)
            R = torch.fft.ifft(R_fft).real
            R_peak = calculate_the_peak(R)  # computed but not used below
            # Z = torch.fft.irfft(Z_fft, n=X.size(-1))
            candidate_hidden_gate = (torch.matmul(fft_x, self.W_xh) +
                                     torch.matmul(R * H, self.W_hh) + self.b_h)
            candidate_hidden_gate = torch.fft.ifft(candidate_hidden_gate).real
            # candidate_hidden_gate = calculate_the_peak(candidate_hidden_gate)
            H_tilde = torch.tanh(candidate_hidden_gate)
            H_tilde = calculate_the_peak(H_tilde)
            H = Z_peak * H + (1 - Z_peak) * H_tilde
            outputs.append(H)  # collect the hidden state of every time step
            # print(H)
            # print("Z shape:", Z.shape)
            # print("R shape:", R.shape)
            # print("H shape:", H.shape)
            # print("R * H shape:", (R * H).shape)
            # print("self.W_hh shape:", self.W_hh.shape)
        outputs = torch.stack(outputs, dim=0)
        if outputs.isnan().any():
            print("Gru Peak giving NaN values")
        return outputs, H
Because the sigmoid function destroys the vibration signal, I apply it to the real part of the FFT instead. I also tried using the absolute value (the magnitude). After the sigmoid and tanh functions, I reconstruct the vibration signal with the inverse FFT and extract its maximum value.
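The calculate_the_peak helper is not shown in the layer code. A minimal sketch of what such a helper could look like is below. It assumes the peak is the plain maximum over the last dimension, and that the reduced dimension is kept with size 1 so the result still broadcasts against H. Both the reduction axis and keepdim=True are assumptions for illustration, not necessarily what the real helper does.

```python
import torch

def calculate_the_peak(signal: torch.Tensor) -> torch.Tensor:
    """Hypothetical peak extraction: maximum over the last dimension.

    Assumption: the reduced dimension is kept (size 1) so the result
    broadcasts against a hidden state of shape (batch_size, num_hiddens).
    """
    return torch.amax(signal, dim=-1, keepdim=True)

# Two example "signals" of three samples each
x = torch.tensor([[0.1, 0.7, 0.3],
                  [0.9, 0.2, 0.4]])
peak = calculate_the_peak(x)
print(peak.shape)  # one peak per row, kept as a column
```

With num_hiddens=1 this reduction leaves the tensor unchanged, so the peak step only has an effect when the hidden size is larger than 1 or when the reduction runs over a different axis.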
The model does train, but it fails on the test set. I suspected that this layer increases the risk of overfitting and tried several methods to prevent it, but to no avail. I'm still new to RNNs and to changing how they work internally. If anyone more experienced with GRUs can give me some tips on writing custom GRU layers tailored to my task, I would appreciate it.
One possibility is to replace the layer with a convolutional layer and adjust the kernel, but I would like to give this GRU one more shot before giving up on it. If you also have resources on GRUs that would help me better understand how to customize them, I am all ears.
Thanks!