Aggregating Predictions before Computing Loss

I am working on binary classification with time-series data. Given a 30-second signal, my approach is to split it into n overlapping 2-second segments, feed each of these n segments into the model to get n predictions in an array, aggregate the n predictions into a single prediction for the whole 30-second signal, and then use that single prediction in the loss.

My model isn’t learning, and I suspect it is because I have not set .requires_grad=True in the right places in this computation.

Here’s my code:

input_size = 30000
window_size = 2000
window_step = 100

for i in range(batch_size):

    signal = torch.reshape(batch_x[i], (1, 1, input_size))
    steps = (input_size - window_size) // window_step + 1

    agg_pred = []
    loss = 0
    for j in range(steps):

        # slide a 2-second window over the 30-second signal
        signal_segment = signal[:, :, j*window_step:j*window_step + window_size]
        y_pred = model(signal_segment)

        agg_pred.append(int(round(y_pred)))

    # majority vote over the window-level predictions
    signal_pred = torch.tensor(np.argmax(np.bincount(np.array(agg_pred))), dtype=torch.float, device="cuda", requires_grad=True)

    loss += nn.BCELoss()(signal_pred, batch_y[i])
    loss_values.append(loss)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Any ideas as to what I could try modifying?

You are detaching the computation graph in a few places (see the sketch after this list for one way to fix it):

  • int(round(y_pred)) converts the tensor to a plain Python number, which detaches y_pred from the graph, so the backward call cannot compute gradients for any of the preceding operations
  • NumPy operations (np.array, np.bincount, np.argmax) are not tracked by autograd, so you would have to use the corresponding PyTorch operations or write the backward function manually
  • re-wrapping the result in torch.tensor(..., requires_grad=True) creates a new leaf tensor, which also detaches it from all preceding operations
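
Here is a minimal sketch of how the loop could look with the graph kept intact. It reuses the names from your snippet (model, optimizer, batch_x, batch_y, batch_size, loss_values) and assumes the model ends in a sigmoid and returns one probability per 2-second window; criterion is a new name introduced here. Note that it replaces the majority vote with the mean of the window probabilities, because argmax/bincount have no useful gradient, so the aggregation itself must be differentiable for anything before it to receive gradients.

criterion = nn.BCELoss()

for i in range(batch_size):

    signal = batch_x[i].reshape(1, 1, input_size)
    steps = (input_size - window_size) // window_step + 1

    # keep the raw window probabilities as tensors so they stay on the graph
    window_preds = []
    for j in range(steps):
        segment = signal[:, :, j*window_step:j*window_step + window_size]
        window_preds.append(model(segment).view(-1))  # shape (1,)

    # differentiable aggregation: mean probability instead of a majority vote
    signal_pred = torch.cat(window_preds).mean().view(1)
    target = batch_y[i].float().view(1)

    loss = criterion(signal_pred, target)
    loss_values.append(loss.item())  # store the number, not the graph

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Another option is to compute the BCE loss per window against the signal-level label and average those losses; either way the gradients flow back into the model. If the per-window loop is too slow, the windows can also be built in one call with signal.unfold(2, window_size, window_step) and pushed through the model as a single batch, but the loop above stays closest to your original structure.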