Aggregating Predictions before Computing Loss

I am working on binary classification with time-series data. Given a 30-second signal, my approach is to split it into n overlapping 2-second segments, feed each of these n segments into the model to get n predictions in an array, aggregate the n predictions into a single prediction for the whole 30-second signal, and then use that single prediction in the loss.

My model isn’t learning, and I suspect it is because I have not set .requires_grad=True in the right places in this computation.

Here’s my code:

input_size = 30000
window_size = 2000
window_step = 100

for i in range(batch_size):

    signal = torch.reshape(batch_x[i], (1, 1, input_size))
    steps = (input_size - window_size) // window_step + 1

    agg_pred = []
    loss = 0
    for j in range(steps):

        # slide a 2-second window over the 30-second signal
        signal_segment = signal[:, :, j*window_step:j*window_step + window_size]
        y_pred = model(signal_segment)

        agg_pred.append(int(round(y_pred)))

    # majority vote over the window-level predictions
    signal_pred = torch.tensor(np.argmax(np.bincount(np.array(agg_pred))), dtype=torch.float, device="cuda", requires_grad=True)

    loss += nn.BCELoss()(signal_pred, batch_y[i])
    loss_values.append(loss)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Any ideas as to what I could try modifying?

You are detaching the computation graph in a few places (see the sketch after this list for one way to fix it):

  • int(round(y_pred)) converts the tensor to a plain Python number, which detaches y_pred from the graph, so the backward call cannot compute gradients for any of the preceding operations
  • NumPy operations (np.array, np.bincount, np.argmax) are not tracked by autograd, so you would have to use the corresponding PyTorch operations or write the backward function manually
  • re-wrapping the result in torch.tensor(..., requires_grad=True) creates a new leaf tensor, which also detaches it from all preceding operations
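
Here is a minimal sketch of how the loop could look with the graph kept intact. It reuses the names from your snippet (model, optimizer, batch_x, batch_y, batch_size, loss_values) and assumes the model ends in a sigmoid and returns one probability per 2-second window; criterion is a new name introduced here. Note that it replaces the majority vote with the mean of the window probabilities, because argmax/bincount have no useful gradient, so the aggregation itself must be differentiable for anything before it to receive gradients.

criterion = nn.BCELoss()

for i in range(batch_size):

    signal = batch_x[i].reshape(1, 1, input_size)
    steps = (input_size - window_size) // window_step + 1

    # keep the raw window probabilities as tensors so they stay on the graph
    window_preds = []
    for j in range(steps):
        segment = signal[:, :, j*window_step:j*window_step + window_size]
        window_preds.append(model(segment).view(-1))  # shape (1,)

    # differentiable aggregation: mean probability instead of a majority vote
    signal_pred = torch.cat(window_preds).mean().view(1)
    target = batch_y[i].float().view(1)

    loss = criterion(signal_pred, target)
    loss_values.append(loss.item())  # store the number, not the graph

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Another option is to compute the BCE loss per window against the signal-level label and average those losses; either way the gradients flow back into the model. If the per-window loop is too slow, the windows can also be built in one call with signal.unfold(2, window_size, window_step) and pushed through the model as a single batch, but the loop above stays closest to your original structure.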