Time-series lookup in loss function

I’ve got an optimization problem where the goal is to predict a time-series envelope: the estimated fundamental pitch of a signal at a series of equally spaced time steps.

Now the trouble starts. My loss function, roughly speaking, iterates through the predicted periods from the pitch envelope and measures the difference between successive audio segments of period length (if successive segments cancel each other out, the period must be correct). All differences are accumulated to give an overall error value for how well that pitch envelope fits the audio signal.

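import torch


def get_fold_alignment_loss(period_envelope: torch.Tensor, audio: torch.Tensor) \
        -> torch.Tensor:
    # period_envelope holds log2(period in samples) at equally spaced time steps.
    cume_offset = torch.zeros(1, device=period_envelope.device)
    error = torch.zeros(1, device=period_envelope.device)

    while cume_offset < audio.size(0):
        prog_fraction = cume_offset / audio.size(0)
        # Truncating to an integer index is one of the non-differentiable steps.
        current_idx = int(prog_fraction * period_envelope.size(0))
        # Clamp so that current_idx + 1 stays inside the envelope.
        current_idx = min(current_idx, period_envelope.size(0) - 2)

        prev_val = period_envelope[current_idx]
        next_val = period_envelope[current_idx + 1]
        prev_prog = current_idx / period_envelope.size(0)
        next_prog = (current_idx + 1) / period_envelope.size(0)
        prev_period = torch.pow(2, prev_val)
        next_period = torch.pow(2, next_val)
        # Linearly interpolate the period between the two neighbouring envelope points.
        prog_to_next = (prog_fraction - prev_prog) / (next_prog - prev_prog)
        period = prog_to_next * next_period + (1 - prog_to_next) * prev_period

        # Sample positions one period before and after the current offset.
        prev_pos = max(0, round((cume_offset - period).item()))
        pos = round(cume_offset.item())
        next_pos = min(audio.size(0), round((cume_offset + period).item()))

        if next_pos - pos == 0:
            # No samples left to compare; stopping here is an assumption,
            # since the body of this branch was missing.
            break
        cume_offset += period
        min_size = min(pos - prev_pos, next_pos - pos)
        # Compare the segment before pos with the equally long segment after it.
        error += torch.linalg.norm(audio[pos - min_size:pos] - audio[pos:pos + min_size])
    return error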

Now this isn’t differentiable, as I have had to truncate continuous values to do look-ups into the learned time series, and calculating the error requires indexing into ranges of the signal.
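To make the problem concrete, here is a small experiment (not part of the loss above; the tensor names are made up for illustration) showing the two ways the gradient gets lost: rounding has a zero derivative, and indexing via `.item()` leaves the autograd graph altogether.

```python
import torch

x = torch.tensor(2.3, requires_grad=True)
signal = torch.arange(10, dtype=torch.float32) ** 2

# torch.round stays in the graph, but its derivative is zero almost everywhere.
rounded = torch.round(x)
rounded.backward()
grad_through_round = x.grad.clone()

# Converting to a Python int via .item() leaves the graph entirely, so the
# looked-up value carries no dependence on x as far as autograd can tell.
looked_up = signal[int(x.item())]

print(grad_through_round)       # zero gradient
print(looked_up.requires_grad)  # False
```

So in the loss above, the error only ever sees the envelope through `round(...)` and `.item()`, which would explain why nothing flows back to `period_envelope`.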

In principle, the time series (audio and pitch envelope) could be represented by some high-degree polynomial or sinc basis functions… so that they could be ‘indexed’ by continuous values, but I have to wonder if there is a better way. Can I estimate the gradients with small random perturbations of the pitch envelope?
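As a simpler alternative to polynomial or sinc bases, a plain linear interpolation already allows ‘indexing’ by a continuous position while keeping a gradient with respect to that position. This is only a sketch under my own assumptions: `lerp_lookup` is a hypothetical helper, not a PyTorch function, and the resulting gradient is piecewise constant, so it is only as smooth as the underlying series.

```python
import torch


def lerp_lookup(series: torch.Tensor, pos: torch.Tensor) -> torch.Tensor:
    """Read a 1-D `series` at fractional positions `pos` by linear interpolation."""
    pos = pos.clamp(0, series.size(0) - 1)
    # The integer part is detached on purpose: it only selects the neighbours.
    lo = pos.detach().floor().long().clamp(max=series.size(0) - 2)
    # The fractional part keeps the dependence on pos, so gradients can flow.
    frac = pos - lo
    return (1 - frac) * series[lo] + frac * series[lo + 1]


series = torch.tensor([0.0, 1.0, 4.0, 9.0], requires_grad=True)
pos = torch.tensor([1.25], requires_grad=True)

value = lerp_lookup(series, pos)
value.sum().backward()

print(value)        # interpolated between series[1] and series[2]
print(pos.grad)     # slope of the segment the position falls in
print(series.grad)  # interpolation weights of the two neighbours
```

The same helper could be used both for the envelope lookup and for reading audio samples at fractional offsets, although the segment lengths used in the comparison would still be discrete.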

Would appreciate any help pointing me in the right direction.

I suppose just a hint as to whether this is appropriate; it’s my first try diving into the autograd stuff:

If providing my own gradients, I need to calculate d_loss / d_params, correct? So, for each element in my learnable time series, that is the ratio of the loss delta to the adjustment delta.
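A rough sketch of what I have in mind, assuming random ±1 perturbations of the whole parameter vector (an SPSA-style estimate, which may or may not be the best choice here). `estimate_grad` and `toy_loss` are hypothetical names; `toy_loss` is just a stand-in for the real loss.

```python
import torch


def estimate_grad(loss_fn, params: torch.Tensor, eps: float = 1e-2,
                  n_samples: int = 16) -> torch.Tensor:
    """Estimate d_loss / d_params from random +/-1 perturbations of params."""
    grad = torch.zeros_like(params)
    with torch.no_grad():
        for _ in range(n_samples):
            # Random sign for every element of the parameter vector.
            delta = torch.randint(0, 2, params.shape, device=params.device)
            delta = delta.to(params.dtype) * 2 - 1
            # Central difference of the loss along the direction delta.
            diff = loss_fn(params + eps * delta) - loss_fn(params - eps * delta)
            # For +/-1 entries, dividing by delta equals multiplying by it.
            grad += diff / (2 * eps) * delta
    return grad / n_samples


torch.manual_seed(0)
params = torch.tensor([1.0, -2.0, 0.5], requires_grad=True)
optimizer = torch.optim.Adam([params], lr=0.1)


def toy_loss(p: torch.Tensor) -> torch.Tensor:
    return (p ** 2).sum()  # stand-in for the real, non-differentiable loss


# Hand the estimated gradient to the optimizer instead of calling backward().
optimizer.zero_grad()
params.grad = estimate_grad(toy_loss, params.detach())
optimizer.step()
```

Each estimate costs two loss evaluations per sample regardless of the number of parameters, but it is noisy, so `n_samples` and `eps` would need tuning against the roughness of the loss.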

How smooth does the loss need to be for Adam/SGD to function? This would, I believe, directly determine how many iterations of gradient sampling are required.