I have an optimization problem where the goal is to predict a time-series envelope: the estimated fundamental pitch of a signal at a series of equally spaced time steps.
Now the trouble starts. My loss function works roughly as follows: it steps through the predicted periods from the pitch envelope and measures the difference between successive audio segments of one period length (if successive segments cancel each other out, the period must be correct). All differences are accumulated to give an overall error value measuring how well that pitch envelope fits the audio signal.
import torch

def get_fold_alignment_loss(period_envelope: torch.Tensor, audio: torch.Tensor) \
        -> torch.Tensor:
    cume_offset = torch.zeros(1, device=period_envelope.device)
    error = torch.zeros(1, device=period_envelope.device)
    last_idx = period_envelope.size(0) - 1
    while cume_offset < audio.size(0):
        # Map the current audio position to a (truncated) envelope index.
        prog_fraction = cume_offset / audio.size(0)
        current_idx = min(int(prog_fraction * period_envelope.size(0)), last_idx)
        prev_val = period_envelope[current_idx]
        # Clamp so the final envelope step does not index out of bounds.
        next_val = period_envelope[min(current_idx + 1, last_idx)]
        prev_prog = current_idx / period_envelope.size(0)
        next_prog = (current_idx + 1) / period_envelope.size(0)
        # The envelope stores log2 of the period in samples.
        prev_period = torch.pow(2, prev_val)
        next_period = torch.pow(2, next_val)
        # Linearly interpolate the period between the two envelope steps.
        prog_to_next = (prog_fraction - prev_prog) / (next_prog - prev_prog)
        period = prog_to_next * next_period + (1 - prog_to_next) * prev_period
        # Rounding to integer sample positions is where gradients are lost.
        prev_pos = max(0, round((cume_offset - period).item()))
        pos = round(cume_offset.item())
        next_pos = min(audio.size(0), round((cume_offset + period).item()))
        if next_pos - pos == 0:
            break
        cume_offset += period
        min_size = min(pos - prev_pos, next_pos - pos)
        # Compare the segment before pos with the equally long segment after it.
        error += torch.linalg.norm(audio[pos - min_size:pos] - audio[pos:pos + min_size])
    return error
This isn't differentiable: I have had to truncate continuous values in order to do look-ups into the learned time series, and calculating the error quantities requires indexing into ranges of the signal.
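To make the problem concrete, here is a minimal sketch (not my actual loss) of the simplest continuous look-up I can think of: linear interpolation between the two neighbouring samples. The helper name `lerp_index` and the toy signal are made up for illustration. The integer index is detached, so only the fractional weight carries a gradient with respect to the position.

```python
import torch


def lerp_index(signal: torch.Tensor, pos: torch.Tensor) -> torch.Tensor:
    """Read `signal` at a fractional position by linear interpolation.

    Hypothetical helper: `pos` is a 0-d float tensor in sample units.
    """
    pos = pos.clamp(0, signal.size(0) - 1)
    # The integer part is detached: it selects samples but has no gradient.
    lo = pos.detach().floor().long().clamp(max=signal.size(0) - 2)
    # The fractional part stays attached to the graph.
    frac = pos - lo
    return (1 - frac) * signal[lo] + frac * signal[lo + 1]


# Toy data, purely for illustration.
signal = torch.tensor([0.0, 1.0, 4.0, 9.0])
pos = torch.tensor(1.25, requires_grad=True)

value = lerp_index(signal, pos)
value.backward()
# pos.grad is the local slope signal[2] - signal[1]; with round() or int()
# in place of the interpolation there would be no gradient at all.
```

The gradient here only sees the two neighbouring samples, which may well be too local to be useful on real audio.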
In principle, the time series (audio and pitch envelope) could be represented by some high-degree polynomial or by sinc basis functions, so that they could be "indexed" by continuous values, but I have to wonder if there is a better way. Can I estimate the gradients with small random perturbations of the pitch envelope?
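For the perturbation idea, this is roughly what I have in mind: a simultaneous-perturbation (SPSA-style) estimate that only needs loss evaluations, not autograd. It is a sketch under assumptions: `spsa_gradient`, `eps` and `n_samples` are names I made up, and `loss_fn` stands for any function mapping the envelope to a single-element loss.

```python
import torch


def spsa_gradient(loss_fn, params: torch.Tensor,
                  eps: float = 1e-2, n_samples: int = 16) -> torch.Tensor:
    """Estimate d loss / d params from random +/-1 perturbations.

    Hypothetical helper; `loss_fn` must return a single-element value.
    """
    grad = torch.zeros_like(params)
    for _ in range(n_samples):
        # Random direction with entries in {-1, +1}.
        delta = torch.randint(0, 2, params.shape, device=params.device)
        delta = delta.to(params.dtype) * 2 - 1
        # Central difference along that direction.
        diff = float(loss_fn(params + eps * delta)) \
            - float(loss_fn(params - eps * delta))
        # For +/-1 entries, dividing by delta equals multiplying by it.
        grad += (diff / (2 * eps)) * delta
    return grad / n_samples


# Toy check on a smooth loss, purely for illustration.
toy_params = torch.tensor([1.0, -2.0, 3.0])
toy_estimate = spsa_gradient(lambda p: (p ** 2).sum(), toy_params, n_samples=256)
```

One caveat I can already see: because my loss rounds everything to integer sample positions, it is piecewise constant in the envelope, so `eps` would have to be large enough to actually move those positions, and the estimate will be noisy.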
Would appreciate any help pointing me in the right direction.