So I’ve got an optimization problem where the goal is to predict a time-series envelope: the estimated fundamental pitch of a signal at a series of equally spaced time steps.
Now the trouble starts. My loss function works roughly as follows: it steps through the audio using the periods predicted by the pitch envelope and, at each step, measures the difference between successive period-length audio segments (if successive segments cancel each other out, the period must be correct). All the differences are accumulated into a single error value that says how well that pitch envelope fits the audio signal.
    import torch

    def get_fold_alignment_loss(period_envelope: torch.Tensor, audio: torch.Tensor) -> torch.Tensor:
        cume_offset = torch.zeros(1, device=period_envelope.device)
        error = torch.zeros(1, device=period_envelope.device)
        last_idx = period_envelope.size(0) - 1
        while cume_offset < audio.size(0):
            # Find the two envelope samples that bracket the current position.
            prog_fraction = cume_offset / audio.size(0)
            current_idx = int(prog_fraction * period_envelope.size(0))
            prev_val = period_envelope[current_idx]
            next_val = period_envelope[min(current_idx + 1, last_idx)]  # stay in bounds
            prev_prog = current_idx / period_envelope.size(0)
            next_prog = (current_idx + 1) / period_envelope.size(0)
            # The envelope stores log2(period); interpolate linearly between the two periods.
            prev_period = torch.pow(2, prev_val)
            next_period = torch.pow(2, next_val)
            prog_to_next = (prog_fraction - prev_prog) / (next_prog - prev_prog)
            period = prog_to_next * next_period + (1 - prog_to_next) * prev_period
            # Round to integer sample positions (this is where differentiability is lost).
            prev_pos = max(0, round((cume_offset - period).item()))
            pos = round(cume_offset.item())
            next_pos = min(audio.size(0), round((cume_offset + period).item()))
            if next_pos - pos == 0:
                break
            cume_offset += period
            min_size = min(pos - prev_pos, next_pos - pos)
            # Compare the segment just before pos with the equal-length segment just after it.
            error += torch.linalg.norm(audio[pos - min_size:pos] - audio[pos:pos + min_size])
        return error
Now this isn’t differentiable: I have to truncate continuous values to integers to do look-ups into the learned time series, and computing the error requires indexing into ranges of the signal at rounded positions.
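To make the problem concrete, here is a minimal sketch of a look-up that stays differentiable with respect to a fractional position, assuming plain linear interpolation between the two neighbouring samples. `sample_linear` is a hypothetical helper for illustration, not part of my code above.

```python
import torch

def sample_linear(signal: torch.Tensor, position: torch.Tensor) -> torch.Tensor:
    """Read a 1-D signal (length >= 2) at fractional positions.

    The integer neighbours are picked with floor, which carries no gradient,
    but the blend weight depends on `position`, so gradients reach it.
    """
    position = position.clamp(0, signal.size(0) - 1)
    # Lower neighbour index; capped so that lower + 1 is always valid.
    lower = position.detach().floor().long().clamp(max=signal.size(0) - 2)
    frac = position - lower
    return (1 - frac) * signal[lower] + frac * signal[lower + 1]


signal = torch.tensor([0.0, 2.0, 6.0])
pos = torch.tensor([1.25], requires_grad=True)
value = sample_linear(signal, pos)
value.sum().backward()
# pos.grad is the local slope signal[2] - signal[1]
```

The gradient here is just the local slope between two samples, so for audio it is likely to be very noisy; this only illustrates the mechanism and is not a claim that it would work well on this loss.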
In principle, the time series (audio and pitch envelope) could be represented by some high-degree polynomial or sinc basis functions… so that they could be ‘indexed’ by continuous values, but I have to wonder if there is a better way. Can I estimate the gradients with small random perturbations of the pitch envelope?
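To be clear about what I mean by estimating gradients from perturbations, here is a minimal sketch assuming an antithetic Gaussian-perturbation estimator (the kind used in evolution strategies). The name `estimate_gradient` and the defaults for `sigma` and `num_samples` are my own assumptions, not tuned values.

```python
import torch

def estimate_gradient(loss_fn, params: torch.Tensor, sigma: float = 1e-2,
                      num_samples: int = 16) -> torch.Tensor:
    """Estimate d(loss)/d(params) from pairs of mirrored random perturbations.

    `loss_fn` only needs to be evaluable, not differentiable.
    """
    grad = torch.zeros_like(params)
    with torch.no_grad():
        for _ in range(num_samples):
            noise = torch.randn_like(params)
            loss_plus = loss_fn(params + sigma * noise)
            loss_minus = loss_fn(params - sigma * noise)
            # Central difference along the random direction, projected back.
            grad += (loss_plus - loss_minus) / (2 * sigma) * noise
    return grad / num_samples


# Sanity check on a smooth loss whose true gradient is 2 * x.
torch.manual_seed(0)
x = torch.tensor([1.0, -2.0])
approx = estimate_gradient(lambda p: (p ** 2).sum(), x, sigma=1e-2, num_samples=4000)
```

With my loss, the result could be assigned to `period_envelope.grad` before an optimizer step. Because the rounded positions make the loss piecewise constant, I assume `sigma` would have to be large enough to actually move those positions, and the estimate is noisy for small `num_samples`.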
Would appreciate any help pointing me in the right direction.