Hello everyone,
After updating torch from 0.4.0 to 0.4.1, the number of frames computed by STFT seems to have changed for the same signal length. I would like some clarification on how signal duration relates to the number of STFT frames in version 0.4.1.
I use sr = 22050, n_fft = 2048, win_size = 1024 (periodic Hann window), hop_size = 256 (75% overlap).
Given a single time series data_in of length data_len, I crop the signal to the end of the last complete window, so that it holds a whole number of hops:
n_frames = int(np.floor((data_len-win_size)/hop_size))+1
crop_len = win_size+(n_frames-1)*hop_size
STFT = torch.stft(data_in[:crop_len], n_fft, hop_length=hop_size, win_length=win_size, window=hann_window, center=True, pad_mode='reflect', normalized=False, onesided=True)
STFT then has the shape (1025, n_frames+4), which is the right number of one-sided frequency bins but 4 more frames than I expected, given that I cropped the signal.
These 4 extra frames appear consistently across a dataset of audio files with several different input lengths, so it seems the correct formula could be
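For reference, here is the cropping arithmetic above as a small self-contained sketch. The helper name crop_to_whole_frames is only for illustration, and the defaults are the win_size and hop_size values given earlier; it uses integer division in place of np.floor, which gives the same result for non-negative lengths.

```python
def crop_to_whole_frames(data_len, win_size=1024, hop_size=256):
    """Return (n_frames, crop_len) for framing without any padding.

    n_frames is the number of complete windows that fit in data_len,
    crop_len is the number of samples those windows cover.
    """
    if data_len < win_size:
        # Not even one complete window fits in the signal.
        return 0, 0
    n_frames = (data_len - win_size) // hop_size + 1
    crop_len = win_size + (n_frames - 1) * hop_size
    return n_frames, crop_len


# Example: one second of audio at sr = 22050.
n_frames, crop_len = crop_to_whole_frames(22050)
print(n_frames, crop_len)
```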
n_frames = int(np.floor((data_len-win_size)/hop_size))+5
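One possible reading of the +4, written as a pure-arithmetic sketch rather than a confirmed explanation: it assumes that center=True pads n_fft // 2 samples on each side before framing, as the torch.stft documentation describes, and that frames are then counted over n_fft-sample windows. The function name frames_with_center_padding is hypothetical, and nothing here calls torch itself.

```python
def frames_with_center_padding(signal_len, n_fft=2048, hop_size=256):
    """Frame count under the ASSUMPTION that the signal is padded by
    n_fft // 2 samples on each side and framed with n_fft-sample windows."""
    padded_len = signal_len + 2 * (n_fft // 2)
    return (padded_len - n_fft) // hop_size + 1


win_size, hop_size = 1024, 256
for n_frames in (2, 10, 83):
    # Same crop length as in my formula above.
    crop_len = win_size + (n_frames - 1) * hop_size
    # Under the padding assumption, the count is n_frames + win_size // hop_size.
    assert frames_with_center_padding(crop_len) == n_frames + 4
```

Under that assumption the count reduces to 1 + crop_len // hop_size, so the offset of 4 would simply be win_size / hop_size = 1024 / 256 for my settings.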
Could anyone clarify this point, please?
How can I take an input signal length, infer the number of frames that fit in it without padding, crop the input signal accordingly, and then compute an STFT that yields the expected number of frames?
Thanks in advance!