Problem using torch::stft with the C++ API: the output shape differs from the Python API

macOS, torch version: 1.7.1

Python API:

import torch
x = torch.rand([2, 441344], dtype=float)
y = torch.stft(x, 4096)
print(y.shape)
output: torch.Size([2, 2049, 432, 2])

C++ API:

auto wav = torch::rand({2, 441344});
auto stft = torch::stft(wav, 4096);
std::cout << "output sizes:" << stft.sizes() << std::endl;
output sizes:[2, 2049, 428, 2]


I found this note on stft: "librosa’s center and pad_mode arguments are currently only implemented in python because it uses torch.nn.functional.pad which is python-only."
However, the C++ API also has torch.nn.functional.pad (as torch::nn::functional::pad).

Hello, I had the same issue. I know I am late, but this might be useful for someone else:
The C++ version of STFT does not pad the input, whereas the Python version does by default (center=True), using “reflect” mode.

In your case, n_fft = 4096, so the tensor should be padded by n_fft / 2 = 2048 on each side before the STFT.
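To see where 428 and 432 come from, here is a small sketch of the frame-count arithmetic. It assumes the default hop length of n_fft / 4 = 1024 (no hop was passed in either call above); the function names stft_frames and stft_frames_centered are made up for this illustration and are not part of libtorch.

```cpp
#include <cassert>
#include <cstdint>

// Number of STFT frames for `length` samples when the input is not padded.
std::int64_t stft_frames(std::int64_t length, std::int64_t n_fft, std::int64_t hop) {
    assert(hop > 0 && length >= n_fft);
    return 1 + (length - n_fft) / hop;
}

// Same count after adding n_fft / 2 samples of padding on each side,
// which is what the Python call does with center=True.
std::int64_t stft_frames_centered(std::int64_t length, std::int64_t n_fft, std::int64_t hop) {
    return stft_frames(length + 2 * (n_fft / 2), n_fft, hop);
}
```

With length = 441344, n_fft = 4096 and hop = 1024, stft_frames gives 428 (the C++ result) and stft_frames_centered gives 432 (the Python result), so the missing padding accounts for the whole difference.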
“Reflect” padding of a tensor like {0,1,2,3,4,5,6,7,8,9} with a padding of 3 on each side returns {3,2,1,0,1,2,3,4,5,6,7,8,9,8,7,6}. In C++, code to do that would look something like this:

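// Reflect-pad a 1-D tensor by `pad` elements on each side
torch::Tensor input = torch::arange(10);  // {0,1,2,3,4,5,6,7,8,9}
int64_t length = input.size(0);
int64_t pad = 3;
// elements 1..pad, reversed: {3,2,1}
torch::Tensor left_pad = input.slice(0, 1, pad + 1).flip({0});
// the `pad` elements before the last one, reversed: {8,7,6}
torch::Tensor right_pad = input.slice(0, length - pad - 1, length - 1).flip({0});
torch::Tensor output = torch::cat({left_pad, input, right_pad}, 0);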

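With multi-dimensional tensors such as your [2, 441344] input, slice, flip and concatenate along the time dimension (dim 1) instead of dim 0. You may also need to use "view"s on your tensors to get the shapes you want.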
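If you want something to check your padded tensors against, here is a dependency-free sketch of the same reflect padding on plain std::vector data, applied to each channel separately. It is only a reference for the expected values, not libtorch code; reflect_pad and reflect_pad_channels are names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reflect-pad one channel: mirror `pad` samples around each edge
// without repeating the edge sample itself.
std::vector<float> reflect_pad(const std::vector<float>& x, std::size_t pad) {
    assert(pad < x.size());  // reflection needs at least pad + 1 samples
    std::vector<float> out;
    out.reserve(x.size() + 2 * pad);
    // Left side: x[pad], ..., x[1]
    for (std::size_t i = pad; i >= 1; --i) out.push_back(x[i]);
    out.insert(out.end(), x.begin(), x.end());
    // Right side: x[n-2], ..., x[n-1-pad]
    for (std::size_t i = 1; i <= pad; ++i) out.push_back(x[x.size() - 1 - i]);
    return out;
}

// Pad every channel of a [channels][samples] signal independently.
std::vector<std::vector<float>> reflect_pad_channels(
    const std::vector<std::vector<float>>& x, std::size_t pad) {
    std::vector<std::vector<float>> out;
    out.reserve(x.size());
    for (const auto& channel : x) out.push_back(reflect_pad(channel, pad));
    return out;
}
```

For the 10-element example above with pad = 3, reflect_pad reproduces the 16-element sequence listed earlier; for the audio in the question you would call reflect_pad_channels with pad = 2048.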
Hope this helps.