Encoder-decoder model for variable-length waveforms

Hi there,
I want to build a network that takes a variable-length waveform as input and outputs a waveform of exactly the same length. Are there any recommendations for building such a network? I tried using conv1d and transposed conv1d layers, but during the downsampling stage some layers round the length of the last axis down to an integer, so the final output waveform ends up shorter than the input.
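A minimal sketch of the length arithmetic behind this, assuming PyTorch-style nn.Conv1d / nn.ConvTranspose1d layers (the two length formulas are the ones given in the PyTorch docs for those layers). The helper names and the symmetric kernel_size=4, stride=2, padding=1 configuration are hypothetical. With that setup each strided conv computes floor(L / 2), so any length that is not a multiple of 2**num_layers loses samples. One common workaround is to pad the input on the right up to a multiple of the total stride, run the network, and then crop the output back to the original length.

```python
def conv1d_out_len(length, kernel_size, stride=1, padding=0, dilation=1):
    # Output length of a 1-D convolution; the floor division is where samples get dropped.
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def conv_transpose1d_out_len(length, kernel_size, stride=1, padding=0,
                             output_padding=0, dilation=1):
    # Output length of a 1-D transposed convolution.
    return ((length - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)


def encoder_decoder_len(length, num_layers, kernel_size=4, stride=2, padding=1):
    # Hypothetical symmetric net: num_layers strided convs, then mirrored transposed convs.
    for _ in range(num_layers):
        length = conv1d_out_len(length, kernel_size, stride, padding)
    for _ in range(num_layers):
        length = conv_transpose1d_out_len(length, kernel_size, stride, padding)
    return length


def pad_amount(length, total_stride):
    # Right padding needed to make length a multiple of total_stride.
    return (-length) % total_stride


if __name__ == "__main__":
    length, num_layers = 16001, 3
    total_stride = 2 ** num_layers
    print(encoder_decoder_len(length, num_layers))        # 16000: one sample lost
    pad = pad_amount(length, total_stride)                 # 7
    print(encoder_decoder_len(length + pad, num_layers))  # 16008, then crop to 16001
```

On an actual tensor of shape (batch, channels, time), the padding step could be torch.nn.functional.pad(x, (0, pad)), which pads only the right end of the last axis, and the crop could be y[..., :length]. Padding and cropping keeps the architecture unchanged; the alternative is tuning output_padding per layer, which depends on each input length.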

Thanks