Hello everyone,
First of all, I'd like to point out that I'm a total noob at Python and the like, and that I've been trying for three days to understand what I'm doing.
So here it is: I'd like to use the Audio-File-Translator-S2ST software on my PC. After struggling to install everything via Python and the command line, I finally managed to open the software. I fed it a 30-second test MP3 I had lying around to see the result. At first everything went well, until I got this error at the end. If I understand correctly it's a CPU problem or something similar, but being a total noob in this area, I really don't know how to solve it. Could someone help me get the software working, please?
This would help me a lot with my studies!
Thanks in advance!
PS: I hope my post is in the right category.
C:\Users\K>audio-file-translator
pygame 2.6.0 (SDL 2.28.4, Python 3.12.2)
Hello from the pygame community. https://www.pygame.org/contribute.html
Selected file: C:/Users/K/Downloads/hypnosis-to-sleep-in-15-minutes-dark-screen-voice-only-no-music-128-ytshorts.savetube.me.mp3
Selected file: C:/Users/K/Downloads/trad01.mp3_chunk1.mp3
num_chunks: 1
ffmpeg version N-116412-g97a708a507-20240724 Copyright (c) 2000-2024 the FFmpeg developers
built with gcc 14.1.0 (crosstool-NG 1.26.0.93_a87bf7f)
configuration: --prefix=/ffbuild/prefix --pkg-config-flags=--static --pkg-config=pkg-config --cross-prefix=x86_64-w64-mingw32- --arch=x86_64 --target-os=mingw32 --enable-gpl --enable-version3 --disable-debug --enable-shared --disable-static --disable-w32threads --enable-pthreads --enable-iconv --enable-zlib --enable-libfreetype --enable-libfribidi --enable-gmp --enable-libxml2 --enable-fontconfig --enable-libharfbuzz --enable-libvorbis --enable-opencl --disable-libpulse --enable-libvmaf --disable-libxcb --disable-xlib --enable-amf --enable-libaom --enable-libaribb24 --enable-avisynth --enable-chromaprint --enable-libdav1d --enable-libdavs2 --enable-libdvdread --enable-libdvdnav --disable-libfdk-aac --enable-ffnvcodec --enable-cuda-llvm --enable-frei0r --enable-libgme --enable-libkvazaar --enable-libaribcaption --enable-libass --enable-libbluray --enable-libjxl --enable-libmp3lame --enable-libopus --enable-librist --enable-libssh --enable-libtheora --enable-libvpx --enable-libwebp --enable-lv2 --enable-libvpl --enable-openal --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenh264 --enable-libopenjpeg --enable-libopenmpt --enable-librav1e --enable-librubberband --enable-schannel --enable-sdl2 --enable-libsoxr --enable-libsrt --enable-libsvtav1 --enable-libtwolame --enable-libuavs3d --disable-libdrm --enable-vaapi --enable-libvidstab --enable-vulkan --enable-libshaderc --enable-libplacebo --enable-libx264 --enable-libx265 --enable-libxavs2 --enable-libxvid --enable-libzimg --enable-libzvbi --extra-cflags=-DLIBTWOLAME_STATIC --extra-cxxflags= --extra-libs=-lgomp --extra-ldflags=-pthread --extra-ldexeflags= --cc=x86_64-w64-mingw32-gcc --cxx=x86_64-w64-mingw32-g++ --ar=x86_64-w64-mingw32-gcc-ar --ranlib=x86_64-w64-mingw32-gcc-ranlib --nm=x86_64-w64-mingw32-gcc-nm --extra-version=20240724
libavutil 59. 28.100 / 59. 28.100
libavcodec 61. 10.100 / 61. 10.100
libavformat 61. 5.101 / 61. 5.101
libavdevice 61. 2.100 / 61. 2.100
libavfilter 10. 2.102 / 10. 2.102
libswscale 8. 2.100 / 8. 2.100
libswresample 5. 2.100 / 5. 2.100
libpostproc 58. 2.100 / 58. 2.100
Input #0, mp3, from 'C:/Users/K/Downloads/trad01.mp3_chunk1.mp3':
Metadata:
encoder : Lavf61.5.101
Duration: 00:00:30.00, start: 0.011021, bitrate: 128 kb/s
Stream #0:0: Audio: mp3 (mp3float), 48000 Hz, stereo, fltp, 128 kb/s
Metadata:
encoder : Lavc59.37
Stream mapping:
Stream #0:0 -> #0:0 (copy)
Output #0, mp3, to 'C:/Users/K/Downloads/ttt.mp3_chunk1.mp3':
Metadata:
TSSE : Lavf61.5.101
Stream #0:0: Audio: mp3, 48000 Hz, stereo, fltp, 128 kb/s
Metadata:
encoder : Lavc59.37
Press [q] to stop, [?] for help
[out#0/mp3 @ 0000029da0fbee00] video:0KiB audio:468KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhead: 0.089238%
size= 469KiB time=00:00:29.98 bitrate= 128.1kbits/s speed=4.11e+03x
The attention mask is not set and cannot be inferred from input because pad token is same as eos token.As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
ERROR:root:Error processing audio: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor
ERROR:root:Error during translation: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor
Exception in thread Thread-1 (run_translation):
Traceback (most recent call last):
File "C:\Users\K\AppData\Local\Programs\Python\Python312\Lib\threading.py", line 1073, in _bootstrap_inner
self.run()
File "C:\Users\K\AppData\Local\Programs\Python\Python312\Lib\threading.py", line 1010, in run
self._target(*self._args, **self._kwargs)
File "C:\Users\K\AppData\Local\Programs\Python\Python312\Lib\site-packages\AudioFileTranslator_S2ST\translator_gui.py", line 226, in run_translation
translation_result = self.translator_instance.process_audio_chunk(chunk_output_path,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\K\AppData\Local\Programs\Python\Python312\Lib\site-packages\AudioFileTranslator_S2ST\audio_translator.py", line 74, in process_audio_chunk
predicted_ids = self.model.generate(input_features["input_features"], forced_decoder_ids=forced_decoder_ids)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\K\AppData\Local\Programs\Python\Python312\Lib\site-packages\transformers\models\whisper\generation_whisper.py", line 658, in generate
) = self.generate_with_fallback(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\K\AppData\Local\Programs\Python\Python312\Lib\site-packages\transformers\models\whisper\generation_whisper.py", line 801, in generate_with_fallback
seek_outputs = super().generate(
^^^^^^^^^^^^^^^^^
File "C:\Users\K\AppData\Local\Programs\Python\Python312\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\K\AppData\Local\Programs\Python\Python312\Lib\site-packages\transformers\generation\utils.py", line 1733, in generate
model_kwargs = self._prepare_encoder_decoder_kwargs_for_generation(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\K\AppData\Local\Programs\Python\Python312\Lib\site-packages\transformers\generation\utils.py", line 548, in _prepare_encoder_decoder_kwargs_for_generation
model_kwargs["encoder_outputs"]: ModelOutput = encoder(**encoder_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\K\AppData\Local\Programs\Python\Python312\Lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\K\AppData\Local\Programs\Python\Python312\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\K\AppData\Local\Programs\Python\Python312\Lib\site-packages\transformers\models\whisper\modeling_whisper.py", line 1026, in forward
inputs_embeds = nn.functional.gelu(self.conv1(input_features))
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\K\AppData\Local\Programs\Python\Python312\Lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\K\AppData\Local\Programs\Python\Python312\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\K\AppData\Local\Programs\Python\Python312\Lib\site-packages\torch\nn\modules\conv.py", line 310, in forward
return self._conv_forward(input, self.weight, self.bias)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\K\AppData\Local\Programs\Python\Python312\Lib\site-packages\torch\nn\modules\conv.py", line 306, in _conv_forward
return F.conv1d(input, weight, bias, self.stride,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor
C:\Users\K>out, labels = out.type(torch.cuda.FloatTensor), labels.type(torch.cuda.FloatTensor)
'out' is not recognized as an internal or external command,
operable program or batch file.
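For context: that last line is Python code typed at the Windows command prompt, so cmd rejects it; it would only run inside a Python session, and inside the app's own source at that. The RuntimeError itself says the Whisper weights were loaded onto the GPU (`torch.cuda.FloatTensor`) while the audio features stayed on the CPU (`torch.FloatTensor`). Below is a minimal sketch of that mismatch and the usual fix, moving the input to whatever device the weights live on. The layer and tensor names here are illustrative stand-ins, not the actual AudioFileTranslator_S2ST internals:

```python
import torch
import torch.nn as nn

# Stand-in for Whisper's first conv layer (the one in the traceback).
model = nn.Conv1d(80, 384, kernel_size=3, padding=1)
if torch.cuda.is_available():
    model = model.cuda()  # weights on GPU, as in the failing setup

# The feature extractor returns a plain CPU tensor.
features = torch.randn(1, 80, 3000)

# Fix: move the input to the same device as the model's weights
# before calling it. On a CPU-only machine this is a no-op.
device = next(model.parameters()).device
out = model(features.to(device))
print(out.shape)  # torch.Size([1, 384, 3000])
```

In the app's code, the equivalent change would go where `process_audio_chunk` builds `input_features`, sending them to the model's device before `generate` is called (or loading the model on CPU if no GPU use is intended).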