Noob here! Need help with "RuntimeError: Input type (torch.FloatTensor)"

Hello everyone! First of all, I'd like to point out that I'm a total noob at Python and the like, and that I've spent three days trying to understand what I'm doing.

So here it is: I'd like to use the Audio-File-Translator-S2ST software on my PC. After some difficulty installing everything via Python and the command line, I finally managed to open the software. I fed it a 30-second test MP3 I had lying around to see the result. At first everything went well, until I got the error below at the end. If I understand correctly it's a CPU/GPU problem, but I really don't know how to solve it, being a total noob in this area. Could someone help me get the software working, please?
This would help me a lot with my studies!!
Thanks in advance! :slight_smile:

PS: I hope my post is in the right category

C:\Users\K>audio-file-translator
pygame 2.6.0 (SDL 2.28.4, Python 3.12.2)
Hello from the pygame community. https://www.pygame.org/contribute.html
Selected file: C:/Users/K/Downloads/hypnosis-to-sleep-in-15-minutes-dark-screen-voice-only-no-music-128-ytshorts.savetube.me.mp3
Selected file: C:/Users/K/Downloads/trad01.mp3_chunk1.mp3
num_chunks: 1
ffmpeg version N-116412-g97a708a507-20240724 Copyright (c) 2000-2024 the FFmpeg developers
  built with gcc 14.1.0 (crosstool-NG 1.26.0.93_a87bf7f)
  configuration: --prefix=/ffbuild/prefix --pkg-config-flags=--static --pkg-config=pkg-config --cross-prefix=x86_64-w64-mingw32- --arch=x86_64 --target-os=mingw32 --enable-gpl --enable-version3 --disable-debug --enable-shared --disable-static --disable-w32threads --enable-pthreads --enable-iconv --enable-zlib --enable-libfreetype --enable-libfribidi --enable-gmp --enable-libxml2 --enable-fontconfig --enable-libharfbuzz --enable-libvorbis --enable-opencl --disable-libpulse --enable-libvmaf --disable-libxcb --disable-xlib --enable-amf --enable-libaom --enable-libaribb24 --enable-avisynth --enable-chromaprint --enable-libdav1d --enable-libdavs2 --enable-libdvdread --enable-libdvdnav --disable-libfdk-aac --enable-ffnvcodec --enable-cuda-llvm --enable-frei0r --enable-libgme --enable-libkvazaar --enable-libaribcaption --enable-libass --enable-libbluray --enable-libjxl --enable-libmp3lame --enable-libopus --enable-librist --enable-libssh --enable-libtheora --enable-libvpx --enable-libwebp --enable-lv2 --enable-libvpl --enable-openal --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenh264 --enable-libopenjpeg --enable-libopenmpt --enable-librav1e --enable-librubberband --enable-schannel --enable-sdl2 --enable-libsoxr --enable-libsrt --enable-libsvtav1 --enable-libtwolame --enable-libuavs3d --disable-libdrm --enable-vaapi --enable-libvidstab --enable-vulkan --enable-libshaderc --enable-libplacebo --enable-libx264 --enable-libx265 --enable-libxavs2 --enable-libxvid --enable-libzimg --enable-libzvbi --extra-cflags=-DLIBTWOLAME_STATIC --extra-cxxflags= --extra-libs=-lgomp --extra-ldflags=-pthread --extra-ldexeflags= --cc=x86_64-w64-mingw32-gcc --cxx=x86_64-w64-mingw32-g++ --ar=x86_64-w64-mingw32-gcc-ar --ranlib=x86_64-w64-mingw32-gcc-ranlib --nm=x86_64-w64-mingw32-gcc-nm --extra-version=20240724
  libavutil      59. 28.100 / 59. 28.100
  libavcodec     61. 10.100 / 61. 10.100
  libavformat    61.  5.101 / 61.  5.101
  libavdevice    61.  2.100 / 61.  2.100
  libavfilter    10.  2.102 / 10.  2.102
  libswscale      8.  2.100 /  8.  2.100
  libswresample   5.  2.100 /  5.  2.100
  libpostproc    58.  2.100 / 58.  2.100
Input #0, mp3, from 'C:/Users/K/Downloads/trad01.mp3_chunk1.mp3':
  Metadata:
    encoder         : Lavf61.5.101
  Duration: 00:00:30.00, start: 0.011021, bitrate: 128 kb/s
  Stream #0:0: Audio: mp3 (mp3float), 48000 Hz, stereo, fltp, 128 kb/s
      Metadata:
        encoder         : Lavc59.37
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
Output #0, mp3, to 'C:/Users/K/Downloads/ttt.mp3_chunk1.mp3':
  Metadata:
    TSSE            : Lavf61.5.101
  Stream #0:0: Audio: mp3, 48000 Hz, stereo, fltp, 128 kb/s
      Metadata:
        encoder         : Lavc59.37
Press [q] to stop, [?] for help
[out#0/mp3 @ 0000029da0fbee00] video:0KiB audio:468KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhead: 0.089238%
size=     469KiB time=00:00:29.98 bitrate= 128.1kbits/s speed=4.11e+03x
The attention mask is not set and cannot be inferred from input because pad token is same as eos token.As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
ERROR:root:Error processing audio: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor
ERROR:root:Error during translation: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor
Exception in thread Thread-1 (run_translation):
Traceback (most recent call last):
  File "C:\Users\K\AppData\Local\Programs\Python\Python312\Lib\threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "C:\Users\K\AppData\Local\Programs\Python\Python312\Lib\threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\K\AppData\Local\Programs\Python\Python312\Lib\site-packages\AudioFileTranslator_S2ST\translator_gui.py", line 226, in run_translation
    translation_result = self.translator_instance.process_audio_chunk(chunk_output_path,
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\K\AppData\Local\Programs\Python\Python312\Lib\site-packages\AudioFileTranslator_S2ST\audio_translator.py", line 74, in process_audio_chunk
    predicted_ids = self.model.generate(input_features["input_features"], forced_decoder_ids=forced_decoder_ids)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\K\AppData\Local\Programs\Python\Python312\Lib\site-packages\transformers\models\whisper\generation_whisper.py", line 658, in generate
    ) = self.generate_with_fallback(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\K\AppData\Local\Programs\Python\Python312\Lib\site-packages\transformers\models\whisper\generation_whisper.py", line 801, in generate_with_fallback
    seek_outputs = super().generate(
                   ^^^^^^^^^^^^^^^^^
  File "C:\Users\K\AppData\Local\Programs\Python\Python312\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\K\AppData\Local\Programs\Python\Python312\Lib\site-packages\transformers\generation\utils.py", line 1733, in generate
    model_kwargs = self._prepare_encoder_decoder_kwargs_for_generation(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\K\AppData\Local\Programs\Python\Python312\Lib\site-packages\transformers\generation\utils.py", line 548, in _prepare_encoder_decoder_kwargs_for_generation
    model_kwargs["encoder_outputs"]: ModelOutput = encoder(**encoder_kwargs)
                                                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\K\AppData\Local\Programs\Python\Python312\Lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\K\AppData\Local\Programs\Python\Python312\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\K\AppData\Local\Programs\Python\Python312\Lib\site-packages\transformers\models\whisper\modeling_whisper.py", line 1026, in forward
    inputs_embeds = nn.functional.gelu(self.conv1(input_features))
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\K\AppData\Local\Programs\Python\Python312\Lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\K\AppData\Local\Programs\Python\Python312\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\K\AppData\Local\Programs\Python\Python312\Lib\site-packages\torch\nn\modules\conv.py", line 310, in forward
    return self._conv_forward(input, self.weight, self.bias)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\K\AppData\Local\Programs\Python\Python312\Lib\site-packages\torch\nn\modules\conv.py", line 306, in _conv_forward
    return F.conv1d(input, weight, bias, self.stride,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor

C:\Users\K>out, labels = out.type(torch.cuda.FloatTensor), labels.type(torch.cuda.FloatTensor)
'out' is not recognized as an internal or external command,
operable program or batch file.

Somewhere in your code, two tensors are interacting where one is on the GPU and the other isn't.

Make sure all your tensors are on the same device as the model's weights. Calling .to(device) on both the model and its inputs is a clean way of doing this, after initialising device = torch.device('cuda' if torch.cuda.is_available() else 'cpu') at the beginning of your code.
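As a minimal sketch of that pattern (using a toy nn.Linear as a stand-in for the real model):

```python
import torch
import torch.nn as nn

# Pick the GPU when available, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = nn.Linear(4, 2).to(device)  # move the model's weights to the device
x = torch.randn(1, 4).to(device)    # move the input to the same device

out = model(x)  # both live on the same device now, so no RuntimeError
print(next(model.parameters()).device, x.device)
```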

To check the device of a variable:

tensor = torch.randn(3, 4)
print(tensor.device)
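In your case the mismatch is inside the installed package rather than in code you wrote: the traceback shows audio_translator.py passing a CPU tensor to a model whose weights are on CUDA. A hedged sketch of the general fix, with a toy module standing in for the actual Whisper model, is to ask the model where its weights live and move the input there before calling it:

```python
import torch
import torch.nn as nn

class TinyModel(nn.Module):
    """Toy stand-in for the model the package loads (possibly onto CUDA)."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv1d(4, 8, kernel_size=3)

    def forward(self, x):
        return self.conv1(x)

model = TinyModel()                          # imagine this was .to('cuda')-ed at load time
device = next(model.parameters()).device     # where the weights actually live
features = torch.randn(1, 4, 16).to(device)  # move the input to match before the call
out = model(features)                        # devices agree, so the conv runs
```

Applied to line 74 of audio_translator.py from your traceback, the equivalent change would be calling .to(device) on the input features before they are passed to self.model.generate(...). Alternatively, if the package has an option to load the model on CPU, that would also make the two devices agree.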