How to use PyTorch to learn custom parameters a and b?

My task is multimodal emotion recognition, and fusing the different modalities is challenging.
The audio and text modality tensors both have the shape (sequence, batch, feature).
Question: bimodality = a * audio_modality + b * text_modality.

How can I use PyTorch to make a and b custom learnable parameters? Thanks.
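For reference, the fusion itself can be written as a tiny module using nn.Parameter; a minimal sketch, assuming a and b are learnable scalars and both modality tensors share the same (sequence, batch, feature) shape:

```python
import torch
import torch.nn as nn

class WeightedSumFusion(nn.Module):
    """Fuses two modalities with learnable scalar weights a and b."""
    def __init__(self):
        super().__init__()
        # Registered as parameters, so the optimizer updates them
        # alongside the rest of the model.
        self.a = nn.Parameter(torch.tensor(1.0))
        self.b = nn.Parameter(torch.tensor(1.0))

    def forward(self, audio_modality, text_modality):
        # Both inputs: (sequence, batch, feature), same shape.
        return self.a * audio_modality + self.b * text_modality

fusion = WeightedSumFusion()
audio = torch.randn(10, 4, 64)   # (sequence, batch, feature)
text = torch.randn(10, 4, 64)
bimodality = fusion(audio, text)  # gradients flow into a and b
```

Since a and b appear in fusion.parameters(), any optimizer built over the full model will learn them; you could also make them per-feature vectors of shape (feature,) if one scalar per modality is too coarse.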

For text, you will need a tokenizer and transformer. You can use this tutorial to get started:

https://pytorch.org/tutorials/beginner/transformer_tutorial.html

Change the final Softmax layer to ReLU and make the last Linear layer output larger, maybe 64 neurons wide.
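In code, that modified head could look like the sketch below (TextHead is a hypothetical wrapper, d_model is whatever embedding size your encoder uses, and mean-pooling over the sequence is just one simple way to get a fixed-size vector per example):

```python
import torch.nn as nn

class TextHead(nn.Module):
    """Maps transformer encoder output to a 64-dim text embedding."""
    def __init__(self, d_model, out_dim=64):
        super().__init__()
        self.linear = nn.Linear(d_model, out_dim)
        self.relu = nn.ReLU()  # ReLU in place of the tutorial's Softmax

    def forward(self, encoder_output):
        # encoder_output: (sequence, batch, d_model)
        pooled = encoder_output.mean(dim=0)    # (batch, d_model)
        return self.relu(self.linear(pooled))  # (batch, 64)
```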

For the audio, you can use this tutorial:

https://pytorch.org/tutorials/intermediate/speech_command_classification_with_torchaudio_tutorial.html

Likewise, change the final activation to ReLU and make the final Linear layer 64 neurons wide.
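A matching sketch for the audio side (backbone stands for the tutorial's CNN with its final classifier and log_softmax removed, and backbone_out for the flattened feature size it produces; both are assumptions to adapt to your version of the tutorial):

```python
import torch.nn as nn

class AudioHead(nn.Module):
    """Maps the audio CNN's features to a 64-dim audio embedding."""
    def __init__(self, backbone, backbone_out, out_dim=64):
        super().__init__()
        self.backbone = backbone  # CNN feature extractor, classifier removed
        self.linear = nn.Linear(backbone_out, out_dim)
        self.relu = nn.ReLU()     # ReLU in place of the tutorial's log_softmax

    def forward(self, waveform):
        feats = self.backbone(waveform)
        feats = feats.flatten(start_dim=1)    # (batch, backbone_out)
        return self.relu(self.linear(feats))  # (batch, 64)
```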

Combine both of the above into one model with one forward pass.

Then concatenate both of those outputs and send them through a final Linear layer, with 128 input neurons and as many output neurons as you have classes, followed by a Softmax.
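Putting the pieces together, a sketch of that combined model (BimodalClassifier is a hypothetical name; text_head and audio_head are the two modified branches above, each emitting a 64-dim vector per example):

```python
import torch
import torch.nn as nn

class BimodalClassifier(nn.Module):
    """Concatenates 64-dim text and audio embeddings, then classifies."""
    def __init__(self, text_head, audio_head, num_classes):
        super().__init__()
        self.text_head = text_head
        self.audio_head = audio_head
        self.classifier = nn.Linear(64 + 64, num_classes)

    def forward(self, text_input, audio_input):
        t = self.text_head(text_input)    # (batch, 64)
        a = self.audio_head(audio_input)  # (batch, 64)
        fused = torch.cat([t, a], dim=1)  # (batch, 128)
        logits = self.classifier(fused)
        # If you train with nn.CrossEntropyLoss, return the raw logits
        # instead, since that loss applies log-softmax internally.
        return torch.softmax(logits, dim=1)
```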

@J_Johnson Thank you for your detailed advice.
Best wishes
