Building a Model for Multi-Output Embedding Generation: Seeking Advice and Insights

Hi everyone,

I’m currently working on a deep learning project and would love to get your thoughts and advice on a few key aspects. Here’s an overview of what I’m doing:

I’m developing a model that takes a single 256-dimensional embedding vector as input and generates three distinct 256-dimensional embedding vectors as output (currently produced as one concatenated 768-dimensional vector). The goal is for the model to learn a meaningful representation for each of the three output embeddings from the input data. This is a supervised learning task, and I have a dataset of several hundred thousand input-output pairs (one input mapped to three target outputs) to work with.
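
For concreteness, here’s a minimal shape-level sketch of the setup (PyTorch, with random tensors standing in for my actual data):

```python
import torch

batch = 32
x = torch.randn(batch, 256)        # one input embedding per example
y = torch.randn(batch, 3, 256)     # three target embeddings per example

# the model currently predicts one concatenated 768-dim vector...
pred_concat = torch.randn(batch, 768)
# ...which splits back into the three 256-dim output embeddings
pred = pred_concat.view(batch, 3, 256)
```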

Model Architecture:

  • What types of architectures would you recommend for multi-output embedding generation?
  • Are there specific design patterns or loss functions you’d suggest to ensure each output embedding is unique and well-represented?

I’ve experimented with a few approaches so far:

  1. Straightforward Neural Network: A simple fully connected network has been surprisingly effective and performs well overall (a minimal sketch of what I mean follows this list).
  2. Transformer Decoder: I’ve also tried a Transformer decoder-based architecture, which works decently but isn’t as efficient as I’d like.
  3. GAN Architecture: Since this task involves generating embeddings, I’ve been considering a GAN-based approach. Do you think this could be effective for high-dimensional embedding generation? Any advice on GAN design for this type of task?
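
Here’s the kind of fully connected network I mean by approach 1; the hidden width, depth, and activation are illustrative placeholders, not tuned choices:

```python
import torch
import torch.nn as nn

class MultiOutputMLP(nn.Module):
    """Maps one 256-dim embedding to three 256-dim embeddings."""

    def __init__(self, dim=256, hidden=1024, n_outputs=3):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, hidden),
            nn.GELU(),
        )
        # one linear head per output embedding
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, dim) for _ in range(n_outputs)]
        )

    def forward(self, x):  # x: (batch, 256)
        h = self.backbone(x)
        # stack the per-head outputs: (batch, 3, 256)
        return torch.stack([head(h) for head in self.heads], dim=1)

model = MultiOutputMLP()
print(model(torch.randn(8, 256)).shape)  # torch.Size([8, 3, 256])
```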

Additionally, I’d prefer not to fine-tune an LLM due to resource constraints, though I might explore fine-tuning a smaller 1B-parameter model. If you have insights on the following, I’d love to hear them:

  • How to modify or fine-tune such a model to directly output embeddings instead of text (my rough idea is sketched below).
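
My rough, untested idea, using Hugging Face’s AutoModel as an example (the model name below is a placeholder), would be to project the 256-dim input into the LM’s hidden space, feed it in via inputs_embeds, and project the final hidden state back out to the three embeddings:

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class LMEmbedder(nn.Module):
    """Hypothetical wrapper: 256-dim vector in, three 256-dim vectors out."""

    def __init__(self, name="some-1b-base-model", dim=256, n_outputs=3):
        super().__init__()
        self.backbone = AutoModel.from_pretrained(name)  # base model, no LM head
        hidden = self.backbone.config.hidden_size
        self.in_proj = nn.Linear(dim, hidden)            # embedding -> token space
        self.out_proj = nn.Linear(hidden, n_outputs * dim)
        self.dim, self.n_outputs = dim, n_outputs

    def forward(self, x):  # x: (batch, 256)
        tok = self.in_proj(x).unsqueeze(1)               # (batch, 1, hidden) "token"
        h = self.backbone(inputs_embeds=tok).last_hidden_state
        pooled = h[:, -1]                                # final hidden state
        return self.out_proj(pooled).view(-1, self.n_outputs, self.dim)
```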

Loss Functions:

  • I’ve experimented with the following:
    1. Cosine Similarity Loss: It works well for aligning the embeddings with the expected direction.
    2. MSE + Cosine Similarity Loss: This combination seems to strike a good balance, optimizing both magnitude and directional alignment (a rough sketch of how I combine them follows this list).
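
For reference, this is roughly how I combine the two terms; the weight alpha is a hand-picked hyperparameter, not a recommended value:

```python
import torch.nn.functional as F

def mse_plus_cosine(pred, target, alpha=0.5):
    """pred, target: (batch, 3, 256). Blends magnitude (MSE) and direction (cosine)."""
    mse = F.mse_loss(pred, target)
    # cosine term is 0 when predictions are perfectly aligned with targets
    cos = 1.0 - F.cosine_similarity(pred, target, dim=-1).mean()
    return alpha * mse + (1.0 - alpha) * cos
```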

Are there other commonly used or innovative loss functions for high-dimensional vector generation tasks?
For example, are there techniques that encourage diversity between the three outputs or enforce specific constraints on embedding properties?

Thank you!