Hi everyone,
I’m currently working on a deep learning project and would love to get your thoughts and advice on a few key aspects. Here’s an overview of what I’m doing:
I’m developing a model that takes a single 256-dimensional embedding vector as input and generates three distinct 256-dimensional embedding vectors as output (currently emitted as one concatenated 768-dimensional vector). The goal is for the model to learn meaningful representations for multi-output embeddings from the input data. This is a supervised learning task, and I have a dataset of several hundred thousand input-output pairs (each input mapped to three target outputs) to work with.
Model Architecture:
- What types of architectures would you recommend for multi-output embedding generation?
- Are there specific design patterns or loss functions you’d suggest to ensure each output embedding is unique and well-represented?
I’ve experimented with a few approaches so far:
- Straightforward Neural Network: A simple fully connected network has been surprisingly effective and performs well overall.
- Transformer Decoder: I’ve also tried a Transformer decoder-based architecture, which works decently but isn’t as efficient as I’d like.
- GAN Architecture: Since this task involves generating embeddings, I’ve been considering a GAN-based approach. Do you think this could be effective for high-dimensional embedding generation? Any advice on GAN design for this type of task?
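For context on the first approach, here's a minimal sketch of what I mean by a fully connected network for this task, restructured as a shared trunk with three separate heads rather than one concatenated 768-dimensional output (all layer sizes here are illustrative, not my exact setup):

```python
import torch
import torch.nn as nn

class MultiHeadEmbedder(nn.Module):
    """Shared trunk with three projection heads, one per output
    embedding. Dimensions are illustrative placeholders."""
    def __init__(self, dim=256, hidden=512):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
        )
        # One head per target embedding, so each output can specialize
        self.heads = nn.ModuleList([nn.Linear(hidden, dim) for _ in range(3)])

    def forward(self, x):
        h = self.trunk(x)
        return [head(h) for head in self.heads]

model = MultiHeadEmbedder()
outs = model(torch.randn(4, 256))  # list of three (4, 256) tensors
```

One question I have about this design: does splitting into separate heads (versus one 768-wide output layer) actually help each output stay distinct, or does the shared trunk dominate?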
Additionally, I’d prefer not to fine-tune an LLM due to resource constraints, though I might explore fine-tuning a smaller ~1B-parameter model. If you have insights, I'd appreciate advice on:
- How to modify or fine-tune such a model to directly output embeddings instead of text.
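My rough understanding is that one common pattern is to drop the LM head, pool the base model's last hidden states, and project to the target dimensions. A hypothetical sketch of just the pooling-and-projection step (the hidden size and the dummy tensor standing in for the base model's output are assumptions, not from any specific model):

```python
import torch
import torch.nn as nn

hidden_size = 2048                  # assumed hidden size of a ~1B model
proj = nn.Linear(hidden_size, 3 * 256)  # maps pooled state to 3 embeddings

# Stand-in for the base model's last hidden states: (batch, seq, hidden)
last_hidden = torch.randn(4, 16, hidden_size)
pooled = last_hidden.mean(dim=1)        # mean-pool over tokens
out = proj(pooled).view(4, 3, 256)      # three 256-d embeddings per input
```

Is mean pooling reasonable here, or would a learned pooling / last-token pooling work better for embedding outputs?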
Loss Functions:
- I’ve experimented with the following:
- Cosine Similarity Loss: It works well for aligning the embeddings with the expected direction.
- MSE + Cosine Similarity Loss: This combination seems to strike a good balance, optimizing both magnitude and directional alignment.
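Concretely, the combined loss I've been using looks roughly like this (the mixing weight `alpha` is just an illustrative hyperparameter I tune):

```python
import torch
import torch.nn.functional as F

def mse_cosine_loss(pred, target, alpha=0.5):
    """Weighted sum of MSE (magnitude) and 1 - cosine similarity
    (direction). alpha is an illustrative mixing weight."""
    mse = F.mse_loss(pred, target)
    cos = 1.0 - F.cosine_similarity(pred, target, dim=-1).mean()
    return alpha * mse + (1.0 - alpha) * cos
```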
Are there other commonly used or innovative loss functions for high-dimensional vector generation tasks?
For example, any techniques that emphasize diversity between outputs or enforce specific constraints on embedding properties?
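To make the diversity question concrete, one idea I've considered (sketch only, not something I've validated) is adding a penalty on the pairwise cosine similarity between the three predicted outputs, so each head is pushed toward a distinct direction:

```python
import torch
import torch.nn.functional as F

def diversity_penalty(outs):
    """Mean absolute pairwise cosine similarity among the output
    embeddings; adding this to the loss discourages the heads from
    collapsing onto one another (illustrative sketch)."""
    penalty = 0.0
    n = len(outs)
    for i in range(n):
        for j in range(i + 1, n):
            penalty = penalty + F.cosine_similarity(
                outs[i], outs[j], dim=-1).abs().mean()
    return penalty / (n * (n - 1) / 2)
```

Would a penalty like this fight against the supervised targets, or is it a reasonable regularizer when the three target embeddings are themselves fairly distinct?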
Thank You!