I do not know if this is the right forum to ask, but here it is. I have been using PyTorch with supervised learning. Now I am trying out something which (in my opinion) is not supervised and is outlined below. This is my first time attempting such a problem.
I have data that I would like to interpret in higher dimension (sort of the opposite of an auto-encoder). What I intent to achieve is, an
m dimension data being represented in
n dimensions (
n>m) with the constraint that the representation in the
n dimensions is maximally separated.
For clarity, I will assume the binary field. My data has
m=20 dimensions. I would like to represent this data in a
n=50 dimensional space. There are
2^m number of data combinations that are possible and
2^n number of points in the larger dimension. My goal is to select
2^m points from
2^n points, which are as far away from each other as possible.
If anyone can help me with the type of learning (a basic outline) that I would need to do, it would be great. Thanks!
that’s a bit too generic…
- do you really want to measure separation using distances in discrete space, like hamming distance? or you seek a solution for continuous numbers?
- assuming continuous, will “autonomous” transformation suffice, or should it support gradient flow before/after? i.e. should separation measure form an auxiliary loss term for another task?
@googlebot, thank you for your reply.
The ultimate goal is the hamming distance. But I think to reach there, through training, we might need to go through continuous numbers. What I mean is, suppose we start from random initialization, then we need to move either towards 0 or towards 1 to maximize the criterion. However, if it is possible to do it without having to go through the real numbers, that would also do.
I do not have another loss. The training concludes when I have found the points on the higher dimensional space.
Hm, check if gray code won’t solve this problem; take n-digit gray codes and select 2^m regularly spaced elements as output space (using input as index)
In case that won’t work, I still feed that you need to seek a mathematical solution instead of “training”.
@googlebot mathematical solutions work well for small dimensions. I would like to work with large dimensions, so I assumed training would help. I will take a look at gray code. Thanks .