Interpreting in higher dimensions with maximal spacing constraint

learner47 · January 12, 2021, 10:03pm

I do not know if this is the right forum to ask, but here it is. I have been using PyTorch with supervised learning. Now I am trying out something which (in my opinion) is not supervised and is outlined below. This is my first time attempting such a problem.

I have data that I would like to interpret in higher dimension (sort of the opposite of an auto-encoder). What I intent to achieve is, an m dimension data being represented in n dimensions (n>m) with the constraint that the representation in the n dimensions is maximally separated.

For clarity, I will assume the binary field. My data has m=20 dimensions. I would like to represent this data in a n=50 dimensional space. There are 2^m number of data combinations that are possible and 2^n number of points in the larger dimension. My goal is to select 2^m points from 2^n points, which are as far away from each other as possible.

If anyone can help me with the type of learning (a basic outline) that I would need to do, it would be great. Thanks!

googlebot · January 13, 2021, 3:41am

that’s a bit too generic…

do you really want to measure separation using distances in discrete space, like hamming distance? or you seek a solution for continuous numbers?
assuming continuous, will “autonomous” transformation suffice, or should it support gradient flow before/after? i.e. should separation measure form an auxiliary loss term for another task?

learner47 · January 13, 2021, 4:12am

@googlebot, thank you for your reply.

The ultimate goal is the hamming distance. But I think to reach there, through training, we might need to go through continuous numbers. What I mean is, suppose we start from random initialization, then we need to move either towards 0 or towards 1 to maximize the criterion. However, if it is possible to do it without having to go through the real numbers, that would also do.
I do not have another loss. The training concludes when I have found the points on the higher dimensional space.

googlebot · January 13, 2021, 5:32am

Hm, check if gray code won’t solve this problem; take n-digit gray codes and select 2^m regularly spaced elements as output space (using input as index)

In case that won’t work, I still feed that you need to seek a mathematical solution instead of “training”.

learner47 · January 13, 2021, 5:33am

@googlebot mathematical solutions work well for small dimensions. I would like to work with large dimensions, so I assumed training would help. I will take a look at gray code. Thanks .