Represent each pixel as an n-sized vector

If I have a batch of images of shape [N, C, H, W], let us assume
[1, 1, 28, 28], how do I represent each pixel within each image as a vector? For example, each of the [1, 28, 28] pixels as a 10-sized vector, so that I get a tensor like [1, 28, 28, 10]?

Hi,

Yes you can do that.
Note that in classical applications, the channel dimension is used for such things.

Is doing

x = nn.Conv2d(in_channels=1, out_channels=10, kernel_size=1)
x(input_image)  # [1, 1, 28, 28] -> [1, 10, 28, 28]

equivalent to what I am asking? Also, if I want my neural network to learn the size of the vector for each pixel, so that one pixel could be represented as a 10-sized vector and another as a 20-sized vector, how do I do this?

Furthermore, I want to find how each pixel is related to the other pixels, represented by a score, where the score is 0 for identical pixels and high for nearby pixels that differ.

For example, if I have a 20x20 image with all white pixels except one black pixel, then the score of the black pixel with another black pixel would be 0, and the score of the black pixel with a white pixel would be high, as they are different. Similarly, the score of a white pixel with a white pixel would be 0, and the score of a white pixel with the black pixel would be high. Finally I have a matrix of scores, one per pixel: the black pixel's score would be high, as there are a lot of white pixels around it, whereas the white pixels' scores would be lower than the black pixel's.

So for a [20x20] image I get a [20x20] score matrix, and then reduce it to the number of classes I have in my classification task, let us assume 5. So from [20x20] to 5, and these 5 numbers would be the probabilities of the predicted class based on the scores.
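Roughly, this is the kind of computation I mean (just a sketch; the summed absolute difference to every other pixel is only a stand-in for the real score function, and the class names and sizes are made up):

import torch
import torch.nn as nn

# rough sketch: each pixel's score is its summed absolute difference to every
# other pixel (0 against identical pixels, high for outliers), and the
# [20, 20] score matrix is then reduced to 5 class logits
class PixelScorer(nn.Module):
  def __init__(self, h=20, w=20, num_classes=5):
    super().__init__()
    self.fc = nn.Linear(h * w, num_classes)

  def forward(self, img):                                   # img: [1, 1, 20, 20]
    flat = img.view(-1)                                     # [400]
    diffs = (flat.unsqueeze(0) - flat.unsqueeze(1)).abs()   # pairwise |p_i - p_j|
    scores = diffs.sum(dim=1)                               # high for the lone black pixel
    return self.fc(scores)                                  # 5 numbers, one per class

img = torch.zeros(1, 1, 20, 20)
img[0, 0, 7, 7] = 1.                                        # the single differing pixel
logits = PixelScorer()(img)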

The task of the neural network would be to learn the vector size, and how to represent all of these pixels as vectors.

Hey,
I’ll try to answer all your questions, let me know if I missed some

Is doing the 1x1 conv above equivalent to what I am asking?

Yes, this will allow you to convert 1 value per pixel to 10 values per pixel.
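For example, something like this (and if you really want the channels-last [1, 28, 28, 10] layout from your question, you only need a permute at the end):

import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=10, kernel_size=1)
img = torch.randn(1, 1, 28, 28)   # [N, C, H, W]
out = conv(img)                   # [1, 10, 28, 28]: 10 values per pixel, in the channel dim
out = out.permute(0, 2, 3, 1)     # [1, 28, 28, 10] if you prefer the vector as the last dim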

If I want my neural network to learn the size of the vector for each pixel, so that one pixel could be represented as a 10-sized vector and another as a 20-sized vector, how do I do this?

Learning discrete values (the size, in your case) is very hard: it leads to non-continuous problems, so gradients cannot be used. You usually have to relax these to continuous values, but that is very problem-dependent and you will need to experiment with your use case.
Currently, PyTorch won't allow you to have a dimension with a varying size across samples. You can find here the issue tracking this feature. A common workaround (that might be required by your relaxation above) is to have a max size for the vector associated with each pixel.
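For example, one possible version of that workaround could look like this (the sigmoid gate is just a relaxation I am making up here, not something known to work for your task):

import torch
import torch.nn as nn

# possible relaxation: give every pixel a max_dim-sized vector plus a
# per-dimension gate in [0, 1] that can softly "turn off" extra dimensions
class PixelVectors(nn.Module):
  def __init__(self, max_dim=20):
    super().__init__()
    self.values = nn.Conv2d(1, max_dim, kernel_size=1)  # max_dim values per pixel
    self.gate = nn.Conv2d(1, max_dim, kernel_size=1)    # gate logits per pixel

  def forward(self, img):                                # img: [N, 1, H, W]
    vecs = self.values(img)                              # [N, max_dim, H, W]
    mask = torch.sigmoid(self.gate(img))                 # soft 0/1 mask per dimension
    return vecs * mask                                   # unused dimensions pushed toward 0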

Learning discrete values (the size, in your case) is very hard: it leads to non-continuous problems, so gradients cannot be used.

Is that the reason that, when I do this,

a = torch.arange(-5, 5)  # int64 tensor
x = nn.Parameter(a)

I get this error:

RuntimeError: Only Tensors of floating point dtype can require gradients

and we need to find some way to make an integer parameter that is updated to take only discrete integer values, but updating parameters by a gradient-based approach is not the way to do this.
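For reference, the error goes away if I cast the tensor to float first, but then the parameter is continuous rather than an integer:

a = torch.arange(-5, 5).float()   # floating point dtype, so gradients are allowed
x = nn.Parameter(a)               # works, but updates will be continuous, not integer steps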

So, if I want my neural network to learn things like image size, image crop size, an integer threshold, an integer batch_size, or an integer number of layers in the network, then none of this could be done by a gradient-based approach, as all of these are discrete integers?

Yes, all of these are discrete problems, and that's why you see very little work actually learning such things.
People usually do grid search to try different architectures / kernel sizes etc., because there are not many other options.

One line of work that went around this (with limited success, though) is the work on differentiable computers, which you can look at.

Is anything wrong with this?

let us assume we have a 3x3 matrix, like

1 2 3
4 5 6
7 8 9

Now we want to learn an integer threshold and zero out the elements that fall below this threshold.

let us assume that final target matrix is

0 0 0
0 5 6
7 8 9

So we want the neural network to learn that 5 is the threshold.

We initialize the threshold to a random integer, let us say 8, and also specify that this integer lies in the range [1-9]. The task is then to update this 8 until the neural network learns that 5 is the threshold.

If we represent each of the discrete values that this threshold could take as a vector, something like

1 -> [0.1, 0.4, -0.5, 0.9, 0.02]
2 -> [0.01, 0.04, 0.3, -0.5, 0.4]

and so on for [3-9],
and then assign a score to each of them, based on the scalar dot product (sdp) with the other vectors and the difference between them, so

1 -> [0.1, 0.4, -0.5, 0.9, 0.02] -> sum(sdp of 1 with [1-9] * difference between vectors 1 with [1-9]) -> score = 0.5
2 -> [0.01, 0.04, 0.3, -0.5, 0.4] -> sum(sdp of 2 with [1-9] * difference between vectors 2 with [1-9]) -> score = 0.2
3 -> [...] -> sum(sdp of 3 with [1-9] * difference between vectors 3 with [1-9]) -> score = 0.8

As one example, the score for the number 3 would be:

score_3 = (vector_3)*(vector_1)*abs(vector_3 - vector_1)
+ (vector_3)*(vector_2)*abs(vector_3 - vector_2)
+ (vector_3)*(vector_3)*abs(vector_3 - vector_3)
...
+ (vector_3)*(vector_9)*abs(vector_3 - vector_9)

The number with the highest score would be our expected threshold. The task of the neural network is to update the weights within each of these continuous vectors by a gradient-based approach, until a high score is obtained for the expected threshold.

So after one training iteration, suppose we get the highest score for the number 7; then we get a matrix like

0 0 0
0 0 0
7 8 9

We compare this matrix with our target and compute a loss, which would be high, so we go for more iterations, until we get 5 as the threshold.
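Roughly, in code, this is what I mean (just a sketch; the exact score formula, the dimension 5, and the learning rate are placeholders I picked):

import torch
import torch.nn as nn

# sketch: treat the threshold as a classification over the candidate values 1..9,
# each represented by a learnable vector, scored by sdp * difference with all others
class ThresholdScorer(nn.Module):
  def __init__(self, n_values=9, dim=5):
    super().__init__()
    self.embed = nn.Embedding(n_values, dim)

  def forward(self):
    v = self.embed.weight                                        # [9, 5]
    dots = v @ v.t()                                             # pairwise dot products
    dists = (v.unsqueeze(0) - v.unsqueeze(1)).abs().sum(dim=-1)  # pairwise |v_i - v_j|
    return (dots * dists).sum(dim=1)                             # one score per candidate value

model = ThresholdScorer()
target = torch.tensor([4])                                       # index of the value 5 in 1..9
optim = torch.optim.SGD(model.parameters(), lr=0.1)
for _ in range(200):
  optim.zero_grad()
  loss = nn.CrossEntropyLoss()(model().unsqueeze(0), target)
  loss.backward()
  optim.step()
threshold = torch.argmax(model()).item() + 1                     # map the best index back to 1..9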

Hi,

This boils down to performing a classification task where each class is a possible value of the threshold?
That might work!

As I said above, it's not that it's impossible to do, just that there is no single approach and it will be problem-dependent. Your approach, for example, only works if you have a relatively small number of possible values. Also, you need examples where you know the correct value in advance. This is not the case for network architectures, for example.

I think that to scale it, the outputs need to be concatenated. For example:

sentence -> yes yes yes yes yes no yes yes yes yes yes yes

expected outcome after training -> [0 0 0 0 0 1 0 0 0 0 0 0]

break into parts -> [yes yes yes] [yes yes no] [yes yes yes] [yes yes yes]

random initialize -> [1 2 3] [4 5 6] [7 8 9] [10 11 12]

# assume we represent each word by a vector of size 1

get score -> [3 2 3] [3 2 3] [3 2 3] [3 2 3]

# assume our formula for difference between two words is difference between their vectors, ideally we would want to multiply those vectors also

pass scores up -> [3 2 3 3 2 3] [3 2 3 3 2 3]

get score -> [2 4 2 2 4 2] [2 4 2 2 4 2]

pass scores up -> [2 4 2 2 4 2 2 4 2 2 4 2]

get score -> [8 16 8 8 16 8 8 16 8 8 16 8]

someway represent it as -> [0 1 0 0 1 0 0 1 0 0 1 0]

outcome -> [yes no yes yes no yes yes no yes yes no yes]

expected outcome -> [yes yes yes yes yes no yes yes yes yes yes yes]

get a loss, backprop, update initialization.
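A rough code sketch of this pipeline (the part size, the score formula, and the final sigmoid are placeholders I chose just to make it runnable):

import torch
import torch.nn as nn

# sketch of "break into parts, score each part, concatenate the scores, score again"
class PartScorer(nn.Module):
  def __init__(self, n_words=12):
    super().__init__()
    self.vecs = nn.Parameter(torch.randn(n_words))      # size-1 vector per word, randomly initialized

  @staticmethod
  def score(v):
    # score_i = sum_j |v_i - v_j| within a group
    return (v.unsqueeze(0) - v.unsqueeze(1)).abs().sum(dim=1)

  def forward(self, part_size=3):
    parts = self.vecs.split(part_size)                  # break into parts
    s = torch.cat([self.score(p) for p in parts])       # get score, pass scores up
    s = self.score(s)                                   # get score on the concatenation
    return torch.sigmoid(s)                             # soft version of the 0/1 outcome

model = PartScorer()
target = torch.zeros(12)
target[5] = 1.                                          # the lone "no" word
optim = torch.optim.SGD(model.parameters(), lr=0.1)
for _ in range(500):
  optim.zero_grad()
  loss = nn.functional.binary_cross_entropy(model(), target)
  loss.backward()
  optim.step()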

The same thing applies to pixels and other discrete values too.

Can you give me an example of some other problem where we do not know the value in advance? I think the same thing would apply to such a problem as well.

Or maybe we only need to change the order after concatenation, as computing the score for so many numbers would be difficult after concatenation.

sentence -> yes yes yes yes yes no yes yes yes yes yes yes

expected outcome after training -> [0 0 0 0 0 1 0 0 0 0 0 0]

break into parts -> [yes yes yes] [yes yes no] [yes yes yes] [yes yes yes]

random initialize -> [1 2 3] [4 5 6] [7 8 9] [10 11 12]
# assume we represent each word by a vector of size 1

get score -> [3 2 3] [3 2 3] [3 2 3] [3 2 3] 
# assume our formula for difference between two words is difference between their vectors
# we would want to multiply those vectors also

pass scores up -> [[3 2 3] [3 2 3]] [[3 2 3] [3 2 3]]
# let us represent these vectors as a b c d

represented as -> [   a       b   ] [   c       d   ]
# we do not change a b c d, only change the order in which they get concatenated
# here a, b, c, d are same, but assume they were different
# so after changing order it could be, 
[b a][d c]
[a b][d c]
[b a][c d]
# the neural network needs to learn to change the order too; I do not know how this will happen
# we apply same technique on a b c d

# we store a b c d, as we will have to reveal them later

# randomly initialize these four vectors by a vector of size 1
represented as -> [1 2] [3 4]
# apply the same formula again

get score -> [1 1] [1 1]

do concatenation again -> [[1 1] [1 1]]

represented_as -> [a b]
which could be [b a]

represented_as -> [1 2]
# random initialization, representing each vector as a vector of size 1

pass scores up -> [1]

represented_as -> [a]
Now no further concatenation is possible; we have reduced it to one vector,
so reveal it, based on all of the concatenations.

assume we get outcome -> [yes no yes yes no yes yes no yes yes no yes]

expected outcome -> [yes yes yes yes yes no yes yes yes yes yes yes]

get a loss, backprop, update initialization.

In the 2nd case, the task of the neural network would be to learn to initialize the vectors at each level.

Hello @albanD,
I wrote the following program for this (no concat in it, only computing scores). Is this a valid test, or am I making some mistake?

import torch
import torch.nn as nn

class A(nn.Module):
  def __init__(self):
    super().__init__()
    self.embed = nn.Embedding(50, 10)
  def forward(self, inp_matrix):
    inp_matrix = inp_matrix.reshape(50)
    self.scores = torch.zeros(50)
    for i in range(len(inp_matrix)):
      for j in range(len(inp_matrix)):
        self.scores[i] += abs(
          self.embed(inp_matrix[i].long()) @
          (self.embed(inp_matrix[i].long()) + self.embed(inp_matrix[j].long())).t()
        )  # I use the formula (A)*(A+B)
    self.scores = self.scores / self.scores.sum().expand_as(self.scores)  # I divide the values by a fixed normalizer, but this should also be learnt
    # print(self.scores)
    return self.scores
model = A()
loss_func = nn.CrossEntropyLoss() # .to('cuda')
optim = torch.optim.SGD(model.parameters(), lr=5) # I use SGD, for technique to be a better fit for SGD
model.train()
for i in range(1000):
  optim.zero_grad()
  input = torch.randperm(50).float()
  zeros = torch.zeros(50)
  input.requires_grad = True
  target = torch.where(input==15, torch.tensor(1.), zeros)
  print('input', input, 'target', target)
  input = input.reshape(1, 50)
  # print(input.shape)
  # print('input', input, '\n', 'target', target)
  x = model(input)
  x = x.reshape(1, 50)
  y = torch.argmax(x)
  print('x', x, 'target', torch.tensor([torch.argmax(target).item()]))
  loss = loss_func(x, torch.tensor([torch.argmax(target).item()]))
  loss.backward()
  # for p in model.parameters():
    # print(p.grad)
  print(list(model.parameters())[0].sum())
  optim.step()
  print(list(model.parameters())[0].sum())
  output = input.reshape(50)
  for j in range(len(output)):
    # print(j, y, input[j], input[y])
    if output[j] < output[y]:
      output[j] = 0.
  print('output', output)
  print(loss)

I see the loss decreasing, and the output is 0 for values below 15.

I didn’t try the code, but it looks OK.

Btw, I would advise against saving the scores in self.scores; just use a local scores variable instead. You don’t want your nn.Modules to have state in general.
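For example, one possible rewrite of the forward that keeps the scores local and also vectorizes the double loop, with the same (A) * (A + B) formula (an untested sketch, not the only way to do it):

import torch
import torch.nn as nn

class A(nn.Module):
  def __init__(self):
    super().__init__()
    self.embed = nn.Embedding(50, 10)

  def forward(self, inp_matrix):
    e = self.embed(inp_matrix.reshape(50).long())                        # [50, 10]
    # (A) * (A + B) for all pairs at once: pair[i, j] = e_i . (e_i + e_j)
    pair = (e.unsqueeze(1) * (e.unsqueeze(1) + e.unsqueeze(0))).sum(-1)  # [50, 50]
    scores = pair.abs().sum(dim=1)                                       # local variable, no module state
    return scores / scores.sum()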