How can I convert a Variable like [0.8, 0.1, 0.1, 0] into [1, 0, 0, 0] (the maximum entry becomes 1, every other entry becomes 0) inside a computational graph, with gradient backprop?

v = torch.autograd.Variable(torch.Tensor([0.8,0.1,0.1,0]), requires_grad=True)
out = (v==torch.max(v)).float()

I get an error on the second line:

RuntimeError: inconsistent tensor size at /home/jcc/pytorch/torch/lib/TH/generic/THTensorMath.c:2668

Which version of PyTorch are you using?

In older versions, you had to compare two tensors of the same size:

(v==torch.max(v).expand_as(v)).float()

My version is 0.1.11. Thanks, it works!


Given m1 = nn.Linear(100, 50), which converts Variable A (4 * 100) to Variable B (4 * 50), suppose the parameters of m1 are W1 (a 100 * 50 tensor) and b1 (a 50 * 1 tensor).
So if I take W1 as a Variable, and given C (a 100 * 50 tensor), I do something like:

B = m1(A)
D = W1+C
loss1 = loss_func1(D,target1)
loss2 = loss_func2(B,target2)
loss=loss1+loss2
loss.backward()

What is the gradient of W1 like, given that it is a parameter of m1 rather than a plain Variable? Anything special?

Nothing special, I suppose. For example, with an MSELoss the gradient of W1 should be something like (ignoring dimensions):

W.grad = d(l1)/d(W) + d(l2)/d(W) = 2 (W+C-T1) + 2 (W*A+b1-T2)*A

You can even check:

import torch
import torch.nn as nn
from torch.autograd import Variable

M = nn.Linear(100, 50)
W = M.weight                        # shape (50, 100)

C = Variable(torch.rand(50, 100), requires_grad=True)
A = Variable(torch.rand(4, 100), requires_grad=True)

T1 = Variable(torch.rand(50, 100))
T2 = Variable(torch.rand(4, 50))

B = M(A)                            # B = A @ W.t() + M.bias
D = W + C
l1 = torch.sum((D - T1)**2)         # sum of squared errors (no averaging)
l2 = torch.sum((B - T2)**2)
l = l1 + l2
l.backward()

# manual gradient: d(l1)/dW + d(l2)/dW
x = 2*(D - T1)
y = (B - T2).transpose(0, 1).unsqueeze(0)
z = A.unsqueeze(0)
t = 2*torch.bmm(y, z).squeeze()

grad_test = x + t                   # 2 (W+C-T1) + 2 (W*A+b1-T2)*A

print(torch.sum((W.grad - grad_test)**2))

Variable containing:
0
[torch.FloatTensor of size 1]

The original thing you wanted is not differentiable.

What do you mean? What is the “original thing”?

I meant this argmax-like operation is not differentiable.

How about this operation:

  v = torch.autograd.Variable(torch.Tensor([0.8,0.1,0.1,0]), requires_grad=True)
  out = (v==torch.max(v).expand_as(v)).float()

Do the Variables v and out have gradients?

@alexis-jacq
@SimonW

Isn’t that the same thing? The expand_as call doesn’t do anything here.

Also, the argmax operation is never differentiable.
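
For reference, on a recent PyTorch version the comparison broadcasts by itself, so both forms give the same mask; on 0.1.x the explicit expand_as was still needed, as discussed above. A quick check (a minimal sketch, values taken from this thread):

import torch
from torch.autograd import Variable

v = Variable(torch.Tensor([0.8, 0.1, 0.1, 0]), requires_grad=True)

a = (v == torch.max(v)).float()                # relies on broadcasting
b = (v == torch.max(v).expand_as(v)).float()   # explicit expand, as above
print(torch.equal(a, b))                       # True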

Thanks for this post: here, and for the code you provided above.


Besides, if I want to select (slice) one column of the W1 (100 * 50) matrix to get W1_c (a 100 * 1 Variable), then obviously converting W1 to W1_c needs a (50 * 1) Variable with only one 1-value and forty-nine 0-values, right? (Like [0,0,0,0,1,0,0,…,0,0].) Let's call this Variable Select_W. So what if this Select_W is created by something like:

v = torch.autograd.Variable(torch.Tensor([0.8, 0.1, 0.1, 0.1, ......, 0.3]), requires_grad=True)
Select_W = (v == torch.max(v).expand_as(v)).float()
More_0 = W1 * Select_W
loss3 = loss_func3(More_0, target3)

So what does the gradient flow look like among the Variables Select_W, More_0, v, and A, B, etc.? Are there no gradients for the Select_W Variable? Does a Variable created by such a slicing operation not have gradients? This computational graph can actually be trained, I think, but I am confused by the gradient flow.
@SimonW
@alexis-jacq

You can check that out.requires_grad is False. @SimonW is right to point out that argmax (though here it is rather an indicator function than the true “argmax”) is not differentiable. However, it is possible to set the derivative of the function to f'(x) = 0 for all x.
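
To make this concrete, here is a minimal sketch of the setup described above (the shapes and the names W1, v, Select_W, More_0 are simply taken from the question): the hard mask cuts the graph at the comparison, so the loss gradient reaches W1 through the multiplication but never reaches v.

import torch
from torch.autograd import Variable

W1 = Variable(torch.rand(100, 50), requires_grad=True)
v = Variable(torch.rand(50), requires_grad=True)

# argmax-like hard mask: the comparison is not differentiable,
# so its result is detached from v
Select_W = (v == torch.max(v).expand_as(v)).float()
print(Select_W.requires_grad)        # False

More_0 = W1 * Select_W               # keeps one column of W1, zeros the rest
loss3 = More_0.sum()                 # stand-in for loss_func3
loss3.backward()

print(W1.grad is not None)           # True: W1 receives a gradient
print(v.grad)                        # None: no gradient ever flows back to v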

For instance,
f(x, t) = Softmax(x * t) --> argmax(x) as t --> +inf,
and correspondingly, f'(x, t) --> 0.

One simple solution could be:

v = Variable(torch.Tensor([0.8,0.1,0.1,0]), requires_grad=True)
y = F.softmax(v*100)
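
As the temperature grows, this soft version approaches the one-hot vector while staying differentiable, so v still receives a gradient (unlike with the hard mask above). A rough sketch, with the temperature values picked arbitrarily:

import torch
import torch.nn.functional as F
from torch.autograd import Variable

v = Variable(torch.Tensor([0.8, 0.1, 0.1, 0]), requires_grad=True)

print(F.softmax(v))        # ~[0.41, 0.20, 0.20, 0.18]: far from one-hot
print(F.softmax(v * 10))   # ~[0.998, 0.001, 0.001, 0.000]: much sharper
print(F.softmax(v * 100))  # essentially [1, 0, 0, 0]

# unlike the hard mask, the soft mask is differentiable w.r.t. v
y = F.softmax(v * 10)
y[0].backward()
print(v.grad)              # non-zero gradient for v

With a very large temperature the output becomes indistinguishable from the hard mask, but its gradient also goes to essentially zero, which matches f'(x, t) --> 0 above.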

@alexis-jacq
Thanks. Besides, what about this question?