How can I convert a Variable like [0.8, 0.1, 0.1, 0] to [1, 0, 0, 0] (the maximum entry becomes 1, every other entry becomes 0) inside a computational graph, so that gradients can still be back-propagated?
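One workaround I have seen is a straight-through-style trick: use the hard one-hot values in the forward pass, but let the gradient flow through the soft vector. A minimal sketch (the values and names below are just for illustration, and I am not sure this is the right way):

import torch
from torch.autograd import Variable

v = Variable(torch.Tensor([0.8, 0.1, 0.1, 0.0]), requires_grad=True)
hard = (v == torch.max(v).expand_as(v)).float()   # [1, 0, 0, 0], but no grad history
one_hot = (hard - v).detach() + v                 # value is hard, gradient flows through v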

Thanks for this post: here, and for the code you provided below:

import torch
import torch.nn as nn
from torch.autograd import Variable

M = nn.Linear(100,50)
W = M.weight                                        # shape (50, 100)
criterion = nn.MSELoss()                            # MSELoss takes no parameters; unused below

C = Variable(torch.rand(50,100),requires_grad=True)
A = Variable(torch.rand(4,100),requires_grad=True)


T1 = Variable(torch.rand(50,100))
T2 = Variable(torch.rand(4,50))

B = M(A)
D = W + C
l1 = torch.sum((D-T1)**2)
l2 = torch.sum((B-T2)**2)
l = l1 + l2
l.backward()

x = 2*(D - T1)                              # dl1/dW = 2*(W + C - T1)
y = (B - T2).transpose(0,1).unsqueeze(0)
z = A.unsqueeze(0)
t = 2*torch.bmm(y,z).squeeze()              # dl2/dW = 2*(B - T2)^T @ A

grad_test = x+t  # = 2*(W + C - T1) + 2*(A @ W^T + b1 - T2)^T @ A

print(torch.sum((W.grad-grad_test)**2))

Variable containing:
0
[torch.FloatTensor of size 1]
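Written out, this check matches the analytic gradient of l with respect to W (writing b1 for the bias of M):

$$\frac{\partial l}{\partial W} = \frac{\partial l_1}{\partial W} + \frac{\partial l_2}{\partial W} = 2\,(W + C - T_1) + 2\,(B - T_2)^\top A, \qquad B = A W^\top + b_1$$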

Besides, if I want to select (slice) one column of a W1 (100, 50) matrix to get W1_c (a 100 x 1 Variable), then converting W1 to W1_c needs a (50 x 1) selection Variable with a single 1 and 49 zeros, right? (something like [0, 0, 0, 0, 1, 0, 0, …, 0, 0]) Let's call this Variable Select_W. So what if Select_W is created by something like:

v = torch.autograd.Variable(torch.Tensor([0.8,0.1,0.1,0.1,......,0.3]), requires_grad=True)
Select_W = (v==torch.max(v).expand_as(v)).float()
More_0 = W1 * Select_W
loss3 = loss_func3(More_0,target3)

So what does the gradient flow look like among the Variables Select_W, More_0, v, and the Variables A, B, etc.? Does Select_W get no gradients? Does a Variable created by a slicing operation have no gradients? This computational graph can still be trained, I think, but I am confused by the gradient flow.
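To check my own understanding, here is a minimal probe (with random stand-ins for W1 and target3, and a plain sum-of-squares loss standing in for loss_func3):

import torch
from torch.autograd import Variable

v = Variable(torch.rand(50), requires_grad=True)
W1 = Variable(torch.rand(100, 50), requires_grad=True)
target3 = Variable(torch.rand(100, 50))

Select_W = (v == torch.max(v).expand_as(v)).float()
print(Select_W.requires_grad)        # False: the comparison op has no grad history

More_0 = W1 * Select_W               # keeps one column of W1, zeros out the rest
loss3 = torch.sum((More_0 - target3) ** 2)
loss3.backward()

print(v.grad)                        # None: the graph is cut at the comparison
print((W1.grad != 0).sum())          # nonzero gradient only in the selected column

So it looks like no gradient ever reaches v, and only the selected column of W1 receives gradient, but I would like to confirm whether that is the intended behaviour.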
@SimonW
@alexis-jacq