Thanks for this post: here

And for the code you provided below:

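```
import torch
import torch.nn as nn
from torch.autograd import Variable

M = nn.Linear(100,50)
W = M.weight
criterion = nn.MSELoss()  # unused below; MSELoss does not take the model parameters
C = Variable(torch.rand(50,100),requires_grad=True)
A = Variable(torch.rand(4,100),requires_grad=True)
T1 = Variable(torch.rand(50,100))
T2 = Variable(torch.rand(4,50))
B = M(A)    # B = A W^T + b, shape (4,50)
D = W + C   # shape (50,100)
l1 = torch.sum((D-T1)**2)
l2 = torch.sum((B-T2)**2)
l = l1 + l2
l.backward()
# Hand-computed gradient of l with respect to W
x = 2*(D - T1)                              # contribution from l1
y = (B - T2).transpose(0,1).unsqueeze(0)    # shape (1,50,4)
z = A.unsqueeze(0)                          # shape (1,4,100)
t = 2*torch.bmm(y,z).squeeze()              # contribution from l2, shape (50,100)
grad_test = x+t # 2 (W+C-T1) + 2 (A W^T + b - T2)^T A
print(torch.sum((W.grad-grad_test)**2))
```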

Variable containing:

0

[torch.FloatTensor of size 1]
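
For reference, here is the gradient I believe the check above verifies, written as a short sketch. It assumes `nn.Linear` computes $B = A W^\top + b$ (with `b` the bias, broadcast over the 4 rows) and uses $\lVert\cdot\rVert_F$ for the Frobenius norm; both notations are my own and do not come from the original code.

$$
\begin{aligned}
l &= \lVert W + C - T_1 \rVert_F^2 + \lVert A W^\top + b - T_2 \rVert_F^2,\\
\frac{\partial l}{\partial W} &= 2\,(W + C - T_1) + 2\,(A W^\top + b - T_2)^\top A.
\end{aligned}
$$

The first term corresponds to `x` and the second to `t` in the code, both of shape 50 × 100.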

Besides, suppose I want to **select (slice)** one column of the matrix W1 (100 × 50) to get W1_c (a 100 × 1 Variable). Converting W1 to W1_c obviously needs a 50 × 1 Variable with a single 1 and 49 zeros, right? (Like [0,0,0,0,1,0,0,…,0,0].) Let's call this Variable Select_W. What if Select_W is created by something like:

```
v = torch.autograd.Variable(torch.Tensor([0.8,0.1,0.1,0.1,......,0.3]), requires_grad=True)
Select_W = (v==torch.max(v).expand_as(v)).float()
More_0 = W1 * Select_W
loss3 = loss_func3(More_0,target3)
```

So how do gradients flow among the Variables Select_W, More_0, v, and A, B, etc.? Does Select_W get no gradient? Does a Variable created by a slicing operation have no gradient? I think this computational graph can actually be trained, but I am confused by the gradient flow.
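
One way to probe this is a small experiment. The following is only a minimal sketch, assuming a recent PyTorch in which `Variable` and `Tensor` are merged. The random `v`, the random `target3`, and the sum-of-squares loss are hypothetical stand-ins for the elided values and for `loss_func3` above.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-ins for the tensors in the question.
W1 = torch.rand(100, 50, requires_grad=True)   # weight matrix
v = torch.rand(50, requires_grad=True)         # scores used to build the mask
target3 = torch.rand(100, 50)                  # assumed target

# The comparison returns a boolean tensor, which is not differentiable,
# so the mask is detached from v in the autograd graph.
Select_W = (v == v.max()).float()
print(Select_W.requires_grad)

# Broadcasting multiplies every row of W1 by the mask, zeroing 49 columns.
More_0 = W1 * Select_W
loss3 = torch.sum((More_0 - target3) ** 2)     # assumed loss, not loss_func3
loss3.backward()

k = int(torch.argmax(v))                       # index of the selected column
print(v.grad)                                  # nothing flows back to v
print(W1.grad[:, k].abs().sum())               # selected column gets gradient
print(W1.grad.abs().sum() - W1.grad[:, k].abs().sum())  # other columns
```

If this behaves as I expect, the mask carries no gradient and `v` receives none, while `W1` gets a gradient only in the selected column, so the graph trains `W1` but never the choice of column.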

@SimonW

@alexis-jacq