Hi everyone,

I recently came across a code pattern for scaling gradient values without scaling the loss value, but I am confused about how it works. I implemented a simple example myself below.

```
import torch

torch.manual_seed(0)
x = torch.randn(2, 3, requires_grad=True)
y = 5
z = 2 * x
z_backward = z * y
z_new = z.detach() + (z_backward - z_backward.detach())
z_new2 = z + (z_backward - z_backward.detach())
print('------------------------')
print(f'x input: {x}')
print(f'x gradient: {x.grad}')
print('------------------------')
print(f'z: {z}')
print(f'z_new: {z_new}')
print('------------------------')
z.mean().backward(retain_graph=True)
print(f'x gradient from z :{x.grad}')
x.grad.zero_()
z_new.mean().backward(retain_graph=True)
print(f'x gradient from z_new :{x.grad}')
x.grad.zero_()
z_backward.mean().backward(retain_graph=True)
print(f'x gradient from z_backward :{x.grad}')
x.grad.zero_()
z_new2.mean().backward(retain_graph=True)
print(f'x gradient from z_new2 :{x.grad}')
print('------------------------')
```

The corresponding output is shown below:

```
------------------------
x input: tensor([[ 1.5410, -0.2934, -2.1788],
[ 0.5684, -1.0845, -1.3986]], requires_grad=True)
x gradient: None
------------------------
z: tensor([[ 3.0820, -0.5869, -4.3576],
[ 1.1369, -2.1690, -2.7972]], grad_fn=<MulBackward0>)
z_new: tensor([[ 3.0820, -0.5869, -4.3576],
[ 1.1369, -2.1690, -2.7972]], grad_fn=<AddBackward0>)
------------------------
x gradient from z :tensor([[0.3333, 0.3333, 0.3333],
[0.3333, 0.3333, 0.3333]])
x gradient from z_new :tensor([[1.6667, 1.6667, 1.6667],
[1.6667, 1.6667, 1.6667]])
x gradient from z_backward :tensor([[1.6667, 1.6667, 1.6667],
[1.6667, 1.6667, 1.6667]])
x gradient from z_new2 :tensor([[2., 2., 2.],
[2., 2., 2.]])
------------------------
```

I can understand that the gradient from `z` is 0.3333, which is 2 divided by 6. It is also easy to understand that the gradient from `z_backward` is scaled by 5.

However, what confuses me is the gradients from `z_new` and `z_new2`. The outputs of `z`, `z_new`, and `z_new2` all have the same values, but they produce different gradients with respect to `x`.

Could anyone explain how the gradients from `z_new` and `z_new2` are produced? Also, the gradient values from `z_backward` and `z_new` are the same. What is the difference between these two?
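For reference, here is the smallest scalar version I could reduce my experiment to (same trick, single element), which shows the value/gradient split I am asking about:

```python
import torch

a = torch.tensor([3.0], requires_grad=True)
z = 2 * a                                # dz/da = 2
zb = 5 * z                               # d(zb)/da = 10

# value of z, but the only non-detached term is zb
z_new = z.detach() + (zb - zb.detach())
# value of z, but both z and zb are non-detached terms
z_new2 = z + (zb - zb.detach())

# forward values are identical: the detached terms cancel in value
print(z_new.item(), z_new2.item())       # 6.0 6.0

g1 = torch.autograd.grad(z_new, a, retain_graph=True)[0]
g2 = torch.autograd.grad(z_new2, a)[0]
print(g1.item(), g2.item())              # 10.0 12.0
```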

Thanks in advance for any suggestions.