I would like to create a bunch of tensors that share the same data and the same gradient accumulation, but are treated as different tensors by autograd when calculating gradients.

So when I call backward() on the function I am trying to optimize, PyTorch will calculate a separate gradient for each of the tensors. But instead of keeping every gradient separately, it will sum them all together. And when I call the optimizer's step(), it will use this shared accumulated gradient to update the shared data.

So all the tensors always have exactly the same value and exactly the same grad, but are seen as different tensors by the rest of my computation graph.

Do you mean to create a view?

https://pytorch.org/docs/stable/tensor_view.html

```
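>>> import torch
>>> a = torch.ones(2, 2, requires_grad=True)
>>> a.grad  # None until a backward pass populates it
>>> b = a[:]  # basic slicing returns a view that shares a's storage
>>> b._is_view()
True
>>> b._base is a
True
>>> # Gradient reaches a both directly and through the view b: 2 + 2 = 4
>>> torch.autograd.grad(b * 2 + a * 2, (a,), torch.ones(2, 2))
(tensor([[4., 4.],
         [4., 4.]]),)
>>> # Through a alone the gradient is 2
>>> torch.autograd.grad(a * 2, (a,), torch.ones(2, 2))
(tensor([[2., 2.],
         [2., 2.]]),)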
```
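If it helps, here is a minimal sketch of how the same idea might look with backward() and an optimizer step, which is closer to what the question describes. The loss, the second view `c`, and the choice of `torch.optim.SGD` with `lr=0.1` are made up for illustration and are not from the original post.

```python
import torch

# The single leaf tensor that owns the data and the accumulated .grad.
a = torch.ones(2, 2, requires_grad=True)

# Two views of a: they share its storage but are separate nodes in the graph.
b = a[:]
c = a.view(2, 2)

# Only the base tensor is handed to the optimizer (SGD is an arbitrary choice).
opt = torch.optim.SGD([a], lr=0.1)

# Hypothetical loss that uses each view in a different way.
loss = (b * 2).sum() + (c * 3).sum()
loss.backward()

# The contributions through b and c are summed into a.grad: 2 + 3 = 5.
grad_after_backward = a.grad.clone()

# The step updates the shared data in place: roughly 1 - 0.1 * 5 = 0.5.
opt.step()

print(grad_after_backward)
print(a.detach())
```

Note that `b` and `c` are non-leaf tensors here, so the summed gradient lives only on `a.grad`, not on the views themselves. If each tensor needs to expose its own `.grad` attribute, views alone probably won't cover that.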