I am creating a zero tensor before a loop (let's call it "test_tensor"), and at each time step I want to reset its elements to zero. Currently I am doing it as in the code snippet below:

auto test_tensor = torch::zeros_like(tensor_initializer);
auto test_tensor_2 = torch::zeros_like(tensor_initializer);

for (int time_step = 0; time_step < time_steps; time_step++) {
    test_tensor_2 = test_tensor;
    test_tensor = torch::zeros_like(tensor_initializer);
    for (int i = 0; i < limit; i++) {
        test_tensor[i] = function(*args);
    }
}

If I try to set the elements of the “test_tensor” to zero using test_tensor.zero_() I obtain different results and my tests fail.

Questions:

What could be the reason that these two ways work differently?

Is it really beneficial in terms of memory to use test_tensor.zero_(), or could I stick with the current way without an issue?

No, I am not tracking any gradients. I am just calling the function that contains the above code in my tests, and the output differs from the expected one when I use .zero_().

The code you shared should behave the same in both cases.
My guess is that something else in the code keeps a reference to test_tensor, and that reference sees the in-place update after you zero it out.

Actually, it really confuses me, because the tensor is not used in any way other than in the code above, and I don't pass it as a reference to any other tensor. Anyway, I am going to investigate further.

If I understood correctly, that is suspicious, because now you only have one tensor. If you call zero_(), you lose your previous results. The original version works because you create a new tensor on the next line (which also makes the first initialization useless).

I thought that since I don't pass test_tensor to test_tensor_2 by reference, it would be copied, so any changes to test_tensor would not affect test_tensor_2. If it is not really a copy, then that must be the issue.

I am actually very interested to understand why my solution is wrong.

Briefly, my goal is to move the values of test_tensor to test_tensor_2 and then turn test_tensor into a zero tensor, without allocating new memory anywhere in the process (if possible). I can sacrifice a bit of computation time for this, but not memory. What do you think would be a better implementation?

Zero_
There are only assignments. Take this piece of code:

unsigned char result[255];
memset(result, 0, sizeof(result));
for (int i = 0; i < limit; i++)
    result[i] += fct(i);
// There is no reason to use "+=" because result[i] is 0 (the += is from your gist)
// So result is only ever assigned.
// So there is no reason to initialize to 0, iff the loop is entered at least once
// and limit == sizeof(result).
// See empty_like.

Copy_
Copying is unnecessary because you can just swap the tensors instead: one is being assigned (call it current), the other holds the previous step (call it prev).

auto prev = test_tensor_2;  // reference semantics (just like a shared_ptr if you prefer)
auto current = test_tensor;
for (int step = 0; step < time_steps; step++) {
    for (int i = 0; i < limit; i++)
        current[i] = fct(i);
    std::swap(prev, current); // no copy, no allocation
}

I can take a closer look at your code if you like, but I hope you get my point.