Best way to set all tensor elements to zero

Hello

I am creating a zero tensor before a loop (let’s call it “test_tensor”), and at each time step I want to reset its elements to zero. Currently I am doing it as in the code snippet below:

auto test_tensor = torch::zeros_like(tensor_initializer);
auto test_tensor_2 = torch::zeros_like(tensor_initializer);

for (int time_step = 0; time_step < time_steps; time_step++) {
    test_tensor_2 = test_tensor;
    test_tensor = torch::zeros_like(tensor_initializer);  // allocate a fresh zero tensor every step
    for (int i = 0; i < limit; i++) {
        test_tensor[i] = function(*args);
    }
}

If I instead try to reset the elements of “test_tensor” to zero using test_tensor.zero_(), I obtain different results and my tests fail.
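For clarity, the zero_() variant I tried looks roughly like this (only the loop changes compared to the snippet above):

for (int time_step = 0; time_step < time_steps; time_step++) {
    test_tensor_2 = test_tensor;
    test_tensor.zero_();  // reset in place instead of allocating a new tensor
    for (int i = 0; i < limit; i++) {
        test_tensor[i] = function(*args);
    }
}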

Questions:

  • What could be the reason that these two ways work differently?
  • Is it really beneficial in terms of memory to use test_tensor.zero_(), or could I stick to the current way without an issue?

Hi,

Do you actually track gradients across these operations? If so, then yes, the two will have different behaviors.

For speed, the two will be very close. The one that creates a new Tensor will obviously use a bit more memory but that’s it.

Hello @albanD

Thanks for your reply!

No, I am not tracking any gradients. I am just calling the function that contains the above code in my tests, and the output is different from the expected one when I use .zero_().

The code you shared should work the same in both cases.
I guess something else in the code keeps a reference to test_tensor and so sees its contents change when you do the update in place after zeroing it out.

Actually, it really confuses me, because the tensor is not used in any other way than in the code above and I don’t pass it as a reference to any other tensor. Anyway, I am going to investigate further.

Thanks a lot for the answers! :slight_smile:

If I understood correctly, that is suspicious because now you only have one tensor: if you zero_() it, you are losing your previous results. Your current version works because you are creating a new tensor on the next line (and the first initialization is useless).

Maybe I did not understand.
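To make the aliasing concrete, here is a tiny sketch (assuming a standard libtorch setup):

#include <torch/torch.h>
#include <iostream>

int main() {
    auto a = torch::ones({3});
    auto b = a;       // b is a handle to the same storage as a (like a shared_ptr)
    a.zero_();        // zeroing a in place...
    std::cout << b << std::endl;  // ...prints zeros through b as well
}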


Hello @pascal.soveaux

I thought that since I don’t pass test_tensor to test_tensor_2 as a reference, it would do a copy, so any changes to test_tensor would not affect test_tensor_2. If it is not really a copy, then that must be the issue.

Thanks for the answer!

I found the solution, so I am writing it here for the record.

As correctly mentioned by @pascal.soveaux, the test_tensor_2 = test_tensor; operation caused the problem. I changed it to

test_tensor_2.copy_(test_tensor);

and it now works when I call the following right after it:

test_tensor.zero_();
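Putting it together, the loop now looks roughly like this (a sketch, using the same placeholder names as in my original snippet):

auto test_tensor = torch::zeros_like(tensor_initializer);
auto test_tensor_2 = torch::zeros_like(tensor_initializer);

for (int time_step = 0; time_step < time_steps; time_step++) {
    test_tensor_2.copy_(test_tensor);  // copy the values into separate storage
    test_tensor.zero_();               // reset in place, no new allocation
    for (int i = 0; i < limit; i++) {
        test_tensor[i] = function(*args);
    }
}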

Thanks everyone for the help!


Hi,

I am terribly sorry, but your solution is very wrong. You are not showing all your code, but from what I see you need neither copy_ nor zero_.

Hello @pascal.soveaux

I am actually very interested to understand why my solution is wrong.

Briefly, my goal is to move the values of test_tensor to test_tensor_2 and then turn test_tensor into a zero tensor, without allocating new memory anywhere in the process (if possible). I can sacrifice a bit of computation time for this, but not memory. What do you think would be a better implementation?

I don’t think the rest of the code is very relevant, but in case you want to see more of it you could see the function here (it’s called “compose_messages”): https://github.com/kovanostra/message-passing-neural-network/blob/v1.6.0/message_passing_nn/utils/messages.cpp

Thanks a lot for your time and please tell me if you need any more context! :slight_smile:

Hi,

I hope I do not sound pedantic; I may be wrong.

Given your piece of code, there are two reasons:

  1. zero_
    There are only assignments. Take this piece of code:
unsigned char result[255];
memset(result, 0, sizeof(result));

for (int i = 0; i < limits; i++)
    result[i] += fct(i);
    // There is no reason to use "+=" because result[i] is 0 (the += comes from your git repo),
    // so result is only ever assigned.
    // So there is no reason to initialize it to 0, as long as the loop is entered at least once
    // and limits == sizeof(result).
    // See empty_like.
  2. copy_
    Copying is useless because you can just swap the tensors instead: one is assigned in the current step (say current), the other holds the previous step (say prev).
auto prev = test_tensor_2;  // reference semantics (just like a shared_ptr if you prefer)
auto current = test_tensor;

for (int step = 0; step < time_steps; step++)
{
    for (int i = 0; i < limit; i++)
        current[i] = fct(i);

    std::swap(prev, current);  // swap the handles, no data is copied
}
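Translated to libtorch, both points together would look roughly like this (only a sketch, reusing the names from your original snippet):

// Point 1: every element is written on every step, so empty_like is enough
// (assuming limit covers the whole tensor) and no zeroing is needed.
auto current = torch::empty_like(tensor_initializer);
auto prev = torch::empty_like(tensor_initializer);

for (int time_step = 0; time_step < time_steps; time_step++) {
    for (int i = 0; i < limit; i++) {
        current[i] = function(*args);
    }
    // Point 2: swap the handles so prev holds this step's results;
    // no copy and no new allocation.
    std::swap(prev, current);
}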

I can take a close look at your code if you like, but I hope you get my point.


I think the swap saved me from both the copy and the zeroing! Coming from the Python world, I didn’t know I could do such an operation. Thanks a lot!

I would be very interested to hear any other comments you may have about my code, if you have the time to check it :slight_smile:

I’m glad it helped you. I will take a closer look at your code as soon as I have time to do so (pr).
Have a good day :wink: