C++/aten pick integer value from a tensor

Dhorka · October 23, 2018, 9:32am

Hi,

I am working with ATen in C++. I would like to cast one value of a tensor to an integer. I looked in the ATen documentation but didn’t find anything related. Is it possible?

I am trying to convert this piece of Python code to C++ with ATen:

Python:

    k = 0
    for i in range(startd, startd+numd):
        if self._degs[i]>0:
            torch.mean(products.narrow(0,k,self._degs[i]), 0, out=output[i])
        else:
            output[i].fill_(0)
        k = k + self._degs[i]

aten/cpp:

#include <torch/torch.h>

at::Tensor aggregate(int startd, int starte, int  numd, at::Tensor degs, at::Tensor products, at::Tensor output){


        int k = 0;

        for(int i = startd; i < (startd+numd); i = i+1){
                if (degs[i] > 0){
                        output[i] = at::mean(products.narrow(0,k,int(degs[i])));
                }else{
                        output[i].fill_(0);
                }

                k = k + degs[i];
        }
        return output;
}

This code is intended to run on a GPU and to be called from my main Python code. I would like to use this function for the forward pass of one of my custom layers.

tom · October 23, 2018, 10:30am

For read access, there is .toCLong() (and if I recall correctly, also, .to<int64_t>()). Note that indexing to a 1-element tensor and then using this is inefficient.

For read/write access and CPU tensors, you can either assign to *(t.data<int64_t>()) or you can - before you make it a scalar - use auto a = t.accessor<int64_t, dim>() to get something that provides array-like access a[i].

Best regards

Thomas

Dhorka · October 23, 2018, 10:56am

@tom thanks for your comments. I have some questions about your answer. This code is supposed to run on the GPU, so I understand that the first method is inefficient. My first question is: should the second method work on the GPU?

If the answer is yes, I have a question about the accessor in the second method: dim is the dimension of the tensor, right?

tom · October 23, 2018, 12:28pm

You didn’t say GPU. The direct correspondence for mean(…, out=) is mean_out, maybe that is the easiest. The k = 0 probably needs to be a _degs.new_zeros({1}).
If you want to go beyond what you have, you probably need to write your own kernel.
If you want your own kernels, there are PackedTensorAccessors, but you probably need the fixes from the native batch norm PR, too. Or you can do the pointers yourself, as in the C++ extension example’s cuda part.

Best regards

Thomas

P.S.: I forgot in my last mail, you likely want torch::Tensor, not at::Tensor.

Dhorka · October 24, 2018, 9:17am

@tom thanks for your comments! You are right, I didn’t mention the GPU; I will edit my first post to add that. I would also like to clarify that the goal is to implement the forward and backward passes in C++ in order to speed up my Python code, and that I am working in Python with PyTorch 0.4.1.

Regarding my troubles:

I was trying to use torch::Tensor, but I think it is not included in PyTorch 0.4.1, right?
new_zeros is not working; I suppose that is because I am still using at::Tensor.
My code works with the .toCLong() you mentioned in your first comment. However, I would like to use the accessor you mentioned, but I was not able to obtain the dim. The reason I want to use the accessor is that I am trying to speed up my code.

tom · October 24, 2018, 11:21am

Looking at your code some more, I wonder whether it might be more reasonable to take the mean over the dimension(s >=) 1 and then do a scatter_add_ and divide for handling the 0 dimension. I don’t want to keep you from doing your own kernel, but I would guess that it’s hard to win much over that.
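
One way to read the suggestion above, as a plain-Python sketch rather than PyTorch code: add every row of products into the bucket of the segment it belongs to, then divide each bucket by its degree. The function names are made up for illustration, rows are plain lists of floats, and the sketch assumes startd = 0 and that the degrees sum to the number of rows. It also keeps the trailing dimension, as the Python original does, instead of averaging it away first.

```python
def segment_mean_loop(products, degs):
    """Reference: mirrors the original loop (narrow + mean over dim 0)."""
    width = len(products[0]) if products else 0
    output, k = [], 0
    for d in degs:
        if d > 0:
            block = products[k:k + d]              # products.narrow(0, k, d)
            output.append([sum(col) / d for col in zip(*block)])
        else:
            output.append([0.0] * width)           # output[i].fill_(0)
        k += d
    return output


def segment_mean_scatter(products, degs):
    """Same result without per-segment slicing: scatter-add, then divide."""
    width = len(products[0]) if products else 0
    # Segment id of every row, e.g. degs = [2, 0, 1] -> [0, 0, 2].
    classes = [i for i, d in enumerate(degs) for _ in range(d)]
    sums = [[0.0] * width for _ in degs]
    for row, c in zip(products, classes):          # the scatter_add_ step
        for j, value in enumerate(row):
            sums[c][j] += value
    # Clamp the divisor to 1 so empty segments stay at zero.
    return [[s / max(d, 1) for s in seg] for seg, d in zip(sums, degs)]


products = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
degs = [2, 0, 1]
loop_result = segment_mean_loop(products, degs)
scatter_result = segment_mean_scatter(products, degs)
```

Both functions return the same list here. The second one never needs k or a per-segment slice, which is what would let the tensor version run as a handful of whole-tensor operations.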

Accessors only work when you know the dimension (i.e. how many “axes”) of the tensor at compile time. You very likely want at least the 1.0 preview, and you need the updates from the batch norm PR for good cuda tensor accessors (e.g. you want 32bit indexing if you can use it etc.).

Best regards

Thomas

Dhorka · October 24, 2018, 2:22pm

Hi @tom,
Thanks for your answer. I do not completely understand your proposal. I do not see how to convert this piece of code:

for(int i = startd; i < (startd+numd); i = i+1){
        if (degs[i] > 0){
                output[i] = at::mean(products.narrow(0,k,int(degs[i])));
        }else{
                output[i].fill_(0);
        }

        k = k + degs[i];
}

I need to place the results of the mean in a specific order using the k and degs[i] indices, so I cannot compute the mean in one shot. I think I have not completely understood your point. Can you give more details?

Regarding the PyTorch 1.0 preview: if I take the code from the GitHub repository under the tag v1.0rc1, plus the patch from this PR, will I have everything I need?

Finally, I would like to ask for some help with this piece of code (it corresponds to the backward function):

    k = 0
    for i in range(startd, startd+numd):
        if self._degs[i]>0:
            torch.div(grad_output[i], self._degs[i], out=grad_products[k])
            if self._degs[i]>1:
                grad_products.narrow(0, k+1, self._degs[i]-1).copy_( grad_products[k].expand(self._degs[i]-1,1,self._out_channels).squeeze(1) )
            k = k + self._degs[i]
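
Read row by row, the loop above appears to give every row of grad_products that belongs to segment i the same value, grad_output[i] / degs[i]. A plain-Python sketch of that reading (not PyTorch code; the function name is hypothetical, rows are lists of floats, and startd = 0 is assumed):

```python
def segment_mean_backward(grad_output, degs):
    """Spread each segment's gradient evenly over the rows it averaged."""
    grad_products = []
    for grad_row, d in zip(grad_output, degs):
        if d > 0:
            shared = [g / d for g in grad_row]     # torch.div(grad_output[i], degs[i])
            # One copy per row of the segment (the narrow/expand/copy_ step).
            grad_products.extend(list(shared) for _ in range(d))
    return grad_products


grad_output = [[2.0, 4.0], [9.0, 9.0], [3.0, 6.0]]
degs = [2, 0, 1]
grad_products = segment_mean_backward(grad_output, degs)
```

Segments with degree 0 contribute no rows, so the result has sum(degs) rows in total.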

I have issues with this line:

grad_products.narrow(0, k+1, self._degs[i]-1).copy_( grad_products[k].expand(self._degs[i]-1,1,self._out_channels).squeeze(1) )

I found in the documentation that ATen has the same methods, and I am trying to do this in C++:

int k = 0;

for(int i = startd; i < (startd+numd); i = i+1){
        auto degs_int = degs[i].toCLong();

        if(degs_int > 0){
                grad_products[k] = at::div(grad_output[i], degs_int);

                if (degs_int > 1){
                        at::IntList sizes = {degs_int-1,1,out_channels};
                        grad_products.narrow(0, k+1, degs_int-1).copy_(grad_products[k].expand(sizes).squeeze(1));
                }
                k = k + degs_int;
        }
}

But it is not working; this is the runtime error:

RuntimeError: The expanded size of the tensor (140414365974784) must match the existing size (64) at non-singleton dimension 2 (inferExpandGeometry at /pytorch/aten/src/ATen/ExpandUtils.cpp:69)
frame #0: at::native::expand(at::Tensor const&, at::ArrayRef<long>, bool) + 0x52 (0x7fb4bd3210c2 in /imatge/amosella/env/ecc_torch0.4.1_py36/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #1: at::Type::expand(at::Tensor const&, at::ArrayRef<long>, bool) const + 0x4a (0x7fb4bd4d4eda in /imatge/amosella/env/ecc_torch0.4.1_py36/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #2: torch::autograd::VariableType::expand(at::Tensor const&, at::ArrayRef<long>, bool) const + 0x4ad (0x7fb4c50d414d in /imatge/amosella/env/ecc_torch0.4.1_py36/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)

Again, thank you so much, Tom; you are helping me a lot with my first steps with the C++ API. Sorry if I am asking basic things, but honestly I am a little lost with this.

tom · October 24, 2018, 3:32pm

I’d just check out master ~~+ the non-native/test bits of the PR.~~ (the PR is now merged).

Depending on how large degs is, maybe this would already work well:

import torch

degs = torch.randint(0, 10, (20,))
degs_sum = degs.cumsum(0)
data_size = degs_sum[-1].item()
products = torch.randn(data_size, 100, 10, requires_grad=True)
classes = (torch.arange(1, products.size(0)+1)[None] > degs_sum[:, None]).sum(0)  # this is inefficient, but if the number of classes isn't too large...
output = torch.zeros(degs.size(), dtype=products.dtype, device=products.device)
row_means = products.view(products.size(0), -1).mean(1)
output = output.scatter_add_(0, classes, row_means) / degs.clamp(min=1).float()
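
The classes line above assigns each row of products the index of the segment it falls in, by comparing every row number with every cumulative degree. A plain-Python sketch of that comparison next to a binary-search variant (standard library only; the function names are illustrative, not PyTorch API):

```python
from bisect import bisect_right
from itertools import accumulate


def classes_by_comparison(degs):
    """Mirror of (arange(1, n + 1)[None] > degs_sum[:, None]).sum(0)."""
    degs_sum = list(accumulate(degs))              # degs.cumsum(0)
    n = degs_sum[-1] if degs_sum else 0
    # For each 1-based row number, count the segments that end before it.
    return [sum(row > end for end in degs_sum) for row in range(1, n + 1)]


def classes_by_bisect(degs):
    """Same ids via binary search on the cumulative sums."""
    degs_sum = list(accumulate(degs))
    n = degs_sum[-1] if degs_sum else 0
    return [bisect_right(degs_sum, row) for row in range(n)]


degs = [2, 0, 3, 1]
classes = classes_by_comparison(degs)
```

The comparison does len(degs) × n work, which matches the "inefficient" remark in the code comment, while the binary search needs roughly n log len(degs) steps. Later PyTorch releases also offer torch.repeat_interleave, which may be able to produce the same ids directly.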

Best regards

Thomas

souljaboy764 · April 29, 2019, 9:48am

The syntax is not .to<int64_t>(); it is .item<int64_t>().