Which one is faster?

I want to ask which one of the following approaches is better in terms of speed.

import numpy as np
import torch
import torch.nn as nn

emb_size = 3  # placeholder embedding size, chosen so the broadcasted addition works

device = torch.device("cuda")

# Approach 1: build a NumPy array, convert it to a tensor, then move it to the GPU
class fc_layer(nn.Module):
    def __init__(self):
        super(fc_layer, self).__init__()
        self.linear1 = nn.Linear(emb_size, emb_size)
    def forward(self, x):
        y = self.linear1(x)
        a = np.ones((2, 3))
        a = torch.tensor(a, dtype=torch.float32).to(device)
        y = y + a
        return y
device = torch.device("cuda")

# Approach 2: create the tensor on the CPU with torch.ones, then move it to the GPU
class fc_layer(nn.Module):
    def __init__(self):
        super(fc_layer, self).__init__()
        self.linear1 = nn.Linear(emb_size, emb_size)
    def forward(self, x):
        y = self.linear1(x)
        a = torch.ones(2, 3).to(device)
        y = y + a
        return y
device = torch.device("cuda")

# Approach 3: create the tensor directly on the GPU
class fc_layer(nn.Module):
    def __init__(self):
        super(fc_layer, self).__init__()
        self.linear1 = nn.Linear(emb_size, emb_size)
    def forward(self, x):
        y = self.linear1(x)
        a = torch.ones(2, 3, device=device)
        y = y + a
        return y

By the way, are there any faster ways?

  1. Will create two copies: one when creating the tensor (since you are not using torch.from_numpy) and one for the data transfer to the GPU.
  2. Only the transfer.
  3. Will create the tensor on the device directly, so I would use this approach.

Of course you won’t notice a significant difference given the current shapes, but I assume that these code snippets are only meant to show the different approaches.
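
If you want to measure this yourself, here is a minimal timing sketch (assuming a CUDA device is available; the helper time_fn, the shapes, and the iteration counts are just illustrative choices, not part of the original snippets):

import numpy as np
import torch

device = torch.device("cuda")

def make_from_numpy():
    # Approach 1: extra CPU copy during tensor creation, then a host-to-device transfer
    a = np.ones((2, 3))
    return torch.tensor(a, dtype=torch.float32).to(device)

def make_then_move():
    # Approach 2: create on the CPU, then a single host-to-device transfer
    return torch.ones(2, 3).to(device)

def make_on_device():
    # Approach 3: create directly on the GPU, no transfer at all
    return torch.ones(2, 3, device=device)

def time_fn(fn, iters=1000):
    # warm up so one-time CUDA setup is not measured
    for _ in range(10):
        fn()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # milliseconds per call

for fn in (make_from_numpy, make_then_move, make_on_device):
    print(fn.__name__, f"{time_fn(fn):.4f} ms")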


Okay, thank you. Concerning speed, I have one more question:

import torch
import torch.nn as nn

class net(nn.Module):
    def __init__(self):
        super(net, self).__init__()
        self.linear = nn.Linear(2, 2)
    def forward(self, x):
        # apply the same linear layer twice via a Python for loop
        for i in range(2):
            x = self.linear(x)
        return x

In the code above, is the loop in for i in range(2) controlled by the CPU or the GPU? If it is the CPU, should I unroll it as shown below for the sake of speed? I also wonder whether there is a special controller on the GPU that can handle simple control flow, so that the CPU could be free of this work most of the time instead of driving the GPU all the time.

def forward(self, x):
    # loop unrolled manually: two explicit calls instead of a Python for loop
    x = self.linear(x)
    x = self.linear(x)
    return x

While there is the new (in torch 1.10) CUDA graphs API to deal with this very case, I would not worry about it too much until you find that your GPU is underutilized because of Python for loops.
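
For reference, here is a minimal sketch of the capture/replay pattern of that CUDA graphs API (assuming a fixed input shape and an available CUDA device; the side-stream warm-up follows the usage shown in the PyTorch documentation, and the model and shapes are just placeholders):

import torch
import torch.nn as nn

device = torch.device("cuda")
model = nn.Linear(2, 2).to(device)
static_input = torch.randn(8, 2, device=device)

# warm up on a side stream before capture
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        model(static_input)
torch.cuda.current_stream().wait_stream(s)

# capture the forward pass into a graph
g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    static_output = model(static_input)

# replay: copy new data into the static input tensor and relaunch the whole graph
new_data = torch.randn(8, 2, device=device)
static_input.copy_(new_data)
g.replay()
print(static_output)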

The Python overhead is more or less constant per operation. The Thomas rule of thumb for this is that if your operands have hundreds of elements, the Python overhead is likely not a crucial bottleneck. On the other hand, if you have many operands that are much smaller, the mere Tensor administration overhead (which exists in C++, too) is likely to be a problem as well.
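
As a rough illustration of that rule of thumb, here is a sketch (the helper time_fn, the shapes, and the counts are assumptions chosen only for the example) comparing many tiny additions against one batched addition over the same number of elements:

import torch

device = torch.device("cuda")
small = [torch.randn(4, 4, device=device) for _ in range(1000)]
big = torch.randn(1000, 4, 4, device=device)

def run_small():
    # 1000 tiny additions: per-op launch and bookkeeping overhead dominates
    return [t + 1.0 for t in small]

def run_big():
    # one batched addition over the same total number of elements
    return big + 1.0

def time_fn(fn, iters=100):
    for _ in range(5):  # warm-up
        fn()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # milliseconds per call

print("1000 small ops:", time_fn(run_small), "ms")
print("1 batched op:  ", time_fn(run_big), "ms")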

Best regards

Thomas