Best way to implement a model that works on both CPU and GPU

Hi, I implemented a Time-Delay Neural Network (TDNN) as a warm-up exercise for learning PyTorch. It works on the CPU, but the same model fails on the GPU because one of its variables is not a CUDA tensor.

class TDNN(nn.Module):
    def __init__(......):
        super(TDNN, self).__init__()
        # context is a plain attribute, so TDNN().cuda() does not move it
        self.context = Variable(torch.LongTensor(context))
        self.kernel = nn.Parameter(torch.Tensor(output_dim, input_dim, self.kernel_width).normal_(0,stdv))
        self.bias = nn.Parameter(torch.Tensor(output_dim).normal_(0,stdv))

In the snippet above, I'm fairly sure that after calling net = TDNN().cuda(), kernel and bias are CUDA tensors but context is not. I tried making context a parameter instead, but then initializing the optimizer with optimizer = optim.Adam(net.parameters()) raises an error, because I don't want gradients computed for context.
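For illustration, here is a minimal sketch of why a Parameter is the wrong container for an integer index tensor. It assumes a recent PyTorch release, where the failure happens before any optimizer is built because nn.Parameter defaults to requires_grad=True; the exact error text varies by version. The class name WithFrozenContext is hypothetical.

```python
import torch
import torch.nn as nn

context = torch.tensor([-2, 0, 2])  # int64 frame offsets used as an index

# nn.Parameter defaults to requires_grad=True, which integer tensors cannot have.
try:
    nn.Parameter(context)
    raised = False
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)


# requires_grad=False avoids the error, but the tensor is still a parameter:
# it shows up in parameters() and is handed to the optimizer.
class WithFrozenContext(nn.Module):
    def __init__(self):
        super().__init__()
        self.context = nn.Parameter(context, requires_grad=False)


print(len(list(WithFrozenContext().parameters())))  # 1
```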

For more detail: context is passed as the index argument of torch.index_select, which is why it is a LongTensor.
I could probably hack around this, but I'd like to know the best practice for this kind of problem.
Thanks!
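A minimal sketch of how a context tensor like this can drive torch.index_select, assuming inputs shaped (batch, time, features) and context holding frame offsets relative to a time step t. frames_around is a hypothetical helper, not part of the original code. The index must be a 1-D integer tensor on the same device as x, which is exactly what breaks once x is on the GPU and context is not.

```python
import torch

x = torch.arange(10.0).reshape(1, 10, 1)  # (batch, time, features)
context = torch.LongTensor([-2, 0, 2])    # frame offsets for one TDNN layer


def frames_around(x, context, t):
    # index_select expects a 1-D integer index on the same device as x
    return torch.index_select(x, 1, context + t)


print(frames_around(x, context, 5).flatten().tolist())  # [3.0, 5.0, 7.0]
```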

Here is one hack that works:
I introduced a self.cuda_flag attribute and run this check:

# move context once, the first time the parameters are found on the GPU
if self.bias.is_cuda and not self.cuda_flag:
    self.context = self.context.cuda()
    self.cuda_flag = True

This does solve the problem, but is there a cleaner and more efficient way to do it?
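Another common workaround that avoids the flag is to move the index to the input's device inside forward(). This is a sketch with a hypothetical SpliceByAttribute layer, assuming context holds absolute frame indices:

```python
import torch
import torch.nn as nn


class SpliceByAttribute(nn.Module):
    """Hypothetical layer that keeps `context` as a plain tensor attribute."""

    def __init__(self, context):
        super().__init__()
        # Not a parameter or buffer, so .cuda()/.to() will NOT move it.
        self.context = torch.tensor(context, dtype=torch.long)

    def forward(self, x):
        # Move the index to wherever the input lives. Tensor.to() returns the
        # tensor itself when it is already on that device, so on CPU this is
        # free; on GPU it performs a small host-to-device copy on every call.
        context = self.context.to(x.device)
        return torch.index_select(x, 1, context)


layer = SpliceByAttribute([0, 2, 4])
x = torch.arange(6.0).reshape(1, 6, 1)  # (batch, time, features)
print(layer(x).flatten().tolist())      # [0.0, 2.0, 4.0]
print("context" in layer.state_dict())  # False: plain attributes are not saved
```

The drawbacks are the repeated copy on the GPU and the fact that context is not saved with the model; a buffer avoids both.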

You could use register_buffer:

class TDNN(nn.Module):
    def __init__(......):
        super(TDNN, self).__init__()
        # a buffer is moved by .cuda()/.to() but is not returned by parameters()
        self.register_buffer('context', torch.LongTensor(context))
        self.kernel = nn.Parameter(torch.Tensor(output_dim, input_dim, self.kernel_width).normal_(0,stdv))
        self.bias = nn.Parameter(torch.Tensor(output_dim).normal_(0,stdv))
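To make the answer concrete, here is a fuller sketch of a TDNN layer built around register_buffer. The constructor arguments, the initialization scale stdv, and the einsum-based forward() are assumptions filled in for illustration, since the original elides them. It assumes inputs shaped (batch, time, input_dim) and context holding frame offsets such as [-2, 0, 2].

```python
import math

import torch
import torch.nn as nn


class TDNN(nn.Module):
    """Sketch of one TDNN layer whose frame offsets live in a buffer."""

    def __init__(self, context, input_dim, output_dim):
        super().__init__()
        # A buffer follows .cuda()/.to() and is saved in state_dict, but it is
        # not returned by parameters(), so the optimizer never sees it.
        self.register_buffer("context", torch.tensor(context, dtype=torch.long))
        self.kernel_width = len(context)
        stdv = 1.0 / math.sqrt(input_dim * self.kernel_width)  # assumed init scale
        self.kernel = nn.Parameter(
            torch.empty(output_dim, input_dim, self.kernel_width).normal_(0, stdv))
        self.bias = nn.Parameter(torch.empty(output_dim).normal_(0, stdv))

    def forward(self, x):
        # x: (batch, time, input_dim); keep only steps whose full context exists
        batch, time, _ = x.shape
        start = max(0, -int(self.context.min()))
        end = time - max(0, int(self.context.max()))
        steps = torch.arange(start, end, device=x.device)
        # (steps, kernel_width) absolute frame indices; context moved with the module
        idx = steps.unsqueeze(1) + self.context.unsqueeze(0)
        frames = torch.index_select(x, 1, idx.reshape(-1))
        frames = frames.reshape(batch, end - start, self.kernel_width, -1)
        # out[b, s, o] = sum over k, d of frames[b, s, k, d] * kernel[o, d, k], plus bias[o]
        return torch.einsum("bskd,odk->bso", frames, self.kernel) + self.bias


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
net = TDNN([-2, 0, 2], input_dim=4, output_dim=8).to(device)
optimizer = torch.optim.Adam(net.parameters())  # only kernel and bias
out = net(torch.randn(3, 20, 4, device=device))
print(out.shape)  # torch.Size([3, 16, 8])
```

Because context is a buffer, net.to(device) moves it along with kernel and bias, so forward() needs no device check, and Adam only receives the two trainable tensors.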

Thanks!
I had a feeling I was missing something when I read the documentation for nn.Module.cuda(), which says it moves all model parameters and buffers to the GPU. I didn't even know buffers existed as a separate thing. I guess I need to read the API docs more thoroughly!
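A quick way to see how parameters and buffers differ is to inspect a small module. This is a sketch with a hypothetical Demo class; the persistent=False option requires PyTorch 1.6 or later.

```python
import torch
import torch.nn as nn


class Demo(nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(3))
        self.register_buffer("context", torch.tensor([-1, 0, 1]))
        # persistent=False: still moved by .to()/.cuda(), but left out of state_dict
        self.register_buffer("scratch", torch.zeros(3), persistent=False)


m = Demo()
print([n for n, _ in m.named_parameters()])  # ['weight']
print([n for n, _ in m.named_buffers()])     # ['context', 'scratch']
print(list(m.state_dict().keys()))           # ['weight', 'context']

# .double() casts only floating-point tensors, so the integer index keeps its dtype
m = m.double()
print(m.context.dtype, m.scratch.dtype)      # torch.int64 torch.float64
```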

Bookmarking this. I just ran into the same problem.

Excuse me, I have recently been learning about TDNNs and want to use one for speech separation.
This is for my graduation project. I have studied the 2015 paper "A time delay neural network architecture for efficient modeling of long temporal contexts" and want to implement that architecture, but I don't know how. Could you send me your TDNN project? My email is 735689717@qq.com. I am Chinese and my English is poor, so please forgive me. Thank you very much; I look forward to your reply and really appreciate it.