Python lookup table or dictionary stored on the GPU

Hi, I want to create a lookup table (essentially a dictionary) that will be accessed and modified during training. With a Python dict, my lookup table is stored on the CPU, and I'm concerned about the back-and-forth transfers between the GPU and CPU. How can I move the lookup table/dictionary from the CPU to the GPU to avoid that? Can I use nn.Embedding(...) as a lookup table?

Thank you


What kind of objects will you save there? You can store CUDA tensors in a Python dictionary, and there won't be any copy when you access them.
Keep in mind that unless you call .cuda() or .to("cuda") on a Tensor, PyTorch will never move data to or from the GPU. So it is easy for you to control when things are exchanged between the two.
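As a quick illustration of the point above, here is a minimal sketch (the names `table` and `feat` are hypothetical). It falls back to the CPU when no GPU is present, so it runs anywhere; looking a tensor up in the dict returns the very same object, and data only crosses devices on an explicit call such as .cpu() or .to(...).

```python
import torch

# Use the GPU when one is available; fall back to CPU so the sketch still runs.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A plain dict whose values are tensors that already live on `device`.
table = {}
feat = torch.randn(128, device=device)
table["label_0"] = feat

# A lookup returns a reference to the same tensor: no copy, no transfer.
looked_up = table["label_0"]
print(looked_up is feat)                 # True
print(looked_up.device == feat.device)   # True

# Data moves between devices only when you ask for it explicitly.
on_cpu = looked_up.cpu()
```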

Hi, thank you for your fast reply.
I want to save the output features of my network: the key is the label and the value is the feature.

k[label_i] = feature(x_i)

I will progressively update the dictionary and use it to compute a cosine similarity with another branch of my network. Should I use nn.Parameter or nn.Embedding, or is a simple Python dict okay?
Thank you

For this, a Python dict is a good fit, for the following reasons:

  • It is easy to use and keeps the code clean.
  • It holds references to Tensors, so nothing is copied.
  • If you save Tensors only for inspection and don't need gradients through these new computations, you can save feature(x_i).detach() to avoid keeping the autograd graph alive.
  • If you need gradients (for example, if you save these in a dict to later add an L2 penalty to your loss), then don't detach.
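A minimal sketch of the dict approach described in the list above. `feature` and `other_branch` are hypothetical stand-in layers, not the poster's network; features are stored detached, and the stored values are stacked to compute cosine similarities against a new embedding.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stand-ins for the feature extractor and the "other branch".
feature = nn.Linear(16, 8)
other_branch = nn.Linear(16, 8)

memory = {}  # label -> feature tensor (stays on whatever device the model uses)

x = torch.randn(4, 16)
labels = [0, 1, 2, 1]  # label 1 appears twice, so its entry is overwritten

# Store detached features: only the values are kept, not the autograd graph.
for x_i, label_i in zip(x, labels):
    memory[label_i] = feature(x_i).detach()

# Cosine similarity between a new embedding and every stored feature.
keys = sorted(memory)
bank = torch.stack([memory[k] for k in keys])       # (num_labels, 8)
query = other_branch(torch.randn(1, 16))             # (1, 8)
sims = F.cosine_similarity(query, bank, dim=1)       # (num_labels,), broadcast over rows
```

Because the stored entries are detached, gradients from `sims` flow only into `other_branch`; leaving out `.detach()` would also route them back into `feature`.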

nn.Parameter is definitely not what you want, as it should only be used to define new parameters in an nn.Module.
nn.Embedding is meant for learning the features associated with a given label, so even though you could use it, it would be cumbersome to adapt to your use case.
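To illustrate the difference, here is a small sketch with arbitrary sizes. nn.Embedding stores its table as a learnable nn.Parameter that the optimizer updates, while a plain tensor indexed by label gives the same lookup without anything being learned unless you write to it yourself.

```python
import torch
import torch.nn as nn

num_classes, dim = 10, 8
labels = torch.tensor([3, 7, 3])

# nn.Embedding: the table is a learnable nn.Parameter (emb.weight).
emb = nn.Embedding(num_classes, dim)
looked_up = emb(labels)            # (3, dim), part of the autograd graph

# Plain tensor: same indexing, but no parameter and no gradient.
table = torch.zeros(num_classes, dim)
rows = table[labels]               # (3, dim); advanced indexing returns a copy
table[3] = torch.ones(dim)         # manual update, e.g. storing a new feature
```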

Thanks. I finally decided to register a buffer and update it progressively during training, since saving high-dimensional features on the CPU and later using them for cosine similarity is memory-consuming.
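For comparison, a minimal sketch of the buffer idea that updates the memory inside forward under torch.no_grad() instead of inside a custom autograd backward. This is a different technique from a custom autograd Function, and the `FeatureMemory` class and its `momentum` argument are hypothetical names. The similarity is computed against a clone of the buffer because in-place writes to a tensor that autograd saved for backward would otherwise raise an error.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureMemory(nn.Module):
    """Hypothetical memory bank: one L2-normalized row per class, kept in a buffer."""

    def __init__(self, num_classes, num_features, momentum=0.5):
        super().__init__()
        self.momentum = momentum
        # A buffer follows the module on .to(device) and is saved in state_dict,
        # but .parameters() does not return it, so the optimizer never touches it.
        self.register_buffer("M", torch.zeros(num_classes, num_features))

    def forward(self, feats, targets):
        feats = F.normalize(feats, p=2, dim=1)
        # Clone so the in-place update below does not alter a tensor saved by autograd.
        bank = self.M.clone()
        sims = feats @ bank.t()              # cosine similarity to every class row
        if self.training:
            with torch.no_grad():            # the memory update is not tracked by autograd
                for x, y in zip(feats, targets):
                    self.M[y] = F.normalize(
                        self.momentum * self.M[y] + (1 - self.momentum) * x, p=2, dim=0)
        return sims
```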

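import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Function


class SourceMemory(Function):
    # Current PyTorch requires static forward/backward with a ctx object;
    # the legacy style (__init__ + instance methods) is no longer supported.
    @staticmethod
    def forward(ctx, inputs, targets, M, alpha):
        ctx.save_for_backward(inputs, targets)
        ctx.M = M          # memory buffer, updated in place during backward
        ctx.alpha = alpha
        # Assumed: similarity of each feature to every memory row (a cosine
        # similarity when `inputs` and the rows of M are L2-normalized).
        outputs = inputs.mm(M.t())
        return outputs

    @staticmethod
    def backward(ctx, grad_outputs):
        inputs, targets = ctx.saved_tensors
        M, alpha = ctx.M, ctx.alpha
        grad_inputs = None
        if ctx.needs_input_grad[0]:
            # Gradient of inputs @ M^T with respect to inputs (computed before M changes).
            grad_inputs = grad_outputs.mm(M)
        # Moving-average update of the memory row for each sample's label.
        for x, y in zip(inputs, targets):
            M[y] = alpha * M[y] + (1. - alpha) * x
            M[y] = F.normalize(M[y], p=2, dim=0)
        # One gradient per forward input: inputs, targets, M, alpha.
        return grad_inputs, None, None, None


class IntraNet(nn.Module):
    def __init__(self, beta=0.05, alpha=0.01, num_classes=0, num_features=0, weight=None):
        super(IntraNet, self).__init__()
        self.beta = beta
        self.alpha = alpha
        self.weight = weight
        # Buffer: moves with .to(device) and is saved in state_dict, but is not a parameter.
        self.register_buffer('M', torch.zeros(num_classes, num_features))

    def forward(self, inputs, targets, source_feat=None, epoch=None):
        # Use a local value: reassigning self.alpha would compound on every call.
        alpha = self.alpha * epoch if epoch is not None else self.alpha
        # source_feat: (batch, num_features), ideally L2-normalized.
        outputs = SourceMemory.apply(source_feat, targets, self.M, alpha)
        outputs = outputs / self.beta
        loss = F.cross_entropy(inputs, targets, weight=self.weight)
        return loss, outputs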
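If the forward of the memory function computes the similarity to every memory row as `inputs.mm(M.t())` (an assumption, since that line is left blank in the original post), then the gradient its backward should return for `inputs` is `grad_outputs.mm(M)`. This sketch checks that formula against autograd on random data; the shapes are arbitrary.

```python
import torch

torch.manual_seed(0)
inputs = torch.randn(4, 6, requires_grad=True)   # batch of features
M = torch.randn(10, 6)                           # memory: one row per class
grad_outputs = torch.randn(4, 10)                # upstream gradient w.r.t. outputs

# Let autograd differentiate outputs = inputs @ M^T ...
outputs = inputs.mm(M.t())
outputs.backward(grad_outputs)

# ... and compare with the hand-written formula for a custom backward.
manual = grad_outputs.mm(M)
print(torch.allclose(inputs.grad, manual))  # True
```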