Hi,
I have a kernel that works well on non-batched data, and I want to extend it to a batchwise format and compute gradients through it. The problem is that our batch is not homogeneous. Currently the code looks like this:

result = []
for i in range(len(stride) - 1):
    ans = kernel(x[stride[i]:stride[i + 1]])
    result.append(ans)
return torch.concat(result)

What we want to do is use vmap instead of the for loop:

vmap(kernel, in_dims=(None, 0, 0))(x, stride[:-1], stride[1:])
Apparently it’s not working. What should I do? Is there any other solution besides padding?
You might be interested in nested tensors, which are a special subclass of tensor designed to work on batches of data where the items are jagged/have varying lengths.
(Note that some parts of it are out of date; any call to torch.nested.nested_tensor should be passed a layout=torch.jagged argument. The part about implementing MHA has been updated relatively recently, though.)
NestedTensors unfortunately don’t work with vmap yet, so you’ll need to explicitly write out the batch dim in your program. (We have plans to support this soon though)
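For example, here's a rough, untested sketch of what that could look like (relu stands in for your kernel, and the sizes/strides are made up):

import torch

# Build a jagged NestedTensor from the variable-length slices of x.
# as_nested_tensor preserves autograd history, so gradients can flow back to x
# (torch.nested.nested_tensor would copy the data without autograd history).
x = torch.randn(10, 4, requires_grad=True)
stride = [0, 3, 7, 10]

pieces = [x[s:e] for s, e in zip(stride[:-1], stride[1:])]
nt = torch.nested.as_nested_tensor(pieces, layout=torch.jagged)

# Apply the per-item computation to the whole batch at once; pointwise ops
# like relu broadcast over the ragged dimension.
out = torch.nn.functional.relu(nt)

# Reduce to a scalar via the underlying values buffer and backprop.
out.values().sum().backward()
print(x.grad.shape)  # torch.Size([10, 4])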
Thank you so much! I’ve used nested tensors in my collate_fn before, but I haven’t tried applying it in this context yet. I’ll test it out and update this post once I have some results.
I have read the doc but am still slightly confused: when should I use jagged?
# Note: the torch.jagged layout is a nested tensor layout that supports a single ragged
# dimension and works with torch.compile. A nested tensor with this layout has shape
# (B, S*, D), where B = batch size, S* = ragged sequence length (varies per batch item),
# and D = embedding dimension.
I believe for those tensors I don’t need a jagged layout. But if I have a bunch of tensors with shapes [(n,), (m,), ...], do I need jagged? Does jagged mean that the dimension in the middle is irregular?
I believe for those tensors I don’t need a jagged layout.
Yep
But if I have a bunch of tensors with shapes [(n,), (m,), ...], do I need jagged?
The jagged layout only supports a single jagged dimension, i.e. among the tensors that you have, only a single dimension can vary. So if you have a bunch of 2-D tensors where both dimensions vary, the jagged layout would not work.
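To make that concrete, a small sketch (sizes made up):

import torch

# 2-D components where only the leading dimension varies (trailing dim fixed
# at 4): this is the single-ragged-dimension case the jagged layout supports.
ok = [torch.randn(2, 4), torch.randn(5, 4), torch.randn(3, 4)]
nt = torch.nested.nested_tensor(ok, layout=torch.jagged)  # logical shape (3, ragged, 4)

# Components where more than one dimension varies can't be expressed with a
# single ragged dimension, so a construction like this is expected to fail:
# torch.nested.nested_tensor([torch.randn(2, 4), torch.randn(5, 7)], layout=torch.jagged)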
Does jagged mean that the dimension in the middle is irregular?
Your jagged dimension cannot be the very first one, because there needs to be a preceding batch dimension.
But you can have a 2-D nested tensor, e.g. with shape [3, [1, 2, 3]], where the varying dimension is the very last one.
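On a recent PyTorch that looks roughly like this (untested sketch):

import torch

# Three 1-D tensors of lengths 1, 2, 3 -> a 2-D nested tensor: batch dimension
# of size 3 first, ragged dimension last.
ts = [torch.randn(1), torch.randn(2), torch.randn(3)]
nt = torch.nested.nested_tensor(ts, layout=torch.jagged)
print(nt.size(0))    # 3 (batch dim)
print(nt.is_nested)  # True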