Variable-sized outputs in a custom operator?

This might be a stupid question, but I cannot work out how to do it.

I have a custom operator that does some 3D rendering, with forward and backward defined in C++ code. Normally I understand how to wrap a custom operator: I have to write a fake kernel which takes the inputs and returns outputs of the correct size. This is fine for most of the outputs of my operator, but some outputs depend on a 3D projection to a grid, and their size can't be calculated without knowing the full content of the input data and rendering it, i.e. running the full operator. Is there any way to wrap a custom operator that has some variable-sized output tensors? The theoretical maximum size of the output buffers is really quite large, so I can't just use a massive fixed-size buffer here without wasting masses of GPU memory.

Why is the fake kernel a hard requirement? Typically, it makes sense to write a fake/meta kernel so that tools like torch.compile can trace through your custom operator and apply optimizations across it. But if your operator really has a data-dependent output shape, a kernel that computes the output size from the input shapes alone wouldn't be possible. In that case, you can simply omit the meta kernel definition.

Or is there another reason why you need to run this custom operator with fake tensor inputs?