I was wondering whether there is an operation that, given a list of 1-D tensors, joins them into a single 2-D tensor such that the result is a ‘view’ of the original tensors, i.e. changing values in either the original tensors or the joined one is reflected in the other?
torch.cat and torch.stack seem to create a new tensor. The resulting vectors of torch.chunk are views of the original but the operation is inverse to what I need.
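A minimal sketch of the behaviour described above, with made-up tensor names and sizes: torch.stack copies its inputs into new storage, while the pieces returned by torch.chunk share storage with the tensor they were cut from.

```python
import torch

a = torch.zeros(3)
b = torch.ones(3)

# torch.stack allocates new storage, so later writes to `a` are not visible.
stacked = torch.stack([a, b])
a[0] = 5.0
stack_sees_change = bool(stacked[0, 0] == 5.0)

# torch.chunk goes the other way: it splits one tensor into views of it.
big = torch.zeros(2, 3)
top, bottom = torch.chunk(big, 2, dim=0)  # each piece has shape (1, 3)
big[0, 0] = 7.0
chunk_sees_change = bool(top[0, 0] == 7.0)

print(stack_sees_change, chunk_sees_change)
```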

No, there isn’t and it’s not supported. The JIT fusers can fuse a final torch.cat by allocating the entire tensor and then filling the parts.
Your best option is to keep around the larger tensor / allocate a larger tensor in advance and then work on the parts.
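As a rough sketch of the preallocation suggestion (the names `buffer` and `rows` and the sizes are only illustrative): allocate the 2-D tensor first and hand out its rows as 1-D views, so writes through either side are visible in the other.

```python
import torch

n_rows, dim = 4, 3

# Allocate the joined 2-D tensor up front.
buffer = torch.zeros(n_rows, dim)

# Basic indexing returns views, so each row shares storage with `buffer`.
rows = [buffer[i] for i in range(n_rows)]

rows[1].fill_(2.0)   # write through a row view
buffer[3, 0] = 9.0   # write through the big tensor

print(buffer)
print(rows[3])
```

This only covers plain tensors; it does not address the per-row gradient question raised later in the thread.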

Working with a preallocated tensor wouldn’t be feasible in my case, because I would like the gradient to be calculated only for certain rows of that large tensor, which, if I understand correctly, is also not possible in PyTorch. The gradient could be set to 0 for the unused rows, but I need it not to be computed at all, so that backward time depends only on the number of rows used in the operation (and their size, of course), not on the total number of rows in the large matrix. Do you have an idea how to achieve this?

I found this to be similar to torch.nn.functional.embedding, but there the backward time (and also the optimizer step time) depends on the total size of the values matrix.

I think modelling it similar to the “sparse” option for embedding is pretty much how you would manage it.
I would probably try custom autograd.Functions with a “custom sparse representation” (i.e. you keep track of the indices of the non-zero rows or some such, and then only instantiate a tensor with those parts).
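For reference, here is a small sketch of the built-in “sparse” embedding option mentioned above, not of the custom autograd.Function itself. The sizes, the `used` index tensor and the choice of plain SGD are assumptions for illustration. With sparse=True the gradient of the weight matrix is a sparse tensor that only stores the rows that were looked up.

```python
import torch
import torch.nn.functional as F

n_rows, dim = 1000, 8
weight = torch.randn(n_rows, dim, requires_grad=True)
before = weight.detach().clone()

# Hypothetical set of rows used in this step.
used = torch.tensor([3, 17, 512])

# sparse=True makes the gradient w.r.t. `weight` a sparse tensor.
out = F.embedding(used, weight, sparse=True)  # shape (3, dim)
out.sum().backward()

grad = weight.grad.coalesce()
touched_rows = grad.indices()[0].tolist()  # row indices that received a gradient

# Plain SGD (no momentum, no weight decay) accepts sparse gradients.
opt = torch.optim.SGD([weight], lr=0.1)
opt.step()

print(weight.grad.is_sparse, touched_rows, tuple(grad.values().shape))
```

A custom autograd.Function along the lines suggested above would presumably follow the same pattern: save the indices in forward and build the gradient only from the used rows in backward.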