'lengths' argument should be a 1D CPU int64 tensor, but got 0D cpu Long tensor


I am using torch 1.8.1 + CUDA 10.2 + torchtext 0.9.1
Platform: Windows 10
Device: GeForce GTX 1050

This code

packed_embedded = nn.utils.rnn.pack_padded_sequence(input=embedded, lengths=text_lengths)

raises the error: “‘lengths’ argument should be a 1D CPU int64 tensor, but got 0D cpu Long tensor.”

I tried the following approaches:

#approach no 1:
text_lengths = torch.as_tensor(text_lengths, dtype=torch.int64, device='cpu')
packed_embedded = nn.utils.rnn.pack_padded_sequence(embedded, text_lengths)

#approach no 2:

#approach no 3:
packed_embedded = nn.utils.rnn.pack_padded_sequence(embedded, torch.as_tensor(text_lengths.cpu(), dtype=torch.int64))

#approach no 4:
packed_embedded = nn.utils.rnn.pack_padded_sequence(embedded, text_lengths.type(torch.FloatTensor))

but without success.

I checked the value of text_lengths in train() and in the model, and it always has the same value no matter what I tried to change.
Is there any other workaround for this problem?

Thank you.

What do you get as the output of text_lengths.shape?

@pascal_notsawo in train method I have

text, labels, text_lengths = train_batch

and text_lengths.shape is torch.Size([])

That’s why there is an error.
The lengths parameter of pack_padded_sequence just specifies the length of each sequence of your batch.
Documentation: torch.nn.utils.rnn.pack_padded_sequence — PyTorch 1.9.0 documentation

The input has shape (T, B, E), or (B, T, E) if batch_first is True, where T is the length of the longest sequence (equal to lengths[0] if enforce_sorted is True), B is the batch size, and E is any number of trailing dimensions (in general, the embedding dimension).
So lengths contains the lengths of the B sequences: it must have shape (B,), not an empty shape (a 0D tensor) as in your case. That is what causes the problem.
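For illustration, here is a minimal sketch of the shape issue (the variable name text_lengths follows this thread; the concrete value 4 is made up). Note that reshaping only makes sense if the batch really contains a single sequence; otherwise the right fix is to make the data pipeline return one length per sequence.

```python
import torch

# A 0D ("scalar") tensor, as reported in the error message:
text_lengths = torch.tensor(4)
print(text_lengths.shape)  # torch.Size([]) -- 0D, rejected by pack_padded_sequence

# Reshape it to the required 1D shape (B,); here B = 1:
text_lengths = text_lengths.reshape(1)
print(text_lengths.shape)  # torch.Size([1])
```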

For example, suppose I have the following B = 3 sequences (given as their indices in the vocabulary after tokenisation, with padding_token_index = 0): [[2, 5, 0, 0], [1, 0, 0, 0], [7, 5, 2, 2]]. Then T = 4 and lengths = [2, 1, 4] (which should be a 1D CPU int64 tensor).
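That example can be sketched end to end as follows (the embedding layer and its sizes are made up for illustration; enforce_sorted=False is needed because [2, 1, 4] is not in descending order):

```python
import torch
import torch.nn as nn

# Toy batch from the example above: B = 3 padded sequences, T = 4,
# padding_token_index = 0.
padded = torch.tensor([[2, 5, 0, 0],
                       [1, 0, 0, 0],
                       [7, 5, 2, 2]])  # shape (B, T)

# lengths must be a 1D CPU int64 tensor of shape (B,).
lengths = torch.tensor([2, 1, 4], dtype=torch.int64)

# Hypothetical embedding layer, just to get a (B, T, E) float tensor.
embedding = nn.Embedding(num_embeddings=10, embedding_dim=8, padding_idx=0)
embedded = embedding(padded)  # shape (B, T, E)

packed = nn.utils.rnn.pack_padded_sequence(
    embedded, lengths, batch_first=True, enforce_sorted=False)

# The packed data keeps only the 2 + 1 + 4 = 7 real timesteps.
print(packed.data.shape)  # torch.Size([7, 8])
```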

Did you ever solve this problem? I’m having the same issue.

I solved the problem. In my case, sentence_length had two dimensions: the first was the batch size and the second was the length of each sentence. Hope this helps you find the answer.
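If your lengths tensor has that extra dimension, i.e. shape (B, 1) instead of (B,), one way to flatten it is a squeeze. This is a hedged sketch of that fix (the poster above did not say exactly how they resolved it, and the variable name lengths is an assumption):

```python
import torch

# Hypothetical lengths tensor with a spurious second dimension:
lengths = torch.tensor([[2], [1], [4]])  # shape (3, 1) -- rejected as-is

# Drop the size-1 dimension so the shape becomes (B,) = (3,):
lengths = lengths.squeeze(1)
print(lengths.shape)  # torch.Size([3])
```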