Averaging word vectors

My transformer network outputs a tensor of shape 100x128x250 (100 is the number of words, 128 is the batch size, 250 is the feature size). In forward(), I want to average over the words to go from 100x128x250 to 128x250, and then map 128x250 to 128x3, which I know I can do with Linear(250, 3). How do I do the first step, 100x128x250 -> 128x250?

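Take the mean over dimension 0, which is the word dimension in your layout (dimension 1 is the batch):

torch.mean(x, dim=0)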
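A minimal sketch of the whole forward pass is below. It assumes the transformer output is laid out as (words, batch, features), as described in the question. The class name MeanPoolClassifier is hypothetical. The only real APIs used are Tensor.mean and nn.Linear.

```python
import torch
import torch.nn as nn


class MeanPoolClassifier(nn.Module):
    """Hypothetical head: average the word vectors, then project to 3 outputs."""

    def __init__(self, feature_size: int = 250, num_classes: int = 3):
        super().__init__()
        self.fc = nn.Linear(feature_size, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_words, batch, feature_size), e.g. (100, 128, 250)
        pooled = x.mean(dim=0)  # average over words -> (128, 250)
        return self.fc(pooled)  # Linear(250, 3) -> (128, 3)


x = torch.randn(100, 128, 250)  # random stand-in for the transformer output
head = MeanPoolClassifier()
out = head(x)
print(out.shape)  # torch.Size([128, 3])
```

nn.Linear operates on the last dimension, so after pooling, the (128, 250) tensor can be passed to it directly without any reshaping.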