Batched cross-correlation

I have a tensor (a signal) f with shape (B, T, 1) and another signal g with the same shape.
I want to compute the "pairwise" cross-correlation between samples that share a batch index. That is, if I were to iterate over the samples in the batch, I would do something like this:

import torch
import torch.nn.functional as F

all_xcorrs = []
for b in range(B):  # B is the batch size
    f_b = f[[b]].permute(0, 2, 1)  # input, shape (1, 1, T)
    g_b = g[[b]].permute(0, 2, 1)  # kernel, shape (1, 1, T)
    xcorr_b = F.conv1d(f_b, g_b, padding='same')  # shape (1, 1, T)
    all_xcorrs.append(xcorr_b)
all_xcorrs = torch.cat(all_xcorrs, dim=0)  # shape (B, 1, T)
all_xcorrs = all_xcorrs.permute(0, 2, 1)  # shape (B, T, 1)

How can I vectorize this process?
By the way, one workaround is to run conv1d on the entire transposed tensors, which is equivalent to a nested loop (for i in range(B): for j in range(B): conv1d(f_i, g_j)), and then extract only the diagonal, i.e. the entries where i == j. This computes B*B correlations in order to keep only B of them:

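f = f.transpose(1, 2)  # shape (B, 1, T), used as the input
g = g.transpose(1, 2)  # shape (B, 1, T), used as B single-channel kernels
all_xcorrs = F.conv1d(f, g, padding='same')  # shape (B, B, T): every f_i with every g_j
diag_indices = torch.arange(all_xcorrs.shape[0])
xcorrs = all_xcorrs[diag_indices, diag_indices, :].unsqueeze(-1)  # shape (B, T, 1)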
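One possible way to avoid both the Python loop and the off-diagonal terms is a grouped convolution. Fold the batch into the channel dimension and pass groups=B, so that channel b of the input is only ever correlated with kernel b. The following is a minimal sketch, not a tested reference implementation. The helper name batched_xcorr is hypothetical, and it assumes the same (B, T, 1) layout and padding='same' behaviour as the snippets above.

```python
import torch
import torch.nn.functional as F


def batched_xcorr(f: torch.Tensor, g: torch.Tensor) -> torch.Tensor:
    """Cross-correlate f[b] with g[b] for every b; f and g have shape (B, T, 1).

    Hypothetical helper for illustration only.
    """
    B, T, _ = f.shape
    # Fold the batch into the channel dimension: (B, T, 1) -> (1, B, T).
    inp = f.permute(2, 0, 1)
    # One single-channel kernel per sample: (B, T, 1) -> (B, 1, T).
    weight = g.transpose(1, 2)
    # groups=B: output channel b depends only on input channel b and kernel b.
    out = F.conv1d(inp, weight, padding='same', groups=B)  # (1, B, T)
    # Restore the original layout: (1, B, T) -> (B, T, 1).
    return out.permute(1, 2, 0)


# Usage with small random signals (odd T keeps the 'same' padding symmetric).
f = torch.randn(4, 9, 1)
g = torch.randn(4, 9, 1)
xcorrs = batched_xcorr(f, g)
print(xcorrs.shape)
```

Compared with the diagonal workaround, this computes only the B correlations that are actually needed, so the work grows linearly with B rather than quadratically.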