Turning loop into matrix operation

I have an operation, that can be reproduced using the following code

reg_sig = torch.randn([32, 9, 5])
reg_adj = torch.randn([32, 9, 9, 4])

n_node = 9
n_bond_features = 4

SM_f = nn.Softmax(dim=2)
SM_W = nn.Softmax(dim=3)

p_f = SM_f(reg_sig)
p_W = SM_W(reg_adj)

V = torch.zeros([reg_sig.size(0), n_node])

for i in range(n_node):
    for j in range(n_node):
        if i is not j:
            for k in range(n_bond_features):
                V[:, i] += k * p_W[:, i, j, k]

I would like to get rid of the loop at the end. Here is my implementation

h_vec = torch.arange(n_bond_features).float()
inner_sum = torch.einsum('i,bjki->bjk', h_vec, p_W)
V = inner_sum-torch.diag_embed(torch.einsum('...ii->...i', inner_sum))
V = V.sum(2)

I was wondering if this is the most efficient way to do it.