Could PyTorch provide a correlation operator?


Hi, I am working on optical flow projects, and most methods use the correlation of two feature maps to measure their similarity. Some researchers implement the correlation in CUDA, like FlowNet and PWC-Net, while others compute the all-pairs correlation, like RAFT. Could PyTorch support a correlation operator for us? Or can we easily implement the method using existing operators?

The calculation is straightforward:
cost_volume = corr(fmap1, fmap2, search_range)
given fmap1 [B, H, W, C], fmap2 [B, H, W, C], and search_range = 3 (the radius of the search window).

The cost_volume should be [B, H, W, 49], where 49 = (2 * 3 + 1)², and it stores the correlation value between each feature in fmap1 and its corresponding neighbors in fmap2.
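A minimal pure-PyTorch sketch of the corr(fmap1, fmap2, search_range) call described above, assuming channels-last inputs, zero values outside the image, and displacements ordered row by row (dy outer, dx inner). The name local_corr is hypothetical, not an existing PyTorch API. It loops over the (2r+1)² displacements instead of over pixels, so each step is a single vectorized multiply-and-sum.

```python
import torch
import torch.nn.functional as F


def local_corr(fmap1: torch.Tensor, fmap2: torch.Tensor, search_range: int) -> torch.Tensor:
    """Local cost volume (hypothetical helper, not a PyTorch API).

    fmap1, fmap2: [B, H, W, C]
    returns:      [B, H, W, (2 * search_range + 1) ** 2]
    Neighbors that fall outside fmap2 are treated as zeros.
    """
    B, H, W, C = fmap1.shape
    r = search_range
    # Zero-pad H and W of fmap2 by r. F.pad lists pads from the last dim
    # backwards: (C_left, C_right, W_left, W_right, H_left, H_right).
    fmap2_pad = F.pad(fmap2, (0, 0, r, r, r, r))
    costs = []
    for dy in range(-r, r + 1):        # row offset (outer loop)
        for dx in range(-r, r + 1):    # column offset (inner loop)
            # position (i, j) of `shifted` holds fmap2[i + dy, j + dx]
            shifted = fmap2_pad[:, r + dy:r + dy + H, r + dx:r + dx + W, :]
            # dot product over the channel dimension -> [B, H, W]
            costs.append((fmap1 * shifted).sum(dim=-1))
    return torch.stack(costs, dim=-1)


fmap1 = torch.randn(2, 8, 10, 16)
fmap2 = torch.randn(2, 8, 10, 16)
print(local_corr(fmap1, fmap2, search_range=3).shape)  # torch.Size([2, 8, 10, 49])
```

Channel k = (dy + r) * (2r + 1) + (dx + r) holds displacement (dy, dx). The Python loop runs (2r+1)² = 49 times for r = 3, independent of H and W.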


You could use F.conv2d to get a cross-correlation; it already computes one (you would only flip the filter kernel to get a true convolution).
Would that work, or are you looking for another specific implementation?
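A quick check of what F.conv2d computes, with arbitrary sizes: it slides the kernel over the input without flipping it, i.e. it is already a cross-correlation. Flipping the kernel in both spatial dimensions gives a true convolution instead.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 1, 5, 5)   # input image
w = torch.randn(1, 1, 3, 3)   # kernel

out = F.conv2d(x, w)          # [1, 1, 3, 3], no padding

# Manual cross-correlation: slide w over x WITHOUT flipping it
manual = torch.zeros(3, 3)
for i in range(3):
    for j in range(3):
        manual[i, j] = (x[0, 0, i:i + 3, j:j + 3] * w[0, 0]).sum()

# A true convolution flips the kernel in both spatial dims first
conv_true = F.conv2d(x, torch.flip(w, dims=[2, 3]))

print(torch.allclose(out[0, 0], manual, atol=1e-5))  # True
```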

Ah, I see. It’s not the basic math operation, but a way to calculate the cost volume between two feature maps. Conv2D only slides a window over one image, while the cost volume is computed from both feature maps.

How is this cost volume operation defined?
Could you give an example using two random feature maps?

Here is an example in PWC-Net.
The main idea is to calculate the correlation as the dot product between two features. A larger result indicates higher similarity.
The correlation can be computed either between each feature in map1 and all features in map2, or between each feature in map1 and its corresponding neighbors in map2.
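The all-pairs variant (as in RAFT) can be written with existing operators as one batched matrix product. This is a minimal sketch assuming channels-first inputs; all_pairs_corr is a hypothetical name, and RAFT, as far as I know, additionally divides by sqrt(C), which is left out here. The result has (H·W)² entries, so memory grows quickly with resolution.

```python
import torch


def all_pairs_corr(fmap1: torch.Tensor, fmap2: torch.Tensor) -> torch.Tensor:
    """All-pairs cost volume (hypothetical helper).

    fmap1, fmap2: [B, C, H, W]
    returns:      [B, H, W, H, W], entry [b, i, j, k, l] is the dot product
                  of fmap1[b, :, i, j] and fmap2[b, :, k, l]
    """
    B, C, H, W = fmap1.shape
    f1 = fmap1.reshape(B, C, H * W)
    f2 = fmap2.reshape(B, C, H * W)
    # [B, H*W, C] @ [B, C, H*W] -> [B, H*W, H*W]
    corr = torch.matmul(f1.transpose(1, 2), f2)
    return corr.view(B, H, W, H, W)
```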

It’s a bit unclear from the video what the scalar output should represent.
A correlation between two signals would result in another signal, so are the authors using the peak of this correlation or another metric/coefficient?

Here is the definition from PWC-Net:

We can think of the two signals you mentioned as two feature maps from a CNN, say fmap1 [48, 128, 64] and fmap2 [48, 128, 64] in [H, W, C] layout. The “other signal” that results is then the cost volume.

The cost volume is calculated like this:
For each feature in fmap1, fmap1[i, j, :] (a vector of C = 64 values), we take its corresponding neighbors in fmap2: fmap2[i-3, j-3, :], fmap2[i-2, j-3, :], …, fmap2[i, j, :], …, fmap2[i+3, j+3, :]. We then compute the dot product between the fmap1 feature and each of these fmap2 features, which gives N correlation values, where N is the number of pixels in the neighborhood (7 × 7 = 49 for a radius of 3). Concatenating these scalars gives a [1, 1, N] correlation feature. Finally, looping over all pixels in fmap1 gives a cost volume of shape [48, 128, N], which stores the correlation values and represents the similarity between fmap1 and fmap2.
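Written as a formula (my own notation, assuming zero values outside the image, with f_1, f_2 the two [H, W, C] feature maps and r the search radius), the local cost volume described above is:

```latex
V(i, j, k) = \sum_{c=1}^{C} f_1(i, j, c)\, f_2\!\left(i + \Delta i_k,\; j + \Delta j_k,\; c\right),
\qquad (\Delta i_k, \Delta j_k) \in \{-r, \dots, r\}^2,\quad k = 0, \dots, N - 1,\quad N = (2r + 1)^2
```

With r = 3 this gives N = 49, so V has shape [48, 128, 49] for the maps above. The all-pairs variant drops the restriction on the offsets and gives [H, W, H·W]. Some implementations also divide the sum by C (the channel averaging mentioned later in this thread).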

Based on your description, this would be similar to a cross-correlation (which is what F.conv2d computes; flipping the kernel gives a true convolution), which should be possible to apply using F.conv2d and the two inputs.
However, I’m a bit confused by the shapes.
If we scale the problem down a bit and assume that both feature maps have a single channel, a full correlation output would have the shape [h1 + h2 - 1, w1 + w2 - 1], wouldn’t it? These output values seem to correspond to the N correlation values you mentioned. In the last step you mention that you would repeat this over all pixels, which is unclear to me.
Could you post pseudocode or a dummy example in PyTorch or NumPy?
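To make the single-channel case concrete (sizes chosen arbitrarily): with F.conv2d, a full correlation pads the input by the kernel size minus one on each side, and the output then has h1 + h2 - 1 rows and w1 + w2 - 1 columns. This produces one 2D map per pair of signals, whereas the cost volume discussed here takes a C-dimensional dot product for every pixel pair.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
h1, w1 = 6, 8   # "image" size
h2, w2 = 3, 4   # "kernel" size
x = torch.randn(1, 1, h1, w1)
k = torch.randn(1, 1, h2, w2)

# Full correlation: pad by (kernel size - 1) on each side so that every
# partial overlap between the two signals is evaluated.
full = F.conv2d(x, k, padding=(h2 - 1, w2 - 1))
print(full.shape)  # torch.Size([1, 1, 8, 11])
```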

This link from PWC-Net might be helpful:

Mathematically, correlation is a flipped convolution, but the cost volume does not work in that style. Conv2D takes one input feature map and convolves it with a kernel of weights. The cost volume, however, involves two different feature maps. This is why optical flow papers such as FlowNet, FlowNet2, and PWC-Net ship their own CUDA implementations of this “correlation”. If PyTorch provided an official Correlation or CostVolume API, it would be great for both research and industry.

Here is the CUDA and Python code from PWC-Net.

As Deqing Sun mentioned, there are two different kinds of cost volume.

I wrote some naive code:

import torch
import torch.nn.functional as F

C, H, W = 16, 12, 16  # small example sizes
fmap1 = torch.randn(C, H, W)  # [C, H, W]
fmap2 = torch.randn(C, H, W)

# Full cost volume (all-pairs correlation):
cost_vol = torch.zeros(H, W, 1, H, W)
for i in range(H):
    for j in range(W):
        vec1 = fmap1[:, i, j]  # [C]
        corr_ij = vec1 @ fmap2.reshape(C, -1)  # [C] @ [C, H*W] = [H*W]
        cost_vol[i, j, 0] = corr_ij.view(H, W)

cost_vol = cost_vol.view(H, W, H * W)

# Partial cost volume
r = 3
k = 2 * r + 1
# Zero-pad fmap2 so every (2r+1) x (2r+1) neighborhood exists; slicing the
# padded map replaces grid_sample (zero padding, integer offsets only).
fmap2_pad = F.pad(fmap2, (r, r, r, r))

cost_vol = torch.zeros(H, W, k * k)
for i in range(H):
    for j in range(W):
        vec1 = fmap1[:, i, j]  # [C]
        # square neighborhood of size (2r+1, 2r+1) centred at (i, j) in fmap2
        fmap2_neighbor = fmap2_pad[:, i:i + k, j:j + k]
        corr_ij = vec1 @ fmap2_neighbor.reshape(C, -1)  # [C] @ [C, 49] = [49]
        cost_vol[i, j, :] = corr_ij

# The first two dimensions of the final cost volume match the spatial shape of fmap1 (H, W).
# The order of the remaining dimensions does not matter, but their product is the number of
# fmap2 features each fmap1 feature is correlated with. E.g. [H, W, H*W] is the all-pairs
# correlation (each feature in fmap1 with every feature in fmap2), and [H, W, 49] correlates
# each feature in fmap1 with a [7, 7] neighborhood in fmap2.


I also need a built-in correlation function.

For reasons outside my control (gcc 9.3.0 and CUDA 10.1, which I do not have permission to change), I cannot compile the existing PyTorch correlation_package.

It will be very helpful if there is such a built-in function.

Thank you.

Would be great to have this layer added to pytorch.

I’m using Clement Pinard’s implementation: GitHub - ClementPinard/Pytorch-Correlation-extension: Custom implementation of Corrleation Module
It works with torch 1.7.0 and cuda 11.1.
However it needs to be compiled from source, which lowers the reproducibility of my code for other developers.

The correlation layer is essential in modern flow architectures (FlowNetC, PWC-Net, MaskFlowNet).
Adding this layer to PyTorch would also speed up its adoption in runtime frameworks (ONNX, OpenVINO), so that we can compute optical flow on edge devices.

Would this be the pure-PyTorch equivalent? It computes the inner product between the central pixel and every pixel in its neighborhood:

import torch
from einops import rearrange

k1 = 7  # example window size, i.e. 2 * search_range + 1
unfold_op = torch.nn.Unfold(k1, dilation=1, padding=k1 // 2, stride=1)

def pixel_cost_volume(im1, im2, unfold_op):
    """im1/im2: (b, c, h, w)

    -> produces a (b, h, w, k1**2) cost volume; each channel holds the
    inner product with a different discrete shift (i, j)
    """
    b, c, h, w = im1.shape
    central_pixel = rearrange(im1, 'b c h w -> b h w c')
    # Unfold returns (b, c * k1**2, h * w); split the last dim back into (h, w)
    neighbors = unfold_op(im2).reshape(b, -1, h, w)
    neighbors = rearrange(neighbors, 'b (c k2) h w -> b h w k2 c', c=c)
    y = torch.einsum('...kc,...c->...k', neighbors, central_pixel)
    return y  # (b, h, w, k1 * k1)

A native PyTorch implementation would be great. Given two images of the same size, this layer is similar to a Conv2D layer applied between image1 and a padded image2. But there are important differences: first, there are no weights, since image2 takes the place of the filters; second, the products are not summed over the window, only averaged over the channels; third, the product for each displacement between the two images is stored in a separate output channel. Take a look at:

This layer is very useful for optical flow.
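A minimal sketch of the layer described above, assuming channels-first inputs [B, C, H, W], a kernel size of 1, stride 1, and zero padding. It uses F.unfold so that image2’s neighborhoods take the place of the filters, stores each displacement in its own output channel, and averages over channels. flownet_style_corr and max_disp are hypothetical names, and the original FlowNet layer has more options (such as strides) that this sketch ignores.

```python
import torch
import torch.nn.functional as F


def flownet_style_corr(fmap1: torch.Tensor, fmap2: torch.Tensor, max_disp: int) -> torch.Tensor:
    """Channel-averaged local correlation (hypothetical helper).

    fmap1, fmap2: [B, C, H, W]
    returns:      [B, (2 * max_disp + 1) ** 2, H, W], one channel per displacement
    """
    B, C, H, W = fmap1.shape
    k = 2 * max_disp + 1
    # F.unfold extracts every k x k neighborhood of fmap2 (zero padded):
    # [B, C * k * k, H * W], with the channel index varying slowest
    patches = F.unfold(fmap2, kernel_size=k, padding=max_disp)
    patches = patches.reshape(B, C, k * k, H, W)
    # broadcast fmap1 over the k*k displacements, then average over channels
    return (fmap1.unsqueeze(2) * patches).mean(dim=1)
```

This materializes a [B, C, (2r+1)², H, W] intermediate, so memory grows with C·(2r+1)²; a dedicated kernel can avoid that intermediate.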

To what extent is this topic a duplicate of “Request for correlation layer”?

Has this layer been implemented in PyTorch as of today? I see that RAFT is available as a model in torchvision.
RAFT uses the correlation layer to compute a 4D cost volume. The optical flow community would really appreciate if this layer is added to PyTorch


@ptrblck Are there any PyTorch-native implementations yet? Or is there a custom implementation you would recommend? I am currently trying to compile the one from this repo: GitHub - NVlabs/PWC-Net: PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume, CVPR 2018 (Oral)
Unfortunately, it was developed for a very old version, and I cannot get it to work with newer CUDA or PyTorch versions.

This might be related to torch.sparse.sampled_addmm (if we explicitly compute the neighborhood indices; I don’t know whether it is good performance-wise, as a dedicated 2D/3D implementation with explicit local neighborhoods may allow better performance than the generic case): Implementation of torch.sparse.sampled_baddmm · Issue #105319 · pytorch/pytorch · GitHub

Also, please feel free to chime in on [question] Local video correlation (with temporal context length > 1) · Issue #148 · getkeops/keops · GitHub. I think it might be possible to accomplish this in KeOps.