Could PyTorch provide a correlation operator?

Hi, I am working on optical flow projects, and most methods use the correlation of two feature maps to measure similarity. However, some researchers implement the correlation in CUDA, as in FlowNet and PWC-Net, while others compute an all-pairs correlation, as in RAFT (https://arxiv.org/pdf/2003.12039.pdf). Could PyTorch provide a correlation operator for us? Or can we easily implement the operation using existing operators?

The calculation is straightforward:
cost_volume = corr(fmap1, fmap2, search_range)
given fmap1 [B, H, W, C] and fmap2 [B, H, W, C], with search_range=3 (the radius of the search window).

The cost_volume should be [B, H, W, 49] (49 = (2*3+1)**2), which stores the correlation value between each feature in fmap1 and its corresponding neighbors in fmap2.
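For concreteness, here is a shape-level sketch of the proposed call (corr is a hypothetical operator name, not an existing PyTorch function):

import torch

B, H, W, C = 4, 48, 128, 64
fmap1 = torch.randn(B, H, W, C)
fmap2 = torch.randn(B, H, W, C)

search_range = 3                 # radius of the search window
n = (2 * search_range + 1) ** 2  # 49 displacements in a 7x7 window

# hypothetical API: cost_volume = corr(fmap1, fmap2, search_range)
# expected: cost_volume.shape == (B, H, W, n)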


You could use F.conv2d, which already computes a cross-correlation (flip the filter kernel if you need a true convolution instead).
Would that work or are you looking for another specific implementation?
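A minimal sketch of the distinction, using random tensors purely for illustration:

import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 8, 8)  # input map
k = torch.randn(1, 1, 3, 3)  # filter

xcorr = F.conv2d(x, k, padding=1)              # conv2d already computes a cross-correlation
conv = F.conv2d(x, k.flip(-2, -1), padding=1)  # flipped kernel yields a true convolution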

I see — it's not the basic signal-processing operation, but a way to compute a cost volume between feature maps. Conv2d only applies a sliding window to a single image, while the cost volume is computed from both feature maps.

How is this cost volume operation defined?
Could you give an example using two random feature maps?

Here is an example from PWC-Net: https://youtu.be/LBJ20kxr1a0?t=775
The main idea is to compute the correlation as a dot product between two feature vectors; a larger result indicates higher similarity.
The correlation can be computed either between each feature in map1 and all features in map2, or between each feature in map1 and its corresponding neighbors in map2.

It's a bit unclear from the video what the scalar output should represent.
A correlation between two signals would result in another signal, so are the authors using the peak of this correlation or another metric/coefficient?

Hi,
here is the definition from the PWC-Net paper:

[image: cost-volume definition from the PWC-Net paper — cv(x1, x2) = (1/N) * c1(x1)^T c2(x2), where N is the length of the feature (column) vector]

We can think of the two signals you mentioned as two feature maps from a CNN, say fmap1 [48, 128, 64] and fmap2 [48, 128, 64] with layout [H, W, C]. Then "result in another signal" would correspond to the cost volume.

The cost volume is computed like this:
For each feature vector fmap1[i, j, :] (C = 64 channels), find its corresponding neighbors in fmap2: fmap2[i-3, j-3, :], fmap2[i-2, j-3, :], …, fmap2[i, j, :], …, fmap2[i+3, j+3, :]. Take the dot product between fmap1[i, j, :] and each of these neighbor vectors; this gives N correlation values, where N is the number of pixels in the neighborhood. Concatenating these scalars yields a [1, 1, N] correlation feature. Finally, looping over all pixels in fmap1 gives a cost volume of shape [48, 128, N], which stores the correlation values and represents the similarity between fmap1 and fmap2.

Based on your description, this would be similar to a convolution (or a correlation, if you flip one kernel), which would be possible to apply using F.conv2d with two inputs.
However, I'm a bit confused by the shapes.
If we scale the problem down a bit and assume that both feature maps have a single channel, a full correlation would produce an output of shape [h1 + h2 - 1, w1 + w2 - 1], wouldn't it? These output values seem to correspond to the N correlation values you mentioned. In the last step you mention that you would repeat this over all pixels, which is unclear to me.
Could you post pseudocode or a dummy example in PyTorch or numpy?

This link from PWC-net might be helpful: https://github.com/deqings/PWC-Net/issues/1#issuecomment-404290729

Mathematically, correlation is a flipped convolution, but here it does not work that way. Conv2d takes one input feature map and convolves it with a user-defined kernel, whereas the cost volume involves two different feature maps. This is why optical-flow publications such as FlowNet, FlowNet2, and PWC-Net implement this "correlation" with custom CUDA code. If PyTorch provided an official Correlation or CostVolume API, it would be great for both research and industry.

Here is the CUDA and Python code from PWC-Net.


As Deqing Sun mentioned, two different cost volumes exist.

I wrote a naive implementation (using padding and slicing instead of grid_sample for the integer shifts):

import torch
import torch.nn.functional as F

C, H, W = 64, 48, 128
fmap1 = torch.ones(C, H, W)  # [C, H, W]
fmap2 = torch.ones(C, H, W)

# Full cost volume (all-pairs correlation):
cost_vol = torch.empty(H, W, H * W)
for i in range(H):
    for j in range(W):
        vec1 = fmap1[:, i, j]               # [C]
        corr_ij = vec1 @ fmap2.view(C, -1)  # [C] @ [C, H*W] -> [H*W]
        cost_vol[i, j] = corr_ij

# cost_vol: [H, W, H*W]

# Partial cost volume (neighborhood of radius r):
r = 3
fmap2_pad = F.pad(fmap2, (r, r, r, r))  # zero-pad so border pixels have full windows
cost_vol = torch.empty(H, W, (2 * r + 1) ** 2)
for i in range(H):
    for j in range(W):
        vec1 = fmap1[:, i, j]                                      # [C]
        # (2*r+1) x (2*r+1) window of fmap2 centered at (i, j)
        neighbor = fmap2_pad[:, i:i + 2 * r + 1, j:j + 2 * r + 1]  # [C, 2r+1, 2r+1]
        cost_vol[i, j] = vec1 @ neighbor.reshape(C, -1)            # [(2*r+1)**2] = [49]

# The first two dimensions of the final cost volume match fmap1's spatial shape.
# The remaining dimension indicates how many features in fmap2 each feature in
# fmap1 is correlated with: [H, W, H*W] is the all-pairs correlation (each feature
# in fmap1 against every feature in fmap2), while [H, W, 49] means each feature
# in fmap1 is correlated with a [7, 7] window in fmap2.
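As a side note, the Python loops above can be avoided; the all-pairs volume collapses into a single matrix multiplication (a sketch, reusing fmap1/fmap2 from above):

# all-pairs: every feature in fmap1 against every feature in fmap2
corr_all = fmap1.view(C, -1).t() @ fmap2.view(C, -1)  # [H*W, H*W]
corr_all = corr_all.view(H, W, H * W)  # same result as the loop version above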



I also need a built-in correlation function.

For various reasons (gcc 9.3.0, CUDA 10.1, and I do not have permission to change them), I cannot compile the existing PyTorch correlation_package.

It would be very helpful if there were such a built-in function.

Thank you.

It would be great to have this layer added to PyTorch.

I’m using Clement Pinard’s implementation: GitHub - ClementPinard/Pytorch-Correlation-extension: Custom implementation of Corrleation Module
It works with torch 1.7.0 and CUDA 11.1.
However, it needs to be compiled from source, which lowers the reproducibility of my code for other developers.

The correlation layer is essential in modern flow architectures (FlowNetC, PWC-Net, MaskFlowNet).
Adding this layer to PyTorch would also speed up adding it to runtime frameworks (ONNX, OpenVINO), so that we can compute optical flow on edge devices.

Would this correspond to it in pure PyTorch? It computes the inner product between the central pixel and every pixel in its neighborhood:

import torch
from einops import rearrange

def pixel_cost_volume(im1, im2, k1):
    """im1/im2: [b, c, h, w]

    -> produces a [b, h, w, k1**2] cost volume; each channel represents the
    inner product with a different discrete shift (i, j)
    """
    b, c, h, w = im1.shape
    unfold_op = torch.nn.Unfold(k1, dilation=1, padding=k1 // 2, stride=1)
    central_pixel = rearrange(im1, 'b c h w -> b h w c')
    neighbors = unfold_op(im2).reshape(b, -1, h, w)  # [b, c*k1*k1, h, w]
    neighbors = rearrange(neighbors, 'b (c k2) h w -> b h w k2 c', c=c)
    y = torch.einsum('bhwkc,bhwc->bhwk', neighbors, central_pixel)
    return y  # (b, h, w, search_range x search_range)
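A quick shape check (a sketch; k1 = 7 corresponds to a search radius of 3):

im1 = torch.randn(2, 64, 48, 128)  # [b, c, h, w]
im2 = torch.randn(2, 64, 48, 128)
cv = pixel_cost_volume(im1, im2, k1=7)
print(cv.shape)  # torch.Size([2, 48, 128, 49])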

A native PyTorch implementation would be great. Given two same-sized images, this layer is similar to a conv2d between image1 and a padded image2, but there are relevant differences: first, there are no learned weights; image2 takes the place of the filters. Second, the products are not summed over a spatial window; they are only averaged over the channels. Third, the result for each displacement is stored in a different output channel.
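A minimal sketch of that description, assuming a search radius r and zero padding at the borders (the function name is illustrative):

import torch
import torch.nn.functional as F

def cost_volume(im1, im2, r=3):
    # im1, im2: [B, C, H, W]; one output channel per displacement,
    # products averaged over the channels only (no learned weights)
    B, C, H, W = im1.shape
    k = 2 * r + 1
    # each entry of `patches` holds a k x k neighborhood of im2
    patches = F.unfold(im2, k, padding=r).view(B, C, k * k, H, W)
    return (im1.unsqueeze(2) * patches).mean(dim=1)  # [B, k*k, H, W]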

This layer is very useful for optical flow.

To what extent is this topic a duplicate of Request for correlation layer?

Has this layer been implemented in PyTorch as of today? I see that RAFT is available as a model in torchvision.
RAFT uses the correlation layer to compute a 4D cost volume. The optical flow community would really appreciate it if this layer were added to PyTorch.
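For the all-pairs case, the 4D volume can already be expressed with existing operators; a sketch (the 1/sqrt(C) scaling follows the RAFT paper, and the shapes are illustrative):

import torch

fmap1 = torch.randn(2, 256, 48, 64)  # [B, C, H, W]
fmap2 = torch.randn(2, 256, 48, 64)
C = fmap1.shape[1]

# 4D all-pairs cost volume: every pixel in fmap1 against every pixel in fmap2
corr = torch.einsum('bchw,bcij->bhwij', fmap1, fmap2) / C ** 0.5
print(corr.shape)  # torch.Size([2, 48, 64, 48, 64])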


@ptrblck Are there any PyTorch-native implementations yet? Or is there any custom implementation PyTorch would recommend? I am currently trying to compile it from this repo: GitHub - NVlabs/PWC-Net: PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume, CVPR 2018 (Oral)
Unfortunately, it was developed for a very old version, and I am not able to use newer CUDA or PyTorch versions.

torch.sparse.sampled_addmm might be related (if we explicitly compute the neighborhood indices; I don't know whether it's good performance-wise, as a dedicated 2D/3D operator with explicit local neighborhoods might allow for better performance than the generic case): Implementation of torch.sparse.sampled_baddmm · Issue #105319 · pytorch/pytorch · GitHub

Also, please feel free to chime in on [question] Local video correlation (with temporal context length > 1) · Issue #148 · getkeops/keops · GitHub :) I think it might be possible to accomplish this in KeOps.