Hi all,
I want to extract video features. Is there a pre-trained C3D (https://arxiv.org/abs/1412.0767) network available?
Thanks
Hi, Gkv,
There are PyTorch implementations of the more advanced I3D and P3D networks.
P3D: Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks, ICCV 2017
I3D: Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset, CVPR 2017
Hi zhanghaoinf,
Thanks for your kind reply. I am very new to PyTorch. Can you provide some links that show how to use these kinds of implementations in my PyTorch code? (I know how to load models using torchvision.models.)
Thanks
I have tested P3D-Pytorch. It's pretty simple and should follow a similar process to I3D.
- Pre-process: for each frame in a clip, there is a pre-processing step: subtract the mean and divide by the std.
An example:
import cv2
import numpy as np
mean = (104 / 255.0, 117 / 255.0, 123 / 255.0)  # per-channel means (BGR order)
std = (0.225, 0.224, 0.229)                     # per-channel stds (BGR order)
frame = cv2.imread('a string to image path')    # OpenCV loads uint8, BGR
frame = frame.astype(np.float32)                # cast before in-place float ops
frame /= 255.0  # [0, 255] -> [0, 1]
frame -= mean   # subtract means
frame /= std    # divide by std
frame = frame[:, :, (2, 1, 0)]  # convert BGR image to RGB image
- A clip is a stack of frames, each of size H x W x 3, so a clip has size T x H x W x 3.
Since P3D requires input in the form 3 x T x H x W, perform:
clip = clip.permute(3, 0, 1, 2).contiguous()
- Load the P3D model
(P3D, Bottleneck, and p3d_model_path can be found in the mentioned GitHub repo.)
model = P3D(Bottleneck, [3, 8, 36, 3], modality='RGB')
p3d_weights = torch.load('a string to p3d model path')['state_dict']
model.load_state_dict(p3d_weights)
out = model(data)
print(out.size(), out)
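Putting the steps above together, here is a minimal shape-bookkeeping sketch of how a clip tensor is assembled for the forward pass. The clip geometry (T = 16 frames at 160 x 160) is an assumption; check the repo for the expected input size. Random arrays stand in for the preprocessed frames, and the actual model call is left as a comment since the weights are not loaded here.

```python
import numpy as np
import torch

# Assumed clip geometry (check the repo's expected input size).
T, H, W = 16, 160, 160

# Stand-ins for T preprocessed frames, each H x W x 3 as produced above.
frames = [np.random.rand(H, W, 3).astype(np.float32) for _ in range(T)]

clip = torch.from_numpy(np.stack(frames))     # T x H x W x 3
clip = clip.permute(3, 0, 1, 2).contiguous()  # 3 x T x H x W
data = clip.unsqueeze(0)                      # 1 x 3 x T x H x W (batch of one)
print(data.shape)  # torch.Size([1, 3, 16, 160, 160])

# With the real model loaded as shown above, inference would look like:
# model.eval()
# with torch.no_grad():
#     out = model(data)
```

Note the unsqueeze(0): the model's forward pass expects a leading batch dimension, so a single clip becomes a batch of one.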
Hi,
Thanks a lot for your fast reply. One more doubt: what are these modalities (RGB and Flow)?
Thanks
The RGB modality refers to frames as directly captured by the camera.
Flow refers to optical flow computed between adjacent frames, which reflects motion information.
So in P3D, the Flow modality is for getting the motion description of the video and RGB is for the spatial features?
Yes, you are right.
One more thing: optical flow can reflect motion patterns to some extent, but it is different from motion descriptors like dense trajectories. There exist some implementations to extract optical flow, such as fast-flow, brox-flow, and TV-L1. I didn't study this part too much. Here are some references:
[1]. FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks [FlowNet v2], CVPR '17
[2]. FlowNet: Learning Optical Flow with Convolutional Networks [FlowNet v1], ICCV '15
[3]. Fast Optical Flow Using Dense Inverse Search [fast-flow], ECCV '16
[4]. TV-L1 Optical Flow Estimation [TV-L1], Image Processing On Line '13
[5]. High Accuracy Optical Flow Estimation Based on a Theory for Warping [brox-flow], ECCV '04
Hi zhanghaoinf,
Can the model easily run on a single GPU, or does it require multiple GPUs?
Hi, I met the same question as you, and I found a PyTorch implementation of C3D. The link follows:
https://github.com/DavideA/c3d-pytorch
Greetings!
I am interested in replicating the C3D paper by Du Tran. The original repository is in Caffe. Could you please share a fast.ai or PyTorch implementation of the same?
I was able to find the following resources:
---------------
PyTorch-Video-Recognition
Repository containing models for video action recognition, including C3D, R2Plus1D, and R3D, implemented using PyTorch (0.4.0)
Trained on UCF101 and HMDB51 datasets
---------------
Pytorch porting of C3D network, with Sports1M weights
Defining the C3D model as per the paper, not the complete implementation
---------------
I am aware that developments have been made in the field of action recognition since 2015, but I am specifically interested in the C3D paper by FAIR.
Thanking you (all) in anticipation