Video clip data

Dear PyTorch users,

I have a dataset that contains a sequence of video clips and a list of labels.

e.g.
video clips -> clip0, clip1, clip2
labels -> 0, 1
the first label (0) means clip0 and clip1 should not be merged
the second label (1) means clip1 and clip2 should be merged
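
To make the structure concrete, here is a minimal sketch of how the clips and labels could be lined up, assuming each label belongs to a pair of neighbouring clips. The class name `AdjacentClipPairs` is hypothetical, and the clips are plain strings standing in for real video tensors. Since it only defines `__len__` and `__getitem__`, it should be usable like a map-style dataset, but that part is untested here.

```python
# Hypothetical helper: pair each clip with the clip that follows it,
# so that N clips give N - 1 samples, one per label.
class AdjacentClipPairs:
    def __init__(self, clips, labels):
        # Assumption: label i describes the pair (clip i, clip i + 1).
        if len(labels) != len(clips) - 1:
            raise ValueError("expected one label per adjacent clip pair")
        self.clips = clips
        self.labels = labels

    def __len__(self):
        # Number of samples equals the number of labels, not clips.
        return len(self.labels)

    def __getitem__(self, i):
        if not 0 <= i < len(self.labels):
            raise IndexError(i)
        return self.clips[i], self.clips[i + 1], self.labels[i]


# Placeholder strings stand in for the actual clip data.
pairs = AdjacentClipPairs(["clip0", "clip1", "clip2"], [0, 1])
for first, second, label in pairs:
    print(first, second, "merge" if label == 1 else "do not merge")
```

Seen this way, every sample is a pair of clips with one binary label, so the number of samples and the number of labels match.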

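Is there a model or architecture that fits this kind of dataset? Note that the number of clips and the number of labels are not equal: each label describes a pair of adjacent clips, so in the example above 3 clips come with 2 labels.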

Thank you.