Fusing ResNet3D models (RGB + optical flow)

Please advise.

I trained a ResNet50 (mmaction2) on RGB frames and got 90% on validation, and another on optical flow that got 88%. Both were initialized from torchvision pretrained weights. When I fuse the two models with a new head, the fused model does not beat either model alone: the best I get is 85%. I expected that concatenating the spatial (RGB) and temporal (optical flow) features would make the model better.
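To make the setup concrete, here is a minimal sketch of the kind of fusion described above: the pooled feature vector from each stream is concatenated and passed to a single new linear head. It is plain Python with toy sizes, not the actual mmaction2 configuration; the function names, the 4-dimensional features, and the 3 classes are all assumptions made for illustration.

```python
import random


def concat_features(rgb_feat, flow_feat):
    """Join the spatial (RGB) and temporal (flow) feature vectors end to end."""
    return list(rgb_feat) + list(flow_feat)


def linear_head(features, weights, bias):
    """One fully connected layer: returns one logit per class."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]


def fuse_and_classify(rgb_feat, flow_feat, weights, bias):
    """Concatenate both streams, apply the new head, return the argmax class."""
    fused = concat_features(rgb_feat, flow_feat)
    logits = linear_head(fused, weights, bias)
    return max(range(len(logits)), key=logits.__getitem__)


# Toy sizes (hypothetical): 4-dim features per stream, 3 classes.
rng = random.Random(0)
rgb_feat = [rng.gauss(0, 1) for _ in range(4)]
flow_feat = [rng.gauss(0, 1) for _ in range(4)]
weights = [[rng.gauss(0, 0.1) for _ in range(8)] for _ in range(3)]
bias = [0.0] * 3

pred = fuse_and_classify(rgb_feat, flow_feat, weights, bias)
print("predicted class:", pred)
```

In the real setup the two feature vectors would come from the RGB and optical flow backbones, and the head weights would be learned rather than random.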

How can I improve it?