What's the disadvantage of using two smaller kernels instead of one larger kernel?

Small kernels, as used in VGG, have been shown to be effective at capturing rich context information. We also use larger kernels to cover a larger region of a tensor. Instead of a single 5x5 kernel, we can use two cascaded conv layers with kernel sizes (5,1) and (1,5), or two cascaded 3x3 conv layers. Both methods reduce the computational cost. What's the difference between them? Is there any disadvantage to using two conv layers?
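To make the comparison concrete, here is a small pure-Python sketch that counts weights, multiply-accumulates and receptive field for the three options. It assumes (hypothetically) stride 1, "same" padding, no bias, 64 input and output channels and a 56x56 feature map; the helper names (`conv_params`, `conv_macs`, `stacked_receptive_field`, `compose_kernels`) are made up for illustration and do not come from any framework. `compose_kernels` computes the single kernel equivalent to two stacked single-channel linear layers, i.e. ignoring any activation between them.

```python
# Sketch: compare a 5x5 conv with two factorized alternatives.
# Assumptions (hypothetical, for illustration only): stride 1, "same" padding,
# no bias, C input channels == C output channels, no activation between layers.

def conv_params(kh, kw, c_in, c_out):
    """Weights in one conv layer with a kh x kw kernel (bias ignored)."""
    return kh * kw * c_in * c_out


def conv_macs(kh, kw, c_in, c_out, h, w):
    """Multiply-accumulates for a stride-1, 'same'-padded conv on an h x w map."""
    return conv_params(kh, kw, c_in, c_out) * h * w


def stacked_receptive_field(kernel_sizes):
    """Receptive field along one axis of stride-1 convs applied in sequence."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf


def compose_kernels(a, b):
    """Full 2D convolution of kernels a and b: the single kernel equivalent to
    applying a and then b (single channel, no bias, no nonlinearity)."""
    out = [[0] * (len(a[0]) + len(b[0]) - 1) for _ in range(len(a) + len(b) - 1)]
    for i, a_row in enumerate(a):
        for j, a_val in enumerate(a_row):
            for p, b_row in enumerate(b):
                for q, b_val in enumerate(b_row):
                    out[i + p][j + q] += a_val * b_val
    return out


C, H, W = 64, 56, 56  # hypothetical channel count and feature-map size

designs = {
    "5x5": [(5, 5)],
    "5x1 + 1x5": [(5, 1), (1, 5)],
    "3x3 + 3x3": [(3, 3), (3, 3)],
}

for name, layers in designs.items():
    params = sum(conv_params(kh, kw, C, C) for kh, kw in layers)
    macs = sum(conv_macs(kh, kw, C, C, H, W) for kh, kw in layers)
    rf_h = stacked_receptive_field([kh for kh, _ in layers])
    rf_w = stacked_receptive_field([kw for _, kw in layers])
    print(f"{name:>9}: params={params:>7,}  MACs={macs:>11,}  RF={rf_h}x{rf_w}")

# Without an activation in between, (5,1) followed by (1,5) collapses to a
# 5x5 kernel that is the outer product of the two 1D kernels (rank 1).
col = [[1], [2], [3], [2], [1]]  # 5x1 kernel (arbitrary example values)
row = [[1, 0, -1, 0, 1]]         # 1x5 kernel (arbitrary example values)
for r in compose_kernels(col, row):
    print(r)

# Likewise, two stacked 3x3 kernels collapse to a single 5x5 kernel.
ones3 = [[1, 1, 1] for _ in range(3)]
print(compose_kernels(ones3, ones3)[2])  # middle row -> [3, 6, 9, 6, 3]
```

With these assumed sizes, the (5,1)+(1,5) pair uses 10/25 of the 5x5 weights and the two 3x3 layers use 18/25, while all three options cover a 5x5 receptive field. The `compose_kernels` part hints at the trade-off: with fewer weights, a purely linear stack can only represent a restricted family of 5x5 kernels (in the single-channel case, only rank-1 kernels for the (5,1)/(1,5) pair). Inserting a nonlinearity between the layers, as VGG does, makes the stack more than a single linear filter.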