Take a look at the original paper for more insight.
For focal loss:
"In practice α may be set by inverse class frequency or treated as a hyperparameter to set by cross validation"
Also, your problem is not highly imbalanced. You can use weighted cross entropy and get good performance.
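To make both suggestions concrete, here is a minimal pure-Python sketch. It assumes a hypothetical 90/10 binary label split and uniform predicted probabilities, neither of which comes from your data. The helper names, the normalization of the weights to sum to 1, and dividing the loss by the total weight are illustrative choices, not something the paper prescribes. The focal loss follows the paper's form, -α_t (1 - p_t)^γ log(p_t), with γ = 2 as an assumed default.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Class weights proportional to 1 / class count, normalized to sum to 1 (an assumption)."""
    counts = Counter(labels)
    inverse = {cls: 1.0 / n for cls, n in counts.items()}
    total = sum(inverse.values())
    return {cls: w / total for cls, w in inverse.items()}


def weighted_cross_entropy(probs, labels, class_weights):
    """Weighted average of -log(p[y]), with each example weighted by its class weight."""
    loss_sum, weight_sum = 0.0, 0.0
    for p, y in zip(probs, labels):
        w = class_weights[y]
        loss_sum += -w * math.log(p[y])
        weight_sum += w
    return loss_sum / weight_sum


def focal_loss(p_true, alpha_t, gamma=2.0):
    """Focal loss for one example, where p_true is the probability of the true class."""
    return -alpha_t * (1.0 - p_true) ** gamma * math.log(p_true)


# Hypothetical data: 90 examples of class 0, 10 of class 1.
labels = [0] * 90 + [1] * 10
weights = inverse_frequency_weights(labels)

# Placeholder predictions: every example gets probability 0.5 for each class.
probs = [[0.5, 0.5] for _ in labels]
wce = weighted_cross_entropy(probs, labels, weights)

# With gamma = 0 and alpha_t = 1, focal loss reduces to plain cross entropy.
plain = focal_loss(0.5, alpha_t=1.0, gamma=0.0)
```

In this sketch the rare class ends up with nine times the weight of the common one. You can pass the same weights as α in the focal loss, or tune α by cross validation as the quote suggests.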