Best way to find a threshold for multilabel action recognition after evaluating with MultilabelAveragePrecision

I trained a multilabel action recognition model in PyTorch with BCEWithLogitsLoss as the criterion, and I evaluated it with MultilabelAveragePrecision(num_labels=num_classes, average='micro'). However, MultilabelAveragePrecision only works when the target labels are available. Since I would like to run the model on new, unlabeled videos, I believe I need a thresholding scheme. How can I obtain the best threshold, such as the one MultilabelAveragePrecision uses to produce its results? Or is there another way to fire only those actions that are most likely to be present in the clip?
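To make the question concrete, here is a minimal sketch of the kind of thresholding I have in mind. It rests on assumptions that are not part of my current setup: a labelled validation set is available, logits are passed through a sigmoid, and each label gets its own threshold chosen by maximising F1 over a fixed grid. As far as I understand, average precision summarises the precision-recall curve over all thresholds rather than selecting one, so the thresholds would have to be tuned separately like this. The names `best_thresholds`, `predict_labels`, and `top_k_labels` are hypothetical helpers, and the arrays are toy data; real logits would come from the model (e.g. via `.cpu().numpy()`).

```python
import numpy as np

def sigmoid(logits):
    # Convert raw logits (as produced for BCEWithLogitsLoss) to probabilities.
    return 1.0 / (1.0 + np.exp(-logits))

def best_thresholds(val_logits, val_targets, candidates=None):
    """Pick one threshold per label by maximising F1 on a labelled validation set."""
    if candidates is None:
        candidates = np.linspace(0.05, 0.95, 19)  # assumed search grid
    probs = sigmoid(val_logits)
    num_labels = probs.shape[1]
    thresholds = np.full(num_labels, 0.5)
    for c in range(num_labels):
        true = val_targets[:, c] == 1
        best_f1 = -1.0
        for t in candidates:
            pred = probs[:, c] >= t
            tp = np.sum(pred & true)
            fp = np.sum(pred & ~true)
            fn = np.sum(~pred & true)
            denom = 2 * tp + fp + fn
            f1 = 2 * tp / denom if denom > 0 else 0.0
            if f1 > best_f1:  # keep the first threshold reaching the best F1
                best_f1, thresholds[c] = f1, t
    return thresholds

def predict_labels(logits, thresholds):
    # Fire a label when its probability reaches that label's threshold.
    return (sigmoid(logits) >= thresholds).astype(int)

def top_k_labels(logits, k):
    # Alternative without thresholds: indices of the k highest-scoring labels.
    return np.argsort(-logits, axis=1)[:, :k]

# Toy validation data: 4 clips, 3 action labels (made up for illustration).
val_logits = np.array([[2.0, -1.0, 0.3],
                       [1.5, -2.0, -0.2],
                       [-1.0, 1.0, 0.1],
                       [-2.0, 0.5, -1.5]])
val_targets = np.array([[1, 0, 1],
                        [1, 0, 0],
                        [0, 1, 1],
                        [0, 1, 0]])

thresholds = best_thresholds(val_logits, val_targets)
val_predictions = predict_labels(val_logits, thresholds)
top1 = top_k_labels(val_logits, k=1)
```

On unlabeled clips, only `predict_labels` (or `top_k_labels`) would be called, reusing the thresholds found on the validation set. Maximising F1 is just one possible criterion; a precision or recall target per label could be substituted in the same loop.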