How can I make the distance threshold in my k-means centroid initialization trainable?

I need to run k-means clustering on a set of points. I initialize the centroids greedily: the first centroid is the one that covers the most points within a given distance, the second is the one that covers the most points not already covered by the first, and so on. I would like to make this distance parameter trainable, that is, learned by gradient descent rather than fixed by hand. How can I achieve this? Thanks in advance to anyone who takes the time to answer!
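For reference, here is a minimal sketch of the initialization described above. It assumes the candidate centroids are the data points themselves and that ties go to the first candidate; the names `greedy_coverage_init` and `radius` are just for illustration, and my real code may differ in details.

```python
import math

def greedy_coverage_init(points, k, radius):
    """Pick up to k centroids; each pick covers the most still-uncovered points."""
    uncovered = set(range(len(points)))
    centroids = []
    for _ in range(k):
        if not uncovered:
            break  # everything is covered already, so fewer than k centroids are returned
        best_idx, best_cover = None, set()
        for i, p in enumerate(points):
            # Hard threshold: a point counts as covered only if it lies within `radius`.
            cover = {j for j in uncovered if math.dist(p, points[j]) <= radius}
            if best_idx is None or len(cover) > len(best_cover):
                best_idx, best_cover = i, cover
        centroids.append(points[best_idx])
        uncovered -= best_cover
    return centroids
```

The `<= radius` comparison is the part I want to learn. As written it is a step function, so the coverage count has zero gradient with respect to `radius` almost everywhere.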

My idea is to replace the hard distance threshold with a sigmoid, which acts as a smooth, differentiable approximation of the cutoff, so that gradients can flow back to the distance parameter. Is this feasible?
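Concretely, what I have in mind is something like the sketch below. It replaces the hard count of covered points with a sum of sigmoids and returns the derivative with respect to the distance parameter. The function name `soft_coverage` and the `temperature` parameter (which controls how sharp the cutoff is) are my own assumptions, not part of any library.

```python
import math

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def soft_coverage(center, points, radius, temperature=0.1):
    """Return (soft coverage count, derivative of that count w.r.t. radius)."""
    count, grad = 0.0, 0.0
    for p in points:
        # Close to 1 well inside the radius, close to 0 well outside, 0.5 on the boundary.
        s = sigmoid((radius - math.dist(center, p)) / temperature)
        count += s
        # d/d(radius) of sigmoid((radius - d) / temperature)
        grad += s * (1.0 - s) / temperature
    return count, grad
```

This only smooths the coverage count. The greedy step that picks the best centroid is still a discrete argmax, so I am not sure whether the gradient is meaningful end to end, or what loss the distance parameter should be trained against.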