Is the Kernel Size a Differentiable Parameter?

Can I use the output of a network to choose the kernel size for use later in the network? Would this be differentiable, i.e., will a network be able to learn the optimal pooling size passed into an avg pooling layer? Could someone provide an example of this?

My data has a special property: the amount of binning is crucial. Small bins reduce performance, while larger bins can obscure the data (the size of which varies).

You can’t take a derivative with respect to a discrete parameter. The kernel size is an integer, so there is no gradient for the network to follow.
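To illustrate, here is a minimal sketch of what goes wrong. The name `size_logit` is hypothetical and stands in for whatever your network outputs. The pooling call needs a plain Python int, so the value has to leave the autograd graph before it can be used as a kernel size.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 16)

# Hypothetical stand-in for a network output that should pick the kernel size.
size_logit = torch.tensor(2.7, requires_grad=True)

# avg_pool1d takes kernel_size as a Python int, so we must round and call
# .item(), which detaches the value from the autograd graph.
k = int(size_logit.round().item())
y = F.avg_pool1d(x, kernel_size=k)

# Nothing connects y back to size_logit, so no gradient can reach it.
print(k)                # 3
print(y.requires_grad)  # False
```

Even if you stayed inside the graph, rounding is piecewise constant, so its gradient would be zero almost everywhere.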

@SimonW So is there no way to, in effect, learn the bin size for a given set of data? I’m trying to think of a way to compute this with PyTorch operations instead of pooling, but I can’t think of anything.

It just doesn’t work that way mathematically, unless of course you soften the constraint and smooth the search space somehow.
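One possible way to soften the constraint, as a rough sketch rather than a recommended recipe: run average pooling at several candidate kernel sizes and blend the results with softmax weights that the network learns. The class name `SoftKernelAvgPool1d` and the candidate sizes are made up for illustration. It assumes 1D input of shape (batch, channels, length) and odd kernel sizes with stride 1, so that every candidate output has the same length and can be summed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SoftKernelAvgPool1d(nn.Module):
    """Softmax-weighted blend of average pools with different kernel sizes."""

    def __init__(self, kernel_sizes=(1, 3, 5, 7)):
        super().__init__()
        # Odd sizes with stride 1 and padding k // 2 keep the length unchanged,
        # so the candidate outputs can be added elementwise.
        assert all(k % 2 == 1 for k in kernel_sizes)
        self.kernel_sizes = tuple(kernel_sizes)
        # One learnable logit per candidate kernel size (illustrative choice).
        self.logits = nn.Parameter(torch.zeros(len(self.kernel_sizes)))

    def forward(self, x):  # x: (batch, channels, length)
        weights = torch.softmax(self.logits, dim=0)
        pooled = [
            F.avg_pool1d(x, kernel_size=k, stride=1, padding=k // 2)
            for k in self.kernel_sizes
        ]
        # Continuous mixture: differentiable with respect to the logits.
        return sum(w * p for w, p in zip(weights, pooled))


pool = SoftKernelAvgPool1d()
x = torch.randn(2, 4, 32)
y = pool(x)
y.pow(2).mean().backward()  # dummy loss, just to show gradients flow

print(y.shape)           # same shape as x
print(pool.logits.grad)  # the kernel-size logits now receive a gradient
```

After training, the largest logit points at the kernel size the model preferred, and you could swap in a plain pooling layer with that size. Note that this learns a preference among fixed candidates, not an arbitrary integer, and the zero padding at the edges slightly biases the averages there.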