DropConnect with changing probabilities

In this previous thread, an implementation of DropConnect is discussed. The recommended method assumes that the dropout probability remains constant. I am now trying to extend this to the case where every parameter has its own dropout probability and the dropout probabilities are updated during training according to some learning rule.

In my attempt so far, I register an additional buffer for each module that contains the dropout probabilities. However, I then have trouble updating the probabilities during training. To update them, I access the buffer like this:

old_probs = getattr(module, param_name + "_prob")
new_probs = learning_rule(old_probs)
setattr(module, param_name + "_prob", new_probs)

The updates don’t seem to work as intended. Can someone tell me whether this is a reasonable approach, or if I am missing something?
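For context, this is roughly how I run the update during training (a sketch only; model stands in for my network and learning_rule for my update rule):

for module in model.modules():
    # every probability buffer registered by _weight_drop() ends in '_prob'
    for buf_name, _ in list(module.named_buffers(recurse=False)):
        if buf_name.endswith('_prob'):
            old_probs = getattr(module, buf_name)
            new_probs = learning_rule(old_probs)
            setattr(module, buf_name, new_probs)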

Here is the slightly modified _weight_drop() function with the additional buffer:

import torch
from torch.nn import Parameter


def _weight_drop(module, weights, dropout):
    """
    Helper for `WeightDrop`.
    """

    for name_w in weights:
        w = getattr(module, name_w)
        # replace the original parameter with a '*_raw' copy and register a
        # per-element drop probability tensor alongside it
        del module._parameters[name_w]
        module.register_parameter(name_w + '_raw', Parameter(w))
        module.register_buffer(name_w + '_prob', torch.full_like(w, dropout))

    original_module_forward = module.forward

    def forward(*args, **kwargs):
        for name_w in weights:
            raw_w = getattr(module, name_w + '_raw')
            prob_w = getattr(module, name_w + '_prob')
            # sample a per-element keep mask and apply it to the raw weights
            w = torch.bernoulli(1.0 - prob_w) * raw_w
            setattr(module, name_w, w)

        return original_module_forward(*args, **kwargs)

    setattr(module, 'forward', forward)
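As a usage sketch, applying the helper directly to a single layer (without the WeightDrop wrapper from the original thread):

layer = torch.nn.Linear(10, 5)
_weight_drop(layer, ['weight'], dropout=0.2)

x = torch.randn(3, 10)
out = layer(x)  # the weight is re-masked on every forward pass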

Could you explain in more detail what you’re trying to do and what’s not working? Try using my simpler method to do DropConnect (instead of the “recommended” one).

@smonsays overall it looks a bit overcomplicated. On the changing-probabilities issue, I’d do it one of two ways (rough sketches below the list):

  1. encapsulate the DropConnect in its own module that has the drop prob as a scalar attribute, iterate over the model's modules when you want to change it, and adjust that scalar for your DropConnect modules.
  2. rely on Python passing objects by reference: wrap the scalar drop value in a class with a float conversion and call float(drop_prob) when you want to use it. You can implement your DropConnect as a functional or a module.
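Rough sketches of both (class and function names here are illustrative, not actual library code):

import torch
import torch.nn as nn
import torch.nn.functional as F


# Option 1: a dedicated DropConnect layer that keeps the drop prob as a plain scalar.
class DropConnectLinear(nn.Linear):
    def __init__(self, in_features, out_features, drop_prob=0.0, **kwargs):
        super().__init__(in_features, out_features, **kwargs)
        self.drop_prob = drop_prob  # plain Python float, not a buffer

    def forward(self, x):
        if self.training and self.drop_prob > 0:
            # sample a keep mask directly on the weight's device/dtype
            mask = torch.empty_like(self.weight).bernoulli_(1.0 - self.drop_prob)
            return F.linear(x, self.weight * mask, self.bias)
        return super().forward(x)


def set_drop_prob(model, new_prob):
    # walk the model and adjust the scalar on every DropConnect module
    for m in model.modules():
        if isinstance(m, DropConnectLinear):
            m.drop_prob = new_prob


# Option 2: share one mutable holder between layers and read it via float().
class DropProb:
    def __init__(self, value):
        self.value = value

    def __float__(self):
        return float(self.value)

With option 2, every layer keeps a reference to the same DropProb instance, so setting drop_prob.value in the training loop is immediately visible wherever float(drop_prob) is read.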

In both of the above, I’d use torch.empty(tensor_size).bernoulli_(scalar_prob) instead of torch.bernoulli, which requires a tensor for the prob arg and thus forces you to match device, etc. Also, I don’t think the prob should be a buffer; it will end up in the state dict, and it is a training detail in most situations.
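For example (a sketch; weight and drop_prob are placeholders for your parameter tensor and scalar drop probability):

import torch

weight = torch.randn(5, 10)  # any parameter tensor
drop_prob = 0.2              # plain Python scalar

# mask is created directly with weight's device/dtype, no prob tensor needed
mask = torch.empty_like(weight).bernoulli_(1.0 - drop_prob)
dropped_w = weight * mask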

Thank you for your responses.

@michaelklachko: I need the flexibility of being agnostic to the actual module to which DropConnect is applied. That is, it should work for any torch.nn module that has a weight parameter (nn.Linear, nn.Conv2d, etc.). Your simpler method is not really simpler in my case, as it requires me to put “module code” into the training loop, which I would like to avoid in order to keep the usage generic.

@rwightman: Your suggestions assume that I have a single scalar drop_prob value, but in my case I need to control this parameter for every weight individually. That is why I am using the full tensor. I register it as a buffer to make sure that PyTorch properly moves it to the correct device when to() is called.

@smonsays ah, I read it as you wanting to just change the value as training progresses, not also have multiple different values per layer. You could still achieve that with scalars, with either 1 or 2, as long as you have a mapping of layer names to desired probabilities wherever you do the updates.
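Something along these lines, reusing the DropConnectLinear sketch from above (model and the name-to-probability mapping are placeholders):

# per-layer drop probabilities keyed by module name (values are just examples)
prob_by_layer = {'encoder.fc1': 0.1, 'encoder.fc2': 0.3}

for name, m in model.named_modules():
    if isinstance(m, DropConnectLinear) and name in prob_by_layer:
        m.drop_prob = prob_by_layer[name]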