What kind of filters are used with Conv2d?

I couldn’t find any info on this online: what are the default filters used by Conv2d?

I think filters are initialized with random values and they are updated via backpropagation.

Wait, the filter values are updated, meaning they likely change every epoch?

As per my intuition, they should update every time we call loss.backward().
Am I heading in the right direction, experts?

loss.backward() calculates the gradients of all parameters in the current computation graph (including the filter weight gradients). The weights are updated when you call optimizer.step() (assuming you passed these parameters to the optimizer beforehand) or when you update the weights manually using the gradients.
@modeler in fact the weights change in every iteration if the gradients are non-zero.
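
A minimal sketch of a single training iteration to show where this happens (the model, target shapes, and learning rate are just placeholders):

```
import torch
import torch.nn as nn

model = nn.Conv2d(3, 16, kernel_size=3)  # filters start with random values
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()

x = torch.randn(8, 3, 32, 32)
target = torch.randn(8, 16, 30, 30)

optimizer.zero_grad()
loss = criterion(model(x), target)
loss.backward()   # computes the gradients (including the filter weight gradients)
optimizer.step()  # updates the weights using these gradients
```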

Oh, yes! Sorry for the confusion I created, @modeler.
Thanks for the answer @ptrblck.

What is the reasoning for a randomized initialization as opposed to some non-zero constant initialization?

I think symmetry breaking might be one reason, although the problem of equal outputs won’t be as serious as in linear layers.
E.g. if you initialize a linear layer with some constant weight, each output will have the same value. Later, in the backward pass, this could create the same weight update for each parameter. No weight can then learn anything “new”, and you would have a whole layer of cloned neurons.
Random initialization breaks this symmetry.
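
Here is a small sketch demonstrating this (the constant value 0.5 is arbitrary):

```
import torch
import torch.nn as nn

lin = nn.Linear(4, 3)
nn.init.constant_(lin.weight, 0.5)
nn.init.constant_(lin.bias, 0.0)

x = torch.randn(1, 4)
out = lin(x)
print(out)  # all three outputs have the same value

out.sum().backward()
print(lin.weight.grad)  # every row of the gradient is identical as well
```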

Also, another reason might be that constant values could bias your model towards a particular solution, which can be useful if you know what you are doing.


In addition to Peter’s spot-on comments about symmetry breaking, there is the lottery ticket hypothesis, roughly speaking the theory that (overparametrised by traditional standards) NNs are “looking in many places of the parameter landscape, thereby picking up some useful ones”.

Weight initialization in particular is something that has been identified as fairly important, and I can recommend spending thought on it - PyTorch inherits the initializations mostly from Torch, and they might not always reflect the latest advice on how to do it. Most stock modules have a method reset_parameters that applies the default initialization (e.g. run ?? torch.nn.Conv2d.reset_parameters to see the source in IPython/Jupyter).
In contrast to weight, bias can, in my experience, often just be zeroed.
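
E.g. a small sketch of overriding the defaults after construction (kaiming_normal_ here is just one possible choice, not the stock initialization):

```
import torch.nn as nn

conv = nn.Conv2d(3, 16, kernel_size=3)  # reset_parameters has already run in __init__

nn.init.kaiming_normal_(conv.weight, nonlinearity='relu')  # custom weight init
nn.init.zeros_(conv.bias)                                  # zeroed bias

conv.reset_parameters()  # restores the stock initialization if needed
```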

Best regards

Thomas


Hello ptrblck,

I hope you are well. Sorry, I have a question about initializing the filters in PyTorch. How can I specify them randomly from a Gaussian distribution? Are they drawn from a Gaussian distribution by default?

The filters in nn.Conv2d are initialized by reset_parameters as @tom mentioned.

To initialize them with a Gaussian distribution, you could use torch.nn.init.normal_.
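
For a single layer this could look like the following (the mean and std values are just examples):

```
import torch.nn as nn

conv = nn.Conv2d(3, 16, kernel_size=3)
nn.init.normal_(conv.weight, mean=0.0, std=0.02)
```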

I used Conv3d. Where can I add this command in my code?

```
class ConvNetRedo1(nn.Module):
    # numf1/numf2: number of filters in the first/second conv layer;
    # fz1/fz2: kernel sizes; nn2/nn3: sizes of the fully connected layers
    def __init__(self, numf1, numf2, fz1, fz2, nn2, nn3):
        super(ConvNetRedo1, self).__init__()
        self.numf1 = numf1
        self.numf2 = numf2
        self.fz1 = fz1
        self.fz2 = fz2
        self.nn2 = nn2
        self.nn3 = nn3
        self.layer1 = nn.Sequential(
            nn.Conv3d(1, self.numf1, kernel_size=self.fz1, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=2, stride=2))
        self.layer2 = nn.Sequential(
            nn.Conv3d(self.numf1, self.numf2, kernel_size=self.fz2, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=2, stride=2))

        self.fc1 = nn.Linear(3072, self.nn2)
        self.drop_out1 = nn.Dropout(0.5)
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Linear(self.nn2, self.nn3)
        self.drop_out2 = nn.Dropout(0.5)
        self.relu2 = nn.ReLU()
        self.fc3 = nn.Linear(self.nn3, 1)
```

You could write a weight_init function and call model.apply with it:

```
def weights_init(m):
    with torch.no_grad():
        if isinstance(m, nn.Conv3d):
            torch.nn.init.normal_(m.weight)
            torch.nn.init.normal_(m.bias)

net.apply(weights_init)
```

Is it correct to use it like this?

```
def weights_init(m):
    with torch.no_grad():
        if isinstance(m, nn.Conv3d):
            torch.nn.init.normal_(m.weight)
            torch.nn.init.normal_(m.bias)
```

```
class ConvNetRedo1(nn.Module):
    def __init__(self, numf1, numf2, fz1, fz2, nn2, nn3):
        super(ConvNetRedo1, self).__init__()
        self.numf1 = numf1
        self.numf2 = numf2
        self.fz1 = fz1
        self.fz2 = fz2
        self.nn2 = nn2
        self.nn3 = nn3
        self.layer1 = nn.Sequential(
            nn.Conv3d(1, self.numf1, kernel_size=self.fz1, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=2, stride=2))

        net.apply(weights_init)

        self.layer2 = nn.Sequential(
            nn.Conv3d(self.numf1, self.numf2, kernel_size=self.fz2, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=2, stride=2))

        net.apply(weights_init)

        self.fc1 = nn.Linear(3072, self.nn2)
        self.drop_out1 = nn.Dropout(0.5)
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Linear(self.nn2, self.nn3)
        self.drop_out2 = nn.Dropout(0.5)
        self.relu2 = nn.ReLU()
        self.fc3 = nn.Linear(self.nn3, 1)

    def forward(self, x):
        x = x.unsqueeze(1).float()
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.view(out.size(0), -1)
        out = self.fc1(out)
        out = self.drop_out1(out)
        out = self.relu1(out)
        out = self.fc2(out)
        out = self.drop_out2(out)
        out = self.relu2(out)
        out = self.fc3(out)
        return out
```

Your format is a bit broken, but it seems you are trying to call net.apply inside the model definition?

PS: you can add code snippets by wrapping them in three backticks ``` :wink:

Yes, I use net.apply(weights_init) after each CNN layer.

```
def weights_init(m):
    with torch.no_grad():
        if isinstance(m, nn.Conv3d):
            torch.nn.init.normal_(m.weight)
            torch.nn.init.normal_(m.bias)

class ConvNetRedo1(nn.Module):
    # numf1/numf2: number of filters in the first/second conv layer;
    # fz1/fz2: kernel sizes; nn2/nn3: sizes of the fully connected layers
    def __init__(self, numf1, numf2, fz1, fz2, nn2, nn3):
        super(ConvNetRedo1, self).__init__()
        self.numf1 = numf1
        self.numf2 = numf2
        self.fz1 = fz1
        self.fz2 = fz2
        self.nn2 = nn2
        self.nn3 = nn3
        self.layer1 = nn.Sequential(
            nn.Conv3d(1, self.numf1, kernel_size=self.fz1, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=2, stride=2))

        net.apply(weights_init)

        self.layer2 = nn.Sequential(
            nn.Conv3d(self.numf1, self.numf2, kernel_size=self.fz2, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=2, stride=2))

        net.apply(weights_init)

        self.fc1 = nn.Linear(3072, self.nn2)
        self.drop_out1 = nn.Dropout(0.5)
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Linear(self.nn2, self.nn3)
        self.drop_out2 = nn.Dropout(0.5)
        self.relu2 = nn.ReLU()
        self.fc3 = nn.Linear(self.nn3, 1)
```
Is it correct?

Ah OK, you don’t need to call this method after each layer.
Just initialize your model and call it once via model.apply(weights_init) as shown in my example.
model.apply will recursively pass each module (and submodule, …) to the passed function.
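
A quick sketch to see the recursion (assuming model is your ConvNetRedo1 instance):

```
# apply visits every submodule, so the init function sees them all
def print_module(m):
    print(type(m).__name__)

model.apply(print_module)  # prints Conv3d, ReLU, ..., and finally ConvNetRedo1 itself
```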

Something like that?

```
def weights_init(m):
    with torch.no_grad():
        if isinstance(m, nn.Conv3d):
            torch.nn.init.normal_(m.weight)
            torch.nn.init.normal_(m.bias)

model = ConvNetRedo1(32, 64, (7, 7, 5), (5, 5, 3), 500, 100)
model.apply(weights_init)
model = model.cuda

for epoch in range(num_epochs):
    for i, data in enumerate(trainloader, 0):
        images, labels = data
        optimizer.zero_grad()
        outputs = model(images)
```

Yes, that looks generally right.
You should call model = model.cuda() as a method (with parentheses), but the initialization workflow looks correct.
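That is, the line should read:

```
model = model.cuda()  # note the parentheses: .cuda() is a method call
```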

PS: It looks like you are wrapping each line in inline-code ticks `. Just add ``` before and after the complete code block (or use the “Preformatted text” button). :wink:

I really appreciate your help :slight_smile:

Excuse me, what is net.apply(weights_init) in your code? I did not use it. Is that a problem, or did I miss it?

```
def weights_init(m):
    with torch.no_grad():
        if isinstance(m, nn.Conv3d):
            torch.nn.init.normal_(m.weight)
            torch.nn.init.normal_(m.bias)
```

```
net.apply(weights_init)
```