The left side is their model and the right is mine.
I am using the code seen below:
import torch.nn as nn

class NetworkV1_5(nn.Module):  # http://cs230.stanford.edu/projects_spring_2019/reports/18681590.pdf Take 2
    def __init__(self, base, num_classes):  # Define the layers
        super().__init__()
        self.base = base
        # Replace VGG16's classifier: drop the first FC layer, shrink the
        # second to 512 units, add dropout, then the final output layer
        self.base.classifier = nn.Sequential(
            nn.Linear(in_features=25088, out_features=512, bias=True),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(in_features=512, out_features=num_classes, bias=True),
        )

    def forward(self, x):
        fc = self.base(x)
        return fc
The paper I quoted above says that the first fully connected layer is dropped from VGG16. Then the second layer's dimensions are decreased to 512. A dropout layer is then added, followed by the last layer with 196 outputs (the number of fine-grained annotations in the dataset).
Why are you using nn.AdaptiveAvgPool2d to get to spatial dims of 7x7?
I see the Keras model using max pooling to bring 14x14 down to 7x7.
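For what it's worth, both operations produce the same 7x7 spatial size here; they differ only in how each output cell is computed. A minimal sketch comparing the two:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 512, 14, 14)  # dummy 14x14 feature map with 512 channels

# Max pooling keeps the strongest activation in each 2x2 window
max_out = nn.MaxPool2d(kernel_size=2, stride=2)(x)
# Adaptive average pooling averages the inputs mapped to each output cell
avg_out = nn.AdaptiveAvgPool2d((7, 7))(x)

print(max_out.shape)  # torch.Size([1, 512, 7, 7])
print(avg_out.shape)  # torch.Size([1, 512, 7, 7])
```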
Apart from that, the rest seems fine. It's best to start training your model and see how well it performs on the benchmark dataset. On a side note, it is perfectly fine to have a network architecture that deviates from the one mentioned in the paper. You may get better results, you never know.