Reproducing a study designed with TensorFlow

Hello!

I am trying to reproduce the results of this study. Specifically, I want to fine-tune DenseNet201 using the architecture and hyperparameters set in the paper.


To achieve this, I have this PyTorch code:

import torch
import torchvision
from torch import nn

# Setup pretrained model with ImageNet's pretrained weights
device = "cuda" if torch.cuda.is_available() else "cpu"  # pick GPU if available
weights = torchvision.models.DenseNet201_Weights.DEFAULT
densenet_model = torchvision.models.densenet201(weights=weights).to(device)

# Get the length of class_names (one output unit for each class)
output_shape = len(class_names)

# Modify the classifier
densenet_model.classifier = nn.Sequential(
    nn.Linear(in_features=1920, out_features=128, bias=True),  # 1920 = DenseNet201's feature dimension
    nn.ReLU(inplace=True),
    nn.Dropout(p=0.2),
    nn.Linear(in_features=128, out_features=64, bias=True),
    nn.ReLU(inplace=True),
    nn.Dropout(p=0.3),
    nn.Linear(in_features=64, out_features=output_shape, bias=True)  # Number of output classes
)

# Redefine the forward pass
class CustomDenseNet201(nn.Module):
    def __init__(self, base_model):
        super(CustomDenseNet201, self).__init__()
        self.features = base_model.features
        self.pool = nn.AdaptiveAvgPool2d((1, 1))
        self.flatten = nn.Flatten()
        self.classifier = base_model.classifier

    def forward(self, x):
        x = self.features(x)
        x = self.pool(x)
        x = self.flatten(x)
        x = self.classifier(x)
        return x

# Instantiate the custom model
model = CustomDenseNet201(densenet_model)

# Move the entire model to the device
model = model.to(device)
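As a quick sanity check (a sketch, not part of the original post), you can push a dummy batch through the wrapper and confirm the output size matches the number of classes:

# Sanity check: forward a fake batch and verify the output shape
model.eval()
with torch.no_grad():
    dummy = torch.randn(2, 3, 224, 224, device=device)  # fake batch of two RGB images
    logits = model(dummy)
print(logits.shape)  # expected: torch.Size([2, output_shape])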

However, the model is not learning anything (val acc: ~0.15 avg). To confirm that the result is reproducible at all, I trained on a fraction of the dataset in TensorFlow, and the accuracy (~0.7 avg) showed that it is learning.

from tensorflow.keras import applications, layers, models

# Setup pretrained model with ImageNet's pretrained weights (no top, 224x224 RGB input)
base_model = applications.DenseNet201(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

model = models.Sequential([
    base_model,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.3),
    layers.Dense(num_classes, activation='softmax')
])
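One porting detail worth double-checking here: the Keras head ends in a softmax, while the PyTorch classifier above ends in a raw Linear layer. That is correct as long as the PyTorch training loop pairs the raw logits with nn.CrossEntropyLoss, which applies log-softmax internally. A minimal sketch, assuming model, images, and labels come from the training loop:

from torch import nn

# Sketch: raw logits pair with CrossEntropyLoss (it applies log-softmax internally);
# do not add a Softmax layer to the PyTorch classifier in this setup
criterion = nn.CrossEntropyLoss()
logits = model(images)            # raw, unnormalized class scores
loss = criterion(logits, labels)  # labels: integer class indices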

I’m sure that I am missing something or doing something wrong.

Before running the training loop, call

model.train()

This will ensure that your model is in training mode.

Thank you. I already did that in the training loop like this:

for epoch in range(num_epochs):
    epoch_start_time = time.time()  # Start time for the epoch
    
    running_loss = 0.0
    n_correct_train = 0
    n_samples_train = 0
    model.train()  # Set model to training mode
    for i, (images, labels) in enumerate(dataloaders['train']):
        images = images.to(device)
        labels = labels.to(device)
        ......

The training process seems to be working fine, but the model is not learning anything. My guess is that it has something to do with the pooling or flattening layer.
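One way to test that guess (a sketch, not from the original post) is to compare the wrapper's pre-classifier path against what stock torchvision does with the same features:

# Sketch: compare the wrapper's pooled features with torchvision's own
# post-`features` path; a mismatch points at the pooling/activation step
import torch
import torch.nn.functional as F

model.eval()
with torch.no_grad():
    x = torch.randn(1, 3, 224, 224, device=device)
    feats = model.features(x)
    custom = model.flatten(model.pool(feats))  # the wrapper's path
    stock = torch.flatten(F.adaptive_avg_pool2d(F.relu(feats), (1, 1)), 1)  # torchvision's path
    print(torch.allclose(custom, stock))  # False means the two paths differ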

Is it possible that model.train() is reinitializing the model at every iteration of the training loop?

No, model.train() does not initialize any values; it must be some other reason.
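To illustrate (a minimal sketch): model.train() and model.eval() only toggle the module's training flag, which switches the behavior of layers like Dropout and BatchNorm; no weights are touched.

import torch.nn as nn

layer = nn.Dropout(p=0.5)
print(layer.training)  # True: modules start in training mode
layer.eval()
print(layer.training)  # False: dropout becomes a no-op
layer.train()
print(layer.training)  # True again; no parameters were modified either way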


The only difference I’m seeing after looking at this for a bit is that the TensorFlow version of the top of the architecture ends with a ReLU and the Torch version does not. Seems like a stretch but maybe that’s why?

Also, what is your loss value doing over time? Is it going up and down, or just staying in place?
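One useful reference point (a sketch, not from the original post): a cross-entropy loss that sits flat near ln(num_classes) means the model is predicting near-uniformly, i.e., at chance level.

# Sketch: chance-level cross-entropy for a hypothetical 10-class problem
import math
print(math.log(10))  # ~2.30 -- a loss stuck near this value means no learning signal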


Thank you. I added the ReLU to the Torch version, but it didn't change the result. I think the torchinfo summary for DenseNet201 does not show some of the operations performed in the forward pass, such as the global average pooling and maybe even the final ReLU. Someone asked a question along these lines, and no concrete answer was given.
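For reference, torchinfo only lists registered nn.Module children, so functional calls inside forward never show up in the summary. In recent torchvision versions, DenseNet.forward looks roughly like this (paraphrased from the source):

# Paraphrased from torchvision's DenseNet.forward: the ReLU and global average
# pooling are functional calls, which is why torchinfo does not list them
def forward(self, x):
    features = self.features(x)               # ends with a BatchNorm layer (norm5)
    out = F.relu(features, inplace=True)      # functional ReLU -- easy to miss
    out = F.adaptive_avg_pool2d(out, (1, 1))  # global average pooling
    out = torch.flatten(out, 1)
    out = self.classifier(out)
    return out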

I cross-checked my data transformation pipeline, but I don't think it's responsible for the poor performance. At this point, I don't know what else to do.

According to the authors, this is the transformation performed on the data:

[screenshot from the paper; image not reproduced here]

And this is my replication:

import numpy as np
from torchvision import transforms

# ImageNet channel statistics
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

data_transforms = {
    'train': transforms.Compose([
        transforms.RandomHorizontalFlip(),                          # Horizontal reflection
        transforms.RandomVerticalFlip(),                            # Vertical reflection
        transforms.RandomRotation(30),                              # Rotation up to 30 degrees
        transforms.RandomResizedCrop(224),                          # Random crop, resized to 224x224
        transforms.RandomAffine(degrees=0, translate=(0.2, 0.2)),   # Width and height shifts
        transforms.ColorJitter(brightness=(0.9, 1.1)),              # Brightness adjustment
        transforms.ToTensor(),                                      # Convert to tensor, scale to [0, 1]
        transforms.Normalize(mean, std)                             # Standardize with ImageNet mean/std
    ]),
    'val': transforms.Compose([
        transforms.Resize(size=(224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean, std)
    ]),
    'test': transforms.Compose([
        transforms.Resize(size=(224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean, std)
    ]),
}
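As a quick sanity check on the pipeline (a sketch, not part of the original post), you can push a synthetic image through the training transform and confirm the output shape and value range:

# Sketch: apply the training transform to a random PIL image and inspect the output
from PIL import Image

img = Image.fromarray(np.uint8(np.random.rand(256, 256, 3) * 255))
out = data_transforms['train'](img)
print(out.shape)                           # expected: torch.Size([3, 224, 224])
print(out.min().item(), out.max().item())  # roughly within [-2.2, 2.7] after normalization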