Sorry for the title, but I didn’t know how else to say this. I am using PyTorch Lightning modules to build a Transformer-based classifier on tabular data. Here is the training step:
def training_step(self, batch, batch_idx):
    tokens = batch['tokens']
    y = batch['label']
    mask = batch['mask']
    # x = self.base_model(tokens, mask)
    x = self.features_embed(tokens)
    x = self.encoder(x, src_key_padding_mask=mask)
    x = self.linear(x).mean(axis=1).squeeze(1)
    loss = F.binary_cross_entropy_with_logits(input=x,
                                              target=y)
    self.log('train_loss', loss)
    return loss
The base model commented out here is defined in the init:
class LitModelWithCategoryEmbeddings(pl.LightningModule):
    def __init__(self,
                 num_tokens: int,
                 num_categories: int,
                 dim_model: int = 96,
                 dim_h: int = 128,
                 n_head: int = 1,
                 dropout: float = 0.1,
                 activation: str = 'relu',
                 num_layers: int = 2,
                 lr: float = 1e-3):
        """
        :param num_tokens:
        :param dim_model:
        :param dim_h:
        :param n_head:
        :param dropout:
        :param activation:
        :param num_layers:
        """
        super().__init__()
        self.base_model = LitModel(
            num_tokens=num_tokens,
            dim_model=dim_model,
            dim_h=dim_h,
            num_layers=num_layers,
            n_head=n_head
        )
        summary(self.base_model)
        self.features_embed = torch.nn.Embedding(num_embeddings=num_tokens,
                                                 embedding_dim=dim_model)
        self.categories_embed = torch.nn.Embedding(num_embeddings=num_categories,
                                                   embedding_dim=dim_model)
        encoder_layer = torch.nn.TransformerEncoderLayer(d_model=dim_model,
                                                         nhead=n_head,
                                                         dim_feedforward=dim_h,
                                                         dropout=dropout,
                                                         activation=activation,
                                                         batch_first=True)
        self.encoder = torch.nn.TransformerEncoder(encoder_layer=encoder_layer,
                                                   num_layers=num_layers)
        self.linear = torch.nn.Linear(in_features=dim_model, out_features=1)
        self.lr = lr
        self.valid_auc = AUROC(dist_sync_on_step=True)
        self.test_auc = AUROC(dist_sync_on_step=True)
        self.save_hyperparameters()
The base model is another class that I was using for pretraining, but I’m trying to write similar functionality without using any pre-training, and so I created a new class. When I run this code with the base model defined in the init function, I get the AUC in the validation set that I am expecting.
The part that is driving me insane is that if I simply comment out the definition of self.base_model in the init, the AUC decreases significantly:
        super().__init__()
        # self.base_model = LitModel(
        #     num_tokens=num_tokens,
        #     dim_model=dim_model,
        #     dim_h=dim_h,
        #     num_layers=num_layers,
        #     n_head=n_head
        # )
        # summary(self.base_model)
It makes no sense to me why this snippet of code is affecting the model performance in any way because it is not being used at all in either the training loop or the validation loop. What on earth could be going on here?
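One thing that might be worth ruling out, and this is only a guess rather than something I have confirmed for my model: building an extra module in the init draws numbers from the global random number generator, so every layer created after it could start from different initial weights even with the same seed. Below is a minimal sketch of that effect in isolation. The helper init_linear and the tiny Embedding standing in for the unused base model are hypothetical and are not part of my actual code.

```python
import torch


def init_linear(seed: int, build_unused_module: bool) -> torch.Tensor:
    """Return the initial weights of a Linear layer that is created
    after an optional module which is never used afterwards."""
    torch.manual_seed(seed)
    if build_unused_module:
        # Stand-in for the unused base model: it is never called, but
        # initializing its parameters still advances the global RNG.
        _unused = torch.nn.Embedding(num_embeddings=10, embedding_dim=4)
    layer = torch.nn.Linear(in_features=4, out_features=1)
    return layer.weight.detach().clone()


with_unused = init_linear(seed=0, build_unused_module=True)
without_unused = init_linear(seed=0, build_unused_module=False)
same_setup_again = init_linear(seed=0, build_unused_module=False)

# Same seed and same construction order: the initial weights match.
print(torch.equal(without_unused, same_setup_again))
# Same seed but an extra module built first: compare the initial weights.
print(torch.equal(with_unused, without_unused))
```

If the second comparison reports a mismatch, the two versions of my model are effectively being trained from different random initializations, and the AUC gap may reflect sensitivity to initialization rather than any contribution from the base model itself. Training both versions across several seeds would be one way to check this.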