DiffSBDD generates very small molecules after replacing EGNN with HybridDynamics

Hi, I am working on a modified version of DiffSBDD trained on the CrossDocked dataset. I replaced the original EGNN dynamics with a hybrid architecture (HybridDynamics) that combines EGNN and a Graph Transformer.
My current issue is that the model tends to generate very small molecules, often much smaller than expected, even though training does not crash outright.
Here is a summary of my setup:
Dataset: CrossDocked
Mode: conditional (cond) and joint (joint), both tried
Pocket representation: full-atom
Modified dynamics: HybridDynamics instead of the original EGNN
Key settings:
hidden_nf = 128
n_layers = 5
joint_nf = 32
attention = True
augment_noise = 0 or 0.001
What I observe:
Generated ligands are often unrealistically small
Sometimes training is unstable
I also encountered NaNs and memory issues in some earlier runs
The issue became more noticeable after introducing the hybrid EGNN + Transformer dynamics
What I would like to understand:
Could this behavior indicate a problem in the node-count prior or size_distribution.npy used during sampling?
Could the hybrid dynamics be causing underfitting or collapse toward trivial small molecules?
Is there a recommended way to verify whether the model is learning the ligand size distribution correctly?
Are there specific parts of DiffSBDD generation or reconstruction that I should inspect first?
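To make the size-distribution question concrete, here is a minimal sketch of the kind of check I have in mind. It rests on assumptions I have not confirmed: that size_distribution.npy holds a 2D array of counts indexed as [n_ligand_atoms, n_pocket_atoms], and that I can collect the atom count of each generated ligand into a list. The function names are my own and are not part of DiffSBDD.

```python
def ligand_size_marginal(joint_hist):
    """Marginal P(n_ligand) from 2D counts.

    ASSUMPTION: rows are indexed by ligand atom count and columns by
    pocket atom count. Swap the axes if the file is laid out the other way.
    """
    row_totals = [float(sum(row)) for row in joint_hist]
    total = sum(row_totals)
    return [r / total for r in row_totals]


def empirical_size_dist(sizes, max_size):
    """Normalized histogram of generated ligand sizes on 0..max_size."""
    counts = [0] * (max_size + 1)
    for s in sizes:
        # Clip sizes outside the prior's support into the last bin.
        counts[min(max(int(s), 0), max_size)] += 1
    n = sum(counts)
    return [c / n for c in counts]


def compare_sizes(joint_hist, generated_sizes):
    """Compare the size prior with the sizes actually produced by sampling."""
    prior = ligand_size_marginal(joint_hist)
    gen = empirical_size_dist(generated_sizes, len(prior) - 1)
    return {
        "prior_mean": sum(i * p for i, p in enumerate(prior)),
        "generated_mean": sum(i * g for i, g in enumerate(gen)),
        # Total variation distance: 0 means identical, 1 means disjoint.
        "tv_distance": 0.5 * sum(abs(p - g) for p, g in zip(prior, gen)),
    }
```

The array returned by np.load("size_distribution.npy") could be passed in directly as joint_hist. If the prior mean already looks too small, the problem would be in the histogram itself. If the prior looks reasonable but the generated mean is much lower, the atoms are presumably being lost somewhere after the node count is sampled.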
I would also appreciate guidance on which training setting is generally preferable for this task: conditional (cond) or joint (joint). In my experiments the difference in performance was relatively small, but I would like to understand which formulation is typically considered more effective or more stable in practice.
My suspicion is that the issue may be caused by one of the following:
incorrect node-count distribution
unstable training
mismatch between the original EGNN setup and the modified hybrid architecture
insufficient convergence
If anyone has faced a similar issue with DiffSBDD or diffusion-based molecular generation, I would really appreciate your suggestions.