Graph reconstruction using autoencoder

I am trying to build a GNN model, using PyG, to generate embeddings for a list of graphs to be used in a downstream task such as unsupervised clustering. The overall task is similar to using methods like graph2vec to generate whole-graph embeddings. However, in my case I also have node attributes and, optionally, edge weights and/or attributes. My idea was to train a model that takes into consideration the graph structure, the node attributes, and (if available) the edge weights/attributes to generate the graph embeddings. Since I do not have any labels for the graphs, I decided to use an autoencoder approach to guide the training process. I observed that the decoder is unable to correctly reconstruct the graph structure in particular.

Networks: Directed networks with ~1500 nodes and an average density of ~0.05. All the networks contain the same nodes, but the edges differ. The nodes have three attributes (one positive continuous and two binary). The assortativity coefficient is close to 0 (roughly 0 to 0.2).

Encoder: Takes the edge indices and node attributes as input. Message passing occurs through three layers of GATConv before AttentionalAggregation converts the node-level embeddings into a graph-level embedding. After each convolution layer, an ELU activation is applied, followed by dropout.

Decoder: For decoding the structure, the graph-level embedding is first decoded into node-level embeddings using a simple two-layer feedforward NN with ReLU activation, and another similar feedforward NN then produces the edge logits for all possible edges (the adjacency matrix). For decoding the node attributes, I directly pass the graph-level embedding through a similar two-layer feedforward NN. I previously tried the InnerProductDecoder for decoding the structure, but that did not work well.
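The structure decoder looks roughly like this (a sketch with placeholder dimensions; the pairwise MLP scores each ordered source/destination pair, since the graphs are directed):

```python
import torch
import torch.nn as nn

class StructureDecoder(nn.Module):
    """Graph embedding -> node embeddings -> dense edge logits."""
    def __init__(self, emb_dim, hid_dim, num_nodes, node_dim):
        super().__init__()
        self.num_nodes = num_nodes
        self.node_dim = node_dim
        # graph embedding -> one embedding per node
        self.to_nodes = nn.Sequential(
            nn.Linear(emb_dim, hid_dim), nn.ReLU(),
            nn.Linear(hid_dim, num_nodes * node_dim),
        )
        # pairwise scorer: concat(src, dst) -> edge logit
        self.edge_mlp = nn.Sequential(
            nn.Linear(2 * node_dim, hid_dim), nn.ReLU(),
            nn.Linear(hid_dim, 1),
        )

    def forward(self, g):                       # g: [B, emb_dim]
        B = g.size(0)
        z = self.to_nodes(g).view(B, self.num_nodes, self.node_dim)
        # broadcast node embeddings into all (src, dst) pairs
        src = z.unsqueeze(2).expand(-1, -1, self.num_nodes, -1)
        dst = z.unsqueeze(1).expand(-1, self.num_nodes, -1, -1)
        logits = self.edge_mlp(torch.cat([src, dst], dim=-1)).squeeze(-1)
        return logits                           # [B, N, N], directed
```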

Losses: For the structure, I calculate BCEWithLogitsLoss on the whole adjacency matrix (with pos_weight to account for the sparse network). For the node attributes, I use the MSE loss. The sum of the two losses is backpropagated.
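The loss computation, roughly (pos_weight derived from the density is my assumption here; with density d there are about (1 - d) / d negatives per positive, so ~19 at d = 0.05):

```python
import torch
import torch.nn as nn

def reconstruction_loss(edge_logits, adj_true, x_pred, x_true, density=0.05):
    # upweight the rare positive (edge) class in the sparse adjacency matrix
    pos_weight = torch.tensor((1.0 - density) / density)
    bce = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
    structure_loss = bce(edge_logits, adj_true)
    # node-attribute reconstruction term
    attribute_loss = nn.functional.mse_loss(x_pred, x_true)
    # the sum of both terms is backpropagated
    return structure_loss + attribute_loss
```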

With the above setting, I reach an AUC of ~0.65 on the adjacency matrix reconstruction. As a cross-check, I also calculated the MSE between the node embeddings from the encoder and those produced by the decoder, which turns out to be low.
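For reference, the AUC is computed by scoring all N*N possible edges against the true adjacency matrix (a sketch, assuming scikit-learn is available):

```python
import torch
from sklearn.metrics import roc_auc_score

def adjacency_auc(edge_logits, adj_true):
    # flatten the dense logits and labels, score every possible edge
    scores = torch.sigmoid(edge_logits).flatten().detach().numpy()
    labels = adj_true.flatten().detach().numpy()
    return roc_auc_score(labels, scores)
```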

Questions:
(1) How can I improve the graph reconstruction when my input networks have very low density?
(2) Do I need to be cautious about heterophily and use a heterophily aware model?
(3) How relevant should the node attributes be for predicting the edges? Is it necessary for a GNN that the node attributes are capable of predicting edges?

I am new to the domain of neural networks. Any help will be appreciated.

[Note: This question is cross-posted on Stack Overflow.]