Choosing the correct design for a CNN model

I am having trouble designing and choosing the correct model for my project. I would appreciate any insight or suggestions.

I am building an autoencoder for protein sequences. I want to represent each protein sequence as a sequence of one-hot vectors of size 20, one per amino acid.
For a sequence of length N, I will start with an Nx20 matrix.
[[1, 0, 0, 0, …, 0],
[0, 0, 1, 0, …, 0],
[0, 0, 0, 0, …, 1],

[0, 1, 0, 0, …, 0]]
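As a minimal sketch of how such an Nx20 matrix could be built, the snippet below one-hot encodes a short sequence in plain Python. The alphabet ordering `AMINO_ACIDS`, the helper name `one_hot_encode`, and the example sequence are assumptions made for illustration; any fixed ordering of the 20 standard amino acids would work the same way.

```python
# Assumed ordering of the 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence):
    """Return an N x 20 list of lists with exactly one 1 per row."""
    matrix = []
    for residue in sequence:
        row = [0] * len(AMINO_ACIDS)
        # A KeyError here means the residue is not in the assumed alphabet.
        row[INDEX[residue]] = 1
        matrix.append(row)
    return matrix


# Hypothetical 4-residue sequence, so N = 4.
encoded = one_hot_encode("ACDY")
print(len(encoded), len(encoded[0]))
```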
Then I want each one-hot vector to pass through a small single-layer NN (20x4). This NN will learn to map each amino acid to a 4D space (given my problem space).
It results in an Nx4 output matrix.
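To make the Nx20 to Nx4 step concrete, here is a small pure-Python sketch that multiplies one-hot rows by a 20x4 weight matrix. The random weights `W` and the helper `embed` are placeholders for illustration only; in a real model the weights would be learned. One thing the sketch shows is that, without a bias term, multiplying a one-hot vector by the weight matrix simply selects one row of it, which is what an embedding lookup (for example PyTorch's `nn.Embedding`) does directly.

```python
import random

random.seed(0)
VOCAB, EMBED = 20, 4

# Hypothetical 20 x 4 weight matrix; a trained network would learn these.
W = [[random.uniform(-1, 1) for _ in range(EMBED)] for _ in range(VOCAB)]


def embed(one_hot_rows, weights):
    """Multiply an N x 20 one-hot matrix by a 20 x 4 weight matrix."""
    n_out = len(weights[0])
    return [
        [sum(row[i] * weights[i][j] for i in range(len(weights)))
         for j in range(n_out)]
        for row in one_hot_rows
    ]


# Two residues whose one-hot indices are 0 and 2, so N = 2.
rows = [[1 if i == k else 0 for i in range(VOCAB)] for k in (0, 2)]
embedded = embed(rows, W)
print(len(embedded), len(embedded[0]))  # N and the embedding size
```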
The Nx4 matrix will then be flattened and fed into a CNN layer (or a fully connected NN layer), compressed into a 1x10 encoding, and passed on to the decoder, which should end up with the Nx20 one-hot vectors again.
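One way to check that the dimensions match is to write the shapes down stage by stage. The sketch below does this for the fully connected variant of the pipeline described above and also includes the standard output-length formula for a 1D convolution. The function names, the example length N = 50, and the kernel, stride, and padding values are assumptions chosen for illustration. Note that the flattened size is 4N, so a fully connected layer needs a fixed N (for example by padding sequences to a common length).

```python
def conv1d_out_len(length, kernel, stride=1, padding=0, dilation=1):
    """Output length of a 1D convolution (standard formula)."""
    return (length + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def trace_shapes(n, vocab=20, embed=4, code=10):
    """List (stage, shape) pairs for the fully connected variant."""
    return [
        ("one-hot input", (n, vocab)),
        ("per-residue map", (n, embed)),
        ("flatten", (n * embed,)),
        ("encoding", (code,)),
        ("decoder dense", (n * embed,)),
        ("reshape", (n, embed)),
        ("per-residue output", (n, vocab)),
    ]


# Print the shape after every stage for a hypothetical N = 50.
for stage, shape in trace_shapes(n=50):
    print(f"{stage:<20} {shape}")

# If a convolution runs along the sequence axis instead of a dense layer,
# the sequence length after the layer follows from the formula above.
print(conv1d_out_len(50, kernel=5, stride=2, padding=2))
```

If a convolution is used, the 4 embedding dimensions would typically be treated as input channels and the N positions as the sequence axis, so only the sequence length changes according to the formula.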
I can't visualize the process well enough to understand how to design the network. Is there an online tool for building a custom network visually and getting the architecture design details?
I have trouble working out network design details on the fly, such as matching dimensions, and many of the visual guides I find are incomplete or confusing.