Convolutional architecture to pick up this visual artefact

How many layers, and what kernel size and stride, would you suggest so that the network can pick up the size of the two dark areas here?

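One way to reason about the question is through the receptive field. For a stack of layers, each with kernel size k and stride s, the receptive field grows as r_l = r_{l-1} + (k_l - 1) * j_{l-1}, and the jump (the spacing between neighbouring outputs in input pixels) grows as j_l = j_{l-1} * s_l. A minimal sketch follows. It assumes square kernels, no dilation, and a hypothetical dark-area width of 40 px, because the actual image dimensions are not given. The helper names `receptive_field` and `layers_needed` are illustrative only.

```python
from typing import List, Tuple


def receptive_field(layers: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Return (receptive field, cumulative stride) for a stack of layers.

    Each layer is given as (kernel_size, stride). Dilation is assumed to be 1,
    and padding is ignored because it does not change the receptive field size.
    """
    rf, jump = 1, 1  # start from a single input pixel with unit spacing
    for k, s in layers:
        rf += (k - 1) * jump  # each layer widens the field by (k - 1) input-space steps
        jump *= s             # spacing between adjacent outputs, in input pixels
    return rf, jump


def layers_needed(target_px: int, kernel: int = 3, stride: int = 2) -> int:
    """Smallest number of identical layers whose receptive field covers target_px."""
    if kernel < 2 or stride < 1:
        raise ValueError("kernel must be >= 2 and stride >= 1")
    n, rf, jump = 0, 1, 1
    while rf < target_px:
        rf += (kernel - 1) * jump
        jump *= stride
        n += 1
    return n


if __name__ == "__main__":
    print(receptive_field([(3, 1), (3, 1)]))  # two 3x3 stride-1 convs
    print(receptive_field([(3, 2)] * 4))      # four 3x3 stride-2 convs
    target = 40  # hypothetical width of a dark area, in pixels
    print(layers_needed(target, kernel=3, stride=2))
    print(layers_needed(target, kernel=3, stride=1))
```

With stride 2 the receptive field grows roughly exponentially (3, 7, 15, 31, 63 px), so five 3x3 layers already cover 40 px. With stride 1 it grows only linearly, so twenty layers are needed. The trade-off is that the final jump (32 px after five stride-2 layers) also sets how coarsely the output grid can localise, and therefore measure, the dark areas. In practice the effective receptive field is usually smaller than this theoretical one, so some margin above the object size is sensible.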