What do DQN or Actor-Critic learn in their CNNs?

For example, I can understand that the actor's loss teaches a fully connected layer to change the probability of an action, but what about a convolutional layer? What kind of features does it learn?

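It is exactly the same idea. Instead of one fully connected layer over the whole input, a convolutional layer applies the same small fully connected layer to many different patches of the input. The loss gradient flows back through it in the same way, and because the weights are shared, they receive the sum of the gradients from every patch.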
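To make the "same small fully connected layer on different parts of the input" picture concrete, here is a minimal pure-Python sketch. The names `dense` and `conv2d_as_shared_dense`, the toy image, and the hand-written edge kernel are all made up for illustration; a real DQN or actor-critic network would learn its kernels from the loss rather than have them written by hand.

```python
def dense(patch, weights, bias):
    """A tiny fully connected unit: weighted sum of a flattened patch."""
    return sum(p * w for p, w in zip(patch, weights)) + bias


def conv2d_as_shared_dense(image, kernel, bias=0.0):
    """'Valid' 2D convolution (cross-correlation, as in deep learning
    libraries), written as the same dense unit applied to every patch."""
    kh, kw = len(kernel), len(kernel[0])
    flat_w = [w for row in kernel for w in row]  # weights shared by all patches
    out = []
    for i in range(len(image) - kh + 1):
        out_row = []
        for j in range(len(image[0]) - kw + 1):
            # Flatten the kh x kw patch whose top-left corner is (i, j).
            patch = [image[i + di][j + dj]
                     for di in range(kh) for dj in range(kw)]
            out_row.append(dense(patch, flat_w, bias))
        out.append(out_row)
    return out


# Toy image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-written (not learned) vertical edge detector, for illustration only.
vertical_edge = [[-1, 1],
                 [-1, 1]]

feature_map = conv2d_as_shared_dense(image, vertical_edge)
print(feature_map)  # responds only at the dark-to-bright boundary
```

The only difference from an ordinary fully connected layer is that `flat_w` is reused at every position, so one set of weights has to be useful everywhere in the image.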

For instance, if your input channels are 4 consecutive black-and-white frames of a ball rolling along the x axis, the first layer's kernels effectively span (time, x, y), so you will probably learn something like a 3D convolution: projected on (x, y) it will look like an edge detector, and projected on (time, x) it will look like something close to an identity.
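One possible reading of the "identity" remark is sketched below: in the (time, x) plane, a ball that moves one pixel per frame traces a diagonal, and an identity-like kernel matches exactly that diagonal. Everything here is an assumption made for illustration: the frames are a single row of pixels, the ball is one pixel wide, it moves one pixel per frame, and the kernel is written by hand rather than learned.

```python
def make_frames(positions, width=8):
    """One single-row frame per time step, with a 1 where the ball is."""
    return [[1 if x == p else 0 for x in range(width)] for p in positions]


def motion_response(frames, kernel):
    """Slide a (time, x) kernel along x. The stacked frames play the role
    of the input channels, so the kernel sees all time steps at once."""
    kw = len(kernel[0])
    width = len(frames[0])
    return [
        sum(kernel[t][dx] * frames[t][x0 + dx]
            for t in range(len(kernel)) for dx in range(kw))
        for x0 in range(width - kw + 1)
    ]


# Identity matrix in the (time, x) plane: weight 1 where x offset == time.
identity_kernel = [[1 if t == dx else 0 for dx in range(4)] for t in range(4)]

moving_right = make_frames([2, 3, 4, 5])  # one pixel to the right per frame
static = make_frames([3, 3, 3, 3])        # ball does not move
moving_left = make_frames([5, 4, 3, 2])   # one pixel to the left per frame

right_resp = motion_response(moving_right, identity_kernel)
static_resp = motion_response(static, identity_kernel)
left_resp = motion_response(moving_left, identity_kernel)

print(max(right_resp), max(static_resp), max(left_resp))
```

In this toy setup the kernel responds strongly only when all four frames line up along its diagonal, which is the kind of motion feature a value or policy head can then use.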