Thanks a lot for your replies, @ptrblck and @superunification. It is clear to me now that I was doing it in an overly simplistic way, without mimicking the calculations of the pool/conv layers in my NAS, etc.
The way I am doing it now is the following. I use this genetic algorithm as my NAS routine, and I feed it a very long vector hyperparameters__ as input. From that vector I then retrieve the number of layers, the kernel sizes, where to place pooling layers, etc. It looks roughly like this:
N_conv_max__ = int(hyperparameters__[-2]) # the number of conv-layers
N_fc_max__ = int(hyperparameters__[-1]) # the number of fc-layers
conv_layer_out_channels = hyperparameters__[0*N_conv_max:0*N_conv_max+N_conv_max__]
conv_layer_kernel_sizes = hyperparameters__[1*N_conv_max:1*N_conv_max+N_conv_max__]
conv_layer_strides = hyperparameters__[2*N_conv_max:2*N_conv_max+N_conv_max__]
conv_layer_paddings = hyperparameters__[3*N_conv_max:3*N_conv_max+N_conv_max__]
max_pool_layer_numbers = hyperparameters__[4*N_conv_max:4*N_conv_max+N_conv_max__]
max_pool_kernel_sizes = hyperparameters__[5*N_conv_max:5*N_conv_max+N_conv_max__]
max_pool_strides = hyperparameters__[6*N_conv_max:6*N_conv_max+N_conv_max__]
max_pool_paddings = hyperparameters__[7*N_conv_max:7*N_conv_max+N_conv_max__]
batch_norm_2d_layer_numbers = hyperparameters__[8*N_conv_max:8*N_conv_max+N_conv_max__]
linear_layer_out_features = hyperparameters__[9*N_conv_max:9*N_conv_max+N_fc_max__]
batch_norm_1d_layer_numbers = hyperparameters__[9*N_conv_max+N_fc_max:9*N_conv_max+N_fc_max+N_fc_max__]
where N_conv_max is the maximum number of conv-layers that the genetic algorithm can try, and N_fc_max is the maximum number of fc-layers that it can try. I set them to 15 and 2, respectively.
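Since the decoded kernel sizes, strides and paddings drive the conv and pool layers, the spatial size after each layer can be tracked with the usual output-size formula before any model is built. The following is only a minimal sketch under stated assumptions: square inputs, dilation 1 and floor rounding. The names out_size and trace_sizes and the input size of 28 are hypothetical and not part of my routine.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Spatial size after a conv or max-pool layer (dilation 1, floor rounding)."""
    return (n + 2 * padding - kernel) // stride + 1


def trace_sizes(n, layers):
    """Return the spatial size after each (kernel, stride, padding) layer.

    Raises ValueError as soon as a kernel no longer fits into the padded input,
    so an invalid candidate can be rejected before it is ever trained.
    """
    sizes = []
    for index, (kernel, stride, padding) in enumerate(layers):
        if n + 2 * padding < kernel:
            raise ValueError(
                f"layer {index}: kernel {kernel} does not fit input {n} "
                f"with padding {padding}"
            )
        n = out_size(n, kernel, stride, padding)
        sizes.append(n)
    return sizes


# Hypothetical example: a 28x28 input, one conv layer followed by one max-pool layer.
example_layers = [(5, 1, 0), (4, 2, 1)]
print(trace_sizes(28, example_layers))
```

The last entry of the returned list, squared and multiplied by the final number of channels, gives the in_features of the first fc-layer under the same assumptions.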
Then I specify a restricting interval for each parameter (i.e. each element of hyperparameters__) so that the genetic algorithm samples values only from those intervals. These are the intervals I use (together with the rest of the NAS routine):
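import numpy as np
import matplotlib.pyplot as plt
from geneticalgorithm import geneticalgorithm as ga # assumed package, inferred from the parameter names below
N_conv_max, N_fc_max = 15, 2 # maximum numbers of conv- and fc-layers the GA can try
N_dims = N_conv_max*9+N_fc_max*2+2 # last 2 entries hold the actual numbers of conv- and fc-layers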
# 1 conv_layer_out_channels
a1 = [[3,7]]+[[10,20]]+[[10,32]]+[[22,41]]+[[21,50]]+[[30,60]]+[[30,70]]+[[30,80]]+\
[[40,90]]+[[40,100]]+[[50,110]]+[[50,120]]+[[50,130]]+[[60,140]]+[[60,150]]
# 2 conv_layer_kernel_sizes
a2 = [[3,9]]+[[3,9]]+[[3,9]]+[[3,9]]+[[3,9]]+[[3,9]]+[[3,9]]+[[3,9]]+[[3,9]]+\
[[3,9]]+[[3,9]]+[[3,9]]+[[3,9]]+[[3,9]]+[[3,9]]
# 3 conv_layer_strides
a3 = [[1,5]]+[[1,5]]+[[1,5]]+[[1,5]]+[[1,5]]+[[1,5]]+[[1,5]]+[[1,5]]+[[1,5]]+\
[[1,5]]+[[1,5]]+[[1,5]]+[[1,5]]+[[1,5]]+[[1,5]]
# 4 conv_layer_paddings
a4 = [[0,1]]+[[3,5]]+[[5,10]]+[[5,10]]+[[5,10]]+[[5,10]]+[[5,10]]+[[5,10]]+\
[[5,10]]+[[5,10]]+[[5,10]]+[[5,10]]+[[5,10]]+[[5,10]]+[[5,10]]
# 5 max_pool_layer_numbers
a5 = [[0,0]]+[[0,1]]+[[0,0]]+[[0,1]]+[[0,0]]+[[0,1]]+[[0,0]]+[[0,1]]+[[0,0]]+\
[[0,1]]+[[0,0]]+[[0,1]]+[[0,0]]+[[0,1]]+[[0,0]]
# 6 max_pool_kernel_sizes
a6 = [[4,4]]*N_conv_max
# 7 max_pool_strides
a7 = [[1,2]]*N_conv_max
# 8 max_pool_paddings
a8 = [[1,2]]*N_conv_max
# 9 batch_norm_2d_layer_numbers
a9 = [[0,1]]*N_conv_max
# 10 linear_layer_out_features
a10 = [[1,1500]]*N_fc_max
# 11 batch_norm_1d_layer_numbers
a11 = [[0,1]]*N_fc_max
# 12
a12 = [[1,N_conv_max]]
# 13
a13 = [[1,N_fc_max]]
varbound=np.array(a1+a2+a3+a4+a5+a6+a7+a8+a9+a10+a11+a12+a13)
vartype=np.array([['int']]*N_dims)
algorithm_param = {'max_num_iteration': 1000,
                   'population_size': 100,
                   'mutation_probability': 0.1,
                   'elit_ratio': 0.01,
                   'crossover_probability': 0.5,
                   'parents_portion': 0.3,
                   'crossover_type': 'uniform',
                   'max_iteration_without_improv': None}
model = ga(function=function_to_optimize,
           dimension=N_dims,
           variable_type_mixed=vartype,
           function_timeout=3600,
           variable_boundaries=varbound,
           algorithm_parameters=algorithm_param)
model.run()
convergence=model.report
solution=model.output_dict
plt.plot(convergence)
plt.legend([solution['function']])
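Because the bounds are concatenated by hand from thirteen groups, a quick consistency check before starting the search can catch a missing or extra interval. This is only a sketch: expected_dims and check_bounds are hypothetical helper names, and the layout (9 blocks per conv-layer slot, 2 per fc-layer slot, plus 2 trailing counts) is taken from the N_dims formula above.

```python
def expected_dims(n_conv_max, n_fc_max):
    """Length of the hyperparameter vector: 9 conv blocks, 2 fc blocks, 2 counts."""
    return 9 * n_conv_max + 2 * n_fc_max + 2


def check_bounds(bounds, n_dims):
    """Raise ValueError if the bounds do not match the expected vector layout."""
    if len(bounds) != n_dims:
        raise ValueError(f"expected {n_dims} intervals, got {len(bounds)}")
    for index, (low, high) in enumerate(bounds):
        if low > high:
            raise ValueError(f"interval {index} is empty: [{low}, {high}]")


# With N_conv_max = 15 and N_fc_max = 2 the vector has 141 entries.
print(expected_dims(15, 2))

# Hypothetical toy layout (1 conv slot, 1 fc slot) just to exercise the check.
toy_bounds = [[3, 7]] * 9 + [[1, 1500], [0, 1]] + [[1, 1], [1, 1]]
check_bounds(toy_bounds, expected_dims(1, 1))
```

In the routine above, the check would be called as check_bounds(varbound.tolist(), N_dims) right after varbound is built.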
Finally, I use the cost value at the last epoch as the optimisation criterion for the genetic algorithm.
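One way to turn that criterion into the function handed to the GA is sketched below. It assumes the GA minimises its objective and that some training routine returns the list of per-epoch cost values. The names make_objective and train_fn and the penalty value are hypothetical; the real function_to_optimize is not shown in this post.

```python
import math


def make_objective(train_fn, penalty=1e6):
    """Wrap a training routine into a scalar objective for a minimising GA.

    train_fn(hyperparameters) is assumed to return the per-epoch cost values.
    Candidates that cannot be built or trained, or whose final cost is not
    finite, receive a large penalty instead of crashing the search.
    """
    def objective(hyperparameters):
        try:
            losses = train_fn(hyperparameters)
        except (ValueError, RuntimeError):
            # e.g. a kernel larger than the remaining feature map
            return penalty
        final_cost = float(losses[-1])  # cost value at the last epoch
        return final_cost if math.isfinite(final_cost) else penalty
    return objective


# Hypothetical stand-in for the real training loop, only to show the call shape.
def fake_train(hyperparameters):
    return [2.3, 1.1, 0.7]


function_to_optimize_sketch = make_objective(fake_train)
print(function_to_optimize_sketch([0] * 141))
```

Returning a penalty rather than raising keeps one invalid architecture from aborting the whole run; whether a fixed penalty is the right choice depends on how the GA ranks candidates.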
Could you guys @superunification @ptrblck comment on this approach, and which specific NAS framework would you recommend?