CNN to recognize hand-face interaction


I’m currently involved in a project that aims to prevent risky hand-face interactions (skin touching, eye rubbing…). We also want to distinguish these types of interactions from other interactions such as teeth brushing, eating, glasses adjustment…
Our proposed method is to use wearable devices and to test different machine learning models for this application.
For my part, I use a wristband with an IMU, which gives me accelerometer and gyroscope data. I want to classify the data into 12 classes of movement type. The data will consist of a set of time series (the features could be the 6 axes of the accelerometer and gyroscope, maybe the Euler angles and the FFT of these signals) recorded from 12 people, with at least 10 examples per class per person.
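To make the data layout concrete, here is a rough sketch of one way it could be organised. It assumes each example is cut to a fixed-length window stored as (channels, time), and that evaluation holds out one person at a time so the test subject is never seen during training. The window length of 128 samples, the array names, the helper `leave_one_subject_out` and the random placeholder data are all assumptions for illustration, not part of the actual dataset.

```python
import numpy as np

N_SUBJECTS, N_CLASSES, N_REPS = 12, 12, 10   # counts taken from the description above
N_CHANNELS, WINDOW_LEN = 6, 128              # 6 IMU axes; the window length is an assumption

rng = np.random.default_rng(0)
n_examples = N_SUBJECTS * N_CLASSES * N_REPS

# Random placeholder standing in for real recordings: (examples, channels, time)
X = rng.standard_normal((n_examples, N_CHANNELS, WINDOW_LEN)).astype(np.float32)
subjects = np.repeat(np.arange(N_SUBJECTS), N_CLASSES * N_REPS)
labels = np.tile(np.repeat(np.arange(N_CLASSES), N_REPS), N_SUBJECTS)

def leave_one_subject_out(subjects):
    """Yield (held_out_subject, train_idx, test_idx) for each subject (hypothetical helper)."""
    for s in np.unique(subjects):
        test_mask = subjects == s
        yield s, np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

for s, train_idx, test_idx in leave_one_subject_out(subjects):
    # Normalise each channel using statistics from the training subjects only
    mean = X[train_idx].mean(axis=(0, 2), keepdims=True)
    std = X[train_idx].std(axis=(0, 2), keepdims=True) + 1e-8
    X_train = (X[train_idx] - mean) / std
    X_test = (X[test_idx] - mean) / std
    break  # only the first fold is shown here
```

Splitting by person rather than by example matters here because windows from the same person tend to look alike, so a random split can give an overly optimistic accuracy.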
I need to design a CNN model, train it, and evaluate it, but I don’t have much experience with PyTorch and deep learning. How do we design a model for time-series classification (number of Conv1d layers, activation function, dropout…)? How do we choose all the needed parameters (kernel size, batch size, number of epochs, number of filters…)?
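As a concrete starting point for the discussion, a small 1D CNN for this kind of input might look like the sketch below. It assumes input tensors of shape (batch, 6, 128), that is 6 IMU channels and a window of 128 samples, and 12 output classes. The class name `SmallConvNet` and every hyperparameter value (32 filters, kernel size 5, dropout 0.3, learning rate 1e-3, batch size 8) are placeholder assumptions, not recommendations; they are the kind of values one would tune on held-out subjects.

```python
import torch
import torch.nn as nn

class SmallConvNet(nn.Module):
    """Two Conv1d blocks followed by global average pooling and a linear classifier."""

    def __init__(self, n_channels=6, n_classes=12, n_filters=32, kernel_size=5, dropout=0.3):
        super().__init__()
        pad = kernel_size // 2  # keeps the time length unchanged for odd kernel sizes
        self.features = nn.Sequential(
            nn.Conv1d(n_channels, n_filters, kernel_size, padding=pad),
            nn.BatchNorm1d(n_filters),
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(n_filters, 2 * n_filters, kernel_size, padding=pad),
            nn.BatchNorm1d(2 * n_filters),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # makes the model independent of the window length
        )
        self.classifier = nn.Sequential(
            nn.Dropout(dropout),
            nn.Linear(2 * n_filters, n_classes),
        )

    def forward(self, x):
        # x: (batch, channels, time) -> logits: (batch, n_classes)
        return self.classifier(self.features(x).squeeze(-1))

model = SmallConvNet()
x = torch.randn(8, 6, 128)        # random placeholder batch, not real IMU data
y = torch.randint(0, 12, (8,))    # random placeholder labels

# One training step: CrossEntropyLoss expects raw logits and integer class labels
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
logits = model(x)
loss = criterion(logits, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

With roughly 1,440 examples in total, a small model like this with dropout is less likely to overfit than a deep one, which is why the sketch stops at two convolutional blocks.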

Thank you for your help