Slow model inference in a 3D MRI brain segmentation web app

Hi guys, I’m trying to build a web application for brain segmentation using a CNN model trained on 3D MRI brain scans. The model data is stored in S3. However, inference takes too long when I test it by uploading new brain images for segmentation. Any ideas on how to speed this up?