End-to-end semantic joint detection and limb-pose estimation from depth images of preterm infants in NICUs

Carbonari, Matteo; Vallasciani, Greta; Migliorelli, Lucia; Frontoni, Emanuele; Moccia, Sara

doi:10.1109/iscc53001.2021.9631261

Continuous evaluation of preterm infants' spontaneous motility has proved to be a decisive tool for the timely diagnosis of neurodevelopmental disorders. Automatic estimation of infants' limb pose can support clinicians in infant monitoring. This work proposes an end-to-end pipeline for limb-pose estimation based on Mask R-CNN, a region-based convolutional neural network. The framework was validated on a custom dataset of 6000 depth images from 30 videos of 19 preterm infants, acquired in a neonatal intensive care unit during actual clinical practice. Leave-one-infant-out cross-validation was performed to evaluate the framework's performance. Joint detection achieved a mean average precision of 0.9 with a standard deviation of 0.2. For limb-pose estimation, the median root mean square error was 6.8 pixels (right arm), 6.7 pixels (left arm), 6.5 pixels (right leg), and 6.5 pixels (left leg). The corresponding interquartile ranges were 1.1, 1.2, 0.6, and 1.2 pixels. This end-to-end framework represents a step toward embedded monitoring solutions for on-the-edge computation.
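The evaluation protocol named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the array shapes, and the exact RMSE definition (root of the mean squared Euclidean distance between predicted and ground-truth joint positions of one limb, computed per image) are assumptions made for illustration only, and the toy data below are not from the dataset.

```python
import numpy as np

def leave_one_infant_out(infant_ids):
    """Yield (held_out_id, train_idx, test_idx), holding out one infant per fold."""
    infant_ids = np.asarray(infant_ids)
    for held_out in np.unique(infant_ids):
        test_mask = infant_ids == held_out
        yield held_out, np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

def limb_rmse(pred, true):
    """Per-image RMSE in pixels over the joints of one limb (assumed definition).

    pred, true: arrays of shape (n_images, n_joints, 2) holding (x, y) pixels.
    """
    diff = np.asarray(pred, dtype=float) - np.asarray(true, dtype=float)
    dist_sq = np.sum(diff ** 2, axis=-1)      # squared Euclidean distance per joint
    return np.sqrt(dist_sq.mean(axis=-1))     # one RMSE value per image

def median_and_iqr(values):
    """Return the median and the interquartile range (Q3 - Q1) of the values."""
    q1, med, q3 = np.percentile(values, [25, 50, 75])
    return med, q3 - q1

# Toy example: 3 hypothetical infants with 2 images each.
infant_ids = np.array([0, 0, 1, 1, 2, 2])
folds = list(leave_one_infant_out(infant_ids))

# Two images, three joints; every predicted joint is offset by (3, 4) pixels.
true = np.zeros((2, 3, 2))
pred = true + np.array([3.0, 4.0])
errors = limb_rmse(pred, true)
med, iqr = median_and_iqr(errors)
```

Splitting by infant rather than by image keeps all frames of a held-out infant out of training, so the reported errors reflect performance on unseen subjects.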