Transfer learning for informative-frame selection in laryngoscopic videos through learned features
Moccia, Sara;
2020-01-01
Abstract
Narrow-band imaging (NBI) laryngoscopy is an optical-biopsy technique used for screening and diagnosing cancer of the laryngeal tract. It reduces biopsy risks, but at the cost of some drawbacks, such as the large amount of data to review to make the diagnosis. The purpose of this paper is to develop a deep-learning-based strategy for the automatic selection of informative laryngoscopic-video frames, reducing the amount of data to process for diagnosis. The strategy relies on transfer learning, implemented to extract learned features with six different convolutional neural networks (CNNs) pre-trained on natural images. To test the proposed strategy, the learned features were extracted from the NBI-InfFrames dataset. Support vector machines (SVMs) and a CNN-based approach were then used to classify frames as informative (I) or uninformative, the latter comprising blurred frames (B), frames with saliva or specular reflections (S), and underexposed frames (U). The best-performing learned-feature set was obtained with VGG 16, yielding a recall for I of 0.97 when classifying frames with SVMs and 0.98 with the CNN-based classification. This work presents a valuable novel approach to the selection of informative frames in laryngoscopic videos and demonstrates the potential of transfer learning in medical image analysis.

Figure: Flowchart of the proposed approach to automatic informative-frame selection in laryngoscopic videos. The approach relies on transfer learning, implemented to extract learned features with different convolutional neural networks (CNNs) pre-trained on natural images. Frame classification is performed with two different classifiers: support vector machines and fine-tuned CNNs.
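To make the method described in the abstract more concrete, below is a minimal sketch of the learned-feature extraction and SVM classification stages. It assumes a PyTorch and scikit-learn implementation, an ImageNet-pre-trained VGG 16, and a hypothetical one-folder-per-class layout (I, B, S, U) for the NBI-InfFrames frames; none of these implementation details are specified by the record itself. The paper's alternative classifier, a fine-tuned CNN, would instead re-train the last layers of the network on the four classes rather than feeding an SVM.

```python
# Sketch of the transfer-learning pipeline from the abstract: learned features are
# extracted from laryngoscopic frames with an ImageNet-pre-trained VGG 16 and then
# classified with an SVM into the four classes I, B, S, U.
# PyTorch/scikit-learn, the dataset path, and hyperparameters are assumptions.
import numpy as np
import torch
from torchvision import models, transforms
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader
from sklearn.svm import SVC

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# ImageNet-style preprocessing for the NBI frames.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Hypothetical folder layout: one sub-folder per class (I, B, S, U).
dataset = ImageFolder("NBI-InfFrames/train", transform=preprocess)
loader = DataLoader(dataset, batch_size=32, shuffle=False)

# VGG 16 pre-trained on natural images (ImageNet); the classification head is
# truncated so the first fully connected layer output is used as the learned features.
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).to(device).eval()
feature_extractor = torch.nn.Sequential(
    vgg.features, vgg.avgpool, torch.nn.Flatten(), *list(vgg.classifier[:2])
)

features, labels = [], []
with torch.no_grad():
    for frames, targets in loader:
        feats = feature_extractor(frames.to(device))
        features.append(feats.cpu().numpy())
        labels.append(targets.numpy())

X = np.concatenate(features)
y = np.concatenate(labels)

# SVM classifier trained on the learned features (kernel and C are placeholder choices).
svm = SVC(kernel="rbf", C=1.0)
svm.fit(X, y)
```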
File | Type | Size | Format
---|---|---|---
s11517-020-02127-7.pdf | Publisher's PDF | 2.86 MB | Adobe PDF

Access: restricted to archive managers; a copy can be requested.
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.