Unveiling Age-Related Patterns in Vocal Expression of Emotions: A Machine Learning Approach with Mel and Gammatone Frequency Cepstral Coefficients
Perpetuini D.; Cardone D.; Merla A.
2024-01-01
Abstract
The significance of emotional assessment has gained increasing recognition across diverse fields, such as psychology, healthcare, education, and the social sciences. It is deemed crucial for understanding and addressing a broad spectrum of outcomes, including mental health, academic performance, patient experience, and social acceptance. A noteworthy aspect of human emotions lies in their expression through a diverse range of vocal sounds, which can be interpreted and understood by the listener. From this perspective, Mel Frequency Cepstral Coefficients (MFCCs) and Gammatone Frequency Cepstral Coefficients (GTCCs) offer a powerful means of representing the latent vocal features that convey emotional states. In this study, we employed Machine Learning (ML) techniques leveraging MFCCs and GTCCs to construct a classifier capable of assessing different emotions. We used the freely available Toronto Emotional Speech Set (TESS), a dataset comprising vocal recordings of two actresses repeating 200 words in seven distinct emotions. Furthermore, we introduced an age-related analysis to gain insight into how age might affect the human capacity to express emotions vocally. The results showed an accuracy of 99.6% in emotional assessment over the vocal recordings of both actresses when employing GTCCs. Although the study investigated only two individuals, it represents an initial step toward understanding the potential influence of age on emotional expression through vocalization. Moreover, the age-related analyses achieved 100% accuracy when the emotional assessment was restricted to the younger actress's vocal samples, compared to 98.6% when restricted to the older actress's. These findings suggest that age may influence emotional expression.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
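To make the feature-extraction step concrete, the following is a minimal, pure-NumPy sketch of the MFCC pipeline described in the abstract (framing, windowing, power spectrum, mel filterbank, log compression, DCT). It is an illustrative reconstruction, not the authors' implementation, and the parameter values (`n_fft`, `hop`, `n_mels`, `n_coef`) are common defaults, not values taken from the paper; GTCCs follow the same pipeline with a gammatone filterbank substituted for the mel filterbank.

```python
import numpy as np

def hz_to_mel(f):
    # Standard mel-scale mapping
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr, n_fft=512, hop=256, n_mels=26, n_coef=13):
    # 1) Frame the signal and apply a Hamming window
    frames = [signal[s:s + n_fft] * np.hamming(n_fft)
              for s in range(0, len(signal) - n_fft + 1, hop)]
    frames = np.array(frames)
    # 2) Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 3) Triangular mel filterbank, equally spaced on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        lo, c, hi = bins[i - 1], bins[i], bins[i + 1]
        for k in range(lo, c):
            fbank[i - 1, k] = (k - lo) / max(c - lo, 1)
        for k in range(c, hi):
            fbank[i - 1, k] = (hi - k) / max(hi - c, 1)
    # 4) Log mel energies, then DCT-II to obtain cepstral coefficients
    energies = np.log(power @ fbank.T + 1e-10)
    n = energies.shape[1]
    dct = np.cos(np.pi * np.outer(np.arange(n_coef),
                                  2 * np.arange(n) + 1) / (2 * n))
    return energies @ dct.T  # shape: (n_frames, n_coef)

# Usage on a synthetic 1-second, 440 Hz tone (stand-in for a TESS recording)
sr = 16000
t = np.arange(sr) / sr
coeffs = mfcc(np.sin(2 * np.pi * 440 * t), sr)
print(coeffs.shape)  # one 13-coefficient vector per frame
```

In a classification setting such as the one the abstract describes, the per-frame coefficients would typically be summarized (e.g. averaged over time) into a fixed-length feature vector per recording before being fed to an ML classifier.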