Penalizedmodel-based clustering of complex functional data

Pronello, Nicola; Ignaccolo, Rosaria; Ippoliti, Luigi; Fontanella, Sara

doi:10.1007/s11222-023-10288-2

High dimensional data, large-scale data, imaging and manifold data are all fostering new frontiers of statistics. These type of data are commonly considered in Functional Data Analysis where they are viewed as infinite-dimensional random vectors in a functional space. The rapid development of new technologies has generated a flow of complex data that have led to the development of new modeling strategies by scientists. In this paper, we basically deal with the problem of clustering a set of complex functional data into homogeneous groups. Working in a mixture model-based framework, we develop a flexible clustering technique achieving dimensionality reduction schemes through an L1 penalization. The proposed procedure results in an integrated modelling approach where shrinkage techniques are applied to enable sparse solutions in both the means and the covariance matrices of the mixture components, while preserving the underlying clustering structure. This leads to an entirely data-driven methodology suitable for simultaneous dimensionality reduction and clustering. The proposed methodology is evaluated through a Monte Carlo simulation study and an empirical analysis of real-world datasets showing different degrees of complexity.