High dimensional data, large-scale data, imaging and manifold data are all fostering new frontiers of statistics. These type of data are commonly considered in Functional Data Analysis where they are viewed as infinite-dimensional random vectors in a functional space. The rapid development of new technologies has generated a flow of complex data that have led to the development of new modeling strategies by scientists. In this paper, we basically deal with the problem of clustering a set of complex functional data into homogeneous groups. Working in a mixture model-based framework, we develop a flexible clustering technique achieving dimensionality reduction schemes through an L1 penalization. The proposed procedure results in an integrated modelling approach where shrinkage techniques are applied to enable sparse solutions in both the means and the covariance matrices of the mixture components, while preserving the underlying clustering structure. This leads to an entirely data-driven methodology suitable for simultaneous dimensionality reduction and clustering. The proposed methodology is evaluated through a Monte Carlo simulation study and an empirical analysis of real-world datasets showing different degrees of complexity.

Penalizedmodel-based clustering of complex functional data

Nicola Pronello;Luigi Ippoliti;
2023-01-01

Abstract

High dimensional data, large-scale data, imaging and manifold data are all fostering new frontiers of statistics. These type of data are commonly considered in Functional Data Analysis where they are viewed as infinite-dimensional random vectors in a functional space. The rapid development of new technologies has generated a flow of complex data that have led to the development of new modeling strategies by scientists. In this paper, we basically deal with the problem of clustering a set of complex functional data into homogeneous groups. Working in a mixture model-based framework, we develop a flexible clustering technique achieving dimensionality reduction schemes through an L1 penalization. The proposed procedure results in an integrated modelling approach where shrinkage techniques are applied to enable sparse solutions in both the means and the covariance matrices of the mixture components, while preserving the underlying clustering structure. This leads to an entirely data-driven methodology suitable for simultaneous dimensionality reduction and clustering. The proposed methodology is evaluated through a Monte Carlo simulation study and an empirical analysis of real-world datasets showing different degrees of complexity.
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11564/822974
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 1
  • ???jsp.display-item.citation.isi??? 1
social impact