A space-filling sampling approach for collective classification of social media data

Gobbo, Emiliano del; Fontanella, Lara; Ippoliti, Luigi; Zio, Simone Di; Fontanella, Sara; Cucco, Alex
2026-01-01

Abstract

Node classification on graph data is an important problem in many real-world applications. However, it requires labels for training, which can be difficult or expensive to obtain in practice. Consequently, only a small fraction of the accessible data is typically labeled. Recognizing this limitation, we consider the problem of spreading labels from a small, carefully chosen set of labeled data, referred to as seeds, to a larger set of unlabeled data. Based on the common graph smoothness assumption, we cast this classification problem within the semi-supervised learning framework and propose a graph sampling design strategy for the seeds that improves the performance of the well-known label propagation algorithm. In particular, we show that more accurate predictions can be achieved if the seeds are “optimally” spread over the graph by means of a space-filling design, a sampling strategy particularly suited to cases in which no other attributes are available on the nodes. Theoretical results and competitive experimental results on a variety of simulations and a real-world dataset demonstrate the effectiveness of the proposed methodology.
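The abstract does not specify which space-filling design the authors use, so the following is only an illustrative sketch of the general idea, not the paper's method. It assumes a greedy maximin (farthest-point) rule on hop distance as a stand-in for the space-filling design, and a plain neighbor-averaging label propagation with clamped seeds. The function names `bfs_distances`, `maximin_seeds`, and `propagate_labels`, and the toy graph, are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def maximin_seeds(adj, k, start):
    """Greedy farthest-point selection (an assumed stand-in for a
    space-filling design): each new seed is the node whose hop distance
    to its nearest already-chosen seed is largest."""
    if k > len(adj):
        raise ValueError("k cannot exceed the number of nodes")
    inf = float("inf")
    seeds = [start]
    nearest = bfs_distances(adj, start)  # distance to the closest seed so far
    while len(seeds) < k:
        # Unreachable nodes count as infinitely far, so they are picked first.
        nxt = max(sorted(adj), key=lambda n: nearest.get(n, inf))
        seeds.append(nxt)
        for n, d in bfs_distances(adj, nxt).items():
            if d < nearest.get(n, inf):
                nearest[n] = d
    return seeds


def propagate_labels(adj, seed_labels, n_iter=100):
    """Simple label propagation: seed scores stay fixed, every other node
    repeatedly takes the average of its neighbors' class scores."""
    classes = sorted(set(seed_labels.values()))
    score = {n: {c: 0.0 for c in classes} for n in adj}
    for n, c in seed_labels.items():
        score[n][c] = 1.0
    for _ in range(n_iter):
        new = {}
        for n in adj:
            if n in seed_labels or not adj[n]:
                new[n] = score[n]  # clamped seed or isolated node
            else:
                new[n] = {
                    c: sum(score[m][c] for m in adj[n]) / len(adj[n])
                    for c in classes
                }
        score = new
    return {n: max(classes, key=lambda c: score[n][c]) for n in adj}


# Toy graph: two 4-node cliques (0-3 and 4-7) joined by the edge 3-4.
adj = {
    0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4],
    4: [3, 5, 6, 7], 5: [4, 6, 7], 6: [4, 5, 7], 7: [4, 5, 6],
}
true_label = {n: "A" if n < 4 else "B" for n in adj}

seeds = maximin_seeds(adj, k=2, start=0)
predicted = propagate_labels(adj, {s: true_label[s] for s in seeds})
```

On this toy graph the maximin rule places the second seed in the clique that does not contain the first, so each community has a labeled node before propagation starts. This is the intuition behind spreading seeds over the graph when no node attributes are available to guide the choice.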

Use this identifier to cite or link to this document: https://hdl.handle.net/11564/876255