A space-filling sampling approach for collective classification of social media data

Del Gobbo, Emiliano; Fontanella, Lara; Ippoliti, Luigi; Di Zio, Simone; Fontanella, Sara; Cucco, Alex

doi:10.1007/s11634-026-00670-z

Node classification on graph data is an important problem in many real-world applications. However, training requires labels, which can be difficult or expensive to obtain in practice; consequently, typically only a small fraction of the accessible data is labeled. Recognizing this limitation, we consider the problem of spreading labels from a small, carefully chosen set of labeled nodes, also referred to as seeds, to a larger set of unlabeled nodes. Based on the common graph smoothness assumption, we cast this classification problem within the semi-supervised learning framework and propose a graph sampling design strategy for selecting the seeds that improves the performance of the well-known label propagation algorithm. In particular, we show that more accurate predictions can be achieved if the seeds are “optimally” spread over the graph by means of a space-filling design, a sampling strategy particularly suited to cases in which no other attributes are available on the nodes. Theoretical results, together with competitive experimental results on a variety of simulations and a real-world dataset, demonstrate the effectiveness of the proposed methodology.
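To make the two ingredients above concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' method: the abstract does not specify the space-filling criterion or the propagation variant, so the sketch assumes (a) a greedy maximin ("farthest-point") rule on unweighted shortest-path (hop-count) distances as a stand-in space-filling design for choosing seeds, and (b) the classic iterative label propagation scheme that clamps the seed labels and repeatedly replaces each unlabeled node's class distribution with the average of its neighbours'. The function names `bfs_distances`, `maximin_seeds`, and `label_propagation` are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop-count shortest-path distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def maximin_seeds(adj, k, start):
    """Assumed space-filling rule: greedily add the node farthest from the current seeds.

    Requires k <= number of nodes. Ties are broken by the iteration order of `adj`.
    """
    inf = float("inf")
    seeds = [start]
    d0 = bfs_distances(adj, start)
    # Distance from each node to its nearest seed (inf if unreachable).
    min_dist = {u: d0.get(u, inf) for u in adj}
    while len(seeds) < k:
        candidate = max((u for u in adj if u not in seeds), key=lambda u: min_dist[u])
        seeds.append(candidate)
        d = bfs_distances(adj, candidate)
        for u in adj:
            min_dist[u] = min(min_dist[u], d.get(u, inf))
    return seeds


def label_propagation(adj, seed_labels, classes, n_iter=100):
    """Iterative label propagation with seed labels clamped at every step."""
    # Seeds start one-hot; unlabeled nodes start with a uniform distribution.
    f = {}
    for u in adj:
        if u in seed_labels:
            f[u] = {c: 1.0 if c == seed_labels[u] else 0.0 for c in classes}
        else:
            f[u] = {c: 1.0 / len(classes) for c in classes}
    for _ in range(n_iter):
        new_f = {}
        for u in adj:
            if u in seed_labels or not adj[u]:
                new_f[u] = f[u]  # clamp seeds; isolated nodes keep their values
            else:
                # Average the neighbours' class distributions (graph smoothness).
                new_f[u] = {c: sum(f[v][c] for v in adj[u]) / len(adj[u]) for c in classes}
        f = new_f
    # Predict the class with the largest propagated score.
    return {u: max(classes, key=lambda c: f[u][c]) for u in adj}


if __name__ == "__main__":
    # Toy example: a path graph 0-1-...-9.
    adj = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
    seeds = maximin_seeds(adj, 2, start=0)
    print(seeds)
    pred = label_propagation(adj, {0: "a", 9: "b"}, ["a", "b"])
    print([pred[i] for i in range(10)])
```

On the toy path graph the maximin rule places the two seeds at the two ends, the "spread out" behaviour a space-filling design is meant to achieve, and propagation then splits the path at its midpoint. Which space-filling criterion and distance the paper actually uses should be taken from the full text.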