
Covering the Online Spectrum of Opinion in Social Context: The Benefit of Network Node Sampling Through an Italian Case Study

Cucco, Alex; del Gobbo, Emiliano; Fontanella, Lara; Fontanella, Sara; Ippoliti, Luigi
2025-01-01

Abstract

Capturing a diverse range of opinions, sentiments, and topics is essential when selecting training data for statistical and machine learning models, particularly those that require interpretability. Online comments are a valuable source of public opinion, but they often give a skewed picture, with opposing viewpoints overrepresented relative to supportive ones. This imbalance can lead to biased models that reinforce stereotypes and reduce fairness and utility. The goal is to ensure that the data reflect a broad spectrum of opinions and sentiments, which helps mitigate bias and yields a more comprehensive training dataset. Doing so makes it possible to develop fairer, more transparent models that are better suited to analysing complex social issues. To achieve this, it is crucial to employ effective sampling techniques, such as space-filling sampling on networks, that ensure thorough coverage of the topics and sentiments in online discussions. We demonstrate this methodology with a simulated case study and by analysing social media comments from the online debate around migration. Considering the limitations of existing Italian lexical resources, we introduce a novel sampling technique that ensures both topic and sentiment are adequately represented in the corpus, enhancing its overall reliability and breadth.
2025
Lecture Notes in Computer Science
Paszynski, M., Barnard, A.S., Zhang, Y.J.
English
Workshops on Computational Science, which were co-organized with the 25th International Conference on Computational Science, ICCS 2025
2025
SGP (Singapore)
International
Print
Volume 15907, pp. 60-67 (8 pages)
9783031975530
9783031975547
Springer Science and Business Media Deutschland GmbH
Network sampling; Online content selection; Opinion mining; Space-filling designs
none
Cucco, Alex; Del Gobbo, Emiliano; Fontanella, Lara; Fontanella, Sara; Ippoliti, Luigi
273
info:eu-repo/semantics/conferenceObject
5
4 Contribution in Conference Proceedings (Proceeding)::4.1 Contribution in conference proceedings
   Identification and Critical Analysis of Online Racism and Xenophobia against (Im)migrants and Roma people
   TOLERANT
   M.U.R. - Ministero dell'Università e della Ricerca
   P2022APKJL

Use this identifier to cite or link to this document: https://hdl.handle.net/11564/876373