Covering the Online Spectrum of Opinion in Social Context: The Benefit of Network Node Sampling Through an Italian Case Study

Cucco, Alex; del Gobbo, Emiliano; Fontanella, Lara; Fontanella, Sara; Ippoliti, Luigi
2025-01-01

Abstract

Capturing a diverse range of opinions, sentiments, and topics is essential when selecting training data for statistical and machine learning models, particularly those that require interpretability. Online comments offer a valuable source of public opinion, but they often present a skewed representation, with opposing viewpoints overrepresented compared to supportive ones. This imbalance can lead to biased models that reinforce stereotypes and reduce fairness and utility. The goal is to ensure that a broad spectrum of opinions and sentiments is reflected in the data, helping to mitigate bias and providing a more comprehensive dataset for training. By doing so, we can develop fairer, more transparent models that are better suited for analysing complex social issues. To achieve this, it is crucial to employ effective sampling techniques, such as space-filling sampling on networks, that ensure thorough coverage of the various topics and sentiments in online discussions. We will demonstrate this methodology with a simulated case study and with an analysis of social media comments from the online debate around migration. Considering the limitations of existing Italian lexical resources, we will introduce a novel sampling technique that ensures both topic and sentiment are adequately represented in the corpus, enhancing its overall reliability and breadth.
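The abstract does not spell out the sampling algorithm, so the following is only an illustrative sketch of one common reading of "space-filling sampling on networks": a greedy maximin (farthest-point) design in which each new node is the one farthest, in hop distance, from the nodes already chosen. The function names, the adjacency-dictionary representation, and the idea that nodes stand for comments linked by similarity are all assumptions made for illustration, not the authors' method.

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop distances from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def maximin_sample(graph, k, start):
    """Greedily pick up to k nodes, each as far as possible from those already picked.

    `graph` is a hypothetical adjacency dict {node: [neighbours]}, e.g. comments
    linked when they share a topic or sentiment (an assumption for this sketch).
    """
    selected = [start]
    # nearest[v] = hop distance from v to the closest selected node so far;
    # nodes in other components stay at infinity and are therefore picked early.
    nearest = {v: float("inf") for v in graph}
    for v, d in bfs_distances(graph, start).items():
        nearest[v] = d

    while len(selected) < min(k, len(graph)):
        # Iterate in sorted order so ties are broken deterministically.
        candidate = max(sorted(graph), key=lambda v: nearest[v])
        selected.append(candidate)
        for v, d in bfs_distances(graph, candidate).items():
            nearest[v] = min(nearest[v], d)
    return selected


# Toy example: a path a-b-c-d-e-f-g.
path = {
    "a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"],
    "e": ["d", "f"], "f": ["e", "g"], "g": ["f"],
}
print(maximin_sample(path, 3, "a"))  # ['a', 'g', 'd']
```

On the toy path the sample spreads to the two ends and then the middle, which is the coverage behaviour a space-filling design aims for; a real application would need a graph built from the comment corpus and possibly a weighted distance.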
2025
LECTURE NOTES IN COMPUTER SCIENCE
ISBN: 9783031975530, 9783031975547
Use this identifier to cite or link to this document: https://hdl.handle.net/11564/876373