Feature selection in ML-based SDN intrusion detection system
Morbidoni C.;
2024-01-01
Abstract
In Software-Defined Networking (SDN), cybersecurity research has underscored the pressing need to counter cyber-attacks. Such attacks include unauthorized access to and manipulation of critical data, which jeopardize confidentiality, authenticity, and system integrity. To address these threats, Intrusion Detection Systems (IDSs) play a crucial role in safeguarding both the SDN infrastructure and its users. An IDS operates much like a classifier, which makes intrusion detection a natural application for machine learning. These techniques rely on labeled datasets to train the system to distinguish benign from malicious events on the basis of their features; once trained, the system can classify new events as benign or malicious. Identifying which features are relevant for classification is therefore crucial. Few studies in the current literature have examined the effectiveness of IDSs applied to SDNs. Evaluating machine-learning-based IDSs in SDN environments requires specialized datasets comprising the network traffic features needed to discern attack patterns. Moreover, as attacks on SDNs evolve, these datasets must be continuously updated. This paper investigates which features are relevant for detecting the most common attack types in an SDN, which requires labeled datasets of SDN network traffic. Unfortunately, to the best of our knowledge, only one such dataset is publicly available: InSDN. We present the results of a feature selection process on the InSDN dataset, based on the SHAP toolset, aimed at identifying the most relevant features for different attack types. We also compare the performance of different classification algorithms trained on the full dataset and on the reduced one, showing that, for many attack types, the classifiers' performance is comparable.
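To make the idea of Shapley-value-based feature ranking concrete, the sketch below computes exact Shapley values for a toy scoring function and ranks features by their mean absolute contribution. It is an illustration only and not the pipeline used in the paper: the SHAP toolset relies on efficient approximations rather than the brute-force enumeration shown here, and the feature names, the `toy_score` function, the baseline, and the sample rows are all hypothetical and are not taken from InSDN.

```python
from itertools import combinations
from math import factorial

# Hypothetical flow features; these are NOT the InSDN column names.
FEATURES = ["flow_duration", "pkt_count", "byte_count", "src_port"]

def toy_score(z):
    """Hypothetical 'maliciousness' score; it deliberately ignores z[3]."""
    return 3.0 * z[0] + z[1] * z[2]

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are replaced by their baseline value.
    The cost grows as 2^n, so this is only feasible for a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def rank_features(model, rows, baseline, names):
    """Rank features by mean absolute Shapley value over all rows."""
    totals = [0.0] * len(names)
    for x in rows:
        for i, p in enumerate(shapley_values(model, x, baseline)):
            totals[i] += abs(p)
    mean_abs = [t / len(rows) for t in totals]
    return sorted(zip(names, mean_abs), key=lambda item: item[1], reverse=True)

def select_top_k(ranking, k):
    """Keep the names of the k highest-ranked features."""
    return [name for name, _ in ranking[:k]]

# Made-up sample rows and an all-zero baseline, for illustration only.
BASELINE = [0.0, 0.0, 0.0, 0.0]
ROWS = [
    [1.0, 1.0, 1.0, 5.0],
    [2.0, 0.0, 1.0, 3.0],
    [1.0, 2.0, 2.0, 0.0],
]

ranking = rank_features(toy_score, ROWS, BASELINE, FEATURES)
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
print("reduced feature set:", select_top_k(ranking, 3))
```

In this toy setting the feature that the scoring function ignores receives zero importance and is dropped from the reduced set. A classifier retrained on the reduced set can then be compared against one trained on all features, which is the kind of comparison the abstract describes.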