Top 8 Cybersecurity Datasets For Your Next Machine Learning Project



Machine learning techniques play a critical role in detecting serious threats on a network. A good dataset helps build robust machine learning systems that address network security problems such as malware attacks, phishing, and host intrusion. For instance, real-world cybersecurity datasets let you work on projects such as network intrusion detection systems and network packet inspection systems using machine learning models.

Here is a list of the 8 top cybersecurity datasets you can use for your next machine learning project.

(The list is in no particular order)

1| ADFA Intrusion Detection Datasets

About: The ADFA Intrusion Detection Datasets are designed for evaluating system-call-based Host Based Intrusion Detection Systems (HIDS). The datasets cover both Linux and Windows and help in detecting anomaly-based intrusions on both platforms. They are used as a benchmark for traditional HIDS.



Know more here.
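
To make the idea of a system-call-based HIDS concrete, here is a minimal sketch of one classic anomaly-detection approach: build a profile of short system call sequences (n-grams) from normal traces, then score a new trace by the fraction of its n-grams that never appear in that profile. This is an illustration only, not the method used by the ADFA authors. The function names are hypothetical, and the traces are made-up lists of system call numbers, not real ADFA data.

```python
def ngrams(trace, n=3):
    """Return all contiguous n-grams (as tuples) from a system call trace."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]


def build_profile(normal_traces, n=3):
    """Collect the set of n-grams seen in normal (attack-free) traces."""
    profile = set()
    for trace in normal_traces:
        profile.update(ngrams(trace, n))
    return profile


def anomaly_score(trace, profile, n=3):
    """Fraction of the trace's n-grams that are absent from the normal profile."""
    grams = ngrams(trace, n)
    if not grams:
        return 0.0  # trace shorter than n: nothing to score
    unseen = sum(1 for g in grams if g not in profile)
    return unseen / len(grams)


# Toy data (assumption): each trace is a list of system call numbers.
normal = [[6, 6, 63, 6, 42, 120], [6, 63, 6, 42, 120, 6]]
suspect = [6, 63, 6, 11, 45, 33]

profile = build_profile(normal)
print(anomaly_score(normal[0], profile))  # a trace used to build the profile
print(anomaly_score(suspect, profile))    # a trace with unfamiliar sequences
```

A higher score means more unfamiliar call sequences. In practice the threshold and the n-gram length would need tuning against labelled traces.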

2| ISOT Botnet and Ransomware Detection Datasets

About: The ISOT Botnet dataset is a combination of several existing publicly available malicious and non-malicious datasets. The ISOT Ransomware Detection dataset consists of over 420 GB of execution traces from ransomware and benign programs. The ISOT HTTP botnet dataset comprises two traffic captures: malicious DNS data for nine different botnets and benign DNS data for 19 different well-known software applications.

Know more here.

3| FakeNewsNet

About: FakeNewsNet is a fake news data repository that contains two comprehensive datasets with diverse features covering news content, social context, and spatiotemporal information. The dataset is constructed using an end-to-end system called FakeNewsTracker. The repository can support the study of various open research problems related to fake news.

Know more here.

4| Malicious URLs Dataset

About: The Malicious URLs dataset consists of about 2.4 million URLs (examples) and 3.2 million features. The dataset is available in two formats, Matlab and SVM-light. In Matlab format, the file url.mat contains FeatureTypes, a list of column indices for the data matrices that are real-valued features. In SVM-light format, FeatureTypes is a text file list…
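
For readers unfamiliar with the SVM-light format mentioned above, the following is a minimal sketch of how one line can be parsed. It assumes the general SVM-light layout, a label followed by sparse `index:value` pairs with an optional `#` comment. The sample row and the function name are hypothetical, not taken from the dataset. For real work, a library loader such as scikit-learn's `load_svmlight_file` is likely a better fit, since it returns a sparse matrix.

```python
def parse_svmlight_line(line):
    """Parse one 'label index:value ...' line into (label, {index: value}).

    Returns None for blank or comment-only lines.
    """
    line = line.split("#", 1)[0].strip()  # drop an optional trailing comment
    if not line:
        return None
    parts = line.split()
    label = int(float(parts[0]))  # handles '+1', '-1', '1.0'
    features = {}
    for token in parts[1:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)  # only non-zero features are listed
    return label, features


# Hypothetical row (assumption): the label and feature values are invented.
sample = "+1 4:0.0788 5:0.1242 11:1 17:1  # made-up example"
label, features = parse_svmlight_line(sample)
print(label, features)
```

Storing only the non-zero entries matters here: with millions of features per URL, a dense representation would be impractical.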

Source…