netML Malware Detection
Dataset Overview
This dataset targets the detection of malware flows in network traffic that passes through routers. It was built for the malware detection task from 30 of the more than 300 raw traffic captures published by Stratosphere IPS. The original netML challenge released the data only as extracted features, whereas the dataset here is a faithful recreation of the netML malware dataset at the raw network packet level. Although it does not perfectly match the original netML challenge data, it is the dataset used in the nPrint paper and to generate the nPrint results on the leaderboard.
Task Description
The task is to classify each traffic flow in the dataset as malware or benign, at one of two label granularities: 2 classes (easy) or 19 classes (hard).
Links and Facts
- Dataset Link: Google Drive
- Dataset Size (Uncompressed): < 10 GB
- Disallowed Features: None
- Number of Classes: 2 (easy), 19 (hard)
- pcapML Metadata Comment Format:
  sampleID,easylabel_hardlabel
- Protocols: IPv4, TCP, UDP
- Metric to Optimize: Balanced Accuracy
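The metadata comment format and the target metric listed above can be illustrated with a minimal sketch. It assumes that the easy label contains no underscore, so the first underscore separates it from the hard label, and that a comment without an underscore carries the same label at both levels. Both are assumptions, not documented behavior. The helper names `parse_comment` and `balanced_accuracy` and the label strings in the usage note are hypothetical. Balanced accuracy is computed here as the unweighted mean of per-class recall, which is the standard definition.

```python
from collections import defaultdict


def parse_comment(comment: str) -> tuple[str, str, str]:
    """Split 'sampleID,easylabel_hardlabel' into (sample_id, easy, hard).

    Assumption: the easy label has no underscore, so the first
    underscore is the separator between the two labels.
    """
    sample_id, label = comment.strip().split(",", 1)
    easy, sep, hard = label.partition("_")
    if not sep:
        # Assumption: no underscore means one label for both levels.
        hard = easy
    return sample_id, easy, hard


def balanced_accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Return the unweighted mean of per-class recall."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)
```

For example, `parse_comment("17,malware_Dridex")` yields the sample ID `"17"`, the easy label `"malware"`, and the hard label `"Dridex"` (these label strings are made up for illustration). Because every class contributes equally to the metric regardless of its size, a classifier that ignores rare malware families scores poorly even if its plain accuracy is high.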
Citation(s)
Original netML release
@article{barut2020netml,
title={{NetML}: A challenge for network traffic analytics},
author={Barut, Onur and Luo, Yan and Zhang, Tong and Li, Weigang and Li, Peilong},
journal={arXiv preprint arXiv:2004.13006},
year={2020}
}
nPrint release
@inproceedings{10.1145/3460120.3484758,
author = {Holland, Jordan and Schmitt, Paul and Feamster, Nick and Mittal, Prateek},
title = {New Directions in Automated Traffic Analysis},
year = {2021},
isbn = {9781450384544},
url = {https://doi.org/10.1145/3460120.3484758},
doi = {10.1145/3460120.3484758},
series = {CCS '21}
}