netML VPN-nonVPN Application Identification
Dataset Overview
This dataset focuses on detecting applications in traffic flows. The dataset was created using a subset of the VPN-nonVPN dataset. While the original dataset was released as features for the netML challenge, the dataset below represents a faithful recreation of the netML malware dataset at the raw network packet level. Although the dataset doesn’t perfectly match the original netML challenge, it is used in the nPrint paper and to generate the nPrint results in the leaderboard.
Task Description
The task is to identify the application of each traffic flow sample in the dataset.
Links and Facts
- Dataset Link: Google Drive
- Dataset Size (Uncompressed): < 1 GB
- Disallowed Features: None
- Number of Classes: 7 (easy) 18 (medium), 31 (hard)
- pcapML Metadata Comment Format:
sampleID,easylabel_mediumlabel_hardlabel
- Protocols: IPv4, TCP, UDP
- Metric to Optimize: Balanced Accuracy
Special Dataset Notes
None.
Citation(s)
original netML release
@article{barut2020netml,
title={Netml: A challenge for network traffic analytics},
author={Barut, Onur and Luo, Yan and Zhang, Tong and Li, Weigang and Li, Peilong},
journal={arXiv preprint arXiv:2004.13006},
year={2020}
}
nPrint release
@inproceedings{10.1145/3460120.3484758,
author = {Holland, Jordan and Schmitt, Paul and Feamster, Nick and Mittal, Prateek},
title = {New Directions in Automated Traffic Analysis},
year = {2021},
isbn = {9781450384544},
url = {https://doi.org/10.1145/3460120.3484758},
doi = {10.1145/3460120.3484758},
series = {CCS '21}
}