netML Malware Detection

Dataset Overview

This dataset focuses on detecting malware flows in network traffic passing through routers. It was created for the malware detection task by selecting 30 of the more than 300 raw traffic captures available from Stratosphere IPS. The original dataset was released as extracted features for the netML challenge; the dataset below is a faithful recreation of the netML malware dataset at the raw network packet level. Although it does not perfectly match the original netML challenge data, it is the dataset used in the nPrint paper and to generate the nPrint results in the leaderboard.

Task Description

The task is to identify the various malware and benign traffic flows in the dataset.

  • Dataset Link: Google Drive
  • Dataset Size (Uncompressed): < 10 GB
  • Disallowed Features: None
  • Number of Classes: 2 (easy), 19 (hard)
  • pcapML Metadata Comment Format: sampleID,easylabel_hardlabel
  • Protocols: IPv4, TCP, UDP
  • Metric to Optimize: Balanced Accuracy
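
The metadata comment format listed above, `sampleID,easylabel_hardlabel`, can be split into its three parts with a few lines of Python. This is a minimal sketch, not part of pcapML: the names `SampleLabel` and `parse_comment` are hypothetical, the example label values are placeholders rather than real class names, and the code assumes the easy label contains no underscore (everything after the first underscore is treated as the hard label).

```python
from typing import NamedTuple


class SampleLabel(NamedTuple):
    """Hypothetical container for one parsed metadata comment."""
    sample_id: str
    easy_label: str
    hard_label: str


def parse_comment(comment: str) -> SampleLabel:
    """Split a 'sampleID,easylabel_hardlabel' comment into its parts.

    Assumption: the easy label contains no underscore, so the first
    underscore after the comma separates the easy and hard labels.
    """
    sample_id, comma, labels = comment.strip().partition(",")
    if not comma or not sample_id or not labels:
        raise ValueError(f"expected 'sampleID,easylabel_hardlabel', got {comment!r}")
    easy_label, underscore, hard_label = labels.partition("_")
    if not underscore or not easy_label or not hard_label:
        raise ValueError(f"expected 'easylabel_hardlabel', got {labels!r}")
    return SampleLabel(sample_id, easy_label, hard_label)


# Placeholder values for illustration only; not taken from the dataset.
parsed = parse_comment("12345,malware_examplefamily")
print(parsed.sample_id, parsed.easy_label, parsed.hard_label)
```

For the easy task, train on `easy_label`; for the hard task, train on `hard_label`.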

Citation(s)

Original netML release

@article{barut2020netml,
  title={Netml: A challenge for network traffic analytics},
  author={Barut, Onur and Luo, Yan and Zhang, Tong and Li, Weigang and Li, Peilong},
  journal={arXiv preprint arXiv:2004.13006},
  year={2020}
}

nPrint release

@inproceedings{10.1145/3460120.3484758,
  author={Holland, Jordan and Schmitt, Paul and Feamster, Nick and Mittal, Prateek},
  title={New Directions in Automated Traffic Analysis},
  year={2021},
  isbn={9781450384544},
  url={https://doi.org/10.1145/3460120.3484758},
  doi={10.1145/3460120.3484758},
  series={CCS '21}
}

Leaderboard


Easy (2 Classes)

Model | Balanced Accuracy | Paper | Code
nPrint Autogluon (AutoML) | 92.4 | CCS 2021 - Arxiv | nPrint

Hard (19 Classes)

Model | Balanced Accuracy | Paper | Code
nPrint Autogluon (AutoML) | 86.1 | CCS 2021 - Arxiv | nPrint
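
The leaderboard scores are balanced accuracy, which is commonly defined as the unweighted mean of per-class recall, so every class counts equally regardless of how many samples it has. The sketch below illustrates that definition in plain Python. It is an assumption that the leaderboard uses exactly this form (it is the same definition as scikit-learn's `balanced_accuracy_score` with default arguments), and the function name and toy labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def balanced_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Return the unweighted mean of per-class recall.

    Classes are taken from y_true, so each has at least one sample.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    totals = Counter(y_true)  # samples per true class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # hits per class
    return sum(correct[c] / totals[c] for c in totals) / len(totals)


# Toy example: a model that always predicts "benign" on imbalanced data.
y_true = ["benign", "benign", "benign", "benign", "malware"]
y_pred = ["benign"] * 5
print(balanced_accuracy(y_true, y_pred))  # 0.5, although plain accuracy is 0.8
```

The leaderboard reports the score as a percentage, so multiply the result by 100 to compare.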