
nPrint OS Detection

Dataset Overview

This dataset was created by re-labeling the traffic originally captured in the CICIDS 2017 dataset. The labels were generated using the mapping of IP address to operating system available on the CICIDS website. The traffic was first split by the source IP address of each packet, and each per-source stream was then split into sequential 100-packet samples. Although two versions of the task were tested, the leaderboard and dataset below represent the harder OS detection task, across 13 classes.
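The splitting procedure described above can be sketched in a few lines of Python. This is only an illustration, not the code used to build the dataset: the `split_into_samples` function and the `(src_ip, packet)` input shape are hypothetical, and dropping a trailing group of fewer than 100 packets is an assumption, since the description does not say how leftovers were handled.

```python
from collections import defaultdict

SAMPLE_SIZE = 100  # packets per sample, as stated in the dataset overview


def split_into_samples(packets, sample_size=SAMPLE_SIZE):
    """Group packets by source IP, then cut each group into sequential chunks.

    `packets` is an iterable of (src_ip, packet) pairs in capture order.
    Trailing groups shorter than `sample_size` are dropped; this is an
    assumption of the sketch, not something the dataset description states.
    """
    by_source = defaultdict(list)
    for src_ip, packet in packets:
        by_source[src_ip].append(packet)

    samples = []
    for src_ip, pkts in by_source.items():
        # Step through the per-source stream in non-overlapping windows.
        for start in range(0, len(pkts) - sample_size + 1, sample_size):
            samples.append((src_ip, pkts[start:start + sample_size]))
    return samples
```

With 250 packets from one source and 100 from another, this sketch yields three samples: two from the first source and one from the second, with the remaining 50 packets discarded.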

Task Description

The task is to classify the operating system of the device that sent each 100-packet sample.

  • Dataset Link: Google Drive
  • Dataset Size (Uncompressed): < 1 GB
  • Disallowed Features: IPv4 Source IP, IPv4 Destination IP, TCP Source Port, TCP Destination Port, TCP SEQ and TCP ACK Numbers.
  • Number of Classes: 13
  • pcapML Metadata Comment Format: sampleID,easylabel_hardlabel
  • Protocols: IPv4, TCP
  • Metric to Optimize: Balanced Accuracy
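
The metadata comment format listed above, `sampleID,easylabel_hardlabel`, can be unpacked with a small helper. The sketch below is hypothetical: `parse_comment` and `SampleLabel` are illustrative names, and it assumes the easy label contains no underscore, so the label pair is split at the first underscore. Check this against the actual label strings before relying on it.

```python
from typing import NamedTuple


class SampleLabel(NamedTuple):
    sample_id: str
    easy_label: str
    hard_label: str


def parse_comment(comment: str) -> SampleLabel:
    """Parse a 'sampleID,easylabel_hardlabel' metadata comment."""
    # The sample ID is everything before the first comma.
    sample_id, labels = comment.strip().split(",", 1)
    # Assumption: the easy label has no underscore, so split on the first one.
    easy_label, hard_label = labels.split("_", 1)
    return SampleLabel(sample_id, easy_label, hard_label)
```

For the hard task on this page, only the `hard_label` field would be used as the classification target.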

Special Dataset Notes

None

Citation(s)

Original CICIDS Dataset

@article{sharafaldin2018toward,
  title={Toward generating a new intrusion detection dataset and intrusion traffic characterization.},
  author={Sharafaldin, Iman and Lashkari, Arash Habibi and Ghorbani, Ali A},
  journal={ICISSp},
  volume={1},
  pages={108--116},
  year={2018}
}

nPrint OS Detection Dataset

@inproceedings{10.1145/3460120.3484758,
author = {Holland, Jordan and Schmitt, Paul and Feamster, Nick and Mittal, Prateek},
title = {New Directions in Automated Traffic Analysis},
year = {2021},
isbn = {9781450384544},
url = {https://doi.org/10.1145/3460120.3484758},
doi = {10.1145/3460120.3484758},
series = {CCS '21}
}

Leaderboard (Hard Label)

Model                      Balanced Accuracy  Paper             Code
nPrint Autogluon (AutoML)  77.1               CCS 2021 - Arxiv  nPrint
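
Balanced accuracy, the metric reported on the leaderboard, is commonly defined as the mean of per-class recall, so each of the 13 classes counts equally regardless of how many samples it has. The following is a minimal standard-library sketch of that definition, not the benchmark's own scoring code. It returns a fraction in [0, 1], whereas the leaderboard value appears to be a percentage.

```python
from collections import Counter


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    total = Counter(y_true)
    # Count correct predictions per true class.
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)
```

A classifier that always predicts the majority class illustrates the difference from plain accuracy: with three samples of one class and one of another, plain accuracy is 0.75, but balanced accuracy is 0.5 because recall on the minority class is zero.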