Link Search Menu Expand Document

Snowflake Fingerprintability

Dataset Overview

Tor is the most well-known tool for circumventing censorship. Unfortunately, Tor traffic has been shown to be detectable using deep-packet inspection. WebRTC is a popular web frame-work that enables browser-to-browser connections. Snowflake is a novel pluggable transport that leverages WebRTC to connect Tor clients to the Tor network. In theory, Snowflake was created to be indistinguishable from other WebRTC services. We evaluate the indistinguishability of Snowflake. We collect over 6,500 DTLS handshakes from Snowflake, Facebook Messenger, Google Hangouts, and Discord WebRTC connections. The dataset listed here further splits these handshakes into the browser that was useed to capture the traffic, either Google Chrome or Firefox, resulting in 7 classes of traffic.

Task Description

The task is to classify the (browser, application) tuple in each DTLS handshake in the datset.

  • Dataset Link: Google Drive
  • Dataset Size (Uncompressed): < 1GB
  • Disallowed Features: None
  • Number of Classes: 7
  • pcapML Metadata Comment Format: sampleID,label
  • Protocols: IPv4, UDP, DTLS
  • Metric to Optimize: Macro F1

Special Dataset Notes

None

Citation(s)

Original Release

@article{macmillan2020evaluating,
title={Evaluating snowflake as an indistinguishable censorship circumvention tool},
author={MacMillan, Kyle and Holland, Jordan and Mittal, Prateek},
journal={arXiv preprint arXiv:2008.03254},
year={2020}
}

Subsequent Dataset Release (Fine-grained Classes)

@inproceedings{10.1145/3460120.3484758,
author = {Holland, Jordan and Schmitt, Paul and Feamster, Nick and Mittal, Prateek},
title = {New Directions in Automated Traffic Analysis},
year = {2021},
isbn = {9781450384544},
url = {https://doi.org/10.1145/3460120.3484758},
doi = {10.1145/3460120.3484758},
series = {CCS '21}
}

Leaderboard


ModelMacro F1PaperCode
nPrint Autogluon (AutoML)99.8CCS 2021 - ArxivnPrint