Gibran Gómez, PhD Student, IMDEA Software Institute
Transport Layer Security (TLS) is used by many applications to secure network communication through encryption. Malware adoption of TLS is growing rapidly, which defeats widely used on-the-wire detection approaches that need access to the plaintext contents of network communications to characterize malicious traffic. Because decrypting traffic (for instance, with a man-in-the-middle approach) undermines the privacy of all other types of communication, several supervised machine-learning strategies have been developed to build malware detectors directly from TLS metadata. However, such solutions work only for a small subset of labeled samples. In this talk we present an unsupervised approach that does not have this limitation. It can be applied to labeled or unlabeled samples to cluster similar TLS flows, producing a model that can make predictions on previously unseen traces from a larger number of malware families.
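To make the idea of clustering TLS flows without labels more concrete, here is a minimal illustrative sketch. It is not the method presented in the talk: the feature tokens, the Jaccard distance, the greedy leader clustering, the threshold value, and all function names are assumptions chosen only to keep the example small and self-contained.

```python
from typing import FrozenSet, List, Optional

# Hypothetical representation: one flow = a set of TLS metadata tokens
# (e.g. protocol version, offered cipher suite, extensions). Assumed for illustration.
Flow = FrozenSet[str]


def jaccard_distance(a: Flow, b: Flow) -> float:
    """Return 1 - |intersection| / |union|; 0.0 means identical token sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cluster_flows(flows: List[Flow], threshold: float) -> List[List[Flow]]:
    """Greedy leader clustering: a flow joins the first cluster whose leader
    (its first member) is within `threshold`; otherwise it starts a new cluster.
    No labels are used at any point."""
    clusters: List[List[Flow]] = []
    for flow in flows:
        for members in clusters:
            if jaccard_distance(flow, members[0]) <= threshold:
                members.append(flow)
                break
        else:
            clusters.append([flow])
    return clusters


def predict(flow: Flow, clusters: List[List[Flow]], threshold: float) -> Optional[int]:
    """Return the index of the closest cluster leader, or None when no leader
    is within `threshold` (the flow resembles nothing seen so far)."""
    best_index, best_distance = None, threshold
    for index, members in enumerate(clusters):
        distance = jaccard_distance(flow, members[0])
        if distance <= best_distance:
            best_index, best_distance = index, distance
    return best_index


# Toy, made-up flows: two similar ones and one clearly different.
flows = [
    frozenset({"v=1.2", "cs=c02f", "ext=sni", "ext=alpn"}),
    frozenset({"v=1.2", "cs=c02f", "ext=sni"}),
    frozenset({"v=1.0", "cs=0005", "ext=none"}),
]
clusters = cluster_flows(flows, threshold=0.5)

# A previously unseen flow is assigned to the most similar cluster, if any.
unseen = frozenset({"v=1.0", "cs=0005", "ext=none", "ext=sni"})
unrelated = frozenset({"v=1.3", "cs=1301"})
print(len(clusters), predict(unseen, clusters, 0.5), predict(unrelated, clusters, 0.5))
```

In this sketch the clusters play the role of the model: an unseen trace is matched against them, and a trace that is too far from every cluster is reported as unknown rather than forced into a group.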