Embedding Network Information for Machine Learning-based Intrusion Detection

DeFreeuw, Jonathan Daniel2020-07-122020-07-122019-01-18vt_gsexam:18767http://hdl.handle.net/10919/99342As computer networks grow and demonstrate more complicated and intricate behaviors, traditional intrusion detections systems have fallen behind in their ability to protect network resources. Machine learning has stepped to the forefront of intrusion detection research due to its potential to predict future behaviors. However, training these systems requires network data such as NetFlow that contains information regarding relationships between hosts, but requires human understanding to extract. Additionally, standard methods of encoding this categorical data struggles to capture similarities between points. To counteract this, we evaluate a method of embedding IP addresses and transport-layer ports into a continuous space, called IP2Vec. We demonstrate this embedding on two separate datasets, CTU'13 and UGR'16, and combine the UGR'16 embedding with several machine learning methods. We compare the models with and without the embedding to evaluate the benefits of including network behavior into an intrusion detection system. We show that the addition of embeddings improve the F1-scores for all models in the multiclassification problem given in the UGR'16 data.ETDIn Copyrightword embeddingsintrusion detectionEmbedding Network Information for Machine Learning-based Intrusion DetectionThesis