Embedding Network Information for Machine Learning-based Intrusion Detection
dc.contributor.author | DeFreeuw, Jonathan Daniel | en |
dc.contributor.committeechair | Tront, Joseph G. | en |
dc.contributor.committeemember | Yang, Yaling | en |
dc.contributor.committeemember | Marchany, Randolph C. | en |
dc.contributor.department | Electrical and Computer Engineering | en |
dc.date.accessioned | 2020-07-12T06:00:55Z | en |
dc.date.available | 2020-07-12T06:00:55Z | en |
dc.date.issued | 2019-01-18 | en |
dc.description.abstract | As computer networks grow and exhibit more complicated and intricate behaviors, traditional intrusion detection systems have fallen behind in their ability to protect network resources. Machine learning has stepped to the forefront of intrusion detection research due to its potential to predict future behaviors. However, training these systems requires network data such as NetFlow, which contains information about relationships between hosts that requires human understanding to extract. Additionally, standard methods of encoding this categorical data struggle to capture similarities between points. To counteract this, we evaluate IP2Vec, a method of embedding IP addresses and transport-layer ports into a continuous space. We demonstrate this embedding on two separate datasets, CTU'13 and UGR'16, and combine the UGR'16 embedding with several machine learning methods. We compare the models with and without the embedding to evaluate the benefits of including network behavior in an intrusion detection system. We show that the addition of embeddings improves the F1-scores for all models in the multiclass classification problem given in the UGR'16 data. | en |
dc.description.abstractgeneral | As computer networks grow and exhibit more complicated and intricate behaviors, traditional network protection tools like firewalls struggle to protect personal computers and servers. Machine learning has stepped to the forefront to counteract this by learning and predicting behavior on a network. However, this learned behavior fails to capture much of the information regarding relationships between computers on a network. Additionally, standard techniques for converting network information into numbers struggle to capture many of the similarities between machines. To counteract this, we evaluate a method to capture relationships between IP addresses and ports, called an embedding. We demonstrate this embedding on two different datasets of network traffic, and evaluate the embedding on one dataset with several machine learning methods. We compare the models with and without the embedding to evaluate the benefits of including network behavior in an intrusion detection system. We show that including network behavior in machine learning models improves the performance of classifying attacks found in the UGR’16 data. | en |
dc.description.degree | MS | en |
dc.format.medium | ETD | en |
dc.identifier.other | vt_gsexam:18767 | en |
dc.identifier.uri | http://hdl.handle.net/10919/99342 | en |
dc.publisher | Virginia Tech | en |
dc.rights | In Copyright | en |
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en |
dc.subject | word embeddings | en |
dc.subject | intrusion detection | en |
dc.title | Embedding Network Information for Machine Learning-based Intrusion Detection | en |
dc.type | Thesis | en |
thesis.degree.discipline | Computer Engineering | en |
thesis.degree.grantor | Virginia Polytechnic Institute and State University | en |
thesis.degree.level | masters | en |
thesis.degree.name | MS | en |