Details

Title:	Cluster Computing For Automated Network Analysis At Scale.
Author:	Brida, B. J.
Keywords:	Big data, Programming languages, Data set, Operating systems, Databases, Open source software, Computer programs, Domain specific programming languages, Data analysis, Network protocols, Ecosystems, Computers, Digital data, Network analysis (management), Computer files, Mapreduce, Hadoop, Packet capture, Traffic analysis, Network analysis, Spark, Hbase, Not only structured query language, Hadoop distributed file system, Packet capture next generation, Network traffic, Internet protocol
Abstract:	Conventional single-node packet analyzers are unable to monitor network traffic at scale. In this thesis, elements of the Apache Hadoop ecosystem, including HBase, Spark, and MapReduce, are employed to conduct network traffic analysis on a large collection of network traffic. Limited analysis is conducted directly on packet capture next generation (pcapng) files on the Hadoop Distributed File System (HDFS) using MapReduce. Next, to allow for repeated analysis of the same dataset without reading all source files in their entirety for every calculation, pcapng files are parsed and relevant metadata is bulk-loaded into HBase, a Not Only Structured Query Language (NoSQL) database that uses HDFS for parallelization. This NoSQL database is then accessed via Apache Spark, where pertinent data is loaded into DataFrames and additional analysis of the network traffic takes place. This research demonstrates the viability of custom, modular, automated analytics, employing open-source software to enable parallelization, to conduct traffic analysis at scale.
Report type:	Technical report