Keywords: |
Data analysis, Computer programming, Supervised machine learning, Information science, Programming languages, Machine learning, Network science, Data processing, Data mining, Big data, ORC (Optimized Row Columnar), SQL (Structured Query Language), JSON (JavaScript Object Notation), JDBC (Java Database Connectivity), KML (Keyhole Markup Language), ACARS (Aircraft Communications Addressing and Reporting System), ADSB (Automatic Dependent Surveillance - Broadcast), Apache Spark 2.0 program |
Abstract: |
In today's data-intensive world, the ability to analyze large volumes of data is critical to the success of any organization, including the military. Many data analysis tools have been developed in the past decade, along with high-performance machine learning algorithms. Unfortunately, many of these tools remain out of reach of their target audience, subject matter experts, because using them effectively requires mastering advanced computer science concepts. This thesis proposes to build a prototype data analysis platform that hides the underlying complexity of these tools from subject matter experts. Using the platform, end users can analyze data through a simple, menu-driven interface. The prototype will be built using the Python programming language and Apache Spark 2.0, an open-source distributed data processing engine. Different components of Spark 2.0 will be studied and evaluated to determine the best approach for building the prototype. The effectiveness of the prototype will be examined using unfiltered ADSB (Automatic Dependent Surveillance - Broadcast) flight data. The thesis concludes with a review of the prototype developed for ADSB and recommendations on possible ways of extending it. |