Abstract: |
As aviation systems continue to operate under high traffic volumes, reporting systems such as the Aviation Safety Reporting System (ASRS) continue to generate large numbers of documents containing safety-relevant data. Advanced natural language processing techniques, specifically pre-trained language models, have shown great success in domain-specific applications; however, the text in aviation safety reports is dense with jargon, so general-purpose pre-trained models do not fully exploit it. In this research, we work towards a safety-informed, aerospace-specific language model by pre-training a Bidirectional Encoder Representations from Transformers (BERT) model on reports from the ASRS and the National Transportation Safety Board (NTSB). The resulting model, called SafeAeroBERT, is fine-tuned for document classification and can be further tuned for named-entity recognition, relation detection, information retrieval, and summarization. We compare classification results across SafeAeroBERT, base BERT, and SciBERT; SafeAeroBERT outperforms both general BERT and SciBERT on classifying reports about weather and procedure. SafeAeroBERT can be applied to custom tasks beyond document classification and is intended to support an intelligent knowledge manager for safety report repositories. |