原文传递 CLASSIFICATION OF TIME SERIES PATTERNS FROM COMPLEX DYNAMIC SYSTEMS
题名: CLASSIFICATION OF TIME SERIES PATTERNS FROM COMPLEX DYNAMIC SYSTEMS
作者: Jack C. Schryver Nageswara Rao
关键词: systems;dynamic;series;class;lassi;erns;sifi;template;comp;cati
摘要: Numerical time series data is becoming more prevalent in large-scale databases created by empirical research conducted particularly within the transportation research community. Locating and identifying temporal patterns of interest in numerical time series data is an increasingly important problem for which there exists few available techniques. Numerical time series pose many interesting problems in classification, segmentation, prediction, diagnosis and anomaly detection. This research focuses on the problem of classification of numeric^ time .series data where price knowledge of pattern form is available. Our approach is nonparametric and allows for a wide variety of abstract template families for recognition of temporal patterns under a unified and well understood mathematical framework (automata theory) without having to specify functional forms. Finite state automata, which are normally used with nominal alphabets, were extended to recognize temporal patterns constructed from numerical alphabets to allow arbitrary stretching and compression, rescaling, alternative or repeated subpatterning, concatenation, and recursive patterning. This approach offers several advantages over ad hoc solutions or efforts to develop new special-purpose languages for temporal pattern recognition. It operates on raw time series and does not depend on an intermediate feature extraction step. It is particularly suitable when large training sets are not available, e.g., for supervised learning of multilayer perceptrons. The new algorithm builds upon previous work in dynamic programming algorithms for approximate strings automata matching by exploiting additional structure in numerical strings as compared with strings composed from nominal alphabets. The matching algorithm accepts numerical strings as input, and uses logical preconditions and memory for special variables to restrict search in a discretized range. It computes the substitution cost of the best-matching numerical string to the input recognized by the automaton. The automata are very compact and intuitive for even the casual user. The matching algorithm is implemented in an interactive graphical environment in which the user is given considerable freedom to explore the capabilities of the matching algorithm. The interactive interface allows the user to browse large numerical time series in a graphical window and select a template or recognizer from a template library. The user frames a subsequence for classification with the use of sliders, and simply presses a button with the mouse to apply the matching algorithm to the subsequence. The interface not only shows goodness-of-fit to the input string, but also displays the best-matching string by plotting it over the input data. The area of mismatch between input string and the template is highlight^ in red, thus providing the user with valuable insight into the quality of fit In order to demonstrate the power o, automata for temporal pattern-matching, templates were developed for recognition of flat, pieceswise linear, monotonic decreasing, monotonic increasing, oscillatory, concave, convex, and stopping sequences. It was demonstrated how more complex temporal patterns could be recognized using concatenation and recursion of simpler automata. The matching algorithm was validated on artificial input strings generated from seven different pattern families for which automata had been constructed. All costs were computed to fill a 7 x 7 confusion matrix. Costs along the diagonal were lower overall than off-diagonal costs. More general automata generated costs at least as low as specific automata contained by the general automata. Additional validation runs were completed to fill a 4 x 4 confusion matrix using templates and signals representing continuous downward, smooth, step and pump stopping strategies. The stopping sisals were derived from actual field data. All stopping templates fit each of the stopping signed quite well, although in general there was no incremental benefit for templates designed for specific signals. Automata or templates were sometimes able to provide good fits to temporal patterns for which they had not been designed, suggesting that template designers need to carefully define restrictive preconditions to narrow the set of recognizable temporal patterns. The matching algorithm shows great promise as a potential tool for the transportation analyst interested in characterization of time series data.
报告类型: 科技报告
检索历史
应用推荐