An Event Group Based Classification Framework for Multi-variate Sequential Data

Keywords: Multi-variate time series, Symbolic data mining, Pattern search, SAX motifs, X-of-N decision trees


Decision tree algorithms have not traditionally been considered for sequential data classification, mostly because feature generation needs to be integrated with the modelling procedure in order to avoid a localisation problem. This paper presents an Event Group Based Classification (EGBC) framework that utilises an X-of-N (XoN) decision tree algorithm to avoid this feature generation issue when classifying sequential data. In this method, features are generated independently, based on the characteristics of the sequential data. An XoN decision tree is then used to select and aggregate useful features from the various temporal and other dimensions (as event groups) for optimised classification. This makes the EGBC framework adaptive to sequential data of differing dimensions, robust to missing data, and accommodating of either numeric or nominal data types. The comparative improvements obtained with this method are demonstrated on two distinct tasks: text-based language identification and honeybee dance behaviour classification. A further motivating industrial problem, hot metal temperature prediction, is also addressed with the EGBC framework in order to meet significant real-world demands.

Author Biographies

Chao Sun, University of Sydney

Chao Sun obtained his Master's and PhD degrees in data mining from the University of Wollongong (2007 and 2016 respectively). He has worked at the University of Sydney as a data scientist since 2016, where his main duty is supporting and collaborating with various research projects in the digital humanities and social sciences. Chao's research interests include machine learning, spatio-temporal data analysis and visualisation, text mining, and social media network ontologies.

David Stirling, University of Wollongong

David Stirling received his B.E. in Tasmania (1976) and an M.Sc. degree (Digital Techniques) from Heriot-Watt University, Scotland (1980). He worked as a design engineer in the aerospace industry (Avionics, UK), as an instrumentation and control engineer in paper manufacturing, and subsequently as a research engineer in the steel industry in Australia. Having developed an early interest in AI and knowledge-based systems, he also completed his Ph.D. (Machine Learning) at the University of Sydney (1995). He ultimately became a Principal Research Scientist within BHP Research for several years, amassing over 16 years of industrial research experience before leaving to form his own consultancy (1998). He has since developed numerous applications involving knowledge discovery and/or specialized classifiers from noisy multivariate domain data in minerals exploration, medical research, insurance, geo-hazards, general manufacturing and primary industrial operations. He later joined the University of Wollongong as a senior lecturer/researcher.

How to Cite
Sun, C., & Stirling, D. (2017). An Event Group Based Classification Framework for Multi-variate Sequential Data. Australasian Journal of Information Systems, 21.
Selected Papers from the Australasian Conference on Data Mining (AusDM)