John Paparrizos, University of Chicago - August 2018


August 29, 2018


John Paparrizos

Accelerating Internet of Things Data Analytics through Scalable Time-Series Representation Learning

Kernel methods, a class of machine learning algorithms for pattern recognition, have shown a great deal of promise in the analysis of complex, real-world data. However, kernel methods remain largely unexplored in the analysis of time-varying measurements (i.e., time series), which are becoming increasingly prevalent across scientific disciplines, industrial settings, and Internet of Things (IoT) applications. Until now, research in time-series analysis has focused on designing methods for three components, namely, (i) representation methods; (ii) comparison functions; and (iii) indexing mechanisms. Unfortunately, these components have typically been investigated and developed independently, resulting in methods that are incompatible with each other. The lack of a unified approach has hindered progress towards scalable analytics over massive time-series collections. We propose to address this major drawback by leveraging kernel methods to automatically learn time-series representations (i.e., learn to effectively compress time series). Such compact representations are compatible with common indexing mechanisms and, importantly, preserve the invariance to time-series distortions offered by user-defined comparison functions. Therefore, our approach enables computational methods to operate directly over the compressed time-series data, which significantly reduces their storage and computation requirements.
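To make the idea of learning compact representations from a comparison function concrete, the following is a minimal sketch and not the method proposed here. It assumes a Nyström-style low-rank approximation, with the maximum normalized cross-correlation over all shifts standing in for a user-defined, shift-invariant comparison function. The names `ncc_max` and `learn_representation`, and the parameters `n_landmarks` and `dim`, are hypothetical and chosen only for illustration.

```python
import numpy as np


def ncc_max(x, y):
    """Maximum normalized cross-correlation over all (zero-padded) shifts.

    Illustrative stand-in for a user-defined, shift-invariant comparison
    function. It is not guaranteed to be a positive semi-definite kernel.
    """
    cc = np.correlate(x, y, mode="full")
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(cc.max() / denom) if denom > 0 else 0.0


def learn_representation(series, n_landmarks, dim, seed=0):
    """Nystrom-style embedding of time series (assumed technique, a sketch).

    series: array of shape (n, length). Returns an array of shape (n, k),
    where k <= dim is the number of retained positive eigenvalues.
    """
    series = np.asarray(series, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(series)

    # Pick a random subset of series to act as landmarks.
    idx = rng.choice(n, size=n_landmarks, replace=False)

    # Similarity of every series to every landmark: shape (n, n_landmarks).
    C = np.array([[ncc_max(s, series[j]) for j in idx] for s in series])

    # Landmark-to-landmark block, symmetrized against rounding noise.
    W = C[idx]
    W = (W + W.T) / 2.0

    # Keep the top `dim` eigenpairs, dropping non-positive eigenvalues
    # because the similarity above need not be positive semi-definite.
    vals, vecs = np.linalg.eigh(W)
    order = np.argsort(vals)[::-1][:dim]
    vals, vecs = vals[order], vecs[:, order]
    keep = vals > 1e-10
    vals, vecs = vals[keep], vecs[:, keep]

    # Project all series onto the scaled eigenvectors.
    return C @ vecs / np.sqrt(vals)


# Toy data: randomly shifted sinusoids of two different frequencies.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 64)
data = np.array([
    np.sin(2 * np.pi * (2 if i % 2 else 5) * t + rng.uniform(0, 2 * np.pi))
    for i in range(20)
])
Z = learn_representation(data, n_landmarks=8, dim=4)
```

Under these assumptions, each 64-point series is reduced to a vector of at most four values, and inner products between those vectors approximate the chosen similarity, so standard indexing or clustering routines can operate on `Z` instead of the raw series.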

We propose to evaluate the performance of our learned representations on five tasks of critical importance in time-series analysis, namely, indexing, classification, clustering, sampling, and visualization. Additionally, we have already established a partnership with a leading electric service supplier to develop a case study on predicting future energy demand using large-scale smart meter data. Finally, we plan to integrate our methods into Apache Spark, a prominent big data processing platform, to facilitate analytics over massive time-series collections.