Bayesian Nonparametric Models for Segmentation of Time-Series

Our goal is to build a system that allows helping users manage their privacy settings on online social networks (OSNs) like Facebook (FB). As an example, consider a user on FB who employs it in his role as an employee, but also as a private person in his leisure time. By learning his usual role behavior the idea is to, e.g., help prevent accidental cross-posts.

As my focus is on Machine Learning, I try to look at a generalization of this problem:

  • Input Sequential data, i.e. discrete-valued sequences with an underlying temporal order
  • Problems Clustering of sequences, Segmentation of sequences, Learning a model for prediction of future observations, etc
  • Constraints Plain results for simple interpretation, detailed results for prediction

Given a set of discrete-valued sequences, the goal is to learn accurate models that can effectively be turned into predictors for future observations while yielding plain results. For this end, we develop Bayesian nonparametric (BNP) models for sequence clustering/segmentation. In the context of Knowledge Discovery, BNP models are especially useful as they adjust their complexity to the data.

As the current work focuses on arbitrary discrete-valued sequences with an underlying temporal order, towards the end of my Ph.D. I’ll have to map the results back to the specific scenario on OSNs.