Probabilistic Analysis of a Large Scale Urban Traffic Sensor Data Set | [ PDF ] | [ PS ] |
Hutchins, Ihler, Smyth | in Second International Workshop on Knowledge Discovery from Sensor Data 2008. |
Real-world sensor time series are often significantly noisier and more difficult to work with than the relatively clean data sets that tend to be used as the basis for experiments in many research papers. In this paper we report on a large case study involving statistical data mining of over 100 million measurements from 1700 freeway traffic sensors over a period of seven months in Southern California. We discuss the challenges posed by the wide variety of sensor failures and anomalies present in the data. The volume and complexity of the data preclude the use of manual visualization or simple thresholding techniques to identify these anomalies. We describe the application of probabilistic modeling and unsupervised learning techniques to this data set and illustrate how these approaches can successfully detect underlying systematic patterns even in the presence of substantial noise and missing data.
Fast Collapsed Gibbs Sampling for Latent Dirichlet Allocation | [ PDF ] | [ PS ] |
Porteous, Newman, Ihler, Asuncion, Smyth, Welling | in Knowledge Discovery and Data Mining (KDD) 2008. |
In this paper we introduce a novel collapsed Gibbs sampling method for the widely used latent Dirichlet allocation (LDA) model. Our new method results in significant speedups on real-world text corpora. Conventional Gibbs sampling schemes for LDA require O(K) operations per sample, where K is the number of topics in the model. Our proposed method draws equivalent samples but requires on average significantly fewer than K operations per sample. On real-world corpora FastLDA can be as much as 8 times faster than the standard collapsed Gibbs sampler for LDA. No approximations are necessary, and we show that our fast sampling scheme produces exactly the same results as the standard (but slower) sampling scheme. Experiments on four real-world data sets demonstrate speedups for a wide range of collection sizes. For the PubMed collection of over 8 million documents, with a required computation time of 6 CPU months for LDA, our speedup of 5.7 can save 5 CPU months of computation.
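For context, the baseline that FastLDA accelerates is the standard collapsed Gibbs sampler, whose full conditional costs O(K) per token. A minimal sketch of that standard update (not the paper's fast variant; the count arrays and hyperparameters here are the usual LDA bookkeeping):

```python
import numpy as np

def gibbs_sweep(docs, z, ndk, nkw, nk, alpha, beta, rng):
    """One sweep of the standard collapsed Gibbs sampler for LDA.

    docs: list of word-id lists; z: parallel list of topic assignments.
    ndk: doc-topic counts, nkw: topic-word counts, nk: per-topic totals.
    """
    K = ndk.shape[1]
    V = nkw.shape[1]
    for d, words in enumerate(docs):
        for i, w in enumerate(words):
            k = z[d][i]
            # remove the current assignment from the counts
            ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
            # O(K) evaluation of the full conditional p(z = k | everything else)
            p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
            p /= p.sum()
            k = rng.choice(K, p=p)
            # record the new assignment
            z[d][i] = k
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    return z
```

FastLDA's contribution is to draw an equivalent sample while typically inspecting far fewer than K topics, by bounding the partial sums of this same conditional.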
Adaptive Inference in General Graphical Models | [ PDF ] | [ PS ] |
Acar, Ihler, Mettu, Sumer | in Uncertainty in Artificial Intelligence (UAI) 2008. |
Many algorithms and applications involve repeatedly solving variations of the same inference problem; for example, we may want to introduce new evidence to the model or perform updates to conditional dependencies. The goal of adaptive inference is to take advantage of what is preserved in the model and perform inference more rapidly than from scratch. In this paper, we describe techniques for adaptive inference on general graphs that support marginal computation and updates to the conditional probabilities and dependencies in logarithmic time. We give experimental results for an implementation of our algorithm, and demonstrate its potential performance benefit in the study of protein structure.
Graphical Models for Statistical Inference and Data Assimilation | [ PDF ] | [ Link ] |
Ihler, Kirshner, Ghil, Robertson, Smyth | Physica D: Nonlinear Phenomena, June 2007. |
In data assimilation for a system which evolves in time, one combines past and current observations with a model of the dynamics of the system, in order to improve the simulation of the system as well as any future predictions about it. From a statistical point of view, this process can be regarded as estimating many random variables which are related both spatially and temporally: given observations of some of these variables, typically corresponding to times past, we require estimates of several others, typically corresponding to future times.
Graphical models have emerged as an effective formalism for assisting in these types of inference tasks, particularly for large numbers of random variables. Graphical models provide a means of representing dependency structure among the variables, and can provide both intuition and efficiency in estimation and other inference computations. We provide an overview and introduction to graphical models, and describe how they can be used to represent statistical dependency and how the resulting structure can be used to organize computation. The relation between statistical inference using graphical models and optimal sequential estimation algorithms such as Kalman filtering is discussed. We then give several additional examples of how graphical models can be applied to climate dynamics, specifically estimation using multi-resolution models of large-scale data sets such as satellite imagery, and learning hidden Markov models to capture rainfall patterns in space and time.
Learning to detect events with Markov-modulated Poisson processes | [ Link ] |
Ihler, Hutchins, Smyth | ACM Transactions on Knowledge Discovery from Data, Vol 1 Issue 3, Dec. 2007. |
Time-series of count data occur in many different contexts, including Internet navigation logs, freeway traffic monitoring, and security logs associated with buildings. In this article we describe a framework for detecting anomalous events in such data using an unsupervised learning approach. Normal periodic behavior is modeled via a time-varying Poisson process model, which in turn is modulated by a hidden Markov process that accounts for bursty events. We outline a Bayesian framework for learning the parameters of this model from count time-series. Two large real-world datasets of time-series counts are used as testbeds to validate the approach, consisting of freeway traffic data and logs of people entering and exiting a building. We show that the proposed model is significantly more accurate at detecting known events than a more traditional threshold-based technique. We also describe how the model can be used to investigate different degrees of periodicity in the data, including systematic day-of-week and time-of-day effects, and to make inferences about different aspects of events such as number of vehicles or people involved. The results indicate that the Markov-modulated Poisson framework provides a robust and accurate framework for adaptively and autonomously learning how to separate unusual bursty events from traces of normal human activity.
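The generative structure described above can be illustrated with a small simulation: a periodic Poisson rate modulated by a two-state Markov chain whose "event" state injects extra counts. All rates and transition probabilities below are hypothetical, chosen only to make the mechanism concrete:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical daily profile: hourly Poisson rates with a morning peak.
base_rate = 5 + 20 * np.exp(-0.5 * ((np.arange(24) - 8) / 2.0) ** 2)

# Two-state hidden Markov chain: 0 = normal, 1 = bursty event.
P = np.array([[0.95, 0.05],   # normal -> event is rare
              [0.80, 0.20]])  # events are short-lived
state, counts, states = 0, [], []
for day in range(7):
    for hour in range(24):
        state = rng.choice(2, p=P[state])
        lam = base_rate[hour] + (30.0 if state == 1 else 0.0)  # event adds rate
        counts.append(int(rng.poisson(lam)))
        states.append(state)
```

Inference in the paper runs in the opposite direction: given only `counts`, the Bayesian procedure learns the periodic profile and recovers the posterior over the hidden event states.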
Accuracy Bounds for Belief Propagation | [ PDF ] | [ PS ] |
Ihler | in Uncertainty in Artificial Intelligence (UAI) 2007. |
The belief propagation algorithm is widely applied to perform approximate inference on arbitrary graphical models, in part due to its excellent empirical properties and performance. However, little is known theoretically about when this algorithm will perform well. Using recent analysis of convergence and stability properties in belief propagation and new results on approximations in binary systems, we derive a bound on the error in BP's estimates for pairwise Markov random fields over discrete-valued random variables. Our bound is relatively simple to compute, and compares favorably with a previous method for bounding the accuracy of belief propagation.
Distributed Fusion in Sensor Networks: A graphical models perspective | [ PDF ] |
Cetin, Chen, Fisher, Ihler, Moses, Wainwright, Willsky | IEEE Signal Processing Magazine, July 2006. |
Distributed inference methods developed for graphical models comprise a principled approach for data fusion in sensor networks. The application of these methods, however, requires some care due to a number of issues that are particular to sensor networks. Chief among these are the distributed nature of computation and deployment, coupled with the communications bandwidth and energy constraints typical of many sensor networks. Additionally, information sharing in a sensor network necessarily involves approximation. Traditional measures of distortion are not sufficient to characterize the quality of approximation, as they do not address in an explicit manner the resulting impact on inference, which is at the core of many data fusion problems. While both graphical models and a distributed sensor network have network structures associated with them, the mapping is not one to one. All of these issues complicate the mapping of a particular inference problem to a given sensor network structure. Indeed, there may be a variety of mappings with very different characteristics with regard to computational complexity and utilization of resources. Nevertheless, it is the case that many of the powerful distributed inference methods have a role in information fusion for sensor networks. In this article we present an overview of research conducted by the authors that has sought to clarify many of the important issues at the intersection of these domains. We discuss both theoretical issues and prototypical applications, in addition to suggesting new lines of reasoning.
Gibbs Sampling for (Coupled) Infinite Mixture Models in the Stick Breaking Representation | [ PDF ] |
Porteous, Ihler, Smyth, Welling | in Uncertainty in Artificial Intelligence (UAI) 2006. |
Nonparametric Bayesian approaches to clustering, information retrieval, language modeling and object recognition have recently shown great promise as a new paradigm for unsupervised data analysis. Most contributions have focused on the Dirichlet process mixture models or extensions thereof for which efficient Gibbs samplers exist. In this paper we explore Gibbs samplers for infinite complexity mixture models in the stick breaking representation. The advantage of this representation is improved modeling flexibility. For instance, one can design the prior distribution over cluster sizes or couple multiple infinite mixture models (e.g., over time) at the level of their parameters (i.e., the dependent Dirichlet process model). However, Gibbs samplers for infinite mixture models (as recently introduced in the statistics literature) seem to mix poorly over cluster labels. Among other issues, this can have the adverse effect that labels for the same cluster in coupled mixture models are mixed up. We introduce additional moves in these samplers to improve mixing over cluster labels and to bring clusters into correspondence. An application to modeling of storm trajectories is used to illustrate these ideas.
Adaptive Event Detection with Time-Varying Poisson Processes | [ PDF ] | [ PS ] |
Ihler, Hutchins, Smyth | in Knowledge Discovery and Data Mining (KDD) 2006. |
Time-series of count data are generated in many different contexts, such as web access logging, freeway traffic monitoring, and security logs associated with buildings. Since this data measures the aggregated behavior of individual human beings, it typically exhibits a periodicity in time on a number of scales (daily, weekly,etc.) that reflects the rhythms of the underlying human activity and makes the data appear non-homogeneous. At the same time, the data is often corrupted by a number of bursty periods of unusual behavior such as building events, traffic accidents, and so forth. The data mining problem of finding and extracting these anomalous events is made difficult by both of these elements. In this paper we describe a framework for unsupervised learning in this context, based on a time-varying Poisson process model that can also account for anomalous events. We show how the parameters of this model can be learned from count time series using statistical estimation techniques. We demonstrate the utility of this model on two datasets for which we have partial ground truth in the form of known events, one from freeway traffic data and another from building access data, and show that the model performs significantly better than a non-probabilistic, threshold-based technique. We also describe how the model can be used to investigate different degrees of periodicity in the data, including systematic day-of-week and time-of-day effects, and make inferences about the detected events (e.g., popularity or level of attendance). Our experimental results indicate that the proposed time-varying Poisson model provides a robust and accurate framework for adaptively and autonomously learning how to separate unusual bursty events from traces of normal human activity.
Learning Time-Intensity Profiles of Human Activity Using Nonparametric Bayesian Models | [ PDF ] | [ PS ] |
Ihler, Smyth | in Neural Information Processing Systems (NIPS) 2006. |
Data sets that characterize human activity over time through collections of timestamped events or counts are of increasing interest in application areas such as human-computer interaction, video surveillance, and Web data analysis. We propose a non-parametric Bayesian framework for modeling collections of such data. In particular, we use a Dirichlet process framework for learning a set of intensity functions corresponding to different categories, which form a basis set for representing individual time-periods (e.g., several days) depending on which categories the time-periods are assigned to. This allows the model to learn in a data-driven fashion what "factors" are generating the observations on a particular day, including (for example) weekday versus weekend effects or day-specific effects corresponding to unique (single-day) occurrences of unusual behavior, sharing information where appropriate to obtain improved estimates of the behavior associated with each category. Applications to real-world data sets of count data involving both vehicles and people are used to illustrate the technique.
Loopy Belief Propagation: Convergence and Effects of Message Errors | [ PDF ] | [ PS ] |
Ihler, Fisher, Willsky | Journal of Machine Learning Research, May 2005. |
Belief propagation (BP) is an increasingly popular method of performing approximate inference on arbitrary graphical models. At times, even further approximations are required, whether due to quantization of the messages or model parameters, from other simplified message or model representations, or from stochastic approximation methods. The introduction of such errors into the BP message computations has the potential to affect the solution obtained adversely. We analyze the effect resulting from message approximation under two particular measures of error, and show bounds on the accumulation of errors in the system. This analysis leads to convergence conditions for traditional BP message passing, and both strict bounds and estimates of the resulting error in systems of approximate BP message passing.
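The message computations whose errors this analysis tracks are the standard sum-product updates. A minimal sketch on a three-node chain with hypothetical binary potentials (on a tree like this, BP is exact, which the brute-force marginal confirms; the paper's results concern what happens when such messages are only computed approximately):

```python
import numpy as np
from itertools import product

# Pairwise model on a chain x1 - x2 - x3 of binary variables.
# Potentials are hypothetical, purely for illustration.
psi12 = np.array([[1.0, 0.5], [0.5, 1.0]])
psi23 = np.array([[1.2, 0.3], [0.3, 1.2]])
phi = [np.array([0.6, 0.4]), np.array([0.5, 0.5]), np.array([0.3, 0.7])]

def bp_marginal_x2():
    # Sum-product messages into x2 from each neighbor.
    m12 = psi12.T @ phi[0]   # m(x2) = sum_x1 phi1(x1) psi12(x1, x2)
    m32 = psi23 @ phi[2]     # m(x2) = sum_x3 phi3(x3) psi23(x2, x3)
    b = phi[1] * m12 * m32   # belief = local potential times incoming messages
    return b / b.sum()

def brute_marginal_x2():
    # Exact marginal by enumerating all 2^3 joint states.
    p = np.zeros(2)
    for x1, x2, x3 in product(range(2), repeat=3):
        p[x2] += phi[0][x1] * phi[1][x2] * phi[2][x3] * psi12[x1, x2] * psi23[x2, x3]
    return p / p.sum()
```

On loopy graphs the same updates are simply iterated to a fixed point, and any perturbation of `m12` or `m32` (quantization, sampling noise) propagates through subsequent products, which is the accumulation this paper bounds.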
Nonparametric Belief Propagation for Sensor Network Self-Calibration | [ PDF ] | [ PS ] |
Ihler, Fisher, Moses, Willsky | IEEE Journal on Selected Areas in Communications, Apr. 2005. |
Automatic self-localization is a critical need for the effective use of ad-hoc sensor networks in military or civilian applications. In general, self-localization involves the combination of absolute location information (e.g., GPS) with relative calibration information (e.g., distance measurements between sensors) over regions of the network. Furthermore, it is generally desirable to distribute the computational burden across the network and minimize the amount of inter-sensor communication. We demonstrate that the information used for sensor localization is fundamentally local with regard to the network topology and use this observation to reformulate the problem within a graphical model framework. We then present and demonstrate the utility of nonparametric belief propagation (NBP), a recent generalization of particle filtering, for both estimating sensor locations and representing location uncertainties. NBP has the advantage that it is easily implemented in a distributed fashion, admits a wide variety of statistical models, and can represent multi-modal uncertainty. Using simulations of small- to moderately-sized sensor networks, we show that NBP may be made robust to outlier measurement errors by a simple model augmentation, and that judicious message construction can result in better estimates. Furthermore, we provide an analysis of NBP's communications requirements, showing that typically only a few messages per sensor are required, and that even low bit-rate approximations of these messages can have little or no performance impact.
Particle Filtering Under Communications Constraints | [ PDF ] | [ PS ] |
Ihler, Fisher, Willsky | in Statistical Signal Processing (SSP) 2005. |
Particle filtering is often applied to the problem of object tracking under non-Gaussian uncertainty; however, sensor networks frequently require that the implementation be local to the region of interest, eventually forcing the large, sample-based representation to be moved among power-constrained sensors. We consider the problem of successive approximation (i.e., lossy compression) of each sample-based density estimate, in particular exploring the consequences (both theoretical and empirical) of several possible choices of loss function and their interpretation in terms of future errors in inference, justifying their use for measuring approximations in distributed particle filtering.
Estimating Dependency and Significance for High-Dimensional Data | [ PDF ] |
Siracusa, Tieu, Ihler, Fisher, Willsky | in ICASSP 2005. |
Understanding the dependency structure of a set of variables is a key component in various signal processing applications which involve data association. The simple task of detecting whether any dependency exists is particularly difficult when models of the data are unknown or difficult to characterize because of high-dimensional measurements. We review the use of nonparametric tests for characterizing dependency and how to carry out these tests with high-dimensional observations. In addition we present a method to assess the significance of the tests.
Ihler | Area Exam, MIT, 2004. |
We present some background and results from the body of work collectively referred to as fast multipole methods (FMM). These comprise a set of techniques for speeding up so-called N-body problems, in which a potential function composed of a sum of pairwise interaction terms from N points is evaluated at an equally large number of locations. We present these methods from the viewpoint of low-rank block matrix approximations, first discussing a heuristic block-matrix approximation method [1], then moving into the analytic expansions which form the basis of both the original and new versions of the fast multipole method [2]. We attempt to provide sufficient background to understand and compute all the relevant results, yet present the material in sufficient generality that it is easy to understand the relationship to similar algorithms such as the fast Gauss transform [3].
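The low-rank block-matrix viewpoint mentioned above can be demonstrated numerically: the interaction block between two well-separated clusters of points is numerically low-rank for a smooth kernel, so a truncated factorization reproduces the matrix-vector product cheaply. The kernel, cluster geometry, and truncation rank below are illustrative choices, not taken from the exam itself:

```python
import numpy as np

rng = np.random.default_rng(1)

# Sources and targets in two well-separated 1-D clusters.
src = rng.uniform(0.0, 1.0, 100)
tgt = rng.uniform(10.0, 11.0, 100)

# Interaction block between the clusters for the kernel 1/|t - s|.
# Smooth kernels between well-separated clusters give low numerical rank.
K = 1.0 / np.abs(tgt[:, None] - src[None, :])

# Truncated SVD as a stand-in for the analytic multipole expansion.
U, s, Vt = np.linalg.svd(K)
r = 5
K_approx = U[:, :r] * s[:r] @ Vt[:r]

# Compare the exact and compressed matrix-vector products.
weights = rng.standard_normal(100)
exact = K @ weights
fast = K_approx @ weights
rel_err = np.linalg.norm(exact - fast) / np.linalg.norm(exact)
```

FMMs achieve the same effect with analytic expansions instead of an SVD, avoiding ever forming the full N-by-N matrix.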
Nonparametric Hypothesis Tests for Statistical Dependency | [ PDF ] |
Ihler, Fisher, Willsky | IEEE Transactions on Signal Processing, Aug. 2004. |
Determining the structure of dependencies among a set of variables is a common task in many signal and image processing applications, including multi-target tracking and computer vision. In this paper we present an information-theoretic, machine learning approach to problems of this type. We cast this problem as a hypothesis test between factorizations of variables into mutually independent subsets. We show that the likelihood ratio can be written as sums of two sets of Kullback-Leibler (KL) divergence terms. The first set captures the structure of the statistical dependencies within each hypothesis, while the second set measures the details of model differences between hypotheses. We then consider the case when the signal prior models are unknown, so that the distributions of interest must be estimated directly from data, showing that the second set of terms is (asymptotically) negligible and quantifying the loss in hypothesis separability when the models are completely unknown. We demonstrate the utility of nonparametric estimation methods for such problems, providing a general framework for determining and distinguishing between dependency structures in highly uncertain environments. Additionally, we develop a machine learning approach for estimating lower bounds on KL-divergence and mutual information from samples of high-dimensional random variables for which direct density estimation is infeasible. We present empirical results in the context of three prototypical applications: association of signals generated by sources possessing harmonic behavior, scene correspondence using video imagery, and detection of coherent behavior among sets of moving objects.
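The flavor of such a dependence test can be sketched with a deliberately simple plug-in estimator: a histogram-based mutual information statistic whose significance is calibrated by permutation. This is a toy stand-in (histogram rather than the paper's kernel estimators, and low-dimensional), intended only to show the test structure:

```python
import numpy as np

def hist_mi(x, y, bins=8):
    """Plug-in mutual information estimate (nats) from a joint histogram."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    return float((pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])).sum())

def permutation_pvalue(x, y, n_perm=200, rng=None):
    """Significance of the observed MI under the independence null,
    estimated by permuting y to break any dependence on x."""
    if rng is None:
        rng = np.random.default_rng(0)
    obs = hist_mi(x, y)
    null = [hist_mi(x, rng.permutation(y)) for _ in range(n_perm)]
    return float(np.mean([m >= obs for m in null]))
```

A dependent pair (e.g. `y = x + noise`) yields a small p-value, while an independent pair does not; the permutation step also absorbs the estimator's bias, which a fixed threshold on the raw MI estimate would not.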
Message Errors in Belief Propagation | [ PDF ] | [ PS ] |
Ihler, Fisher, Willsky | in Neural Information Processing Systems (NIPS) 2004. |
Belief propagation (BP) is an increasingly popular method of performing approximate inference on arbitrary graphical models. At times, even further approximations are required, whether from quantization or other simplified message representations or from stochastic approximation methods. Introducing such errors into the BP message computations has the potential to adversely affect the solution obtained. We analyze this effect with respect to a particular measure of message error, and show bounds on the accumulation of errors in the system. This leads both to convergence conditions and error bounds in traditional and approximate BP message passing.
Nonparametric Belief Propagation for Sensor Network Self-Calibration | [ PDF ] | [ PS ] |
Ihler, Fisher, Moses, Willsky | in ICASSP 2004. |
Automatic self-calibration of ad-hoc sensor networks is a critical need for their use in military or civilian applications. In general, self-calibration involves the combination of absolute location information (e.g. GPS) with relative calibration information (e.g. estimated distance between sensors) over regions of the network. We formulate the self-calibration problem as a graphical model, enabling application of nonparametric belief propagation (NBP), a recent generalization of particle filtering, for both estimating sensor locations and representing location uncertainties. NBP has the advantage that it is easily implemented in a distributed fashion, can represent multi-modal uncertainty, and admits a wide variety of statistical models. This last point is particularly appealing in that it can be used to provide robustness against occasional high-variance (outlier) noise. We illustrate the performance of NBP using Monte Carlo analysis on an example network.
Nonparametric Belief Propagation for Self-Calibration in Sensor Networks | [ PDF ] | [ PS ] |
Ihler, Fisher, Moses, Willsky | in Information Processing in Sensor Networks (IPSN) 2004. |
Automatic self-calibration of ad-hoc sensor networks is a critical need for their use in military or civilian applications. In general, self-calibration involves the combination of absolute location information (e.g. GPS) with relative calibration information (e.g. time delay or received signal strength between sensors) over regions of the network. Furthermore, it is generally desirable to distribute the computational burden across the network and minimize the amount of inter-sensor communication. We demonstrate that the information used for sensor calibration is fundamentally local with regard to the network topology and use this observation to reformulate the problem within a graphical model framework. We then demonstrate the utility of nonparametric belief propagation (NBP), a recent generalization of particle filtering, for both estimating sensor locations and representing location uncertainties. NBP has the advantage that it is easily implemented in a distributed fashion, admits a wide variety of statistical models, and can represent multi-modal uncertainty. We illustrate the performance of NBP on several example networks while comparing to a previously published nonlinear least squares method.
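The core estimation step, representing a sensor's location posterior by weighted samples given noisy range measurements to known anchors, can be sketched in miniature. This is a single-sensor importance-sampling toy under an assumed Gaussian range-noise model, not the full distributed NBP message-passing of the paper:

```python
import numpy as np

rng = np.random.default_rng(2)

# Three anchors with known positions; one sensor at an unknown location.
anchors = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
true_pos = np.array([0.4, 0.3])
sigma = 0.02  # assumed std. dev. of range-measurement noise
meas = np.linalg.norm(anchors - true_pos, axis=1) + sigma * rng.standard_normal(3)

# Sample-based representation of the location posterior, as in NBP:
# uniform prior particles, weighted by the range-measurement likelihood.
particles = rng.uniform(0.0, 1.0, size=(20000, 2))
d = np.linalg.norm(particles[:, None, :] - anchors[None, :, :], axis=2)
logw = -0.5 * ((d - meas) ** 2 / sigma**2).sum(axis=1)
w = np.exp(logw - logw.max())
w /= w.sum()
est = (w[:, None] * particles).sum(axis=0)  # posterior-mean location estimate
```

With fewer anchors the range likelihoods intersect in rings rather than a point, producing exactly the multi-modal posteriors that motivate a sample-based representation over a Gaussian one.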
Using Sample-based Representations Under Communications Constraints | [ PDF ] | [ PS ] |
Ihler, Fisher, Willsky | LIDS Tech Report #2601, 2004. |
In many applications, particularly power-constrained sensor networks, it is important to conserve the amount of data exchanged while maximizing the utility of that data for some inference task. Broadly, this tradeoff has two major cost components: the representation's size (in distributed networks, the communications cost) and the error incurred by its use (the inference cost).
We analyze this tradeoff for a particular problem: communicating a particle-based representation (and more generally, a Gaussian mixture or kernel density estimate). We begin by characterizing the exact communication cost of these representations, noting that it is less than might be suggested by traditional communications theory due to the invariance of the representation to reordering. We describe the optimal, lossless encoder when the generating distribution is known, and pose a sub-optimal encoder which still benefits from reordering invariance.
However, lossless encoding may not be sufficient. We describe one reasonable measure of error for distribution-based messages and its consequences for inference in an acyclic network, and propose a novel density approximation method based on KD-tree multiscale representations which enables the communications cost and a bound on error to be balanced efficiently. We show several empirical examples demonstrating the method's utility in collaborative, distributed signal processing under bandwidth or power constraints.
Nonparametric Belief Propagation | [ PDF ] | [ PS ] |
Sudderth, Ihler, Freeman, Willsky | in Computer Vision and Pattern Recognition (CVPR) 2003. |
In many applications of graphical models arising in computer vision, the hidden variables of interest are most naturally specified by continuous, non-Gaussian distributions. There exist inference algorithms for discrete approximations to these continuous distributions, but for the high-dimensional variables typically of interest, discrete inference becomes infeasible. Stochastic methods such as particle filters provide an appealing alternative. However, existing techniques fail to exploit the rich structure of the graphical models describing many vision problems. Drawing on ideas from regularized particle filters and belief propagation (BP), this paper develops a nonparametric belief propagation (NBP) algorithm applicable to general graphs. Each NBP iteration uses an efficient sampling procedure to update kernel-based approximations to the true, continuous likelihoods. The algorithm can accommodate an extremely broad class of potential functions, including nonparametric representations. Thus, NBP extends particle filtering methods to the more general vision problems that graphical models can describe. We apply the NBP algorithm to infer component interrelationships in a parts-based face model, allowing location and reconstruction of occluded features.
Efficient Multiscale Sampling from Products of Gaussian Mixtures | [ PDF ] | [ PS ] |
Ihler, Sudderth, Freeman, Willsky | in Neural Information Processing Systems (NIPS) 2003. |
The problem of approximating the product of several Gaussian mixture distributions arises in a number of contexts, including the nonparametric belief propagation (NBP) inference algorithm and the training of product of experts models. This paper develops two multiscale algorithms for sampling from a product of Gaussian mixtures, and compares their performance to existing methods. The first is a multiscale variant of previously proposed Monte Carlo techniques, with comparable theoretical guarantees but improved empirical convergence rates. The second makes use of approximate kernel density evaluation methods to construct a fast approximate sampler, which is guaranteed to sample points to within a tunable parameter ε of their true probability. We compare both multiscale samplers on a set of computational examples motivated by NBP, demonstrating significant improvements over existing methods.
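The underlying difficulty is that the exact product of mixtures is itself a mixture whose size is the product of the component counts. A sketch of the exact (brute-force) construction for two 1-D mixtures, which the paper's multiscale samplers are designed to avoid enumerating:

```python
import numpy as np

def product_mixture_1d(means1, vars1, w1, means2, vars2, w2):
    """Exact parameters of the product of two 1-D Gaussian mixtures.

    An M- and an N-component mixture multiply into an (M*N)-component
    mixture: each component pair combines by the product-of-Gaussians
    identity, with a weight scaled by the pair's Gaussian overlap.
    """
    m1, m2 = np.meshgrid(means1, means2, indexing="ij")
    v1, v2 = np.meshgrid(vars1, vars2, indexing="ij")
    var = 1.0 / (1.0 / v1 + 1.0 / v2)          # product-component variance
    mean = var * (m1 / v1 + m2 / v2)           # product-component mean
    overlap = np.exp(-0.5 * (m1 - m2) ** 2 / (v1 + v2)) \
        / np.sqrt(2 * np.pi * (v1 + v2))       # N(m1 - m2; 0, v1 + v2)
    w = (np.outer(w1, w2) * overlap).ravel()
    return mean.ravel(), var.ravel(), w / w.sum()

def sample_product(mean, var, w, n, rng):
    """Draw n samples from the (exact) product mixture."""
    idx = rng.choice(len(w), size=n, p=w)
    return rng.normal(mean[idx], np.sqrt(var[idx]))
```

For d mixtures of N components this exact label distribution has N^d entries; the paper's multiscale algorithms sample component labels without materializing that table.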
Hypothesis Testing over Factorizations for Data Association | [ PDF ] | [ PS ] |
Ihler, Fisher, Willsky | in Information Processing in Sensor Networks (IPSN) 2003. |
The issue of data association arises frequently in sensor networks; whenever multiple sensors and sources are present, it may be necessary to determine which observations from different sensors correspond to the same target. In highly uncertain environments, one may need to determine this correspondence without the benefit of an a priori known joint signal/sensor model. This paper examines the data association problem as the more general hypothesis test between factorizations of a single, learned distribution. The optimal test between known distributions may be decomposed into model-dependent and statistical dependence terms, quantifying the cost incurred by model estimation from measurements compared to a test between known models. We demonstrate how one might evaluate a two-signal association test efficiently using kernel density estimation methods to model a wide class of possible distributions, and show the resulting algorithm's ability to determine correspondence in uncertain conditions through a series of synthetic examples. We then describe an extension of this technique to multi-signal association which can be used to determine correspondence while avoiding the computationally prohibitive task of evaluating all hypotheses. Empirical results of the approximate approach are presented.
Nonparametric Belief Propagation | [ PDF ] | [ PS ] |
Sudderth, Ihler, Freeman, Willsky | LIDS Technical Report # 2551, Aug. 2002. |
In applications of graphical models arising in fields such as computer vision, the hidden variables of interest are most naturally specified by continuous, non-Gaussian distributions. However, due to the limitations of existing inference algorithms, it is often necessary to form coarse, discrete approximations to such models. In this paper, we develop a nonparametric belief propagation (NBP) algorithm, which uses stochastic methods to propagate kernel-based approximations to the true continuous messages. Each NBP message update is based on an efficient sampling procedure which can accommodate an extremely broad class of potential functions, allowing easy adaptation to new application areas. We validate our method using comparisons to continuous BP for Gaussian networks, and an application to the stereo vision problem.
Nonparametric Estimators for Online Signature Authentication | [ PDF ] | [ PS ] |
Ihler, Fisher, Willsky | in ICASSP 2001. |
We present extensions to our previous work in modelling dynamical processes. The approach uses an information theoretic criterion for searching over subspaces of the past observations, combined with a nonparametric density characterizing its relation to one-step-ahead prediction and uncertainty. We use this methodology to model handwriting stroke data, specifically signatures, as a dynamical system, and show that it is possible to learn a model capturing their dynamics for use either in synthesizing realistic signatures or in discriminating between signatures and forgeries, even though no forgeries have been used in constructing the model. This novel approach yields promising results even for small training sets.
Maximally Informative Subspaces: Nonparametric Estimation for Dynamical Systems | [ PDF ] | [ PS (zipped) ] |
Ihler | Master's Thesis, MIT, Aug. 2000. |
Modeling complex dynamical systems is a difficult problem with a wide range of applications in prediction, discrimination, and simulation. Classical stochastic models make a number of simplifying assumptions to improve tractability (e.g. linear dynamics, Gaussian uncertainty). While such assumptions lead to algorithms which are both fast and optimal under the assumptions, there are a great many real world problems for which these assumptions are false. Recently, computational power has increased to the point where another method becomes feasible: purely example-based, or "nonparametric", models. Yet these are limited because their computational requirements grow exponentially with the number of variables we observe about the system. For dynamical systems, in which we generally observe the past, this means that processes with any substantial past-dependence become intractable. In this thesis we present a novel dynamical system model making use of a nonparametric estimate of uncertainty, with an information-theoretic criterion for reducing the model's required dimension while preserving as much of the predictive power in the observations as possible. To explore its behavior, we apply this technique to three dynamical systems: a "toy" nonlinear system (random telegraph waves), a real-world time series from the predictive literature (Santa Fe laser data), and a more cutting-edge application (on-line signature authentication). Each of these examples demonstrates techniques for improving the model's performance and evidence of its effectiveness.
Learning Informative Statistics: A Nonparametric Approach | [ PDF ] | [ PS ] |
Fisher, Ihler, Viola | in Neural Information Processing Systems (NIPS) 1999. |
We discuss an information theoretic approach for categorizing and modeling dynamic processes. The approach can learn a compact and informative statistic which summarizes past states to predict future observations. Furthermore, the uncertainty of the prediction is characterized nonparametrically by a joint density over the learned statistic and present observation. We discuss the application of the technique to both noise driven dynamical systems and random processes sampled from a density which is conditioned on the past. In the first case we show results in which both the dynamics of a random walk and the statistics of the driving noise are captured. In the second case we present results in which a summarizing statistic is learned on noisy random telegraph waves with differing dependencies on past states. In both cases the algorithm yields a principled approach for discriminating processes with differing dynamics and/or dependencies. The method is grounded in ideas from information theory and nonparametric statistics.