CS274A: Probabilistic Learning

CLOSED : 2010 OFFERING

Assignments and Exams:

HW1	1/13/10	Soln
HW2	1/27/10	Soln
Midterm	2/05/10	Soln
HW3,Data Δ	2/19/10	Soln
HW4,ZIP	3/12/10
Final	3/19/10	1:30-3:30
		Exam solutions

Student Comment Page

Lecture: Roland Hall (RH) 184, MWF 2-3pm

Instructor: Prof. Alex Ihler

Introduction to probabilistic models, inference, and learning.

CS274A is an introductory course on probabilistic approaches to learning from data. Probabilistic models form an important part of many areas of computer science, and probabilistic learning (in this context, automatically constructing probabilistic models from data) has become an important tool in sub-fields such as artificial intelligence, data mining, speech recognition, computer vision, bioinformatics, signal processing, and many more. CS274A will provide an introduction to the concepts and principles which underlie probabilistic models, and apply these principles to the development, analysis, and practical application of machine learning algorithms.

The course will focus primarily on parametric probabilistic modeling, including data likelihood, parameter estimation using likelihood and Bayesian approaches, hypothesis testing and classification problems, density estimation, clustering, and regression. Related problems, including model selection, overfitting, and bias/variance trade-offs, will also be discussed.

Background.

The course is intended to be an introduction to probabilistic learning, and thus has few explicit requirements. Students are expected to be familiar with basic concepts from probability, linear algebra, multivariate calculus, etc. Homeworks will use the MATLAB programming environment, but no prior experience with MATLAB is required for the course.

Course format.

Three lectures per week (MWF). Homeworks due in class approximately every two weeks. Two exams (midterm and final). Grading: 40% homework, 25% midterm, 35% final.

Office Hours.

Office hours for the course are 4pm Fridays, or by appointment.

Collaboration.

Discussion of the course concepts and methods among the students is encouraged; however, all work handed in should be completely your own. In order to strike a balance, we'll use the "work product" rule: while discussing anything related to the homework, you should retain no work product created during the discussion. In other words, you can meet and discuss the problems, describe the solution, etc., but then all parties must leave the meeting with no record of it (written notes, code, etc.) and do the homework problem on their own. If you work on a whiteboard, just erase it when you're done discussing. Don't show someone else your homework, or refer to it during the discussion, since by this policy you must then throw it away.

Textbooks.

The required textbook for the course is Bishop's "Pattern Recognition and Machine Learning", but lectures are likely to follow the book only loosely. Other recommended reading includes MacKay's "Information Theory, Inference, and Learning Algorithms" (available online at http://www.inference.phy.cam.ac.uk/mackay/itila/), Duda, Hart, and Stork's "Pattern Classification", and Hastie, Tibshirani, and Friedman's "Elements of Statistical Learning".

Matlab.

We will often write code for the course using the Matlab environment. Matlab is accessible through NACS computers at several campus locations (e.g., MSTB-A, MSTB-B, and the ICS lab), and if you want a copy for yourself, student licenses are fairly inexpensive ($100). Personally, I do not recommend the open-source Octave program as a replacement, as the syntax is not 100% compatible and may cause problems (for me or you).

If you are not familiar with Matlab, there are a number of tutorials on the web:

University of Utah, very short
CMU / UMichigan tutorial, also short
University of Florida's tutorial, more complete
Union College / Cyclismo.Org tutorial, also good
UMaryland guide, lots of pointers to other tutorials and reference manuals

You may want to start with one of the very short tutorials, then use the longer ones as a reference during the rest of the term.

(Tentative) Schedule of Topics.

Week 1	01/04/2010	PDF : Introduction, probability distributions; frequentist vs. Bayesian viewpoints
	01/06/2010	PDF, Lecture : Bayes' rule, exponential family distributions
	01/08/2010	PDF, Lecture : multivariate distributions; conditional independence; Bayes' nets
For a review of probability, a few good references are: Prof. Smyth's 274A handout #1 on probability; the textbook by Olofsson, "Probability, Statistics & Stochastic Processes" (Bayes Rule, 43-56; random variables and expectation, 77-108; joint distributions, 159-?) and a UCLA stat wiki that is not verbose, but might serve as a reminder; see e.g. Fundamentals, Rules, RVs, Expectations.
Week 2	01/11/2010	PDF, Lecture : more graphical models; multivariate Gaussians	Read Prof. Smyth's handout #2
	01/13/2010	PDF, Lecture : introduction to learning, (multivariate Gaussians), likelihood, parameters
	01/15/2010	PDF, Lecture : ML learning I: data likelihood, univariate ML; bias & variance
Week 3	01/18/2010	MLK Holiday
	01/20/2010	PDF, Lecture : ML learning II: exponential family, multivariate
	01/22/2010	PDF, Lecture : ML learning II: multivariate models
Week 4	01/25/2010	PDF, Lecture : Bayesian learning I: priors, posterior distributions; MAP & MPE estimates
	01/27/2010	PDF, Lecture : Bayesian learning II: conjugate priors; beta-binomial
	01/29/2010	PDF, Lecture : Bayesian learning III: Gaussian models; Bayes optimal decisions
Week 5	02/01/2010	No class
	02/03/2010	Review
	02/05/2010	Midterm exam
Week 6	02/08/2010	PDF, Lecture : Hypothesis testing, class-conditional models; predictive distributions
	02/10/2010	PDF, Lecture : Regression I: linear regression; regression as parameter estimation
	02/12/2010	PDF, Lecture : Regression II: bias & variance; priors
Week 7	02/15/2010	Presidents' Day holiday
	02/17/2010	PDF, Lecture : Regression III: priors, posteriors
	02/19/2010	Notes, Audio, Example : Regression to classification: logistic regression; Reading PRML Ch 4
Week 8	02/22/2010	Audio : Classification and density estimation
	02/24/2010	Audio : Mixture models and EM: Reading Jordan handout and Smyth handout, PRML Ch 9
	02/26/2010	Mixture models and EM
Week 9	03/01/2010
	03/03/2010	Notes : Complexity and model selection; marginal likelihood, BIC approximation; Reading PRML 3.5, 4.4
	03/05/2010	Hinton talk
Week 10	03/08/2010	Audio : Time series: autoregressive models, HMMs, filtering and smoothing tasks; Reading PRML Ch 13
	03/10/2010	Review?
	03/12/2010
Final Exam	03/19/2010	Final exam, 1:30-3:30pm