# Course overview

Math 561 Algebraic and Geometric Methods in Statistics

Mondays & Wednesdays 11:25am - 12:40pm

At a glance:

Input:

interest in applied data analysis and new techniques for developing statistical models,

critical thinking & skepticism,

enthusiasm for learning new concepts.

Output:

an understanding of nonlinear algebra and its impact in statistics,

deep knowledge of exponential families and what role model structure plays in applied data analysis,

experience in working with non-traditional data such as sparse networks in an applied field,

an understanding of the state of the art in testing model/data fit.

Interactive and not a traditional exam-based course:

Homework;

Group projects;

Communication and presentation are an essential element for PhD students.

As I say on day 1 of each semester:

This can be your class. Make it work for you.

### What is this course about?

Algebraic statistics emerged as a field in the 1990s at the convergence of computational algebra and mathematical statistics. This blend of tools was motivated by the applied analysis of categorical data and the design of experiments. The early days saw a flourishing of theoretical research on algebraic, geometric, and combinatorial methods that can be useful in statistics. In the past decade, there has been a strong push toward applications, as detailed, for example, in this overview paper.

The official department course syllabus can be found here.

No exams are planned for this graduate elective course.

### Course Description (Bulletin):

Algebraic structures are present in a broad variety of statistical contexts, involving both parametric and non-parametric statistical models for continuous and discrete random variables. A broad range of algebraic tools is used to better understand model structure, improve statistical inference, and explore new classes of models. The course offers an overview of fundamental theoretical constructions relevant to some of the more popular recent applications in the field: exact conditional tests for discrete data, likelihood geometry, parameter identifiability and model selection, network models with applications to social sciences and neuroscience, and phylogenetics and tree-based evolutionary models in biology.

Enrollment: Graduate elective.

### Textbook(s):

Algebraic statistics (2018), Seth Sullivant, American Mathematical Society, Graduate Studies in Mathematics. [An e-version of this text is available to all current students through the Illinois Tech library!]

### Other references:

Lectures on algebraic statistics (2008), Mathias Drton, Bernd Sturmfels, and Seth Sullivant, Oberwolfach lecture series, Birkhäuser, ISBN 978-3-7643-8905-5 (available online).

Markov bases in algebraic statistics (2012), Akimichi Takemura, Hisayuki Hara, and Satoshi Aoki, Springer Series in Statistics, ISBN 978-1-4614-3719-2.

### Other required material:

Several research papers from mathematics and statistics journals published in the last 20 years, depending on topic emphasis. Use of statistical software, such as R, and related algebraic computation packages. (All software will be free/open source.)

### Prerequisites:

An undergraduate course in mathematical statistics (such as MATH 476 or beyond), and basic proficiency in at least one of the following: computational algebra, graduate-level linear algebra, discrete optimization, or combinatorics.

### Objectives:

Students will develop the ability to recognize geometric structure in statistical models, realize its impact on statistical inference procedures in practice, and identify open challenges.

Students will become familiar with the major tools used in the field of algebraic statistics, including some basics of computational algebra, optimization, graph theory, and matroid theory.

Students will understand the basic notions of model geometry and the geometry of parameter spaces, and how statistical problems can be translated into abstract mathematical problems that can often be solved directly with tools from other fields of mathematics.

Students will practice these techniques through the required use of statistical computing software.

Students will further explore the field and develop their research and communication skills through a course project and presentation. The project will concern a topic approved by the instructor. Topics can include (computational) applications of the course material to the student's own research area, and expository talks (with proofs) on material not covered in class (such as from a research paper).

### Lecture schedule:

Two 75-minute lectures per week.