- Class: Monday and Wednesday 12:00-1:15 pm, HH 370
- Office Hours: Tuesdays and Thursdays 3-5 pm, GLLB 204 or Zoom
This course focuses on the application of regression to inform decision-making, particularly using interpretable models to understand the effect of interventions on business outcomes. Students learn to model experimental and observational data and to infer causation rather than mere correlation. Prerequisite: DATA 5600
By the end of this course, you will be able to:
- Specify identification strategies for estimating causal effects.
- Design effective experiments and apply appropriate methods for experimental data.
- Model observational data and infer causality using a variety of techniques.
Successful students in this course will demonstrate conceptual understanding and skill mastery by applying the modeling workflow within their chosen business context and as part of a group. Each student is an essential member of a community of learners and should consider the instructor as both a teacher and a mentor.
Students can focus on learning by using the following study tips:
- Prepare for class by studying assigned material and identifying questions.
- Engage during class by asking questions, taking notes, and actively coding.
- Apply what you learn in class by working on projects.
- Evaluate what you’re learning by reviewing and reflecting on course materials.
- Reinforce what you’re learning by utilizing office hours and working with classmates.
After completing the course, students' resumes should reflect the tools, skills, and methods learned and showcase the projects completed.
DATA 5620 Advanced Regression for Causal Inference serves as one of the courses in the modeling sequence together with DATA 5610 Advanced Machine Learning for Analytics and DATA 5630 Deep Forecasting.
Each student will need to bring a laptop, either their own or one rented from Utah State. While students are welcome to use their preferred tools, the following data stack is recommended and certain tools are required, as indicated below.
Python is a general purpose, open source programming language developed by computer scientists. It is the most commonly used programming language for data wrangling, visualizations, and modeling. See the data stack training for details on how to best install and manage Python versions and project environments.
Aside from the programming language itself, a code editor or integrated development environment (IDE) is a data analyst’s most important tool. Positron is a next-generation data science IDE. Built on VS Code’s open source core, Positron combines the multilingual extensibility of VS Code with essential data tools common to language-specific IDEs. See the data stack training for a summary of Positron’s data-friendly features.
GitHub is an online hosting service for project repositories managed using Git, a powerful version control system and the industry standard for software development and data projects. Git and GitHub facilitate collaboration on a single code base and enable students to organize an online portfolio of work. See the data stack training for the basics of using Git and GitHub and a project template.
Quarto is an open source publishing system that combines text, code, and output. Quarto documents are similar to Jupyter notebooks, except the content can be rendered into a variety of formats, including PDFs, Word documents, PowerPoint presentations, Revealjs slide decks, interactive dashboards, websites, etc. While Quarto is not required for the course, students will be required to submit code and output in a PDF format. See the data stack training for more details on Quarto, including how to use Quarto to render a Jupyter notebook into a PDF.
Students may use their preferred AI tools to assist in studying and completing assignments. All students have access to Copilot through Utah State. However, students must remember that the objective of this course is learning. AI can contribute to learning, for example by helping to debug code and explaining concepts in new ways. AI can also hinder learning, for example when students let it do their thinking for them. See the data stack training for details on getting access to AI and a discussion on using AI responsibly.
We will be studying Nick Huntington-Klein’s The Effect: An Introduction to Research Design and Causality (available free online). He also has a video series that complements the book.
We will be referencing Andrew Heiss’ Program Evaluation for Public Service course, which also includes a video series.
It may also be helpful to use Richard McElreath’s Statistical Rethinking video series as a reference when we discuss Bayesian methods.
Assignments are designed to align with what students will be expected to do in practice. No credit will be given for late work unless an arrangement is made before the relevant deadline. Students are encouraged to review their graded work and ask questions to avoid repeating mistakes.
Letter grades will follow the standard rubric and will be determined as follows.
| Grade | Range | Grade | Range | Grade | Range |
| --- | --- | --- | --- | --- | --- |
| A | 93-100% | B- | 80-82% | D+ | 67-69% |
| A- | 90-92% | C+ | 77-79% | D | 63-66% |
| B+ | 87-89% | C | 73-76% | D- | 60-62% |
| B | 83-86% | C- | 70-72% | E | 0-59% |
This class is all about participation. If students aren’t attending, they can’t contribute. Students will take turns preparing slides and presenting to lead the discussion in class. Where appropriate, students should include relevant code when leading the discussion.
Interviews are an opportunity for students to demonstrate their personal understanding and prepare for future real-world job interviews. Designed to complement group project work, interviews will include questions about course concepts, project work (including code), and reflections on performance in the course.
Interviews with the instructor will occur at the beginning, middle, and end of the semester during office hours or by appointment.
Projects are the focus of learning by doing in the course, serving as the means for students to apply their conceptual understanding and skill mastery both as a group and within their business domain of interest. Students will complete two group projects, one focused on experimental data and one focused on observational data. For each project, the group will give a presentation and submit a report.
The week before the presentations, groups will submit a draft of their slides to get feedback and have time for revision. The other students in the class, as well as the group members themselves, will help evaluate each of the presentations.
Please note that the instructor reserves the right to change the following schedule at any time and will give students sufficient notice of any changes that affect assignment deadlines.
- Causal Inference
- Modeling Workflow
- Ch. 1-2 of The Effect
- Decisions and Data
- Probability and Statistics
- Ch. 3-4 of The Effect

