ICSLE 2019 Tutorial

Inferring Causal Effects from Learning Analytics: Discovering the Nature of Bias

This tutorial explores the ‘Potential-Outcome’ causal framework and its underlying assumptions: the Stable Unit Treatment Value Assumption (SUTVA), Positivity, Ignorability, and Consistency. It presents guidelines on how to conduct causal analyses in observational study settings, specifically in educational contexts where observations are collected through learning analytics. It provides a set of techniques that together allow valid causal effects to be inferred. These include: 1) graphically representing variables and their interrelationships corresponding to research questions; 2) measuring and reducing the level of imbalance in a dataset using matching techniques such as Coarsened Exact Matching (CEM), Mahalanobis Distance Matching (MDM), and Propensity Score Matching (PSM), together with their associated imbalance metrics, namely the L1 vector norm, Average Mahalanobis Imbalance (AMI), and Difference in Means; 3) defining bias using a directed acyclic graph (DAG); 4) d-separating any path (association) between any pair of variables by controlling for the proper set of confounding factors; 5) checking the level of data balance achieved through d-separation using the aforementioned matching techniques and imbalance metrics; 6) calculating propensity scores; and 7) using Inverse Probability of Treatment Weighting (IPTW), a valid way of using propensity scores, to estimate the expected value of each potential outcome and the average causal effect within the derived pseudo-population. A dataset originating from recent educational studies will be used in the hands-on portions of the tutorial.
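To make steps 6 and 7 concrete, the following is a minimal sketch of IPTW on a toy dataset. It is not the tutorial's material: the tutorial itself uses R, the records below are invented for illustration, and the function names (`propensity_by_stratum`, `iptw_means`, `naive_difference`) are hypothetical. It assumes a single binary confounder, so the propensity score can be estimated as the share of treated units within each confounder stratum.

```python
from collections import defaultdict

# Hypothetical toy records: (confounder x, treatment a, outcome y).
# Outcomes were built as y = 2*a + 3*x, so the true causal effect is 2.
records = [
    (0, 1, 2), (0, 0, 0), (0, 0, 0), (0, 0, 0),
    (1, 1, 5), (1, 1, 5), (1, 1, 5), (1, 0, 3),
]

def propensity_by_stratum(rows):
    """Estimate P(A=1 | X=x) as the treated share within each stratum of x."""
    treated, total = defaultdict(int), defaultdict(int)
    for x, a, _ in rows:
        treated[x] += a
        total[x] += 1
    return {x: treated[x] / total[x] for x in total}

def iptw_means(rows, ps):
    """Return weighted mean outcomes (treated, control) in the pseudo-population.

    Treated units get weight 1/e(x); control units get 1/(1 - e(x)).
    Positivity (0 < e(x) < 1 in every stratum) is assumed; otherwise
    the weights are undefined.
    """
    sums = {0: 0.0, 1: 0.0}
    weights = {0: 0.0, 1: 0.0}
    for x, a, y in rows:
        w = 1 / ps[x] if a == 1 else 1 / (1 - ps[x])
        sums[a] += w * y
        weights[a] += w
    return sums[1] / weights[1], sums[0] / weights[0]

def naive_difference(rows):
    """Unadjusted difference in mean outcomes between treated and control."""
    treated = [y for _, a, y in rows if a == 1]
    control = [y for _, a, y in rows if a == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

ps = propensity_by_stratum(records)
mean_treated, mean_control = iptw_means(records, ps)
print("naive difference:", naive_difference(records))
print("IPTW average causal effect:", round(mean_treated - mean_control, 6))
```

In this toy example the unadjusted difference overstates the effect because treated units are concentrated in the stratum with higher outcomes, while the weighted difference recovers the effect built into the data.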


Kumar, Vivekanandan Suresh
Athabasca University, Canada

Dr. Kumar is a Professor in the School of Computing and Information Systems at Athabasca University, Canada. He holds the Natural Sciences and Engineering Research Council of Canada’s (NSERC) Discovery Grant on Anthropomorphic Pedagogical Agents, funded by the Government of Canada. His research focuses on developing anthropomorphic agents, which mimic and perfect human-like traits to better assist learners in their regulatory tasks. His research includes investigating technology-enhanced erudition methods that employ big data learning analytics, self-regulated learning, co-regulated learning, causal modeling, and machine learning to facilitate deep learning and open research. For more information, visit http://vivek.athabascau.ca.


Boulanger, David
Athabasca University, Canada

David Boulanger is a student and data scientist involved in the learning analytics research group at Athabasca University. His primary research focus is on observational study designs and the application of computational tools and machine learning algorithms in learning analytics including writing analytics.


Fraser, Shawn N.
Athabasca University, Canada

Dr. Fraser is an Associate Dean in Teaching & Learning and Associate Professor at Athabasca University, and an Adjunct Assistant Professor in Physical Education and Recreation at the University of Alberta. His research interests include understanding how stress can affect rehabilitation success for heart patients. He teaches research methods courses in the Faculty of Health Disciplines and is interested in interdisciplinary approaches to studying and teaching research methods and data analysis.

Tutorial Format

Participants will be given a set of exercises to help them understand and individually practice the concepts taught during the tutorial. They will be given an educational dataset and a research question formulated from it, and will be asked to balance the data and then estimate causal effects and their effect sizes. Calculations to estimate average causal effects will be done using R and RStudio. Exercises and step-by-step instructions will be given to participants throughout the tutorial as each new concept is introduced. Participants will also be invited to work on these activities in pairs to give them a fair chance of successfully completing each step. The tutorial will be run as an ongoing interactive discussion.
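As an illustration of what reporting an effect size can look like, here is a minimal sketch that computes Cohen's d with a pooled standard deviation. This is an assumption made for illustration: the tutorial does not state which effect size measure it uses, its exercises are done in R rather than Python, and the function name `cohens_d` and the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(treated, control):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(treated), len(control)
    pooled_var = (
        (n1 - 1) * variance(treated) + (n2 - 1) * variance(control)
    ) / (n1 + n2 - 2)
    return (mean(treated) - mean(control)) / sqrt(pooled_var)

# Hypothetical outcome scores for two groups after balancing.
treated_scores = [2, 4, 6]
control_scores = [1, 3, 5]
print("Cohen's d:", cohens_d(treated_scores, control_scores))
```

Unlike a p-value, this quantity expresses the size of the difference in standard deviation units, which makes it comparable across studies.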


The tutorial targets educational researchers, data scientists, and Master's and PhD students. It will be of particular interest to those who want an introduction to causal inference from observational data and who seek an alternative research method to the traditional randomized experiment. Some background in statistics and programming (e.g., descriptive statistics, probability, linear regression, R) and in research methods is an asset. Participants only need to register for this tutorial session in order to engage in the interactive discussion; there is no call for papers.

Prior Experience

Our team has previously presented similar workshops and tutorials at the 2018 International Conference on Technology for Education (T4E) “Inferring Causal Effects from Learning Analytics: Discovering the Nature of Bias”, the 2018 International Conference on Intelligent Tutoring Systems (ITS) “Automating Educational Research Through Learning Analytics: Data Balancing and Matching Techniques” (https://learninganalytics.ca/its-2018-tutorial-on-automating-educational-research/), the 2017 International Conference on Artificial Intelligence in Education (AIED) “Matching Techniques: Hands-on Approach to Measuring and Modeling Educational Data,” and the 2018 International Conference on Smart Learning Environments (ICSLE) “Open Research and Observational Study for 21st Century Learning.”


March 19, 2019 (Tuesday) – Day 2

12:00 – 13:00  Lunch (boxed lunches distributed at Room 314)
13:00 – 13:45  Tutorial: Observational Studies and Learning Analytics – Theoretical Section (Room 382)
               Vivekanandan S. Kumar, David Boulanger, Shawn N. Fraser
13:45 – 14:45  Tutorial: Observational Studies and Learning Analytics – Hands-on Section (Room 382)
               Vivekanandan S. Kumar, David Boulanger, Shawn N. Fraser
14:45 – 15:00  Tea/Coffee Break (outside Rooms 332 and 333A, B & C)
15:00 – 16:00  “Smart Learning – Experience Reports” Session (Room 382)
               Challenges in recruiting and retaining participants for smart learning environment studies
               Isabelle Guillot, Claudia Guillot, Rébecca Guillot, Jérémie Seanosky, David Boulanger, Shawn N. Fraser, Vivekanandan Kumar, Kinshuk
16:00 – 17:00  Joint Activities with US-China Smart Education Conference
               Keynote Speech: Stephen Attenborough
17:00 – 19:00  Exhibitor Reception
19:00 – 21:00  Joint Activities with US-China Smart Education Conference
               Ed Tech Ascend Pitch Competition

Hardware/Software Requirements & Resources

Internet connectivity will be required for participants to download the tutorial materials. Participants who wish to engage more actively with the causal analysis and programming activities are asked to bring a laptop (Windows Vista/7/8/10, Mac OS X, or Linux) on which to install R and RStudio.

  1. Install RStudio Desktop Free Edition (requires R 3.0.1+): https://www.rstudio.com/products/rstudio/download/#download
  2. If you do not have R installed on your computer, install the latest version of R: https://cran.rstudio.com/
  3. Download the tutorial materials.


  1. Roy, J. A Crash Course in Causality: Inferring Causal Effects from Observational Data. Coursera. Retrieved December 6, 2018, from https://www.coursera.org/learn/crash-course-in-causality
  2. Boyer, A., & Bonnin, G. (2016). Higher Education and the Revolution of Learning Analytics. In 2016 ICDE Presidents’ Summit (pp. 1–20).
  3. Bradshaw, J. M., Hoffman, R. R., Woods, D. D., & Johnson, M. (2013). The seven deadly myths of “autonomous systems”. IEEE Intelligent Systems, 28(3), 54–61.
  4. Ho, D., Imai, K., King, G., & Stuart, E. (2007). Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference. Political Analysis, 15, 199-236.
  5. Iacus, S. M., King, G., Porro, G., & Katz, J. N. (2012). Causal inference without balance checking: Coarsened exact matching. Political Analysis, 1–24.
  6. King, G., Lucas, C., & Nielsen, R. A. (2014). The Balance-Sample Size Frontier in Matching Methods for Causal Inference. American Journal of Political Science.
  7. King, G., Nielsen, R., Coberley, C., & Pope, J. E. (2011). Comparative Effectiveness of Matching Methods for Causal Inference. Unpublished Manuscript, 15, 1–26.
  8. Olmos, A., & Govindasamy, P. (2015). Propensity Scores: A Practical Introduction Using R. Journal of MultiDisciplinary Evaluation, 11(25), 68–88.
  9. Concato, J., Shah, N., & Horwitz, R. I. (2000). Randomized, Controlled Trials, Observational Studies, and the Hierarchy of Research Designs. The New England Journal of Medicine, 342(25), 1887–1892.
  10. Hannan, E. L. (2008). Randomized Clinical Trials and Observational Studies: Guidelines for Assessing Respective Strengths and Limitations. JACC: Cardiovascular Interventions, 1(3), 211–217.
  11. Silverman, S. L. (2009). From Randomized Controlled Trials to Observational Studies. The American Journal of Medicine, 122(2), 114–120.
  12. King, G., & Nielsen, R. A. (2016). Why propensity scores should not be used for matching.
  13. Sullivan, G. M., & Feinn, R. (2012). Using Effect Size-or Why the P Value Is Not Enough. Journal of Graduate Medical Education, 4(3), 279–282.
  14. Rosenbaum, P. R. (2015). Two R packages for sensitivity analysis in observational studies. Observational Studies, 1(1), 1–17.