T4E 2019 Workshop

learning with insight

Causal Inferencing with Observational Data
for Evidence-based Pedagogy:
Adopting AI to Education

The traditional, gold-standard randomized controlled trial is poorly suited to educational research because it raises sensitive ethical issues such as student discrimination and intrusiveness [1-2]. Alternatives such as observational studies have long been considered incapable of reliably estimating causal effects [3-4]. In the past two decades, however, the field of causality has been transformed. Researchers have established theories and validated methodologies for inferring causality from observational data, opening new horizons for the large-scale adoption of artificial intelligence (AI) in education and the transformation of pedagogy into an evidence-based process [5-6]. Nevertheless, this workshop’s authors argue that these new horizons are conditional on 1) open educational research [7] for the sharing, aggregation, and networking of learning data and 2) instructional designs compatible with the principles of causal analysis. It is, therefore, crucial to expose tomorrow's educational researchers and practitioners to these emerging technologies that will shape 21st-century education.


Kumar, Vivekanandan Suresh
Athabasca University, Canada

Dr. Kumar is a Professor in the School of Computing and Information Systems at Athabasca University, Canada. He holds the Natural Sciences and Engineering Research Council of Canada’s (NSERC) Discovery Grant on Anthropomorphic Pedagogical Agents, funded by the Government of Canada. His research focuses on developing anthropomorphic agents, which mimic and perfect human-like traits to better assist learners in their regulatory tasks. His research includes investigating technology-enhanced erudition methods that employ big data learning analytics, self-regulated learning, co-regulated learning, causal modeling, and machine learning to facilitate deep learning and open research. For more information, visit http://vivek.athabascau.ca.


Boulanger, David
Athabasca University, Canada

David Boulanger is a student and data scientist involved in the learning analytics research group at Athabasca University. His primary research focus is on observational study designs and the application of computational tools and machine learning algorithms in learning analytics including writing analytics.


Fraser, Shawn N.
Athabasca University, Canada

Dr. Fraser is an Associate Dean in Teaching & Learning and Associate Professor at Athabasca University, and an Adjunct Assistant Professor in Physical Education and Recreation at the University of Alberta. His research interests include understanding how stress can affect rehabilitation success for heart patients. He teaches research methods courses in the Faculty of Health Disciplines and is interested in interdisciplinary approaches to studying and teaching research methods and data analysis.


This workshop introduces the Calculus of Causation [6], an approach to causal inferencing with observational data based on causal diagrams and matching techniques (e.g., Coarsened Exact Matching, Mahalanobis Distance Matching) that approximate the more powerful randomized block design rather than the completely randomized design [8-12]. The gist of the methodology is to prune a set of observations until it approximates a randomized dataset. The data points that generate the most imbalance, also called bias, are removed one after another. The treatment effect can be re-estimated every time a data point is pruned, allowing one to observe how it changes as the dataset becomes more balanced, or as the subpopulation becomes more precisely defined. The proposed approach provides educational practitioners who are neither programmers nor statisticians with causal insights, a key tool toward the implementation of evidence-based pedagogy. This workshop also explains the building blocks of causal diagrams and how they can be used to identify and eliminate sources of bias [14]. Participants will learn by doing through a set of nine simple programming exercises that will help them understand the impact of bias and how to handle it through d-separation.
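The pruning idea can be illustrated with a minimal sketch of exact matching on a single discrete confounder. It is written in Python rather than the R used in the workshop, and it is not taken from the workshop materials: the function names, the group counts, the true effect of 2, and the choice to weight strata by their size are all assumptions made for illustration. The toy outcome is noise-free so that the arithmetic can be checked by hand.

```python
from collections import defaultdict
from statistics import mean


def make_toy_data():
    """Build hypothetical (z, t, y) rows with y = 2*t + 5*z (true effect = 2)."""
    counts = {  # (confounder z, treatment t) -> number of units (assumed values)
        (0, 1): 20, (0, 0): 80,   # z = 0: mostly untreated
        (1, 1): 80, (1, 0): 20,   # z = 1: mostly treated
        (2, 1): 10,               # z = 2: treated only, so no match exists
    }
    rows = []
    for (z, t), n in counts.items():
        rows.extend([(z, t, 2.0 * t + 5.0 * z)] * n)
    return rows


def naive_effect(rows):
    """Difference in mean outcome, ignoring the confounder."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(control)


def exact_matching_effect(rows):
    """Prune strata lacking treated or control units, then average the
    within-stratum differences weighted by stratum size.
    Returns (effect on the matched sample, number of pruned units)."""
    strata = defaultdict(lambda: {0: [], 1: []})
    for z, t, y in rows:
        strata[z][t].append(y)
    kept = [g for g in strata.values() if g[0] and g[1]]
    n_kept = sum(len(g[0]) + len(g[1]) for g in kept)
    effect = sum(
        (len(g[0]) + len(g[1])) / n_kept * (mean(g[1]) - mean(g[0]))
        for g in kept
    )
    return effect, len(rows) - n_kept


if __name__ == "__main__":
    data = make_toy_data()
    print("naive estimate:", round(naive_effect(data), 3))   # 5.545
    effect, pruned = exact_matching_effect(data)
    print("matched estimate:", effect, "pruned units:", pruned)  # 2.0, 10
```

On this toy dataset the naive comparison overstates the effect because treated units are concentrated in the high-confounder strata, while the matched estimate recovers the assumed effect of 2 after pruning the ten units that have no counterpart. The estimate applies only to the matched subpopulation, which is the trade-off the paragraph above describes.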

  • Participants will take home a set of frameworks, methodologies, and tools to infer causality from observational data and to publish results based on data observed in their educational settings.
  • Participants will gain a deep understanding of the role that big data and causal inferencing with observational data will play in removing barriers to AI adoption in education by making AI more transparent and more accountable, which is central to promoting trust among educational stakeholders.

Workshop Format

Participants with higher technical skills will be paired with participants with less background in statistics and programming and will work together in matching data points and estimating treatment effects on simulated ground truth causal models and datasets. The presenter will also assist participants throughout the workshop and a group discussion will be held at the end of the workshop.
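To give a sense of what a simulated ground-truth causal model looks like, the following minimal Python sketch builds a dataset in which X has no effect on Y by construction, and C is a collider caused by both. It is an illustration only and not one of the workshop's R exercises; the variable names, the rule C = (X or Y), and the number of copies per cell are assumptions.

```python
from itertools import product
from statistics import mean


def simulate(copies=25):
    """Return (x, y, c) rows: x and y are independent binaries,
    c is a collider that equals 1 when x or y equals 1."""
    rows = []
    for x, y in product((0, 1), repeat=2):
        c = int(x == 1 or y == 1)
        rows.extend([(x, y, c)] * copies)
    return rows


def risk_difference(rows):
    """P(y = 1 | x = 1) - P(y = 1 | x = 0) in the given rows."""
    y_if_x1 = [y for x, y, _ in rows if x == 1]
    y_if_x0 = [y for x, y, _ in rows if x == 0]
    return mean(y_if_x1) - mean(y_if_x0)


if __name__ == "__main__":
    data = simulate()
    # Ground truth: no causal effect of x on y.
    print("unadjusted:", risk_difference(data))              # 0.0
    # Conditioning on the collider opens a spurious path.
    selected = [row for row in data if row[2] == 1]
    print("conditioned on c = 1:", risk_difference(selected))  # -0.5
```

Because the true effect is known to be zero, any non-zero estimate can be attributed to the analysis rather than to the data. Here, restricting the sample to C = 1 creates an association of -0.5 between two variables that are unrelated in the full dataset.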


The workshop targets educational researchers, Master's and PhD students, data scientists, teachers, educational policy makers, and instructional designers. Some background in statistics (e.g., descriptive statistics, probability, linear regression) and research methods is an asset. Participants only need to register for this workshop session in order to engage in the interactive discussion and exercises; there is no call for papers. As many participants as possible are welcome.

Prior Experience

Five workshops/tutorials on causal inferencing with observational data applied to education were presented at past international conferences:

  • 2017 International Conference on Artificial Intelligence in Education (AIED)
    Matching Techniques: Hands-on Approach to Measuring and Modeling Educational Data
  • 2018 International Conference on Smart Learning Environments (ICSLE)
    Open Research and Observational Study for 21st Century Learning
  • 2018 International Conference on Intelligent Tutoring Systems (ITS)
    Automating Educational Research Through Learning Analytics: Data Balancing and Matching Techniques
  • 2018 International Conference on Technology for Education (T4E)
    Inferring Causal Effects from Learning Analytics: Discovering the Nature of Bias
  • 2019 International Conference on Smart Learning Environments (ICSLE)
    Inferring Causal Effects from Learning Analytics: Discovering the Nature of Bias

Tuesday, 10 December 2019

13:30 – 15:00  Lunch and Poster Session 2
               Industry and Workshop Demo Session
15:00 – 15:15  Introduction – Presentation
15:15 – 15:30  Installation of R and RStudio
15:30 – 15:45  Exercise 1: Calculating a causal estimate
15:45 – 16:00  Exercise 2: Discrete confounder & exact matching
16:00 – 16:15  Exercise 3: Discrete confounder & data imbalance
16:15 – 16:30  Exercise 4: Continuous confounder & matching technique
16:30 – 17:00  Tea
17:00 – 17:15  Exercise 5: Intervention (randomized experiment)
17:15 – 17:30  Exercise 6: Collider
17:30 – 17:45  Exercise 7: Descendant of a confounder
17:45 – 18:00  Exercise 8: Descendant of a collider
18:00 – 18:15  Exercise 9: Unobserved confounder & sensitivity test
18:15 – 18:30  Discussion

Hardware/Software Requirements & Resources

Participants who are keen to engage more actively with the causal analysis and programming activities are requested to bring a laptop on which to install R and RStudio. An Internet connection will be required to access and download the workshop materials.

  1. Install RStudio Desktop Free Edition (requires R 3.0.1+): https://www.rstudio.com/products/rstudio/download/#download
  2. If you do not have R installed on your computer, install the latest version of R: https://cran.rstudio.com/
  3. Download the workshop materials for the hands-on section.


  1. Hannan, E. L. (2008). Randomized Clinical Trials and Observational Studies: Guidelines for Assessing Respective Strengths and Limitations. JACC: Cardiovascular Interventions, 1(3), 211–217.
  2. Silverman, S. L. (2009). From Randomized Controlled Trials to Observational Studies. The American Journal of Medicine, 122(2), 114–120.
  3. Concato, J., Shah, N., & Horwitz, R. I. (2000). Randomized, Controlled Trials, Observational Studies, and the Hierarchy of Research Designs. The New England Journal of Medicine, 342(25), 1887–1892.
  4. Schölkopf, B., Blei, D., & Huszár, F. (2019). Panel: Causality. The Machine Learning Summer School. Stellenbosch, South Africa. https://www.youtube.com/watch?v=ynVr_zzUXtw
  5. Murphy, R. F. (2019). Artificial Intelligence Applications to Support K–12 Teachers and Teaching: A Review of Promising Applications, Challenges, and Risks. Santa Monica, CA: RAND Corporation. https://www.rand.org/pubs/perspectives/PE315.html.
  6. Pearl, J., & Mackenzie, D. (2018). The Book of Why: The New Science of Cause and Effect. Basic Books.
  7. Kumar, V., Romero, J., & Romyn, D. (2016). Open Research. 8th Canadian Science Policy Conference (CSPC 2016). Retrieved from https://2016canadiansciencepolicyconfere.sched.com/event/7kp0/open-research-la-recherche-ouverte
  8. Ho, D., Imai, K., King, G., & Stuart, E. (2007). Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference. Political Analysis, 15, 199-236.
  9. Iacus, S. M., King, G., Porro, G., & Katz, J. N. (2012). Causal inference without balance checking: Coarsened exact matching. Political Analysis, 1–24.
  10. King, G., Nielsen, R., Coberley, C., & Pope, J. E. (2011). Comparative Effectiveness of Matching Methods for Causal Inference. Unpublished Manuscript, 15, 1–26. http://doi.org/
  11. King, G., & Nielsen, R. A. (2016). Why Propensity Scores Should Not Be Used for Matching.
  12. Olmos, A., & Govindasamy, P. (2015). Propensity Scores: A Practical Introduction Using R. Journal of MultiDisciplinary Evaluation, 11(25), 68–88.
  13. King, G., Lucas, C., & Nielsen, R. A. (2017). The Balance-Sample Size Frontier in Matching Methods for Causal Inference. American Journal of Political Science.
  14. Roy, J. A Crash Course in Causality: Inferring Causal Effects from Observational Data. Coursera. Retrieved August 10, 2018, from https://www.coursera.org/learn/crash-course-in-causality
  15. Scheines, R. (2016). CCD Summer Short Course. Retrieved from https://www.ccd.pitt.edu/video-tutorials/