T4E 2018 Workshop

Inferring Causal Effects from Learning Analytics: Discovering the Nature of Bias

This workshop explores the ‘Potential-Outcome’ causal framework and its underlying assumptions: the Stable Unit Treatment Value Assumption (SUTVA), Positivity, Ignorability, and Consistency. It presents guidelines on how to conduct causal analyses in observational study settings, and more specifically in educational settings where observations are collected through learning analytics. It provides a set of techniques that together allow valid causal effects to be inferred. These include: 1) graphically representing the variables and interrelationships that correspond to a research question; 2) measuring and reducing the level of imbalance in a dataset using matching techniques such as Coarsened Exact Matching (CEM), Mahalanobis Distance Matching (MDM), and Propensity Score Matching (PSM), together with their associated imbalance metrics, namely the L_1 vector norm, Average Mahalanobis Imbalance (AMI), and Difference in Means; 3) defining bias using a directed acyclic graph (DAG); 4) d-separating any path (association) between a pair of variables by controlling for the proper set of confounding factors; 5) checking the level of data balance achieved through d-separation using the same matching techniques and imbalance metrics; 6) calculating propensity scores; and 7) using Inverse Probability of Treatment Weighting (IPTW; a valid way to use propensity scores) to estimate the expected value of each potential outcome and the average causal effect within the derived pseudo-population. A dataset originating from recent educational studies will be used in the hands-on portions of the workshop.
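
To make steps 6 and 7 concrete, here is a minimal sketch of IPTW on a tiny invented dataset. The workshop exercises themselves use R; this sketch is written in Python purely for illustration, and it is not the workshop's own code. It assumes a single binary confounder, so the propensity score can be estimated as the treated fraction within each stratum rather than with a fitted model. The records, variable names, and helper functions are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy records (x, a, y), invented for illustration only:
#   x = binary confounder, a = treatment indicator (1 = treated), y = outcome.
records = [
    (0, 1, 70.0), (0, 0, 60.0), (0, 0, 62.0), (0, 0, 58.0),
    (1, 1, 85.0), (1, 1, 83.0), (1, 1, 87.0), (1, 0, 75.0),
]


def propensity_by_stratum(rows):
    """Estimate P(A=1 | X=x) as the treated fraction within each stratum of x."""
    treated = defaultdict(int)
    total = defaultdict(int)
    for x, a, _ in rows:
        total[x] += 1
        treated[x] += a
    return {x: treated[x] / total[x] for x in total}


def iptw_means(rows, ps):
    """Return the weighted mean outcome under treatment and under control.

    Treated units get weight 1/p and control units 1/(1-p), where p is the
    propensity score of the unit's stratum; weights are normalized per group.
    """
    num = {0: 0.0, 1: 0.0}
    den = {0: 0.0, 1: 0.0}
    for x, a, y in rows:
        p = ps[x]
        w = 1.0 / p if a == 1 else 1.0 / (1.0 - p)
        num[a] += w * y
        den[a] += w
    return num[1] / den[1], num[0] / den[0]


def naive_difference(rows):
    """Unadjusted difference in mean outcome between treated and control units."""
    treated = [y for _, a, y in rows if a == 1]
    control = [y for _, a, y in rows if a == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


ps = propensity_by_stratum(records)
# Positivity: every stratum must contain both treated and control units.
assert all(0.0 < p < 1.0 for p in ps.values())

mean_treated, mean_control = iptw_means(records, ps)
ace = mean_treated - mean_control
naive = naive_difference(records)

print(f"naive difference in means:  {naive:.2f}")
print(f"IPTW average causal effect: {ace:.2f}")
```

On this toy data the unadjusted difference in means is 17.5, while the weighted estimate is 10, which matches the difference built into each stratum. The gap arises because treated units are concentrated in the stratum with higher outcomes; the weights rebalance the strata so that each group represents the whole sample.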

Organizers

Kumar, Vivekanandan Suresh
Athabasca University, Canada
vive@athabascau.ca

Dr. Kumar is a Professor in the School of Computing and Information Systems at Athabasca University, Canada. He holds the Natural Sciences and Engineering Research Council of Canada’s (NSERC) Discovery Grant on Anthropomorphic Pedagogical Agents, funded by the Government of Canada. His research focuses on developing anthropomorphic agents, which mimic and perfect human-like traits to better assist learners in their regulatory tasks. His research includes investigating technology-enhanced erudition methods that employ big data learning analytics, self-regulated learning, co-regulated learning, causal modeling, and machine learning to facilitate deep learning and open research. For more information, visit http://vivek.athabascau.ca.

Boulanger, David
Athabasca University, Canada
dboulanger@athabascau.ca

David Boulanger is a student and data scientist involved in the learning analytics research group at Athabasca University. His primary research focus is on observational study designs and the application of computational tools and machine learning algorithms in learning analytics, including writing analytics.

Fraser, Shawn N.
Athabasca University, Canada
shawnf@athabascau.ca

Dr. Fraser is an Associate Dean in Teaching & Learning and Associate Professor at Athabasca University, and an Adjunct Assistant Professor in Physical Education and Recreation at the University of Alberta. His research interests include understanding how stress can affect rehabilitation success for heart patients. He teaches research methods courses in the Faculty of Health Disciplines and is interested in interdisciplinary approaches to studying and teaching research methods and data analysis.

Workshop Format

Participants will be given a set of exercises to help them understand and individually practice the concepts taught during the workshop. They will be supplied with an educational dataset and a research question formulated from it, and will be asked to balance the data and then estimate causal effects and their effect sizes. Average causal effects will be calculated using R and RStudio. Exercises and step-by-step instructions will be given to participants throughout the workshop as each new concept is introduced. Participants will also be invited to work on these activities in pairs to give everyone a fair chance of completing each step. The workshop will run as an ongoing interactive discussion.
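
As a rough illustration of what reporting an effect size involves, the sketch below computes Cohen's d, one common standardized effect size, for two groups of outcomes. It is a Python illustration only (the workshop exercises use R), the function name and sample values are hypothetical, and it assumes the two groups have already been balanced so that an unweighted comparison is meaningful.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(treated, control):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n0 = len(treated), len(control)
    pooled_var = (
        (n1 - 1) * variance(treated) + (n0 - 1) * variance(control)
    ) / (n1 + n0 - 2)
    return (mean(treated) - mean(control)) / sqrt(pooled_var)


# Hypothetical outcome scores for balanced treated and control groups.
d = cohens_d([2, 4, 6], [1, 3, 5])
print(f"Cohen's d: {d:.2f}")
```

Unlike a p-value, this quantity expresses the size of the difference in units of standard deviation, so it does not grow merely because the sample is large.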

Audience

The workshop targets educational researchers, data scientists, and master's and PhD students. It will be of particular interest to those who want an introduction to causal inference from observational data and who seek an alternative research method to the traditional randomized experiment. Some background in statistics and programming (e.g., descriptive statistics, probability, linear regression, R) and in research methods is an asset. Participants only need to register for this workshop session in order to engage in the interactive discussion; there is no call for papers.

Prior Experience

Our team has previously presented similar workshops and tutorials at the 2018 International Conference on Intelligent Tutoring Systems (ITS) “Automating Educational Research Through Learning Analytics: Data Balancing and Matching Techniques” (http://learninganalytics.ca/research/its-2018-tutorial-on-automating-educational-research/), the 2017 International Conference on Artificial Intelligence in Education (AIED) “Matching Techniques: Hands-on Approach to Measuring and Modeling Educational Data,” and the 2018 International Conference on Smart Learning Environments (ICSLE) “Open Research and Observational Study for 21st Century Learning.”

Program

Wednesday, 12 December 2018 (Session 1)

Time Event
13:15 – 14:15 Lunch
14:15 – 15:00 Workshop W02: Theoretical Section
15:00 – 15:45 Workshop W02: Hands-on Section
15:45 – 16:00 Tea

Hardware/Software Requirements & Resources

Internet connectivity will be required for participants to download the workshop materials. Participants who wish to engage more actively with the causal analysis and programming activities are asked to bring a laptop (Windows Vista/7/8/10, Mac OS X, Linux) on which to install R and RStudio.

  1. If you do not have R installed on your computer, install the latest version of R: https://cran.rstudio.com/
  2. Install RStudio Desktop Free Edition (requires R 3.0.1+): https://www.rstudio.com/products/rstudio/download/#download
  3. Download the workshop presentation (theoretical section).
  4. Download the workshop material for the hands-on section.
