Import the following package. If the package is not installed in your R environment, install it using the `install.packages("package_name")` command.

`library("cem")`

This exercise reuses the dataset from Exercise 2.

```
data <- read.csv("../data/exercise2.csv")
head(data, n=10)
```

```
##    talent    effort     skill treatment
## 1   FALSE 0.6497020 0.6828916      TRUE
## 2    TRUE 0.6848052 0.9369752      TRUE
## 3   FALSE 0.7935741 0.4630684      TRUE
## 4    TRUE 0.4313167 0.4561438     FALSE
## 5    TRUE 0.4718738 0.7214476     FALSE
## 6    TRUE 0.4208080 0.2688652     FALSE
## 7   FALSE 0.5741308 0.5328195     FALSE
## 8   FALSE 0.6814296 0.6213216      TRUE
## 9   FALSE 0.5700953 0.3671640     FALSE
## 10   TRUE 0.2904192 0.7253196     FALSE
```

A metric called the \(L_1\) vector norm measures the amount of imbalance within a dataset, that is, how differently the confounders are distributed in the treated and untreated groups. This imbalance is what allows confounders to bias a comparison between the two groups. The \(L_1\) vector norm produces a number between 0 and 1. A value of 0 indicates that there is no imbalance in the dataset, while a value of 1 denotes a totally imbalanced dataset. The following command calculates the \(L_1\) vector norm for the original dataset. Note that the variables (columns) that are not confounders, here `treatment`, `effort`, and `skill`, must be passed to the function through the `drop` argument so that they are excluded from the calculation.

`L1.meas(data$treatment, data, drop=c('treatment', 'effort', 'skill'))`

```
##
## Multivariate Imbalance Measure: L1=0.360
## Percentage of local common support: LCS=100.0%
```

The \(L_1\) value of 0.360 implies that the dataset is mildly imbalanced and that the source of bias (here, talent) needs to be controlled for. Recall that controlling for a confounder means doing exact matching on it. Hence, students are split into a low-talent group and a high-talent group.

```
# talent is logical: FALSE compares equal to 0 (low talent), TRUE to 1 (high talent)
low_talent <- data[data$talent==0,]
high_talent <- data[data$talent==1,]
```

Next, measure the level of data imbalance within the low-talent group and within the high-talent group separately.

`L1.meas(low_talent$treatment, low_talent, drop=c('treatment', 'effort', 'skill'))`

```
##
## Multivariate Imbalance Measure: L1=0.000
## Percentage of local common support: LCS=100.0%
```

`L1.meas(high_talent$treatment, high_talent, drop=c('treatment', 'effort', 'skill'))`

```
##
## Multivariate Imbalance Measure: L1=0.000
## Percentage of local common support: LCS=100.0%
```

Exact matching on talent therefore eliminates the bias, also called data imbalance, within each group. Note that the \(L_1\) vector norm can be computed for a dataset with any number of confounders.