Regression Discontinuity Design: A Powerful Tool for Analyzing Social and Socio-Economic Issues, and How to Use It in Python

Regression discontinuity design (RDD) is a statistical method used to estimate the causal effect of a treatment or intervention by exploiting a cutoff in a variable known as the “running variable”, which determines eligibility for treatment. Participants who score at or above a certain threshold on the running variable receive the treatment, while those who score below it form the control group. RDD is often used when a randomized controlled trial is difficult or impossible to conduct, such as in policy evaluation, education research, and other social science research.

The basic idea behind RDD is that the treatment effect can be estimated by comparing the outcomes of individuals just above and just below the threshold. Provided individuals cannot precisely control which side of the threshold they fall on, those on either side should be similar in all other relevant respects, so the difference in outcomes between them can be attributed to the treatment. Put differently, other factors that affect the outcome are not expected to change abruptly at the threshold; only treatment status does.
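
The comparison can be illustrated with a minimal sketch on simulated data. Everything here is an assumption for illustration: the running variable is centred so that the cutoff is 0, the true treatment effect is 2.0, and the window is 0.05 on each side. A plain difference in means ignores any trend in the outcome within the window, so it only works well with a narrow window; a regression that also controls for the running variable relaxes that limitation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated running variable, centred so the cutoff is at 0 (assumption)
n = 20_000
x = rng.uniform(-1, 1, n)
cutoff = 0.0
treated = x >= cutoff  # sharp assignment: treated at or above the cutoff

# Outcome rises smoothly with x, plus a true jump of 2.0 for treated units
true_effect = 2.0
y = 1.0 + 0.5 * x + true_effect * treated + rng.normal(0, 1, n)

# Compare mean outcomes within a narrow window on each side of the cutoff
window = 0.05
above = (x >= cutoff) & (x < cutoff + window)
below = (x < cutoff) & (x >= cutoff - window)
naive_estimate = y[above].mean() - y[below].mean()
print(f"Difference in means near the cutoff: {naive_estimate:.2f}")
```

Because the outcome has a slope of 0.5 in x, the difference in means picks up a small extra amount from that trend; the narrower the window, the smaller this contamination, but the fewer observations are left.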

For instance, Bowblis and Smith (2021) used RDD to examine whether occupational licensing of social services improves nursing home quality. In 1987, the Omnibus Budget Reconciliation Act (OBRA-87) was enacted, requiring nursing homes with at least 121 beds to employ at least one full-time qualified social worker. This requirement gave Bowblis and Smith (2021) a way to analyze the effects of occupational licensing on nursing home quality. By comparing changes in various quality measures between skilled nursing facilities (SNFs) just at or above 121 beds and those just below it, they could assess whether an increase in licensed social workers leads to better quality. This approach can be regarded as a natural experiment: the two groups of SNFs are expected to be similar in all respects except one, namely whether they are above the 121-bed threshold, so differences in quality changes can be attributed to occupational licensing. The authors found no evidence that increased licensure of social workers improves the quality of patient care, patient quality of life, or the quality of social services provided.

Another important concept in an RDD analysis is the treatment variable, a binary variable that indicates whether an individual or unit receives the treatment. In a sharp design, it switches from 0 to 1 at the cutoff. The treatment variable is then included as an independent variable in the regression model, together with the running variable, so that its coefficient estimates the effect of treatment on the outcome while controlling for the running variable.
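
One common way to write this regression is a local linear model with the treatment indicator D (1 if the running variable is at or above the cutoff c), the centred running variable x - c, and their interaction, so the slope may differ on each side; the coefficient on D is the estimated jump at the cutoff. The sketch below fits that model by ordinary least squares with NumPy. The function name local_linear_rdd is hypothetical, and the simulated bed counts, the true jump of 1.5 and the 20-bed window are assumptions chosen to echo the nursing home example, not data or results from that study.

```python
import numpy as np

def local_linear_rdd(x, y, cutoff, bandwidth):
    """Estimate the jump in y at the cutoff with a local linear regression.

    Model: y = a + tau*D + b*(x - c) + g*D*(x - c) + error, where
    D = 1 if x >= c. Only observations with |x - c| <= bandwidth are used.
    Returns tau, the estimated treatment effect at the cutoff.
    """
    centred = x - cutoff
    keep = np.abs(centred) <= bandwidth
    xc, yk = centred[keep], y[keep]
    d = (xc >= 0).astype(float)  # treatment indicator
    # Design matrix: intercept, treatment, running variable, interaction
    design = np.column_stack([np.ones_like(xc), d, xc, d * xc])
    coefs, *_ = np.linalg.lstsq(design, yk, rcond=None)
    return coefs[1]

# Simulated data (hypothetical): bed counts, cutoff at 121, true jump of 1.5
rng = np.random.default_rng(1)
beds = rng.integers(60, 181, size=5_000).astype(float)
outcome = 10 + 0.05 * beds + 1.5 * (beds >= 121) + rng.normal(0, 1, beds.size)

tau_hat = local_linear_rdd(beds, outcome, cutoff=121, bandwidth=20)
print(f"Estimated jump at 121 beds: {tau_hat:.2f}")
```

Centring the running variable at the cutoff is what makes the coefficient on D equal to the gap between the two fitted lines exactly at c, rather than at x = 0.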

Choosing the Right Bandwidth

The bandwidth determines the size of the window around the threshold within which observations are used to estimate the treatment effect. For example, suppose that the running variable is age and the cutoff age is 18 years. The bandwidth could be set to 1 year, meaning that the treatment effect is estimated using individuals whose age is within 1 year of the cutoff (i.e., ages 17 to 19). Alternatively, the bandwidth could be set to 2 years, meaning that the estimate uses individuals within 2 years of the cutoff (i.e., ages 16 to 20). The study by Bowblis and Smith (2021) discussed above used a bandwidth of 20 beds.
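
Selecting the estimation sample for a given bandwidth amounts to a simple distance filter. The sketch below applies it to the age example; the list of ages is made up, the helper within_bandwidth is hypothetical, and treating the boundary as inclusive is an assumption (some implementations use a strict inequality).

```python
def within_bandwidth(values, cutoff, bandwidth):
    """Return the values whose distance from the cutoff is at most the bandwidth.

    The inclusive boundary (<=) is an assumption; other tools may differ.
    """
    return [v for v in values if abs(v - cutoff) <= bandwidth]

# Hypothetical ages; the cutoff of 18 follows the example in the text
ages = [15.5, 16.2, 17.0, 17.8, 18.0, 18.4, 19.0, 19.6, 20.3]

print(within_bandwidth(ages, cutoff=18, bandwidth=1))
# [17.0, 17.8, 18.0, 18.4, 19.0]
print(within_bandwidth(ages, cutoff=18, bandwidth=2))
# [16.2, 17.0, 17.8, 18.0, 18.4, 19.0, 19.6]
```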

The choice of bandwidth matters because it trades off precision against bias. A bandwidth that is too narrow leaves few observations on each side of the threshold, so the estimate is imprecise. A bandwidth that is too wide includes individuals far from the threshold, who may differ systematically from those near it; if the model does not capture how the outcome changes with the running variable over that wider range, the estimate will be biased.
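
The trade-off can be seen in a small simulation in which every number is an assumption: the outcome curves with the running variable (a cubic term) and the true jump at the cutoff is 1.0. A straight-line fit on each side is a good approximation close to the cutoff but a poor one over a wide range, so the widest bandwidth is pulled well away from 1.0, while the narrowest one stays close to it but rests on far fewer observations. The helper local_linear_rdd is hypothetical and is the same interacted local linear specification used earlier in this article's sketches.

```python
import numpy as np

def local_linear_rdd(x, y, cutoff, bandwidth):
    """Local linear RDD estimate of the jump in y at the cutoff."""
    centred = x - cutoff
    keep = np.abs(centred) <= bandwidth
    xc, yk = centred[keep], y[keep]
    d = (xc >= 0).astype(float)  # treatment indicator
    design = np.column_stack([np.ones_like(xc), d, xc, d * xc])
    coefs, *_ = np.linalg.lstsq(design, yk, rcond=None)
    return coefs[1]

# Simulated data (hypothetical): curved outcome, true jump of 1.0 at x = 0
rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 40_000)
y = 3 * x**3 + 1.0 * (x >= 0) + rng.normal(0, 1, x.size)

# Re-estimate the jump at several bandwidths and report the sample size used
estimates = {}
for h in (0.1, 0.5, 1.0):
    estimates[h] = local_linear_rdd(x, y, cutoff=0.0, bandwidth=h)
    n_used = int(np.sum(np.abs(x) <= h))
    print(f"bandwidth={h:.1f}  n={n_used:6d}  estimate={estimates[h]:.2f}")
```

Reporting estimates across several bandwidths in this way is a simple robustness check: if the estimate moves a lot as the bandwidth widens, the wider windows are likely picking up curvature rather than the treatment effect.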

Some Other Academic Studies Using RDD

Sojourner and Yang (2022) used RDD to investigate whether unions promote the enforcement of workers’ rights to safe and healthy workplaces. The study compared two groups of establishments: those where unions narrowly won National Labor Relations Board (NLRB) union-certification elections, and those where unions narrowly lost them. RDD was needed because a simple statistical relationship between unionization and workplace-safety enforcement does not establish causality: more dangerous workplaces may have both higher union participation and more attention from government regulators. It is therefore difficult to tell whether the union itself makes a difference or whether the workplace’s inherent danger drives the results. Isolating the effect of unions requires comparing workplaces that are similar in safety and in their propensity to unionize, and establishments where the union narrowly won or narrowly lost are expected to be similar in both respects.

Overall, the findings suggest that union certification has a beneficial effect on several aspects of Occupational Safety and Health Administration (OSHA) inspections, including higher inspection rates, a greater share of inspections conducted with a union representative present, more citations for violations, and larger penalties.

Pereira and Fernandez-Vazquez (2022) investigate whether electing women to government positions can lower corruption. They use the introduction of gender quotas in Spanish local elections as an exogenous shock to the gender composition of local governments. The quotas were introduced gradually using two population thresholds: 5,000 in 2007 and 3,000 in 2011. Because eligibility for the quota depends on population size, the setting lends itself to a regression discontinuity design in which municipalities are assigned to treatment (a gender quota) based on whether their population is above the relevant threshold. The authors estimate the causal effect of increased female representation on corruption by comparing changes in corruption rates between municipalities just above the thresholds and those just below.

The study’s findings indicate that the implementation of gender quotas resulted in an exogenous increase in the proportion of women elected to public office, which in turn contributed to a lasting reduction in corruption levels.

 

How to Use RDD in Python

In Python, several libraries can be used to perform RDD analysis. One example is the rdd package, which provides helpers for estimating an optimal bandwidth, restricting the data to that bandwidth, fitting a local linear regression around the cutoff, and binning data for plots. It can be installed using pip:

pip install rdd

Here is a simplified example of how to perform a regression discontinuity analysis using the rdd package:

import pandas as pd
import rdd

# Load the data
df = pd.read_csv('data.csv')

# Names of the running variable and outcome columns in the dataset
x = 'running_variable'
y = 'outcome'

threshold = 121

# Estimate the optimal bandwidth
bandwidth_opt = rdd.optimal_bandwidth(df[y], df[x], cut=threshold)
print("Optimal bandwidth:", bandwidth_opt)

# Limit the dataset to observations within the bandwidth around the threshold
data_new = rdd.truncated_data(df, x, bandwidth_opt, cut=threshold)

# Fit the model
model = rdd.rdd(data_new, x, y, cut=threshold)

# Print the results
print(model.fit().summary())

Note that the example above works with two columns of the dataset: an outcome variable (the variable whose jump at the cutoff you want to measure) and a running variable (the variable that determines who is treated). The third ingredient, the treatment variable (a binary indicator of whether the individual received the treatment), is fully determined by the running variable in a sharp design: it equals 1 at or above the cutoff and 0 below it. The example also assumes that the discontinuity occurs at 121; if your discontinuity occurs at a different value, specify it in the “cut” parameter.
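
If no real data is at hand, a simulated file with a known jump is a useful way to check that the pipeline above recovers what was put in. The sketch below writes a data.csv with the running_variable and outcome columns the example reads, plus a treated column for checking that assignment is sharp (the example does not use it). The bed-count range, the true effect of 2.0 and the noise level are arbitrary assumptions; nothing here reproduces the data from Bowblis and Smith (2021).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 3_000
threshold = 121
true_effect = 2.0  # hypothetical jump at the cutoff

# Running variable: integer bed counts around the 121-bed threshold
beds = rng.integers(60, 181, size=n)
# Sharp assignment: treated at or above the threshold
treated = (beds >= threshold).astype(int)
# Outcome with a smooth trend in beds plus the known jump
outcome = 5.0 + 0.02 * beds + true_effect * treated + rng.normal(0, 1, n)

df = pd.DataFrame({
    "running_variable": beds,
    "outcome": outcome,
    "treated": treated,
})
df.to_csv("data.csv", index=False)
print(df.head())
```

Running the rdd example on this file should produce a treatment coefficient in the neighbourhood of 2.0; a result far from that value points to a problem in the column names, cutoff, or bandwidth rather than in the data.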

 

References:

Bowblis, J. R., & Smith, A. C. (2021). Occupational Licensing of Social Services and Nursing Home Quality: A Regression Discontinuity Approach. ILR Review, 74(1), 199–223. https://doi.org/10.1177/0019793919858332

Pereira, M. M., & Fernandez-Vazquez, P. (2022). Does Electing Women Reduce Corruption? A Regression Discontinuity Approach. Legislative Studies Quarterly. https://doi.org/10.1111/lsq.12409

Sojourner, A., & Yang, J. (2022). Effects of Union Certification on Workplace-Safety Enforcement: Regression-Discontinuity Evidence. ILR Review, 75(2), 373–401. https://doi.org/10.1177/0019793920953089

 
