Fuzzy Regression Discontinuity Analysis in Python – An Introduction and Application in ESG Studies

In a previous article, we discussed regression discontinuity design (RDD). To distinguish it from the fuzzy regression discontinuity design (Fuzzy RDD) addressed in this article, we’ll refer to it as sharp RDD

In a sharp RDD, units (e.g., individuals, schools, districts) are assigned to treatment or control based on a cutoff score on a continuous variable. The idea is that units just above the cutoff score are very similar to units just below the cutoff score, except that they receive the treatment. Therefore, any differences in outcomes between these two groups can be attributed to the treatment.

In reality, the assignment mechanism is often not perfectly precise or deterministic. There may be measurement errors or other factors that introduce some degree of uncertainty or “fuzziness” into the assignment process. Fuzzy RDD is a statistical method used to estimate causal effects when treatment assignment is based on a continuous variable with some uncertainty or fuzziness in the assignment mechanism. It acknowledges this uncertainty by allowing for the possibility that units near the threshold score may have a non-zero probability of being assigned to either the treatment or control groups, depending on some unobserved factors. In simpler terms, it accounts for situations where some units scoring above the threshold don’t receive treatment, while some scoring below it do. 

Aktaruzzaman and Farooq (2016) investigates the impact of participation in microcredit programs on household expenditures in Bangladesh using a fuzzy RDD approach. Studying the impact of microcredit on borrowers’ expenditures is crucial for understanding its role in poverty alleviation, economic empowerment, and social welfare. The assignment to the treatment group is considered fuzzy in this study due to potential challenges in strictly adhering to the eligibility criteria set by microcredit programs. In Bangladeshi microcredit programs, there are instances where households that do not meet the specified criteria (such as owning more than 50 decimals of land) still receive credit. This deviation from the eligibility rule introduces uncertainty and “fuzziness” in the assignment process.

To address this issue, the study adopts a fuzzy RDD approach. This method allows for a more flexible treatment assignment by considering a range around the eligibility threshold rather than a strict cutoff point. By using a fuzzy RDD, the study accounts for potential violations of the eligibility criteria and captures the real-world complexities of microcredit program implementation, such as households selling land to become eligible for credit. The research provides valuable insights into the complex dynamics of how microcredit programs impact household financial behavior.

Zhuo et al. (2022) explores the impact of cross-regional environmental protection (CEP) on urban green total factor productivity (GTFP) in China’s prefecture-level cities. The implementation of CEP policies involves multiple cities or regions working together to achieve common environmental goals. A fuzzy RDD is used instead of a sharp RDD for several reasons. For example, the assignment of treatment in such cases may not be straightforward, as the impact of the policy can spill over to neighboring areas, creating a fuzzy boundary between treated and control groups. The interdependence of cities within a region can also blur the distinction between treated and control groups, making it difficult to clearly define the assignment to treatment in a cross-regional context. The study finds that CEP can bring both green and development effects, leading to reductions in energy intensity and carbon emissions while promoting innovation and industrial structure upgrading. 

The fundamental concept of fuzzy RDD involves using a probabilistic model to estimate the probability of assignment to treatment, based on the distance from the threshold score and other observable covariates. This estimated probability serves as a weight in the regression analysis of the outcome variable, allowing for a more accurate estimation of the treatment’s causal effect. By applying greater weight to observations with a higher probability of being assigned to the treatment group, and less weight to those more likely assigned to the control group, this approach adjusts for the inherent ‘fuzziness’ in the assignment process.

Fuzzy RDD analysis can be conducted in Python using the rdrobust package. Below is a general outline demonstrating how to execute fuzzy regression discontinuity in Python with the rdrobust package. Please note, this is intended as a brief illustration.

pip install rdrobust
pip install pandas
import pandas as pd
from rdrobust import rdrobust

# Illustrative data (swap out with your own dataset)
df = pd.DataFrame({
    'outcome': [5, 7, 15, 6, 9, 10, 11, 7, 6, 12, 15, 17],
    'running_variable': [1, 2, 7, 2, 4, 5, 6, 3, 4, 3, 8, 9],
    'treatment': [0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1],
    'covariate1': [10, 12, 11, 13, 14, 12, 15, 13, 14, 10, 16, 15],
    'covariate2': [20, 22, 21, 23, 24, 22, 25, 23, 21, 25, 26, 22]
})

# Define the outcome, running variable, treatment, and covariates
y = df['outcome']
x = df['running_variable']
z = df['treatment']
covariates = df[['covariate1', 'covariate2']]  

# Perform fuzzy RDD with covariates
rdrobust_results = rdrobust(y=y, x=x, fuzzy=z, covs=covariates)

# Print the results
print(rdrobust_results)

In this example, `df‘ is a DataFrame containing the following columns:

  • outcome: The outcome variable.
  • running_variable: The running variable.
  • treatment: A binary variable that equals 1 if assigned to the treatment group and 0 if assigned to the control group.
  • covariate1 and covariate2: Additional variables that could affect the outcome under investigation.

Please note that if you don’t specify a bandwidth, the package will use its default selection procedures. By default, it will automatically select optimal bandwidths based on the data, typically using methods such as cross-validation or other algorithms designed to balance bias and variance in the regression discontinuity analysis.


In conclusion, this article has provided an introductory overview of fuzzy regression discontinuity analysis in Python, highlighting its application in ESG (Environmental, Social, and Governance) studies. By leveraging the rdrobust package, researchers can effectively analyze data with fuzzy treatment assignment, allowing for more nuanced insights into the impact of interventions or policies. As ESG considerations continue to gain prominence in various fields, the integration of fuzzy RDD analysis in Python offers a valuable tool for decision-makers seeking to understand the complex relationships between interventions and outcomes within the context of sustainability and social responsibility.



References:

Aktaruzzaman, K., & Farooq, O. (2016). Impact of microcredit on borrowers’ expenditures: a fuzzy regression discontinuity design approach. Applied Economics48(38), 3685–3694. 

Zhuo, C., Xie, Y., Mao, Y., Chen, P., & Li, Y. (2022). Can cross-regional environmental protection promote urban green development: Zero-sum game or win-win choice? Energy Economics106, 105803. https://doi.org/10.1016/j.eneco.2021.105803

Like (0)