Causal Modeling with Observational Data
Synopsis
4. Introduction
Causal inference is concerned with identifying and estimating causal relationships. This is a central concern in economics and the social sciences, where policy interventions and theories must be justified by causal relations and not merely by associations. Ideally, causal effects are identified from experimental data, that is, from randomized controlled trials, in which random assignment allows an unbiased comparison between the treatment and control groups [65]. In economics and most social sciences, however, experiments are often infeasible, frequently because of their high cost. Researchers therefore often have to rely on observational data, which are collected non-experimentally, without random assignment of treatment, and which give rise to challenges such as confounding, selection bias, and reverse causality [66]. Despite these challenges, observational data remain attractive because they reflect real-world behavior, cover a wide range of populations over long periods of time, and allow the effects of events that cannot be manipulated to be estimated. This has motivated the development of more effective methods for estimating causal effects from non-experimental studies.
4.1 Why Observational Data Matters
4.1.1 Limitations of Randomized Experiments
Although randomized experiments are considered the gold standard for causal inference, they have several drawbacks that deserve attention. First, ethical constraints apply: researchers cannot randomly expose people to harmful interventions or withhold beneficial ones in domains such as health, education, and public policy [67]. Second, randomized experiments can be expensive and difficult to conduct on a large scale. Third, even a successfully conducted experiment often lacks external validity: because it is run in a controlled setting, its results may not generalize well to larger or more varied populations, institutions, or time periods [68]. In short, randomization secures a high degree of internal validity, but an experiment may not fully reflect the complexity of real-world settings.
4.1.2 Advantages of Observational Data
Observational data have several important advantages that make them particularly valuable for economic and social science research. They are usually large in scale and drawn from real-world settings, allowing researchers to analyze behavior and outcomes as they actually occur across varied contexts and populations. Observational data are also less constrained by the limited time horizons typical of most experiments. In particular, they allow researchers to study rare events and long-run effects, such as financial crises, intergenerational mobility, and early-life events, which would be impossible or extremely difficult to study through randomized experiments [69]. These features make observational data central to much economic and related social science research, despite the challenges they pose for causal inference.
4.1.3 Challenges with Observational Data
Observational data, while extremely useful, present important challenges for causal inference. The first is confounding: variables that are related to both the treatment and the outcome make it difficult to isolate the causal effect of interest. The second is selection bias, which arises when individuals or groups select themselves into treatment or control groups on the basis of characteristics that are at least partially unobservable to the researcher. Finally, measurement error and missing data, alone or in combination, can distort estimates and cast doubt on the empirical relationships found.
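The confounding problem described above can be made concrete with a small simulation. The following is a minimal sketch under stated assumptions, not a method taken from the text: the data-generating process, the variable names (z for the confounder, t for the treatment, y for the outcome), and the true effect of 2.0 are all hypothetical and chosen only for illustration. Because z raises both the probability of treatment and the outcome, the naive difference in means overstates the effect, while a comparison within levels of z comes close to the true value.

```python
import random
from statistics import mean

def simulate(n=20000, true_effect=2.0, seed=0):
    """Generate (z, t, y) rows in which z confounds the effect of t on y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0       # binary confounder
        p_treat = 0.8 if z == 1 else 0.2         # z makes treatment more likely
        t = 1 if rng.random() < p_treat else 0
        y = true_effect * t + 3.0 * z + rng.gauss(0, 1)  # z also raises y
        rows.append((z, t, y))
    return rows

def diff_in_means(rows):
    """Mean outcome of treated units minus mean outcome of control units."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(control)

def stratified_effect(rows):
    """Average the within-stratum differences, weighted by stratum size."""
    n = len(rows)
    total = 0.0
    for level in (0, 1):
        stratum = [r for r in rows if r[0] == level]
        total += diff_in_means(stratum) * len(stratum) / n
    return total

rows = simulate()
naive = diff_in_means(rows)          # biased upward by the confounder
adjusted = stratified_effect(rows)   # close to the assumed true effect of 2.0
print(f"naive estimate:    {naive:.2f}")
print(f"adjusted estimate: {adjusted:.2f}")
```

Stratification works here only because the confounder is observed and binary. With unobserved confounders, which are common in observational data, this simple adjustment is not available.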
4.2 Causal Graphs and Directed Acyclic Graphs (DAGs)
4.2.1 Introduction to Causal Graphs
A causal graph is a formal tool for depicting causal relationships among the variables in a system. Its main purpose is to give a clear picture of how the variables are assumed to influence one another causally. Variables are represented by nodes, and arrows show the direction of influence from cause to effect [71]. Causal graphs are therefore instrumental in identifying potential sources of confounding when drawing causal inferences.
4.2.2 Directed Acyclic Graphs (DAGs)
A Directed Acyclic Graph (DAG) is a special form of causal graph that represents cause-effect relationships in a system in a well-structured and logically consistent manner. "Directed" refers to the arrows, each of which runs from a cause to its effect. "Acyclic" refers to the absence of feedback cycles: no variable can cause itself, either directly or indirectly. DAGs are widely used in economics because they make causal assumptions explicit, showing how one variable affects another through various connections. For example, a DAG might show education affecting income, which in turn affects health (education → income → health). Here education has a direct effect on income and an indirect effect on health that operates through income [72].
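The education → income → health example can be written down as a small data structure. The sketch below is one possible representation, assuming a plain dictionary that maps each variable to the variables it directly causes. The function names is_acyclic and descendants are illustrative and do not come from any particular library.

```python
def is_acyclic(edges):
    """Return True if the directed graph has no cycle (depth-first search)."""
    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited, in progress, finished
    nodes = set(edges) | {c for children in edges.values() for c in children}
    color = {node: WHITE for node in nodes}

    def visit(node):
        color[node] = GRAY
        for child in edges.get(node, []):
            if color[child] == GRAY:          # back edge: a cycle exists
                return False
            if color[child] == WHITE and not visit(child):
                return False
        color[node] = BLACK
        return True

    return all(visit(node) for node in nodes if color[node] == WHITE)

def descendants(edges, start):
    """All variables that start affects, directly or indirectly."""
    seen = set()
    stack = list(edges.get(start, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(edges.get(node, []))
    return seen

# The chain from the text: education -> income -> health
dag = {"education": ["income"], "income": ["health"], "health": []}

# Adding health -> education would create a feedback cycle, so it is not a DAG
cyclic = {"education": ["income"], "income": ["health"], "health": ["education"]}

print(is_acyclic(dag))                          # True
print(is_acyclic(cyclic))                       # False
print(dag["education"])                         # direct effects of education
print(sorted(descendants(dag, "education")))    # direct and indirect effects
```

In this representation the direct effects of a variable are its listed children, and its indirect effects are the descendants that are not children. For education these are income and health, respectively.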
4.2.3 Confounders, Mediators, and Colliders
Confounders, mediators, and colliders play distinct roles in causal analysis, and identifying these roles is essential for specifying models correctly and obtaining valid causal estimates. Confounders are variables that affect both the treatment and the outcome; unless properly controlled, they create a spurious association between the two. Mediators, by contrast, lie on the causal path from treatment to outcome and transmit the effect, so controlling for them blocks part or all of the effect of interest. Colliders are variables that are influenced by two or more other variables; controlling for a collider induces an artificial correlation between its causes, a problem known as collider bias [73]. Identifying these roles with the help of causal graphs establishes which variables should be controlled and which should not.
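The consequences of controlling for a mediator or a collider can be illustrated with simulated data. The following sketch rests on assumptions that are not in the text: linear relationships, normally distributed noise, and hypothetical coefficients. Under these assumptions the total effect of the treatment on the outcome is 2.0 × 1.5 = 3.0. Controlling for the mediator is done by residualizing both variables on it, which in a linear model gives the same slope as including the mediator as a regressor. Controlling for the collider is mimicked by selecting only observations with a high collider value.

```python
import random
from statistics import mean

def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(x, y):
    """The part of y left unexplained by a simple regression on x."""
    slope = ols_slope(x, y)
    intercept = mean(y) - slope * mean(x)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def correlation(x, y):
    """Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(1)
n = 20000

# Mediator: treatment -> mediator -> outcome (no direct treatment -> outcome arrow)
t = [rng.gauss(0, 1) for _ in range(n)]
m = [2.0 * ti + rng.gauss(0, 1) for ti in t]
y = [1.5 * mi + rng.gauss(0, 1) for mi in m]
total_effect = ols_slope(t, y)                                   # near 3.0
effect_given_m = ols_slope(residuals(m, t), residuals(m, y))     # near 0.0

# Collider: a -> c <- b, where a and b are independent by construction
a = [rng.gauss(0, 1) for _ in range(n)]
b = [rng.gauss(0, 1) for _ in range(n)]
c = [ai + bi + rng.gauss(0, 0.5) for ai, bi in zip(a, b)]
corr_all = correlation(a, b)                                     # near 0.0
selected = [(ai, bi) for ai, bi, ci in zip(a, b, c) if ci > 1.0]
corr_selected = correlation([s[0] for s in selected],
                            [s[1] for s in selected])            # clearly negative

print(f"total effect of t on y:         {total_effect:.2f}")
print(f"effect after controlling for m: {effect_given_m:.2f}")
print(f"corr(a, b), full sample:        {corr_all:.2f}")
print(f"corr(a, b), given c > 1:        {corr_selected:.2f}")
```

Both results are the opposite of what happens with a confounder: here controlling for the extra variable moves the estimate away from the effect of interest instead of toward it.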
4.2.4 Using DAGs for Causal Reasoning
DAGs are powerful tools for causal reasoning because they lay out the structure of causal relationships explicitly and transparently. By mapping out variables and the directed paths between them, DAGs help researchers identify the causal paths through which a treatment affects an outcome, as well as the non-causal paths that may introduce bias. They also make visible potential sources of spurious correlation, such as confounding or collider bias, which may not be obvious from the data alone [74]. With this graphical approach, a researcher can determine appropriate adjustment strategies, state the underlying assumptions more clearly, and strengthen the validity of causal interpretations drawn from observational data. The key elements of causal graphs and DAGs are summarized in Table 4.1.
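The path-based reasoning described above can be sketched in code. The example below is illustrative only. It assumes a hypothetical DAG in which family_background confounds the effect of education on income and health is a collider caused by both. The education → health arrow and the family_background variable are assumptions added for the illustration and are not part of the chapter's example. The helper names are likewise hypothetical. Each path between treatment and outcome is checked with the standard blocking rules: a non-collider on the path blocks it when adjusted for, and a collider blocks it unless the collider or one of its descendants is adjusted for.

```python
def all_paths(edges, start, end):
    """Enumerate simple paths between start and end, ignoring arrow direction."""
    neighbors = {}
    for parent, child in edges:
        neighbors.setdefault(parent, set()).add(child)
        neighbors.setdefault(child, set()).add(parent)
    paths = []

    def walk(path):
        if path[-1] == end:
            paths.append(path)
            return
        for nxt in sorted(neighbors.get(path[-1], ())):
            if nxt not in path:
                walk(path + [nxt])

    walk([start])
    return paths

def descendants(edges, node):
    """All nodes reachable from node by following arrows forward."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent, child in edges:
            if parent == current and child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def is_causal(path, edges):
    """True if every arrow on the path points from treatment toward outcome."""
    return all((a, b) in edges for a, b in zip(path, path[1:]))

def is_blocked(path, edges, adjusted):
    """Apply the blocking rules to each interior node of the path."""
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, node) in edges and (nxt, node) in edges
        if collider:
            if node not in adjusted and not (descendants(edges, node) & adjusted):
                return True      # an unadjusted collider blocks the path
        elif node in adjusted:
            return True          # an adjusted non-collider blocks the path
    return False

def open_noncausal_paths(edges, treatment, outcome, adjusted):
    """Non-causal paths that remain open, and so can bias the estimate."""
    adjusted = set(adjusted)
    return [p for p in all_paths(edges, treatment, outcome)
            if not is_causal(p, edges) and not is_blocked(p, edges, adjusted)]

# Hypothetical DAG, stored as a set of (cause, effect) pairs
edges = {
    ("family_background", "education"),
    ("family_background", "income"),
    ("education", "income"),
    ("education", "health"),
    ("income", "health"),
}

for adjustment in [set(), {"family_background"}, {"family_background", "health"}]:
    biasing = open_noncausal_paths(edges, "education", "income", adjustment)
    print(sorted(adjustment), "->", biasing)
```

With no adjustment, the path through family_background is open. Adjusting for family_background alone closes every non-causal path. Adding health to the adjustment set reopens the path through the collider, which is why not every available variable should be controlled.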








