Causal Inference
Introduction to Artificial Intelligence — Chapter 2, Logical Reasoning (Part 4): Causal Reasoning
…no directed path leads back to that node. A directed acyclic graph (DAG) captures the dependency relations among all nodes in the graph.
A DAG can be used to describe the mechanism by which data are generated.
A model that describes the joint distribution of variables, or the data-generating mechanism, in this way is called a "Bayesian network".
Definition of a Structural Causal Model
Definition 2.15
Structural causal model: a structural causal model consists of two sets of variables, U and V, together with a set of functions F, written as the triple ⟨U, V, F⟩. Here U contains the exogenous variables, V contains the endogenous variables, and each function in F determines the value of one endogenous variable from the values of other variables in the model.
Simpson's paradox shows that, in some situations, ignoring a latent "third variable" (in this example, sex is a third variable beyond drug use and recovery rate) can reverse an established conclusion, often without our being aware of it.
Reasoning from observed outcomes back to the causes that produced them, taking the data-generating process into account, is causal inference.
Causal Inference: Simpson's Paradox
Among the six largest departments, women had a higher admission rate than men in four.
Grouped this way, women were in fact admitted at a slightly higher rate than men.
Women tended to apply to highly competitive departments (such as English),
while men tended to apply to departments that were relatively easy to get into (such as engineering).
Bickel, P. J., Hammel, E. A., & O'Connell, J. W. (1975). "Sex bias in graduate admissions: Data from Berkeley." Science, 187(4175), 398–404.
[Table residue: only scattered recovery figures (87, 263, 87, 69, 93, 73) survive from the extracted tables.]
Table 2.4.2: recovery outcomes of the same group of patients, grouped by sex, with and without the new drug.
Table 2.4.1 lists the recovery outcomes of a group of patients with and without the new drug: the recovery rate of patients who did not take the drug is higher than that of patients who did.
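The reversal described above can be reproduced in a few lines of Python. The counts below are hypothetical, textbook-style figures (not the values from Table 2.4.1, which did not survive extraction): the drug looks better within each sex, yet worse in the pooled data.

```python
# Illustrative counts (hypothetical): within each sex the drug raises the
# recovery rate, yet pooled over sexes the ordering reverses -- Simpson's paradox.
groups = {
    # (sex, took_drug): (recovered, total)
    ("male", True): (81, 87),
    ("male", False): (234, 270),
    ("female", True): (192, 263),
    ("female", False): (55, 80),
}

def rate(recovered, total):
    return recovered / total

for sex in ("male", "female"):
    drug = rate(*groups[(sex, True)])
    no_drug = rate(*groups[(sex, False)])
    print(sex, drug > no_drug)        # True for both sexes: drug looks better

# Pool the counts over sex.
pooled = {}
for (sex, took), (rec, tot) in groups.items():
    r, t = pooled.get(took, (0, 0))
    pooled[took] = (r + rec, t + tot)

print(rate(*pooled[True]) < rate(*pooled[False]))  # True: pooled, drug looks worse
```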
Common Argumentation Methods, with Examples
In logic and debate, an argument is a set of reasons and evidence put forward to support a claim or position.
Arguments can be constructed in many ways; several common methods are introduced below, each with an example.
1. Deductive reasoning: deductive reasoning starts from general principles or premises and derives a specific conclusion by logical inference.
Three forms of deductive inference are commonly used: hypothetical syllogism, categorical syllogism, and disjunctive syllogism.
- Hypothetical syllogism: If A, then B.
If B, then C.
Therefore, if A, then C.
For example: If it rains today, the streets will be slippery.
If the streets are slippery, driving will be dangerous.
Therefore, if it rains today, driving will be dangerous.
- Categorical syllogism: All A are B.
Some C are A.
Therefore, some C are B.
For example: All dogs are animals.
Some pets are dogs.
Therefore, some pets are animals.
- Disjunctive syllogism: Either A or B.
Not A.
Therefore, B.
For example: Either it rains today or it is sunny today.
It is not raining today.
Therefore, it is sunny today.
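A minimal sketch checking that the first and third patterns above are propositional tautologies, by brute force over all truth assignments (the categorical syllogism involves quantifiers, so it is omitted here):

```python
# Brute-force semantic check of two deductive patterns over all truth values.
from itertools import product

def implies(p, q):
    return (not p) or q

# Hypothetical syllogism: ((A -> B) and (B -> C)) -> (A -> C) holds always.
hyp = all(implies(implies(a, b) and implies(b, c), implies(a, c))
          for a, b, c in product([False, True], repeat=3))

# Disjunctive syllogism: ((A or B) and not A) -> B holds always.
disj = all(implies((a or b) and not a, b)
           for a, b in product([False, True], repeat=2))

print(hyp, disj)
```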
2. Inductive reasoning: inductive reasoning starts from specific facts or instances and derives a general conclusion.
Common forms of inductive inference include analogy, causal inference, and statistical inference.
- Analogy: situations A and B are similar in some respects.
Situation A has a certain property.
Therefore, situation B may also have that property.
For example: In past football matches, Xiao Ming has always performed well and shown a strong ability to score.
He is now playing in a new match, so we can expect him to perform well in it too.
- Causal inference: events A and B are related in time or space.
After event A occurs, event B also occurs.
Therefore, event A may be a cause of event B.
For example: In an experiment, a group of students' exam scores rose markedly after they were given a tutoring course.
We may therefore infer that the tutoring course had a positive effect on their scores.
Causal Inference for Intelligent Operations and Maintenance
Intelligent Maintenance Causal Inference (IMCI), as described here, is a deep-learning-based approach to intelligent operations and maintenance (O&M) that exploits large volumes of historical data and O&M metrics to improve equipment performance and O&M efficiency.
The main task of IMCI is to use causal inference theory to extract useful information from historical data, predict likely future conditions, and support O&M decisions.
In IMCI, the goal is to use the information in historical data to predict what may happen next.
Machine learning algorithms are used to build predictive models that look for "causes" in the data and the best causal explanation of it; the source calls this perturbation-based causal calculus.
The key steps in IMCI include data preprocessing, constructing a causal graph, building a causal model from that graph, model training, model prediction, and finally visualizing the results.
Notably, while using the causal graph for prediction, subjective (Bayesian) probability is also used to refine the predictions.
This approach is intended to address the "confounding" problem of purely statistical methodology.
The core algorithmic architecture of IMCI is built on deep-learning frameworks, including deep neural networks and deep reinforcement learning.
In practice, the models are tuned to improve predictive accuracy and generalization.
The source also mentions causal-inference-based diagnosis of merchant operations, namely a hybrid causal-discovery technique (HCM) and causality-based deep attribution, as ways to obtain better predictions.
IMCI has broad applications in operations and maintenance, such as equipment failure prediction, performance optimization, and predictive maintenance.
It can also support adaptive maintenance: monitoring equipment state in real time and adjusting the maintenance plan according to performance metrics, thereby improving O&M efficiency.
In summary, IMCI is a deep-learning approach that uses historical data and O&M metrics to improve equipment performance and O&M efficiency.
Mendelian Randomization: Glossary
Mendelian randomization
• Definition: Mendelian randomization is a method that uses genetic variation as a naturally randomized experimental design to assess causal relationships.
• Example: researchers use Mendelian randomization to study the causal relationship between drinking habits and heart disease.
They use genetic variants that affect alcohol metabolism as the natural randomization, divide the population into groups with different drinking habits, and compare the risk of heart disease between the groups to judge whether drinking affects the onset of heart disease.
Genetic variation
• Definition: genetic variation refers to differences in genes or DNA sequences between individuals or populations; it is one manifestation of genetic diversity.
• Example: humans have blood types of different genotypes, such as A, B, AB, and O.
These different blood types arise from genetic variation.
Natural randomization
• Definition: natural randomization refers to random group assignment produced by natural rather than human factors; it is often used in study designs other than randomized controlled trials.
• Example: by observing regions where a large epidemic has broken out, researchers can use differences in who becomes infected for their study.
Because infection occurs without human intervention, these differences can be treated as natural randomization and used to assess the relationship between a factor and the disease.
Causal relationship
• Definition: a causal relationship is one in which one event or factor brings about the occurrence of, or a change in, another event or factor.
• Example: research shows a causal relationship between smoking and lung cancer.
Smoking is one of the main causes of lung cancer; many studies find that smokers' risk of lung cancer is far higher than non-smokers'.
Evaluation
• Definition: evaluation is the full or partial review, judgment, and examination of an object, event, or process to obtain information about its performance, effect, and value.
• Example: researchers evaluate the relationship between exercise and cardiovascular disease.
They collect data on participants' exercise habits and blood biochemistry, and use statistical analysis to evaluate the effect of exercise on cardiovascular health.
Experimental design
• Definition: experimental design is the set of experimental plans and steps drawn up to answer a specific question, so that statistical inference can be made.
Epidemiology — 2: Causes of Disease and Causal Inference
(2) Models of causation
1. Ecological model (ecological model)
A factor without which a given disease cannot occur.
2. Contributory cause: a factor whose presence increases the probability that a given disease occurs, although the disease does not require that factor in order to occur.
Advantages of the epidemiological concept of cause:
(3) Modes of causal association
II. Methods of causal research
Epidemiology is an indispensable method for studying the causes of disease; it plays a unique role both in providing etiological clues and in verifying causes.
A reasonable sequence for causal research:
descriptive studies → analytic studies → experimental studies
Causes of Disease and Causal Inference
(cause of disease & causal inference)
Department of Epidemiology, School of Public Health, Tianjin Medical University — Qi Xiuying
Main contents:
I. The concept of cause  II. Methods of causal research  III. Causal inference
I. The concept of cause
(1) The epidemiological cause
Factors that raise the probability of disease in a population can be regarded as causes; when one or more of these factors is absent, the frequency of disease in the population falls.
Epidemiology (main text), 6th edition, Chapter 10: Causes of Disease and Causal Inference
Chapter 10, Causes of Disease and Causal Inference. Interpreting research results involves the problem of causal inference.
The study of causes bears not only on diagnosis but directly on the treatment and prevention of disease.
Basic medicine, clinical medicine, and preventive medicine therefore all attach great importance to the study of causes.
In epidemiological research, the cause of disease and causal inference (the latter also covering treatment–effect relationships) in effect form the guiding framework and evaluation criteria of analytic and experimental epidemiology; they are essential for forming correct causal thinking and for understanding research results accurately.
Section 1: The Concept of Cause. I. The definition of cause. (1) From a deterministic to a probabilistic view of causation. The traditional deterministic view holds that a given cause necessarily produces a given effect.
In fact, conclusions drawn from empirical evidence can only be inductive, and inductive conclusions can only be probabilistic.
More importantly, the development and change of the objective world is itself probabilistic.
Modern science has produced a probabilistic view of causation, or generalized causal law: a cause is an event or characteristic that raises the probability of the effect; a given cause only possibly, not necessarily, produces a given effect.
(2) The modern epidemiological definition of cause. Early epidemiology focused on epidemics of infectious disease, and attention to causes centered on environmental sanitation such as air, water, and housing; for example, Snow linked the Broad Street cholera outbreak in London to contaminated drinking water.
After the rise of bacteriology, attention shifted to specific pathogens; for example, Koch discovered the pathogens of tuberculosis and cholera.
From the 1950s onward, modern epidemiology gradually extended to non-communicable diseases, so "cause" is no longer limited to the pathogens of infectious disease.
Lilienfeld (1980) defined cause from an epidemiological standpoint: factors that raise the probability of disease in a population can be regarded as causes; when one or more of these factors is absent, the frequency of disease in the population falls.
The modern epidemiological view of causation is therefore consistent with the probabilistic view.
A cause in epidemiology is usually called a risk factor, meaning a factor that raises the probability of disease; here risk means the probability of an adverse event.
(3) The causal definition of a preventive or therapeutic effect. Trials of prevention and treatment are also causal studies.
An experiment for studying causal relations is one in which, under controlled conditions, the investigator deliberately changes one or more factors (the treatment) and prospectively determines their effects.
Interaction and Causal Inference
Interaction effects are a fundamental concept in causal inference. They represent the effect of one variable on the relationship between two other variables. For example, the effect of age on the relationship between education and income is an interaction effect.
Interaction effects can be difficult to identify and estimate. One reason is that they can be confounded by other variables. For example, the effect of age on the relationship between education and income could be confounded by the fact that older people are more likely to have higher levels of education and income.
Another reason is that interaction effects can be nonlinear. For example, the effect of age on the relationship between education and income could be linear for younger people but nonlinear for older people.
Despite these challenges, interaction effects are an important part of causal inference. They can provide insight into the relationships between variables and help us make better decisions.
Some tips for identifying and estimating interaction effects:
- Use a causal diagram to identify potential interaction effects.
- Use statistical tests to test for interaction effects.
- Use graphical methods to visualize interaction effects.
- Consider using a nonparametric approach to estimate interaction effects.
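An interaction effect of this kind is what the coefficient on a product term estimates in a linear model. A minimal sketch on simulated data (all coefficient values are assumptions of the simulation, not estimates from any real study):

```python
import numpy as np

# Fit y = b0 + b1*x1 + b2*x2 + b3*(x1*x2) by least squares;
# b3 estimates the interaction: how x1 modifies the x2 -> y relationship.
rng = np.random.default_rng(0)
n = 1000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 0.5 * x2 + 1.5 * x1 * x2 + rng.normal(scale=0.1, size=n)

X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # approximately [1.0, 2.0, 0.5, 1.5]
```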
Super Learner and Causal Inference
1. Background. The Super Learner is a powerful statistical learning algorithm that combines multiple machine learning methods to improve the accuracy and stability of a predictive model.
Causal inference means inferring causal, not merely correlational, relationships between variables from observed data.
This section discusses the application of the Super Learner to causal inference.
2. Characteristics of the Super Learner. The Super Learner is an ensemble learning algorithm that combines different machine learning algorithms to obtain more accurate and stable predictions.
Its characteristics include:
- it can combine machine learning algorithms of many different types, including linear regression, decision trees, and support vector machines;
- it selects the best combination of algorithms using cross-validation and related methods;
- it improves generalization, achieving good predictive performance across different data sets.
3. Challenges in causal inference. Causal inference is important but difficult, and faces several challenges:
- the difficulty of randomized experiments: in practice it is rarely possible to randomize all variables, so other routes to causal inference are needed;
- latent confounding variables: variables may be connected by unobserved confounders, which makes direct causal inference difficult;
- nonlinear relationships: in many real problems the relationships between variables are not simply linear, so special methods are required.
4. Applying the Super Learner to causal inference. As an ensemble method, the Super Learner offers several advantages:
- it can exploit many different types of machine learning algorithms and thus better capture complex relationships between variables;
- it can choose the best predictive model via cross-validation, improving the accuracy and stability of the inference;
- it can handle nonlinear relationships and latent confounders.
5. A case study. Suppose we wish to study the causal relationship between smoking and lung cancer but, for ethical and practical reasons, cannot run a randomized experiment.
We can use the Super Learner to build predictive models and, from them, estimate the effect of smoking on lung cancer.
By combining multiple machine learning methods, we can capture the complex relationship between smoking and lung cancer more accurately and so carry out more effective causal inference.
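The stacking idea behind the Super Learner can be sketched with two hand-rolled base learners and a least-squares meta-learner. The data are simulated, and a real application would use a library (for example scikit-learn) with full cross-validation rather than a single held-out fold:

```python
import numpy as np

# Minimal stacking ("super learner") sketch: each base learner's predictions
# on a held-out fold become features for a meta-learner fit by least squares.
rng = np.random.default_rng(1)
n = 400
x = rng.uniform(-2, 2, size=n)
y = np.sin(x) + 0.1 * rng.normal(size=n)

half = n // 2
x_tr, y_tr, x_val, y_val = x[:half], y[:half], x[half:], y[half:]

def fit_poly(xs, ys, degree):
    # Least-squares polynomial fit, returned as a prediction function.
    A = np.column_stack([xs**k for k in range(degree + 1)])
    b, *_ = np.linalg.lstsq(A, ys, rcond=None)
    return lambda z: sum(b[k] * z**k for k in range(degree + 1))

base = [fit_poly(x_tr, y_tr, 1), fit_poly(x_tr, y_tr, 3)]  # linear + cubic

# Meta-features: each base learner's predictions on the held-out fold.
Z = np.column_stack([m(x_val) for m in base])
w, *_ = np.linalg.lstsq(Z, y_val, rcond=None)  # meta-learner weights

def super_learner(z):
    return np.column_stack([m(z) for m in base]) @ w

mse = np.mean((super_learner(x_val) - y_val) ** 2)
print(mse)
```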
Causal Inference and Latent Variable Models
Causal inference is an important concept in research, used to investigate how strongly one factor affects another.
A latent variable model is a mathematical and statistical method for analyzing the relationship between a phenomenon of interest and the underlying mechanisms behind it.
This section discusses the relationship between causal inference and latent variable models and introduces their applications in research.
I. Basic principles of causal inference. Causal inference infers the effect of one factor on another by comparing differences between a control group and a treatment group.
Researchers must rule out interfering factors unrelated to the causal relationship, to ensure that the observed differences are produced by the factor under study.
1. Randomized controlled trials. The randomized controlled trial is one of the most widely used methods of causal inference.
Participants are randomly assigned to the control and treatment groups, which removes the influence of individual differences and other interfering factors.
By comparing the two groups, researchers can infer how strongly the factor affects the outcome.
2. Natural experiments. A natural experiment exploits conditions that already exist in the world.
Because natural conditions cannot be controlled directly, researchers usually analyze existing data.
Natural experiments provide real-world data, but they are also vulnerable to additional interfering factors.
II. Latent variable models: concept and principles. A latent variable model is a mathematical and statistical model for explaining the phenomena behind the data.
A latent variable is one that cannot be observed directly but whose existence and influence can be inferred from observable indicators related to it.
1. Structural equation models. The structural equation model is a common form of latent variable model, used to study the relationships among several latent variables.
It is based on the relations between observed variables and latent variables, and infers the relations and influences among the latent variables by fitting the model.
2. Factor analysis. Factor analysis is a widely used latent variable method for analyzing the relations between several observed variables and one or more latent variables.
It identifies the common latent factors that influence the observed variables and quantifies the relations among them.
III. The relationship between causal inference and latent variable models. In research the two are usually complementary.
Causal inference focuses on establishing causal relationships, while latent variable models are used to model and explain the mechanisms behind those latent causal relationships.
Latent variable models can help researchers with difficulties in causal inference such as latent interfering factors, unobserved variables, and complex patterns of association.
Philosophical Problems of Causal Inference
Causal inference is a fundamental task in many scientific disciplines, including epidemiology, economics, and psychology. However, a number of philosophical problems arise when trying to make causal inferences from observational data.
One problem is confounding. Confounding occurs when a third variable is associated with both the exposure and the outcome and can bias the estimate of the causal effect. For example, to study the effect of smoking on lung cancer we must take into account that smokers are also more likely to be exposed to other risk factors for lung cancer, such as air pollution. If we do not control for these other risk factors, we may overestimate the effect of smoking on lung cancer.
Another problem is selection bias. Selection bias occurs when the participants in a study are not representative of the population we want to generalize to. For example, to study the effect of a new drug on heart disease we must make sure that the participants are representative of people with heart disease. If the participants are all young and healthy, we may underestimate the effect of the drug on heart disease.
Finally, there is reverse causality, in which the outcome of interest actually causes the exposure. For example, to study the effect of poverty on crime we must take into account that crime can also lead to poverty. If we do not account for this reverse causality, we may misestimate the effect of poverty on crime.
These are some of the philosophical problems that arise when making causal inferences from observational data. It is important to be aware of them when conducting causal inference studies, and to take steps to minimize their impact.
Causal Relationships in Multimodal Data Sets
With the development of computing and artificial intelligence, multimodal data sets are used ever more widely.
Multimodal data are data from different sensors or sources, such as images, audio, and text.
Causal relationships may exist among these data, and finding them matters for tasks such as data analysis and prediction.
Causal relationships in multimodal data sets can be analyzed through causal inference, a probability-based statistical methodology for judging causal relations.
In multimodal data sets, causal inference can help us study the relations between different types of data, for example between image and text data.
In causal inference, we use a causal graph to represent the causal relations among variables.
A causal graph is a graphical model that expresses the dependency and causal relations between variables.
In a multimodal data set, a causal graph can represent the causal relations between different data types, such as between image and text data.
Besides the causal graph, causal inference also needs a causal model to encode the causal relations.
A causal model is a probability-based model for computing the probabilities implied by causal relations.
In a multimodal data set, a causal model can be used to predict the relationship between image and text data and to analyze and forecast accordingly.
In short, the causal relationships in multimodal data sets matter for analysis and prediction.
Through causal inference and causal models, we can find the causal relations between different data types and analyze and predict the data accurately.
Causal Inference Test (CIT) Code
It is not clear whether you are looking for code for the CIT causal-inference test in general, or specifically for a Python implementation. Both are covered below.
**1. The CIT causal-inference test**
CIT (Causal Inference Test) is a statistical method for causal inference that tests a causal hypothesis to determine whether a causal relationship exists.
The test can follow the four conditions used in causal-inference tests for differentially methylated sites:
1. SNP is significantly associated with Phenotype;
2. SNP is significantly associated with Methy (after adjusting for Phenotype);
3. Methy is significantly associated with Phenotype (after adjusting for SNP);
4. SNP is independent of Phenotype (after adjusting for Methy).
If all four conditions hold, the chain SNP → Methy → Phenotype can be considered statistically supported by the data at hand.
**2. A Python sketch of the CIT idea**
The original snippet here imported a `ConstraintBasedSearch` class from `causallearn` that does not match that library's actual API, so below is a self-contained sketch of the four checks using plain least-squares regressions, with correlation thresholds standing in for formal significance tests. The data are simulated; variable names follow the four conditions above.
```python
import numpy as np

# Simulate a mediation chain SNP -> Methy -> Phenotype.
rng = np.random.default_rng(2)
n = 2000
snp = rng.integers(0, 3, size=n).astype(float)   # genotype coded 0/1/2
methy = 0.8 * snp + rng.normal(size=n)           # SNP -> Methy
pheno = 0.8 * methy + rng.normal(size=n)         # Methy -> Phenotype

def resid(a, b):
    """Residual of a after regressing out b (with intercept)."""
    A = np.column_stack([np.ones_like(b), b])
    coef, *_ = np.linalg.lstsq(A, a, rcond=None)
    return a - A @ coef

def corr(a, b):
    return abs(np.corrcoef(a, b)[0, 1])

c1 = corr(snp, pheno) > 0.1                               # condition 1
c2 = corr(resid(snp, pheno), resid(methy, pheno)) > 0.1   # condition 2
c3 = corr(resid(methy, snp), resid(pheno, snp)) > 0.1     # condition 3
c4 = corr(resid(snp, methy), resid(pheno, methy)) < 0.08  # condition 4
print(c1, c2, c3, c4)  # all four True supports SNP -> Methy -> Phenotype
```
For real analyses, the published CIT procedure uses formal F-tests rather than these correlation thresholds.
Causal Inference and Counterfactual Learning in AI Development
Artificial Intelligence (AI) is the field of computer science that aims to make machines capable of simulating and carrying out tasks requiring human intelligence.
As the technology advances, AI development has increasingly moved toward causal inference and counterfactual learning.
Traditional AI development was built mainly on rule-based methods, relying on hand-written rules and inference engines for reasoning.
That approach requires expert knowledge and large amounts of manually written rules, and struggles with the analysis of complex problems.
The emergence of causal inference and counterfactual learning has brought new ideas and methods to AI.
Causal inference means inferring, from observed phenomena and an analysis of causal relations, the causes that made a phenomenon occur.
Applied to causal inference, AI can help us understand and explain the causal links between phenomena, and thus better predict and intervene in events.
For example, in drug research, AI can analyze large amounts of data to infer a drug's effect on a disease and predict possible side effects.
Counterfactual learning (in ranking, "counterfactual learning to rank") focuses instead on what would happen under hypothetical conditions contrary to fact.
By drawing counterfactual inferences from existing data, AI can learn about outcomes and possibilities beyond what actually happened, enabling a fuller and deeper analysis.
For example, in finance, AI can simulate different investment decisions and predict possible returns, risks, and other outcomes.
The core of causal inference and counterfactual learning lies in analyzing and modeling data and exploring the relations among variables.
In AI development, data are a crucial resource.
Large amounts of historical data supply valuable information that helps algorithms understand and analyze a problem.
For causal inference and counterfactual learning, the selection and processing of data are especially important: the data must be accurate and representative, to reduce errors caused by data bias.
In addition, the development of AI must attend to ethics and privacy protection.
Causal inference and counterfactual learning often involve the processing of personal and sensitive information.
CELF++: How It Works
CELF++ (Causal Effect Learned from Inference), as described here, is a statistical method for causal inference that aims to infer causal relationships from observational data. Its rationale rests on the assumption of a latent causal graph representing the causal relations among variables.
The steps of the CELF++ method are as follows:
2. Variable selection: select candidate causal variables from the data set, those related to the target variable.
3. Modeling: use a machine learning model (such as linear regression or a decision tree) to build a predictive model, with the causal variables as features and the target variable as output.
4. Causal inference: predict the target variable with the model and draw causal inferences from the predictions. CELF++ uses techniques such as propensity score matching and instrumental variable regression to mitigate potential bias in observational data.
5. Evaluation: assess the accuracy and reliability of the causal inference, typically via cross-validation and related model-evaluation techniques.
An advantage of CELF++ is that it can infer causal relations from observational data without running experiments. It also has limitations, such as sensitivity to data quality and model assumptions, and the restrictiveness of its causal assumptions.
In sum, CELF++ is a statistical method based on a latent-causal-graph assumption that infers causal relations from observational data through modeling and inference.
Six or More Categories of CCER Methodology
CCER here refers to causal inference in statistics and econometrics research.
CCER method categories include, but are not limited to, the following six:
1. Experimental design: determine causal relationships by randomly assigning treatment and control groups while controlling other variables.
2. Natural experiments: identify causal relationships from the influence of external factors such as natural events.
3. Difference methods: determine causal relationships by comparing data from two or more points in time.
4. Propensity score matching: match treated and control units on sample characteristics to reduce interference from confounding factors.
5. Instrumental variables: estimate causal effects using instruments that are related to the factor of interest but do not directly affect the outcome.
6. Regression discontinuity designs: determine causal relationships from the occurrence of a triggering threshold event.
Beyond these, CCER method categories also include difference-in-differences, panel data models, and others.
These methods help researchers analyze data more accurately and rule out confounding, so as to reach more precise causal conclusions.
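Method 4 above, propensity score matching, can be sketched in a few lines. Everything here is simulated, and the logistic fit is a hand-rolled Newton iteration; a real analysis would use an established library and check covariate balance after matching:

```python
import numpy as np

# Nearest-neighbour propensity-score matching: estimate e(x) = P(T=1|x)
# with a logistic fit, match each treated unit to the control with the
# closest score, and average the outcome differences (ATT estimate).
rng = np.random.default_rng(3)
n = 2000
x = rng.normal(size=n)                         # single confounder
p = 1 / (1 + np.exp(-x))                       # true propensity
t = (rng.uniform(size=n) < p).astype(int)      # treatment assignment
y = 2.0 * t + 1.5 * x + rng.normal(size=n)     # true treatment effect = 2.0

# Logistic regression by a few Newton (IRLS) steps, no external libraries.
X = np.column_stack([np.ones(n), x])
b = np.zeros(2)
for _ in range(25):
    e = 1 / (1 + np.exp(-X @ b))
    W = e * (1 - e)
    b += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (t - e))
score = 1 / (1 + np.exp(-X @ b))

treated = np.where(t == 1)[0]
controls = np.where(t == 0)[0]
# For each treated unit, the control with the nearest propensity score.
nearest = controls[np.abs(score[controls][None, :]
                          - score[treated][:, None]).argmin(axis=1)]
att = np.mean(y[treated] - y[nearest])
print(att)  # close to the true effect 2.0
```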
Causal Inference Methods Based on Genetic Variation
Genetic variation usually refers to the genetic differences between individuals present in the genome.
In the context of causal inference, the aim is to infer a causal relationship from observational data.
Combining the two, causal inference methods based on genetic variation typically use genetic variants to infer the causal relationship between a gene (or exposure) and a particular trait.
Some common methods follow:
1. Two-sample Mendelian randomization:
• Principle: genetic variants serve as a "random assignment" of the exposure (the factor influencing the trait); comparing individuals of different genotypes then reveals the causal effect of the exposure on the trait.
• Steps: choose a variant that is related to the exposure but unrelated to other potential confounders; then compare the trait across genotypes, which removes the influence of the confounders.
2. Instrumental variable analysis:
• Principle: a genetic variant serves as an instrumental variable, a randomizing tool for the exposure, thereby reducing confounding.
• Steps: choose a variant that is related to the exposure but unrelated to the other potential influences on the trait; then use this instrument for causal inference.
3. Topological data analysis:
• Principle: exploit the topological structure linking genetic variants and traits, inferring causal relations by jointly considering the influence of many genes.
• Steps: represent the variant–trait relations as a topological graph and identify possible causal paths by analyzing its structure.
These methods provide tools and frameworks for causal inference based on genetic variation, but in practice instruments must be chosen carefully and appropriate statistical corrections applied to ensure that the inference is reliable.
Moreover, the use of genetic variation may be limited by the assumptions required in particular settings, so researchers need a deep understanding of the problem's background and the data's characteristics.
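The computation shared by these methods is instrumental-variable estimation. A hedged sketch on simulated data, in which the Wald ratio using genotype G as the instrument recovers a true effect that naive regression misses (all effect sizes are assumptions of the simulation):

```python
import numpy as np

# IV estimation with a genetic variant G as the instrument (the core of
# Mendelian randomization): the Wald ratio cov(G, Y) / cov(G, X) recovers
# the causal effect of exposure X on outcome Y despite the confounder U.
rng = np.random.default_rng(4)
n = 50_000
g = rng.integers(0, 3, size=n).astype(float)   # genotype: the instrument
u = rng.normal(size=n)                         # unobserved confounder
x = 0.5 * g + u + rng.normal(size=n)           # exposure
y = 0.7 * x + 2.0 * u + rng.normal(size=n)     # true causal effect = 0.7

naive = np.cov(x, y)[0, 1] / np.var(x)          # confounded OLS slope
wald = np.cov(g, y)[0, 1] / np.cov(g, x)[0, 1]  # IV (Wald ratio) estimate
print(naive, wald)  # naive is biased upward; wald is close to 0.7
```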
A Survey of Causal Reasoning — Notes on "A Survey on Causal Inference"
This section summarizes and organizes the paper "A Survey on Causal Inference".
Introduction: association versus causation. Which came first, the chicken or the egg?
The subject here is causal relationships, which differ from ordinary associations.
A causal relationship between two variables cannot be reasonably inferred merely from an observed association between them.
For two associated events A and B, the possible relations are: A causes B; B causes A; A and B are effects of a common cause without causing each other; or something else.
A simple example illustrates the difference between association and causation: as ice cream sales rise, the rate of drowning deaths rises sharply.
Judging by association alone, brisk ice cream sales would cause more drownings.
That conclusion is obviously absurd. Common sense says drownings increase because temperatures rise (far more people swim), and ice cream sales also rise because of the hot weather; here temperature is the common cause of both ice cream sales and drowning counts, and the two have no direct causal link.
In fact, correlation is symmetric (a double-headed arrow), while causation is asymmetric (a single-headed arrow).
In "The Book of Why: The New Science of Cause and Effect", Pearl divides causal relations into three levels (the "ladder of causation").
From bottom to top these are association, intervention, and counterfactual reasoning.
The bottom rung is association: what deep learning, as usually understood, does — finding associations between variables in observed data.
Association cannot tell us the direction of influence, only that two things co-occur; we may know that when event A happens, event B happens too, but not whether A's occurrence caused B's.
The second rung is intervention: we want to know whether, when we change event A, event B changes with it.
The top rung is counterfactuals, which can be read as "reasoning from effects back to causes": we want to know whether, to bring about some change in event B, we could achieve it by changing event A.
Time-to-Event Causal Inference Models
Time-to-event causal inference models are statistical and epidemiological models for analyzing time-to-event data, which record when an event of interest (such as disease onset or death) occurs.
Causal inference is an important area of statistics that aims to assess the effect of an exposure (such as a drug, a treatment, or a lifestyle factor) on an outcome (such as health status or disease) while controlling for confounding factors.
In time-to-event causal inference models we focus on the temporal relation between exposure and outcome and try to determine whether that relation is causal.
Such models help researchers assess the effect of an exposure and reach relatively reliable conclusions even in the presence of confounders.
Common time-to-event models include:
1. The Cox proportional hazards model: one of the most widely used models for time-to-event data.
It lets researchers assess the effect of several exposure variables on the risk of the event while accounting for each individual's survival function.
The Cox model assumes proportional hazards: the ratio of the hazard rates of any two individuals is constant over time, although the baseline hazard itself may vary with time.
2. The accelerated failure time (AFT) model: rather than modeling the hazard rate directly, the AFT model assumes that covariates accelerate or decelerate the failure time, and it estimates that acceleration factor.
It can be better suited to some data sets, particularly when the survival curves bend.
3. Parametric survival models: these estimate parameters by assuming the survival function follows a particular parametric distribution, such as the Weibull or log-normal.
Parametric survival models can give a precise description of the distribution of event times.
4. Semiparametric models: these combine features of parametric and nonparametric models. The Cox model is semiparametric: it leaves the baseline hazard unspecified and estimates only the multiplicative effect of covariates on it.
When using these models, researchers must consider how to handle missing data, how to choose an appropriate model, how to check whether the model's assumptions are satisfied, and how to interpret the results.
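As an illustration of model 3 above, a parametric Weibull survival model can be fit by maximum likelihood. This sketch uses simulated, uncensored event times and a coarse grid search; a real survival analysis must handle censoring and would use a proper optimizer:

```python
import numpy as np

# Fit a Weibull survival model S(t) = exp(-(t/lam)**k) to uncensored event
# times by maximum likelihood over a coarse parameter grid.
rng = np.random.default_rng(5)
t = rng.weibull(2.0, size=5000) * 3.0          # true shape k=2.0, scale lam=3.0

def neg_log_lik(k, lam):
    # Weibull density: (k/lam) * (t/lam)**(k-1) * exp(-(t/lam)**k)
    z = t / lam
    return -np.sum(np.log(k / lam) + (k - 1) * np.log(z) - z**k)

ks = np.linspace(1.0, 3.0, 81)
lams = np.linspace(2.0, 4.0, 81)
grid = [(neg_log_lik(k, lam), k, lam) for k in ks for lam in lams]
_, k_hat, lam_hat = min(grid)
print(k_hat, lam_hat)  # near the true (2.0, 3.0)
```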
JMLR Workshop and Conference Proceedings6:39–58NIPS2008workshop on causalityCausal InferenceJudea Pearl JUDEA@ University of California,Los AngelesComputer Science DepartmentLos Angeles,CA,90095-1596,USAEditor:Isabelle Guyon,Dominik Janzing,and Bernhard SchölkopfAbstractThis paper reviews a theory of causal inference based on the Structural Causal Model(SCM)described in(Pearl,2000a).The theory unifies the graphical,potential-outcome(Neyman-Rubin),decision analytical,and structural equation approaches to causation,and provides botha mathematical foundation and a friendly calculus for the analysis of causes and counterfac-tuals.In particular,the paper establishes a methodology for inferring(from a combination ofdata and assumptions)the answers to three types of causal queries:(1)queries about the effectof potential interventions,(2)queries about counterfactuals,and(3)queries about the direct(orindirect)effect of one event on another.Keywords:Structural equation models,confounding,graphical methods,counterfactuals,causal effects,potential-outcome.1.IntroductionThe research questions that motivate most quantitative studies in the health,social and behav-ioral sciences are not statistical but causal in nature.For example,what is the efficacy of a given drug in a given population?Whether data can prove an employer guilty of hiring discrimina-tion?What fraction of past crimes could have been avoided by a given policy?What was the cause of death of a given individual,in a specific incident?These are causal questions because they require some knowledge of the data-generating process;they cannot be computed from the data alone.Remarkably,although much of the conceptual framework and algorithmic tools needed for tackling such problems are now well established,they are hardly known to researchers in the field who could put them into practical use.Why?Solving causal problems mathematically requires certain extensions in the standard mathe-matical language of statistics,and 
these extensions are not generally emphasized in the main-stream literature and education.As a result,large segments of the research communityfind it hard to appreciate and benefit from the many results that causal analysis has produced in the past two decades.These results rest on advances in three areas:1.Nonparametric structural equations2.Graphical modelsc○2010J.PearlP EARL3.Symbiosis between counterfactual and graphical methods.This paper aims at making these advances more accessible to the general research commu-nity by,first,contrasting causal analysis with standard statistical analysis,second,comparing and unifying existing approaches to causal analysis,andfinally,providing a friendly formalism for counterfactual analysis,within which most(if not all)causal questions can be formulated, analyzed and resolved.We will see that,although full description of the data generating process cannot be inferred from data alone,many useful features of the process can be estimated from a combination of (1)data,(2)prior qualitative knowledge,and/or(3)experiments.Thus,the challenge of causal inference is to answer causal queries of practical interest with minimum number of assump-tions and with minimal experimentation.Following an introductory section which defines the demarcation line between associational and causal analysis,the rest of the paper will deal with the estimation of three types of causal queries:(1)queries about the effect of potential inter-ventions,(2)queries about counterfactuals(e.g.,whether event x would occur had event y been different),and(3)queries about the direct and indirect effects.2.From Associational to Causal Analysis:Distinctions and Barriers2.1The Basic Distinction:Coping With ChangeThe aim of standard statistical analysis,typified by regression,estimation,and hypothesis test-ing techniques,is to assess parameters of a distribution from samples drawn of that distribution. 
With the help of such parameters,one can infer associations among variables,estimate the likelihood of past and future events,as well as update the likelihood of events in light of new evidence or new measurements.These tasks are managed well by standard statistical analysis so long as experimental conditions remain the same.Causal analysis goes one step further;its aim is to infer not only the likelihood of events under static conditions,but also the dynamics of events under changing conditions,for example,changes induced by treatments or external interventions.This distinction implies that causal and associational concepts do not mix.There is nothing in the joint distribution of symptoms and diseases to tell us that curing the former would or would not cure the latter.More generally,there is nothing in a distribution function to tell us how that distribution would differ if external conditions were to change—say from observational to experimental setup—because the laws of probability theory do not dictate how one property of a distribution ought to change when another property is modified.This information must be provided by causal assumptions which identify relationships that remain invariant when external conditions change.These considerations imply that the slogan“correlation does not imply causation”can be translated into a useful principle:one cannot substantiate causal claims from associations alone, even at the population level—behind every causal conclusion there must lie some causal as-sumption that is not testable in observational studies.12.2Formulating the Basic DistinctionA useful demarcation line that makes the distinction between associational and causal concepts crisp and easy to apply,can be formulated as follows.An associational concept is any rela-tionship that can be defined in terms of a joint distribution of observed variables,and a causal 1.The methodology of“causal discovery”(Spirtes,et al.2000;Pearl2000a,chapter2)is likewise based on the 
causalassumption of“faithfullness”or“stability.”40C AUSAL I NFERENCEconcept is any relationship that cannot be defined from the distribution alone.Examples of as-sociational concepts are:correlation,regression,dependence,conditional independence,likeli-hood,collapsibility,propensity score,risk ratio,odd ratio,marginalization,conditionalization,“controlling for,”and so on.Examples of causal concepts are:randomization,influence,ef-fect,confounding,“holding constant,”disturbance,spurious correlation,faithfulness/stability, instrumental variables,intervention,explanation,attribution,and so on.The former can,while the latter cannot be defined in term of distribution functions.This demarcation line is extremely useful in causal analysis for it helps investigators to trace the assumptions that are needed for substantiating various types of scientific claims.Every claim invoking causal concepts must rely on some premises that invoke such concepts;it cannot be inferred from,or even defined in terms statistical associations alone.2.3Ramifications of the Basic DistinctionThis principle has far reaching consequences that are not generally recognized in the standard statistical literature.Many researchers,for example,are still convinced that confounding is solidly founded in standard,frequentist statistics,and that it can be given an associational defi-nition saying(roughly):“U is a potential confounder for examining the effect of treatment X on outcome Y when both U and X and U and Y are not independent.”That this definition and all its many variants must fail(Pearl2000a,Section6.2)2is obvious from the demarcation line above; if confounding were definable in terms of statistical associations,we would have been able to identify confounders from features of nonexperimental data,adjust for those confounders and obtain unbiased estimates of causal effects.This would have violated our golden rule:behind any causal conclusion there must be some causal assumption,untested in 
observational studies. Hence the definition must be false.Therefore,to the bitter disappointment of generations of epi-demiologist and social science researchers,confounding bias cannot be detected or corrected by statistical methods alone;one must make some judgmental assumptions regarding causal relationships in the problem before an adjustment(e.g.,by stratification)can safely correct for confounding bias.Another ramification of the sharp distinction between associational and causal concepts is that any mathematical approach to causal analysis must acquire new notation for expressing causal relations–probability calculus is insufficient.To illustrate,the syntax of probability calculus does not permit us to express the simple fact that“symptoms do not cause diseases”, let alone draw mathematical conclusions from such facts.All we can say is that two events are dependent—meaning that if wefind one,we can expect to encounter the other,but we cannot distinguish statistical dependence,quantified by the conditional probability P(disease|symptom) from causal dependence,for which we have no expression in standard probability calculus. 
Scientists seeking to express causal relationships must therefore supplement the language of probability with a vocabulary for causality,one in which the symbolic representation for the relation“symptoms cause disease”is distinct from the symbolic representation of“symptoms are associated with disease.”2.4Two Mental Barriers:Untested Assumptions and New NotationThe preceding two requirements:(1)to commence causal analysis with untested,3theoretically or judgmentally based assumptions,and(2)to extend the syntax of probability calculus,consti-2.Any intermediate variable U on a causal path from X to Y satisfies this definition,without confounding the effectof X on Y.3.By“untested”I mean untested using frequency data in nonexperimental studies.41P EARLtute the two main obstacles to the acceptance of causal analysis among statisticians and among professionals with traditional training in statistics.Associational assumptions,even untested,are testable in principle,given sufficiently large sample and sufficientlyfine measurements.Causal assumptions,in contrast,cannot be verified even in principle,unless one resorts to experimental control.This difference stands out in Bayesian analysis.Though the priors that Bayesians commonly assign to statistical parameters are untested quantities,the sensitivity to these priors tends to diminish with increasing sample size.In contrast,sensitivity to prior causal assumptions,say that treatment does not change gender,remains substantial regardless of sample size.This makes it doubly important that the notation we use for expressing causal assumptions be meaningful and unambiguous so that one can clearly judge the plausibility or inevitability of the assumptions articulated.Statisticians can no longer ignore the mental representation in which scientists store experiential knowledge,since it is this representation,and the language used to access it that determine the reliability of the judgments upon which the analysis so crucially 
depends.How does one recognize causal expressions in the statistical literature?Those versed in the potential-outcome notation(Neyman,1923;Rubin,1974;Holland,1988),can recognize such expressions through the subscripts that are attached to counterfactual events and variables, e.g.Y x(u)or Z xy.(Some authors use parenthetical expressions,e.g.Y(0),Y(1),Y(x,u)or Z(x,y).)The expression Y x(u),for example,stands for the value that outcome Y would take in individual u,had treatment X been at level x.If u is chosen at random,Y x is a random variable, and one can talk about the probability that Y x would attain a value y in the population,written P(Y x=y).Alternatively,Pearl(1995)used expressions of the form P(Y=y|set(X=x))or P(Y=y|do(X=x))to denote the probability(or frequency)that event(Y=y)would occur if treatment condition X=x were enforced uniformly over the population.4Still a third notation that distinguishes causal expressions is provided by graphical models,where the arrows convey causal directionality.5However,few have taken seriously the textbook requirement that any introduction of new notation must entail a systematic definition of the syntax and semantics that governs the nota-tion.Moreover,in the bulk of the statistical literature before2000,causal claims rarely appear in the mathematics.They surface only in the verbal interpretation that investigators occasion-ally attach to certain associations,and in the verbal description with which investigators justify assumptions.For example,the assumption that a covariate not be affected by a treatment,a necessary assumption for the control of confounding(Cox,1958,p.48),is expressed in plain English,not in a mathematical expression.Remarkably,though the necessity of explicit causal notation is now recognized by most leaders in thefield,the use of such notation has remained enigmatic to most rank andfile researchers,and its potentials still lay grossly underutilized in the statistics based sciences.The reason for 
this,can be traced to the unfriendly and ad-hoc way in which causal analysis has been presented to the research community,resting primarily on the restricted paradigm of controlled randomized trials advanced by Rubin(1974).The next section provides a conceptualization that overcomes these mental barriers;it offers both a friendly mathematical machinery for cause-effect analysis and a formal foundation for counterfactual analysis.4.Clearly,P(Y=y|do(X=x))is equivalent to P(Y x=y),This is what we normally assess in a controlled experiment,with X randomized,in which the distribution of Y is estimated for each level x of X.5.These notational clues should be useful for detecting inadequate definitions of causal concepts;any definitionof confounding,randomization or instrumental variables that is cast in standard probability expressions,void of graphs,counterfactual subscripts or do(*)operators,can safely be discarded as inadequate.42C AUSAL I NFERENCE3.Structural Causal Models (SCM)and The Language of Diagrams3.1Semantics:Causal Effects and CounterfactualsHow can one express mathematically the common understanding that symptoms do not cause diseases?The earliest attempt to formulate such relationship mathematically was made in the 1920’s by the geneticist Sewall Wright (1921),who used a combination of equations and graphs.For example,if X stands for a disease variable and Y stands for a certain symptom of the disease,Wright would write a linear equation:y =βx +u (1)where x stands for the level (or severity)of the disease,y stands for the level (or severity)of the symptom,and u stands for all factors,other than the disease in question,that could possibly affect Y .In interpreting this equation one should think of a physical process whereby Nature examines the values of x and u and,accordingly,assigns variable Y the value y =βx +u .Similarly,to “explain”the occurrence of disease X ,one could write x =v ,where V stand for all factors affecting X .To express the 
directionality inherent in this process,Wright augmented the equation with a diagram,later called “path diagram,”in which arrows are drawn from (perceived)causes to their (perceived)effects and,more importantly,the absence of an arrow makes the empirical claim that the value Nature assigns to one variable is not determined by the value taken by another.In Figure 1,for example,the absence of arrow from Y to X represent the claim that symptom Y is not among the factors V which affect disease X .The variables V and U are called “exogenous”;they represent observed or unobserved background factors that the modeler decides to keep unexplained,that is,factors that influence but are not influenced by the other variables (called “endogenous”)in the model.If correlation is judged possible between two exogenous variables,U and V ,it is customary to connect them by a dashed double arrow,as shown in Figure 1(b).(b)(a)x = vy = x + u βFigure 1:A simple structural equation model,and its associated diagrams.Unobserved exoge-nous variables are connected by dashed arrows.To summarize,path diagrams encode causal assumptions via missing arrows,representing claims of zero influence,and missing double arrows (e.g.,between V and U ),representing the (causal)assumption Cov (U ,V )=0.The generalization to nonlinear systems of equations is straightforward.For example,the non-parametric interpretation of the diagram of Figure 2(a)corresponds to a set of three func-tions,each corresponding to one of the observed variables:z =f Z (w )x=f X (z ,v )(2)y =f Y (x ,u )43P EARL(a)(b)Z x Figure 2:(a)The diagram associated with the structural model of equation (2).(b)The diagramassociated with the modified model,M x 0,of equation (3),representing the interven-tion do (X =x 0).where W ,V and U are assumed to be jointly independent but,otherwise,arbitrarily distributed.Remarkably,unknown to most economists and pre-2000philosophers,6structural equation models provide a formal interpretation and symbolic 
machinery for analyzing counterfactual relationships of the type: “Y would be y had X been x in situation U=u,” denoted Y_x(u) = y. Here U represents the vector of all exogenous variables.7

The key idea is to interpret the phrase “had X been x0” as an instruction to modify the original model and replace the equation for X by a constant x0, yielding the sub-model

z = f_Z(w)
x = x0    (3)
y = f_Y(x, u)

the graphical description of which is shown in Figure 2(b). This replacement permits the constant x0 to differ from the actual value of X (namely f_X(z, v)) without rendering the system of equations inconsistent, thus yielding a formal interpretation of counterfactuals in multi-stage models, where the dependent variable in one equation may be an independent variable in another (Balke and Pearl, 1994a,b; Pearl, 2000b). For example, to compute E(Y_x0), the expected effect of setting X to x0 (also called the average causal effect of X on Y, denoted E(Y|do(x0)) or, generically, E(Y|do(x))), we solve equation (3) for Y in terms of the exogenous variables, yielding Y_x0 = f_Y(x0, u), and average over U and V. It is easy to show that in this simple system, the answer can be obtained without knowing the form of the function f_Y(x, u) or the distribution P(u). The answer is given by:

E(Y_x0) = E(Y|do(X = x0)) = E(Y|x0)

which is estimable from the observed distribution P(x, y, z). This result hinges on the assumption that W, V, and U are mutually independent and on the topology of the graph (e.g., that there is no direct arrow from Z to Y).

In general, it can be shown (Pearl, 2000a, Chapter 3) that, whenever the graph is Markovian (i.e., acyclic with independent exogenous variables), the post-interventional distribution P(Y=y|do(X=x)) is given by the following expression:

P(Y = y|do(X = x)) = Σ_t P(y|t, x) P(t)    (4)

6. Connections between structural equations and a restricted class of counterfactuals were recognized by Simon and Rescher (1966). These were later generalized by Balke and Pearl
(1995), who used modified models to permit counterfactual conditioning on dependent variables.

7. Because U=u may contain detailed information about a situation or an individual, Y_x(u) is related to what philosophers called “token causation,” while P(Y_x=y|Z=z) characterizes “type causation,” that is, the tendency of X to influence Y in a sub-population characterized by Z=z.

where T is the set of direct causes of X (also called “parents”) in the graph. Again, we see that all factors on the right hand side are estimable from the distribution P of observed variables and, hence, the counterfactual probability P(Y_x=y) is estimable with mere partial knowledge of the generating process; the topology of the graph and independence of the exogenous variables is all that is needed.

When some variables in the graph (e.g., the parents of X) are unobserved, we may not be able to learn (or “identify,” as it is called) the post-intervention distribution P(y|do(x)) by simple conditioning, and more sophisticated methods would be required. Likewise, when the query of interest involves several hypothetical worlds simultaneously, e.g., P(Y_x=y, Y_x′=y′),8 the Markovian assumption may not suffice for identification, and additional assumptions touching on the form of the data-generating functions (e.g., monotonicity) may need to be invoked. These issues will be discussed in Sections 3.2 and 5.

This interpretation of counterfactuals, cast as solutions to modified systems of equations, provides the conceptual and formal link between structural equation models, used in economics and social science, and the Neyman-Rubin potential-outcome framework to be discussed in Section 4. But first we discuss two long-standing problems that have been completely resolved in purely graphical terms, without delving into algebraic techniques.

3.2 Confounding and Causal Effect Estimation

The central target of most studies in the social and health sciences is the elucidation of cause-effect relationships among variables of interest, for
example, treatments, policies, preconditions and outcomes. While good statisticians have always known that the elucidation of causal relationships from observational studies must be shaped by assumptions about how the data were generated, the relative roles of assumptions and data, and ways of using those assumptions to eliminate confounding bias, have been a subject of much controversy. The structural framework of Section 3.1 puts these controversies to rest.

8. Read: the probability that Y would be y if X were x and y′ if X were x′.

COVARIATE SELECTION: THE BACK-DOOR CRITERION

Consider an observational study where we wish to find the effect of X on Y, for example, treatment on response, and assume that the factors deemed relevant to the problem are structured as in Figure 3; some are affecting the response, some are affecting the treatment, and some are affecting both treatment and response.

Figure 3: Graphical model illustrating the back-door criterion. Error terms are not shown explicitly.

Some of these factors may be unmeasurable, such as genetic trait or life style; others are measurable, such as gender, age, and salary level. Our problem is to select a subset of these factors for measurement and adjustment, namely, one such that, if we compare treated vs. untreated subjects having the same values of the selected factors, we get the correct treatment effect in that subpopulation of subjects. Such a set of factors is called a “sufficient set” or a set “appropriate for adjustment.” The problem of defining a sufficient set, let alone finding one, has baffled epidemiologists and social scientists for decades (see Greenland et al., 1999; Pearl, 1998, 2003 for review).

The following criterion, named “back-door” in Pearl (1993a), settles this problem by providing a graphical method of selecting a sufficient set of factors for adjustment. It states that a set S is appropriate for adjustment if two conditions hold:

1. No element of S is a descendant of X.
2. The elements of S “block” all “back-door” paths from X to Y, namely all paths
that end with an arrow pointing to X.9

Based on this criterion we see, for example, that each of the sets {Z1, Z2, Z3}, {Z1, Z3}, and {W2, Z3} is sufficient for adjustment, because each blocks all back-door paths between X and Y. The set {Z3}, however, is not sufficient for adjustment because, as explained above, it does not block the path X ← W1 ← Z1 → Z3 ← Z2 → W2 → Y.

The implication of finding a sufficient set S is that stratifying on S is guaranteed to remove all confounding bias relative to the causal effect of X on Y. In other words, it renders the causal effect of X on Y estimable, via

P(Y = y|do(X = x)) = Σ_s P(Y = y|X = x, S = s) P(S = s)    (5)

Since all factors on the right hand side of the equation are estimable (e.g., by regression) from the pre-interventional data, the causal effect can likewise be estimated from such data without bias.

The back-door criterion allows us to write equation (5) directly, after selecting a sufficient set S from the diagram, without resorting to any algebraic manipulation. The selection criterion can be applied systematically to diagrams of any size and shape, thus freeing analysts from judging whether “X is conditionally ignorable given S,” a formidable mental task required in the potential-outcome framework (Rosenbaum and Rubin, 1983). The criterion also enables the analyst to search for an optimal set of covariates, namely, a set S that minimizes measurement cost or sampling variability (Tian et al., 1998).

GENERAL CONTROL OF CONFOUNDING

Adjusting for covariates is only one of many methods that permit us to estimate causal effects in nonexperimental studies. A much more general identification criterion is provided by the following theorem:

Theorem 1 (Tian and Pearl, 2002) A sufficient condition for identifying the causal effect P(y|do(x)) is that every path between X and any of its children traces at least one arrow emanating from a measured variable.10

For example, if W3 is the only observed covariate in the model of Figure 3, then there exists no sufficient set for adjustment (because no set of observed
covariates can block the paths from X to Y through Z3), yet P(y|do(x)) can nevertheless be estimated since every path from X to W3 (the only child of X) traces either the arrow X → W3, or the arrow W3 → Y, both emanating from a measured variable (W3). In this example, the variable W3 acts as a “mediating instrumental variable” (Pearl, 1993b; Chalak and White, 2006) and yields the estimand:

P(Y = y|do(X = x)) = Σ_w3 P(W3 = w3|do(X = x)) P(Y = y|do(W3 = w3))
                   = Σ_w3 P(w3|x) Σ_x′ P(y|w3, x′) P(x′)    (6)

9. A set S of nodes is said to block a path p if either (i) p contains at least one arrow-emitting node that is in S, or (ii) p contains at least one collision node that is outside S and has no descendant in S. See (Pearl, 2000a, pp. 16-17). If S blocks all paths from X to Y it is said to “d-separate X and Y.”

10. Before applying this criterion, one may delete from the causal graph all nodes that are not ancestors of Y.

More recent results extend this theorem by (1) presenting a necessary and sufficient condition for identification (Shpitser and Pearl, 2006), and (2) extending the condition from causal effects to any counterfactual expression (Shpitser and Pearl, 2007). The corresponding unbiased estimands for these causal quantities are readable directly from the diagram.

The mathematical derivation of causal effect estimands, like equations (5) and (6), is merely a first step toward computing quantitative estimates of those effects from finite samples, using the rich traditions of statistical estimation and machine learning. Although the estimands derived in (5) and (6) are non-parametric, this does not mean that one should refrain from using parametric forms in the estimation phase of the study. For example, if the assumptions of Gaussian, zero-mean disturbances and additive interactions are deemed reasonable, then the estimand given in (6) can be converted to the product E(Y|do(x)) = r_{W3 X} r_{Y W3·X} x, where r_{YZ·X} is the (standardized) coefficient of Z in the regression of Y on Z and X. More sophisticated estimation techniques can be found in
Rosenbaum and Rubin (1983) and Robins (1999). For example, the “propensity score” method of Rosenbaum and Rubin (1983) was found to be quite useful when the dimensionality of the adjusted covariates is high and the data is sparse (see Pearl, 2000a, 2nd edition, 2009a, pp. 348-352). It should be emphasized, however, that contrary to conventional wisdom (e.g., Rubin, 2009), propensity score methods are merely efficient estimators of the right hand side of (5); they cannot be expected to reduce bias in case the set S does not satisfy the back-door criterion (Pearl, 2009a,b,c).

3.3 Counterfactual Analysis in Structural Models

Not all questions of causal character can be encoded in P(y|do(x))-type expressions, in much the same way that not all causal questions can be answered from experimental studies. For example, questions of attribution (e.g., I took an aspirin and my headache is gone; was it due to the aspirin?) or of susceptibility (e.g., I am a healthy non-smoker; would I be as healthy had I been a smoker?) cannot be answered from experimental studies and, naturally, such questions cannot be expressed in P(y|do(x)) notation.11 To answer them, a probabilistic analysis of counterfactuals is required, one dedicated to the relation “Y would be y had X been x in situation U=u,” denoted Y_x(u) = y.

As noted in Section 3.1, the structural definition of counterfactuals involves modified models, like M_x0 of equation (3), formed by the intervention do(X=x0) (Figure 2(b)). Call the solution of Y in model M_x the potential response of Y to x, and denote it by the symbol Y_x(u).

11. The reason for this fundamental limitation is that no death case can be tested twice, with and without treatment. For example, if we measure equal proportions of deaths in the treatment and control groups, we cannot tell how many death cases are actually attributable to the treatment itself; it is quite possible that many of those who died under treatment would be alive if untreated and, simultaneously, many of those who survived with treatment
would have died if not treated.
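To make the model-surgery reading of equation (3) concrete, the following sketch enumerates a small binary instance of the chain Z → X → Y of equation (2). The structural functions f_Z, f_X, f_Y and the fair-coin exogenous variables are invented for illustration; the point is only that E(Y|do(X=x0)), computed on the mutilated model, coincides with E(Y|x0) in this unconfounded chain, as claimed in Section 3.1.

```python
from itertools import product

# Hypothetical binary instance of the chain Z -> X -> Y of equation (2);
# W, V, U are independent fair coins (all choices here are for illustration).
f_Z = lambda w: w
f_X = lambda z, v: z ^ v      # V randomly flips Z's influence on X
f_Y = lambda x, u: x & u      # U gates whether X affects Y

def interventional_mean(x0):
    """E(Y | do(X=x0)): replace the equation for X by the constant x0 (eq. (3))."""
    return sum(f_Y(x0, u) for w, v, u in product((0, 1), repeat=3)) / 8

def conditional_mean(x0):
    """E(Y | X=x0), computed from the unmutilated (observational) model."""
    num = den = 0
    for w, v, u in product((0, 1), repeat=3):
        x = f_X(f_Z(w), v)
        if x == x0:
            num += f_Y(x, u)
            den += 1
    return num / den
```

Here interventional_mean(1) and conditional_mean(1) both evaluate to 0.5: with no back-door path from X to Y, intervening and conditioning agree, which is exactly why E(Y_x0) = E(Y|x0) is estimable from observational data in this model.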
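The parent-adjustment formula of equation (4) can likewise be illustrated on a toy confounded model; the numbers below (a binary confounder T that is the sole parent of X and also affects Y) are invented for the example. Adjusting with the marginal P(t) recovers the interventional quantity, while the naive conditional P(y|x), which weights T by P(t|x), is biased.

```python
# Hypothetical binary model with confounder T -> X, T -> Y and effect X -> Y;
# all probabilities are invented for illustration.
p_t = {0: 0.5, 1: 0.5}                      # P(T=t), T is X's only parent
p_x1_t = {0: 0.2, 1: 0.8}                   # P(X=1 | T=t)
p_y1_xt = {(0, 0): 0.1, (0, 1): 0.4,        # P(Y=1 | X=x, T=t)
           (1, 0): 0.6, (1, 1): 0.9}

def adjusted(x):
    """Equation (4): P(Y=1|do(X=x)) = sum_t P(Y=1|t,x) P(t), with T = pa(X)."""
    return sum(p_y1_xt[(x, t)] * p_t[t] for t in (0, 1))

def naive(x):
    """Observational P(Y=1|X=x): weights T by P(t|x) instead of P(t)."""
    p_x_t = lambda t: p_x1_t[t] if x == 1 else 1 - p_x1_t[t]
    norm = sum(p_x_t(t) * p_t[t] for t in (0, 1))
    return sum(p_y1_xt[(x, t)] * p_x_t(t) * p_t[t] for t in (0, 1)) / norm
```

With these numbers, adjusted(1) comes to 0.75 while naive(1) comes to 0.84; the gap is the confounding bias that equation (4) removes.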
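The path-blocking rule of footnote 9 can be sketched in a few lines of code. The path encoding and the descendant sets below are assumptions tailored to the graph of Figure 3; checked against the path X ← W1 ← Z1 → Z3 ← Z2 → W2 → Y discussed in the text, the sketch reproduces the claim that {Z3} fails to block it (Z3 is a collider, and conditioning on it opens the path) while {Z1, Z3} blocks it.

```python
def blocks(path_nodes, arrows, S, descendants):
    """Footnote 9: S blocks a path iff the path contains a non-collider in S,
    or a collider that is outside S and has no descendant in S.
    arrows[i] is '>' if the i-th edge points toward path_nodes[i+1], '<' otherwise."""
    for i in range(1, len(path_nodes) - 1):
        node = path_nodes[i]
        collider = arrows[i - 1] == '>' and arrows[i] == '<'
        if not collider and node in S:
            return True
        if collider and node not in S and not (descendants.get(node, set()) & S):
            return True
    return False

# The back-door path X <- W1 <- Z1 -> Z3 <- Z2 -> W2 -> Y of Figure 3,
# with descendant sets read off that figure's (assumed) edges.
path = ['X', 'W1', 'Z1', 'Z3', 'Z2', 'W2', 'Y']
arrows = ['<', '<', '>', '<', '>', '>']
desc = {'W1': {'X', 'W3'}, 'Z1': {'W1', 'Z3', 'X', 'W3', 'Y'},
        'Z3': {'X', 'W3', 'Y'}, 'Z2': {'W2', 'Z3', 'X', 'W3', 'Y'},
        'W2': {'Y'}}
```

For instance, blocks(path, arrows, {'Z1', 'Z3'}, desc) is True, while blocks(path, arrows, {'Z3'}, desc) is False, matching the discussion following the back-door criterion.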
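Equation (6) can be checked numerically on a small binary model built for the purpose: an unobserved U confounds X and Y, and X acts on Y only through the observed mediator W3. All probabilities below are invented; the point is that the estimand of (6), which uses only the observed joint over (X, W3, Y), reproduces the ground-truth P(Y=1|do(X=x)) computed with full knowledge of U, while the naive conditional does not.

```python
from itertools import product

# Hypothetical front-door model (assumed numbers): unobserved confounder U,
# with U -> X, U -> Y, X -> W3, W3 -> Y.
p_u = {0: 0.5, 1: 0.5}                      # P(U=u), U unobserved
p_x1_u = {0: 0.3, 1: 0.7}                   # P(X=1 | U=u)
p_w1_x = {0: 0.1, 1: 0.9}                   # P(W3=1 | X=x)
p_y1_wu = {(0, 0): 0.2, (0, 1): 0.4,        # P(Y=1 | W3=w, U=u)
           (1, 0): 0.7, (1, 1): 0.9}

def bern(p1, v):                            # P(V=v) when P(V=1)=p1
    return p1 if v == 1 else 1 - p1

# Observed joint P(x, w, y), with U marginalized out.
joint = {}
for u, x, w, y in product((0, 1), repeat=4):
    pr = (p_u[u] * bern(p_x1_u[u], x) *
          bern(p_w1_x[x], w) * bern(p_y1_wu[(w, u)], y))
    joint[(x, w, y)] = joint.get((x, w, y), 0.0) + pr

def p(**fix):                               # probability of an event over (x, w, y)
    return sum(pr for (x, w, y), pr in joint.items()
               if all(dict(x=x, w=w, y=y)[k] == v for k, v in fix.items()))

def front_door(x0):
    """Equation (6): sum_w P(w|x0) * sum_x' P(y=1|w,x') P(x')."""
    return sum(p(x=x0, w=w) / p(x=x0) *
               sum(p(x=xp, w=w, y=1) / p(x=xp, w=w) * p(x=xp) for xp in (0, 1))
               for w in (0, 1))

def truth(x0):
    """Ground-truth P(Y=1|do(X=x0)), using the unobserved U directly."""
    return sum(p_u[u] * bern(p_w1_x[x0], w) * p_y1_wu[(w, u)]
               for u, w in product((0, 1), repeat=2))
```

With these numbers front_door(1) and truth(1) both come to 0.75, whereas the naive conditional p(x=1, y=1)/p(x=1) evaluates to 0.79; the mediator W3 lets the estimand bypass the unobserved confounder exactly as the text describes.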
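Finally, the definition of the potential response Y_x(u) as the solution of Y in the mutilated model M_x translates directly into code. The binary structural functions below are invented stand-ins for f_Z, f_X, f_Y of equation (2); the sketch contrasts the factual value of Y with the counterfactual value Y would have taken had X been forced to a different value in the same situation u.

```python
# Hypothetical binary structural functions standing in for equation (2).
f_Z = lambda w: w
f_X = lambda z, v: z ^ v
f_Y = lambda x, u: x & u

def factual_y(w, v, u):
    """Y's actual value: solve the original model for this exogenous setting."""
    return f_Y(f_X(f_Z(w), v), u)

def potential_response(x0, w, v, u):
    """Y_{x0}(u): solve the mutilated model M_{x0} (eq. (3)) in the same situation."""
    z = f_Z(w)         # this equation is untouched by the surgery
    return f_Y(x0, u)  # X's equation has been replaced by the constant x0
```

For the situation (w, v, u) = (1, 0, 1) the factual value is Y = 1, while potential_response(0, 1, 0, 1) returns 0: had X been 0, Y would have been 0. When x0 equals the factual value of X, the potential response coincides with the factual outcome, the consistency property linking the two notions.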