Understanding Propensity Score Matching (PSM) in One Article: An Example and Its Stata Implementation (Part 1)
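This article covers the propensity score matching commands, their syntax, and the steps of a PSM analysis. It includes an applied example, balance tests, common-support checks and kernel density plots.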
1
Command overview
Stata did not originally have a built-in command for propensity score matching (Stata 13 added teffects psmatch). Propensity score matching is a non-experimental method of sampling that produces a control group whose distribution of covariates is similar to that of the treated group. Several user-written modules implement this method, and the following are among the most popular:
psmatch2.ado
pscore.ado
nnmatch.ado
psmatch2.ado was developed by Leuven and Sianesi (2003) and pscore.ado by Becker and Ichino (2002). More recently, Abadie, Drukker, Herr, and Imbens (2004) introduced nnmatch.ado. All three modules support pair-matching as well as subclassification.
You can find these modules with the net search command:
net search psmatch2
net search pscore
net search nnmatch
You can install these modules with the ssc (or net install) command, for example:
ssc install psmatch2, replace
After installation, read the help files to find the correct usage, for example:
help psmatch2
The above shows how to obtain the PSM-related commands. Of the commands currently available, psmatch2 is the most widely used.
PSM-related commands
help psmatch2
help nnmatch
help psmatch
help pscore
Keeping up with the latest PSM information and programs
findit propensity score
findit matching
psmatch2 is being continuously improved and developed. Make sure to keep your version up to date as follows:
ssc install psmatch2, replace
You can check which version you have installed as follows:
which psmatch2
2
Syntax
The syntax is as follows:
help psmatch2
psmatch2 depvar [indepvars] [if exp] [in range] [, outcome(varlist) pscore(varname) neighbor(integer) radius caliper(real) mahalanobis(varlist) ai(integer) population altvariance kernel llr kerneltype(type) bwidth(real) spline nknots(integer) common trim(real) noreplacement descending odds index logit ties quietly w(matrix) ate]
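The main arguments and options are:
depvar: the treatment indicator, i.e. the dependent variable of the propensity score model (1 = treated, 0 = control);
indepvars: the covariates;
outcome(varlist): the outcome variable(s);
logit: estimate the propensity score with a logit model (the default is a probit model);
neighbor(1): 1:1 nearest-neighbor matching; for 1:3 matching, specify neighbor(3);
radius: radius matching (all controls within the distance set by caliper(real) are used);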
Kernel matching (kernel)
Other matching methods:
Coarsened exact matching (CEM): help cem
Local linear regression matching (llr)
Spline matching (spline)
Mahalanobis matching (mahalanobis(varlist))
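pstest $X, both: tests covariate balance before and after matching. Strictly speaking, this test is only appropriate for continuous variables. For categorical variables, the data should be restructured and balance checked with a χ² test or a rank-sum test. Even so, the pstest output for categorical variables still has some reference value.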
psgraph: plots the matching results, showing the propensity score distribution of treated and control observations.
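To make the matching options above concrete, here is a minimal Python sketch, not psmatch2's code, of how the counterfactual outcome of one treated unit is built under nearest-neighbor (neighbor(k)), radius (radius with caliper()) and kernel matching. It assumes the propensity scores have already been estimated. The helper names, scores and outcomes are hypothetical. The Epanechnikov kernel is assumed because we understand it to be psmatch2's default kerneltype, and the bandwidth stands in for bwidth().

```python
# Hypothetical helpers illustrating three matching schemes for ONE treated unit.
# Not psmatch2's implementation; inputs are already-estimated propensity scores.


def knn_match(p_treated, p_controls, k=1):
    """Indices of the k controls closest in propensity score (like neighbor(k)).

    sorted() is stable, so exact ties keep the current order of p_controls.
    """
    order = sorted(range(len(p_controls)),
                   key=lambda j: abs(p_controls[j] - p_treated))
    return order[:k]


def radius_match(p_treated, p_controls, caliper):
    """Indices of ALL controls within `caliper` of the treated score (like radius caliper())."""
    return [j for j, p in enumerate(p_controls) if abs(p - p_treated) <= caliper]


def epanechnikov_weights(p_treated, p_controls, bandwidth):
    """Normalized Epanechnikov kernel weights; controls farther than `bandwidth` get 0."""
    raw = []
    for p in p_controls:
        u = (p - p_treated) / bandwidth
        raw.append(0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0)
    total = sum(raw)
    return [w / total for w in raw] if total > 0 else raw


def mean_outcome(y_controls, idx):
    """Unweighted mean outcome of the selected controls."""
    return sum(y_controls[j] for j in idx) / len(idx)


def weighted_outcome(y_controls, weights):
    """Kernel-weighted mean outcome of the controls."""
    return sum(w * y for w, y in zip(weights, y_controls))


if __name__ == "__main__":
    p_t = 0.50                              # hypothetical treated score
    p_c = [0.20, 0.45, 0.52, 0.58, 0.90]    # hypothetical control scores
    y_c = [1000, 2000, 3000, 4000, 5000]    # hypothetical control outcomes
    print(mean_outcome(y_c, knn_match(p_t, p_c, k=1)))       # 3000.0
    print(mean_outcome(y_c, knn_match(p_t, p_c, k=3)))       # 3000.0
    print(mean_outcome(y_c, radius_match(p_t, p_c, 0.06)))   # 2500.0
    print(weighted_outcome(y_c, epanechnikov_weights(p_t, p_c, 0.1)))
```

The schemes differ only in which controls enter the average and with what weight. The treatment effect then averages "treated outcome minus this counterfactual" over the treated units.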
3
Case study
Policy background: the National Supported Work (NSW) demonstration program
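Research question: how does participating in the program (training), compared with not participating, affect wages? Basic idea: for the trained group (the treatment group), compare wages with training against wages without training. In practice, however, we only observe that the treatment group was trained. What would have happened had the treatment group not been trained cannot be observed, and this unobserved state is called the counterfactual.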
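Matching is a way to deal with this unobservable counterfactual. In propensity score matching (PSM), the sample is split into two groups by the treatment indicator. The first is the treatment group, here the individuals who received training after NSW was implemented. The second is the comparison group, here the individuals who did not receive training after NSW was implemented. The basic idea of PSM is to match treatment and comparison units in a suitable way so that all other conditions are the same. The wage difference between the trained group (treatment group) and the untrained group (comparison group) can then be used to infer the causal effect of training on wages.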
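The counterfactual logic above can be written in standard potential-outcome notation (our notation, not the article's). D_i is the training indicator, X_i the covariates, and Y_i(1), Y_i(0) the wages with and without training. The quantity targeted is the average treatment effect on the treated (ATT), whose unobservable part is E[Y_i(0) | D_i = 1]. This rests on the usual assumptions: treatment is independent of the potential outcomes given X, and every treated unit has comparable controls (overlap). Under them, a 1:1 nearest-neighbor estimator replaces each treated unit's counterfactual with the wage of the control whose estimated propensity score is closest. N_1 is the number of matched treated units.

```latex
\begin{aligned}
e(x) &= \Pr(D_i = 1 \mid X_i = x), \\
\text{ATT} &= E\bigl[\, Y_i(1) - Y_i(0) \mid D_i = 1 \,\bigr], \\
\widehat{\text{ATT}} &= \frac{1}{N_1} \sum_{i:\, D_i = 1} \bigl( Y_i - Y_{j(i)} \bigr),
\qquad j(i) = \arg\min_{j:\, D_j = 0} \bigl| \hat e(X_i) - \hat e(X_j) \bigr|.
\end{aligned}
```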
1. First, inspect the data structure
use "ldw_exper.dta", clear
ed      // open the Data Editor to browse the data
desc    // describe the variables
2. Descriptive analysis
tabulate t, summarize(re78) means standard   // mean and standard deviation of re78 by treatment status
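The table from tabulate t, summarize(re78) reports the mean and standard deviation of re78 for each value of t. The minimal Python sketch below reproduces that arithmetic and the raw treated-minus-control gap, using made-up numbers rather than the ldw_exper data. The gap compares groups without adjusting for covariates, so it is only the starting point that matching tries to improve on. Stata's standard deviation uses the n - 1 divisor, so statistics.stdev is used here.

```python
import statistics


def summarize_by_group(t, y):
    """Count, mean and sample SD (n - 1 divisor) of y for each value of the treatment indicator t."""
    table = {}
    for g in sorted(set(t)):
        ys = [yi for ti, yi in zip(t, y) if ti == g]
        table[g] = {"n": len(ys),
                    "mean": statistics.mean(ys),
                    "sd": statistics.stdev(ys)}
    return table


t = [1, 1, 1, 0, 0, 0, 0]                           # hypothetical treatment indicator
re78 = [6000, 8000, 10000, 4000, 5000, 6000, 9000]  # hypothetical 1978 earnings
table = summarize_by_group(t, re78)
for g, row in table.items():
    print(g, row)
raw_gap = table[1]["mean"] - table[0]["mean"]       # naive treated-minus-control difference
print("raw gap:", raw_gap)
```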
3. Propensity score matching
3.1 Set a random-number seed and sort the data randomly
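set seed 20180105    // set the random-number seed
gen u = runiform()   // one uniform random number per observation
sort u               // put the observations in random order (order u would only move the variable, not sort the observations)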
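Why sort randomly with a fixed seed? In nearest-neighbor matching, two controls can be exactly equally close to a treated unit, and the one picked then depends on the current row order. We understand this to be the reason random sorting is recommended before running psmatch2. The ties option used in 3.2 instead keeps all tied controls. Fixing the seed makes the random order, and therefore the matches, reproducible. The Python sketch below illustrates only this mechanism. nearest_control and shuffled are hypothetical helpers, and shuffled mimics gen u = runiform() followed by sort u.

```python
import random


def nearest_control(p_treated, controls):
    """controls: list of (id, pscore) in the current row order.

    min() returns the FIRST element with the smallest distance,
    so when two controls are tied, the row order decides.
    """
    return min(controls, key=lambda c: abs(c[1] - p_treated))[0]


def shuffled(rows, seed):
    """Analogue of: set seed / gen u = runiform() / sort u."""
    rng = random.Random(seed)
    keyed = [(rng.random(), row) for row in rows]
    keyed.sort(key=lambda pair: pair[0])   # sort by the random number u
    return [row for _, row in keyed]


controls = [("A", 0.30), ("B", 0.30), ("C", 0.90)]   # A and B are exactly tied
print(nearest_control(0.35, controls))                   # A
print(nearest_control(0.35, list(reversed(controls))))   # B
print(nearest_control(0.35, shuffled(controls, 20180105)))
```

With the same seed, shuffled returns the same order every time, so the tie is broken the same way on every run.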
3.2 Estimate the propensity scores and match
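local v1 "t"
local v2 "age edu black hisp married re74 re75 u74 u75"
global x "`v1' `v2'"
psmatch2 $x, out(re78) neighbor(1) ate ties logit common   // 1:1 matching
Here $ dereferences a global macro, so the last line is equivalent to:
psmatch2 t age edu black hisp married re74 re75 u74 u75, out(re78) neighbor(1) ate ties logit common
3.3 View the matched data
Open the Data Editor and you will see that psmatch2 has generated several new variables. _pscore is the propensity score of each observation. _id is a unique ID generated for each observation (in fact, this column follows the ordering of _pscore). _treated indicates whether an observation belongs to the treatment group. _n1 is the _id of the control observation it was matched to (with 1:3 matching, _n2 and _n3 are generated as well). _pdif is the difference between the propensity scores of a matched pair. psmatch2 also creates _weight, which for a control records how often it is used as a match and is missing if it is never used. _weight is used in 3.6.
3.4 Balance test
pstest `v2', both graph   // v2 is a local macro, so it is referenced as `v2' (run in the same do-file), not $v2
3.5 Common support
psgraph
3.6 Kernel density plots
twoway (kdensity _pscore if _treated==1, legend(label(1 "Treat")))   ///
       (kdensity _pscore if _treated==0, legend(label(2 "Control"))), ///
       xtitle(Pscore) title("Before Matching")
// After matching: plot only the controls that were used as matches (nonmissing _weight)
twoway (kdensity _pscore if _treated==1, legend(label(1 "Treat")))                 ///
       (kdensity _pscore if _treated==0 & _weight!=., legend(label(2 "Control"))), ///
       xtitle(Pscore) title("After Matching")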
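The steps in 3.2 to 3.4 can be traced by hand on a toy example. The Python sketch below is not psmatch2's implementation. It takes already estimated propensity scores (hypothetical numbers) and applies a common-support rule like the one we understand common to use, which drops treated units whose score is below the minimum or above the maximum control score. It then performs 1:1 nearest-neighbor matching with replacement, counts how often each control is used (analogous to _weight), computes the ATT, and reports the standardized % bias of one covariate before and after matching. As an assumption, the bias denominator uses the unmatched treated and control variances both before and after matching. Check help pstest for its exact convention.

```python
import statistics


def common_support(p_treated, p_controls):
    """Indices of treated units whose score lies within [min, max] of the control scores."""
    lo, hi = min(p_controls), max(p_controls)
    return [i for i, p in enumerate(p_treated) if lo <= p <= hi]


def match_nn(p_treated, p_controls, treated_idx):
    """1:1 nearest-neighbour matching with replacement: {treated index: control index}."""
    return {i: min(range(len(p_controls)),
                   key=lambda j: abs(p_controls[j] - p_treated[i]))
            for i in treated_idx}


def usage_counts(matches, n_controls):
    """How often each control is used as a match; None if never (cf. _weight)."""
    counts = [0] * n_controls
    for j in matches.values():
        counts[j] += 1
    return [c if c > 0 else None for c in counts]


def att(y_treated, y_controls, matches):
    """Mean of (treated outcome - matched control outcome) over matched treated units."""
    return statistics.mean(y_treated[i] - y_controls[j] for i, j in matches.items())


def std_bias(x_t, x_c, var_t, var_c):
    """Standardized % bias: 100 * (mean_T - mean_C) / sqrt((var_T + var_C) / 2)."""
    return 100 * (statistics.mean(x_t) - statistics.mean(x_c)) / ((var_t + var_c) / 2) ** 0.5


# Hypothetical data: 4 treated and 4 control observations.
p_t, p_c = [0.72, 0.53, 0.78, 0.95], [0.20, 0.50, 0.60, 0.80]
y_t, y_c = [9000, 7000, 8000, 12000], [3000, 5000, 6000, 6500]
x_t, x_c = [30, 25, 32, 40], [20, 24, 27, 31]   # one covariate, e.g. age

on_support = common_support(p_t, p_c)   # treated unit 3 (score 0.95) is dropped
matches = match_nn(p_t, p_c, on_support)
print("matches:", matches)
print("control usage:", usage_counts(matches, len(p_c)))
print("ATT:", att(y_t, y_c, matches))

var_t, var_c = statistics.variance(x_t), statistics.variance(x_c)   # unmatched variances
bias_before = std_bias(x_t, x_c, var_t, var_c)
bias_after = std_bias([x_t[i] for i in matches],            # matched treated
                      [x_c[j] for j in matches.values()],   # their matched controls
                      var_t, var_c)
print(f"%bias before: {bias_before:.1f}, after: {bias_after:.1f}")
```

In this toy example one control is used twice, which is why matching with replacement needs a frequency weight like _weight. The large drop in %bias is the kind of change pstest, both is meant to reveal.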