Spatial Econometrics and Stata Commands
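Stata Code for Scaling Up a Spatial Weights Matrix

1. Introduction
A spatial weights matrix is a basic tool of spatial statistics: it quantifies how geographic units are connected to one another. In spatial econometric work the weights sometimes have to be scaled up to suit the needs of a particular study. This article shows how to do that with Stata code.

2. Preparation
Before running the code, make sure that Stata is installed and that the spatial weights data to be analysed are available.

3. Importing the data
In Stata, the weights data can be read with the "import delimited" command. If the data file is named "spatial_weights.csv", the command is:
```
import delimited "spatial_weights.csv", clear
```

4. Scaling the weights
Scaling the weights is straightforward: the "gen" command creates a new variable that holds the scaled values. To multiply the original weights by 10, use:
```
gen new_weight = old_weight * 10
```
Here "old_weight" is the variable that holds the original weights and "new_weight" is the variable that holds the scaled weights. Replace 10 with any other factor as needed. Note that a constant factor only changes the scale of the weights; it cancels out if the matrix is row-standardized afterwards.

5. Saving the data
After scaling, save the result to a new data file with the "export delimited" command, for example:
```
export delimited "new_spatial_weights.csv", replace
```
This writes the scaled weights to the new file "new_spatial_weights.csv".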
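The steps above can be sketched on a small made-up dataset, so that the example runs without spatial_weights.csv. The variable names old_weight and new_weight follow the text; the 3x3 matrix W and the name Wpanel are hypothetical. The second part illustrates a different sense in which a weights matrix is often "enlarged" in panel work, namely repeating a cross-sectional matrix over T periods with a Kronecker product. That reading is an assumption about what a reader may need, not something the text prescribes.

```stata
* Part 1: scale a column of weights by a constant factor
clear
input old_weight
0
1
0.5
end
generate new_weight = old_weight * 10
list

* Part 2 (assumption): expand a 3x3 cross-sectional matrix W into a
* block-diagonal 6x6 matrix for a panel with T = 2 periods,
* using the Kronecker product operator #
matrix W = (0, 1, 1 \ 1, 0, 0 \ 1, 0, 0)
matrix Wpanel = I(2) # W
matrix list Wpanel
```

The Kronecker form assumes the panel is sorted by period first and by unit within period, and that the same W applies in every period.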
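Summary of Stata Commands for Econometrics

1. Data handling and descriptive statistics
summarize var1 var2 ...: mean, standard deviation, minimum and maximum (add the detail option for the median and other percentiles)
tabulate var1 [var2]: one-way or two-way frequency table
histogram var: histogram of a single variable
scatter yvar xvar: scatter plot of two variables
graph twoway plottype yvar xvar: two-way graph, where plottype is for example scatter, line or lfit
sort var: sort the data by var
by var: command: run command separately for each group of var (the data must be sorted by var; bysort sorts first)
replace var = exp: replace the values of an existing variable
generate newvar = exp: create a new variable
egen newvar = fcn(var): create a new variable with an egen function such as mean() or group()

2. Regression analysis
regress depvar indepvar1 indepvar2 ...: ordinary least squares
regress depvar indepvars, robust: OLS with heteroskedasticity-robust standard errors
logit depvar indepvar1 indepvar2 ...: binary logit model
probit depvar indepvar1 indepvar2 ...: binary probit model
tobit depvar indepvars, ll(#) ul(#): Tobit model with a lower and/or upper censoring limit
heckman depvar indepvars, select(depvar_s = varlist_s) [twostep]: Heckman selection model (maximum likelihood by default, two-step estimator with twostep)
xtheckman depvar indepvars, select(depvar_s = varlist_s): panel-data sample-selection model (Stata 16 or later); the "pheckman" that appears in some command lists is not an official Stata command
xtset panelvar [timevar]: declare panel data
xtreg depvar indepvars, fe | re | be: fixed-effects, random-effects or between-effects panel regression
xtlogit depvar indepvars, fe: conditional fixed-effects panel logit
xtprobit depvar indepvars, re: random-effects panel probit (xtprobit has no fe option)

3. Time-series analysis
tsset timevar: declare time-series data
dfuller var: Dickey-Fuller unit-root test
tsline var: time-series line plot
arima depvar, ar(numlist) ma(numlist): ARMA model
arima depvar, arima(p,d,q): ARIMA model with differencing order d
arch depvar, arch(numlist): ARCH model
arch depvar, arch(numlist) garch(numlist): GARCH model (there is no separate garch command)
ivregress 2sls depvar exogvars (endogvars = instruments) [, vce(cluster clustvar)]: instrumental-variables (2SLS) regression; the gmm and liml estimators are also available

4. Panel data and cross-section analysis
xtsum var1 var2 ...: descriptive statistics for panel data (overall, between and within)
xttest0: Breusch-Pagan Lagrange multiplier test for random effects, run after xtreg, re
xttobit depvar indepvars, ll(#) ul(#): random-effects panel Tobit model
xtreg depvar indepvars, fe | re | be: panel regression models
xtlogit / xtprobit depvar indepvars: panel binary-outcome models

5. Advanced statistical methods
cluster kmeans varlist, k(#): cluster analysis (other methods, such as cluster wardslinkage, are also available)
pca var1 var2 ..., components(4): principal component analysis
mvreg depvar1 depvar2 ... = indepvars: multivariate regression
mixed depvar indepvars || groupvar:, mle: multilevel (mixed-effects) linear model, for example a random-intercept model (xtmixed before Stata 13)
glm depvar indepvars, family(binomial) link(logit): generalized linear model; use link(probit) for a probit link
heckprobit depvar indepvars, select(depvar_s = varlist_s): probit model with sample selection (called heckprob in older versions)
reg3 (eq1) (eq2) ...: three-stage least squares for systems of simultaneous equations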
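A few of the commands listed above can be tried on the auto dataset that ships with Stata. This is only a sketch: the choice of price, mpg, weight and foreign as variables is for illustration and is not taken from the summary itself.

```stata
* Load the example dataset shipped with Stata
sysuse auto, clear

* Descriptive statistics; detail adds the median and other percentiles
summarize price mpg weight, detail

* One-way frequency table
tabulate foreign

* OLS with heteroskedasticity-robust standard errors
regress price mpg weight, vce(robust)

* Binary logit model
logit foreign mpg weight
```

The vce(robust) option is the current spelling of the older robust option; both request the same standard errors.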
Spatial Econometrics with Stata

How can I calculate Moran's I in Stata?

Note: The commands shown in this page are user-written Stata commands that must be downloaded. To install the package of spatial analysis tools, type findit spatgsa in the command window.

Moran's I is a measure of spatial autocorrelation--how related the values of a variable are based on the locations where they were measured. Using a set of user-written Stata commands, we can calculate Moran's I in Stata. We will be using the spatwmat command to generate a matrix of weights based on the locations in our data and the spatgsa command to calculate Moran's I or other spatial autocorrelation measures.

Let's look at an example. Our dataset, ozone, contains ozone measurements from thirty-two locations in the Los Angeles area aggregated over one month. The dataset includes the station number (station), the latitude and longitude of the station (lat and lon), and the average of the highest eight hour daily averages (av8top). This data, and other spatial datasets, can be downloaded from the University of Illinois's Spatial Analysis Lab. We can look at a summary of our location variables to see the range of locations under consideration.

    use /stat/stata/faq/ozone.dta, clear
    summarize lat lon

        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
             lat |        32     34.0146    .2228168    33.6275   34.69012
             lon |        32   -117.7078    .5683853  -118.5347  -116.2339

Based on the minimum and maximum values of these variables, we can calculate the greatest Euclidean distance we might measure between two points in our dataset.

    display sqrt((34.69012 - 33.6275)^2 + (-116.2339 - -118.5347)^2)
    2.5343326

Knowing this maximum distance between two points in our data, we can generate a matrix based on the distances between points. In the spatwmat command, we name the weights matrix to be generated, indicate which of our variables are the x- and y-coordinate variables, and provide a range of distance values that are of interest in the band option. All of the distances are of interest in this example, so we create a band with an upper bound greater than our largest possible distance. If we did not care about distances greater than 2, we could indicate this in the band option.

    spatwmat, name(ozoneweights) xcoord(lon) ycoord(lat) band(0 3)

    The following matrix has been created:

    1. Inverse distance weights matrix ozoneweights
       Dimension: 32x32
       Distance band: 0 < d <= 3
       Friction parameter: 1
       Minimum distance: 0.1
       1st quartile distance: 0.4
       Median distance: 0.6
       3rd quartile distance: 1.0
       Maximum distance: 2.4
       Largest minimum distance: 0.50
       Smallest maximum distance: 1.23

As described in the output, the command above generated a matrix with 32 rows and 32 columns because our data includes 32 locations. Each off-diagonal entry [i, j] in the matrix is equal to 1/(distance between point i and point j). Thus, the matrix entries for pairs of points that are close together are higher than for pairs of points that are far apart. If you wish to look at the matrix, you can display it with the matrix list command.
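To make the 1/(distance) rule concrete, here is a minimal Mata sketch that builds an inverse-distance matrix by hand for three made-up points. It is not how spatwmat is implemented; the coordinates and the names xy, W and Winv are hypothetical.

```stata
mata:
// Three hypothetical points (x, y), chosen so that distances are 5, 10 and 5
xy = (0, 0 \ 3, 4 \ 6, 8)
n = rows(xy)

// Off-diagonal entry [i, j] = 1 / Euclidean distance; diagonal stays 0
W = J(n, n, 0)
for (i = 1; i <= n; i++) {
    for (j = 1; j <= n; j++) {
        if (i != j) {
            W[i, j] = 1 / sqrt((xy[i, 1] - xy[j, 1])^2 + (xy[i, 2] - xy[j, 2])^2)
        }
    }
}
W

// Copy the result to a Stata matrix so it can be listed or reused
st_matrix("Winv", W)
end
matrix list Winv
```

Points 1 and 2 are 5 units apart and points 1 and 3 are 10 units apart, so the closer pair receives the larger weight.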
With our matrix of weights, we can now calculate Moran's I.

    spatgsa av8top, weights(ozoneweights) moran

    Measures of global spatial autocorrelation

    Weights matrix
    --------------------------------------------------------------
     Name: ozoneweights
     Type: Distance-based (inverse distance)
     Distance band: 0.0 < d <= 3.0
     Row-standardized: No
    --------------------------------------------------------------

    Moran's I
    --------------------------------------------------------------
              Variables |    I      E(I)   sd(I)     z    p-value*
    --------------------+-----------------------------------------
                 av8top |  0.248  -0.032   0.036   7.679   0.000
    --------------------------------------------------------------
    *1-tail test

Based on these results, we can reject the null hypothesis that there is zero spatial autocorrelation present in the variable av8top at alpha = .05.

Variations

Binary Matrix: If there exists some threshold distance d such that pairs with distances less than d are neighbors and pairs with distances greater than d are not, you can create a binary neighbors matrix with the spatwmat command (indicating bin and setting band to have an upper bound of d) and use this weights matrix for calculating Moran's I. We could do this for d = .75:

    spatwmat, name(ozoneweights) xcoord(lon) ycoord(lat) band(0 .75) bin

    The following matrix has been created:

    1. Distance-based binary weights matrix ozoneweights
       Dimension: 32x32
       Distance band: 0 < d <= .75
       Friction parameter: 1
       Minimum distance: 0.1
       1st quartile distance: 0.4
       Median distance: 0.6
       3rd quartile distance: 1.0
       Maximum distance: 2.4
       Largest minimum distance: 0.50
       Smallest maximum distance: 1.23

    spatgsa av8top, weights(ozoneweights) moran

    Measures of global spatial autocorrelation

    Weights matrix
    --------------------------------------------------------------
     Name: ozoneweights
     Type: Distance-based (binary)
     Distance band: 0.0 < d <= 0.75
     Row-standardized: No
    --------------------------------------------------------------

    Moran's I
    --------------------------------------------------------------
              Variables |    I      E(I)   sd(I)     z    p-value*
    --------------------+-----------------------------------------
                 av8top |  0.188  -0.032   0.033   6.762   0.000
    --------------------------------------------------------------
    *1-tail test

In this example, the binary formulation of distance yields a similar result. We can reject the null hypothesis that there is zero spatial autocorrelation present in the variable av8top at alpha = .05.

Using an existing matrix: If you have calculated a weights matrix according to some other metric than those available in spatwmat and wish to use it in calculating Moran's I, spatwmat allows you to read in a Stata dataset of the required dimensions and format it as a distance matrix that can be used by spatgsa. If altweights.dta is a dataset with 32 columns and 32 rows, it could be converted to a weights matrix aweights to be used in spatgsa analyzing av8top:

    spatwmat using "C:\altweights.dta", name(aweights)

How do I generate a variogram for spatial data in Stata?

When analyzing geospatial data, describing the spatial pattern of a measured variable is of great importance. User-written Stata commands allow you to explore such patterns. This page will use the variog and variog2 commands.
To install them, type findit variog in your command window.

The variog command allows you to calculate and graph a variogram for regularly spaced one-dimensional data. The variog2 command allows you to calculate and graph a variogram for two-dimensional data without constraints on spacing. In both cases, the variogram illustrates how differences in a measured variable Z vary as the distances between the points at which Z is measured increase.

Let's look at an example. Our dataset contains ozone measurements from thirty-two locations in the Los Angeles area aggregated over one month. The dataset includes the station number (station), the latitude and longitude of the station (lat and lon), and the average of the highest eight hour daily averages (av8top). This data, and other spatial datasets, can be downloaded from the GeoDa Center for Geospatial Analysis and Computation.

    use /stat/stata/faq/ozone, clear
    clist in 1/5

           station     av8top        lat         lon
      1.        60   7.225806   34.13583   -117.9236
      2.        69   5.899194   34.17611   -118.3153
      3.        72   4.052885   33.82361   -118.1875
      4.        74   7.181452   34.19944   -118.5347
      5.        75   6.076613   34.06694   -117.7514

For the sake of an example, let's imagine that instead of specific latitude and longitude locations, the stations are evenly spaced along a single latitude. If we assume the observations are in the order in which the stations appear, we can use the variog command. In the command, we indicate the measured outcome and we will opt for the calculated values to be listed. By default, a plot of the semi-variogram will be generated.

    variog av8top, list

      +----------------------------------+
      | Lag   Semi-variance   # of pairs |
      |----------------------------------|
      |   1        2.328506           31 |
      |   2        2.615086           30 |
      |   3        2.629862           29 |
      |   4        2.983584           28 |
      |   5        3.415026           27 |
      |----------------------------------|
      |   6        2.923007           26 |
      |   7        4.104437           25 |
      |   8        3.378503           24 |
      |   9        3.531528           23 |
      |  10         4.49281           22 |
      |----------------------------------|
      |  11         5.22965           21 |
      |  12        6.657857           20 |
      |  13          6.5462           19 |
      |  14        6.126221           18 |
      |  15        6.556983           17 |
      |----------------------------------|
      |  16        6.451519           16 |
      +----------------------------------+

Next, let's generate a variogram using the latitude and longitude of the stations. For this, we will use the variog2 command. While the lag distance in variog was assumed to be the distance between each evenly spaced observation, variog2 requires the user to specify the lag distance. Let's look at a summary of our coordinates to get a sense of the distances existing in our data.

    summarize lat lon

        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
             lat |        32     34.0146    .2228168    33.6275   34.69012
             lon |        32   -117.7078    .5683853  -118.5347  -116.2339

Based on this, we can calculate the maximum possible distance we might see in our data.

    dis sqrt((33.6275 - 34.69012)^2 + (-118.5347 - -116.2339)^2)
    2.5343326

As a starting point, we can choose a lag distance of .1 and we can examine distances up to 12 lags apart. We want to choose a lag distance that yields enough pairs in each lag to generate a variance that we trust. We might aim to have at least 15 pairs in each lag.

    variog2 av8top lat lon, width(.1) lags(12) list

      +----------------------------------+
      | Lag   Semi-variance   # of pairs |
      |----------------------------------|
      |   1        4.729442            6 |
      |   2       1.8984963           31 |
      |   3       1.3789778           41 |
      |   4       2.7462469           50 |
      |   5       4.3899238           49 |
      |----------------------------------|
      |   6       4.1974818           43 |
      |   7       5.2652506           48 |
      |   8       7.3351494           41 |
      |   9       6.8823236           36 |
      |  10       8.0089961           29 |
      |----------------------------------|
      |  11       6.6957223           29 |
      |  12       7.1360346           23 |
      +----------------------------------+

We can see that our first lag contains only 6 pairs. We might increase the size of our lags and look at fewer of them.

    variog2 av8top lat lon, width(.15) lags(10) list

      +----------------------------------+
      | Lag   Semi-variance   # of pairs |
      |----------------------------------|
      |   1       1.8485044           21 |
      |   2       1.8412199           57 |
      |   3       3.1204523           74 |
      |   4       4.4411303           68 |
      |   5       5.8693088           70 |
      |----------------------------------|
      |   6       7.0979125           55 |
      |   7       7.8960334           44 |
      |   8       6.5713557           37 |
      |   9       4.0710902           23 |
      |  10       3.3176015           16 |
      +----------------------------------+

In the output, we can see lag distances up to 10*.15 = 1.5, the number of pairs that are this far apart in the dataset, and the semi-variance. As we can see from the plot, the semi-variance increases until the lag distance exceeds .15*7 = 1.05.
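For reference, the quantity tabulated as "Semi-variance" is usually defined by the classical empirical estimator below. That variog and variog2 use exactly this estimator is an assumption here; the symbols $N(h)$, $z_i$ and $\hat{\gamma}$ are notation introduced for this sketch.

```latex
\hat{\gamma}(h) \;=\; \frac{1}{2\,|N(h)|} \sum_{(i,j)\,\in\,N(h)} \left( z_i - z_j \right)^2
```

Here $N(h)$ is the set of pairs of points whose separation falls in lag $h$, $|N(h)|$ is the "# of pairs" column, and $z_i$ is the measured value (av8top) at point $i$. A lag with few pairs averages over few squared differences, which is why a minimum number of pairs per lag is desirable.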
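Spatial Durbin Model Analysis with Stata Commands

Introduction
In spatial econometrics, the spatial Durbin model (SDM) is widely used to study how spatial dependence affects economic outcomes. Stata is a widely used statistical package with a range of commands for spatial economic analysis. This article describes how to estimate and interpret a spatial Durbin model with Stata commands.

1. Overview
The spatial Durbin model extends the standard linear regression model by allowing for spatial dependence in both the dependent variable and the explanatory variables. Its basic equation is:

$$Y = \rho W Y + X\beta + W X\theta + \varepsilon$$

where $Y$ is the dependent variable, $WY$ is the product of the spatial weights matrix and the dependent variable (the spatial lag of $Y$), $X$ is the matrix of explanatory variables, $\beta$ is the parameter vector, $WX$ contains the spatial lags of the explanatory variables with coefficient vector $\theta$, and $\varepsilon$ is the random error term.
$\rho$ is the spatial autoregressive coefficient on the spatial lag. A spatial error coefficient $\lambda$, as in $u = \lambda W u + \varepsilon$, belongs to the spatial error model (SEM) and is not part of the SDM.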
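Because $Y$ appears on both sides of the standard SDM equation $Y = \rho W Y + X\beta + W X\theta + \varepsilon$, the coefficient $\beta_k$ is not by itself the marginal effect of regressor $x_k$. A short derivation of the reduced form makes this explicit; the symbol $\theta$ for the coefficients on $WX$ and the matrix name $S_k$ are notation assumed for this sketch.

```latex
\begin{aligned}
Y &= \rho W Y + X\beta + W X\theta + \varepsilon \\
(I_n - \rho W)\,Y &= X\beta + W X\theta + \varepsilon \\
Y &= (I_n - \rho W)^{-1}\left( X\beta + W X\theta + \varepsilon \right) \\
S_k \equiv \frac{\partial Y}{\partial x_k^{\top}} &= (I_n - \rho W)^{-1}\left( I_n \beta_k + W \theta_k \right)
\end{aligned}
```

The average of the diagonal elements of $S_k$ is commonly reported as the direct effect of $x_k$, and the average row sum of its off-diagonal elements as the indirect (spillover) effect. The inverse exists when $I_n - \rho W$ is nonsingular, which holds for $|\rho| < 1$ when $W$ is row-standardized.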
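2. Data preparation
To estimate a spatial Durbin model in Stata, first prepare the data. Stata's data-management commands can be used to import and transform the data and to check that they are consistent and accurate.
The spatial weights matrix also has to be constructed. It can be computed with Stata's spatial commands, for example the official spmatrix command (Stata 15 or later) or the user-written spatwmat command.

3. Model estimation
In Stata, the spatial Durbin model can be estimated with the user-written xsmle command and its model(sdm) option (panel data), or with the official spregress and spxtregress commands (Stata 15 or later). The name "splm", which is sometimes given for this step, refers to an R package and not to a Stata command.
These commands estimate the model by maximum likelihood; spregress also offers a generalized spatial two-stage least squares (GS2SLS) estimator. Ordinary least squares is not appropriate here, because the spatial lag $WY$ is endogenous.
Whichever command is used, the form of the model and the variables to include have to be specified.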
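As a rough illustration of the estimation step, the sketch below fits a cross-sectional spatial Durbin model with Stata's official Sp commands (Stata 15 or later) on simulated data. Everything in it is hypothetical: the 5x5 grid of units, the variable names id, cx, cy, x and y, the matrix name W, and the data-generating process, which contains no true spatial dependence. It shows the syntax only and is not an empirical result.

```stata
* Simulate 25 units on a 5 x 5 grid (hypothetical data)
clear
set obs 25
set seed 12345
generate id = _n
generate cx = mod(_n - 1, 5)
generate cy = floor((_n - 1) / 5)
generate x = rnormal()
generate y = 1 + 0.5 * x + rnormal()

* Declare the data as spatial, using the grid coordinates
spset id, coord(cx cy)

* Inverse-distance weights matrix named W
spmatrix create idistance W

* SDM: spatial lag of y (dvarlag) plus spatial lag of x (ivarlag), by ML
spregress y x, ml dvarlag(W) ivarlag(W: x)

* Direct, indirect and total effects of x
estat impact
```

With panel data, the analogous official command is spxtregress, and the user-written xsmle command offers model(sdm); their options differ from those shown here.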
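4. Model diagnostics and interpretation
To make sure the model is valid and accurate, it has to be checked and interpreted.
Stata's postestimation commands support a range of checks, such as tests for heteroskedasticity and tests of the significance of spatial error dependence.
Postestimation commands can also be used to interpret the estimates, for example estat impact after spregress, which reports direct, indirect and total effects, as a basis for further analysis and discussion.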
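5. Empirical case study
This section uses an empirical case to show how a spatial Durbin model is analysed with Stata commands.
The case data are economic data for a region and include variables such as GDP, population and trade.
The aim is to study how spatial dependence affects GDP and to analyse this relationship with a spatial Durbin model.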
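Complete Stata Command Reference (Compiled Edition)
********* Panel Data Econometric Analysis and Software Implementation *********
Note: a considerable part of the do-file below comes from the Stata tutorial by Lian Yujun of Sun Yat-sen University; thanks to him for his contribution.
I have revised and filtered the material.
*---------- Panel data models
* 1. Static panel models: FE and RE
* 2. Model selection: FE vs POLS, RE vs POLS, FE vs RE (POLS = pooled ordinary least squares)
* 3. Tests for heteroskedasticity, serial correlation and cross-sectional dependence
* 4. Dynamic panel models (DIF-GMM, SYS-GMM)
* 5. Panel stochastic frontier models
* 6. Panel cointegration analysis (FMOLS, DOLS)
*** Note: items 1-5 are implemented in Stata; item 6 is implemented in GAUSS.
* Productivity analysis (TFP in particular): data envelopment analysis (DEA) and stochastic frontier analysis (SFA)
*** Note: DEA is implemented with DEAP 2.1 and SFA with Frontier 4.1. The latter focuses on comparing the Cobb-Douglas and translog production functions and the one-step and two-step approaches.
*** Typical applications: regional economic disparities, FDI spillover effects, efficiency of industrial sectors.
* Spatial econometric analysis: SLM and SEM models
* Note: Stata and Matlab are used together.
* Typical applications: spatial spillover effects (R&D), fiscal decentralization, public behaviour of local governments.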
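* ---------------------------------
* -------- I. Common data handling and graphing -----------
* ---------------------------------
* Declare the panel structure
xtset id year          /* id is the cross-section identifier, year the time variable */
xtdes                  /* panel structure */
xtsum logy h           /* panel summary statistics */
sum logy h             /* summary statistics */
* Add a label or rename a variable
label var h "Human capital"
rename h hum
* Sorting
sort id year           /* Stata panel layout */
sort year id           /* DEA layout */
* Drop particular years or provinces
drop if year<1992
drop if id==2          /* note the double equals sign == */
* Obtain consecutive year or id codes (after the drops above, year or id is no longer consecutive; use egen to rebuild the panel format)
egen year_new=group(year)
xtset id year_new
** Keep variables or keep observations
keep inv               /* keeps only inv; all other variables are dropped */
** or
keep if year==2000
** Converting between long and wide data
* long >>> wide
reshape wide logy, i(id) j(year)
* wide >>> long
reshape long logy, i(id) j(year)
** Appending periods (for panel data and time series)
xtset id year
* or
xtdes
tsappend, add(5)       /* appends 5 more years for each province; panel data */
tsset
* or
tsdes
tsappend, add(8)       /* appends 8 more years; time series */
* Variance decomposition: suppose Y, X and Z are panel variables with Y=X+Z; obtain var(Y), Cov(X,Y) and Cov(Z,Y)
bysort year: corr Y X Z, cov
** Creating dummy variables
* year dummies
tab year, gen(yr)
* province dummies
tab id, gen(dum)
** Creating lags and differences
xtset id year
gen ylag=l.y           /* first lag; a second lag is created in the same way */
gen ylag2=L2.y
gen dy=D.y             /* first difference */
* Average of open and inv for each province before 2000 (an average growth rate if open and inv are growth rates)
collapse (mean) open inv if year<2000, by(id)
* Ordering variables: when there are many variables, arrange them systematically.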
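The panel commands above assume an existing dataset with variables such as id, year and logy. The sketch below builds a tiny made-up panel so that the lag, difference and reshape steps can be run in isolation; the variable y and its values are hypothetical.

```stata
* Build a small balanced panel: 3 units, years 2001-2004 (made-up data)
clear
set obs 3
generate id = _n
expand 4
bysort id: generate year = 2000 + _n
generate y = id * 10 + year - 2000

* Declare the panel, then create a lag and a first difference
xtset id year
generate ylag = L.y
generate dy = D.y
summarize dy

* Round trip between long and wide layouts
drop ylag dy
xtset, clear
reshape wide y, i(id) j(year)
reshape long y, i(id) j(year)
list, sepby(id)
```

Dropping the lag and difference first keeps the reshape to a single stub; the xtset, clear line is a precaution so that the panel settings do not refer to a time variable that no longer exists in the wide layout.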
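Moran's I is defined as

$$I = \frac{n \sum_{i=1}^{n} \sum_{j=1}^{n} W_{ij} (X_i - \bar{X})(X_j - \bar{X})}{\sum_{i=1}^{n} \sum_{j=1}^{n} W_{ij} \, \sum_{i=1}^{n} (X_i - \bar{X})^2} = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} W_{ij} (X_i - \bar{X})(X_j - \bar{X})}{S^2 \sum_{i=1}^{n} \sum_{j=1}^{n} W_{ij}}$$

where

$$S^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2 ; \qquad \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$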
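• W is a binary weights matrix.
• Moran's I generally takes values in [-1, +1] and is interpreted like a correlation coefficient.
• Positive spatial autocorrelation: similar observations cluster together in space.
• Negative spatial autocorrelation: similar observations are dispersed in space, so that neighbouring observations tend to be dissimilar.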
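The definition of Moran's I can be checked by hand on a tiny example. The Mata sketch below uses four made-up values on a line, with a binary weights matrix in which each unit neighbours only the adjacent units; the names x, W, z, moran and moran_I are hypothetical.

```stata
mata:
// Four hypothetical observations and a binary contiguity matrix (a chain)
x = (1 \ 2 \ 3 \ 4)
W = (0, 1, 0, 0 \ 1, 0, 1, 0 \ 0, 1, 0, 1 \ 0, 0, 1, 0)

n = rows(x)
z = x :- mean(x)                 // deviations from the mean

// I = (n / sum of weights) * (z'Wz) / (z'z)
moran = (n / sum(W)) * (z' * W * z) / (z' * z)
moran

// Make the result available to Stata as a scalar
st_numscalar("moran_I", moran)
end
display scalar(moran_I)
```

For these values the statistic equals 1/3: neighbouring values are similar, so the spatial autocorrelation is positive.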
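Sources of spatial correlation
4. Spillover effect
A spillover effect is the impact that externalities of economic activities and processes have on surrounding individuals who do not take part in them. Plants that emit toxic gases harm the plants around them, and a homeowner's beautiful garden clearly has a positive effect on the neighbours. Likewise, the economic benefits of steadily growing trade have a positive spillover effect on the formation of multilateral alliances among the countries of a region.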
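• The usual approach is to start from the most basic binary contiguity matrix and then select the spatial weights matrix step by step.
• There is no established theory for choosing among weights matrices. In practice one considers how well the spatial econometric model works with each candidate matrix and tests how sensitive the estimates are to the choice of matrix; the final criterion is whether the results are objective and scientifically sound.
• The spatial statistics software GeoDa095i, developed by Anselin (1999, 2003), can generate contiguity matrices directly in order to measure and identify spatial effects between regions.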
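2. Spatial autocorrelation
Spatial weights matrix
Econometrics often approximates a nonlinear model with a linear one; that is, [equation lost in extraction] can be written approximately as [equation lost in extraction].
Denote the elements of the matrix $W$ by $w_{ij}$; its diagonal elements are all zero.
Spatial autocorrelation
In general, a sample of size $n$ cannot be used to estimate $n(n-1)$ parameters. To ensure that the model parameters are identified, restrictions have to be placed on the form of $W$. One of the most common restrictions is to assume …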
Econometric Fundamentals and Stata Applications
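Econometrics is an important branch of economics that uses mathematics, statistics, and economic theory to analyze and forecast economic phenomena.
Its foundations matter because they govern how to choose an appropriate method and model and how to assess a model's reliability and accuracy.
When applying econometric methods in Stata, keep the following points in mind:
Data preparation: prepare the data before starting the analysis. Stata provides data-management features such as import, cleaning, transformation, and summary statistics.
Model selection: choose an econometric model that suits the research question and the features of the data, for example a linear regression, a logistic regression, or a time-series model.
Parameter estimation: use Stata's commands and functions to estimate the model parameters. Stata offers several estimation methods, including least squares and maximum likelihood.
Model evaluation: once the model has been estimated, evaluate it. Statistics and diagnostics such as R-squared, adjusted R-squared, residual plots, and diagnostic tests help assess its reliability.
Interpretation: interpret and discuss the model's conclusions in light of the estimated parameters and the evaluation results.
In short, econometric fundamentals are essential to using Stata well: every step, from data preparation and model selection to estimation, evaluation, and interpretation, depends on them.
It is equally important to understand the basic principles and assumptions of econometrics and how they affect the choice of estimation method and model.
Only with these foundations in place can Stata and similar statistical software be used effectively for economic analysis and forecasting.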
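The five steps above can be sketched as a short do-file. This is only an illustration: it assumes Stata's bundled `auto` example dataset and an arbitrary model of `price` on `mpg` and `weight`, neither of which comes from the text.

```stata
* Minimal workflow sketch (run as a do-file); dataset and model are assumptions
sysuse auto, clear                      // data preparation: load example data
describe price mpg weight               // check variable types and labels
summarize price mpg weight              // descriptive statistics

regress price mpg weight                // model selection + OLS estimation

estat hettest                           // evaluation: Breusch-Pagan heteroskedasticity test
estat vif                               // evaluation: variance inflation factors
rvfplot                                 // evaluation: residual-versus-fitted plot

regress price mpg weight, vce(robust)   // re-estimate with robust standard errors
```

The last line shows one possible response to a failed diagnostic; which remedy is appropriate depends on the application.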
Spatial Econometrics with Stata
How can I calculate Moran's I in Stata?

Note: The commands shown on this page are user-written Stata commands that must be downloaded. To install the package of spatial analysis tools, type findit spatgsa in the command window.

Moran's I is a measure of spatial autocorrelation: how related the values of a variable are based on the locations where they were measured. Using a set of user-written Stata commands, we can calculate Moran's I in Stata. We will use the spatwmat command to generate a matrix of weights based on the locations in our data and the spatgsa command to calculate Moran's I or other spatial autocorrelation measures.

Let's look at an example. Our dataset, ozone, contains ozone measurements from thirty-two locations in the Los Angeles area aggregated over one month. The dataset includes the station number (station), the latitude and longitude of the station (lat and lon), and the average of the highest eight-hour daily averages (av8top). This data, and other spatial datasets, can be downloaded from the University of Illinois's Spatial Analysis Lab. We can look at a summary of our location variables to see the range of locations under consideration.

    use , clear
    summarize lat lon

        Variable |  Obs        Mean    Std. Dev.        Min         Max
    -------------+------------------------------------------------------
             lat |   32     34.0146    .2228168    33.6275    34.69012
             lon |   32   -117.7078    .5683853  -118.5347   -116.2339

Based on the minimum and maximum values of these variables, we can calculate the greatest Euclidean distance we might measure between two points in our dataset.

    display sqrt((34.69012 - 33.6275)^2 + (-116.2339 - -118.5347)^2)
    2.5343326

Knowing this maximum distance between two points in our data, we can generate a matrix based on the distances between points. In the spatwmat command, we name the weights matrix to be generated, indicate which of our variables are the x- and y-coordinate variables, and provide a range of distance values that are of interest in the band option. All of the distances are of interest in this example, so we create a band with an upper bound greater than our largest possible distance. If we did not care about distances greater than 2, we could indicate this in the band option.

    spatwmat, name(ozoneweights) xcoord(lon) ycoord(lat) band(0 3)

    The following matrix has been created:
    1. Inverse distance weights matrix ozoneweights
       Dimension: 32x32
       Distance band: 0 < d <= 3
       Friction parameter: 1
       Minimum distance: 0.1
       1st quartile distance: 0.4
       Median distance: 0.6
       3rd quartile distance: 1.0
       Maximum distance: 2.4
       Largest minimum distance: 0.50
       Smallest maximum distance: 1.23

As described in the output, the command above generated a matrix with 32 rows and 32 columns because our data includes 32 locations. Each off-diagonal entry [i, j] in the matrix is equal to 1/(distance between point i and point j). Thus, the matrix entries for pairs of points that are close together are higher than for pairs of points that are far apart. If you wish to look at the matrix, you can display it with the matrix list command.
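To make the 1/(distance) rule concrete, here is a hand-rolled Mata sketch of an inverse-distance matrix. It is not how spatwmat is implemented; it only mirrors the description above (band 0 < d <= 3, friction parameter 1). The four coordinate pairs are made-up values, not the ozone data.

```stata
* Toy coordinates (hypothetical values, not the ozone stations)
clear
input lon lat
-117.9 34.1
-118.3 34.2
-118.2 33.8
-117.8 34.1
end

mata:
xy = st_data(., ("lon", "lat"))        // n x 2 matrix of coordinates
n  = rows(xy)
W  = J(n, n, 0)                        // diagonal entries stay 0
for (i = 1; i <= n; i++) {
    for (j = 1; j <= n; j++) {
        if (i != j) {
            d = sqrt((xy[i,1] - xy[j,1])^2 + (xy[i,2] - xy[j,2])^2)
            if (d > 0 & d <= 3) W[i,j] = 1 / d   // inverse distance inside the band
        }
    }
}
W                                      // display the weights matrix
end
```

Pairs that are close together receive larger entries, which is the property the tutorial relies on.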
With our matrix of weights, we can now calculate Moran's I.

    spatgsa av8top, weights(ozoneweights) moran

    Measures of global spatial autocorrelation

    Weights matrix
    --------------------------------------------------------------
    Name: ozoneweights
    Type: Distance-based (inverse distance)
    Distance band: 0.0 < d <= 3.0
    Row-standardized: No
    --------------------------------------------------------------

    Moran's I
    --------------------------------------------------------------
              Variables |      I     E(I)    sd(I)       z   p-value*
    --------------------+-----------------------------------------
                 av8top |  0.248   -0.032    0.036   7.679     0.000
    --------------------------------------------------------------
    *1-tail test

Based on these results, we can reject the null hypothesis that there is zero spatial autocorrelation present in the variable av8top at alpha = .05.

Variations

Binary matrix: If there exists some threshold distance d such that pairs with distances less than d are neighbors and pairs with distances greater than d are not, you can create a binary neighbors matrix with the spatwmat command (indicating bin and setting band to have an upper bound of d) and use this weights matrix for calculating Moran's I. We could do this for d = .75:

    spatwmat, name(ozoneweights) xcoord(lon) ycoord(lat) band(0 .75) bin

    The following matrix has been created:
    1. Distance-based binary weights matrix ozoneweights
       Dimension: 32x32
       Distance band: 0 < d <= .75
       Friction parameter: 1
       Minimum distance: 0.1
       1st quartile distance: 0.4
       Median distance: 0.6
       3rd quartile distance: 1.0
       Maximum distance: 2.4
       Largest minimum distance: 0.50
       Smallest maximum distance: 1.23

    spatgsa av8top, weights(ozoneweights) moran

    Measures of global spatial autocorrelation

    Weights matrix
    --------------------------------------------------------------
    Name: ozoneweights
    Type: Distance-based (binary)
    Distance band: 0.0 < d <= 0.75
    Row-standardized: No
    --------------------------------------------------------------

    Moran's I
    --------------------------------------------------------------
              Variables |      I     E(I)    sd(I)       z   p-value*
    --------------------+-----------------------------------------
                 av8top |  0.188   -0.032    0.033   6.762     0.000
    --------------------------------------------------------------
    *1-tail test

In this example, the binary formulation of distance yields a similar result. We can reject the null hypothesis that there is zero spatial autocorrelation present in the variable av8top at alpha = .05.

Using an existing matrix: If you have calculated a weights matrix according to some other metric than those available in spatwmat and wish to use it in calculating Moran's I, spatwmat allows you to read in a Stata dataset of the required dimensions and format it as a distance matrix that can be used by spatgsa. If altweights.dta is a dataset with 32 columns and 32 rows, it could be converted to a weights matrix aweights to be used in spatgsa analyzing av8top:

    spatwmat using "C:\altweights.dta", name(aweights)

How do I generate a variogram for spatial data in Stata?

When analyzing geospatial data, describing the spatial pattern of a measured variable is of great importance. User-written Stata commands allow you to explore such patterns. This page will use the variog and variog2 commands. To install them, type findit variog in your command window.

The variog command allows you to calculate and graph a variogram for regularly spaced one-dimensional data. The variog2 command allows you to calculate and graph a variogram for two-dimensional data without constraints on spacing. In both cases, the variogram illustrates how differences in a measured variable Z vary as the distances between the points at which Z is measured increase.

Let's look at an example. Our dataset contains ozone measurements from thirty-two locations in the Los Angeles area aggregated over one month. The dataset includes the station number (station), the latitude and longitude of the station (lat and lon), and the average of the highest eight-hour daily averages (av8top). This data, and other spatial datasets, can be downloaded from the GeoDa Center for Geospatial Analysis and Computation.

    use , clear
    clist in 1/5

           station     av8top        lat         lon
      1.        60   7.225806   34.13583   -117.9236
      2.        69   5.899194   34.17611   -118.3153
      3.        72   4.052885   33.82361   -118.1875
      4.        74   7.181452   34.19944   -118.5347
      5.        75   6.076613   34.06694   -117.7514

For the sake of an example, let's imagine that instead of specific latitude and longitude locations, the stations are evenly spaced along a single latitude. If we assume the observations are in the order in which the stations appear, we can use the variog command. In the command, we indicate the measured outcome and we will opt for the calculated values to be listed. By default, a plot of the semi-variogram will be generated.

    variog av8top, list

      +----------------------------------+
      | Lag   Semi-variance   # of pairs |
      |----------------------------------|
      |   1        2.328506           31 |
      |   2        2.615086           30 |
      |   3        2.629862           29 |
      |   4        2.983584           28 |
      |   5        3.415026           27 |
      |----------------------------------|
      |   6        2.923007           26 |
      |   7        4.104437           25 |
      |   8        3.378503           24 |
      |   9        3.531528           23 |
      |  10         4.49281           22 |
      |----------------------------------|
      |  11         5.22965           21 |
      |  12        6.657857           20 |
      |  13          6.5462           19 |
      |  14        6.126221           18 |
      |  15        6.556983           17 |
      |----------------------------------|
      |  16        6.451519           16 |
      +----------------------------------+

Next, let's generate a variogram using the latitude and longitude of the stations. For this, we will use the variog2 command. While the lag distance in variog was assumed to be the distance between each evenly spaced observation, variog2 requires the user to specify the lag distance. Let's look at a summary of our coordinates to get a sense of the distances existing in our data.

    summarize lat lon

        Variable |  Obs        Mean    Std. Dev.        Min         Max
    -------------+------------------------------------------------------
             lat |   32     34.0146    .2228168    33.6275    34.69012
             lon |   32   -117.7078    .5683853  -118.5347   -116.2339

Based on this, we can calculate the maximum possible distance we might see in our data.

    dis sqrt((33.6275 - 34.69012)^2 + (-118.5347 - -116.2339)^2)
    2.5343326

As a starting point, we can choose a lag distance of .1 and we can examine distances up to 12 lags apart. We want to choose a lag distance that yields enough pairs in each lag to generate a variance that we trust. We might aim to have at least 15 pairs in each lag.

    variog2 av8top lat lon, width(.1) lags(12) list

      +----------------------------------+
      | Lag   Semi-variance   # of pairs |
      |----------------------------------|
      |   1        4.729442            6 |
      |   2       1.8984963           31 |
      |   3       1.3789778           41 |
      |   4       2.7462469           50 |
      |   5       4.3899238           49 |
      |----------------------------------|
      |   6       4.1974818           43 |
      |   7       5.2652506           48 |
      |   8       7.3351494           41 |
      |   9       6.8823236           36 |
      |  10       8.0089961           29 |
      |----------------------------------|
      |  11       6.6957223           29 |
      |  12       7.1360346           23 |
      +----------------------------------+

We can see that our first lag contains only 6 pairs. We might increase the size of our lags and look at fewer of them.

    variog2 av8top lat lon, width(.15) lags(10) list

      +----------------------------------+
      | Lag   Semi-variance   # of pairs |
      |----------------------------------|
      |   1       1.8485044           21 |
      |   2       1.8412199           57 |
      |   3       3.1204523           74 |
      |   4       4.4411303           68 |
      |   5       5.8693088           70 |
      |----------------------------------|
      |   6       7.0979125           55 |
      |   7       7.8960334           44 |
      |   8       6.5713557           37 |
      |   9       4.0710902           23 |
      |  10       3.3176015           16 |
      +----------------------------------+

In the output, we can see lag distances up to 10*.15 = 1.5, the number of pairs that are this far apart in the dataset, and the semi-variance. As we can see from the plot, the semi-variance increases until the lag distance exceeds .15*7 = 1.05.
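For readers who want to see what a semi-variance at a given lag measures, the following Mata sketch applies the classical estimator (half the mean squared difference between observations k steps apart) to evenly spaced one-dimensional data. Whether variog uses exactly this estimator is an assumption here, and the five values of the hypothetical variable `z` are rounded illustrative numbers.

```stata
* Toy one-dimensional series (illustrative values only)
clear
input z
7.2
5.9
4.1
7.2
6.1
end

mata:
z = st_data(., "z")
n = rows(z)
for (k = 1; k <= 2; k++) {
    d  = z[(k+1)::n] - z[1::(n-k)]          // differences between points k steps apart
    sv = (d' * d) / (2 * (n - k))           // half the mean squared difference
    printf("lag %g: semi-variance = %9.4f, pairs = %g\n", k, sv, n - k)
}
end
```

Note that lag k always has n - k pairs, which matches the "# of pairs" column in the one-dimensional output above (31 pairs at lag 1 for 32 stations).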
Common Stata Commands from 《计量经济学基础与Stata应用》 (Fundamentals of Econometrics with Stata Applications)
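Generate the reciprocal of variable y
Generate a sequence variable
Generate the time-series variable year
Generate the time variable for panel data
Generate the first lag of y
Generate the change in y
Change the values of an existing variable
Recode the values of a variable
Sum a set of data and also report the mean, standard deviation, minimum, and maximum of the sample

summarize, detail              Detailed descriptive statistics, including percentiles, the median, mean, standard deviation, variance, skewness, and kurtosis
browse                         Browse the data file
edit                           Open the spreadsheet-style Data Editor to enter or edit data
label                          Attach labels to variables
rename                         Rename a variable
merge                          Merge data files
list                           List the observations in the default or "table" format
sort x                         Sort the data by x from smallest to largest
tabulate x1, sort miss         Show the frequency distribution of all values of x1, including missing values, with rows (values) sorted by frequency

Graphing

histogram y, frequency         Histogram of y with frequencies on the vertical axis
twoway scatter y x             Two-variable scatterplot of y against x
graph twoway connected y time  Time plot of y against time, with the data points connected by line segments
graph box y1 y2 y3             Box plots of y1, y2, and y3
graph pie a b c                Pie chart in which each slice shows the relative size of a, b, and c; the variables must be in comparable units
graph bar (sum) a b c          Bar chart showing the totals of a, b, and c as side-by-side bars
graph dot (median) y, over(x)  Dot plot marking, along a horizontal scale, the median of y at each level of x
graph twoway lfit y percent    Overlay a scatterplot on a regression-line plot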
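Several of the commands listed above can be tried in one short session. The sketch below assumes Stata's bundled `auto` dataset, so the variable names (`price`, `weight`, `rep78`, `foreign`, `make`) are stand-ins for the generic y, x, and x1 in the table.

```stata
* Illustrative session with the bundled auto data (an assumption, not from the text)
sysuse auto, clear
summarize price, detail                  // percentiles, variance, skewness, kurtosis
tabulate rep78, sort missing             // frequencies incl. missing, sorted by count
sort price                               // ascending sort
list make price in 1/5                   // first five observations after sorting

histogram price, frequency               // histogram with counts on the y axis
twoway scatter price weight              // scatterplot
graph box price weight                   // box plots
graph bar (sum) price weight             // side-by-side bars of totals
graph dot (median) price, over(foreign)  // median of price by group

* scatterplot overlaid on the fitted regression line
graph twoway (scatter price weight) (lfit price weight)
```

Each graph command replaces the previous graph window, so run them one at a time when inspecting the plots.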
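Big News! New Modules in Stata 15 (Part 2): Spatial Econometric Analysis (Continued)
Prof. Lung-fei Lee (李龙飞), Ohio State University
Not long ago, StataCorp released Stata 15, which brings several exciting major upgrades, including nonparametric regression, spatial econometrics, threshold regression, and DSGE models.
This account will introduce, in turn, the new modules most relevant to econometrics.
(Continued from the previous post)
A preliminary test for spatial effects
Once the spatial weights matrix has been defined in Stata 15, a preliminary test for spatial effects can be carried out.
The basic approach is to compute Moran's I (in essence a spatial autocorrelation coefficient) and examine its significance.
To do so, first run an OLS regression, for example:
    reg y x1 x2 x3
where y is the dependent variable and x1, x2, and x3 are the explanatory variables.
Then use the following command to compute Moran's I for the residuals of that OLS regression and test its significance.
    estat moran, errorlag(W)
The required option errorlag(W) specifies the spatial weights matrix (the definition of Moran's I depends on it) and tests whether the residuals (error) exhibit a spatial lag effect.
If Moran's I is significantly different from 0, spatial effects are present and further spatial econometric analysis is needed; otherwise it may be unnecessary.
In the OLS regression above, one can also drop all the explanatory variables and regress on the constant only:
    reg y
    estat moran, errorlag(W)
This tests whether the dependent variable itself is spatially autocorrelated.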
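The two tests above can be tried end to end on simulated data. This is a sketch under stated assumptions: the data are random toy values, the variable names (`id`, `cx`, `cy`) are hypothetical, and the weights are inverse-distance weights built from point coordinates rather than the contiguity matrix a real application with a shapefile would typically use. The `coord()` option of spset should be checked against the [SP] spset manual entry for your Stata version.

```stata
* Simulated cross-section with point coordinates (all values are made up)
clear
set seed 12345
set obs 50
generate id = _n
generate cx = runiform()
generate cy = runiform()
generate x1 = rnormal()
generate y  = 1 + 0.5*x1 + rnormal()

spset id, coord(cx cy)           // declare the data as spatial (Stata 15+)
spmatrix create idistance W      // inverse-distance spatial weights matrix named W

regress y x1
estat moran, errorlag(W)         // Moran test on the OLS residuals

regress y
estat moran, errorlag(W)         // Moran test on y itself (constant-only model)
```

Because y is generated without any spatial structure here, the test would usually not reject; the point is only to show the sequence of commands.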
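Spatial autoregressive model
Many terms in spatial econometrics come from time series.
For example, spatial data are also called a "spatial series", that is, a series distributed over space.
Further, the most common time-series model is the autoregression, such as AR(1), in which a variable depends on its own first-order lag (its neighbor in time).
Similarly, one can consider an autoregressive model for a spatial series (Spatial Autoregression, SAR for short), in which a variable depends on its first-order spatial lag (its neighbors in space); for example, the crime rate of a region depends on the crime rates of adjacent regions. In vector form:
$y = \lambda W y + \varepsilon$
where $Wy$ is the spatial lag of $y$ (its neighbors), the parameter $\lambda$ is the spatial autoregressive coefficient, the parameter of primary interest in spatial econometric analysis, and $\varepsilon$ is the disturbance.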
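A model with a spatial lag of the dependent variable can be estimated in Stata 15 with spregress. The sketch below reuses simulated toy data; the variable names and the inverse-distance matrix `W` are assumptions made for illustration, and the choice of the `gs2sls` estimator over `ml` is arbitrary.

```stata
* Simulated toy data (hypothetical names and values)
clear
set seed 2024
set obs 60
generate id = _n
generate cx = runiform()
generate cy = runiform()
generate x1 = rnormal()
generate y  = 1 + 0.5*x1 + rnormal()

spset id, coord(cx cy)
spmatrix create idistance W

* SAR: y depends on its own spatial lag W*y plus x1
spregress y x1, gs2sls dvarlag(W)
estat impact                     // direct, indirect (spillover), and total effects
```

The coefficient reported for the `W` lag of `y` is the spatial autoregressive coefficient discussed above; with data simulated without spatial dependence it should be close to zero.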
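Spillover effects are the influence that externalities arising from economic activities and processes exert on surrounding individuals who take no part in them. A plant that emits toxic gas harms the plants around it, while a homeowner's beautiful garden clearly benefits the neighbors. Likewise, the economic gains from ever-closer trade have positive spillover effects on the formation of multilateral alliances among the countries of a region.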
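Lecture: A First Look at Spatial Econometrics
Overview
The First Law of Geography
The state of everything in the world can be uniquely described by a three-dimensional spatial coordinate system together with a one-dimensional time coordinate system. The states of two things that are close in time or space are related, that is, they cannot be treated as independent, and the closer the two things are, the more strongly their states are correlated. When the distance between two points is zero (so that they are in fact the same individual), they are perfectly correlated. The farther apart two things are, the weaker their correlation.
Spatial Statistics vs. Spatial Econometrics
First, the theory of spatial statistics is the foundation on which spatial econometrics has developed. Just as the other branches of econometrics draw widely on statistical theory, spatial econometrics absorbs as much of the existing theory of spatial statistics as it can. Second, the applications of statistics are not confined to economics. A piece of spatial-statistics theory may be proposed originally to handle spatial effects in economics and later be applied in disciplines other than economics. Spatial econometrics supplements and extends spatial statistics.
Sources of spatial correlation
1. Geographical proximity of the observations
Spatial correlation caused by geographical proximity is the original definition of spatial correlation and agrees with the First Law of Geography. This kind of correlation is common in disciplines such as environmental science and geology.
Sources of spatial correlation
2. Competition and cooperation among cross-sectional units
The classic example is an oligopolistic market: when a firm sets the price of its own product, it simultaneously reacts to the prices of the other firms in the market, and the price finally chosen is the equilibrium of the game.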
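In the second form of Moran's I, the denominator is $S^2\sum_{i=1}^{n}\sum_{j=1}^{n}W_{ij}$, where $S^2=\frac{1}{n}\sum_{i=1}^{n}(X_i-\bar{X})^2$ and $\bar{X}=\frac{1}{n}\sum_{i=1}^{n}X_i$.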
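• $W$ is a binary weight.
• Moran's I generally takes values in [-1, +1] and is interpreted in the same way as a correlation coefficient.
• Positive spatial autocorrelation: similar observations cluster in space.
• Negative spatial autocorrelation: similar observations are dispersed in space.
• No spatial autocorrelation: the observations show no spatial pattern (they are completely random).
Spatial weights matrix
Econometrics often approximates a nonlinear model by a linear one, so the general relation among the units can be written approximately in linear form. The diagonal elements of the resulting coefficient matrix are all zero.
Spatial autocorrelation
In general, a sample of the available size cannot be used to estimate that many parameters. To ensure that the model parameters are identified, the form of the coefficient matrix has to be restricted. One of the most common restrictions introduces a matrix $W$, called the spatial weighting matrix, which describes the structure of spatial dependence among the cross-sectional units and is dimensionless. The accompanying scalar is called the spatial autoregressive coefficient; it expresses the direction and strength of the spatial dependence given that spatial structure.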
Region    Y     y = Y - Ȳ    W_A  W_B  W_C  W_D  W_E  W_F    y^2
A         12    -9.1         0    1    1    1    0    0      82.3
B         15    -6.1         1    0    1    0    0    0      36.8
C         35    13.9         1    1    0    0    1    1      194
D         17    -4.1         1    0    0    0    1    0      16.6
E         28    6.93         0    0    1    1    0    1      48
F         20    -1.1         0    0    1    0    1    0      1.14
Region    y_A * y     y_B * y    y_C * y    y_D * y    y_E * y    y_F * y
A         -108.84                           36.91      -62.86     9.70
B         -136.05                           24.70      -42.07     6.49
C         -317.45                           -56.70     96.53      -14.91
D         -154.19                           16.56      -28.21     4.35
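• 1. Moran's I
$$I=\frac{n\sum_{i=1}^{n}\sum_{j=1}^{n}W_{ij}(X_i-\bar{X})(X_j-\bar{X})}{\sum_{i=1}^{n}\sum_{j=1}^{n}W_{ij}\,\sum_{i=1}^{n}(X_i-\bar{X})^2}=\frac{\sum_{i=1}^{n}\sum_{j=1}^{n}W_{ij}(X_i-\bar{X})(X_j-\bar{X})}{S^2\sum_{i=1}^{n}\sum_{j=1}^{n}W_{ij}}$$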
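Sources of spatial correlation
3. Copycat behavior
Within a group, individuals repeat or imitate the behavior of one or a few particular individuals. For example, students with middling grades in a class take the top students as their role models, and in competitive sports athletes set their sights on the leader. In such cases, a model that ignores spatial correlation will be far from the true model.
Choosing the spatial weights matrix
• Although a binary spatial contiguity matrix does not suit every spatial econometric model, for practical reasons it is usually the first choice of spatial statisticians when they build a spatial econometric model.
• One generally starts from the most basic binary contiguity matrix and then selects and settles on a weights matrix step by step.
• No ready-made theory governs the choice among weights matrices. One can consider how well the spatial econometric model works with each candidate matrix and test how sensitive the estimates are to the choice; in the end the criterion is whether the results are objective and sound.
• The spatial statistics software GeoDa095i, developed by Anselin (1999, 2003), can generate contiguity matrices directly in order to measure and determine the spatial effects between regions.
When two things are infinitely far apart, they can be regarded as approximately uncorrelated.
I. Overview
Spatial econometrics
As a branch of modern micro-econometrics, spatial econometrics develops dedicated modeling, estimation, and testing methods for the spatial effects found in cross-sectional or panel data, namely spatial dependence and spatial heterogeneity.
Finally, as Anselin (1988) argues, spatial statistics is data-driven, whereas spatial econometrics is model-driven. The main task of spatial econometrics is therefore to build, from the economic question at hand, an econometric model that suitably captures the dependence, and to develop the corresponding methods of estimation, hypothesis testing, and forecasting.
II. Spatial Autocorrelation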
Region    W_A*y_A*y    W_B*y_B*y    W_C*y_C*y    W_D*y_D*y    W_E*y_E*y    W_F*y_F*y
A         0            55.1         -126         36.91        0            0
B         -136         0            -85          0            0            0
C         -317         -85          0            0            96.5         -14.91
D         -154         0            0            0            -28          0
E         0            0            96.5         -28.21       0            -7.42
F         0            0            -15          0            -7.4
Total     -608         -30          -129         8.7          60.9
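The six-region example can be checked numerically with a few lines of Mata. The sketch below takes the Y values and the binary contiguity matrix from the first table and applies the Moran's I formula with $S^2=\frac{1}{n}\sum_i (X_i-\bar{X})^2$. The variable names are arbitrary, and because the code works at full precision its intermediate values can differ from the rounded hand-calculated tables.

```stata
mata:
// Y values for regions A to F
Y = (12 \ 15 \ 35 \ 17 \ 28 \ 20)
// binary contiguity matrix (rows and columns ordered A to F)
W = (0,1,1,1,0,0 \ 1,0,1,0,0,0 \ 1,1,0,0,1,1 \ 1,0,0,0,1,0 \ 0,0,1,1,0,1 \ 0,0,1,0,1,0)

n     = rows(Y)
z     = Y :- mean(Y)              // deviations from the mean
num   = z' * W * z                // sum over i,j of W_ij * z_i * z_j
S2    = (z' * z) / n              // S^2 as defined in the formula
moran = num / (S2 * sum(W))       // Moran's I
moran
end
```

With these inputs the statistic comes out at approximately -0.149, a mildly negative value, meaning that neighboring regions tend to have dissimilar values.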
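Appendix 1. Distance-based spatial weights matrix
• Under the distance criterion, $W_{ij}$ is:
$$W_{ij}(d)=\begin{cases}1, & \text{if regions } i \text{ and } j \text{ are within distance } d \text{ (they are neighbors)}\\ 0, & \text{if regions } i \text{ and } j \text{ are farther apart than } d \text{ (they are not neighbors)}\end{cases}$$
• The distance-based spatial weights approach assumes that the strength of spatial interaction depends on the distance between the centroids of the regions or between the seats of their administrative centers. It is a weights matrix commonly used in practice.
In time-series analysis, a temporal autoregressive process links the response variable at time t to the variable at earlier times, which means that what happens at one moment is affected by the outcomes of events at past moments.
Spatial dependence means that the events, behavior, and phenomena occurring in one place directly or indirectly affect those occurring in another place. The observation at one location is therefore functionally related to the observations at all other locations.
• In this case, the weights vary with the definition of the distance $d_{ij}$, and their values depend on the chosen functional form (for example the inverse of distance, the inverse of squared distance, or Euclidean distance).
• A threshold distance must of course also be defined; beyond that threshold, the interaction between regions is treated as negligible.
Appendix 2. Spatial weights matrices based on economic and social flows
• Besides computing geographic distances from actual coordinates, there are more elaborate ways of specifying the weights matrix that bring in economic and social factors.
• For example, spatial weights can be set from interregional transport flows, communication volumes, total GDP, trade flows, capital flows, migration, or labor flows, computing the distance between any two regions in terms of these variables.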
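0  1/3  1/3  0  0  1/2  0  1/2
0  0  1/4  0  1/3  0
The rationale for the weights matrix defined above is this: if j and k are both adjacent to i, the lengths of the borders that j and k share with i generally differ, so j and k exert different spatial effects on i, each proportional to the length of the border it shares with i.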
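Weights that are proportional to shared border length and sum to one in each row can be produced by dividing every row of a border-length matrix by its row total. The Mata sketch below uses three hypothetical regions with made-up border lengths; the matrix `B` and its values are assumptions for illustration only.

```stata
mata:
// B[i,j] = length of the border shared by regions i and j (hypothetical values)
B = (0,4,2 \ 4,0,6 \ 2,6,0)

W = B :/ rowsum(B)      // divide each row by the total border length of region i
W                       // row-normalized weights
rowsum(W)               // each row sums to 1
end
```

In this toy case region 1 shares a border of length 4 with region 2 and of length 2 with region 3, so the first row of `W` gives region 2 twice the weight of region 3.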
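Note:
• For the model, the elements of the weights matrix W are nonrandom and exogenous. Spatial weights can also be based on a distance-decay function, a social network structure, economic distance, the k nearest neighbors, or an empirical flow matrix, although the availability of so many options suggests that the choice of spatial weights is fairly arbitrary.
Spatial autocorrelation
Binary (0-1) contiguity
Example 1.1.1. Among the $n$ subregions on a map, if $i$ and $j$ share a boundary, define $W_{n,ij}=1$; otherwise $W_{n,ij}=0$.
Spatial weights matrix
$$W_{ij}=\begin{pmatrix}0&1&1&1&0&0\\1&0&1&0&0&0\\1&1&0&0&1&1\\1&0&0&0&1&0\\0&0&1&1&0&1\\0&0&1&0&1&0\end{pmatrix}$$
The weight for the pair $(i, j)$ is a ratio with denominator $\sum_{j=1}^{n}W_{n,ij}$. Its numerator can be understood as the length of the border that $i$ and $j$ share, and its denominator is the total length of the borders between $i$ and all of the units adjacent to it. The weights matrix obtained from this definition is shown below:
Spatial weights matrix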