8 Spatial Regression
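b is the change in Y for a one-unit change in X.
The residual (error) is $e_i = Y_i - \hat{Y}_i$: actual ($Y_i$) minus predicted ($\hat{Y}_i$).
[Figure: regression line with intercept a and slope b on X and Y axes, marking the actual value $Y_i$ and the predicted value $\hat{Y}_i$.]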
Ordinary Least Squares (OLS)
The sums of squares used to compute r²:
SS Total (Total Sum of Squares): $\sum (Y_i - \bar{Y})^2$
SS Regression (Explained Sum of Squares): $\sum (\hat{Y}_i - \bar{Y})^2$
SS Residual (Error Sum of Squares): $\sum (Y_i - \hat{Y}_i)^2$
$Y_i = \alpha + \beta X_i + \varepsilon_i$
β (and b) measure the change in Y for a one-unit change in X. If β = 0 then X has no effect on Y, therefore:
Null Hypothesis (H0): in the population β = 0
Alternative Hypothesis (H1): in the population β ≠ 0
Thus, we test whether our sample regression coefficient, b, is sufficiently different from zero to reject the Null Hypothesis and conclude that X has a statistically significant effect on Y.
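The test of H0: β = 0 can be sketched as follows. The slides do not say which test statistic is used; this sketch assumes the usual t-test for a regression slope, t = b / SE(b) with n − 2 degrees of freedom. The data and the function name `slope_t_statistic` are made up for illustration.

```python
import math

def slope_t_statistic(xs, ys):
    """Return (b, standard error of b, t) for the simple regression of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    ss_resid = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(ss_resid / (n - 2))   # standard error of the estimate
    se_b = s_e / math.sqrt(sxx)           # standard error of the slope
    return b, se_b, b / se_b

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b, se_b, t = slope_t_statistic(xs, ys)

# Two-sided 5% critical value of Student's t with n - 2 = 3 degrees of freedom
T_CRITICAL = 3.182
reject_h0 = abs(t) > T_CRITICAL
print(f"b = {b:.3f}, SE(b) = {se_b:.3f}, t = {t:.3f}, reject H0: {reject_h0}")
```

With these made-up numbers t is about 2.12, which is below the critical value, so H0 would not be rejected even though the sample slope is 0.6: five observations are too few to distinguish this slope from zero.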
Partitioning the Variance on Y
$\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2$
SS Total (Total Sum of Squares) = SS Regression (Explained Sum of Squares) + SS Residual (Error Sum of Squares)
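The partition of the variance on Y can be checked numerically. The sketch below fits a simple OLS line to a small made-up data set and confirms that SS Total equals SS Regression plus SS Residual; the data and variable names are illustrative assumptions, not values from the slides.

```python
# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# OLS slope and intercept for simple regression
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
a = y_bar - b * x_bar
y_hat = [a + b * x for x in xs]

ss_total = sum((y - y_bar) ** 2 for y in ys)                  # total variation in Y
ss_regression = sum((yh - y_bar) ** 2 for yh in y_hat)        # variation explained by X
ss_residual = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # unexplained variation

print(ss_total, ss_regression, ss_residual)
```

The identity only holds exactly for the least-squares line with an intercept; for any other line the cross term does not vanish.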
The larger the coefficient of determination, the more useful the regression equation is as a reference.
Standard Error of the Estimate (se)
Measures predictive accuracy: the bigger the standard error, the greater the spread of the observations about the regression line, thus the predictions are less accurate
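Outline
Ordinary linear regression model and estimation
How OLS (ordinary least squares) works; interpreting OLS results
Spatial regression
Background and significance of spatial regression; spatial lag regression; spatial error regression; geographically weighted regression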
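How OLS Works
In practice, we may run into questions like the following:
Are there places in our country where young people persistently die early? Where are the hot spots for crime or fires?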
Correlation and Regression
What is the difference?
Mathematically, they are identical.
Conceptually, very different.
Correlation
Co-variation; relationship or association
$s_e^2$ = error mean square, or average squared residual = variance of the estimate, variance about regression (called sigma-square in GeoDa)
$s_e = \sqrt{\dfrac{\sum (Y_i - \hat{Y}_i)^2}{n - k}}$
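For example, to understand the key habitat characteristics (precipitation, food sources, vegetation, predators) of a particular endangered bird species, in support of legislation to protect that species.
2. Model a phenomenon to predict values at other locations or other times, building a consistently accurate predictive model.
For example, given population growth and typical weather conditions, what will electricity demand be next year?
3. Explore hypotheses in depth. Suppose you are modeling crime in residential areas in order to understand it better and to implement strategies that might prevent it. At the start of the analysis you will have many questions or hypotheses to test:
1) The “broken windows theory” holds that damage to public property (graffiti, vandalized buildings, and so on) invites other crime. Is there a positive relationship between vandalism and burglary?
2) Is there a relationship between illegal drug use and burglary (might people addicted to drugs steal to sustain their habit)?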
No direction or causation is implied
[Figure: scatter plots illustrating correlation; axis labels include Illiteracy, % Population Urban, and Quantity.]
Regression
Prediction of Y from X
Moderate: r² = 0.26, r = 0.51, Se = 1.3, b = 0.8
None: b = 0
As the coefficient of determination gets smaller, the slope of the regression line (b) gets closer to zero. As the coefficient of determination gets smaller, the standard error gets larger, and closer to the standard deviation of the dependent variable (Y) (Sy = 2). (Regression line in blue.)
Note the meanings of, and the differences between, the coefficient of determination, the correlation coefficient, and the regression coefficient.
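Both trends follow from standard identities for simple regression. The short derivation below is a sketch that assumes the sample standard deviations $s_x$ and $s_y$ are held fixed while r changes, as in the hypothetical panels; the symbols $s_x$, $s_y$ and $n$ are notation introduced here.

```latex
b = r \,\frac{s_y}{s_x}
\qquad\Longrightarrow\qquad b \to 0 \ \text{as}\ r \to 0

s_e^2 = \frac{\sum (Y_i - \hat{Y}_i)^2}{n-2}
      = \frac{(1 - r^2)\sum (Y_i - \bar{Y})^2}{n-2}
      = (1 - r^2)\,\frac{n-1}{n-2}\, s_y^2

s_e = s_y \sqrt{(1 - r^2)\,\frac{n-1}{n-2}}
\qquad\Longrightarrow\qquad s_e \approx s_y \ \text{as}\ r^2 \to 0 \ \text{(large } n\text{)}
```

So a weaker relationship both flattens the line and leaves the regression predicting little better than the mean of Y.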
Regression
Simple regression: between two variables
One dependent variable (Y), one independent variable (X)
Multiple regression: between three or more variables
[Figure: scatter plot with fitted regression line; x-axis: Price.]
Implies, but does not prove, causation
X (independent variable) predicts Y (dependent variable)
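For example, modeling the factors that influence college graduation rates makes it possible to forecast the skills and resources of the near-future workforce. Where there are too few monitoring stations for adequate interpolation (rain gauges are often scarce along ridges and in valleys), regression can be used to predict rainfall or air quality in those areas.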
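Main Reasons for Using Regression Analysis
1. Model a phenomenon to measure how strongly changes in one or more variables affect changes in another variable.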
Y is the dependent variable and X is the independent variable: $Y = a + bX + e$
a is the intercept: the value of Y when X = 0
b is the regression coefficient, or slope of the line
The standard criterion for obtaining the regression line:
the regression line minimizes the sum of the squared deviations between actual Yi and predicted Ŷi.
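The least-squares criterion has a closed-form solution for simple regression. The sketch below computes a and b from the textbook formulas and then checks that a slightly different line gives a larger sum of squared deviations. The function names `fit_ols` and `sum_sq_resid` and the data are illustrative assumptions.

```python
def fit_ols(xs, ys):
    """Return (a, b) that minimize sum((y - (a + b*x))**2) for simple regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return a, b

def sum_sq_resid(xs, ys, a, b):
    """Sum of squared deviations between actual y and predicted a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_ols(xs, ys)
best = sum_sq_resid(xs, ys, a, b)
nudged = sum_sq_resid(xs, ys, a + 0.1, b - 0.05)  # a nearby, non-OLS line
print(a, b, best, nudged)
```

Here the fitted line is approximately Y = 2.2 + 0.6X, and the nudged line has a larger sum of squared deviations, as the criterion requires.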
Sample Statistics, Population Parameters and Statistical Significance tests
$Y_i = a + bX_i + e_i$
a and b are sample statistics which are estimates of population parameters α and β
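In the standard error formula, the numerator is the sum of squared residuals and the denominator is n − k: the number of observations n minus the degrees of freedom used, k (for simple regression, k = 2: one each for a and b).
Note: do not confuse the standard error (Se) with the standard deviation (SD).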
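To make the distinction concrete, the sketch below computes both quantities for the same made-up data: the standard error of the estimate measures spread about the regression line (dividing by n − k), while the standard deviation of Y measures spread about the mean (dividing by n − 1). The data and variable names are illustrative assumptions.

```python
import math

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
k = 2  # parameters estimated in simple regression: a and b
x_bar = sum(xs) / n
y_bar = sum(ys) / n

b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
a = y_bar - b * x_bar

# Standard error of the estimate: spread of observations about the regression line
ss_resid = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
s_e = math.sqrt(ss_resid / (n - k))

# Standard deviation of Y: spread of observations about the mean of Y
s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))

print(f"standard error of the estimate = {s_e:.4f}")
print(f"standard deviation of Y        = {s_y:.4f}")
```

For this data the standard error (about 0.89) is smaller than the standard deviation of Y (about 1.22), which is what one expects when X explains part of the variation in Y.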
$\min \sum (Y_i - \hat{Y}_i)^2$
Coefficient of Determination (r²)
The coefficient of determination (r²) measures the proportion of the variance in Y (the dependent variable) that can be predicted or “explained by” X (the independent variable). It varies from 0 to 1.
It equals the correlation coefficient (r) squared.
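Can we model the characteristics of areas with frequent crime, 911 calls, or fires, to help reduce these incidents? What factors cause a higher-than-expected traffic accident rate, and are there policies or measures that could reduce traffic accidents across the city or in specific accident hot spots?
Regression analysis lets us model, examine, and explore spatial relationships, and helps explain the factors behind observed spatial patterns, for example why some areas have persistent early deaths among young people or a higher-than-expected incidence of diabetes. Modeling spatial relationships also lets us predict these phenomena.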
Coefficient of determination (r²), correlation coefficient (r), regression coefficient (b), and standard error (Se)
(Values are hypothetical and for illustration of relative change only; Sy = 2)
Perfect positive: r² = r = 1, Se = 0.0, b = 2
Very strong: r² = 0.94, r = 0.97, Se = 0.3
Strong: r² = 0.51, r = 0.71, Se = 1.1, b = 1.1
Weak: r² = 0.07, Se = 1.8, b = 0.1
None: r² = r = 0.00, Se = Sy = 2
$r^2 = \dfrac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2} = \dfrac{\text{SS Regression (Explained Sum of Squares)}}{\text{SS Total (Total Sum of Squares)}}$
Note: $\sum (Y_i - \bar{Y})^2$ is the SS Total or Total Sum of Squares.
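The two descriptions of r² (explained share of the variation, and the squared correlation coefficient) can be checked against each other. The sketch below computes r² as SS Regression / SS Total and compares it with the square of Pearson's r on made-up data; all names and values are illustrative assumptions.

```python
import math

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)   # this is SS Total
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Route 1: fit the line, then take the explained share of the variation
b = sxy / sxx
a = y_bar - b * x_bar
ss_regression = sum(((a + b * x) - y_bar) ** 2 for x in xs)
r2_from_ss = ss_regression / syy

# Route 2: square the Pearson correlation coefficient
r = sxy / math.sqrt(sxx * syy)
r2_from_r = r ** 2

print(f"r = {r:.4f}, r^2 (sums of squares) = {r2_from_ss:.4f}, r^2 (r squared) = {r2_from_r:.4f}")
```

Both routes give r² = 0.6 here, so 60% of the variation in Y is associated with X in this toy example.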
One dependent variable (Y), two or more independent variables (X1, X2, …)
[Diagram labels: X1, X2, Y, income.]
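Multiple regression extends the same least-squares idea to several independent variables. The sketch below is one way to do this without external libraries: it builds the normal equations (XᵀX)β = XᵀY and solves them by Gaussian elimination. The function names `solve` and `fit_multiple` are hypothetical, and the data are constructed from known coefficients (Y = 1 + 2·X1 + 3·X2) so the result can be checked.

```python
def solve(A, v):
    """Solve the linear system A x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    M = [row[:] + [v[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_multiple(rows, ys):
    """OLS for y = a + b1*x1 + b2*x2 + ... via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # leading 1.0 column is the intercept
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(k)]
    return solve(XtX, Xty)

# Hypothetical data generated exactly from Y = 1 + 2*X1 + 3*X2
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

a, b1, b2 = fit_multiple(rows, ys)
print(a, b1, b2)
```

The normal equations are fine for a small illustration; for real data with many or nearly collinear variables, a QR- or SVD-based least-squares routine from a numerical library is usually the safer choice.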
Simple Linear Regression
Concerned with “predicting” one variable (Y, the dependent variable) from another variable (X, the independent variable).
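Where in the city is the traffic accident rate higher than expected? ……
Hot spot analysis can be used to answer the questions above.
[Figure: analysis results for 911 emergency call data, showing call hot spots (red), call cold spots (blue), and the locations of the fire and police units responsible for responding to incidents (green crosses).]
Each of the questions above asks “where”, but we naturally go on to ask “why”. Why are there places in the country where young people persistently die early? What causes this?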