Local polynomial regression estimators in survey sampling
合集下载
- 1、下载文档前请自行甄别文档内容的完整性,平台不提供额外的编辑、内容补充、找答案等附加服务。
- 2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
- 3、如文档侵犯您的权益,请联系客服反馈,我们会尽快为您处理(人工客服工作时间:9:00-18:30)。
Local Polynomial Regression Estimators in Survey Sampling
F. Jay Breidt and Jean D. Opsomer Iowa State University
June 14, 19nite population totals in the presence of auxiliary information is considered. A class of estimators based on local polynomial regression is proposed. Like generalized regression estimators, these estimators are weighted linear combinations of study variables, in which the weights are calibrated to known control totals, but the assumptions on the superpopulation model are considerably weaker. The estimators are shown to be asymptotically design-unbiased and consistent under mild assumptions. A variance approximation based on Taylor series linearization is suggested and shown to be consistent for the design mean squared error of the estimators. The estimators are robust in the sense of asymptotically attaining the Godambe-Joshi lower bound to the anticipated variance. Simulation experiments indicate that the estimators are more e cient than regression estimators when the model regression function is incorrectly speci ed, while being approximately as e cient when the parametric speci cation is correct. Calibration, generalized regression estimation, Godambe-Joshi lower bound, local polynomial kernel smoothing. Iowa State University of Science and Technology, Statistical Laboratory and Department of Statistics, 221 Snedecor Hall, Ames, IA 50011{1210.
Keywords
Local Polynomial Regression Estimators in Surveys
1
In many survey problems, auxiliary information is available for all elements of the population of interest. The U.S. Consumer Expenditure Survey and other household surveys conducted by the Census Bureau are drawn from the decennial census, which (prior to the year 2000) contained auxiliary information on contract rent and value of owned home; population registers in some countries contain age and taxable income; geographic information systems contain elevation or other spatially explicit data for all locations; and so forth. Use of this auxiliary information in estimating parameters of the nite population of study variables is a central problem in surveys. One approach to this problem is the superpopulation approach, in which a working model describing the relationship between the auxiliary variable x and the study variable y is assumed. Estimators are sought which have good e ciency if the model is true, but maintain desirable properties like asymptotic design unbiasedness (unbiasedness over repeated sampling from the nite population) and design consistency if the model is false. Typically, the assumed models are linear models, leading to the familiar ratio and regression estimators (e.g., Cochran, 1977), the best linear unbiased estimators (Brewer, 1963; Royall, 1970), the generalized regression estimators (Cassel, Sarndal, and Wretman, 1977; Sarndal, 1980; Robinson and Sarndal, 1983), and related estimators (Wright, 1983; Isaki and Fuller, 1982). The papers cited vary in their emphasis on design and model, but it is fair to say that all are concerned to some extent with behavior of the estimators under model misspeci cation. Given this concern with robustness, it is natural to consider a nonparametric class of models for , because they allow the models to be correctly speci ed for much larger classes of functions. Kuo (1988), Dorfman (1992), Dorfman and Hall (1993), and Chambers, Dorfman, and Wehrly (1993) have adopted this approach in constructing model-based estimators. This paper describes theoretical properties of a new type of model-assisted nonparametric regression estimator for the nite population total, based on local polynomial smoothing. Local polynomial regression is a generalization of kernel regression. Cleveland (1979) and Cleveland and Devlin (1988) showed that these techniques are applicable to a wide range of problems. Theoretical work by Fan (1992, 1993) and Ruppert and Wand (1994) showed that it has many desirable theoretical properties, including design adaptation, consistency and asymptotic unbiasedness. Wand and Jones (1995) provide a clear explanation of the asymptotic theory for kernel regression and local polynomial regression. The monograph by Fan and Gijbels (1996) explores a wide range of application areas of local polynomial regression techniques. However, the application of these techniques to model-assisted survey sampling as described in this article is new. In Section 1.2 we introduce the local polynomial regression estimator and in Section 1.3 we state assumptions used in the theoretical derivations of Section 2, in which our main results are described. Section 2.1 shows that the estimator is a weighted linear combination of study variables in which the weights are calibrated to known control totals. Section 2.2 contains a proof that the estimator is asymptotically design unbiased and design consistent, and Section 2.3 provides an approximation to its mean squared error and a consistent estimator of the mean squared error. Section 2.4 establishes asymptotic normality of the
F. Jay Breidt and Jean D. Opsomer Iowa State University
June 14, 19nite population totals in the presence of auxiliary information is considered. A class of estimators based on local polynomial regression is proposed. Like generalized regression estimators, these estimators are weighted linear combinations of study variables, in which the weights are calibrated to known control totals, but the assumptions on the superpopulation model are considerably weaker. The estimators are shown to be asymptotically design-unbiased and consistent under mild assumptions. A variance approximation based on Taylor series linearization is suggested and shown to be consistent for the design mean squared error of the estimators. The estimators are robust in the sense of asymptotically attaining the Godambe-Joshi lower bound to the anticipated variance. Simulation experiments indicate that the estimators are more e cient than regression estimators when the model regression function is incorrectly speci ed, while being approximately as e cient when the parametric speci cation is correct. Calibration, generalized regression estimation, Godambe-Joshi lower bound, local polynomial kernel smoothing. Iowa State University of Science and Technology, Statistical Laboratory and Department of Statistics, 221 Snedecor Hall, Ames, IA 50011{1210.
Keywords
Local Polynomial Regression Estimators in Surveys
1
In many survey problems, auxiliary information is available for all elements of the population of interest. The U.S. Consumer Expenditure Survey and other household surveys conducted by the Census Bureau are drawn from the decennial census, which (prior to the year 2000) contained auxiliary information on contract rent and value of owned home; population registers in some countries contain age and taxable income; geographic information systems contain elevation or other spatially explicit data for all locations; and so forth. Use of this auxiliary information in estimating parameters of the nite population of study variables is a central problem in surveys. One approach to this problem is the superpopulation approach, in which a working model describing the relationship between the auxiliary variable x and the study variable y is assumed. Estimators are sought which have good e ciency if the model is true, but maintain desirable properties like asymptotic design unbiasedness (unbiasedness over repeated sampling from the nite population) and design consistency if the model is false. Typically, the assumed models are linear models, leading to the familiar ratio and regression estimators (e.g., Cochran, 1977), the best linear unbiased estimators (Brewer, 1963; Royall, 1970), the generalized regression estimators (Cassel, Sarndal, and Wretman, 1977; Sarndal, 1980; Robinson and Sarndal, 1983), and related estimators (Wright, 1983; Isaki and Fuller, 1982). The papers cited vary in their emphasis on design and model, but it is fair to say that all are concerned to some extent with behavior of the estimators under model misspeci cation. Given this concern with robustness, it is natural to consider a nonparametric class of models for , because they allow the models to be correctly speci ed for much larger classes of functions. Kuo (1988), Dorfman (1992), Dorfman and Hall (1993), and Chambers, Dorfman, and Wehrly (1993) have adopted this approach in constructing model-based estimators. This paper describes theoretical properties of a new type of model-assisted nonparametric regression estimator for the nite population total, based on local polynomial smoothing. Local polynomial regression is a generalization of kernel regression. Cleveland (1979) and Cleveland and Devlin (1988) showed that these techniques are applicable to a wide range of problems. Theoretical work by Fan (1992, 1993) and Ruppert and Wand (1994) showed that it has many desirable theoretical properties, including design adaptation, consistency and asymptotic unbiasedness. Wand and Jones (1995) provide a clear explanation of the asymptotic theory for kernel regression and local polynomial regression. The monograph by Fan and Gijbels (1996) explores a wide range of application areas of local polynomial regression techniques. However, the application of these techniques to model-assisted survey sampling as described in this article is new. In Section 1.2 we introduce the local polynomial regression estimator and in Section 1.3 we state assumptions used in the theoretical derivations of Section 2, in which our main results are described. Section 2.1 shows that the estimator is a weighted linear combination of study variables in which the weights are calibrated to known control totals. Section 2.2 contains a proof that the estimator is asymptotically design unbiased and design consistent, and Section 2.3 provides an approximation to its mean squared error and a consistent estimator of the mean squared error. Section 2.4 establishes asymptotic normality of the