(完整版)R语言代码试题答案步骤
【最新】R语言数据分析统计自测题(附答案代码数据)
1.1
Answer
Following the hint, we compute C by maximising 2 2 eλx−x /2 (2π )1/2 λ ϕ(x) h(x) = = g (x; λ) 2 e−λx−x2 /2
(2π )1/2 λ
x≥0 x < 0.
Differentiating with respect to x we get 2 λx−x2 /2 (2π)1/2 λ (λ − x)e h′ (x) = 2 (−1)(λ + x)e−λx−x2 /2
X <- rlaplace(1) Y <- runif(1, 0, C*dlaplace(X,1)) if( Y < dnorm(X, 0, 1) ) { out[i] <- X ## save X to out object ## when Y < phi(X) t <- 1 ## make t +ve so ## as to exit while } } } return(out) } To verify code, simulate a large sample (n = 214 ) and check if the histogram matches the density of the standard normal distribution. X <- rej.sampl.stnorm(2^14) hist(X, breaks=40, freq=FALSE) lines(x, dnorm(x,0,1), lty=2)
MATH11176 Lab 6 solutions
Ioannis Papastathopoulos October 26, 2017
R语言知识点汇总及课后题梳理档3(接档2)
课后题主要以本书为例,内容重点各个版本的都整理在一块了,大家自行查阅。
R语言实现:使用DMwR包中的函数lofactor(),基本格式为:lofactor(data, k)其中,data为数值型数据集;k为用于计算局部异常因子的邻居数量。
聚类将那些不属于任何一类的数据作为异常值。
作为一位数据分析人员,应当警惕编码使用的不一致问题和数据表示的不一致问题,如格式不一致(日期“2018/05/25”和“25/05/2018”)、类型不一致、命名不一致等。
编码不一致和数据表示不一致的问题通常需要人工检测,当发现一定规律时可以通过编程进行替换和修改。
若存在不一致的数据是无意义数据,可以使用缺失值处理方法进行相应处理。
数据矛盾(不一致)还可能是由于被挖掘的数据来自不同的数据源,对于重复存放的数据未能进行一致性更新造成的,类似于数据库参照完整性。
例如,两张表中都存放了用户电话号码,但在用户的电话号码发生给变时,只更新了一张表中的数据,那么这两张表就有了不一致的数据。
这要借助数据库的完整性理论。
分布分析用于解释数据分布的特征和类型,分为定量数据和定性数据两种类型。
(1)定量数据的分布分析方法1:直方图在R语言中,使用hist()函数画出样本的直方图。
方法2:核密度图与直方图相配套的是核密度图,其目的是用已知样本,方法3:茎叶图与直方图比较,茎叶图更能细致地看出数据分布结构。
R语言中使用stem()函数绘制茎叶图。
在茎叶图中,纵轴为测定数据,横轴为数据频数,数据的十分位表示“茎”,作为纵轴的刻度;个位数作为“叶”,显示频数的个数,作用与直方图类似。
(2)定性数据的分布分析对于定性变量,常常根据分类变量来分组,可以采用饼图来描述定性变量的分布。
饼图的每一个扇形部分代表每一类型的百分比或频数,根据定性变量的类型数目将饼形图分成几个部分,每一部分的大小与每一类型的频数成正比。
)对比分析原理,对比分析,就是给孤立的数据一个合理的参考系,否则孤立的数据毫无意义。
r语言完成数独代码
在R 语言中完成数独游戏的解决方案需要一个递归的解题算法,通常是回溯法。
以下是一个简单的R 语言代码示例,它展示了如何实现数独求解器。
请注意,这个解决方案假定数独是合法的,并且至少有一个解。
```risSafe <- function(board, row, col, num) {# 检查同一行有没有相同数字if (any(board[row, ] == num)) {return(FALSE)}# 检查同一列有没有相同数字if (any(board[, col] == num)) {return(FALSE)}# 检查同一个3x3宫格内有没有相同数字startRow <- (row - 1) %/% 3 * 3 + 1startCol <- (col - 1) %/% 3 * 3 + 1if (any(board[startRow:(startRow+2), startCol:(startCol+2)] == num)) {return(FALSE)}return(TRUE)}solveSudoku <- function(board) {emptyLoc <- which(board == 0, arr.ind = TRUE)if (nrow(emptyLoc) == 0) {return(TRUE) # 如果没有空位,返回TRUE (数独已解决)}row <- emptyLoc[1, 1]col <- emptyLoc[1, 2]for (num in 1:9) {if (isSafe(board, row, col, num)) {board[row, col] <- num # 试着填入数字if (solveSudoku(board)) {return(TRUE)}board[row, col] <- 0 # 回溯}}return(FALSE) # 触发回溯}# 初始化数独问题,0 表示空白格board <- matrix(c(5, 3, 0, 0, 7, 0, 0, 0, 0,6, 0, 0, 1, 9, 5, 0, 0, 0,0, 9, 8, 0, 0, 0, 0, 6, 0,8, 0, 0, 0, 6, 0, 0, 0, 3,4, 0, 0, 8, 0, 3, 0, 0, 1,7, 0, 0, 0, 2, 0, 0, 0, 6,0, 6, 0, 0, 0, 0, 2, 8, 0,0, 0, 0, 4, 1, 9, 0, 0, 5,0, 0, 0, 0, 8, 0, 0, 7, 9),nrow = 9, byrow = TRUE)if (solveSudoku(board)) {cat('A solution has been found:\n')print(board)} else {cat('No solution exists.\n')}```上述R 脚本定义了两个函数:`isSafe` 用于检查在特定位置填写数字是否不违反数独的规则;而`solveSudoku` 则通过适时的填写数字和回溯来求解数独。
“生物统计学”课程作业(R语言)及参考答案
“生物统计学”课程作业(R语言)第一次作业:请各位同学用如下格式提交作业:题目一:题目:2006年四川省5个县奶牛的增长率(与2005年相比)如下,绘制成长条图。
解:代码:x<-c(22.6,13.8,18.2,31.3,9.5)barplot(x,col=rainbow(5),ylab='增长率%',xlab='县名',main='2005~2006四川省5个县奶牛的增长率',names.arg=c('双流县','名山县','宣汉县','青川县','泸定县'))图片:题目二:题目:1~9周龄大型肉鸭杂交组合GW 和GY 的料肉比如下表所示,绘制成线图。
解:双流县名山县宣汉县青川县泸定县2005~2006四川省5个县奶牛的增长率县名增长率%051015202530代码:x<-1:9y<-c(1.42,1.56,1.66,1.84,2.13,2.48,2.83,3.11,3.48)plot(y~x,type="l",col="red",ylab='料肉比',xlab='周龄',main='1~9周龄大型肉鸭杂交组合GW 和GY 的料肉比')z<-c(1.47,1.71,1.80,1.97,2.31,2.91,3.02,3.29,3.57)lines(z~x,type="l",col="blue")legend(1,3.0,'GW-red')legend(1,3.25,'GY-blue')图片:24681.52.02.53.03.51~9周龄大型肉鸭杂交组合GW 和GY 的料肉比周龄料肉比GW-redGY-blue题目三:附加题set.seed(学号后8位)data <- rnorm(100, 10.5, 1.0)data 为某场猪一月龄体重记录1. 求数据data 的平均数、标准差和变异系数平均数:10.62118标准差:1.076999变异系数:0.10140112. 选择合适的统计图,展示数据data 的总体分布情况作图软件:R统计图:(请将做好的统计图粘贴到此处)020*********8910111213某场猪一月龄体重记录100头猪标号体重代码:(若使用R,或SAS,请将代码粘贴到此处,若使用的软件不需代码则可忽略此部分)set.seed(20020125)data<-rnorm(100,10.5,1.0)dataplot(data,type="p",ylab='体重',xlab='100头猪标号',main='某场猪一月龄体重记录')结果解释:(100字以内解释该统计图中看到的结果)由统计图可知,该场猪一月龄体重主要介于8~14kg之间,但在该区间内,猪的体重数据分布较分散,相对来说数据较为集中的区域为10~11kg。
【原创】R语言Ordination_OPTION 统计分析统计自测题(附答案代码数据)
STAT660/FES758bMultivariate StatisticsHomework #6 OPTION A: OrdinationDue : Wednesday, 4/26/17 – Submit on CANVAS by midnightFor this assignment, you can either use your own data or the data described below. Use any combination of R/SAS/MINITAB/SPSS/STATA that you like. Whichever data you choose, do the following:1) Fit Correspondence Analysis to your data.2) Discuss the inertia, make a two dimensional plot of the first two CA directions.3) Comment on whether or not there is any evidence of 'data snaking' in higherdimensional space.4) In a few sentences, describe what you conclude from your plot.5) Perform Multidimensional Scaling (metric or non-metric) for 1, 2, and 3dimensions.6) Discuss the stress (or SStress) of each dimensional solution. Make a scree plotif you're able.7) Make a two dimensional plot of your results.8) If possible, overlay some other variables to interpret your ordination axes.9) BONUS– try canonical correspondence analysis, or calculate p-values for theoverlaid additional variables.Loaner Data : choose ONECereal.attitudes.csv : Marketing Survey Attitudes toward Cereals∙8 Cereals∙11 Questions (come back to, tastes nice, popular with all the family, very easy to digest, nourishing, natural flavor, reasonably priced, a lot of food value, stays crispy in milk, helps to keep you fit, fun for children to eat) ∙Values are percent of respondents who had a favorable response for a particular cereal for that particular question.T. K. Chakrapani and A. S. C. Ehrenberg, "An Alternative to Factor Analysis in Marketing Research Part 2: Between Group Analysis", PMRS Journal, Vol. 1, Issue 2, October 1981, pp. 32-38.R code to get you started :#get the datacereal <-read.csv("/stat660/data/cereal.attitudes.csv") Wiconsin.Forest.csv : Relative abundance of 14 species was measured on 10 plots. Plots were ordered from pioneer (early stage) to climax (late stage). The final column contains that stage of the forest on a scale from 1 to 10.Peet & Loucks (1977)R code to get you started :#get the dataforest <-read.csv("/stat660/data/Wisconsin.Forest.csv") rownames(forest)=forest[,1]forestenv=matrix(forest[,17],ncol=1)rownames(forestenv)=forest[,1]colnames(forestenv)=c("Stage")forest=forest[,-c(1,17)]forestenv=data.frame(forestenv)。
r语言与统计分析第五章课后答案
r语言与统计分析第五章课后答案第五章5.1设总体某是用无线电测距仪测量距离的误差,它服从(α,β)上的均匀分布,在200次测量中,误差为某i的次数有ni次:某i:3579111315171921Ni:21161526221421221825求α,β的矩法估计值α=u-β=u+程序代码:某=eq(3,21,by=2)y=c(21,16,15,26,22,14,21,22,18,25)u=rep(某,y)u1=mean(u)=var(u)1=qrt()a=u1-qrt(3)某1b=u1+qrt(3)某1b=u1+qrt(3)某1得出结果:a=2.217379b=22.402625.2为检验某自来水消毒设备的效果,现从消毒后的水中随机抽取50L,化验每升水中大肠杆菌的个数(假设1L水中大肠杆菌的个数服从泊松分布),其化验结果如下表所示:试问平均每升水中大肠杆菌个数为多少时,才能使上述情况的概率达到最大大肠杆菌数/L:0123456水的升数:1720222100γ=u是最大似然估计程序代码:a=eq(0,6,by=1)b=c(17,20,10,2,1,0,0)c=a某bd=mean(c)得出结果:d=7.1428575.3已知某种木材的横纹抗压力服从正态分布,现对十个试件做横纹抗压力试验,得数据如下:482493457471510446435418394469(1)求u的置信水平为0.95的置信区间程序代码:某=c(482493457471510446435418394469)t.tet(某)得出结果:data:某t=6.2668,df=9,p-value=0.0001467alternativehypothei:truemeaninotequalto095percentconfidenceinterval:7.66829916.331701ampleetimate:meanof某12由答案可得:u的置信水平为0.95的置信区间[7.66829916.331701](2)求σ的置信水平为0.90的置信区间程序代码:chiq.var.tet<-function(某,var,alpha,alternative="two.ided"){ option(digit=4)reult<-lit()n<-length(某)v<-var(某)reult$var<-vchi2<-(n-1)某v/varreult$chi2<-chi2p<-pchiq(chi2,n-1)reult$p.value<-pif(alternative=="le")reult$p.value<-pchaiq(chi2,n-1,loer.tail=F)eleif(alternative=="two.ider")reult$p.value<-2某min(pchaiq(chi2,n-1),pchaiq(chi2,n-1,lower.tail=F))reult$conf.int<-c((n-1)某v/qchiq(alpha/2,df=n-1,lower.tail=F),(n-1)某v/qchiq(alpha/2,df=n-1,lower.tail=T))reult}某<-c(482,493,457,471,510,446,435,418,394,469)y=var(某)chiq.var.tet(某,0.048^2,0.10,alternative="two.ide")得出结果:$conf.int:659.83357.0由答案可得:σ的置信水平为0.90的置信区间[659.83357.0]5.4某卷烟厂生产两种卷烟A和B现分别对两种香烟的尼古丁含量进行6次试验,结果如下:A:252823262922B:282330352127若香烟的尼古丁含量服从正态分布(1)问两种卷烟中尼古丁含量的方差是否相等(通过区间估计考察)(2)试求两种香烟的尼古丁平均含量差的95%置信区间程序代码:某=c(25,28,23,26,29,22)Y=c(28,23,30,35,21,27)Var.tet(某,y)data:某andyF=0.2992,numdf=5,denomdf=5,p-value=0.2115alternativehypothei:trueratioofvarianceinotequalto195percentconfidenceinterval:0.041872.13821ampleetimate:ratioofvariance0.2992由答案可得:其方差不相等,方差区间为[0.041872.13821](2)5.5比较两个小麦品种的产量,选择24块条件相似地实验条,采用相同的耕作方法做实验,结果播种甲品种的12块实验田的单位面积产量和播种乙品种的12块试验田的单位面积产量分别为:A:628583510554612523530615573603334564B:535433398470567480498560503426338547假定每个品种的单位面积产量服从正态分布,甲品种产量的方差为2140,乙品种产量的方差为3250,试求这两个品种平均面积产量差的置信水平为0.95的置信上限和置信水平为0.90的置信下限。
r语言与统计分析 第五章课后答案
第五章5.1 设总体x是用无线电测距仪测量距离的误差,它服从(α,β)上的均匀分布,在200次测量中,误差为xi的次数有ni次:Xi:3 5 7 9 11 13 15 17 19 21Ni:21 16 15 26 22 14 21 22 18 25求α,β的矩法估计值α=u-√3sβ=u+√3s程序代码:x=seq(3,21,by=2)y=c(21,16,15,26,22,14,21,22,18,25)u=rep(x,y)u1=mean(u)s=var(u)s1=sqrt(s)a=u1-sqrt(3)*s1b=u1+sqrt(3)*s1b=u1+sqrt(3)*s1得出结果:a= 2.217379b= 22.402625.2为检验某自来水消毒设备的效果,现从消毒后的水中随机抽取50L,化验每升水中大肠杆菌的个数(假设1L水中大肠杆菌的个数服从泊松分布),其化验结果如下表所示:试问平均每升水中大肠杆菌个数为多少时,才能使上述情况的概率达到最大大肠杆菌数/L:0 1 2 3 4 5 6水的升数:17 20 10 2 1 0 0γ=u是最大似然估计程序代码:a=seq(0,6,by=1)b=c(17,20,10,2,1,0,0)c=a*bd=mean(c)得出结果:d= 7.1428575.3已知某种木材的横纹抗压力服从正态分布,现对十个试件做横纹抗压力试验,得数据如下:482 493 457 471 510 446 435 418 394 469(1)求u的置信水平为0.95的置信区间程序代码:x=c(482 493 457 471 510 446 435 418 394 469 )t.test(x)得出结果:data: xt = 6.2668, df = 9, p-value = 0.0001467alternative hypothesis: true mean is not equal to 095 percent confidence interval:7.668299 16.331701sample estimates:mean of x12由答案可得:u的置信水平为0.95的置信区间[7.668299 16.33170 1](2)求σ的置信水平为0.90的置信区间程序代码:chisq.var.test<-function(x,var,alpha,alternative="two.sided"){options(digits=4)result<-list()n<-length(x)v<-var(x)result$var<-vchi2<-(n-1)*v/varresult$chi2<-chi2p<-pchisq(chi2,n-1)result$p.value<-pif(alternative=="less")result$p.value<-pchaisq(chi2,n-1,loer.tail=F)else if(alternative=="two.sider")result$p.value<-2*min(pchaisq(chi2,n-1),pchaisq(chi2,n-1,lower.tail=F))result$conf.int<-c((n-1)*v/qchisq(alpha/2,df=n-1,lower.tail=F),(n-1)*v/qchisq(alpha/2,df=n-1,lower.tail=T))result}x<-c(482,493,457,471,510,446,435,418,394,469)y=var(x)chisq.var.test(x,0.048^2,0.10,alternative="two.side")得出结果:$conf.int: 659.8 3357.0由答案可得:σ的置信水平为0.90的置信区间[659.8 3357.0] 5.4某卷烟厂生产两种卷烟A和B 现分别对两种香烟的尼古丁含量进行6次试验,结果如下:A:25 28 23 26 29 22B:28 23 30 35 21 27若香烟的尼古丁含量服从正态分布(1)问两种卷烟中尼古丁含量的方差是否相等(通过区间估计考察)(2)试求两种香烟的尼古丁平均含量差的95%置信区间(1)程序代码:X=c(25,28,23,26,29,22)Y=c(28,23,30,35,21,27)Var.test(x,y)得出结果:F test to compare two variancesdata: x and yF = 0.2992, num df = 5, denom df = 5, p-value = 0.2115 alternative hypothesis: true ratio of variances is not equa l to 195 percent confidence interval:0.04187 2.13821sample estimates:ratio of variances0.2992由答案可得:其方差不相等,方差区间为[0.04187 2.13821](2)5.5 比较两个小麦品种的产量,选择24块条件相似地实验条,采用相同的耕作方法做实验,结果播种甲品种的12块实验田的单位面积产量和播种乙品种的12块试验田的单位面积产量分别为:A:628 583 510 554 612 523 530 615 573 603 334 564B:535 433 398 470 567 480 498 560 503 426 338 547假定每个品种的单位面积产量服从正态分布,甲品种产量的方差为2140,乙品种产量的方差为3250,试求这两个品种平均面积产量差的置信水平为0.95的置信上限和置信水平为0.90的置信下限。
r语言编程与基础高级绘图课后答案
r语言编程与基础高级绘图课后答案文章目录第1章R语言概述1.选择题(1)多行注释的快捷键是(C)。
A.Ctrl+Shin+NB.Ctrl+NC.Ctrl+Shin+CD.Ctrl+C(2)以下函数不能直接查看plot函数的帮助文档的是(B)。
A. ?plotB.??plotC.help(plot)D.help(plot)(3)以下R包的加载方式正确的是(A)。
A.install.package 函数B.library 函数C…libPaths 函数D.install 函数(4)以下R包中不能调用分类算法的是(D)。
A.nnet包B.e1071包C.tree包D.arules包2.操作题第2章数据对象与数据读写1.选择题(4)下列选项不是逻辑型数据的是(C)。
A.TB.FC.NAD.10(5)下列可以求矩阵的特征值和特征向量的函数是(B)。
A. diagB. eigenC.solveD. det(6)下列选项中可以使得列表转换为向量的是(D)。
A. as.matrixB. as.data.frameC. as.listD. unlist(7)下列用来转换数据框的函数是(B)。
A. as.listB. as.matrixC. as.data.frameD. as.vector(8)下列用键盘导人数据的函数是(B)。
A.read.tableB. read.csvC.editD.readHTMLTable(9)RODBC包中向数据库提交一个查询,并返回结果的函数是(B)。
A.odbcConnectB.sqlFetchC. sqlQueryD. sqlDrop(10)抓取网页上的表格,可使用XML包的是(D)函数。
A.read.csvB. read.tableC.read.xlsxD. read HTMLTable2.操作题(1)创建一个对象,并进行数据类型的转换、判别等操作,步骤如下:①创建一个对象x,内含元素为序列:1,3,5,6,8②判断对象x是否是数值型数据③将对象转换为逻辑数据,记为x1④判断x1是否为逻辑型数据。
【最新】R语言数据分析统计自测题(附答案代码数据)
MATH11176Lab2SolutionsIoannis PapastathopoulosOctober2,20171Exercise1:Using the command load("lab2E1.RData"),read in the data for this ques-tion.Thefile contains a100×100matrix,M,and a vector,v,of length10,000.1.What is the5,001th entry of v?2.What is the(i,j)=(12,34)entry of the matrix M?3.What is the standard deviation of the elements of v?4.What is the mean of the elements of M?5.How many elements of M are strictly greater than0.75?6.If X is a matrix,denote the(i,j)-th element by X(i,j).Let X r1:r2,c1:c2 denote the submatrix consisting of the rows r1to r2and columns c1to c2.Calculate,for r1=13,r2=45,c1=71,c2=95,the sum of theelements of the appropriate submatrix in Mr2−r1∑i=1c2−c1∑j=1M r1:r2,c1:c2(i,j).11.1Answer11.1.1ans:v[5001]=0.7931.1.2ans:M[12,34]=0.401.1.3ans:0.28sd(v)=0.29061.1.4ans:2.99mean(M)=0.5041.1.5ans:9876sum(M>0.75)=25421.1.6ans:sum(M[13:45,71:95])413.48822Exercise2Using the command load("lab2E2.data"),read in the data for this ques-tion.Thefile contains four vectors day,route,time and dist.These vectors represent day index,route taken,running times(in minutes),routes and distance covered(in miles).(a)What is the total distance covered in the24days?(b)What was the fastest time for the route"f1"in minutes?(c)What was the fastest average speed in any particular run in miles/minute?(d)On what day was this pace achieved?2.1Answer22.1.1(a)ans:race%>%select(dist)%>%sum104.6or=sum(race$dist)=2.1.2(b)ans:race%>%filter(route=="f1")%>%select(time)%>%min21.14311minutesor,min(race[race$route=="f1",]$time)22.1.3(c)ans:race%>%mutate(avg=dist/time)%>%select(avg) %>%min0.1010miles/minuteor avg<-min(race$dist/race$time)2.1.4(d)ans:avg<-race%>%mutate(avg=dist/time)%>%select(avg) minavg<-avg%>%minrace$day[which(avg==minavg)][1]ThursdayLevels:Monday Saturday Sunday Thursday Tuesday Wednesdayavg<-race$dist/race$time)minavg<-min(avg)race$day[which(avg==minavg)][1]ThursdayLevels:Monday Saturday Sunday Thursday Tuesday Wednesday3Exercise3:Run the following code.set.seed(1)n<-1000A<-matrix(runif(n*n),n,n)x<-runif(n)Evaluate x T Ax,tr(A),tr(A T W A),where W is a diagonal matrix withW ii=x i.Hint:Use your favourite search engine to look for"how to computethe trace in R"and"how to construct a diagonal matrix in R".3.1Answer3ans1<-t(x)%*%A%*%xans2<-sum(diag(A))ans3<-sum(diag(A%*%W%*%A))34Exercise4:Re-write the following code such that you eliminate all loopsX<-matrix(runif(100000),nrow=1000,ncol=100)z<-rep(0,1000)for(i in1:1000){for(j in1:100){z[i]<-z[i]+X[i,j]}}Confirm that all three versions give the same answers,but that your re-writes are much faster than the original.Hint:Use apply,rowSums and system.time.4.1Answerz<-rowSums(X)system.time(rowSums(X))z<-apply(X,1,sum)system.time(apply(X,1,sum))5Exercise5:Re-write the following,replacing the loop with efficient code.n<-100000z<-rnorm(n)zneg<-NULLj<-1for(i in1:n){if(z[i]<0){zneg[j]<-z[i]j<-j+1}}4Hint:5.1Answerzneg2<-z[z<0]Check zneg and zneg2are the same vectorscbind(zneg,zneg2)%>%headcbind(zneg,zneg2)%>%tail6Exercise6:Is the following table in tidy form?If not discuss how you would convert it to one.Non-infectious InfectiousTreatment A5763Treatment B70426.1Answertreatment disease status frequencyA Infectious63B Infectious42A Non-infectious57B Non-infectious705。
R语言与统计分析第五章习题答案
#5。
1x<—c(3,5,7,9,11,13,15,17,19,21)y〈-c(21,16,15,26,22,14,21,22,18,25)e=sum(x*y)/sum(y) #样本期望d=(sum(x*x*y)/sum(y))-e^2 #样本方差a=(8*e+sqrt(64*e^2—4*4*(4*e^2-12*d)))/8 #估计结果b=(8*e—sqrt(64*e^2—4*4*(4*e^2—12*d)))/8ab#5。
2x<-c(0,1,2,3,4,5,6)y〈-c(17,20,10,2,1,0,0)e=2.718281828459f〈-function(λ)(e^(—50*λ)*λ^50)/(2^10*6^2*24) #似然函数optimize(f,c(0,2),maximum=TRUE)#5.3x<-c(482,493,457,471,510,446,435,418,394,469)#0.95置信区间t。
test(x)$conf.intchisq。
var。
test<-function(x,var,alpha,alternative=”two.sided”){options(digits=4)result<—list()n〈—length(x)v〈-var(x)result$var<-vchi2<-(n—1)*v/varresult$chi2<—chi2p<—pchisq(chi2,n—1)result$p.value<—pif(alternative==”less")result$p。
value〈-pchaisq(chi2,n—1,lower。
tail=F)else if (alternative==”two。
sider”)result$p.value<—2*min(pchaisq(chi2,n—1),pchaisq(chi2,n-1,lower.tail=F))result$conf。
r语言代码试题答案步骤(1)
R version 3.4.3 (2017-11-30) -- "Kite-Eating Tree"Copyright (C) 2017 The R Foundation for Statistical ComputingPlatform: x86_64-w64-mingw32/x64 (64-bit)R是自由软件,不带任何担保。
在某些条件下你可以将其自由散布。
用'license()'或'licence()'来看散布的详细条件。
R是个合作计划,有许多人为之做出了贡献.用'contributors()'来看合作者的详细情况用'citation()'会告诉你如何在出版物中正确地引用R或R程序包。
用'demo()'来看一些示范程序,用'help()'来阅读在线帮助文件,或用'help.start()'通过HTML浏览器来看帮助文件。
用'q()'退出R.[原来保存的工作空间已还原]> h=read.csv("F://1.csv",header=true)Error in read.table(file = file, header = header, sep = sep, quote = quote, : 找不到对象'true'> h=read.csv("F://1.csv",header=TRUE)> h地区 x1 x2 x3 x4 x5 x6 x7 x8 x9 y1 北京 7535 2639 1971 1658 3696 84742 87475 106.5 1.3 240462 天津 7344 1881 1854 1556 2254 61514 93173 107.5 3.6 200243 河北4211 1542 1502 1047 1204 38658 36584 104.1 3.7 125314 山西3856 1529 1439 906 1506 44236 33628 108.8 3.3 122125 内蒙古5463 2730 1584 1354 1972 46557 63886 109.6 3.7 177176 辽宁5809 2042 1433 1310 1844 41858 56649 107.7 3.6 165947 吉林4635 2045 1594 1448 1643 38407 43415 111.0 3.7 146148 黑龙江4687 1807 1337 1181 1217 36406 35711 104.8 4.2 129849 上海9656 2111 1790 1017 3724 78673 85373 106.0 3.1 2625310 江苏6658 1916 1437 1058 3078 50639 68347 112.6 3.1 1882511 浙江7552 2110 1552 1228 2997 50197 63374 104.5 3.0 2154512 安徽5815 1541 1397 1143 1933 44601 28792 105.3 3.7 1501213 福建7317 1634 1754 773 2105 44525 52763 104.6 3.6 1859314 江西5072 1477 1174 671 1487 38512 28800 106.7 3.0 1277615 山东5201 2197 1572 1005 1656 41904 51768 106.9 3.3 1577816 河南4607 1886 1191 1085 1525 37338 31499 106.8 3.1 1373317 湖北5838 1783 1371 1030 1652 39846 38572 105.6 3.8 1449618 湖南5442 1625 1302 918 1738 38971 33480 105.7 4.2 1460919 广东8258 1521 2100 1048 2954 50278 54095 107.9 2.5 2239620 广西5553 1146 1377 884 1626 36386 27952 107.5 3.4 1424421 海南6556 865 1521 993 1320 39485 32377 107.0 2.0 1445722 重庆6870 2229 1177 1102 1471 44498 38914 107.8 3.3 1657323 四川6074 1651 1284 773 1587 42339 29608 105.9 4.0 1505024 贵州4993 1399 1014 655 1396 41156 19710 105.5 3.3 1258625 云南5468 1760 974 939 1434 37629 22195 108.9 4.0 1388426 西藏5518 1362 845 467 550 51705 22936 109.5 2.6 1118427 陕西5551 1789 1322 1212 2079 43073 38564 109.4 3.2 1533328 甘肃4602 1631 1288 1050 1388 37679 21978 108.6 2.7 1284729 青海4667 1512 1232 906 1097 46483 33181 110.6 3.4 1234630 宁夏4769 1876 1193 1063 1516 47436 36394 105.5 4.2 1406731 新疆5239 2031 1167 1028 1281 44576 33796 114.8 3.4 13892> lm=lm(y~x1+x2+x3+x4+x5+x6+x7+x8+x9,data=h)> lmCall:lm(formula = y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9,data = h)Coefficients:(Intercept) x1 x2 x3 x4 x5 x6 x7 x8 x9320.640948 1.316588 1.649859 2.178660 -0.005609 1.684283 0.010320 0.003655 -19.130576 50.515575> summary(lm)Call:lm(formula = y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9,data = h)Residuals:Min 1Q Median 3Q Max-940.13 -195.24 3.42 239.00 476.06Coefficients:Estimate Std. Error t value Pr(>|t|)(Intercept) 3.206e+02 3.952e+03 0.081 0.936097x1 1.317e+00 1.062e-01 12.400 3.97e-11 ***x2 1.650e+00 3.008e-01 5.484 1.93e-05 ***x3 2.179e+00 5.199e-01 4.190 0.000412 ***x4 -5.609e-03 4.766e-01 -0.012 0.990720x5 1.684e+00 2.142e-01 7.864 1.08e-07 ***x6 1.032e-02 1.343e-02 0.769 0.450665x7 3.655e-03 1.070e-02 0.342 0.736006x8 -1.913e+01 3.197e+01 -0.598 0.555983x9 5.052e+01 1.502e+02 0.336 0.739986---Signif. codes: 0 ‘***’0.001 ‘**’0.01 ‘*’0.05 ‘.’0.1 ‘’1Residual standard error: 389.4 on 21 degrees of freedomMultiple R-squared: 0.9923, Adjusted R-squared: 0.9889F-statistic: 298.9 on 9 and 21 DF, p-value: < 2.2e-16> pre=fitted.values(lm)> res=residuals(lm)> sd(res)[1] 325.7967> res=residuals(lm)> dy=step(lm)Start: AIC=377.73y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9Df Sum of Sq RSS AIC - x4 1 21 3184326 375.73 - x9 1 17149 3201454 375.90 - x7 1 17700 3202005 375.90 - x8 1 54295 3238599 376.26 - x6 1 89586 3273891 376.59 <none> 3184305 377.73 - x3 1 2662593 5846898 394.57 - x2 1 4561056 7745361 403.29 - x5 1 9377500 12561805 418.28 - x1 1 23314547 26498852 441.42Step: AIC=375.73y ~ x1 + x2 + x3 + x5 + x6 + x7 + x8 + x9Df Sum of Sq RSS AIC - x9 1 17428 3201754 373.90 - x7 1 18563 3202889 373.91- x6 1 91813 3276139 374.61 <none> 3184326 375.73 - x3 1 2936130 6120456 393.99 - x2 1 5467941 8652267 404.72 - x5 1 9393345 12577671 416.32 - x1 1 25886086 29070412 442.29Step: AIC=373.9y ~ x1 + x2 + x3 + x5 + x6 + x7 + x8Df Sum of Sq RSS AIC - x7 1 34634 3236387 372.24 - x6 1 74800 3276554 372.62 - x8 1 82150 3283904 372.69 <none> 3201754 373.90 - x3 1 3055353 6257107 392.67 - x2 1 5725836 8927590 403.69 - x5 1 9382624 12584378 414.33 - x1 1 25868832 29070586 440.29Step: AIC=372.24y ~ x1 + x2 + x3 + x5 + x6 + x8Df Sum of Sq RSS AIC- x6 1 152777 3389165 371.67 <none> 3236387 372.24 - x3 1 5501284 8737672 401.02 - x2 1 8895049 12131436 411.20 - x5 1 9458098 12694485 412.60 - x1 1 27733098 30969486 440.25Step: AIC=370.91y ~ x1 + x2 + x3 + x5 + x6Df Sum of Sq RSS AIC - x6 1 137540 3444741 370.17 <none> 3307201 370.91 - x3 1 5771063 9078264 400.21 - x2 1 8871193 12178394 409.32 - x5 1 9473521 12780722 410.81 - x1 1 28248162 31555363 438.83Step: AIC=370.17y ~ x1 + x2 + x3 + x5Df Sum of Sq RSS AIC <none> 3444741 370.17 - x3 1 5717883 9162624 398.50- x2 1 10249815 13694556 410.95- x5 1 10998313 14443054 412.60- x1 1 33258637 36703378 441.52> summary(dy)Call:lm(formula = y ~ x1 + x2 + x3 + x5, data = h)Residuals:Min 1Q Median 3Q Max-943.18 -161.05 12.74 250.93 566.25Coefficients:Estimate Std. Error t value Pr(>|t|)(Intercept) -1694.6269 562.9773 -3.010 0.00574 **x1 1.3642 0.0861 15.844 7.11e-15 ***x2 1.7679 0.2010 8.796 2.86e-09 ***x3 2.2894 0.3485 6.569 5.76e-07 ***x5 1.7424 0.1912 9.111 1.42e-09 ***---Signif. codes: 0 ‘***’0.001 ‘**’0.01 ‘*’0.05 ‘.’0.1 ‘’1Residual standard error: 364 on 26 degrees of freedomMultiple R-squared: 0.9916, Adjusted R-squared: 0.9903F-statistic: 769.2 on 4 and 26 DF, p-value: < 2.2e-16>newdata=data.frame(x1=5200,x2=2000,x3=1100,x4=1000,x5=1300,x6=45000,x7=34000,x8=115.0 ,x9=3.8)> predict(dy,newdata,interval="confidence")fit lwr upr1 13718.67 13468.98 13968.36>> h=ts(read.csv("F://3.csv",header=TRUE)) > hTime Series:Start = 1End = 56Frequency = 1X78[2,] 53 [3,] -63 [4,] 13 [5,] -6 [6,] -16 [7,] -14 [8,] 3 [9,] -74 [10,] 89 [11,] -48 [12,] -14 [13,] 32 [14,] 56 [15,] -86 [16,] -66 [17,] 50 [18,] 26 [19,] 59 [20,] -47 [21,] -83 [22,] 2 [23,] -1 [24,] 124 [25,] -106[27,] -76 [28,] -47 [29,] -32 [30,] 39 [31,] -30 [32,] 6 [33,] -73 [34,] 18 [35,] 2 [36,] -24 [37,] 23 [38,] -38 [39,] 91 [40,] -56 [41,] -58 [42,] 1 [43,] 14 [44,] -4 [45,] 77 [46,] -127 [47,] 97 [48,] 10 [49,] -28 [50,] -17[52,] -2[53,] 48[54,] -131[55,] 65[56,] -17> plot(h,type="o")> local({pkg <- select.list(sort(.packages(all.available = TRUE)),graphics=TRUE) + if(nchar(pkg)) library(pkg, character.only=TRUE)})Warning message:程辑包‘urca’是用R版本3.4.4 来建造的> adf=ur.df(as.vector(h),type=c("drift"),selectlags=c("AIC"))> summary(adf)################################################ Augmented Dickey-Fuller Test Unit Root Test ################################################Test regression driftCall:lm(formula = z.diff ~ g.1 + 1 + g)Residuals:Min 1Q Median 3Q Max-96.191 -23.390 -0.581 18.446 133.241Coefficients:Estimate Std. Error t value Pr(>|t|)(Intercept) -9.4381 7.0489 -1.339 0.187g.1 -1.7837 0.2386 -7.476 9.65e-10 ***g 0.1956 0.1379 1.418 0.162---Signif. codes: 0 ‘***’0.001 ‘**’0.01 ‘*’0.05 ‘.’0.1 ‘’1Residual standard error: 50.89 on 51 degrees of freedomMultiple R-squared: 0.7589, Adjusted R-squared: 0.7494F-statistic: 80.25 on 2 and 51 DF, p-value: < 2.2e-16Value of test-statistic is: -7.4761 27.9471Critical values for test statistics:1pct 5pct 10pcttau2 -3.51 -2.89 -2.58phi1 6.70 4.71 3.86> acf(h)> pacf(h)> ar=sarima(h,1,0,4,details=F)> ar$fitCall:stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,Q), period = S), xreg = xmean, include.mean = FALSE, optim.control = list(trace = trc, REPORT = 1, reltol = tol))Coefficients:ar1 ma1 ma2 ma3 ma4 xmean-0.0957 -0.7605 -0.051 -0.2591 0.0706 -5.0886s.e. 0.7318 0.7244 0.637 0.2013 0.1939 0.4252sigma^2 estimated as 1850: log likelihood = -291.97, aic = 597.95$degrees_of_freedom[1] 50$ttableEstimate SE t.value p.valuear1 -0.0957 0.7318 -0.1308 0.8965ma1 -0.7605 0.7244 -1.0498 0.2988ma2 -0.0510 0.6370 -0.0800 0.9365ma3 -0.2591 0.2013 -1.2875 0.2038ma4 0.0706 0.1939 0.3641 0.7173xmean -5.0886 0.4252 -11.9668 0.0000$AIC[1] 8.73734$AICc[1] 8.814721$BIC[1] 7.954342> ma=sarima(h,0,1,1,details=F)> ma$fitCall:stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), xreg = constant, optim.control = list(trace = trc, REPORT = 1, reltol = tol))Coefficients:ma1 constant-1.0000 0.1275s.e. 0.0452 0.4833sigma^2 estimated as 3412: log likelihood = -303.77, aic = 613.53$degrees_of_freedom[1] 53$ttableEstimate SE t.value p.valuema1 -1.0000 0.0452 -22.1390 0.000constant 0.1275 0.4833 0.2638 0.793$AIC[1] 9.206399$AICc[1] 9.250355$BIC[1] 8.278733> arma=sarima(h,1,1,1,details=F)> arma$fitCall:stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), xreg = constant, optim.control = list(trace = trc, REPORT = 1,reltol = tol))Coefficients:ar1 ma1 constant-0.4893 -1.0000 0.1052s.e. 0.1161 0.0469 0.2858sigma^2 estimated as 2548: log likelihood = -296.27, aic = 600.53$degrees_of_freedom[1] 52$ttableEstimate SE t.value p.valuear1 -0.4893 0.1161 -4.2127 0.0001ma1 -1.0000 0.0469 -21.3207 0.0000constant 0.1052 0.2858 0.3680 0.7143$AIC[1] 8.950118$AICc[1] 8.999838$BIC[1] 8.058619> res=residuals(ar$fit)> Box.test(res)Box-Pierce testdata: resX-squared = 0.0040697, df = 1, p-value = 0.9491 > plot(res*res)> res<-residuals(ma$fit)> resTime Series:Start = 1End = 56Frequency = 1[1] -5.812742e-02 7.839872e+01 -4.955419e+01 3.066745e+016.646768e+00 -3.818017e+00 -1.493191e+00 1.448967e+01 -5.993782e+01 1.009045e+02 -3.947443e+01 -3.604570e+00 4.075719e+01 6.073798e+01 -8.076426e+01 -5.630655e+01[17] 5.952211e+01 3.267028e+01 6.289877e+01 -4.376929e+01-7.688972e+01 9.609734e+00 6.123687e+00 1.281064e+02 -1.026027e+02 1.160447e+02 -7.392804e+01 -4.288658e+01 -2.676745e+01 4.382151e+01 -2.561905e+01 1.050204e+01[33] -6.774055e+01 2.380823e+01 7.222574e+00 -1.874297e+012.800543e+01 -3.305934e+01 9.500912e+01 -5.267336e+01 -5.347395e+01 5.982226e+00 1.856342e+01 2.163113e-01 8.018037e+01 -1.234786e+02 1.006553e+02 1.232087e+01[49] -2.566963e+01 -1.438774e+01 2.537689e+01 -6.110995e-044.939921e+01 -1.289854e+02 6.746525e+01 -1.514136e+01> Box.test(res)#Box-Pierce testdata: resX-squared = 13.335, df = 1, p-value = 0.0002606> yc=sarima.for(h,10,1,1,1)> yc$predTime Series:Start = 57End = 66Frequency = 1[1] 5.106162 -5.553160 -0.181147 -2.652867 -1.286844 -1.798532 -1.391503 -1.433980 -1.256525 -1.186677。
r语言初学者指南第二章答案
r语言初学者指南第二章答案English Answer:1. What is the difference between a vector and a matrix?A vector is a one-dimensional array of values, while a matrix is a two-dimensional array of values.2. What is the difference between a factor and a character vector?A factor is a categorical variable that can take ona limited number of values, while a character vector is a string of characters.3. What is the difference between a data frame and a tibble?A data frame is a tabular data structure that can contain different types of variables, while a tibble is aspecial type of data frame that is designed to be more user-friendly and consistent.4. What is the difference between a function and a procedure?A function is a self-contained block of code that returns a value, while a procedure is a self-contained block of code that does not return a value.5. What is the difference between a loop and a conditional statement?A loop is a control structure that repeats a block of code a specified number of times, while a conditional statement is a control structure that executes a block of code only if a certain condition is met.6. What is the difference between a package and a library?A package is a collection of R functions and datathat can be installed and loaded into R, while a library is a collection of packages that are pre-installed with R.7. What is the difference between a plot and a chart?A plot is a graphical representation of data, while a chart is a more specific type of plot that typically displays data in a specific way, such as a bar chart or a line chart.8. What is the difference between a regression model and a classification model?A regression model is a statistical model that predicts a continuous outcome variable, while a classification model is a statistical model that predicts a categorical outcome variable.9. What is the difference between a supervised learning algorithm and an unsupervised learning algorithm?A supervised learning algorithm learns from labeleddata, while an unsupervised learning algorithm learns from unlabeled data.10. What is the difference between a time series and a cross-sectional data set?A time series is a collection of data points that are ordered by time, while a cross-sectional data set is a collection of data points that are collected at a single point in time.中文回答:1. 向量和矩阵的区别是什么?向量是一个一维值数组,而矩阵是一个二维值数组。
R语言 第二章课后习题答案
#探索题1rm(list=ls(all=TRUE))x<-1:5;y<-2*1:5;x%o%youter(x,y)outer(x,y,fun='*')outer(x,y,fun='/')outer(x,y,fun='+')outer(x,y,fun='-') #结果一样#探索题2rm(list=ls(all=TRUE))A<-t(array(c(1:12),dim=c(4,3)))b<-c(1,1,1)x<-solve(A,b);x #有错误出现,提示必须为正方形,超定方程不可解#探索题3rm(list=ls(all=TRUE))lst<-list(name="Fred",wife="Mary",no.children=3,child.ages=c(4,7,9));lstlst[[2]]lst[[4]][2]lst[[4]][1:2]lst[["child.ages"]][1:2] #lst[[4]][1:2]或者lst[["child.ages"]][1:2]可以运行#探索题4rm(list=ls(all=TRUE))lst1<-list(name="Fred",wife="Mary",no.children=3,child.ages=c(4,7,9));lst1 lst2<-list(name="Mark",wife="Nacy",no.children=4,child.ages=c(2,5,10));lst2 lst.12<-c(lst1,lst2);lst.12 #实现列表捆绑lst.12<-list(lst1,lst2);lst.12 #也可实现列表捆绑,但是结果。
R语言基础操作练习代码
R语言基础操作练习代码R语言基本操作#构建子集:原始数据-预处理数据#[]提取一个或者多个类型相同的元素,[[]]从列表中或者数据框提取元素,$按照名称从列表中或数据框中提取元素x <-1:10x[1]x[1:4]x[x>5]x[x>5 & x<9]x[x>9 | x<5]y <-1:7ynames(y) <- c("a","b","c","d")y[1]y["d"]#RStudio-1.0.153这个版本,console中有时候不直接读取y <-1:7,需要重新输入y才可以显示x <-matrix(1:6,nrow = 2,ncol = 3)xx[1,2]x[1,]x[,3]x[1,c(1,2)]which #helpx[1,2,drop = FALSE]#用矩阵的形式拿出需要的数据x <-data.frame(v1 = 1:5,v2 = 6:10,v3 = 11:15)x[1]x[[1]]x$v1#列表,主要用到[[]]/[[]][]/[[]][[]]不完全匹配x <-list(id = 1:4,height = 170,gender = "male")x[1]x["id"]x[["id"]]x[[1]]#只拿内容,不拿名称x$idx[c(1,3)]y <- "id"x[[y]]x[["id"]]x$idx$y#结果为NULL,不可以使用x <- list(a = list(1,2,3,4),b = c("yinyanyan","zhaoke"))x[[1]]x[[1]][[2]]x[[1]][2]x[[c(1,3)]]#miss value 缺失值x <- c(1,NA,2,NA,3)is.na(x)x[!is.na(x)]#!是取反的意思x <- c(1,NA,2,NA,3)y <- c("a","b",NA,"c",NA)z <- complete.cases(x,y)#判断x[z]y[z]library(datasets)#加载R语言中自带的数据集head(airquality)#列出前面6行的数据g <- complete.cases(airquality)airquality[g,][1:10,]#向量化操作#可以作用于向量、矩阵等结构,使得代码简介,易于阅读,效率高x <- 1:5y <-6:10x+yx-yx*yx/yx <-matrix(1:4,nrow = 2,ncol = 2)y <-matrix(rep(2,4),nrow = 2,ncol = 2)x+yx-yx*yx/yx %*% y #矩阵乘法x %/% y #矩阵除法R语言变量脚本#vectorx <-vector("interger",length = 10)x <-vector("numeric",length = 10)x <-vector("character",length = 10)x1 <-1:5x2 <- C(1,2,3,4,5,6)x3 <- c(10,"a",TRUE)x4 <- c("a","b","c")as.numeric(x4)as.logical(x3)#matrixx <-matrix(nrow = 3,ncol = 4)#norw代表行,ncol代表列x1 <-matrix(1:6,nrow = 2,ncol = 3)dim(x1)#查看x1有几行几列attributes(x1)y <-1:6dim(y) <- c(2,3)y2 <- matrix(1:6,nrow = 2,ncol = 3)rbind(y,y2)#合并矩阵,按照行拼接cbind(y,y2)#合并矩阵,按照列拼接#array 数组,与矩阵类型,但是维度可以大于2,而矩阵只能等于 2 x <-array(1:24,dim = c(4,6),dimnames = NULL)x1 <-array(1:24,dim = c(2,3,4),dimnames = NULL)#list 可以包含不同类型的值l <-list("a",2,15L,3+4i,TRUE)l2 <-list(a =1,b =2,c =3)l3 <-list(c(1,2,3),c(4,5,6))x <-matrix(1:6,nrow = 2,ncol = 3)dimnames(x)<-list(c("a","b"),c("c","s","f"))#factor 处理分类数据,相当于整数向量+标签,优于整数向量x <-factor(c("female","male","female","male","female","male"))x1 <-factor(c("female","male","female","male","female","male"),levels = c("male","female"))table(x)unclass(x)#去除属性class(unclass(x))#missing value 缺失值,有NA和NaN两种,前者包含后者,后者特指数值缺失y <-c(1,NA,2,NA,3,NA)is.na(y)is.nan(y)y1 <-c(1,NaN,2,NA,3,NA)is.na(y1)is.nan(y1)#date frame 数据框,存储表格数据(tabular data),每个元素代表一列数据,其长度代表行数,类型可以不同df <-data.frame(id = c(1,4,2,3),names = c("a","c"),gender = c(TRUE,FALSE,TRUE,FALSE))nrow(df)#查看df的行数ncol(df)#查看df的列数dg <-data.frame(a = c(1,2,3,4),b = c(5,6,7,8),c = c(9,0,11,12))data.matrix(dg)#date or time date(),sys.date(),weekdays(),months(),quarters()date()X <- date()x <- Sys.Date()y <- as.Date("2019-11-02")weekdays()months()quarters()x <-Sys.time()x <-Sys.posixlt()R语言基础函数#mapply 是lapply的多元版本,mapply的参数为mapply(函数/函数名,数据函数相关的参数)#mapplylist(rep(1,4),rep(2,3),rep(3,2),rep(4,1))mapply(rep,1:4,4:1)s <-function(n,mean,std){rnorm(n,mean,std)}s(4,0,1)mapply(s,1:5,5:1,2)#第一个参数调用s这个函数,第二个参数对应s中的“4”,抽取几个数,第三个参数对应s中的“0”,平均值是多少,第四个参数对应s中的“1”,标准差是多少#tapply对向量子集进行操作,tapply的参数是tapply(向量,因子or因子列表,函数or函数名)#tapplyx <- c(rnorm(5),runif(5),rnorm(5,1))f <-gl(3,5)tapply(x,f,mean)tapply(x,f,mean,simplify = TRUE)#split#根据因子或因子列表将向量或其他对象分组通常与lapply- -起使用#split(参数): split(向量/列表/数据框,因子/因子列表)# splitx <- c(rnorm(5), runif(5), rnorm(5,1) )f <- gl(3,5)split(x,f)lapply(split(x,f),mean)head(airquality)s <- split(airquality, airquality$Month)table(airquality$Month)lapply(s,function(x) colMeans(x[,c("Ozone","Wind","Temp")]))sapply(s,function(x) colMeans(x[,c("Ozone","Wind","Temp")]))sapply(s,function(x) colMeans(x[,c("Ozone","Wind","Temp")]),na.rm = TRUE)#sort:对向量进行排序;返回排好序的内容#order:返回排好序的内容的下标/多个排序标准x <- data.frame(v1 = 1:5,v2 =c(10,7,9,6,8),v3 =11:15,v4 =c(rep(1,2),rep(2,2),1))sort(x$v2)sort(x$v2, decreasing = TRUE)order(x$v2)x[order(x$v2),]x[order(x$v4,x$v2,decreasing = TRUE),]#summarize datahead(airquality)tail(airquality)summary(airquality)str(airquality)table(airquality$Month)table(airquality$Ozone)table(airquality$Ozone,useNA = "ifany") table(airquality$Month,airquality$Day)any(is.na(airquality$Ozone))sum(is.na(airquality$Ozone))all(airquality$Month<12)titanic <- as.data.frame(Titanic)head(titanic)dim(titanic)summary(titanic)x <- xtabs(Freq ~ Class + Age, data=titanic) ftable(x)object.size(airquality)print(object.size(airquality),units = "Kb")。
r语言与统计分析第五章课后答案 (2)
第五章5.1 设总体x是用无线电测距仪测量距离的误差,它服从(α,β)上的均匀分布,在200次测量中,误差为xi的次数有ni次:Xi:3 5 7 9 11 13 15 17 19 21Ni:21 16 15 26 22 14 21 22 18 25求α,β的矩法估计值α=u-sβ=u+s程序代码:x=seq(3,21,by=2)y=c(21,16,15,26,22,14,21,22,18,25)u=rep(x,y)u1=mean(u)s=var(u)s1=sqrt(s)a=u1-sqrt(3)*s1b=u1+sqrt(3)*s1b=u1+sqrt(3)*s1得出结果:a= 2.217379b= 22.402625.2为检验某自来水消毒设备的效果,现从消毒后的水中随机抽取50L,化验每升水中大肠杆菌的个数(假设1L水中大肠杆菌的个数服从泊松分布),其化验结果如下表所示:试问平均每升水中大肠杆菌个数为多少时,才能使上述情况的概率达到最大大肠杆菌数/L:0 1 2 3 4 5 6水的升数:17 20 10 2 1 0 0γ=u是最大似然估计程序代码:a=seq(0,6,by=1)b=c(17,20,10,2,1,0,0)c=a*bd=mean(c)得出结果:d= 7.1428575.3已知某种木材的横纹抗压力服从正态分布,现对十个试件做横纹抗压力试验,得数据如下:482 493 457 471 510 446 435 418 394 469 (1)求u的置信水平为0.95的置信区间程序代码:x=c(482 493 457 471 510 446 435 418 394 469 )t.test(x)得出结果:data: xt = 6.2668, df = 9, p-value = 0.0001467alternative hypothesis: true mean is not equal to 095 percent confidence interval:7.668299 16.331701sample estimates:mean of x12由答案可得:u的置信水平为0.95的置信区间[7.668299 16.331701] (2)求σ的置信水平为0.90的置信区间程序代码:chisq.var.test<-function(x,var,alpha,alternative="two.sided "){options(digits=4)result<-list()n<-length(x)v<-var(x)result$var<-vchi2<-(n-1)*v/varresult$chi2<-chi2p<-pchisq(chi2,n-1)result$p.value<-pif(alternative=="less")result$p.value<-pchaisq(chi2,n-1,loer.tail=F)else if(alternative=="two.sider")result$p.value<-2*min(pchaisq(chi2,n-1),pchaisq(chi2,n-1,lower.tail=F))result$conf.int<-c((n-1)*v/qchisq(alpha/2,df=n-1,lower.tail=F),(n-1)*v/qchisq(alpha/2,df=n-1,lower.tail=T))result}x<-c(482,493,457,471,510,446,435,418,394,469)y=var(x)chisq.var.test(x,0.048^2,0.10,alternative="two.side")得出结果:$conf.int: 659.8 3357.0由答案可得:σ的置信水平为0.90的置信区间[659.8 3357.0] 5.4某卷烟厂生产两种卷烟A和B 现分别对两种香烟的尼古丁含量进行6次试验,结果如下:A:25 28 23 26 29 22B:28 23 30 35 21 27若香烟的尼古丁含量服从正态分布(1)问两种卷烟中尼古丁含量的方差是否相等(通过区间估计考察)(2)试求两种香烟的尼古丁平均含量差的95%置信区间(1)程序代码:X=c(25,28,23,26,29,22)Y=c(28,23,30,35,21,27)Var.test(x,y)得出结果:F test to compare two variancesdata: x and yF = 0.2992, num df = 5, denom df = 5, p-value = 0.2115 alternative hypothesis: true ratio of variances is not equa l to 195 percent confidence interval:0.04187 2.13821sample estimates:ratio of variances0.2992由答案可得:其方差不相等,方差区间为[0.04187 2.13821](2)5.5 比较两个小麦品种的产量,选择24块条件相似地实验条,采用相同的耕作方法做实验,结果播种甲品种的12块实验田的单位面积产量和播种乙品种的12块试验田的单位面积产量分别为:A:628 583 510 554 612 523 530 615 573 603 334 564B:535 433 398 470 567 480 498 560 503 426 338 547假定每个品种的单位面积产量服从正态分布,甲品种产量的方差为2140,乙品种产量的方差为3250,试求这两个品种平均面积产量差的置信水平为0.95的置信上限和置信水平为0.90的置信下限。
- 1、下载文档前请自行甄别文档内容的完整性,平台不提供额外的编辑、内容补充、找答案等附加服务。
- 2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
- 3、如文档侵犯您的权益,请联系客服反馈,我们会尽快为您处理(人工客服工作时间:9:00-18:30)。
R version 3.4.3 (2017-11-30) -- "Kite-Eating Tree"Copyright (C) 2017 The R Foundation for Statistical Computing Platform: x86_64-w64-mingw32/x64 (64-bit)R是自由软件,不带任何担保。
在某些条件下你可以将其自由散布。
用'license()'或'licence()'来看散布的详细条件。
R是个合作计划,有许多人为之做出了贡献.用'contributors()'来看合作者的详细情况用'citation()'会告诉你如何在出版物中正确地引用R或R程序包。
用'demo()'来看一些示范程序,用'help()'来阅读在线帮助文件,或用'help.start()'通过HTML浏览器来看帮助文件。
用'q()'退出R.[原来保存的工作空间已还原]> h=read.csv("F://1.csv",header=true)Error in read.table(file = file, header = header, sep = sep, quote = quote, : 找不到对象'true'> h=read.csv("F://1.csv",header=TRUE)> h地区x1 x2 x3 x4 x5 x6 x7 x8 x9 y1 北京7535 2639 1971 1658 3696 84742 87475 106.5 1.3 240462 天津7344 1881 1854 1556 2254 61514 93173 107.5 3.6 200243 河北4211 1542 1502 1047 1204 38658 36584 104.1 3.7 125314 山西3856 1529 1439 906 1506 44236 33628 108.8 3.3 122125 内蒙古5463 2730 1584 1354 1972 46557 63886 109.6 3.7 177176 辽宁5809 2042 1433 1310 1844 41858 56649 107.7 3.6 165947 吉林4635 2045 1594 1448 1643 38407 43415 111.0 3.7 146148 黑龙江4687 1807 1337 1181 1217 36406 35711 104.8 4.2 129849 上海9656 2111 1790 1017 3724 78673 85373 106.0 3.1 2625310 江苏6658 1916 1437 1058 3078 50639 68347 112.6 3.1 1882511 浙江7552 2110 1552 1228 2997 50197 63374 104.5 3.0 2154512 安徽5815 1541 1397 1143 1933 44601 28792 105.3 3.7 1501213 福建7317 1634 1754 773 2105 44525 52763 104.6 3.6 1859314 江西5072 1477 1174 671 1487 38512 28800 106.7 3.0 1277615 山东5201 2197 1572 1005 1656 41904 51768 106.9 3.3 1577816 河南4607 1886 1191 1085 1525 37338 31499 106.8 3.1 1373317 湖北5838 1783 1371 1030 1652 39846 38572 105.6 3.8 1449618 湖南5442 1625 1302 918 1738 38971 33480 105.7 4.2 1460919 广东8258 1521 2100 1048 2954 50278 54095 107.9 2.5 2239620 广西5553 1146 1377 884 1626 36386 27952 107.5 3.4 1424421 海南6556 865 1521 993 1320 39485 32377 107.0 2.0 1445722 重庆6870 2229 1177 1102 1471 44498 38914 107.8 3.3 1657323 四川6074 1651 1284 773 1587 42339 29608 105.9 4.0 1505024 贵州4993 1399 1014 655 1396 41156 19710 105.5 3.3 1258625 云南5468 1760 974 939 1434 37629 22195 108.9 4.0 1388426 西藏5518 1362 845 467 550 51705 22936 109.5 2.6 1118427 陕西5551 1789 1322 1212 2079 43073 38564 109.4 3.2 1533328 甘肃4602 1631 1288 1050 1388 37679 21978 108.6 2.7 1284729 青海4667 1512 1232 906 1097 46483 33181 110.6 3.4 1234630 宁夏4769 1876 1193 1063 1516 47436 36394 105.5 4.2 1406731 新疆5239 2031 1167 1028 1281 44576 33796 114.8 3.4 13892> lm=lm(y~x1+x2+x3+x4+x5+x6+x7+x8+x9,data=h)> lmCall:lm(formula = y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9,data = h)Coefficients:(Intercept) x1 x2 x3 x4 x5 x6 x7 x8 x9320.640948 1.316588 1.649859 2.178660 -0.005609 1.684283 0.010320 0.003655 -19.130576 50.515575> summary(lm)Call:lm(formula = y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9,data = h)Residuals:Min 1Q Median 3Q Max-940.13 -195.24 3.42 239.00 476.06Coefficients:Estimate Std. Error t value Pr(>|t|)(Intercept) 3.206e+02 3.952e+03 0.081 0.936097x1 1.317e+00 1.062e-01 12.400 3.97e-11 ***x2 1.650e+00 3.008e-01 5.484 1.93e-05 ***x3 2.179e+00 5.199e-01 4.190 0.000412 ***x4 -5.609e-03 4.766e-01 -0.012 0.990720x5 1.684e+00 2.142e-01 7.864 1.08e-07 ***x6 1.032e-02 1.343e-02 0.769 0.450665x7 3.655e-03 1.070e-02 0.342 0.736006x8 -1.913e+01 3.197e+01 -0.598 0.555983x9 5.052e+01 1.502e+02 0.336 0.739986---Signif. codes: 0 ‘***’0.001 ‘**’0.01 ‘*’0.05 ‘.’0.1 ‘’1Residual standard error: 389.4 on 21 degrees of freedomMultiple R-squared: 0.9923, Adjusted R-squared: 0.9889F-statistic: 298.9 on 9 and 21 DF, p-value: < 2.2e-16> pre=fitted.values(lm)> res=residuals(lm)> sd(res)[1] 325.7967> res=residuals(lm)> dy=step(lm)Start: AIC=377.73y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9Df Sum of Sq RSS AIC- x4 1 21 3184326 375.73- x9 1 17149 3201454 375.90- x7 1 17700 3202005 375.90- x8 1 54295 3238599 376.26- x6 1 89586 3273891 376.59<none> 3184305 377.73- x3 1 2662593 5846898 394.57- x2 1 4561056 7745361 403.29- x5 1 9377500 12561805 418.28- x1 1 23314547 26498852 441.42Step: AIC=375.73y ~ x1 + x2 + x3 + x5 + x6 + x7 + x8 + x9Df Sum of Sq RSS AIC- x9 1 17428 3201754 373.90- x7 1 18563 3202889 373.91- x8 1 54437 3238763 374.26- x6 1 91813 3276139 374.61<none> 3184326 375.73- x3 1 2936130 6120456 393.99- x2 1 5467941 8652267 404.72- x5 1 9393345 12577671 416.32- x1 1 25886086 29070412 442.29Step: AIC=373.9y ~ x1 + x2 + x3 + x5 + x6 + x7 + x8Df Sum of Sq RSS AIC - x7 1 34634 3236387 372.24 - x6 1 74800 3276554 372.62 - x8 1 82150 3283904 372.69 <none> 3201754 373.90 - x3 1 3055353 6257107 392.67 - x2 1 5725836 8927590 403.69 - x5 1 9382624 12584378 414.33 - x1 1 25868832 29070586 440.29Step: AIC=372.24y ~ x1 + x2 + x3 + x5 + x6 + x8Df Sum of Sq RSS AIC - x8 1 70813 3307201 370.91 - x6 1 152777 3389165 371.67 <none> 3236387 372.24 - x3 1 5501284 8737672 401.02 - x2 1 8895049 12131436 411.20 - x5 1 9458098 12694485 412.60 - x1 1 27733098 30969486 440.25Step: AIC=370.91y ~ x1 + x2 + x3 + x5 + x6Df Sum of Sq RSS AIC - x6 1 137540 3444741 370.17 <none> 3307201 370.91 - x3 1 5771063 9078264 400.21 - x2 1 8871193 12178394 409.32 - x5 1 9473521 12780722 410.81 - x1 1 28248162 31555363 438.83Step: AIC=370.17y ~ x1 + x2 + x3 + x5Df Sum of Sq RSS AIC <none> 3444741 370.17 - x3 1 5717883 9162624 398.50- x2 1 10249815 13694556 410.95- x5 1 10998313 14443054 412.60- x1 1 33258637 36703378 441.52> summary(dy)Call:lm(formula = y ~ x1 + x2 + x3 + x5, data = h)Residuals:Min 1Q Median 3Q Max-943.18 -161.05 12.74 250.93 566.25Coefficients:Estimate Std. Error t value Pr(>|t|)(Intercept) -1694.6269 562.9773 -3.010 0.00574 **x1 1.3642 0.0861 15.844 7.11e-15 ***x2 1.7679 0.2010 8.796 2.86e-09 ***x3 2.2894 0.3485 6.569 5.76e-07 ***x5 1.7424 0.1912 9.111 1.42e-09 ***---Signif. codes: 0 ‘***’0.001 ‘**’0.01 ‘*’0.05 ‘.’0.1 ‘’1Residual standard error: 364 on 26 degrees of freedomMultiple R-squared: 0.9916, Adjusted R-squared: 0.9903F-statistic: 769.2 on 4 and 26 DF, p-value: < 2.2e-16>newdata=data.frame(x1=5200,x2=2000,x3=1100,x4=1000,x5=1300,x6=45000,x7=34000,x8=115.0 ,x9=3.8)> predict(dy,newdata,interval="confidence")fit lwr upr1 13718.67 13468.98 13968.36>> h=ts(read.csv("F://3.csv",header=TRUE)) > hTime Series:Start = 1End = 56Frequency = 1X78[1,] -58[2,] 53[3,] -63[4,] 13[5,] -6[6,] -16[7,] -14[8,] 3[9,] -74[10,] 89[11,] -48[12,] -14[13,] 32[14,] 56[15,] -86[16,] -66[17,] 50[18,] 26[19,] 59[20,] -47[21,] -83[22,] 2[23,] -1[24,] 124[25,] -106[26,] 113[27,] -76[28,] -47[29,] -32[30,] 39[31,] -30[32,] 6[33,] -73[34,] 18[35,] 2[36,] -24[37,] 23[38,] -38[39,] 91[40,] -56[41,] -58[42,] 1[43,] 14[44,] -4[45,] 77[46,] -127[47,] 97[48,] 10[49,] -28[50,] -17[51,] 23[52,] -2[53,] 48[54,] -131[55,] 65[56,] -17> plot(h,type="o")> local({pkg <- select.list(sort(.packages(all.available = TRUE)),graphics=TRUE) + if(nchar(pkg)) library(pkg, character.only=TRUE)})Warning message:程辑包‘urca’是用R版本3.4.4 来建造的> adf=ur.df(as.vector(h),type=c("drift"),selectlags=c("AIC"))> summary(adf)################################################ Augmented Dickey-Fuller Test Unit Root Test ################################################Test regression driftCall:lm(formula = z.diff ~ g.1 + 1 + g)Residuals:Min 1Q Median 3Q Max-96.191 -23.390 -0.581 18.446 133.241Coefficients:Estimate Std. Error t value Pr(>|t|)(Intercept) -9.4381 7.0489 -1.339 0.187g.1 -1.7837 0.2386 -7.476 9.65e-10 ***g 0.1956 0.1379 1.418 0.162---Signif. codes: 0 ‘***’0.001 ‘**’0.01 ‘*’0.05 ‘.’0.1 ‘’1Residual standard error: 50.89 on 51 degrees of freedomMultiple R-squared: 0.7589, Adjusted R-squared: 0.7494F-statistic: 80.25 on 2 and 51 DF, p-value: < 2.2e-16Value of test-statistic is: -7.4761 27.9471Critical values for test statistics:1pct 5pct 10pcttau2 -3.51 -2.89 -2.58phi1 6.70 4.71 3.86> acf(h)> pacf(h)> ar=sarima(h,1,0,4,details=F)> ar$fitCall:stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,Q), period = S), xreg = xmean, include.mean = FALSE, optim.control = list(trace = trc, REPORT = 1, reltol = tol))Coefficients:ar1 ma1 ma2 ma3 ma4 xmean-0.0957 -0.7605 -0.051 -0.2591 0.0706 -5.0886s.e. 0.7318 0.7244 0.637 0.2013 0.1939 0.4252sigma^2 estimated as 1850: log likelihood = -291.97, aic = 597.95$degrees_of_freedom[1] 50$ttableEstimate SE t.value p.valuear1 -0.0957 0.7318 -0.1308 0.8965ma1 -0.7605 0.7244 -1.0498 0.2988ma2 -0.0510 0.6370 -0.0800 0.9365ma3 -0.2591 0.2013 -1.2875 0.2038ma4 0.0706 0.1939 0.3641 0.7173xmean -5.0886 0.4252 -11.9668 0.0000$AIC[1] 8.73734$AICc[1] 8.814721$BIC[1] 7.954342> ma=sarima(h,0,1,1,details=F)> ma$fitCall:stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), xreg = constant, optim.control = list(trace = trc, REPORT = 1, reltol = tol))Coefficients:ma1 constant-1.0000 0.1275s.e. 0.0452 0.4833sigma^2 estimated as 3412: log likelihood = -303.77, aic = 613.53$degrees_of_freedom[1] 53$ttableEstimate SE t.value p.valuema1 -1.0000 0.0452 -22.1390 0.000constant 0.1275 0.4833 0.2638 0.793$AIC[1] 9.206399$AICc$BIC[1] 8.278733> arma=sarima(h,1,1,1,details=F)> arma$fitCall:stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), xreg = constant, optim.control = list(trace = trc, REPORT = 1, reltol = tol))Coefficients:ar1 ma1 constant-0.4893 -1.0000 0.1052s.e. 0.1161 0.0469 0.2858sigma^2 estimated as 2548: log likelihood = -296.27, aic = 600.53$degrees_of_freedom[1] 52$ttableEstimate SE t.value p.valuear1 -0.4893 0.1161 -4.2127 0.0001ma1 -1.0000 0.0469 -21.3207 0.0000constant 0.1052 0.2858 0.3680 0.7143$AIC[1] 8.950118$AICc[1] 8.999838$BIC[1] 8.058619> res=residuals(ar$fit)> Box.test(res)Box-Pierce testX-squared = 0.0040697, df = 1, p-value = 0.9491> plot(res*res)> res<-residuals(ma$fit)> resTime Series:Start = 1End = 56Frequency = 1[1] -5.812742e-02 7.839872e+01 -4.955419e+01 3.066745e+016.646768e+00 -3.818017e+00 -1.493191e+00 1.448967e+01 -5.993782e+01 1.009045e+02 -3.947443e+01 -3.604570e+00 4.075719e+01 6.073798e+01 -8.076426e+01 -5.630655e+01[17] 5.952211e+01 3.267028e+01 6.289877e+01 -4.376929e+01-7.688972e+01 9.609734e+00 6.123687e+00 1.281064e+02 -1.026027e+02 1.160447e+02 -7.392804e+01 -4.288658e+01 -2.676745e+01 4.382151e+01 -2.561905e+01 1.050204e+01[33] -6.774055e+01 2.380823e+01 7.222574e+00 -1.874297e+012.800543e+01 -3.305934e+01 9.500912e+01 -5.267336e+01 -5.347395e+01 5.982226e+00 1.856342e+01 2.163113e-01 8.018037e+01 -1.234786e+02 1.006553e+02 1.232087e+01[49] -2.566963e+01 -1.438774e+01 2.537689e+01 -6.110995e-044.939921e+01 -1.289854e+02 6.746525e+01 -1.514136e+01> Box.test(res)#Box-Pierce testdata: resX-squared = 13.335, df = 1, p-value = 0.0002606> yc=sarima.for(h,10,1,1,1)> yc$predTime Series:Start = 57End = 66Frequency = 1[1] 5.106162 -5.553160 -0.181147 -2.652867 -1.286844 -1.798532 -1.391503 -1.433980 -1.256525 -1.186677。