[Latest] R Data Analysis and Statistics Self-Test Questions (with Answers, Code, and Data)


(Complete Edition) R Code Exam Questions: Answers and Steps

You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

[Previously saved workspace restored]

> h=read.csv("F://1.csv",header=true)
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
  object 'true' not found
> h=read.csv("F://1.csv",header=TRUE)
> h
   地区     x1   x2   x3   x4   x5    x6    x7    x8  x9     y
1  北京   7535 2639 1971 1658 3696 84742 87475 106.5 1.3 24046
2  天津   7344 1881 1854 1556 2254 61514 93173 107.5 3.6 20024
3  河北   4211 1542 1502 1047 1204 38658 36584 104.1 3.7 12531
4  山西   3856 1529 1439  906 1506 44236 33628 108.8 3.3 12212
5  内蒙古 5463 2730 1584 1354 1972 46557 63886 109.6 3.7 17717
6  辽宁   5809 2042 1433 1310 1844 41858 56649 107.7 3.6 16594
7  吉林   4635 2045 1594 1448 1643 38407 43415 111.0 3.7 14614
8  黑龙江 4687 1807 1337 1181 1217 36406 35711 104.8 4.2 12984
9  上海   9656 2111 1790 1017 3724 78673 85373 106.0 3.1 26253
10 江苏   6658 1916 1437 1058 3078 50639 68347 112.6 3.1 18825
11 浙江   7552 2110 1552 1228 2997 50197 63374 104.5 3.0 21545
12 安徽   5815 1541 1397 1143 1933 44601 28792 105.3 3.7 15012
13 福建   7317 1634 1754  773 2105 44525 52763 104.6 3.6 18593
14 江西   5072 1477 1174  671 1487 38512 28800 106.7 3.0 12776
15 山东   5201 2197 1572 1005 1656 41904 51768 106.9 3.3 15778
16 河南   4607 1886 1191 1085 1525 37338 31499 106.8 3.1 13733
17 湖北   5838 1783 1371 1030 1652 39846 38572 105.6 3.8 14496
18 湖南   5442 1625 1302  918 1738 38971 33480 105.7 4.2 14609
19 广东   8258 1521 2100 1048 2954 50278 54095 107.9 2.5 22396
20 广西   5553 1146 1377  884 1626 36386 27952 107.5 3.4 14244
21 海南   6556  865 1521  993 1320 39485 32377 107.0 2.0 14457
22 重庆   6870 2229 1177 1102 1471 44498 38914 107.8 3.3 16573
23 四川   6074 1651 1284  773 1587 42339 29608 105.9 4.0 15050
24 贵州   4993 1399 1014  655 1396 41156 19710 105.5 3.3 12586
25 云南   5468 1760  974  939 1434 37629 22195 108.9 4.0 13884
26 西藏   5518 1362  845  467  550 51705 22936 109.5 2.6 11184
27 陕西   5551 1789 1322 1212 2079 43073 38564 109.4 3.2 15333
28 甘肃   4602 1631 1288 1050 1388 37679 21978 108.6 2.7 12847
29 青海   4667 1512 1232  906 1097 46483 33181 110.6 3.4 12346
30 宁夏   4769 1876 1193 1063 1516 47436 36394 105.5 4.2 14067
31 新疆   5239 2031 1167 1028 1281 44576 33796 114.8 3.4 13892
> lm=lm(y~x1+x2+x3+x4+x5+x6+x7+x8+x9,data=h)
> lm

Call:
lm(formula = y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9, data = h)

Coefficients:
(Intercept)          x1          x2          x3          x4          x5          x6          x7          x8          x9
 320.640948    1.316588    1.649859    2.178660   -0.005609    1.684283    0.010320    0.003655  -19.130576   50.515575

> summary(lm)

Call:
lm(formula = y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9, data = h)

Residuals:
    Min      1Q  Median      3Q     Max
-940.13 -195.24    3.42  239.00  476.06

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  3.206e+02  3.952e+03   0.081 0.936097
x1           1.317e+00  1.062e-01  12.400 3.97e-11 ***
x2           1.650e+00  3.008e-01   5.484 1.93e-05 ***
x3           2.179e+00  5.199e-01   4.190 0.000412 ***
x4          -5.609e-03  4.766e-01  -0.012 0.990720
x5           1.684e+00  2.142e-01   7.864 1.08e-07 ***
x6           1.032e-02  1.343e-02   0.769 0.450665
x7           3.655e-03  1.070e-02   0.342 0.736006
x8          -1.913e+01  3.197e+01  -0.598 0.555983
x9           5.052e+01  1.502e+02   0.336 0.739986
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 389.4 on 21 degrees of freedom
Multiple R-squared: 0.9923, Adjusted R-squared: 0.9889
F-statistic: 298.9 on 9 and 21 DF, p-value: < 2.2e-16

> pre=fitted.values(lm)
> res=residuals(lm)
> sd(res)
[1] 325.7967
> res=residuals(lm)
> dy=step(lm)
Start: AIC=377.73
y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9

       Df Sum of Sq      RSS    AIC
- x4    1        21  3184326 375.73
- x9    1     17149  3201454 375.90
- x7    1     17700  3202005 375.90
- x8    1     54295  3238599 376.26
- x6    1     89586  3273891 376.59
<none>               3184305 377.73
- x3    1   2662593  5846898 394.57
- x2    1   4561056  7745361 403.29
- x5    1   9377500 12561805 418.28
- x1    1  23314547 26498852 441.42

Step: AIC=375.73
y ~ x1 + x2 + x3 + x5 + x6 + x7 + x8 + x9

       Df Sum of Sq      RSS    AIC
- x9    1     17428  3201754 373.90
- x7    1     18563  3202889 373.91
- x8    1     54437  3238763 374.26
- x6    1     91813  3276139 374.61
<none>               3184326 375.73
- x3    1   2936130  6120456 393.99
- x2    1   5467941  8652267 404.72
- x5    1   9393345 12577671 416.32
- x1    1  25886086 29070412 442.29

Step: AIC=373.9
y ~ x1 + x2 + x3 + x5 + x6 + x7 + x8

       Df Sum of Sq      RSS    AIC
- x7    1     34634  3236387 372.24
- x6    1     74800  3276554 372.62
- x8    1     82150  3283904 372.69
<none>               3201754 373.90
- x3    1   3055353  6257107 392.67
- x2    1   5725836  8927590 403.69
- x5    1   9382624 12584378 414.33
- x1    1  25868832 29070586 440.29

Step: AIC=372.24
y ~ x1 + x2 + x3 + x5 + x6 + x8

       Df Sum of Sq      RSS    AIC
- x8    1     70813  3307201 370.91
- x6    1    152777  3389165 371.67
<none>               3236387 372.24
- x3    1   5501284  8737672 401.02
- x2    1   8895049 12131436 411.20
- x5    1   9458098 12694485 412.60
- x1    1  27733098 30969486 440.25

Step: AIC=370.91
y ~ x1 + x2 + x3 + x5 + x6

       Df Sum of Sq      RSS    AIC
- x6    1    137540  3444741 370.17
<none>               3307201 370.91
- x3    1   5771063  9078264 400.21
- x2    1   8871193 12178394 409.32
- x5    1   9473521 12780722 410.81
- x1    1  28248162 31555363 438.83

Step: AIC=370.17
y ~ x1 + x2 + x3 + x5

       Df Sum of Sq      RSS    AIC
<none>               3444741 370.17
- x3    1   5717883  9162624 398.50
- x2    1  10249815 13694556 410.95
- x5    1  10998313 14443054 412.60
- x1    1  33258637 36703378 441.52
> summary(dy)

Call:
lm(formula = y ~ x1 + x2 + x3 + x5, data = h)

Residuals:
    Min      1Q  Median      3Q     Max
-943.18 -161.05   12.74  250.93  566.25

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -1694.6269   562.9773  -3.010  0.00574 **
x1              1.3642     0.0861  15.844 7.11e-15 ***
x2              1.7679     0.2010   8.796 2.86e-09 ***
x3              2.2894     0.3485   6.569 5.76e-07 ***
x5              1.7424     0.1912   9.111 1.42e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 364 on 26 degrees of freedom
Multiple R-squared: 0.9916, Adjusted R-squared: 0.9903
F-statistic: 769.2 on 4 and 26 DF, p-value: < 2.2e-16

> newdata=data.frame(x1=5200,x2=2000,x3=1100,x4=1000,x5=1300,x6=45000,x7=34000,x8=115.0,x9=3.8)
> predict(dy,newdata,interval="confidence")
       fit      lwr      upr
1 13718.67 13468.98 13968.36
>
> h=ts(read.csv("F://3.csv",header=TRUE))
> h
Time Series:
Start = 1
End = 56
Frequency = 1
       X78
 [1,]  -58
 [2,]   53
 [3,]  -63
 [4,]   13
 [5,]   -6
 [6,]  -16
 [7,]  -14
 [8,]    3
 [9,]  -74
[10,]   89
[11,]  -48
[12,]  -14
[13,]   32
[14,]   56
[15,]  -86
[16,]  -66
[17,]   50
[18,]   26
[19,]   59
[20,]  -47
[21,]  -83
[22,]    2
[23,]   -1
[24,]  124
[25,] -106
[26,]  113
[27,]  -76
[28,]  -47
[29,]  -32
[30,]   39
[31,]  -30
[32,]    6
[33,]  -73
[34,]   18
[35,]    2
[36,]  -24
[37,]   23
[38,]  -38
[39,]   91
[40,]  -56
[41,]  -58
[42,]    1
[43,]   14
[44,]   -4
[45,]   77
[46,] -127
[47,]   97
[48,]   10
[49,]  -28
[50,]  -17
[51,]   23
[52,]   -2
[53,]   48
[54,] -131
[55,]   65
[56,]  -17
> plot(h,type="o")
> local({pkg <- select.list(sort(.packages(all.available = TRUE)),graphics=TRUE)
+ if(nchar(pkg)) library(pkg, character.only=TRUE)})
Warning message:
package 'urca' was built under R version 3.4.4
> adf=ur.df(as.vector(h),type=c("drift"),selectlags=c("AIC"))
> summary(adf)

###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################

Test regression drift

Call:
lm(formula = z.diff ~ g.1 + 1 + g)

Residuals:
    Min      1Q  Median      3Q     Max
-96.191 -23.390  -0.581  18.446 133.241

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -9.4381     7.0489  -1.339    0.187
g.1          -1.7837     0.2386  -7.476 9.65e-10 ***
g             0.1956     0.1379   1.418    0.162
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 50.89 on 51 degrees of freedom
Multiple R-squared: 0.7589, Adjusted R-squared: 0.7494
F-statistic: 80.25 on 2 and 51 DF, p-value: < 2.2e-16

Value of test-statistic is: -7.4761 27.9471

Critical values for test statistics:
      1pct  5pct 10pct
tau2 -3.51 -2.89 -2.58
phi1  6.70  4.71  3.86

> acf(h)
> pacf(h)
> ar=sarima(h,1,0,4,details=F)
> ar
$fit

Call:
stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S),
    xreg = xmean, include.mean = FALSE, optim.control = list(trace = trc, REPORT = 1, reltol = tol))

Coefficients:
          ar1      ma1     ma2      ma3     ma4    xmean
      -0.0957  -0.7605  -0.051  -0.2591  0.0706  -5.0886
s.e.   0.7318   0.7244   0.637   0.2013  0.1939   0.4252

sigma^2 estimated as 1850: log likelihood = -291.97, aic = 597.95

$degrees_of_freedom
[1] 50

$ttable
      Estimate     SE  t.value p.value
ar1    -0.0957 0.7318  -0.1308  0.8965
ma1    -0.7605 0.7244  -1.0498  0.2988
ma2    -0.0510 0.6370  -0.0800  0.9365
ma3    -0.2591 0.2013  -1.2875  0.2038
ma4     0.0706 0.1939   0.3641  0.7173
xmean  -5.0886 0.4252 -11.9668  0.0000

$AIC
[1] 8.73734

$AICc
[1] 8.814721

$BIC
[1] 7.954342

> ma=sarima(h,0,1,1,details=F)
> ma
$fit

Call:
stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S),
    xreg = constant, optim.control = list(trace = trc, REPORT = 1, reltol = tol))

Coefficients:
          ma1  constant
      -1.0000    0.1275
s.e.   0.0452    0.4833

sigma^2 estimated as 3412: log likelihood = -303.77, aic = 613.53

$degrees_of_freedom
[1] 53

$ttable
         Estimate     SE  t.value p.value
ma1       -1.0000 0.0452 -22.1390   0.000
constant   0.1275 0.4833   0.2638   0.793

$AIC
[1] 9.206399

$AICc

$BIC
[1] 8.278733

> arma=sarima(h,1,1,1,details=F)
> arma
$fit

Call:
stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S),
    xreg = constant, optim.control = list(trace = trc, REPORT = 1, reltol = tol))

Coefficients:
          ar1      ma1  constant
      -0.4893  -1.0000    0.1052
s.e.   0.1161   0.0469    0.2858

sigma^2 estimated as 2548: log likelihood = -296.27, aic = 600.53

$degrees_of_freedom
[1] 52

$ttable
         Estimate     SE  t.value p.value
ar1       -0.4893 0.1161  -4.2127  0.0001
ma1       -1.0000 0.0469 -21.3207  0.0000
constant   0.1052 0.2858   0.3680  0.7143

$AIC
[1] 8.950118

$AICc
[1] 8.999838

$BIC
[1] 8.058619

> res=residuals(ar$fit)
> Box.test(res)

Box-Pierce test

X-squared = 0.0040697, df = 1, p-value = 0.9491

> plot(res*res)
> res<-residuals(ma$fit)
> res
Time Series:
Start = 1
End = 56
Frequency = 1
 [1] -5.812742e-02  7.839872e+01 -4.955419e+01  3.066745e+01
 [5]  6.646768e+00 -3.818017e+00 -1.493191e+00  1.448967e+01
 [9] -5.993782e+01  1.009045e+02 -3.947443e+01 -3.604570e+00
[13]  4.075719e+01  6.073798e+01 -8.076426e+01 -5.630655e+01
[17]  5.952211e+01  3.267028e+01  6.289877e+01 -4.376929e+01
[21] -7.688972e+01  9.609734e+00  6.123687e+00  1.281064e+02
[25] -1.026027e+02  1.160447e+02 -7.392804e+01 -4.288658e+01
[29] -2.676745e+01  4.382151e+01 -2.561905e+01  1.050204e+01
[33] -6.774055e+01  2.380823e+01  7.222574e+00 -1.874297e+01
[37]  2.800543e+01 -3.305934e+01  9.500912e+01 -5.267336e+01
[41] -5.347395e+01  5.982226e+00  1.856342e+01  2.163113e-01
[45]  8.018037e+01 -1.234786e+02  1.006553e+02  1.232087e+01
[49] -2.566963e+01 -1.438774e+01  2.537689e+01 -6.110995e-04
[53]  4.939921e+01 -1.289854e+02  6.746525e+01 -1.514136e+01
> Box.test(res)

Box-Pierce test

data: res
X-squared = 13.335, df = 1, p-value = 0.0002606

> yc=sarima.for(h,10,1,1,1)
> yc$pred
Time Series:
Start = 57
End = 66
Frequency = 1
 [1]  5.106162 -5.553160 -0.181147 -2.652867 -1.286844 -1.798532 -1.391503 -1.433980 -1.256525 -1.186677
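The regression part of the session above (full model, AIC-based backward elimination with step(), then a confidence interval from predict()) can be sketched on a built-in data set. This is a minimal sketch, not the original analysis: mtcars and the predictor list are stand-ins chosen for illustration because F://1.csv is not available, and the new_car values are made up. The unit-root and ARIMA steps rely on the urca and astsa packages and are not reproduced here.

```r
# Sketch only: mtcars stands in for the provincial data read from F://1.csv.
full <- lm(mpg ~ wt + hp + disp + qsec + drat, data = mtcars)

# Backward elimination by AIC; trace = 0 suppresses the step-by-step tables.
reduced <- step(full, trace = 0)
kept <- attr(terms(reduced), "term.labels")   # predictors that survived

# Hypothetical new observation; extra columns are ignored by predict().
new_car <- data.frame(wt = 3, hp = 120, disp = 200, qsec = 18, drat = 3.5)
ci <- predict(reduced, new_car, interval = "confidence")

print(kept)
print(ci)
```

As in the transcript, the data frame passed to predict() may contain variables that the reduced model no longer uses.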

R Self-Test Exercises (with Answers)

Sample Midterm, Statistics 133, Fall 2012

1. What is meant by "vectorized calculations" in R? Provide an example.

If we have a vector x, an expression such as x+2 or x^3 is vectorized in that the computation is performed on each element of the vector, i.e. 2 is added to each element of x or each element of x is cubed. There is no need to loop over each element of the vector to perform the computation.

2. Describe two important differences between a data frame and a matrix in R.

A data frame is essentially a list of vectors of the same length, whereas a matrix is essentially a vector with shape information.
Data frames can have columns/vectors that are different types, whereas all values in a matrix must be the same primitive type.
Data frames can be indexed with $.

3. Data on 37 parents of babies born at Kaiser Hospital in the 1960s is available in a data frame called parents. The variables age, ed, ht, and wt are the mother's age, education level, height and weight. The variables that start with the letter d are the corresponding variables for the fathers.

> head(parents)
  age          ed ht  wt dage              ded dht dwt marital           inc
1  27     College 62 100   31          College  65 110 Married   [2500,5000)
2  33     College 64 135   38          College  70 148 Married   [7000,8000)
3  28 High School 64 115   32 Some High School  NA  NA Married   [5000,6000)
4  36     College 69 190   43     Some College  68 197 Married [12500,15000)
5  23     College 67 125   24          College  NA  NA Married   [2500,5000)
6  25 High School 62  93   28      High School  64 130 Married   [7000,8000)

Provide the return value for each of the following expressions:

dim(parents)
[1] 37 10

class(parents$marital)
[1] "factor"

Write an R expression to find the subset of parents where the mother is over 40.

parents[parents$age>40,]

Write an R expression using an apply function to return the class of each variable in the data frame.

sapply(parents,class)

Write one R expression using an apply function to return the number of NAs in each variable (recall that there is an is.na() function that returns a logical indicating the presence of NAs).

sapply(parents,function(x)sum(is.na(x)))

4. Here is a list in R,

> x
$a
[1] 0.03895442 0.77658866 0.83532332

$b
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

Write one line of R code to extract the first row of the matrix.

x$b[1,]

5. Suppose we have a matrix m in R, and we've just executed the following:

> dim(m)
[1] 5000 3
> head(m)
           [,1]       [,2]       [,3]
[1,] -2.2468718 -0.7733515 -3.4332337
[2,]  0.5771791 -0.7058552  0.8052004
[3,] -1.0125651 -0.2699696 -1.1368809
[4,] -0.2504269 -1.1205857 -0.3498572
[5,]  2.6747195  0.2550678  0.1225329
[6,]  1.0095424 -1.2900079  0.1387224

We need to create a vector containing the sum of the squared entries in each row of m. Write R code to do this in two different ways:

(a) using a for loop

sumM=rep(0,nrow(m))
for(i in 1:nrow(m)){
  sumM[i]=sum(m[i,]^2)
}

(b) using the apply function

apply(m,1,function(x)sum(x^2))

6. Write down what the value of x will contain after each line of R code, if the commands are executed sequentially.

> x=seq(0,8,length=5)
0 2 4 6 8
> x[x<4]=NA
NA NA 4 6 8
> x[5]=10
NA NA 4 6 10
> x[]=0
0 0 0 0 0
> x=12
12

7. Someone wants to study the distribution of the sum of three rolls of a die. To do this she designs a simulation study. In the first step, she writes a function to generate the sum of three random tosses of a fair die. In the second step she uses this function to generate 1,000 of these sums.

(a) Write the function for the first step.

sum3=function(){
  sum(sample(1:6,3,replace=TRUE))
}

(b) Write one line of code that uses the function from the first step to generate the 1,000 random sums.

replicate(1000,sum3())

8. We want to compute the sum of the absolute deviations from the median for a vector. For example, for a vector x=1:3, x has a median of 2, and the absolute deviations from the median are 1, 0, and 1, so the sum of the absolute deviations from the median is 2.
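The answers to questions 5 and 7 can be checked against each other on a small example. The sketch below assumes a 5 x 3 random matrix as a stand-in for the 5000 x 3 matrix m (the seed is arbitrary), and adds rowSums() as a third, fully vectorized route that the exam does not ask for.

```r
set.seed(1)                        # arbitrary seed, for reproducibility only
m <- matrix(rnorm(15), nrow = 5)   # small stand-in for the 5000 x 3 matrix

# (a) explicit loop over the row indices
sumM <- numeric(nrow(m))
for (i in seq_len(nrow(m))) {
  sumM[i] <- sum(m[i, ]^2)
}

# (b) apply over rows, plus the vectorized rowSums() shortcut
by_apply   <- apply(m, 1, function(x) sum(x^2))
by_rowsums <- rowSums(m^2)

# Question 7: replicate() needs the call sum3(), not the bare function name.
sum3 <- function() sum(sample(1:6, 3, replace = TRUE))
sums <- replicate(1000, sum3())
```

Passing the bare name sum3 to replicate() would return a list of 1,000 copies of the function rather than 1,000 simulated sums.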

[Original] R Ordination_OPTION Statistical Analysis Self-Test Questions (with Answers, Code, and Data)

STAT660/FES758b Multivariate Statistics
Homework #6 OPTION A: Ordination
Due: Wednesday, 4/26/17. Submit on CANVAS by midnight.

For this assignment, you can either use your own data or the data described below. Use any combination of R/SAS/MINITAB/SPSS/STATA that you like. Whichever data you choose, do the following:

1) Fit Correspondence Analysis to your data.
2) Discuss the inertia, make a two dimensional plot of the first two CA directions.
3) Comment on whether or not there is any evidence of 'data snaking' in higher dimensional space.
4) In a few sentences, describe what you conclude from your plot.
5) Perform Multidimensional Scaling (metric or non-metric) for 1, 2, and 3 dimensions.
6) Discuss the stress (or SStress) of each dimensional solution. Make a scree plot if you're able.
7) Make a two dimensional plot of your results.
8) If possible, overlay some other variables to interpret your ordination axes.
9) BONUS: try canonical correspondence analysis, or calculate p-values for the overlaid additional variables.

Loaner Data: choose ONE

Cereal.attitudes.csv: Marketing Survey Attitudes toward Cereals
∙ 8 Cereals
∙ 11 Questions (come back to, tastes nice, popular with all the family, very easy to digest, nourishing, natural flavor, reasonably priced, a lot of food value, stays crispy in milk, helps to keep you fit, fun for children to eat)
∙ Values are percent of respondents who had a favorable response for a particular cereal for that particular question.

T. K. Chakrapani and A. S. C. Ehrenberg, "An Alternative to Factor Analysis in Marketing Research Part 2: Between Group Analysis", PMRS Journal, Vol. 1, Issue 2, October 1981, pp. 32-38.

R code to get you started:

#get the data
cereal <- read.csv("/stat660/data/cereal.attitudes.csv")

Wisconsin.Forest.csv: Relative abundance of 14 species was measured on 10 plots. Plots were ordered from pioneer (early stage) to climax (late stage). The final column contains that stage of the forest on a scale from 1 to 10.

Peet & Loucks (1977)

R code to get you started:

#get the data
forest <- read.csv("/stat660/data/Wisconsin.Forest.csv")
rownames(forest)=forest[,1]
forestenv=matrix(forest[,17],ncol=1)
rownames(forestenv)=forest[,1]
colnames(forestenv)=c("Stage")
forest=forest[,-c(1,17)]
forestenv=data.frame(forestenv)
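Steps 5 and 6 of the assignment (MDS in 1, 2, and 3 dimensions, with a stress value for each) can be sketched with base R alone. This is an illustrative sketch under assumptions: the built-in eurodist distances stand in for the homework data, classical (metric) MDS via cmdscale() is one of the allowed choices, and the stress formula used is a Kruskal-style stress-1 computed by hand rather than the output of any particular package.

```r
# Sketch: eurodist (built-in road distances) stands in for the homework data.
d <- as.vector(eurodist)

stress <- sapply(1:3, function(k) {
  fit <- cmdscale(eurodist, k = k)      # classical (metric) MDS in k dimensions
  d_hat <- as.vector(dist(fit))         # distances in the fitted configuration
  sqrt(sum((d - d_hat)^2) / sum(d^2))   # stress-1 style badness of fit
})
names(stress) <- paste0("k=", 1:3)
print(round(stress, 3))

# A scree plot of stress against dimension could then be drawn with:
# plot(1:3, stress, type = "b", xlab = "Dimensions", ylab = "Stress")
```

Lower values indicate that the low-dimensional configuration reproduces the original distances more closely.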

2020-2021 Academic Year, Second Semester: Final Exam Paper for "R Data Analysis Methods and Experiments"

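Part 1 (35 points): Exploring the nycflights13 data

1. From the flights data, find all flights whose arrival was delayed by two hours or more, and save the resulting data as flight_arr2hr. (5 points)

2. Group the flight_arr2hr data by destination (dest), count the number of flights arriving at each destination, keep the ten destinations with the most arriving flights, and name the result top10_dest. (5 points)

3. From the weather table, select the variables year, month, day, hour, origin, humid, and wind_speed, left-join the result with the flight_arr2hr table on their common variables, and save the new data as flight_weather. (5 points)

4. Using flight_weather, draw three parallel panels, one per origin, each showing a scatter plot of wind speed wind_speed (x axis) against departure delay dep_delay (y axis) together with a smooth curve. (5 points)

5. How many flights did each airline in flights cancel in 2013? Hint: use dep_time to decide whether a flight was cancelled. (5 points)

6. For each airline in flights, find the destination airport its flights go to most often, and the number of flights that airline flew to that airport. (10 points)

Part 2 (20 points): Exploring the diamonds data

1. For the diamonds data, create a new variable id that stores the row number of each observation. Select the four variables id, x, y, z and convert the wide data to long data: store the variable names x, y, z in a new variable dimension, and the values of x, y, z in a new variable length. Save the long data as xyz_long. (5 points)

2. Convert xyz_long back to wide data xyz_wide, which contains the four variables id, x, y, z. (5 points)

3. Write code to find the most common and least common color in diamonds, i.e. the color values that occur most and least often.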
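The wide-to-long round trip and the frequency question in Part 2 can be sketched in base R without installing ggplot2. Everything about the data here is an assumption: toy is a made-up three-row stand-in for diamonds, and the base-R reshaping shown is just one route (with the tidyverse, tidyr's pivot_longer() and pivot_wider() would be the usual tools).

```r
# Toy stand-in for ggplot2::diamonds; the values are made up for illustration.
toy <- data.frame(x = c(3.9, 4.1, 4.3),
                  y = c(4.0, 3.8, 4.2),
                  z = c(2.4, 2.3, 2.6),
                  color = c("E", "E", "J"))
toy$id <- seq_len(nrow(toy))              # row number of each observation

# Wide -> long: one row per (id, dimension) pair.
dims <- c("x", "y", "z")
xyz_long <- data.frame(
  id        = rep(toy$id, times = length(dims)),
  dimension = rep(dims, each = nrow(toy)),
  length    = unlist(toy[dims], use.names = FALSE)
)

# Long -> wide: each (id, dimension) cell holds exactly one value.
wide_mat <- with(xyz_long, tapply(length, list(id, dimension), sum))
xyz_wide <- data.frame(id = as.integer(rownames(wide_mat)), wide_mat)

# Most and least common color.
counts <- table(toy$color)
most_common  <- names(counts)[which.max(counts)]
least_common <- names(counts)[which.min(counts)]
```

which.max() and which.min() return only the first position when several colors tie, which may or may not be what a grader expects.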

R Statistical Analysis Homework

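T<- read.table("G:/学习文件夹/R语言/R语言作业/5/住房状况调查.csv",header=TRUE,sep=",")

1. Build a frequency table for the variable 计划户型 (planned unit type), draw the corresponding plot, and state your conclusion (give the R code).

table(T$计划户型)

2. Choose an appropriate plot to describe the variable 计划面积 (planned floor area) and state your conclusion (give the R code).

hist(T$计划面积, col = "lightgreen")

3. For the variable 计划面积, compute the sample size, mean, median, trimmed mean, sample standard deviation, skewness, kurtosis, maximum, minimum, and the upper and lower quartiles, and give an overall analysis of its distribution (give the R code).

t<-na.omit(T$计划面积)
summary(t)
library("psych")
describe(t)

The histogram from question 2 shows that the planned floor area is asymmetrically distributed. Its median is 100, its mean is 101.6, its lower quartile is 80.0, its upper quartile is 120.0, and the interquartile range is 40.0.

4. Choose a suitable plot to show the relationship between the variables 计划户型 (planned unit type) and 从业状况 (employment status), and test them for independence (give the R code).

t<- na.omit(T)
b<-data.frame(t$从业状况,t$计划户型)
a<-table(b)
barplot(a,main="从业状况与计划户型的关系",ylab="频数",col=c(rainbow(6)),beside=TRUE)
library(vcd)  # assocstats() comes from the vcd package
summary(assocstats(a))

H0: planned unit type and employment status are independent, i.e. the two variables are not associated.
H1: planned unit type and employment status are not independent, i.e. the two variables are associated.

The Pearson chi-squared test gives n = 719, X-squared = 129.270, df = 50, p-value = 6.0761e-09, which is below 0.05, so the null hypothesis is rejected. Cramer's V = 0.19. There is evidence that planned unit type and employment status are not independent.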
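The independence test in question 4 can also be run with base R only. The sketch below is an illustration under assumptions: the built-in HairEyeColor table (hair colour by eye colour, summed over sex) stands in for the housing survey, and Cramer's V is computed by hand from the chi-squared statistic instead of through vcd::assocstats().

```r
# Stand-in contingency table: hair colour by eye colour, summed over sex.
tab <- margin.table(HairEyeColor, c(1, 2))

test <- chisq.test(tab)                  # Pearson chi-squared test of independence
n <- sum(tab)

# Cramer's V = sqrt(X^2 / (n * (min(rows, cols) - 1)))
cramers_v <- sqrt(unname(test$statistic) / (n * (min(dim(tab)) - 1)))

print(test)
print(cramers_v)
```

A small p-value says the variables are associated; Cramer's V, which lies between 0 and 1, says how strongly.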

R Regression Self-Test Exercises (with Code and Answers)

################## Part 1: Linear Regression Concepts ##################
## These questions do not require coding but will explore some important concepts.
## "Regression" refers to the simple linear regression equation:
## y = b0 + b1*x
## This homework will not discuss other models.

## 1. (1 pt)
## What is the interpretation of the coefficient B1?
## (What meaning does it represent?)
## Your answer here
# b1 is the number of units by which the dependent variable changes when the independent variable increases by one unit.

## 2. (1 pt)
## Outliers are problems for many statistical methods, but are particularly problematic
## for linear regression. Why is that? It may help to define what outlier means in this case.
## (Hint: Think of how residuals are calculated)
## Your answer here
# A single anomalous observation strongly affects the means of the independent and dependent variables,
# and therefore the estimated beta, so the fitted model can change drastically.
# Points whose standardized residual is greater than 2 or less than -2 may be outliers.

## 3. (1 pt)
## How could you deal with outliers in order to improve the accuracy of your model?
## Your answer here
# Delete the outliers or replace them with the mean.

################## Part 2: Sampling and Point Estimation ##################
## The following problems will use the cats dataset and explore
## the average body weight of female cats.
## Load the data by running the following code
# install.packages("MASS")
library(MASS)
## Warning: package 'MASS' was built under R version 3.3.3
data(cats)

## 4. (2 pts)
## Subset the data frame to ONLY include female cats.
## Your answer here
cats=cats[cats$Sex=="F",]
## Use the sample function to generate a vector of 1s and 2s that is the same
## length as the subsetted data frame you just created. Use this vector to split
## the 'Bwt' variable into two vectors, Bwt1 and Bwt2.
## IMPORTANT: Make sure to run the following seed function before you run your sample
## function. Run them back to back each time you want to run the sample function to ensure
## the same seed is used every time.
## Check: If you did this properly, you will have 24 elements in Bwt1 and 23 elements
## in Bwt2.
set.seed(676)
## Your answer here
set.seed(676)
s1=sample(length(cats$Bwt),24)
# Note: the next line draws a second sample rather than reusing s1, so Bwt1 and Bwt2 may overlap;
# cats$Bwt[s1] gives a disjoint split. The output below comes from the line as written.
Bwt1=cats$Bwt[sample(length(cats$Bwt),24)]
Bwt2=cats$Bwt[-s1]

## 5. (3 pts)
## Calculate the mean and the standard deviation for each of the two
## vectors, Bwt1 and Bwt2. Use this information to create a 95%
## confidence interval for your sample means (you can use the following formula
## for a confidence interval: mean +/- 2 * standard deviation).
## Compare the confidence intervals -- do they seem to agree or disagree?
## Your answer here
mean(Bwt1)
## [1] 2.3375
mean(Bwt2)
## [1] 2.395652
sd(Bwt1)
## [1] 0.2617873
sd(Bwt2)
## [1] 0.2754802
# confidence interval
mean(Bwt1)+2*sd(Bwt1)
## [1] 2.861075
mean(Bwt1)-2*sd(Bwt1)
## [1] 1.813925
mean(Bwt2)+2*sd(Bwt2)
## [1] 2.946613
mean(Bwt2)-2*sd(Bwt2)
## [1] 1.844692
# Judging from the confidence intervals, the two samples differ little and give similar results.
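Question 4 asks for a vector of 1s and 2s, which guarantees that the two halves do not overlap. The sketch below shows one way to build it; it is not the graded answer, the variable names fem and grp are illustrative, and the resulting split will differ from the numbers printed above. It also uses t.test() for the interval, which is an alternative to the homework's mean +/- 2 * sd formula: that formula describes the spread of individual cats, while a t interval describes the uncertainty of the mean.

```r
library(MASS)                        # provides the cats data, as in the homework

fem <- cats[cats$Sex == "F", ]       # 47 female cats
set.seed(676)
# Shuffled labels: 24 ones and 23 twos, one label per cat.
grp <- sample(rep(1:2, length.out = nrow(fem)))

Bwt1 <- fem$Bwt[grp == 1]
Bwt2 <- fem$Bwt[grp == 2]

# t-based 95% confidence intervals for the two sample means.
ci1 <- t.test(Bwt1)$conf.int
ci2 <- t.test(Bwt2)$conf.int
print(ci1)
print(ci2)
```

Because every cat receives exactly one label, Bwt1 and Bwt2 together contain each body weight once.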

University R Exam Questions and Answers

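I. Multiple-Choice Questions (2 points each, 20 points total)

1. R is a ( ).
A. programming language  B. data analysis tool  C. operating system  D. web browser
Answer: A

2. In R, the function used to generate random draws is ( ).
A. seq()  B. rep()  C. sample()  D. random()
Answer: C

3. Which function computes the sum of the elements of a vector in R? ( )
A. sum()  B. mean()  C. median()  D. max()
Answer: A

4. In R, the function used to create a data frame is ( ).
A. data.frame()  B. matrix()  C. list()  D. vector()
Answer: A

5. In R, how do you refer to the first element of a variable named "x"? ( )
A. x[1]  B. x(1)  C. x{1}  D. x->1
Answer: A

6. In R, the function used to draw a histogram is ( ).
A. plot()  B. hist()  C. bar()  D. pie()
Answer: B

7. Which of the following is a data type in R? ( )
A. numeric  B. text  C. date  D. all of the above
Answer: A (R stores text as character, and dates use the Date class rather than a basic type)

8. In R, how do you reverse a vector? ( )
A. rev()  B. reverse()  C. flip()  D. invert()
Answer: A

9. In R, the operator that performs an element-wise logical AND is ( ).
A. &  B. &&  C. &  D. and()
Answer: A

10. Which command installs an R package? ( )
A. install.packages()  B. load.packages()  C. get.packages()  D. fetch.packages()
Answer: A

II. Short-Answer Questions (5 points each, 30 points total)

11. Briefly describe the difference between a vector and a matrix in R.

Answer: A vector in R is a one-dimensional data structure whose elements all have the same type. A matrix is two-dimensional, made up of rows and columns, and all of its elements must likewise be of the same type.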

R Data Analysis Practice Problems: Reference Answers

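I. Problem Description

This exercise set practices data analysis in R and gives reference answers. The problems are as follows.

1. Summary statistics
Given a vector x of 10 positive integers, compute:
(1) the mean of x;
(2) the median of x;
(3) the maximum of x;
(4) the minimum of x;
(5) the standard deviation of x.

2. Data visualization
Use R to draw scatter plots of the following data:
(1) a data set of 50 points, with variable x on the x axis and variable y on the y axis;
(2) a data set of 100 points, with variable x on the x axis and variable y on the y axis, with the points colour-coded.

3. Data processing
Given a data set of 100 points that contains missing values, process it in R as follows:
(1) delete the data points that contain missing values;
(2) compute and print the mean of the data set;
(3) fill the missing values with the mean, then recompute and print the mean of the data set.

II. Solutions

Detailed answers to the problems above follow.

1. Summary statistics
(1) Mean of x: mean(x)
(2) Median of x: median(x)
(3) Maximum of x: max(x)
(4) Minimum of x: min(x)
(5) Standard deviation of x: sd(x)

2. Data visualization
(1) Scatter plot 1: plot(x, y)
(2) Scatter plot 2: plot(x, y, col = colors)

3. Data processing
(1) Delete the data points that contain missing values:
complete_data <- na.omit(data)
(2) Compute and print the mean of the data set (mean(data) would return NA while missing values are present):
mean(complete_data)
(3) Fill the missing values with the mean, then recompute and print the mean:
data_filled <- data
data_filled[is.na(data_filled)] <- mean(data_filled, na.rm = TRUE)
mean(data_filled)

These are the reference answers for the R data analysis practice problems. The exercises are meant to help you become familiar with data analysis operations in R and with common statistical and visualization techniques.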
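The three data-processing steps can be traced on a concrete vector. This is a minimal sketch with made-up numbers; the name x and its values are assumptions chosen so that the arithmetic is easy to follow.

```r
x <- c(4, 8, NA, 15, 16, NA, 23, 42)   # made-up data with two missing values

mean(x)                                 # NA: missing values propagate

# (1) and (2): drop the missing values, then average what is left.
complete <- x[!is.na(x)]
m_complete <- mean(complete)

# (3): fill the gaps with the mean of the observed values, then average again.
filled <- x
filled[is.na(filled)] <- mean(x, na.rm = TRUE)
m_filled <- mean(filled)

print(c(complete = m_complete, filled = m_filled))
```

Mean imputation leaves the mean unchanged by construction, so the two printed values agree; what it does change is the spread, which shrinks because the filled-in points sit exactly at the centre.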

07 - R Data Analysis in Practice: Test Paper

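R Data Analysis Test Paper

Part 1: Single-Choice Questions

1. Which statement about R data types is incorrect? (D)
A. Numeric (double/numeric, abbreviated num)
B. Integer (integer, abbreviated int)
C. Logical (logical, abbreviated logi)
D. Text (text, abbreviated text)

2. Which expression creates a matrix with two rows and three columns holding the six values 1 to 6? (A)
A. matrix(1:6, ncol=3, nrow=2)
B. matrix(2, 3, 1:6)
C. matrix(1:6, col=3, row=2)
D. matrix(1:6, 3:2)

3. Which of the following describes an invalid variable name in R? (D)
A. A name made up of letters, digits, and dots
B. A name made up of letters, digits, and underscores
C. A name made up of letters, dots, or underscores
D. A name that starts with a digit followed by letters

4. Which function loads a package that is not yet loaded into the R workspace? (B)
A. install("package name")
B. library("package name")
C. lib("package name")
D. rep("package name")

5. The as family of functions converts the storage type of a data object to a given type. The call has the form (A)
A. as.storage_type(object)
B. as.(object)
C. as.(storage_type)
D. as.object(storage_type)

6. The merge function in R is used to (B)
A. look up data
B. merge data
C. update data
D. filter data

7. To read text data into a vector in R you can use (A)
A. the scan function
B. the read.table function
C. the open function
D. the file function

8. Which of the following is not a loop construct in R? (B)
A. for
B. loop
C. while
D. repeat

9. Which command reads the first sheet of the file YKH.example1.xlsx in R? (B)
A. read.table("./data/YKH.example1.xlsx",sheet=1)
B. read.xlsx("./data/YKH.example1.xlsx",sheet=1)
C. read.csv("./data/YKH.example1.xlsx",sheet=1)
D. read.file("./data/YKH.example1.xlsx",sheet=1)

10. Running IRkernel::installspec(user=FALSE) in R makes R discoverable from Jupyter. What does user=FALSE mean? (D)
A. Effective for the currently logged-in user
B. Effective for the current administrator
C. Effective for anonymous users
D. Effective for all users on this computer

Part 2: True/False Questions (T = true, F = false)

11. In R, the command options(repos = '/CRAN/') sets the mirror download address. (T)
12. In R, the command install.packages('ggplot2') attempts to download the plotting package. (T)
13. The command head(iris) shows the first 10 rows of a built-in data set. (F)
14. In R, the command help(iris) opens the help page for the iris data set. (T)
15. In R, the command data() lists all data sets in the loaded packages. (T)
16. In R, the command as.matrix(x) converts the object x to a matrix. (T)
17. In R, the command str(x) shows the structure of the object x. (T)
18. In R, x[a:b] denotes elements a through b of the vector x. (T)
19. In R, len(x) computes the length of the vector x. (F)
20. In R, the result of 3^2 is 6. (F)
21. In R, jiebaR can be used for Chinese word segmentation in text analysis. (T)
22. In R, the segment command performs word segmentation with a tokenizer. (T)
23. In R, wordcloud2 can be used to generate word clouds. (T)
24. In R, attributes(iris) displays the attributes of the data set. (T)
25. In R, the command that computes the covariance of the columns iris$Sepal.Length and iris$Petal.Length is cor(iris$Sepal.Length, iris$Petal.Length). (F)
26. In R, the command that runs a cluster analysis on all data in the first four columns of iris is iris.kmeans<-kmeans(iris[,1:4],3). (T)
27. In R, the command that shows how often each value occurs in the Species column of iris is table(iris$Species). (T)
28. In R, the command that shows the distribution of each variable in iris is summary(iris). (T)
29. In R, the command that draws scatter plots of the columns of iris is plot(iris). (T)
30. In R, the package for the random forest classification algorithm is ctree. (F)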
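Several of the true/false items marked F can be checked directly at the console. The sketch below uses only base R and the built-in iris data; the variable names are illustrative.

```r
# Item 13: head() shows six rows by default; ask for ten explicitly.
rows_default <- nrow(head(iris))
rows_ten     <- nrow(head(iris, 10))

# Item 19: the base R function for vector length is length().
n_elements <- length(c(2, 4, 6))

# Item 20: ^ is exponentiation, so 3^2 is nine.
three_squared <- 3^2

# Item 25: cov() gives the covariance; cor() gives the correlation,
# which is the covariance rescaled by both standard deviations.
sl <- iris$Sepal.Length
pl <- iris$Petal.Length
covariance  <- cov(sl, pl)
correlation <- cor(sl, pl)

# Item 2 of Part 1: a 2 x 3 matrix filled column by column with 1..6.
m <- matrix(1:6, ncol = 3, nrow = 2)
```

For item 30, the package usually used for random forests is randomForest; it is not loaded here because it is not part of base R.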

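R Question Bank

I. Multiple-Choice Questions

Which operator in R is used for subscripting (extracting a component)? ( )
A. %
B. #
C. $
D. ^
Answer: C

Which of the following is not a data structure in R?
A. Vector
B. Matrix
C. Data frame
D. String
Answer: D

Which operator in R tests two values for equality? ( )
A. <
B. >
C. =
D. ==
Answer: D

In R, which function computes the mean of a vector?
A. sum
B. mean
C. median
D. mode
Answer: B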
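II. Fill-in-the-Blank Questions

In R, the ____ function is used to generate random numbers.

Answer: rnorm(), runif(), sample(), and similar functions.

In R, the ____ function is used to read a data file.

Answer: read.table(), read.csv(), and similar functions.

In R, the ____ function is used to draw a scatter plot.

Answer: plot().

In R, the ____ function is used to compute the standard deviation of a set of data.

Answer: sd().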
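III. Short-Answer Questions

Briefly describe some common data types in R.

Answer: numeric, character, logical, complex, and so on.

Briefly describe the structure of conditional statements in R.

Answer: the if statement, the if-else statement, the switch statement, and so on.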
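The fill-in and short-answer items above can be illustrated in a few lines of base R. This is a small sketch; the helper names describe_sign and centre are made up for the example, and the seed is arbitrary.

```r
set.seed(42)                                  # arbitrary seed for reproducibility
normal_draws  <- rnorm(5)                     # standard normal random numbers
uniform_draws <- runif(5)                     # uniform random numbers on [0, 1]
die_rolls     <- sample(1:6, 5, replace = TRUE)

# The four basic types named in the short answer.
types <- sapply(list(1.5, "a", TRUE, 1i), class)

# if / else if / else
describe_sign <- function(v) {
  if (v > 0) "positive" else if (v < 0) "negative" else "zero"
}

# switch() picks a branch by name; the unnamed last argument is the fallback.
centre <- function(v, kind) {
  switch(kind,
         mean   = mean(v),
         median = median(v),
         stop("unknown kind"))
}

spread <- sd(c(2, 4, 4, 4, 5, 5, 7, 9))       # sample standard deviation
```

Note that if and switch work on a single value; for element-wise choices over a whole vector, ifelse() is the usual tool.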

R and Statistical Analysis: Answers to the Chapter 5 Exercises

#5.1
x<-c(3,5,7,9,11,13,15,17,19,21)
y<-c(21,16,15,26,22,14,21,22,18,25)
e=sum(x*y)/sum(y) # sample mean
d=(sum(x*x*y)/sum(y))-e^2 # sample variance
a=(8*e+sqrt(64*e^2-4*4*(4*e^2-12*d)))/8 # estimates
b=(8*e-sqrt(64*e^2-4*4*(4*e^2-12*d)))/8
a
b

#5.2
x<-c(0,1,2,3,4,5,6)
y<-c(17,20,10,2,1,0,0)
e=2.718281828459
f<-function(lambda)(e^(-50*lambda)*lambda^50)/(2^10*6^2*24) # likelihood function
optimize(f,c(0,2),maximum=TRUE)

#5.3
x<-c(482,493,457,471,510,446,435,418,394,469)
# 0.95 confidence interval
t.test(x)$conf.int

chisq.var.test<-function(x,var,alpha,alternative="two.sided"){
options(digits=4)
result<-list()
n<-length(x)
v<-var(x)
result$var<-v
chi2<-(n-1)*v/var
result$chi2<-chi2
p<-pchisq(chi2,n-1) # lower-tail probability, the p-value for alternative="less"
result$p.value<-p
if(alternative=="greater")
result$p.value<-pchisq(chi2,n-1,lower.tail=F)
else if (alternative=="two.sided")
result$p.value<-2*min(pchisq(chi2,n-1),pchisq(chi2,n-1,lower.tail=F))
result$conf.
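The quadratic-formula expressions in 5.1 simplify algebraically to e -/+ sqrt(3*d), which is the form of the method-of-moments estimators for a uniform distribution on [a, b]; that the exercise concerns a uniform distribution is an assumption, since the problem statement is not shown. For 5.2, the sketch below maximizes the Poisson log-likelihood with dpois() instead of the hand-written likelihood, which is numerically safer and gives the same maximizer. The names lower, upper, k, cnt, and loglik are illustrative.

```r
# 5.1: frequency-weighted moments, then the simplified estimators.
x <- c(3, 5, 7, 9, 11, 13, 15, 17, 19, 21)
y <- c(21, 16, 15, 26, 22, 14, 21, 22, 18, 25)
e <- sum(x * y) / sum(y)             # weighted sample mean
d <- sum(x^2 * y) / sum(y) - e^2     # weighted sample variance
lower <- e - sqrt(3 * d)
upper <- e + sqrt(3 * d)

# Same upper estimate via the quadratic formula used in the answer above.
upper_q <- (8 * e + sqrt(64 * e^2 - 4 * 4 * (4 * e^2 - 12 * d))) / 8

# 5.2: Poisson log-likelihood for counts k observed cnt times each.
k   <- 0:6
cnt <- c(17, 20, 10, 2, 1, 0, 0)
loglik <- function(lambda) sum(cnt * dpois(k, lambda, log = TRUE))
fit <- optimize(loglik, c(0.01, 2), maximum = TRUE)

# Closed form for comparison: the Poisson MLE is the sample mean.
mle_closed <- sum(k * cnt) / sum(cnt)
```

Working on the log scale avoids the tiny values that the raw likelihood produces, which matters more as the sample grows.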

R Practice Problems

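Statistical Software Lab 1

For each problem, record (1) the command code, (2) the result or figure, and (3) any errors.

1. (?0.3?42) y=sin(10?)?e?log423
2. x=sin(223/3), y=x^2, z=y*10; compute x+2y-5z.
3. Create a one-dimensional array x with start value 3, increment 5.5, and end value 44.
4. Create an arithmetic one-dimensional array x with first term 0, last term ?, and 15 terms.
5. Enter 100202200600800 into R and save it in the variable numeric.
6. Convert numeric to a factor, store it in the variable factor.numeric, and confirm with class().
factor.numeric
8. Create a vector from 2 to 50 of the form 2, 4, 6, 8, ..., 48, 50 and name it vector1.
vector1
10. Select the 10th, 15th, and 20th elements of vector1.
vector1[c(10,15,20)]
11. Select the 10th through 20th elements of vector1.
vector1[10:20]
12. Select the elements of vector1 whose value is greater than 40.
vector1[vector1>40]
13. Create the vector 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5.
rep(1:5,10)
14. Use rep() to create the vector 0 0 0 0 0 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4.
rep(0:4,rep(5,5))
15. Use rep() to construct a vector x made up of three 3s, four 2s, and five 1s.
x=c(rep(3,3),rep(2,4),rep(1,5))

Statistical Software Lab 2

For each problem, record (1) the command code, (2) the result or figure, and (3) any errors.

1. Compute the determinant of the matrix A with rows (3, 2, 3), (4, 2, 6), (7, 8, 1).
a=det(matrix(c(3,4,7,2,2,8,3,6,1),nrow=3))
2. Let A be the matrix with rows (3, 2, 3), (4, 2, 6), (7, 8, 1) and B the matrix with rows (1, 1, 1), (2, 2, 2), (3, 3, 3). Find the matrix product of A and B and the product of their corresponding elements.
a=matrix(c(3,4,7,2,2,8,3,6,1),nrow=3); b=matrix(rep(1:3,3),nrow=3); a%*%b; a*b
3. From 1, 2, ..., 16 form two square matrices, with matrix A filled by column and matrix B filled by row, and compute C=A+B and D=AB.
A=matrix(1:16,nrow=4); B=matrix(1:16,nrow=4,byrow=TRUE); C=A+B; D=A%*%B
4. Copy the appendix data into a text file, then read the data into the object data.
data
5. Compute the ratio of weight to the square of height and store it in the variable BMI.
BMI
6. Create an object x with the values 1:10 and use the write function to write it to the file x.txt; delete x, then read the file back in and assign it to x, making sure x is numeric.
write.table(x,file=
7. Inspect the mtcars data (type mtcars); change every element of the vs column to the last two digits of your student number.
mtcars$vs
8. Convert mtcars to a matrix mm and determine its data type; change the sixth row to the last two digits of your student number; take the first 11 rows of mm and store them in the variable mtcars11; take the main-diagonal elements of mtcars11 to form a diagonal matrix MT; take the upper triangle of mtcars11 and store it in mtupper (look up the definition of an upper triangular matrix online).
mm
mtcars11
x[lower.tri(x)]
lower.tri(x,diag=FALSE)
x[upper.tri(x)]
upper.tri(x,diag=FALSE)
9. Set the row names and column names of mtupper to NULL.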
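The matrix problems of Lab 2 can be worked through as follows. This is a sketch of one possible solution, not the official key: the steps that depend on a student number are skipped, and setting the lower triangle to zero is one common reading of "take the upper triangle".

```r
# Problems 1 and 2: determinant, matrix product, element-wise product.
A <- matrix(c(3, 4, 7, 2, 2, 8, 3, 6, 1), nrow = 3)   # filled by column
B <- matrix(rep(1:3, 3), nrow = 3)                     # rows 1 1 1 / 2 2 2 / 3 3 3
det_A       <- det(A)
matrix_prod <- A %*% B
elem_prod   <- A * B

# Problem 8: mtcars as a matrix, first 11 rows (mtcars has 11 columns).
mm       <- as.matrix(mtcars)
mtcars11 <- mm[1:11, ]

MT <- diag(diag(mtcars11))            # diagonal matrix from the main diagonal

mtupper <- mtcars11
mtupper[lower.tri(mtupper)] <- 0      # zero everything below the diagonal

# Problem 9: drop the row and column names.
dimnames(mtupper) <- NULL
```

diag() is used twice on purpose: the inner call extracts the diagonal as a vector, and the outer call builds a diagonal matrix from that vector.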

R Exam Questions

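1. Data import and processing

In R we often need to process and analyze large amounts of data. Use appropriate functions to load the following data set and answer the questions.

Data set: iris
Questions:
a) How many rows and columns does the data set have?
b) What is the data type of each column?
c) Does the data set contain missing values?
d) Change every value in the Species column to "setosa".

2. Data visualization

R provides powerful visualization tools that help us understand and analyze data. Use appropriate functions to draw the following charts.

Data set: mtcars
Charts:
a) A scatter plot with the mpg column on the horizontal axis and the hp column on the vertical axis.
b) A bar chart with the cyl column on the horizontal axis and the wt column on the vertical axis.

3. Statistical analysis

R is a powerful tool for statistical analysis. Use appropriate functions to answer the following questions.

Data set: ChickWeight
Questions:
a) Compute the mean weight at each time point (Time) in the ChickWeight data set.
b) Compare the effect of the different feed types (Diet) on chick weight.

4. Machine learning

Many machine learning algorithms are available in R, together with functions and tools that make machine learning tasks convenient. Use appropriate functions to complete the following tasks.

Data set: iris
Tasks:
a) Split the data set into a training set and a test set in the ratio 8:2.
b) Classify the test set with the k-nearest-neighbour algorithm (K-Nearest Neighbor) and compute the accuracy.

5. Data mining

The data mining tools in R help us extract valuable information and patterns from large volumes of data. Use appropriate functions to answer the following questions.

Data set: Adult
Questions:
a) Preprocess the data set, including removing missing values and handling discrete features.
b) Train a classification model with the decision tree algorithm (Decision Tree) and evaluate its performance.

Closing remarks

Working through these exam questions gives you the chance to combine the functions and tools of R and to show your understanding of data processing, visualization, statistical analysis, machine learning, and data mining.

Good luck! (The questions above are examples; the content and format of actual exam questions may differ, so adjust them as needed.)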
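Questions 1, 3, and 4a can be answered with base R and built-in data, as sketched below. The choices here are assumptions rather than the expected answers: a one-way ANOVA is only one way to compare diets (it ignores the repeated measurements on each chick), and the seed and object names are arbitrary. The k-nearest-neighbour, decision-tree, and Adult-data tasks need add-on packages and are not shown.

```r
# Question 1: shape, column types, missing values, overwrite Species.
shape     <- dim(iris)
col_types <- sapply(iris, class)
n_missing <- sum(is.na(iris))
iris2 <- iris
iris2$Species[] <- "setosa"          # "setosa" is an existing factor level

# Question 3a: mean weight at each time point.
mean_by_time <- aggregate(weight ~ Time, data = ChickWeight, FUN = mean)

# Question 3b: one-way ANOVA of weight on diet (a simplification).
diet_anova <- anova(lm(weight ~ Diet, data = ChickWeight))

# Question 4a: random 80/20 split into training and test sets.
set.seed(1)                           # arbitrary seed for reproducibility
train_idx <- sample(nrow(iris), round(0.8 * nrow(iris)))
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]
```

Indexing with -train_idx guarantees that no row appears in both sets.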