Health Statistics (English Edition)
1、Coefficient of determination: The coefficient of determination is the ratio of the regression sum of squares to the total sum of squares. It satisfies 0 ≤ r² ≤ 1 and denotes the strength of the linear association between x and y. It represents the proportion of the total variation in y that is explained by the linear regression on x.
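The ratio in this definition can be checked numerically. The following is a minimal sketch in Python, assuming a simple linear regression of y on x fitted by least squares; the function name and the small data set are hypothetical and serve only as an illustration.

```python
from statistics import mean


def coefficient_of_determination(x, y):
    """Return r^2 = SS_regression / SS_total for a least-squares line of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    l_xx = sum((xi - x_bar) ** 2 for xi in x)
    l_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = l_xy / l_xx            # slope of the fitted line
    a = y_bar - b * x_bar      # intercept of the fitted line
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_regression = sum((a + b * xi - y_bar) ** 2 for xi in x)
    return ss_regression / ss_total


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r2 = coefficient_of_determination(x, y)
print(round(r2, 3))  # 0.6: the regression explains 60% of the variation in y
```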
2、P-value: In statistical significance testing, the p-value is the probability of obtaining a test statistic at least as extreme as the one actually observed, assuming that the null hypothesis is true. One often "rejects the null hypothesis" when the p-value is less than the significance level α, which is often 0.05 or 0.01. When the null hypothesis is rejected, the result is said to be statistically significant. In other words, it is the probability, given that the null hypothesis holds, of the statistic taking its observed value or any value even less favorable to the null hypothesis.
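As an illustration of "at least as extreme", here is a minimal sketch that assumes the test statistic follows a standard normal distribution under the null hypothesis (a z statistic). The function names are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def two_sided_p_value(z):
    """P(|Z| >= |z|) under H0: both tails count as 'at least as extreme'."""
    return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))


def upper_tail_p_value(z):
    """P(Z >= z) under H0: only the upper tail counts (one-sided test)."""
    return 1 - STANDARD_NORMAL.cdf(z)


p_two = two_sided_p_value(1.96)   # close to 0.05
p_one = upper_tail_p_value(1.96)  # close to 0.025
```

With α = 0.05, an observed z of 1.96 sits right at the two-sided rejection boundary.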
3、Confidence interval: A type of statistical interval estimate for an unknown parameter: a range of values believed to contain the parameter with a predetermined degree of confidence. Its endpoints are the confidence limits, and the stated probability attached to it is the confidence coefficient (e.g., 90% or 95%). Put simply, it is the range of numerical values within which one can be confident, to the stated degree, that the population value being estimated will be found.
4、Life table: A statistical model for measuring the mortality (or any other type of "exit") experience of a population while controlling for the age distribution. It may be a current life table, in which all of the people in a population at one time are surveyed, or a cohort life table, in which all of the people born in a particular time span are followed as a group.
5、Meta-analysis: A systematic method of evaluating statistical data based on the results of several independent studies of the same problem.
6、Partial regression coefficient: The parameters in the population multiple linear regression equation that indicate the effect of each independent variable on the dependent variable with all the remaining independent variables held constant; each coefficient is the slope of the dependent variable on the corresponding independent variable.
一、Standard deviation and standard error
Meaning:
Standard deviation (SD): a measure of the dispersion of individual values around the sample mean. The sample SD can be used as a point estimate of the population standard deviation. It is used in descriptive analysis.
Standard error (SE): a measure of the dispersion of sample means around the population mean. It is an estimate of the standard deviation of the sample means. It is used in inferential analysis.
Application:
SD: the smaller the standard deviation, the more closely the values cluster around the mean, and the better the mean represents the data.
SE: the smaller the standard error, the smaller the difference between the sample mean and the population mean, and the more reliable the sample mean is as an estimate of the population mean.
Relationship with n:
SD: the larger n is, the closer the sample standard deviation is to the population standard deviation.
SE: the larger n is, the smaller the standard error.
Relationship between SD & SE
The SE is directly proportional to the SD and inversely proportional to the square root of the sample size n. Both are statistics used to describe the degree of variation.
$\sigma_{\bar{X}} = \sigma / \sqrt{n}$
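The proportionality to the SD and the inverse dependence on the square root of n can be seen in a short sketch. The function names and numbers are hypothetical; the sample version simply replaces σ with the sample SD.

```python
from math import sqrt
from statistics import stdev


def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / sqrt(n)


def sample_standard_error(sample):
    """Estimate the SE from raw data, using the sample SD in place of sigma."""
    return stdev(sample) / sqrt(len(sample))


# Same SD, four times the sample size: the SE is halved.
se_25 = standard_error(10, 25)    # 2.0
se_100 = standard_error(10, 100)  # 1.0

# Hypothetical raw data.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
se_sample = sample_standard_error(sample)
```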
二、Confidence interval and reference range
Meaning:
Confidence interval: a type of statistical interval estimate for an unknown parameter, i.e. a range of values believed to contain the parameter with a predetermined degree of confidence (the confidence coefficient, e.g., 90% or 95%). Its endpoints are the confidence limits.
Reference range: a set of values established as the normal maximums or minimums for a given analyte; it is the range within which individual values fluctuate.
Application:
Confidence interval: used to estimate the population mean; it gives the range within which the population mean may lie.
Reference range: reflects the range over which a given index is distributed in most (e.g., 95%) of the observed subjects.
Sample size:
Confidence interval: the larger n is, the narrower the interval; its width approaches 0.
Reference range: the larger n is, the more stable the range.
Calculation formula:
Confidence interval, σ unknown and n small: $\bar{X} \pm t_{\alpha/2,\nu} S_{\bar{X}}$
Confidence interval, σ known: $\bar{X} \pm z_{\alpha/2} \sigma_{\bar{X}}$, or σ unknown but n > 60: $\bar{X} \pm z_{\alpha/2} S_{\bar{X}}$
Reference range, normal distribution: $\bar{X} \pm z_{\alpha/2} S$
Reference range, skewed distribution: $P_X \sim P_{100-X}$
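The contrast between the two intervals is easiest to see side by side. This is a minimal sketch assuming a large sample (n > 60) from an approximately normal distribution, so that the normal quantile is used for both intervals. The function names and the summary statistics are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(x_bar, s, n, level=0.95):
    """Large-sample CI for the population mean: x_bar +/- z * s / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se = s / sqrt(n)
    return x_bar - z * se, x_bar + z * se


def reference_range(x_bar, s, level=0.95):
    """Two-sided reference range for individuals: x_bar +/- z * s."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return x_bar - z * s, x_bar + z * s


# Hypothetical summary statistics: mean 120, SD 10, n = 100.
ci_lo, ci_hi = mean_confidence_interval(120, 10, 100)  # about (118.04, 121.96)
rr_lo, rr_hi = reference_range(120, 10)                # about (100.40, 139.60)
```

The confidence interval shrinks as n grows because it is built on the standard error, while the reference range is built on the SD and only becomes more stable.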
三、Differences and associations between linear correlation and regression
Associations:
1、For data on which both correlation analysis and regression analysis can be performed, r and b have the same sign.
2、The hypothesis tests of r and b are equivalent: their t values are equal.
3、For the same data, the correlation coefficient and the regression coefficient can be converted into each other.
4、Regression can be used to explain correlation: the closer r² is to 1, the more of the variation in y the regression explains and the more closely the two variables are related.
Differences:
1、Data requirements: correlation analysis requires that both x and y follow a normal distribution; simple linear regression requires only that y follow a normal distribution.
2、Application: correlation analysis measures the association between two random variables (x and y are treated symmetrically); simple linear regression measures the change in y for a given change in x (x is the independent variable, y is the dependent variable).
3、Units: r is a dimensionless number with no unit of measurement; b has a unit, which depends on the units of y and x.
4、Value range: -1 ≤ r ≤ 1; -∞ < b < +∞.
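The links between r and b can be verified on a small data set. The sketch below assumes the usual least-squares formulas for simple linear regression and the standard conversion b = r·S_y/S_x; the function name and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def correlation_and_regression(x, y):
    """Return r, b and the t statistics used to test each against zero."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    l_xx = sum((xi - x_bar) ** 2 for xi in x)
    l_yy = sum((yi - y_bar) ** 2 for yi in y)
    l_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    r = l_xy / sqrt(l_xx * l_yy)            # correlation coefficient
    b = l_xy / l_xx                         # regression coefficient (slope)
    t_r = r * sqrt((n - 2) / (1 - r ** 2))  # t statistic for r, df = n - 2
    ss_residual = l_yy - l_xy ** 2 / l_xx
    s_b = sqrt(ss_residual / (n - 2)) / sqrt(l_xx)  # standard error of b
    t_b = b / s_b                           # t statistic for b, df = n - 2
    return r, b, t_r, t_b


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r, b, t_r, t_b = correlation_and_regression(x, y)
```

On these data r and b are both positive, the two t values coincide, and b equals r multiplied by S_y/S_x.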
四、Parameters determining sample size
α: probability of a type I error (significance level). The smaller α is, the larger the sample size needed.
β: probability of a type II error (the power of the test is 1 − β). The smaller β is, the greater the power and the larger the sample size needed.
σ: standard deviation of the population. The larger σ is, the larger the sample size needed.
δ: tolerable difference between two population means or two population proportions. The larger δ is, the smaller the sample size needed.
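How the four parameters push the sample size up or down can be sketched with one common formula. This assumes a two-sided comparison of two means with equal group sizes and the normal approximation, n per group = 2[(z_{α/2} + z_β)σ/δ]². The notes do not state which formula they use, so both the formula choice and the function name are assumptions made for illustration.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(alpha, beta, sigma, delta):
    """Approximate sample size per group for comparing two means (two-sided)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(1 - beta)
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


base = n_per_group(alpha=0.05, beta=0.10, sigma=10, delta=5)

# Each change below moves the sample size in the direction the notes describe.
smaller_alpha = n_per_group(alpha=0.01, beta=0.10, sigma=10, delta=5)
smaller_beta = n_per_group(alpha=0.05, beta=0.05, sigma=10, delta=5)
larger_sigma = n_per_group(alpha=0.05, beta=0.10, sigma=15, delta=5)
larger_delta = n_per_group(alpha=0.05, beta=0.10, sigma=10, delta=8)
```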
五、Points to note for the chi-square test of a 2 × 2 table
1、Every cell in the table should have an expected (theoretical) frequency T ≥ 1, and the cells with 1 ≤ T < 5 should not exceed 1/5 of all cells. If n ≥ 40 but some cell has 1 ≤ T < 5, the continuity-corrected formula is needed. Alternatively, one can increase the sample size, delete or merge categories on the basis of professional knowledge, or switch to the Fisher exact probability method for a two-way unordered R × C table.
2、After the null hypothesis is rejected, pairwise comparisons should be carried out.
3、For ordered categorical variables, the general chi-square test cannot be used.
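The choice between the uncorrected and the corrected formula can be sketched for a 2 × 2 table with cells a, b, c, d. The n ≥ 40 and 1 ≤ T < 5 rule comes from the notes; falling back to the Fisher exact method in all remaining cases (n < 40 or T < 1) is an assumption based on common textbook practice, and the function name is hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pick a method for a 2 x 2 table and return (method, chi-square value)."""
    n = a + b + c + d
    r1, r2 = a + b, c + d          # row totals
    c1, c2 = a + c, b + d          # column totals
    # Expected (theoretical) frequency of each cell: row total * column total / n
    expected = [r1 * c1 / n, r1 * c2 / n, r2 * c1 / n, r2 * c2 / n]
    t_min = min(expected)
    denominator = r1 * r2 * c1 * c2
    if n >= 40 and t_min >= 5:
        return "pearson", (a * d - b * c) ** 2 * n / denominator
    if n >= 40 and t_min >= 1:
        # Continuity-corrected chi-square
        return "corrected", (abs(a * d - b * c) - n / 2) ** 2 * n / denominator
    # Assumed fallback: the statistic is not computed here.
    return "fisher", None


# Hypothetical tables, for illustration only.
method_1, chi2_1 = chi_square_2x2(20, 10, 10, 20)  # all T = 15
method_2, chi2_2 = chi_square_2x2(2, 18, 8, 32)    # smallest T is about 3.33
method_3, chi2_3 = chi_square_2x2(1, 9, 5, 5)      # n = 20
```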
六、Non-parametric tests
1、The distribution is unknown (the conditions of parametric methods are not met).
2、Ordinal data: the data have a ranking but no clear numerical interpretation.
3、Non-precise data (e.g., > 80), where one or both ends have no definite upper or lower limit; or variances that are clearly unequal.
4、A quick and brief analysis (for a pilot study).
七、Steps of a hypothesis test
Hypothesis testing uses the small-probability principle and reasons by contradiction: it starts from the opposite of the claim (H0) to judge indirectly whether the claim (H1) holds. The test statistic is calculated on the condition that H0 is true, and the resulting P value is used to make the judgment.
1、State H0 and H1 and set the significance level α.
2、Select the test method and calculate the test statistic; determine its distribution and df.
3、Find the critical value and the p-value from a table or by computer.
4、Make the inference: reject H0 if p ≤ α; do not reject H0 if p > α.
5、Draw the professional conclusion: its clinical or preventive merits.
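The steps can be traced in code for the simplest case. This is a sketch only, assuming a one-sample z test with a known population standard deviation; the function name and the numbers are hypothetical, and step 5 (the professional conclusion) remains a matter of subject knowledge rather than computation.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(x_bar, mu0, sigma, n, alpha=0.05):
    """Two-sided test of H0: mu = mu0 against H1: mu != mu0 (sigma known)."""
    # Step 1: H0 and H1 are fixed by mu0; alpha is the significance level.
    # Step 2: the test statistic, which is standard normal under H0.
    z = (x_bar - mu0) / (sigma / sqrt(n))
    # Step 3: critical value and p-value by computer instead of a table.
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Step 4: statistical inference.
    reject_h0 = p_value <= alpha
    return z, z_critical, p_value, reject_h0


# Hypothetical example: sample mean 74 from n = 36, H0 mean 72, sigma 6.
z, z_critical, p_value, reject_h0 = one_sample_z_test(74, 72, 6, 36)
```

Here z = 2.0 exceeds the critical value of about 1.96, the p-value is about 0.046, and H0 is rejected at α = 0.05.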
八、Cautions in the use of relative measures
1、The denominator should be big enough; otherwise the absolute measure should be used. Example: out of 5 cases, 3 were cured. Is "60%" meaningful?
2、Pay attention to the population from which the relative measure comes. For a prevalence rate, the population may be the students in the same grade; for a constituent ratio, the population is all the patients.
3、Pooled estimate of the frequency: pooled estimate = sum of numerators / sum of denominators.
4、Comparability between frequencies or between frequency distributions: check that the other conditions are balanced.
5、If the distributions of other variables differ, "standardization" is needed to improve comparability.
6、To compare two samples, a hypothesis test is needed (see the chi-square test).
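The pooled-estimate rule in point 3 differs from simply averaging the individual rates, as a short sketch shows. The function name and the counts are hypothetical.

```python
def pooled_rate(events, totals):
    """Pooled estimate: sum of the numerators over sum of the denominators."""
    return sum(events) / sum(totals)


# Hypothetical counts: 3 cured out of 5 in one group, 40 out of 200 in another.
events = [3, 40]
totals = [5, 200]

pooled = pooled_rate(events, totals)  # 43 / 205, about 0.21
naive_average = sum(e / t for e, t in zip(events, totals)) / len(events)  # about 0.40
```

The naive average gives the 5-case group the same weight as the 200-case group, which is the small-denominator problem raised in point 1.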
九、t test and analysis of variance
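The basic idea of analysis of variance (ANOVA) is to partition, according to the research objective and the type of design, the total sum of squared deviations from the mean (SS) and its degrees of freedom into several corresponding components, and then to compute the variation of each component. The variation of each component is compared with the within-group (or error) variation to obtain the statistic F. Finally, the P value is determined from the size of F and a statistical inference is made.
Comparing the means of more than two groups requires analysis of variance.
The conditions for applying analysis of variance are: ① the samples must be mutually independent random samples; ② each sample comes from a normally distributed population; ③ the population variances are equal, i.e. homogeneity of variance.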
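The partition of the total sum of squares can be made concrete for the simplest design. This is a minimal sketch assuming a one-way (completely randomized) layout; the function name and the three small groups are hypothetical.

```python
from statistics import mean


def one_way_anova(groups):
    """Partition SS_total into between-group and within-group parts; return F."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    # F compares the between-group mean square with the within-group mean square.
    f = (ss_between / df_between) / (ss_within / df_within)
    return ss_total, ss_between, ss_within, f


# Hypothetical data: three groups of three observations.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
ss_total, ss_between, ss_within, f = one_way_anova(groups)
# ss_total = 32, ss_between = 26, ss_within = 6, F = 13 with df (2, 6)
```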
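The t test is suited to testing the difference between two means.
t tests for comparing means fall into three types: the first is for quantitative data from a single-group design; the second is for quantitative data from a paired design; the third is for quantitative data from a two-group (independent-samples) design.
The latter two designs differ in whether the subjects of the two groups were matched into pairs beforehand on the basis of similarity in one or several characteristics.
Whatever the type, a t test is justified only when specific preconditions are met.
For a single-group design, a standard value or population mean must be given together with a set of quantitative observations, and the precondition is that the data follow a normal distribution. For a paired design, the differences within each pair must follow a normal distribution. For a two-group design, the individuals must be mutually independent, and both groups must be drawn from normally distributed populations with equal variances.
These preconditions are needed because only under them does the computed t statistic follow the t distribution, and the t test is precisely a testing method whose theoretical basis is the t distribution.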
十、Paired-design t test and two-independent-samples t test
Paired samples are those in which the observation units of the two samples are linked in some way or share the same important characteristics, so that they can be matched into pairs. The two individuals of each pair are randomly assigned to receive the two treatments.
For two independent samples, the test subjects are randomly divided into two treatment groups and each group receives one treatment. The two groups form two independent samples, which are used to infer whether the two population means are equal.
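The two designs lead to different test statistics even on the same numbers. The sketch below computes both t statistics with the standard formulas (paired differences for the paired design, pooled variance for the two-sample design). The function names and data are hypothetical, and the p-value lookup is left out because it needs a t table or a statistics library.

```python
from math import sqrt
from statistics import mean, stdev, variance


def paired_t(first, second):
    """t statistic and df for a paired design, based on within-pair differences."""
    d = [s - f for f, s in zip(first, second)]
    n = len(d)
    return mean(d) / (stdev(d) / sqrt(n)), n - 1


def two_sample_t(x1, x2):
    """t statistic and df for two independent samples with a pooled variance."""
    n1, n2 = len(x1), len(x2)
    pooled_var = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    t = (mean(x1) - mean(x2)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Hypothetical measurements on four matched pairs.
treatment_a = [10, 12, 9, 11]
treatment_b = [12, 15, 10, 14]

t_paired, df_paired = paired_t(treatment_a, treatment_b)     # about 4.70, df = 3
t_indep, df_indep = two_sample_t(treatment_b, treatment_a)   # about 1.75, df = 6
```

Treating these paired data as independent samples ignores the pairing, and the between-subject variation then masks the consistent within-pair difference.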
十一、What is the relationship among the binomial, Poisson and normal distributions?
1、If X ~ B(n, π), n is very large and π is very small, then X approximately follows a Poisson distribution.
2、If X ~ B(n, π) with nπ > 5 and n(1 − π) > 5, then X approximately follows a normal distribution.
3、When λ ≥ 20, the Poisson distribution is approximately normal.
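The first two approximations can be checked numerically. This is a rough sketch with hypothetical parameter values; the normal approximation uses a continuity correction, which is an added assumption not stated in the notes.

```python
from math import comb, exp, factorial, sqrt
from statistics import NormalDist


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam ** k / factorial(k)


# 1) n large, pi small: B(1000, 0.002) against Poisson(lambda = n * pi = 2)
binom_p2 = binomial_pmf(2, 1000, 0.002)
poisson_p2 = poisson_pmf(2, 1000 * 0.002)

# 2) n*pi > 5 and n*(1 - pi) > 5: B(100, 0.5) against N(50, 5^2)
n, pi = 100, 0.5
binom_cdf_55 = sum(binomial_pmf(k, n, pi) for k in range(56))  # P(X <= 55)
normal_cdf_55 = NormalDist(n * pi, sqrt(n * pi * (1 - pi))).cdf(55.5)
```

In both cases the approximate and exact probabilities agree to about three decimal places.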
十二、Relationship between interval estimation and hypothesis testing
Interval estimation: in statistics, interval estimation is the use of sample data to calculate an interval of possible (or probable) values of an unknown population parameter, in contrast to point estimation, which gives a single number.
Hypothesis testing: it uses the small-probability principle and reasons by contradiction, starting from the opposite of the claim (H0) to judge indirectly whether the claim (H1) holds. The test statistic is calculated on the condition that H0 is true, and the resulting P value is used to make the judgment.
1、A confidence interval can perform the main function of a hypothesis test.
2、A confidence interval can provide information that a hypothesis test does not: it may indicate whether the difference has practical significance.
3、A hypothesis test can provide information that a confidence interval does not, for example the exact p-value.
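The first point can be illustrated for a two-sided z test. The sketch assumes a known population standard deviation, in which case the (1 − α) confidence interval excludes the hypothesized mean exactly when p < α. The function name and numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def ci_and_p_value(x_bar, mu0, sigma, n, alpha=0.05):
    """Return the (1 - alpha) CI for the mean and the two-sided p-value for mu0."""
    se = sigma / sqrt(n)
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (x_bar - z_critical * se, x_bar + z_critical * se)
    p_value = 2 * (1 - NormalDist().cdf(abs(x_bar - mu0) / se))
    return ci, p_value


# Sample mean 74: the 95% CI excludes mu0 = 72 and p < 0.05.
ci_a, p_a = ci_and_p_value(74, 72, 6, 36)
# Sample mean 73: the 95% CI contains mu0 = 72 and p > 0.05.
ci_b, p_b = ci_and_p_value(73, 72, 6, 36)
```

The interval also shows how far the plausible values lie from 72, which bears on practical significance, while the test adds the exact p-value.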