
Philip Morris

Statistical Significance and Confidence Intervals

Date: 19860609/P
Length: 2 pages
2023512316-2023512317

Fields

Author
Berry, G.
Type
PSCI, PUBLICATION SCIENTIFIC
BIBL, BIBLIOGRAPHY
Document File
2023512309/2023512515/Ets Issue Binder: Epidemiology
Site
R529
Author (Organization)
Medical Journal of Australia
Univ of Sydney
Master ID
2023512310/2514
Related Documents:
Litigation
Okag/Privilege Withdrawn
Okag/Produced
Characteristic
EXTR, EXTRA
Area
SCIENTIFIC AFFAIRS/BLACK LATERAL OLD S&T
Date Loaded
24 May 1999
UCSF Legacy ID
ujc02a00

This material may be protected by copyright

618 THE MEDICAL JOURNAL OF AUSTRALIA Vol. 144 June 9, 1986

Statistical significance and confidence intervals

Many papers in the Journal use statistical methods and one of the aims of the review process is to try to ensure that appropriate methods have been used. Often papers report results of comparative studies that are designed to answer questions such as whether one treatment is superior to another for a particular disease, or whether there is an association between some form of behaviour (for example, taking regular exercise or smoking) and the occurrence of some disease.

Comparative studies are almost invariably carried out on a sample of individuals who are chosen from the population of individuals to whom it is intended to generalize the results. Data are collected on the sample in order to make inferences on the population. Valid inferences can only be drawn if the sample is chosen in such a way that it is representative of the population. Otherwise a bias could occur; epidemiological methods are designed to eliminate such biases.

Since the aim of a statistical analysis is to make inferences, it is paramount to express whatever inferences can be drawn in the most informative way. There are several methods of statistical inference, but the two that are most commonly used are significance testing and confidence interval estimation. The former is well known and is featured by quoting P values. Many authors appear to be under the impression that a profusion of P values is necessary; regrettably this impression has been bolstered in the past by editors of biological journals. Significance testing has its place but, as mentioned by Healy in 1978,¹ "it is widely agreed among statisticians (if less so among the more naive users of statistics) that significance testing is not the be-all and end-all of the subject".
In this leading article I would like to discuss the characteristics of both methods of inference, show that a confidence interval contains the result of a significance test, but not vice versa, and suggest that confidence intervals are the answers to the more interesting questions that data can be used to answer.

Any particular study is based on a particular sample; however, it is useful to imagine that the study is repeated with a different sample being selected each time. These hypothetical studies will give different results because they contain different individuals, and individuals vary in any characteristic because of biological variability. The differences are termed sampling variability. It follows then that the results that are obtained from a particular sample can only be taken as an approximation to the actual situation in the whole population. Statistical methods are concerned with assessing the degree of approximation and what may be reasonably inferred, given that a different sample would have produced a different result. The methods are based on the assumption that it is a matter of chance which particular subjects are in the sample that is being studied, and the sampling variability is thus random variation which is determined by the laws of probability. Therefore, the inferences are expressed in terms of probability.

The situation is illustrated below.

Population -- (sampling variation) --> Sample data -- (uncertainty) --> Inferences on population

Taking a sample from the population involves sampling variation. As a consequence of this, inferences from the sample data back to the population involve uncertainty.

A statistical analysis may be thought of as asking questions of the data. In an investigation that compares two groups for the mean value of, for example, blood pressure or the prevalence of some disease, three questions may be posed: Is there a difference between the groups?; How large is the difference?; and How accurately is the size of the difference known? As expressed, the first question expects the answer "yes" or "no"; although the answer cannot be given in precisely these terms, it is often reduced to two possibilities. The appropriate methodology is the significance test. The second question expects a numerical value to be the answer. This is an estimate and, as it is a single value, is referred to as a point estimate. In effect, the third question asks how reliable this point estimate is; the answer is a range of values which is referred to as an interval estimate or a confidence interval.

These questions represent two approaches to inference: hypothesis testing and estimation. Although at first sight they appear to be quite different, in concept they have much in common. Both make inferential statements about the value of a parameter. (A parameter is an unknown quantity which partly or wholly characterizes a population, for example, a mean or a measure of association.)

The significance test is an appropriate technique when there is an a priori hypothesis to test. For the purpose of the statistical test this hypothesis is expressed in null form - such as when no difference exists between groups - and the test evaluates whether the data are consistent with the null hypothesis. If the data differ markedly from those which would be expected under the null hypothesis, to the extent that the probability of such an extreme result is low, then it is said that the result is statistically significant. Probability is measured on a continuum between 0 and 1, but in significance testing a probability is considered low if it is less than conventional values such as 0.05 (5%) or 0.01 (1%). A significant result is equated with the rejection of the null hypothesis or the claim of a real effect.
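
The idea of repeating a study with a fresh sample each time can be made concrete with a small simulation. The sketch below is illustrative only: the population means, the standard deviation, the group size and the function name `simulate_study` are all assumptions chosen for the example, not values taken from the article.

```python
import random
import statistics

def simulate_study(rng, n_per_group=50, mean_a=130.0, mean_b=125.0, sd=15.0):
    """Draw one hypothetical study and return the observed difference in means.

    All parameter values are illustrative assumptions (for instance, systolic
    blood pressure in mmHg); the true difference is mean_a - mean_b = 5.
    """
    group_a = [rng.gauss(mean_a, sd) for _ in range(n_per_group)]
    group_b = [rng.gauss(mean_b, sd) for _ in range(n_per_group)]
    return statistics.fmean(group_a) - statistics.fmean(group_b)

rng = random.Random(1986)  # fixed seed so the sketch is repeatable
differences = [simulate_study(rng) for _ in range(2000)]

# Each repetition gives a different estimate: this spread is sampling variability.
print(f"smallest observed difference: {min(differences):.1f}")
print(f"largest observed difference:  {max(differences):.1f}")
print(f"average over repetitions:     {statistics.fmean(differences):.1f}")
print(f"spread (standard deviation):  {statistics.stdev(differences):.1f}")
```

Under these assumed values the spread of the observed differences should be close to the theoretical standard error, 15 x sqrt(2/50) = 3, while any single study sees only one of the 2000 values.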
By definition, when the null hypothesis is true, significant results will occur by chance with the same relative frequency as the significance probability. That is, real effects will be claimed when the null hypothesis is true; however, the probability of this error (type I) is determined in the data analysis. One disadvantage of a significance test is that it may fail to detect a real effect; that is, although the null hypothesis is false, the evidence is not strong enough to reject it. The probability of this error (type II) can be controlled at the design stage only, by appropriate selection of the sample size, and may be quite large. Thus, the trap of equating non-significance with no effect must be avoided; failure to reject the null hypothesis is not the same as accepting it.

In the approach of confidence interval estimation no particular hypothesis is considered; rather, the emphasis is on estimating those values of the parameter with which the data are consistent. These values form a range - the confidence interval. The range is calculated so that there is a high probability - conventionally 95% or 99% - that it contains the true value of the parameter.

A significance test is essentially a test of whether the data are consistent with a specified parameter value, and the confidence interval contains those parameter values with which the data are consistent. Therefore, a 5% significance test and a 95% confidence interval contain some information in common: significance implies that the null hypothesis value is outside the confidence interval; non-significance implies that the null hypothesis value is within the confidence interval. However, the confidence interval contains more information because it is equivalent to performing a significance test for all values of the parameter, not just a single value. A confidence interval enables a reader to see how large the effect may be, not simply whether it is different from zero.
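
The correspondence between a 5% test and a 95% interval can be checked numerically. The following is a minimal sketch that assumes the estimate is approximately normally distributed with a known standard error; the estimate (4.0), the standard error (1.5) and the function names are hypothetical, chosen only to illustrate the point.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution

def two_sided_p(estimate, std_error, null_value=0.0):
    """Two-sided P value for the hypothesis that the parameter equals null_value."""
    z = abs(estimate - null_value) / std_error
    return 2.0 * (1.0 - STD_NORMAL.cdf(z))

def confidence_interval(estimate, std_error, level=0.95):
    """Normal-approximation confidence interval at the given level."""
    z_crit = STD_NORMAL.inv_cdf(0.5 + level / 2.0)
    return estimate - z_crit * std_error, estimate + z_crit * std_error

def is_significant(estimate, std_error, null_value=0.0, alpha=0.05):
    """True when the two-sided test rejects null_value at level alpha."""
    return two_sided_p(estimate, std_error, null_value) < alpha

# Hypothetical estimate and standard error, for illustration only.
estimate, std_error = 4.0, 1.5
lower, upper = confidence_interval(estimate, std_error)
print(f"95% confidence interval: {lower:.2f} to {upper:.2f}")

# Test a grid of candidate parameter values: the values that are not rejected
# at the 5% level are the ones that fall inside the 95% interval.
for null_value in [-1.0, 0.0, 1.0, 2.0, 4.0, 6.0, 7.0, 8.0]:
    inside = lower <= null_value <= upper
    rejected = is_significant(estimate, std_error, null_value)
    print(f"null={null_value:5.1f}  inside 95% CI: {inside!s:5}  rejected at 5%: {rejected}")
```

The single test against zero reports only the second row of this table; the interval summarizes every row at once.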
The limitations of the interpretations that are provided by a significance test may now be considered.

The difference is significant. This means that there is a difference or, in other words, the size of the difference is not zero. We know no more than this. The difference may be large and of great importance or it may be small and of no practical importance. It is unsatisfactory that the test provides no way of distinguishing between these quite different possibilities.

The difference is not significant. This means that there is insufficient evidence to enable us to conclude that there is a difference. So the difference may well be zero. But this is not the same as saying that it is zero. The true difference may be quite large. Again, it is unsatisfactory that this possibility is not addressed.

The conclusions that may be drawn from a significance test are considered to be incomplete because it is rarely that one is interested solely in whether a null hypothesis is or is not true; indeed in many cases it may be recognized at the outset that the null hypothesis is unlikely to be true. Rather, the question is how large is the difference and is it possibly large enough to be important? The emphasis is on measuring rather than on testing.

The addition of the concept of an important difference to that of a null hypothesis means that there are four possible interpretations to an analysis: (a) the difference is significant and large enough to be of practical importance; (b) the difference is significant but too small to be of practical importance; (c) the difference is not significant but may be large enough to be important; and (d) the difference is not significant and also not large enough to be of practical importance.

The size of difference that is considered to be large enough to be important is a matter for debate, and genuine differences of opinion may arise. It is a medical, not a statistical, question, although a medical statistician who is experienced in the subject area could contribute to setting a value. The fact that agreement on a unique value may be impossible in no way detracts from the argument. In fact, expressing the results as a confidence interval enables interpretations to be made for any particular value that is considered appropriate.

These possibilities are illustrated in the Figure where the confidence intervals are shown. The significant and non-significant cases are distinguished by the confidence intervals that exclude or include zero respectively. The main point is that in each case the confidence interval gives the range of possible values for the true difference.

Of particular concern is (c). Here there may be no true difference or there may be a large, important difference. In other words the study is completely inconclusive. Such a possibility is missed by the simple expression "not significant" with its lure of equating this falsely with "no effect". This situation will arise with a study that is carried out on too small a sample and this is why good study design demands attention to sample size to try to prevent the occurrence of an inconclusive result. Altman found that it was common for undue emphasis to be placed on "negative" findings from small studies,² while Freiman et al. noted that "negative" trials were often too small to constitute a fair test of therapies.³ Similarly, a significance test will contrast (b) as significant and (d) as not significant but fails to recognize that they give essentially the same conclusion - that any difference is too small to be important.

FIGURE: Confidence intervals showing four possible conclusions in terms of statistical significance and practical importance. Intervals (a) to (d) are plotted on a scale of difference on which the null hypothesis value (0) and the important difference are marked. Significant: (a) important, (b) not important. Not significant: (c) inconclusive, (d) true negative result.

As an example, consider some results which were obtained by Garraway et al. from a clinical trial for the management of acute stroke in the elderly.⁴ Of 155 patients who were managed in a stroke unit, 78 were assessed as independent when they were discharged from the unit compared with 49 of 152 who were managed in a medical unit.
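
The stroke-unit comparison can be reproduced from the raw counts. The sketch below assumes the usual normal approximation for the difference between two independent proportions, which is not necessarily the method used in the article, and it reads the counts as 78 of 155 and 49 of 152; that reading is an assumption, chosen because it gives proportions of 50.3% and 32.2%. The function name `risk_difference_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def risk_difference_ci(events_1, n_1, events_2, n_2, level=0.95):
    """Difference between two proportions with a normal-approximation interval.

    Returns (difference, standard_error, lower, upper), all as proportions.
    """
    p_1 = events_1 / n_1
    p_2 = events_2 / n_2
    difference = p_1 - p_2
    # Standard error of the difference between two independent proportions.
    std_error = sqrt(p_1 * (1 - p_1) / n_1 + p_2 * (1 - p_2) / n_2)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    return (difference, std_error,
            difference - z_crit * std_error,
            difference + z_crit * std_error)

# Counts as read here (an assumption about the source figures):
# stroke unit 78 of 155 independent, medical unit 49 of 152 independent.
diff, se, lower, upper = risk_difference_ci(78, 155, 49, 152)
print(f"stroke unit:  {100 * 78 / 155:.1f}%")
print(f"medical unit: {100 * 49 / 152:.1f}%")
print(f"difference:   {100 * diff:.1f} percentage points (SE {100 * se:.1f})")
print(f"95% CI:       {100 * lower:.1f} to {100 * upper:.1f}")
```

Run as written, this gives a difference of 18.1 percentage points with a standard error of 5.5 and 95% limits of 7.3 and 28.9.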
The simplest analysis shows that the difference between the success rates of the two units is significant at the 1% level. Therefore, a genuine effect has been established. To appreciate the importance of this effect the advantage of the stroke unit may be measured by the difference between the two units in the percentage of subjects who were discharged as independent: 50.3% - 32.2% = 18.1%. This is the point estimate. The accuracy of this estimate is given by its standard error (5.5%) and the 95% confidence limits (7.3% and 28.9%). Thus, the gain could be as large as 29% or as small as 7%.

Recently, Gardner and Altman have argued against the excessive use of hypothesis testing and urged a greater use of confidence intervals.⁵ In an appendix to their paper they give methods to calculate confidence intervals for the commonly occurring two-sample comparisons.

In presenting the main results of a study it is good practice to provide confidence intervals rather than to restrict the analysis to significance tests. Only by so doing can authors give readers sufficient information for a proper conclusion to be drawn; otherwise readers have to rely upon the authors' own interpretation. Therefore, intending authors are urged to express their main conclusions in confidence interval form (possibly with the addition of a significance test, although strictly that would provide no extra information). One of the aims of the Journal's statistical review process will be to ensure that where possible this is done.

GEOFFREY BERRY
Associate Professor of Biostatistics
School of Public Health and Tropical Medicine
The University of Sydney

1. Healy MJR. Is statistics a science? J R Statist Soc A 1978; 141: 385-393.
2. Altman DG. Statistics in medical journals. Stat Med 1982; 1: 59-71.
3. Freiman JA, Chalmers TC, Smith H Jr, Kuebler RR. The importance of beta, the type II error and sample size in the design and interpretation of the randomized control trial. N Engl J Med 1978; 299: 690-694.
4. Garraway WM, Akhtar AJ, Prescott RJ, Hockey L. Management of acute stroke in the elderly: preliminary results of a controlled trial. Br Med J 1980; 280: 1040-1043.
5. Gardner MJ, Altman DG. Confidence intervals rather than P values: estimation rather than hypothesis testing. Br Med J 1986; 292: 746-750.
