Philip Morris

Statistical Significance - A Misconstrued Notion in Medical Research

Date: 19970000/P
Length: 4 pages
2063633796-2063633799

Fields

Author
Nurminen, M.
Type
PSCI, PUBLICATION SCIENTIFIC
BIBL, BIBLIOGRAPHY
Area
CARCHMAN,RICHARD/OFFICE
Litigation
Iwoh/Produced
Characteristic
EXTR, EXTRA
MARG, MARGINALIA
Site
R530
Named Organization
Finnish Inst of Occupational Health
Author (Organization)
Finnish Inst of Occupational Health
Scand J Work Environ Health
Named Person
Nurminen, M.
Nurminen, T.
Master ID
2063633486/4072
Related Documents:
Date Loaded
07 Jun 1999

Document Images

Commentaries

Scand J Work Environ Health 1997;23:232-5

Statistical significance - a misconstrued notion in medical research¹

by Markku Nurminen, PhD²

Nurminen M. Statistical significance - a misconstrued notion in medical research. Scand J Work Environ Health 1997;23(3):232-5.

The P-value is the significance probability of obtaining a value of the test statistic that is as extreme, in relation to the null hypothesis, as that observed. Medical researchers may, in some situations, disagree on its appropriate use or on its interpretation as a summary measure of consistency with the null hypothesis in a particular data set. More informative statistical measures such as the likelihood ratio and the Bayesian posterior probability have been suggested for drawing inferences from clinical trials and epidemiologic studies. Causal inference is not statistical in nature; rather it strives to provide scientific explanations or criticisms of proposed explanations that would describe the observed data pattern. In this context, it is important to remember that a finding may not be medically important, or a causal hypothesis may even not be true, even if a study shows a significant P-value.

"Statistically significant" is a chronically misinterpreted concept in clinical trials and epidemiology. The misconception can be caused by both the confusing terminology and the difficult theory of statistics. The standard language word "significant" has a special meaning in statistical research: the consistency of data with a hypothesis is measured by the "significance probability" or the P-value. Finney (1) has proposed that one should always add the adverb "statistically" in conjunction with the word "significant" whenever its meaning could otherwise be in doubt.

Significance testing prevails as a general method of analysis in medical research although the overemphasis on the use of the P-value has long been criticized (2).
Researchers obviously believe that it is not worthwhile to submit a manuscript for publication to a journal unless it contains a significant P-value. Significance testing is an apparently objective way to decide whether a so-called null hypothesis³ (eg, treatment A is as good as treatment B) remains valid or should be rejected and the study hypothesis be accepted in its place (eg, treatment A is better than treatment B).

Instead of the P-value, computations of more informative statistical measures have been suggested. Such statistics include the P-value function (3), which yields the significance of also other hypotheses and not merely that of the null hypothesis, and the likelihood ratio test (4), which compares 2 rival hypotheses. Some scientific journals (eg, Cancer Research) instruct the authors to indicate the significance of their findings using an appropriate statistical analysis. Other journals, such as the British Medical Journal (5) and The Lancet (6), have recommended that significance testing be replaced by the computation of a confidence interval. Certain statisticians reject significance testing categorically. (See, eg, reference 7.) Respected epidemiologists like Rothman (8) and Greenland (9) would not ban significance tests, but they hold the view that the tests appear to have produced much more harm than good in social and health sciences.

The traditional ("frequentist") Neyman-Pearson school of statistics and the alternative Bayes school interpret the notion of probability underlying the test of significance differently. The frequentist statisticians define the P-value as the probability of the observed outcome in a study plus the probabilities of the more extreme (unobserved) outcomes, that is, as a relative frequency, or proportion, in large samples, assuming that the observations are generated according to a given probability model. The P-value measures whether a null hypothesis is compatible with the data or not. It is, however, totally contrary to the spirit of significance testing to compare the P-value with preset levels, which are conventionally chosen as 5, 1 or 0.1%, and to interpret the result in a rigidly different manner depending on whether the P-value is below or above a certain level. [These reference levels are often marked with 1, 2 or 3 asterisks (*), but they do not need to be considered in the same light as the stars indicating the quality of hotels.] Significance testing is not to be regarded as decision-making but as statistical inference.

¹ This commentary was published in Finnish as an editorial in Duodecim vol 113, no 4, 1997.
² Department of Epidemiology and Biometry, Finnish Institute of Occupational Health, Helsinki, Finland.
³ Null hypothesis is an exact statistical formulation for the studied assumption (hypothesis) to be incorrect: for example, the difference of 2 groups' mean values equals 0. Assuming that the null hypothesis prevails, one can make deductive inferences about the correctness of the study hypothesis, which is often formulated in less exact terms.
Reprint requests to: Dr M Nurminen, Finnish Institute of Occupational Health, Topeliuksenkatu 41 a A, FIN-00250 Helsinki, Finland.

Occasionally one sees the frequentist P-value being interpreted as giving the probability for the statement that "the null hypothesis is true" or that "the result is a random finding". The former interpretation is surely wrong because the computation of the P-value explicitly assumes that the null hypothesis is true. The latter interpretation is problematic since, in a frequentist analysis, one can never infer definitely whether a single hypothesis pertaining to the considered parameter⁴ (eg, the difference between mean values) is true or not or whether the unknown value of the studied parameter lies within, say, the 95% confidence interval computed from a particular experimental material. The frequentist statisticians can only state that, if the experiment were repeated sufficiently many times, then approximately 95% of the computed intervals (which are stochastic variates) would cover the true value of the studied parameter (which is a constant of nature).

In the interpretation of the P-value one must also consider the amount of information contained in the data (the "power" of a test). Miettinen (10) provides the following guidelines for interpretation: (i) if information is very sparse, one should not analyze the data at all; (ii) if information is very ample, the P-value is too sensitive to be useful and, instead of testing, one should estimate the magnitude of the effect; (iii) if the amount of information is neither very sparse nor very ample, one may infer that (a) a very small P-value supports the study hypothesis, (b) a small P-value does not discriminate between the study hypothesis and the null hypothesis, and (c) a moderately or especially large P-value is relatively less consistent with the study hypothesis than with the null hypothesis, which speaks for the refutation of the study hypothesis.

A Bayesian statistician overcomes the interpretative problems of significance testing by viewing probability as a degree of personal belief of the correctness of a study hypothesis. This subjective probability is based on investigative foreknowledge regarding the uncertainty of the study hypothesis and the preconception which one has of it and which will be modified via a model as empirical evidence accrues. Bayesian statistical theory produces a "posterior" probability distribution of the studied hypothesis, by means of which one can inductively state, for instance, that "with a 95% probability treatment A is more effective than treatment B at least in 10% of the cases and at most in 20% of the cases." Different experts will often have different preconceptions of the credibility of the studied hypothesis, but in a Bayesian analysis these prior beliefs can naturally be reflected by the presentation of several prior distributions in the context of the same data (11).

The Bayesian approach also avoids the frequentist problem related to the testing of multiple hypotheses in 1 data set or the simultaneous testing of a single hypothesis in many subsets of data. For example, when one studies the differential diagnosis of malignant mesothelioma and lung carcinoma with the aid of genetic alterations, one can examine tens of different chromosome changes.
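The idea of presenting several prior distributions against the same data, mentioned above in connection with the Bayesian posterior, can be illustrated with a small simulation. What follows is only a sketch under stated assumptions, not a method taken from the commentary: the trial counts, the three Beta priors, and the function name `posterior_prob_difference` are all hypothetical, and a simple Beta-binomial model is assumed for each treatment arm.

```python
import random


def posterior_prob_difference(successes_a, n_a, successes_b, n_b,
                              prior_alpha, prior_beta,
                              low=0.10, high=0.20, draws=20000, seed=1):
    """Monte Carlo estimate of P(low <= p_A - p_B <= high | data).

    Assumption: each arm has a binomial outcome and the same
    Beta(prior_alpha, prior_beta) prior, so each posterior is again a Beta.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        # Draw one plausible success rate per arm from its Beta posterior.
        p_a = rng.betavariate(prior_alpha + successes_a,
                              prior_beta + n_a - successes_a)
        p_b = rng.betavariate(prior_alpha + successes_b,
                              prior_beta + n_b - successes_b)
        if low <= p_a - p_b <= high:
            hits += 1
    return hits / draws


# Hypothetical data (not from the commentary): 70/100 successes on A, 55/100 on B.
DATA = dict(successes_a=70, n_a=100, successes_b=55, n_b=100)

# Hypothetical priors standing in for experts with different preconceptions.
PRIORS = {
    "uniform": (1.0, 1.0),    # no strong preconception
    "weak": (5.0, 5.0),       # mild belief that both rates are near 0.5
    "strong": (50.0, 50.0),   # firm belief that both rates are near 0.5
}

results = {
    name: posterior_prob_difference(prior_alpha=a, prior_beta=b, **DATA)
    for name, (a, b) in PRIORS.items()
}

for name, prob in results.items():
    print(f"{name:>8} prior: P(0.10 <= pA - pB <= 0.20 | data) = {prob:.2f}")
```

With these invented numbers the strongly sceptical prior pulls both success rates toward 0.5 and so lowers the posterior probability that A beats B by 10 to 20 percentage points, which is the kind of prior sensitivity a reader would want to see reported side by side.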
Frequentist statisticians try to control the occurrence of false significant findings by applying more strict levels of significance. By using this procedure, for example, a significant difference (P = 0.004) of the frequency of changes observed in chromosome 22 between patient groups becomes nonsignificant if one accounts for the respective tests made for the chromosomes 1, ..., 21 in the same investigation and corrects the 5% critical level to 0.05/21 = 0.0024. According to the Bayesian way of thinking there is no reason to correct a particular P-value merely because other variables were also considered in the same study. The Bayesian solution of the problem is to define the prior joint likelihood of the mutually dependent hypotheses, which would appear to be a scientifically more rational procedure than a mechanistic correction of the P-value. The specification of the prior likelihood function is, however, a challenging data-analytic task, especially in problems involving many parameters (12).

Frequentist inference is thus problematic. Why isn't everyone then a Bayesian (13)? The answer is dictated by practice. For example, the Bayesian likelihood ratio test is harder to compute than the frequentist significance test. Ten years ago, Bayesian analytic solutions of even the simplest epidemiologic problems were difficult to tackle (14). Nowadays, however, simulation modeling techniques make the performance of a Bayesian analysis possible also in more complex biomedical applications (15), in which the frequentist and Bayesian analyses do not necessarily result in the same inferences (16). This being the case, the Bayesian methods will inescapably be used in clinical medicine and epidemiology (17). During the transition period, medical scientists should prepare for the change by familiarizing themselves with the Bayesian methods (18). The natural simplicity of the Bayesian concepts is appealing.

⁴ A parameter is a quantity which partly or fully determines a probability distribution. A parameter is not directly measurable, but, using a distribution model, one can describe the kind of samples associated with particular values of the parameter. Considering the compatibility of the data with the model one can estimate the most likely value of the parameter.

The role of statistics in cause-effect studies depends on the study design. The traditional theory of statistics was created for randomized experiments. Thus in clinical trials, in which the treatment of patients is randomized, the results produced by customary analysis (the P-value, the confidence interval of a parameter, the likelihood ratio) are interpretable quantities from the point of view of causal inference. In nonexperimental (eg, epidemiologic) studies, in which the exposure of persons is not randomized, probabilistic interpretations of conventional statistics are not necessarily justified and can lead to incorrect inferences of nonrandomized studies. Can thus, for example, the P-value be interpreted in any reasonable way in nonrandomized studies?

As a remedy to this problem, Greenland (20) suggests that, in the data analysis, one should separate the following aspects from each other: (i) the description of the data variability by means of graphic displays or simple summaries, (ii) the profiling of distributions or relations being sought from the data in comparisons with statistical models (pattern recognition, data smoothing), and (iii) scientific inference. Greenland (20) contends that statistical analysis is limited to stage 2, in which a statistical measure, "such as a P-value is not a data summary; rather it is a convolution of the data with some model or preconceived notion about the process that generated the data [p 227]".
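The multiple-testing arithmetic quoted earlier for the chromosome example (a 5% level divided by 21, set against P = 0.004) can be reproduced in a few lines. The sketch below applies what is commonly called a Bonferroni correction; the commentary does not name the procedure, and the function and variable names are illustrative assumptions. Only the three numbers (0.05, 21 and 0.004) come from the text.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after dividing alpha across n_tests."""
    return alpha / n_tests


ALPHA = 0.05        # conventional 5% level quoted in the commentary
N_TESTS = 21        # divisor used in the commentary's example
P_OBSERVED = 0.004  # P-value reported for chromosome 22

threshold = bonferroni_threshold(ALPHA, N_TESTS)

# The same P-value is judged against the unadjusted and the adjusted level.
significant_unadjusted = P_OBSERVED < ALPHA
significant_adjusted = P_OBSERVED < threshold

# Equivalent view: inflate the P-value instead of shrinking the level.
p_adjusted = min(1.0, P_OBSERVED * N_TESTS)

print(f"corrected level: {threshold:.4f}")
print(f"significant at 5%: {significant_unadjusted}")
print(f"significant after correction: {significant_adjusted}")
print(f"adjusted P-value: {p_adjusted:.3f}")
```

The corrected level comes out at roughly 0.0024, so the finding passes the conventional 5% level but not the corrected one. This is the mechanical adjustment that the Bayesian argument in the text regards as less rational than specifying a joint prior.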
One should use modern techniques of statistical analysis to examine the impact of discrepant observations on the outcome measures (influence analysis) and the effects of departures from model assumptions on the stability of the findings (sensitivity analysis). Causal inference is not statistical by nature; rather it strives to (i) determine scientific explanations that would explain the results of statistical analyses in a logically coherent way and (ii) criticize proposed explanations that would not lead to the observed data pattern (20).

"Clinical significance" is determined in population studies, for example, as the magnitude of the difference in the mean values between the experimental group and the comparison group. In large population groups even a small difference becomes statistically significant, whereas in small samples a clinically significant observation can remain statistically nonsignificant. Two recent cohort studies on reproductive health furnish examples of surveys in which the size of the material was a [illegible] factor. The notable sample size (over [illegible]000 people) of a Danish study (21) permitted the expression of minimal differences, whereas, in an American study (22), the small number of exposed persons (only 27) prevented the presentation of differences that were not big.

On the other hand, although the difference would be small on the group level, a clinical finding can be of decisive importance for some persons who belong to a risk group. In a Finnish epidemiologic study (23), the risk of dying of coronary disease in a cohort of 343 industrial workers exposed to carbon disulfide was over 2-fold relative to the risk of a same-sized, individually matched comparison cohort. The researchers discussed several biochemical mechanisms that would explain why carbon disulfide exposure caused the increased risk of coronary mortality. A possible indirect mechanism might have been high blood pressure.
On a group level, the differences in the mean values of the subjects' blood pressures were statistically significant, although the differences were relatively small [difference in systolic pressure 8 mm Hg (1.1 kPa) and in diastolic pressure 3.5 mm Hg (0.5 kPa)]. If a worker had, in addition, other risk factors, even a minor elevation of blood pressure could be a danger. The researchers estimated that high blood pressure was a causal factor in every 6th death due to coronary heart disease, which was originally caused by carbon disulfide exposure.

It is not very remarkable if a large study produces a statistically significant result. The finding can be medically important only if one's colleagues still believe in the result after having read the discussion of its significance without reference to P-values.

Acknowledgments

I thank Tuula Nurminen for her valuable comments.

References

1. Finney DJ. On biometric language and its uses. Biom Bull [year, volume and pages illegible].
2. Yates F. The influence of "Statistical Methods for Research Workers" on the development of the science of statistics. J Am Stat Assoc 1951;46:19-34.
[3 or 4]. Cox DR. The role of significance tests. Scand J Stat 1977;4:49-70.
5. Langman MJS. Towards estimation and confidence limits [editorial]. BMJ 1986;292:716.
6. Lancet. Report with confidence [editorial]. Lancet 1987;1(8531) [pages illegible].
7. Oakes M. Statistical inference. Chestnut Hill (MA): Epidemiology Resources Inc, 1990.
8. Rothman KJ. Modern epidemiology. Boston (MA): Little, Brown and Company, 1986.
9. Greenland S. Foreword. In: Oakes M. Statistical inference. Chestnut Hill (MA): Epidemiology Resources Inc, 1990:vii-viii.
10. Miettinen OS. Theoretical epidemiology: principles of occurrence research in medicine. Albany (NY): Delmar Publishers Inc, 1985.
11. Lilford RJ, Braunholtz D. The statistical basis of public policy: a paradigm shift is overdue. BMJ 1996 [volume and pages illegible].
12. Dixon DO, Simon R. Bayesian subset analysis in a colorectal cancer clinical trial. Stat Med 1992;11:13-22.
13. Efron B. Why isn't everyone a Bayesian (with discussion)? Am Stat 1986;40:1-11.
