Philip Morris
Quantitative Evaluation of Multiplicity in Epidemiology and Public Health Research
Fields
- Author
- Ottenbacher, K.J.
- Type
- PSCI, PUBLICATION SCIENTIFIC
- BIBL, BIBLIOGRAPHY
- FOOT, FOOTNOTES
- BIBL, BIBLIOGRAPHY
- Area
- CARCHMAN,RICHARD/OFFICE
- Litigation
- Iwoh/Produced
- Characteristic
- EXTR, EXTRA
- MARG, MARGINALIA
- Site
- R530
- Named Organization
- Univ of Tx
- Society for Epidemiology Research
- Bureau of Maternal + Child Health
- Hhs, Dept of Health and Human Services
- Mcj
- Society for Epidemiology Research
- Author (Organization)
- Am J Epidemiol
- American Journal of Epidemiology
- Johns Hopkins Univ
- Univ of Tx
- American Journal of Epidemiology
- Named Person
- Ottenbacher, K.J.
- Master ID
- 2063633486/4072
- Related Documents
- 2063633486-4072 Book 7 Tabs 1-68
- Date Loaded
- 07 Jun 1999
Volume 147
Number 7
April 1, 1998
ORIGINAL CONTRIBUTIONS
American Journal of Epidemiology
Copyright © 1998 by The Johns Hopkins University School of Hygiene and Public Health
Sponsored by the Society for Epidemiologic Research
A BRIEF ORIGINAL CONTRIBUTION
Quantitative Evaluation of Multiplicity in Epidemiology and Public Health Research
Kenneth J. Ottenbacher
Epidemiologic and public health researchers frequently include several dependent variables, repeated assessments, or subgroup analyses in their investigations. These factors result in multiple tests of statistical significance and may produce type 1 experimental errors. This study examined the type 1 error rate in a sample of public health and epidemiologic research. A total of 173 articles chosen at random from 1996 issues of the American Journal of Public Health and the American Journal of Epidemiology were examined to determine the incidence of type 1 errors. Three different methods of computing type 1 error rates were used: experiment-wise error rate, error rate per experiment, and percent error rate. The results indicate a type 1 error rate substantially higher than the traditionally assumed level of 5% (p < 0.05). No practical or statistically significant difference was found between type 1 error rates across the two journals. Methods to determine and correct type 1 errors should be reported in epidemiologic and public health research investigations that include multiple statistical tests. Am J Epidemiol 1998;147:615-19.
bias (epidemiology); probability; research design; significance tests
Levin noted recently, "Multiple comparisons are a very common feature--and, indeed, very often a necessity--in epidemiologic and public health research" (1, p. 628). He went on to discuss various procedures used to protect against type 1 errors, including the commonly used Bonferroni method and a procedure developed by Holm (2). Aickin and Gensler (3) argue that the Holm-adjusted p value should be routinely used to reduce the type 1 error rate in studies involving multiple statistical tests.
Received for publication February 10, 1997, and accepted for publication October 10, 1997.
Abbreviations: EP, error rate per experiment; EW, experiment-wise error rate; PE, percent error rate.
From the University of Texas Medical Branch at Galveston, Galveston, TX.
Reprint requests to Dr. Kenneth J. Ottenbacher, SAHS, Rm. 4.202, University of Texas Medical Branch, 301 University Blvd., Galveston, TX 77755-1028.
Problems involving multiple statistical testing of hypotheses in health care and medical research arise for the following reasons: 1) the repeated analysis of accumulating data; 2) the use of multiple dependent measures; and 3) the analysis of data from subgroups (4). All three of these practices are common in public health and epidemiologic research. For example, Godfrey (5) demonstrated that researchers frequently present and analyze means from several groups within the same study. She found that the most common method of statistically comparing several means involved the use of multiple t tests. Godfrey correctly argued that the use of univariate statistical procedures to analyze the results of studies containing multiple contrasts was inappropriate. Her analysis revealed that of 50 articles examined from the New England Journal of Medicine, a majority (54 percent) used improper univariate statistical procedures to analyze differences between subgroup means.

616 Ottenbacher
The use of several dependent variables in the analysis of data from a single sample also results in multiple statistical tests being reported. The complex nature of epidemiologic and public health research has led investigators to routinely include multiple dependent variables in their investigations (6). An epidemiologic researcher may be interested in the effect of a particular intervention on dependent variables such as weight, blood pressure, hematocrit, and serum cholesterol values in a sample of patients. As the number of dependent variables increases, so does the number of statistical tests. When this occurs, the researcher may obtain statistically significant results that reflect nothing more than sampling error (7).
Numerous clinical researchers have suggested that multiple hypothesis testing without adjusting for inflated type 1 error rates is a common problem in medical and public health research (8-10). The purposes of this investigation were: 1) to examine the extent of multiple testing in epidemiologic and public health research, and 2) to determine the prevalence of type 1 errors in a sample of published research.
METHODS
Five issues of each of the American Journal of Public Health and the American Journal of Epidemiology were randomly selected from the journal issues published in 1996. Each individual article was examined to determine the experiment-wise error rate, the error rate per experiment, and the percent error rate (see descriptions of error rates below). All articles that reported tests of statistical significance were included in the investigation. Articles that summarized the results of previously published research and articles that did not report statistical significance tests were not included in the analysis.
Experiment-wise error
The overall experiment-wise error rate (EW) is the probability of making at least one type 1 error for the collection of tests performed in the investigation. The experiment-wise error rate can never be smaller than the error rate per comparison. The relation of per-comparison and experiment-wise error rates depends on the degree of statistical dependence of the tests. For totally independent tests, the experiment-wise error rate is equal to $1 - (1 - \alpha)^c$, where $c$ is the number of independent tests and $\alpha$ is the error rate per test (traditionally 0.05 or 0.01). From this equation, it is apparent that the experiment-wise error rate increases rapidly with the number of hypotheses statistically examined. For example, in a study for which five statistical tests are conducted at the 0.05 level of significance, $EW = 1 - (1 - 0.05)^5 \approx 0.23$.
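The formula for totally independent tests can be checked numerically. The short Python sketch below assumes, as the formula does, that the c tests are independent; the function name `experiment_wise_error` is illustrative and does not come from the article. With dependent tests the actual experiment-wise error rate will differ from this value.

```python
def experiment_wise_error(alpha: float, c: int) -> float:
    """Probability of at least one type 1 error among c totally independent
    tests, each conducted at significance level alpha: 1 - (1 - alpha)**c."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if c < 1:
        raise ValueError("c must be at least 1")
    return 1.0 - (1.0 - alpha) ** c


# Worked examples from the text: 1, 5, and 20 independent tests at alpha = 0.05.
for c in (1, 5, 20):
    print(c, round(experiment_wise_error(0.05, c), 2))
# 1 0.05
# 5 0.23
# 20 0.64
```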
Error rate per experiment
The error rate per experiment (EP) is the expected number of type 1 errors in a particular group of statistical significance tests and is computed using the formula $EP = c\alpha$, where $c$ represents the number of comparisons and $\alpha$ is the significance level, which remains constant across all tests. For example, given 20 independent statistical comparisons at the 0.05 significance level, $EP = 20(0.05) = 1$. This means that at the 0.05 level we would expect one type 1 error in 20 tests of statistical significance. It is important to note that the error rate per experiment (EP) is an expected value, while the experiment-wise error rate (EW), as defined above, is a probability. The experiment-wise error rate for 20 comparisons at the 0.05 significance level is $1 - (1 - 0.05)^{20} \approx 0.64$, indicating that the probability of at least one type 1 error among these 20 tests, each conducted at the 0.05 level, is 0.64.
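The distinction between EP as an expected count and EW as a probability can be illustrated with a small Monte Carlo simulation. This is a sketch under stated assumptions: every null hypothesis is true, the 20 tests are independent, and each p value is therefore uniformly distributed, so each test is falsely significant with probability alpha. The helper `simulate_null_tests` is hypothetical.

```python
import random


def simulate_null_tests(c: int, alpha: float, n_trials: int, seed: int = 1):
    """Estimate EP and EW by simulating c independent tests whose null
    hypotheses are all true.

    Assumption: under a true null each p value is uniform on [0, 1), so a
    test is (falsely) significant with probability alpha.
    Returns (mean number of type 1 errors per trial,
             share of trials with at least one type 1 error).
    """
    rng = random.Random(seed)
    total_errors = 0
    trials_with_error = 0
    for _ in range(n_trials):
        errors = sum(1 for _ in range(c) if rng.random() < alpha)
        total_errors += errors
        trials_with_error += int(errors > 0)
    return total_errors / n_trials, trials_with_error / n_trials


ep_hat, ew_hat = simulate_null_tests(c=20, alpha=0.05, n_trials=50_000)
print(f"EP about {ep_hat:.2f} (theory 1.00); EW about {ew_hat:.2f} (theory 0.64)")
```

The first estimate averages a count and can exceed 1 when many tests are run; the second is a proportion of trials and is therefore bounded by 1.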
Percent error rate
The formula for computing the percent error rate (PE) is $PE = 100c\alpha/M$, where $c$ is the total number of comparisons, $\alpha$ is the alpha level for a set of comparisons, and $M$ is the number of statistical tests with p values less than the designated alpha level (that is, the number reported as statistically significant). The percent error rate reflects the proportion of results labeled as statistically significant that are likely to be chance results. As the ratio approaches 1.00 (100 percent), it indicates that the number of tests found to be statistically significant approximates the number of tests one would expect to find significant purely by chance. As the ratio decreases toward the individual alpha level for a set of comparisons (its lower bound of $100\alpha$ percent, reached when all $c$ tests are significant, because $M \le c$), a smaller percent of the significant results is attributable to chance. The percent of results likely to be caused by non-chance factors is equal to $100 - PE$. For example, if 1 out of 20 comparisons evaluated at the 0.05 level is statistically significant, $PE = 100(20)(0.05)/1 = 100$ percent, suggesting that the number of tests found to be significant, that is 1, is the number expected by chance. On the other hand, if 4 out of 20 comparisons conducted at the 0.05 significance level are found to be statistically significant, then $PE = 100(20)(0.05)/4 = 25$ percent, indicating that about 25 percent of the results (one test) are expected as the result of chance, while the remaining 75 percent (three tests) are likely to be due to non-chance factors.
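A minimal Python version of the percent error rate, reproducing the two worked examples, might look like the following. The function name is illustrative. Treating PE as undefined when no test is significant (M = 0) is an assumption added here, because the formula would otherwise divide by zero.

```python
def percent_error_rate(c: int, alpha: float, m: int) -> float:
    """Percent error rate PE = 100 * c * alpha / M.

    c     -- total number of comparisons
    alpha -- significance level used for each comparison
    m     -- number of comparisons reported as significant (p < alpha)
    """
    if m <= 0:
        # Assumption: PE is left undefined when nothing is significant,
        # since the formula would divide by zero.
        raise ValueError("PE is undefined when no test is significant")
    if m > c:
        raise ValueError("cannot have more significant tests than tests")
    return 100.0 * c * alpha / m


for m in (1, 4):
    pe = percent_error_rate(c=20, alpha=0.05, m=m)
    print(f"{m} of 20 significant: PE = {pe:.0f}%, non-chance share = {100 - pe:.0f}%")
# 1 of 20 significant: PE = 100%, non-chance share = 0%
# 4 of 20 significant: PE = 25%, non-chance share = 75%
```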
Am J Epidemiol Vol. 147, No. 7, 1998

Quantitative Evaluation of Multiplicity 617
Rating process
The reporting style in some of the articles made it difficult to determine the exact number of statistical tests conducted and the number found statistically significant. Two independent raters with research degrees (PhDs) reviewed all articles and identified both the total number of tests conducted and the number reported as statistically significant. When the two raters did not agree, a third rater reviewed the article in question, and the value agreed upon by at least two raters was used in the analysis. In spite of the high agreement between the raters (see below), the results reported in this investigation should be viewed as approximations of the various error rates rather than as exact values. A post hoc analysis is necessarily somewhat arbitrary in determining the number of tests conducted because the actual number cannot be precisely determined without direct access to the original
The relation between error rates per comparison and experiment-wise error rates is complex with dependent tests, a condition that may be assumed to always hold to some degree when multiple statistical tests are conducted using subjects from the same sample. Strahan (11) has argued that, although it may be difficult to estimate the exact experiment-wise error rate due to correlation among the variables, it should be clear that it is greater than 5 percent. When discussing the impact of non-independence on error rates, it is important to distinguish the types of non-independence that may exist. Ryan (12) originally identified the following four instances where non-independence may occur.
The first includes all those situations where several groups or subgroups are statistically compared within the context of one study. The second case is referred to as "multiple tests with intercorrelated variables." This most commonly occurs when researchers compute multiple correlation coefficients for a single sample. The third instance of multiple testing is the use of multiple factors in the analysis of variance. The F ratios obtained from a factorial analysis of variance may not be independent if a common error estimate is used across the tests. Similar problems arise if other statistical procedures such as multiple t tests are used to analyze data in what is essentially a factorial design. The final type of multiple testing situation is what Ryan (12) referred to as "replicated tests of a single hypothesis." This classification includes studies for which several different methods of assessing the same dependent variable are employed.
The situations described by Ryan (12) are not mutually exclusive. They do serve, however, to make it clear that interdependence between multiple statistical tests is complex and produced by numerous factors. Although the lack of independence may influence error rates, Ryan argues that it is not the main problem in interpreting error rates. He states that "The error rate per comparison and per experiment are completely unaffected by independence or lack of it. The only important factor in these rates is the number of comparisons to be made. Only the experiment-wise error rate is affected by lack of independence" (12, p. 34). In the case of the experiment-wise error rate, the more highly related the tests, the closer the experiment-wise error rate is to the error rate specified for an individual comparison.
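Ryan's point, that dependence leaves the error rate per experiment unchanged while pulling the experiment-wise error rate toward the per-comparison level, can be explored by simulation. The sketch below assumes an equicorrelated normal model chosen purely for illustration: c two-sided z tests whose statistics share a pairwise correlation rho, with every null hypothesis true. Neither the model nor the helper name `simulate_correlated_tests` comes from the article or from Ryan (12).

```python
import math
import random
from statistics import NormalDist


def simulate_correlated_tests(c, alpha, rho, n_trials, seed=1):
    """Monte Carlo estimate of (EP, EW) for c two-sided z tests, all nulls true.

    Assumed model (illustration only): Z_i = sqrt(rho)*F + sqrt(1 - rho)*E_i,
    where F and the E_i are independent standard normal draws, so every Z_i
    is standard normal and corr(Z_i, Z_j) = rho for i != j.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("this construction needs 0 <= rho <= 1")
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    shared, own = math.sqrt(rho), math.sqrt(1 - rho)
    total_errors = 0
    trials_with_error = 0
    for _ in range(n_trials):
        f = rng.gauss(0.0, 1.0)  # component common to all c statistics
        errors = sum(
            1 for _ in range(c)
            if abs(shared * f + own * rng.gauss(0.0, 1.0)) > z_crit
        )
        total_errors += errors
        trials_with_error += int(errors > 0)
    return total_errors / n_trials, trials_with_error / n_trials


for rho in (0.0, 0.5, 0.9, 1.0):
    ep, ew = simulate_correlated_tests(c=20, alpha=0.05, rho=rho, n_trials=10_000)
    print(f"rho = {rho:.1f}: EP about {ep:.2f}, EW about {ew:.2f}")
```

With rho = 0 the estimates should sit near EP = 1 and EW = 0.64. As rho approaches 1, EP stays near 1 while EW falls toward 0.05, because the tests then reject together or not at all.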
In this examination, multivariate statistical tests that included procedures to control for type 1 error rates were counted as a single statistical test. This included analysis of variance (ANOVA) involving tests of interaction and accompanying post hoc procedures using Scheffé, Tukey, Duncan, Newman-Keuls, or other appropriate methods of post hoc analysis. Each ANOVA, including the post hoc analysis, was counted as one statistical procedure.
RESULTS
The 71 articles in five issues of volume 86 of the American Journal of Public Health and the 102 articles in five issues of volume 141 of the American Journal of Epidemiology contained sufficient statistical information to be included in the analysis. The interrater agreement for all information coded from each of the articles was examined using the intraclass correlation coefficient (ICC) (13). The ICC values for all recorded information ranged from 0.91 to 1.00. Descriptive information for the experiment-wise error rate, the error rate per experiment, and the percent error rate for the articles published in the two journals appears in table 1.
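The article does not state which form of the intraclass correlation coefficient was computed. As an illustration only, the sketch below assumes the two-way random-effects, single-rater form described by Shrout and Fleiss (13), often written ICC(2,1), computed from the mean squares of a targets-by-raters table. It is checked against the six-target, four-judge example in that paper, for which ICC(2,1) is reported as 0.29. The function name `icc_2_1` is hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, single rater, absolute agreement.

    `ratings` is a list of rows, one per rated target (here, an article),
    each holding one score per rater.
    """
    n = len(ratings)
    k = len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 targets, 2 raters, and a complete table")
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_targets = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_error = ss_total - ss_targets - ss_raters

    bms = ss_targets / (n - 1)            # between-targets mean square
    jms = ss_raters / (k - 1)             # between-raters (judges) mean square
    ems = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


# Worked example from Shrout and Fleiss (13): 6 targets rated by 4 judges.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(example), 2))  # 0.29
```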
A comparison of the values for different error rates illustrates that the experiment-wise error rate (EW) and the percent error rate (PE) have an easier interpretation than the error rate per experiment (EP), since EW and PE are essentially bounded while EP has no upper limit. The tabled values indicate that the EW in many articles is high, revealing a likelihood of type 1 errors in the reports. This is not surprising given the stochastic nature of the quantitative analysis of public health research. The prospect that many articles which report large numbers of statistical significance tests also report occasional type 1 errors does not seem alarming. What is of more concern is the percent error rate. The average individual alpha level used in a given study provides a lower bound for the percent error rate. Thus, for most of the investigations included in the analysis, 5 percent is the lowest value PE can achieve given the 0.05 significance level. Yet, in many of the studies, the PE indicated that approximately 20 percent or more of the findings may be erroneous. The average PE for the studies in the American Journal of Public Health was 19.16 percent, while the average PE for articles in the American Journal of Epidemiology was 18.73 percent (table 1).

TABLE 1. Type 1 error rates for random articles published in the American Journal of Public Health and the American Journal of Epidemiology, 1996

Journal                        No. of     Experiment-wise    Error rate per     Percent error
                               articles   error rate         experiment         rate
                                          Mean     SD*       Mean     SD        Mean     SD
Am J Public Health, Vol. 86    71         0.68     0.24      0.90     0.57      19.16    9.01
  (nos. 3, 4, 7, 9, 12)
Am J Epidemiol, Vol. 141       102        0.70     0.29      0.87     0.51      18.73    9.32
  (nos. 2, 5, 6, 9, 10)

* SD, standard deviation.

In a majority of the 173 articles (n = 156), the error
rate per experiment (EP) was greater than 5 percent.
The analysis also suggests that the percent error rate
provides information not specifically contained in the
EW and EP. The correlation between EW and EP rates
for the articles included in table 1 was r = 0.47. The
correlation of PE with EP was r = 0.41 and the
correlation of PE with EW was r = 0.32.
DISCUSSION AND CONCLUSIONS
The problem of multiple hypothesis testing has implications regarding the interpretation and implementation of epidemiologic research. For example, more than a decade ago the Food and Drug Administration refused to approve sulfinpyrazone (Anturane®, CIBA, Summit, New Jersey) as a medication to reduce mortality in the first 6 months following myocardial infarction (14). The refusal was based in part on the results of a clinical trial that included the repeated analysis of accumulated data. No procedure was used to control for the effect of multiplicity, and the validity of the results was open to question.
The probability of obtaining statistically significant results from two independent tests that address the same research question can be obtained by multiplying the individual probabilities that each test will produce a significant result. When both null hypotheses are true and each test is conducted at $\alpha = 0.05$, the probability that both tests will be statistically significant is 0.05 × 0.05 = 0.0025. The probability that neither result will be significant is 0.95 × 0.95 = 0.9025. The probability that at least one of the two test results will be statistically significant is 1 - 0.9025, or 0.0975. Thus, the probability of incorrectly deciding that the members of either one or both pairs of means are unequal using just two tests is nearly twice the probability of making the same error for a single test (0.0975 vs. 0.05). If we add a third comparison, the probability that none of the three tests will be significant is 0.95 × 0.95 × 0.95 = 0.8574, so the probability that at least one test will be significant is about 14 percent, or nearly three times the 0.05 level. As the number of independent statistical tests increases, the probability becomes much larger than 0.05, the original alpha (see table 1).
In trials where multiple dependent variables are used, the obvious solution to control or reduce experiment-wise error is to use some form of multivariate analysis. Multivariate procedures such as Hotelling's $T^2$, discriminant function analysis, and logistic regression offer viable alternatives to traditional univariate approaches when multiple dependent variables are present. These procedures have been described by public health and epidemiologic researchers and are beyond the scope of this paper (15, 16).
In some instances, the best solution may be to reduce the per-comparison significance level to a more stringent criterion. The Bonferroni adjustment provides a widely advocated procedure to achieve this goal. The Bonferroni inequality involves dividing the alpha level desired for the overall family of statistical tests (usually 0.05) by the number of statistical comparisons to be conducted. If two groups are compared on five separate dependent measures, each statistical comparison would be evaluated at 0.05/5 = 0.01. The Bonferroni method controls the type 1 error rate for each decision and maintains the selected alpha level (e.g., 0.05) for all the tests conducted in the investigation. The limitation of the Bonferroni method is that as the probability of making a type 1 error is decreased, the chance of committing a type 2 error is increased. Silverstein (17) demonstrated that when more than a small number of comparisons (say, five to eight) are included in a study, the Bonferroni procedure results in a dramatic loss in statistical power. Benjamini and Hochberg (18) have recently described an alternative to the Bonferroni adjustment, control of the false discovery rate, that does not result in a substantial reduction in statistical sensitivity. The Bonferroni and other p value adjustment methods, however, are viewed as too conservative by some investigators (17). Levin noted that researchers who are reluctant to use conservative correction methods such as the Bonferroni adjustment "will want to explore some newer techniques... in which less stringent but still interesting criteria replace the familywise error rate criterion" (1, p. 629). Procedures such as the percent error rate do not directly control type 1 error, but they do provide the investigator (and reader) with valuable information concerning the possible presence of a type 1 error in a family of statistical tests.
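The adjustments discussed above can be sketched in a few lines of Python. The functions below are illustrative implementations, not code from the article: Bonferroni compares each p value with alpha/m; Holm's step-down procedure (2) compares the i-th smallest p value with alpha/(m - i + 1) and stops at the first failure; and the Benjamini-Hochberg step-up procedure (18) controls the false discovery rate rather than the familywise error rate. The five p values are hypothetical. Maintained implementations exist, for example `multipletests` in the statsmodels package (`statsmodels.stats.multitest`).

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H_i when p_i <= alpha / m (controls the familywise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm(p_values, alpha=0.05):
    """Holm step-down: compare the p value of 0-based rank r (smallest first)
    with alpha / (m - r), and stop at the first comparison that fails."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] > alpha / (m - rank):
            break
        reject[idx] = True
    return reject


def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up: find the largest rank k (1-based) with
    p_(k) <= k * q / m, then reject the k smallest p values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k = rank
    reject = [False] * m
    for idx in order[:k]:
        reject[idx] = True
    return reject


# Hypothetical p values from five comparisons (illustrative, not from the article).
p = [0.012, 0.3, 0.001, 0.035, 0.008]
for name, rule in (("Bonferroni", bonferroni), ("Holm", holm),
                   ("Benjamini-Hochberg", benjamini_hochberg)):
    print(name, rule(p))
# Bonferroni [False, False, True, False, True]
# Holm [True, False, True, False, True]
# Benjamini-Hochberg [True, False, True, True, True]
```

With these values Bonferroni rejects two hypotheses, Holm three, and Benjamini-Hochberg four, which mirrors the trade-off between type 1 protection and statistical power described above.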
Determining the experiment-wise error rate for a "family" of statistical procedures can be a complex task. In this study, the statistical test was the unit of analysis, and no distinction was made between statistical procedures within a study and those between studies. Statistical tests conducted within a study generally use data from the same sample and are, therefore, assumed to be more related than statistical tests from different investigations (or samples). It is possible, however, that two different samples may be included in one research report, or that a single research article might include the results of more than one investigation. An argument could be made that the family-wise error rate should be determined based on statistical tests that address the same research question across multiple investigations, or even across the lifetime of an investigator working in a particular area (12). The individual statistical test was the unit for determining the experiment-wise error rate in this study. Other units are possible, for example, the study sample, the research report, the research question, or even the investigator. How the different units of analysis affect the experiment-wise error rate for a "family" of statistical tests is a question that can only be answered by additional research.
Technical or statistical solutions to the problem of multiplicity in epidemiologic research should not obscure a more fundamental scientific principle. There is a continuing need in health-related research to formulate concise research questions and hypotheses before the collection and analysis of data. Statistical hypothesis testing is necessarily an empirical compromise between claiming too much and suggesting too little. Public health and epidemiologic researchers must prospectively define research questions and hypotheses as succinctly as possible and interpret the results using an alpha level appropriate to the extent of multiple testing. Knowledge of experiment-wise error procedures can help achieve this goal.
ACKNOWLEDGMENTS
This research was partially supported by grant no. MCJ-
360646-010 from the US Department of Health and Human
Services, Bureau of Maternal and Child Health.
REFERENCES
1. Levin B. Annotation: on Holm, Simes, and Hochberg multiple test procedures. (Comment). Am J Public Health 1996;86:628-9.
2. Holm S. A simple sequentially rejective multiple test procedure. Scand J Stat 1979;6:65-70.
3. Aickin M, Gensler H. Adjusting for multiple testing when reporting research results: the Bonferroni vs. Holm methods. Am J Public Health 1996;86:726-8.
4. Ware JH, Mosteller F, Ingelfinger JA. P-values. In: Bailar JC III, Mosteller F, eds. Medical uses of statistics. Waltham, MA: NEJM Books, 1986:179-~77.
5. Godfrey K. Statistics in practice. Comparing the means of several groups. N Engl J Med 1985;313:1450-6.
6. Tukey JW. Some thoughts on clinical trials, especially problems of multiplicity. Science 1977;198:679-84.
7. Savitz DA, Olshan AF. Multiple comparisons and related issues in the interpretation of epidemiologic data. Am J Epidemiol 1995;142:904-8.
8. Abt K. Problems of repeated significance testing. Control Clin Trials 1981;1:377-81.
9. Cupples LA, Heeren T, Schatzkin A, et al. Multiple testing of hypotheses in comparing two groups. Ann Intern Med 1984;100:122-9.
10. Thomas DC, Siemiatycki J, Dewar R, et al. The problem of multiple inference in studies designed to generate hypotheses. Am J Epidemiol 1985;122:1080-95.
11. Strahan RF. Multivariate analysis and problems of type I error. J Couns Psych 1982;29:175-9.
12. Ryan TA. Multiple comparisons in psychological research. Psych Bull 1959;56:26-47.
13. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psych Bull 1979;86:420-8.
14. Anonymous. Sulfinpyrazone in the prevention of sudden death after myocardial infarction. The Anturane Reinfarction Trial Research Group. N Engl J Med 1980;302:250-6.
15. Bray JH, Maxwell SE. Multivariate analysis of variance. Beverly Hills, CA: Sage Publications, 1985.
16. Altman DG. Practical statistics for medical research. New York: Chapman & Hall, 1991.
17. Silverstein AB. Power lost and statistical power regained. The Bonferroni procedure in exploratory research. Educ Psych Meas 1986;46:303-7.
18. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc [B] 1995;57:289-300.