Philip Morris
Is Meta-Analysis A Valid Approach to the Evaluation of Small Effects in Observational Studies?
Fields
- Author
- Shapiro, S.
- Type
- PSCI, PUBLICATION SCIENTIFIC
- BIBL, BIBLIOGRAPHY
- Area
- CARCHMAN,RICHARD/OFFICE
- Litigation
- Iwoh/Produced
- Characteristic
- EXTR, EXTRA
- MARG, MARGINALIA
- Site
- R530
- Named Organization
- Boston Univ
- Author (Organization)
- Boston Univ
- Elsevier Science
- J Clin Epidemiol
- Elsevier Science
- Named Person
- Shapiro, S.
- Master ID
- 2063633034/3485
Related Documents:
- 2063633034-3485 Book 6 Tabs 1 - 39
- 2063633036-3041 Environment and Cancer: Who Are Susceptible ?
- 2063633043-3050 Risk Factors for Primary Lung Cancer Among Non-Smoking Women in Taiwan
- 2063633052-3058 Air Pollution and Respiratory Health Among Children with Asthmatic or Cough Symptoms
- 2063633060-3067 Human Cancer Syndromes: Clues to the Origin and Nature of Cancer
- 2063633069-3073 Genetic Testing for Cancer Risk
- 2063633075-3080 Oncogenic Transcription Factors in the Human Acute Leukemias
- 2063633082-3086 Nucleic Acid-Based Methods of the Detection of Cancer
- 2063633088-3093 Original Paper Vegetable and Fruit Intake and the Risk of Lung Cancer in Women in Ain Barcelona, Spain
- 2063633095-3098 P53 Mutations in Human Head and Neck Cancer Cell Lines
- 2063633100-3109 People, Places and Coronary Heart Disease Risk Factors: A Multilevel Analysis of the Scottish Heart Health Study Archive
- 2063633111-3116 Sex Differences in Up-Regulation of Nicotinic Acetylcholine Receptors in Rat Brain
- 2063633118-3125 Risk Factors and Sex Differential in Coronary Artery Disease
- 2063633127-3135 the Causes and Prevention of Cancer Gaining Perspective
- 2063633137-3141 Socioeconomic Status, Number of Siblings, and Respiratory Infections in Early Life As Determinants of Atopy in Children
- 2063633143-3153 Biomonitoring Exposure to Environmental Tobacco Smoke (Ets) : A Critical Reappraisal
- 2063633249-3258 A Case-Control Study of Cytochrome P450 1a1, Glutathione S-Transferase M1, Cigarette Smoking and Lung Cancer Susceptibility (Massachusetts, United States)
- 2063633268-3277 Childhood Asthma in Four Regions in Scandinavia: Risk Factors and Avoidance Effects
- 2063633279-3291 Lung Cancer
- 2063633293-3303 National Incidence of Smoking and Misclassification Among the U.S. Married Female Population
- 2063633305-3311 Fatty Foods and the Risk of Lung Cancer: A Case-Control Study From Uruguay
- 2063633313-3351 Tobacco Smoking
- 2063633353-3362 Smoking and Lung Cancer: Risk As A Function of Cigarette Tar Content
- 2063633364-3372 Tar Content of Cigarettes in Relation to Lung Cancer
- 2063633374-3378 Comments on : Law, M.R. Et Al., (970000) << Environmental Tobacco Smoke Exposure and Ischaemic Heart Disease: An Evaluation of the Evidence. >> Bmj, 970000, 315(7114) :980
- 2063633379
- 2063633380-3381 Comments on the Paper: 'environmental Tobacco Smoke Exposure and Ischaemic Heart Disease: An Evaluation of the Evidence'
- 2063633382-3389 'secondhand Cigarette Smoke Affects Blood Platelets, in A Way Which Increases the Likelihood of A Thrombus.' (Page 10)
- 2063633390-3392 Stanton Glantz Claims
- 2063633393-3425 'environmental Tobacco Smoke Exposure and Ischaemic Heart Disease: An Evaluation of the Evidence'
- 2063633426-3433 Environmental Tobacco Smoke Exposure and Ischaemic Heart Disease: An Evaluation of the Evidence. The Accumulated Evidence on Lung Cancer and Environmental Tobacco Smoke
- 2063633435-3471 Placental Toxicology
- 2063633472-3474 Placental Toxicology
- 2063633476-3484 Lung Carcinoma Trends by Histologic Type in Vaud and Neuchatel, Switzerland, 740000 - 790000
- Date Loaded
- 07 Jun 1999
Document Images
"J Clin Epidemiol Vol. 50, No. 3, pp. 223-229, 1997
Copyright © 1997 Elsevier Science Inc.
ELSEVIER
COMMENTARY
0895-4356/97/$17.00
PII S0895-4356(96)00360-5
Editors' Note: This paper was presented at a meeting on small risks,
sponsored by the Robert Koch Institute, and held at Potsdam, Germany
in October 1995. It is presented here, in slightly shortened form,
with permission. It will appear, together with a rebuttal and a
discussion, in a book: Epidemiologic Practices in Assessing Small
Effects. Proceedings. Robert Koch Institute (in press).
-- A.R.F.
Is Meta-Analysis a Valid Approach to the Evaluation
of Small Effects in Observational Studies?
Samuel Shapiro*
SLONE EPIDEMIOLOGY UNIT, BOSTON UNIVERSITY SCHOOL OF MEDICINE, BROOKLINE, MASSACHUSETTS 02146
The Robert Koch Institute is a uniquely appropriate setting
for the consideration of causality as applied to small risks,
since Koch was the first person to develop a set of causal
criteria. His purpose was to apply them to infections, and
it was his pioneering efforts in that application that subse-
quently stimulated Hill, Lilienfeld, and Susser [1], among
others, to propose corresponding criteria for application to
epidemiologic research.
One important criterion of causality is the strength of any
given association. In any reasonably well conducted study, a
weak association may be due to confounding or bias, but
it is unlikely that a strong association can be completely
explained away by defects in study design. That point is
critical to the topic of meta-analysis: when associations are
strong (as, say, with smoking and lung cancer), there is no
need to resort to it. It is when associations are weak that
meta-analysts are tempted to combine studies, in the erro-
neous belief that the statistical significance thereby accom-
plished translates to causality.
At the request of Dr. Ernst Wynder, one of the organizers
of the meeting, some of the material that I will consider
today has been published before, but I will again refer to it
briefly in order to systematically cover the general theme.
I would like to commence by referring to a disastrous error
in which I was a participant at a relatively young epidemiological
age [2]. That error has influenced my attitude to the
process of inferring causality from nonexperimental data
ever since. In 1974 the first study in Fig. 1 stimulated the
hypothesis that rauwolfia alkaloids increase the risk of breast
cancer [2]. It was published back to back with two other
studies [3,4] in a single issue of the Lancet. All three studies
were acknowledged at the time to have methodological limitations,
but based on the total evidence, the respective authors
felt that the defects in the individual studies, as it
were, "canceled each other out." The three studies, in their
turn, stimulated a cascade of additional studies [5].
*Address for correspondence: Samuel Shapiro, Professor of Epidemiology
and Director of Slone Epidemiology Unit, Boston University School of
Medicine, 1371 Beacon Street, Brookline, Massachusetts 02146.
Accepted for publication on 19 September 1996.
In 1979 the International Agency for Research on Cancer
reviewed 15 studies [5] (Fig. 1), and concluded that
the evidence from the better conducted ones was against an
increase in the risk of breast cancer attributable to rauwolfia.
Finally, yet another study [6] produced statistically stable
data that showed no increase in the risk of breast cancer
associated with rauwolfia use: the relative risk was 0.9, and
the study was large enough to set an upper 95% confidence
limit of 1.2.
The hypothesis-raising study [2] was based on 150 cases
of breast cancer, and the final, null study [6] was based on
1816 cases. In their essentials the methods were the same
in both studies [2,6]. There never were adequate grounds to
propose the hypothesis in the first place, and the original
association was probably due to chance, even though it was
statistically significant. Yet as one scans the data in Fig. 1,
it is a reasonable bet that had a meta-analysis been under-
taken at the time, it would have produced a positive overall
association that could well have been labeled as "causal."
Fortunately, when these studies were published the tech-
nique of meta-analysis had not yet been applied to medical
data.
These thoughts occurred to me when the late Tom Chal-
mers and his group published one of the early meta-analyses
of randomized controlled trials (RCTs). They suggested that
anticoagulant therapy improves the prognosis of myocardial
infarction [7], a conclusion that could not confidently
have been reached simply from an assessment of the data
in any individual study. I suggested to Chalmers that the
new methodology might be of some use in combining infor-
mation from RCTs, but that it could not be used as a valid
tool in nonexperimental research because of uncontrollable
confounding and bias. He disagreed, and soon moved on to
the meta-analysis of nonexperimental data as well [8].
Meta-analysis rapidly achieved considerable popularity, so
much so that it is now unusual to open a medical journal
that does not contain one. Today we can further broadly
divide meta-analyses into four types: the meta-analysis of
published nonexperimental data, of "raw" nonexperimental
data (sometimes the preferred terminology is "combined
analysis," "pooled analysis," "collaborative analysis" or
"overview"), of published RCTs, and of "raw" data from
RCTs.
FIGURE 1. Risk of breast cancer in relation to reserpine use:
relative risk estimates and 95% confidence intervals in 13
case-control and 2 cohort studies. Confidence intervals are
not provided if they were not published or could not be estimated
from the published data. Some studies estimated more
than one relative risk. (Reproduced with permission from
[10].)
[Plot not reproduced. Horizontal axis: relative risk.
Case-control studies: Boston, Bristol, Helsinki, Los Angeles,
Rochester, Rockland, Baltimore, England & Wales, Finland,
Berlin, Scotland & England, Oakland, USA.
Cohort studies: Olmstead County, Kaiser-Permanente.]
In the critique that follows I will not consider the meta-analysis
of RCTs. RCTs are only exceptionally designed to
identify causes of disease; because of randomization, confounding
is relatively unlikely in any individual RCT, and
exceedingly unlikely in a meta-analysis; and in well designed
RCTs bias can either be eliminated, or else controlled
much more rigorously than is possible in nonexperimental
research. There are nevertheless problems in the
meta-analysis of RCTs (for example, variable quality, generalizability),
but the subject is large and complex, and it is
best dealt with as a separate topic.
TABLE 1. Water chlorination and cancer^a: meta-analysis of
12 studies
Site        RR      (95% CI)
All         1.15    (1.09-1.20)
Bladder     1.21    (1.09-1.34)
Rectum      1.38    (1.01-1.87)
^a See reference 13.
One of the first meta-analyses of published nonexperi-
mental research was an evaluation of breast cancer risk in
relation to alcohol consumption [8]: there were statistically
significant summary relative risk estimates of 1.4 for the
case-control studies, and 1.7 for the cohort studies--both
of them "small" risks. Elsewhere I have reviewed the find-
ings of that meta-analysis, and of a subsequent updated one
[9]. Here I will simply summarize the main defects. There
was misclassification and variable definition of alcohol con-
sumption among the studies; there was misclassification of
the timing of intake and of the quantity consumed; multiple
sources of information and selection bias were likely, and
quite possibly in the same direction, across the studies; a
spurious "quality score" was used to assess the individual
studies; and multiple sources of confounding were present
in all the studies. Particularly with regard to the latter possi-
bility, the determinants of the changing incidence of breast
cancer are still largely unknown; many of the determinants
of alcohol consumption are still largely unknown, different
in different cultures, in each of which they are also chang-
ing, and difficult or impossible to measure. Yet despite such
considerations, the authors suggested that the relative risk
estimates derived from their meta-analysis were consistent
with causality.
When my critique of the alcohol/breast cancer meta-analysis
was published [10], one of the authors acknowledged
that mistakes had been made, but he argued that the
technique was then still in its infancy, and that the methodology
had since improved [11,12]. That claim is tested in
the next example, a meta-analysis of 12 studies of cancer
risk in relation to water chlorination [13], again carried out
by Chalmers' group, several years later. The following were
the main results (Table 1): for all the cancer sites studied
the relative risk estimate was 1.15, and statistically significant;
for bladder cancer it was 1.21; for rectal cancer it was
1.38. Applying these estimates to United States incidence
rates, the authors claimed that at least 4,200 cases of bladder
cancer per year and 6,500 cases of rectal cancer per year are
associated with water chlorination.
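As a rough check on what "statistically significant" amounts to for the
summary estimates in Table 1, the standard error of each log relative risk
can be back-calculated from its published 95% interval. The sketch below
assumes the intervals are symmetric on the log scale (a Wald-type interval),
which the source does not state, and the function names are illustrative
only. It speaks to the precision of the pooled estimates and says nothing
about whether they are free of bias or confounding.

```python
import math

# Summary estimates from Table 1: site -> (RR, lower 95% limit, upper 95% limit)
TABLE_1 = {
    "All": (1.15, 1.09, 1.20),
    "Bladder": (1.21, 1.09, 1.34),
    "Rectum": (1.38, 1.01, 1.87),
}


def log_se_from_ci(lower, upper, z_crit=1.96):
    """Standard error of log(RR), ASSUMING a log-symmetric 95% interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z_crit)


def z_statistic(rr, lower, upper):
    """Wald z statistic for the null hypothesis RR = 1."""
    return math.log(rr) / log_se_from_ci(lower, upper)


if __name__ == "__main__":
    for site, (rr, lower, upper) in TABLE_1.items():
        se = log_se_from_ci(lower, upper)
        z = z_statistic(rr, lower, upper)
        print(f"{site:8s} RR={rr:.2f}  SE(log RR)={se:.3f}  z={z:.2f}")
```

Under that assumption all three z statistics exceed 1.96, which is all that
"statistically significant" asserts here.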

TABLE 2. Water chlorination and cancer: quality scores for
specific criteria^a
                                              Studies complying (%)
Criteria evaluated            No. of criteria    Low    Median
Selection                            6            40      75
Adjustment for confounding           7            10      50
Exposure assessment                  6             0      50
Data analysis                        6            10      55
All criteria                        25             0      55
^a See reference 13.
In this meta-analysis an attempt was again made to quan-
tify the quality of the individual studies: "... studies were
scored on the basis of selection of subjects, measurement of
and adjustment for confounding variables, exposure assess-
ment, and statistical analysis. The overall quality score was
calculated from the three subscores: a general methods
score, a data analysis score, and an exposure assessment
score. Each subscore was calculated as the percentage of ap-
plicable quality criteria that were met in each study ....
The cumulative quality score was a weighted average of the
three scores, with both general methods and exposure as-
sessment receiving twice the weight of the data analysis
score."
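To make the quoted weighting concrete, the following is a minimal sketch of
the arithmetic as the passage describes it: each subscore is the percentage
of applicable criteria met, and the cumulative score weights general methods
and exposure assessment twice as heavily as data analysis. The function
names and the example counts are hypothetical; the passage does not say how
criteria were assigned to the "general methods" subscore, so the sketch
takes the three subscores as given inputs.

```python
def subscore(criteria_met, criteria_applicable):
    """Percentage of applicable quality criteria that a study met."""
    if criteria_applicable <= 0:
        raise ValueError("at least one applicable criterion is required")
    return 100.0 * criteria_met / criteria_applicable


def cumulative_quality(general_methods, exposure_assessment, data_analysis):
    """Weighted average of three subscores with weights 2 : 2 : 1."""
    weighted_sum = (2.0 * general_methods
                    + 2.0 * exposure_assessment
                    + 1.0 * data_analysis)
    return weighted_sum / 5.0


if __name__ == "__main__":
    # Hypothetical study: counts are invented for illustration only.
    general = subscore(8, 13)
    exposure = subscore(3, 6)
    analysis = subscore(4, 6)
    print(round(cumulative_quality(general, exposure, analysis), 1))
```

The choice of weights is exactly the kind of arbitrary decision the
commentary objects to: changing 2 : 2 : 1 to any other ratio changes the
ranking of studies without any change in the underlying evidence.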
Such language is, I believe, sufficiently dense and arbi-
trary to confirm what we all know to be true: that quality
cannot be scored, measured, and taken into account. Moreover,
who are these meta-analysts, sitting on high, to decide
for the rest of us what is and is not good quality, and then
to measure it? Quality is best evaluated qualitatively: as op-
posed to meta-analysis, in any adequate qualitative review,
we require that the author should give reasons for judging
the quality of any given study as good or bad in transparent
and easily comprehensible language. It is then up to the
reader to decide whether he agrees or disagrees.
It is nevertheless instructive to examine the results of the
quality assessment (Table 2). A total of 25 criteria, classified
under four major headings (selection, confounding, expo-
sure assessment, and data analysis) were used to derive qual-
ity scores (6 criteria were applied to the selection of sub-
jects, 7 to adjustment for confounding, and so on--Table
2). Among the 12 studies the median compliance with all
25 criteria was only 55%, and it was as low, or lower, for
confounding, exposure assessment, and data analysis. In ad-
dition, the lowest values under all four headings ranged from
zero to 40%. In other words, by the authors' own standards,
the data were unsatisfactory, so unsatisfactory that the
only sensible thing to do, surely, would have been to aban-
don the enterprise.
But it was not even necessary to undertake the impossible
task of quantifying the quality of the individual studies, because
there was clear evidence of bias in the presented data:
those studies that reported elevated risks tended to report
elevations for all cancers studied, and not for specific cancers.
To illustrate that point, in Table 3 two of the largest
studies are compared [14,15]. Alavanja et al. [14] studied 7
cancer sites, and found elevated risks associated with chlorine
exposure for all of them, ranging from 1.61 to 2.12. By
contrast, Brenniman et al. [15] (who also studied 7 cancers,
6 of which overlapped with those studied by Alavanja et al.
[14]) found relative risks in the range of 0.97 to 1.22, all
but one of them (a point estimate of 1.13 for colorectal
cancer) compatible with unity. With the possible exception
of x-rays, there is no known carcinogen that increases the
risk of cancer at all the sites listed in Table 3. Nor is it likely
that there can be: the epidemiologic characteristics of the
various tumor sites are markedly different, and there is no
biologic plausibility for a universal carcinogen that would
increase the risk across the board. By that standard, the
study of Alavanja et al. [14], which made a major contribution
to the overall findings in the meta-analysis, was clearly
biased.
TABLE 3. Water chlorination and cancer: relative risks for
gastrointestinal and bladder cancer deaths in two studies
              Alavanja^a      Brenniman^b
              (n = 3446)      (n = 5208)
Bladder       1.69^c          0.98
Colon         1.61^c          1.11
Colorectal    1.71^c          1.13^c
Esophagus     2.12^c          0.97
Liver         --              1.00
Pancreas      1.97^c          1.02
Rectum        1.9[?]^c        1.22
[Lung]        1.79^c          --
^a See reference 14.
^b See reference 15.
^c Lower 95% confidence interval excludes 1.0.
Based on this example, the claim [11] that the meta-analysis
of published studies has emerged from its infancy,
and improved its methods, cannot be defended.
I turn next to the meta-analysis (or combined analysis)
of "raw" data. The conceptual argument for such a procedure
is that the published information may not be sufficient
to conduct a valid meta-analysis; that it is commonly necessary
to recode and otherwise resort variables across studies
to make them compatible, and to make other adjustments,
before we are able to properly synthesize information. That
objective cannot be accomplished by a meta-analysis confined
to the published material.
One of the first major exercises along these lines was conducted
by Howe et al. [16], who initially attempted to examine
14 case-control studies of breast cancer risk in relation
to dietary fat intake. Two studies had to be excluded because
the authors declined to collaborate (which, incidentally,
casts doubt on the validity of the meta-analysis, ab
initio). Depending on the specific dietary element under
study, 3 to 4 of the remaining studies were then excluded
because they exhibited markedly heterogeneous effects, so
that in the end, only 8 to 9 of the original 14 studies were
meta-analyzed.
TABLE 4. Dietary fat and breast cancer^a
                         No. of     Relative risk
                         studies    Q5 vs. Q1^b      p trend
Total fat (g/day)           8          1.46          0.0002
Saturated fat (g/day)       9          1.57          <0.0001
Vitamin C (mg/day)          8          0.86          0.031
^a See reference 16. ^b Quintiles.
The principal findings were as follows (Table 4): the rela-
tive risk estimate for the highest quintile of total fat intake
was 1.46, and for saturated fat it was 1.57. The associations
were statistically significant. The main conclusion was that
from a combined analysis of international data a high intake
of fats, saturated fats in particular, appears to increase the
risk of breast cancer by some 1.5-fold.
Now, based on ecological and other evidence it is reason-
able to propose that dietary fats may indeed increase breast
cancer risk, but the question here is whether the meta-analysis
meaningfully tested that hypothesis. The following
is a partial list of some of the problems that were present
in this study:
Dietary fat intake is notoriously difficult to record in
interview-based studies, with correlation coefficients rela-
tive to "gold standard" measurements (such as from pro-
spectively recorded food diaries) usually of the order of 0.3
or less, and seldom better than 0.5 [17]. In this instance,
the misclassification thereby introduced was further com-
pounded by having to combine heterogeneous question-
naire data. For example, some of the data were based on a
24-hr dietary recall instrument, and others on periods cov-
ering as much as two weeks; some questionnaires covered
as few as 22 food items, others as many as 80. Such misclassi-
fication could easily have facilitated the occurrence of infor-
mation bias. The hypothesis was widely broadcast, and dif-
ferential reporting of fat intake could have been ubiquitous
across the studies. Probably there was also uncontrollable
confounding. For example, fat intake may have been simi-
larly associated with socioeconomic status in the various
studies.
If such considerations were not sufficiently daunting, the
investigators simply dealt with heterogeneity by means of
circular reasoning: they made the remarkable claim that the
relative risk estimates among the studies were homogeneous
[17]. Of course they were: they had to be, since those studies
that exhibited heterogeneity in the first place were ex-
cluded.
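The circularity can be illustrated numerically. The sketch below computes
Cochran's Q, a standard heterogeneity statistic that the source does not
name and that is used here only as a stand-in for whatever test the
investigators applied. The twelve log relative risks and standard errors are
invented for illustration and are not taken from any of the studies
discussed. Q is computed for all twelve, then again after the four most
deviant studies are dropped.

```python
def pooled_log_rr(log_rr, se):
    """Inverse-variance (fixed-effect) pooled log relative risk."""
    weights = [1.0 / s ** 2 for s in se]
    return sum(w * y for w, y in zip(weights, log_rr)) / sum(weights)


def cochran_q(log_rr, se):
    """Cochran's Q: weighted squared deviations from the pooled estimate."""
    centre = pooled_log_rr(log_rr, se)
    return sum((y - centre) ** 2 / s ** 2 for y, s in zip(log_rr, se))


def drop_most_deviant(log_rr, se, k):
    """Remove the k studies farthest (in SE units) from the pooled estimate."""
    centre = pooled_log_rr(log_rr, se)
    ranked = sorted(range(len(log_rr)),
                    key=lambda i: abs(log_rr[i] - centre) / se[i],
                    reverse=True)
    keep = sorted(ranked[k:])
    return [log_rr[i] for i in keep], [se[i] for i in keep]


# HYPOTHETICAL data: 12 studies, 4 of them strongly discordant.
LOG_RR = [0.40, 0.35, 0.45, 0.38, 0.42, 0.36, 0.44, 0.39,
          -0.30, 1.20, -0.10, 1.00]
SE = [0.15] * 12

Q_ALL = cochran_q(LOG_RR, SE)
KEPT_LOG_RR, KEPT_SE = drop_most_deviant(LOG_RR, SE, k=4)
Q_KEPT = cochran_q(KEPT_LOG_RR, KEPT_SE)

if __name__ == "__main__":
    print(f"all 12 studies: Q={Q_ALL:.1f} on {len(LOG_RR) - 1} df")
    print(f"after exclusion: Q={Q_KEPT:.1f} on {len(KEPT_LOG_RR) - 1} df")
```

With these invented numbers Q falls from far above its degrees of freedom to
well below them. Because removing a study can never increase Q, a finding of
homogeneity after such exclusions is guaranteed by construction and carries
no evidential weight.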
(Author's note: Since the Potsdam meeting, a further
meta-analysis of breast cancer risk in relation to fat intake
that included seven follow-up studies has been published,
with null results [18]. Here we are confronted, not for the
first time, by the ultimate irony of two conflicting meta-analyses,
neither of which enlightens us as to whether saturated
fats do or do not influence breast cancer risk.)
There has been yet a further refinement to the idea of
meta-analyzing "raw" data. The argument has been ad-
vanced that that objective can only truly be achieved if the
original investigators are themselves involved in the nuts
and bolts of the enterprise. My next example is a combined
analysis of 12 case-control studies [19,20] of ovarian cancer
in which this was done. The meta-analysis (or collaborative
analysis, the designation preferred by the authors) was car-
ried out over a four-year period by one group, who met each
year for several days together with representatives from the
original 12 studies, all of them experienced epidemiologists.
The latter were intimately involved with the manipulations
to which their data were subjected, with the analysis, and
with the interpretation of the results.
The collaborative analysis excluded four American stud-
ies [21-24] on the grounds that they did not involve per-
sonal interviews [21-23], or that the data were not available
in computer-retrievable form [24] (which, incidentally, are
not valid reasons), and three European studies that were
meta-analyzed and published separately [25] (which, inci-
dentally, is also not valid).
Some of the findings from the American collaborative
analysis documented the obvious, for example, the high risk
associated with nulliparity, and the protective effect of in-
creasing parity. Those associations were so strong, of course,
that they were fully demonstrable in each of the individual
studies. For that purpose no meta-analysis was required. A
further claim made for the collaborative analysis, however,
was that it was possible to evaluate the separate effects of
nulliparity, as opposed to infertility, as independent risk fac-
tors for ovarian cancer. That claim is questionable, since
infertility and nulliparity were so closely commingled and
correlated that it is doubtful whether any distinction
that was observed was biologically meaningful. Elucidation
of the question of whether infertility, independently of low
parity, is a risk factor for ovarian cancer would require a
study designed along lines radically different from the case-
control studies included in the collaborative analysis.
A further claim was that analysis of the risks according to
whether hospital or community controls were used yielded
somewhat different results. The implied claim was that the
studies that enrolled community controls were superior to
those that enrolled hospital controls. If so, why were the
inferior studies included? By contrast, of course, in a qualita-
tive review, the authors could have considered the validity
of the control selection in each individual study.
In short, this most "ideal" of meta-analyses yielded no
new information regarding parity, a well documented and
powerful risk factor. For that factor we are not in the domain
of small risks, and a meta-analysis was not needed in the
first place. Nor was the meta-analysis able to sort out the
separate risks related to parity and infertility.
TABLE 5. Use of fertility drugs in relation to risk of epithelial
ovarian cancer^a
                            Number      Number
                            of cases    exposed    RR (95% CI)
Invasive
  Nulligravid                  88          12      27 (2.3-316)
  Gravid                      538           8      1.4 (0.5-3.6)
  Total                       626          20      2.8 (1.3-6.1)
Low malignant potential
  Total                       --^b        --^b     4.0 (1.1-13.9)
^a See references 19 and 20.
^b Numbers not given.
TABLE 6. Fertility drugs used by cases of ovarian cancer^a
Drug                                     n
Clomiphene                               1
Estrogens                                4
Estrogen and progestogen                 1
Thyroid hormone                          2
Dextroamphetamine and amobarbital        1
Unknown (invasive cancer)               16^b
Unknown (low malignant potential)       --^c
^a Data provided to the author by the authors of references 26, 27, and 28.
^b Number is approximate.
^c Number unknown.
Setting those matters aside, however, a major claim made
to justify the collaborative analysis was that it uncovered
one new and important association that would not other-
wise have been discovered in the individual studies (Table
5). Three of the 12 studies [26-28] had recorded informa-
tion on the receipt of fertility drugs: in the combined data
[19] the relative risk for epithelial ovarian cancer was significantly
elevated at 2.8; in the subgroup of nulligravidae
it was 27; among gravidae, it was 1.4 and nonsignificant.
Some of the cancers were classified as tumors of low malig-
nant potential. They were analyzed separately [20]: the rela-
tive risk for those tumors was 4.0, and statistically signifi-
cant.
The authors interpreted the associations as supporting
the hypothesis that fertility drugs increase the risk of ovar-
ian cancer. Yet, it is readily demonstrable that whatever did
account for the findings, it was not the use of fertility drugs:
as expected, well over 50% of the cases were over 50 years of
age when they contracted ovarian cancer in the late 1970s;
infertile women would thus have sought medical help at the
age when they needed it, mostly in the early 1960s, or ear-
lier. At that time fertility drugs either did not yet exist in
the United States (e.g., clomiphene), or they had only re-
cently been experimented with on a small scale in prelimi-
nary and exploratory studies (e.g., menotropins) [29]: it was
only after 1966, when clomiphene became available, that
the large-scale use of ovulation-inducing drugs commenced.
As a matter of simple arithmetic, only an odd case or two
that occurred at an unusually young age could have been
exposed to fertility drugs.
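The "simple arithmetic" can be written out. The sketch below takes 1978 as
a stand-in for "the late 1970s" and 1966 as the year from which clomiphene
was available; the first is an assumption made purely for illustration, the
second follows the text. The function name is hypothetical.

```python
CLOMIPHENE_AVAILABLE = 1966   # per the text: large-scale use began only after this
DIAGNOSIS_YEAR = 1978         # ASSUMED representative year for "the late 1970s"


def age_in_year(age_at_diagnosis, diagnosis_year, year):
    """Age a woman had reached in `year`, given her age at diagnosis."""
    return age_at_diagnosis - (diagnosis_year - year)


if __name__ == "__main__":
    for age_at_diagnosis in (30, 40, 50, 60):
        earlier_age = age_in_year(age_at_diagnosis, DIAGNOSIS_YEAR,
                                  CLOMIPHENE_AVAILABLE)
        print(f"diagnosed at {age_at_diagnosis}: "
              f"aged {earlier_age} in {CLOMIPHENE_AVAILABLE}")
```

On these assumptions a woman diagnosed at 50 was already 38 when clomiphene
became available, which is past the age at which most women would first have
sought treatment for infertility.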
That arithmetic was confirmed in subsequently published
data [30] from the three studies (Table 6): in the combined
series of invasive cancer and tumors of low malignant potential,
there was only one case known to have been exposed
to a fertility drug (clomiphene). That patient was a 30-year-old
woman who received clomiphene a short time before
she was diagnosed as having ovarian cancer. The strong
likelihood is that the cancer or a precursor lesion "caused"
the infertility, and thus its treatment, not the reverse. That
single case apart, the remaining drugs that were identified
were not fertility drugs. Most of the drugs were unknown,
in large part because in one study [28] the names of the
specific drugs were not recorded. However, the only defensible
assumption that can be made about the unknown drugs,
based on distribution of the known exposure shown in Table 6,
and on the timing of the exposures, is that they were
not fertility drugs.
Now, there are good experimental and clinical grounds
to suggest that fertility drugs may indeed increase the risk
of ovarian cancer [31]. The point made here, however, is
that the combined analysis made no contribution to answer-
ing that question.
In summary, this most sophisticated of combined analyses
produced only one new finding, and a demonstrably false
one. How could such an obvious and avoidable error have
been committed by an experienced group of investigators?
Part of the answer must be that even when investigators
actively collaborate in a meta-analysis under the most opti-
mal conditions, it is impossible for them to be as immersed
in the data as they would be if they were engaged in the
analysis of their own studies. The error demonstrates yet
another defect of meta-analysis: by definition, such an un-
dertaking is conducted at one or more removes from the
original data. The greater the number of removes, the
greater is the likelihood of error, due simply to an increasing
lack of familiarity with the intricacies of the study material.
My final example has been selected to examine a general
theme that pervades the meta-analytic literature. The argu-
ment is as follows: in the meta-analysis of a large number
of reasonably well conducted studies, bias and confounding
should, in the aggregate, tend to "cancel each other out"--
as has been stated or implied in some of the studies reviewed
above. That argument tends to be made most explicitly for
confounding; and when it is applied to RCTs, it is undoubt-
edly true. For the argument to hold true in the domain of
nonexperimental research, however, the very large and du-
bious assumption must be made that the right studies, with
the right weights, in the right directions, are present. Other-
wise the "canceling out" will not occur. Even if it is assumed
that there is no bias, and that uncontrolled confounding is
the only issue, there can be no reassurance that the "cancel-
ing out" will occur, since the same confounder may be
shared by more than one study.
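The contrast between offsetting and shared distortions can be sketched with
an inverse-variance (fixed-effect) pooled estimate, a common pooling method
that is assumed here rather than taken from the source. All numbers are
hypothetical: the true log relative risk is zero, every study has the same
standard error, and each study's estimate is displaced by a bias of fixed
size. In one scenario the biases alternate in sign; in the other every study
shares the same confounder and is displaced in the same direction.

```python
import math


def fixed_effect_pool(log_rr, se):
    """Inverse-variance pooled log relative risk and its standard error."""
    weights = [1.0 / s ** 2 for s in se]
    pooled = sum(w * y for w, y in zip(weights, log_rr)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


# HYPOTHETICAL inputs; the true log RR is 0 in both scenarios.
N_STUDIES = 16
BIAS = 0.2                      # displacement on the log RR scale (RR about 1.22)
SE = [0.2] * N_STUDIES          # each study alone has z = 1, not significant

OFFSETTING = [BIAS if i % 2 == 0 else -BIAS for i in range(N_STUDIES)]
SHARED = [BIAS] * N_STUDIES

POOLED_OFFSETTING, SE_OFFSETTING = fixed_effect_pool(OFFSETTING, SE)
POOLED_SHARED, SE_SHARED = fixed_effect_pool(SHARED, SE)

if __name__ == "__main__":
    print(f"offsetting biases: pooled RR={math.exp(POOLED_OFFSETTING):.2f}, "
          f"z={POOLED_OFFSETTING / SE_OFFSETTING:.1f}")
    print(f"shared bias:       pooled RR={math.exp(POOLED_SHARED):.2f}, "
          f"z={POOLED_SHARED / SE_SHARED:.1f}")
```

Pooling shrinks the standard error in both scenarios, but it removes the
distortion only in the first. When the bias is shared, the pooled estimate
stays at the biased value and merely becomes "statistically significant".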
That point has been quantitatively illustrated by Post-
huma et al. [32] who carried out a meta-analysis of studies
of estrogen use in relation to the occurrence either of car-
diovascular disease or cancer. The summary relative risk es-
timates for the two outcomes, respectively, were 0.57 and
0.83. What was informative about this meta-analysis, how-
ever, was that those studies that showed the greatest reduc-
tions in cardiovascular risk also showed the greatest reduc-
tions in total cancer risk.
Posthuma et al. [32] suggested that this correlation may
reflect a "healthy woman effect," in which the women at
lowest risk for both cardiovascular disease and cancer were
the ones who tended to take estrogens. The correlation
could also be explained by other shared confounders, such
as life style. Since it is clear that estrogen cannot itself
reduce the total risk of cancer, the spurious reduction in
cancer risk that correlated with the reduction in the cardiovascular
risk can only be explained by shared confounders,
mostly in the same direction, across the studies. (Incidentally,
the findings of Posthuma et al. also raise questions
about the magnitude of the cardiovascular risk reduction,
but that matter is beyond the scope of this presentation.)
What is likely to be the future of meta-analysis? It appears
that it is unlikely to go away, and for that reason some epidemiologists
have argued that, rather than oppose it, a better
approach might be to try to contain its excesses [33]. I
disagree. I think there is something profoundly amiss in the
uncritical way in which epidemiologists, and indeed the
medical profession as a whole, have allowed themselves
to be seduced by the numerological abracadabra of meta-analysis.
Perhaps the technique will succumb to its own absurdity,
but if not, the next step in this surrealistic evolution
will be the meta-analysis of meta-analyses, in which the
meta-analyst will be totally divorced from reality, and totally
surrounded by numbers without context. If anyone in
this audience believes that development is far off, he should
familiarize himself with the latest fashion of so-called "evidence-based
medicine" and "systematic review," now playing
on the Internet [34].
I would like to conclude by quoting Alvan Feinstein [35].
Feinstein and I have had our differences from time to time,
but in this instance we are in total agreement: "the meta-
analysis of non-randomized observational studies resembles
the attempt of a quadriplegic person to climb Mount Everest
unaided."
References
1. Susser M. What is a cause and how do we know one? A gram-
mar for pragmatic epidemiology. Am J Epidemiol 1991; 133:
635-648.
2. Boston Collaborative Drug Surveillance Program. Reserpine
and breast cancer. Lancet 1974; 2: 669-671.
3. Armstrong B, Stevens N, Doll R. Retrospective study of the
association between use of rauwolfia derivatives and breast
cancer in English women. Lancet 1974; 2: 672-675.
4. Heinonen OP, Shapiro S, Tuominen L, et al. Reserpine use
in relation to breast cancer. Lancet 1974; 2: 675-677.
5. World Health Organization. Some pharmaceutical drugs.
(IARC monographs on the evaluation of the carcinogenic risk
of chemicals to man, Vol. 24.) Lyon, France: International
Agency for Research on Cancer (distributed by WHO Publi-
cations Centre USA, Albany, NY); 1980: 211-241.
6. Shapiro S, Parsells JL, Rosenberg L, et al. Risk of breast cancer
in relation to the use of rauwolfia alkaloids. Eur J Clin Pharmacol
1984; 26: 143-146.
7. Chalmers TC, Matta RJ, Smith H Jr, Kunzler A-M. Evidence
favoring the use of anticoagulants in the hospital phase of
acute myocardial infarction. N Engl J Med 1977; 297: 1091-
1096.
8. Longnecker MP, Berlin JA, Orza MJ, et al. A meta-analysis
of alcohol consumption in relation to risk of breast cancer.
JAMA 1988; 260: 652-656.
9. Longnecker MP. Alcoholic beverage consumption in relation
to risk of breast cancer: Meta-analysis and review. Cancer
Causes and Control 1994; 5: 73-82.
10. Shapiro S. Meta-analysis/shmeta-analysis. Am J Epidemiol
1994; 140: 771-778.
11. Longnecker MP. Re: "Point/counterpoint: Meta-analysis of
observational studies." Am J Epidemiol 1995; 142: 799-800.
12. Shapiro S. Dr. Shapiro replies. Am J Epidemiol 1995; 142:
780-781.
13. Morris RD, Audet A-M, Angelillo IF, Chalmers TC, Mosteller F.
Chlorination, chlorination by-products and cancer:
A meta-analysis. Am J Public Health 1992; 82: 955-963.
14. Alavanja M, Goldstein I, Susser M. A case-control study of
gastrointestinal and urinary tract cancer mortality and drink-
ing water chlorination. In: Tolley RL, Gorcher H, Hamilton
DH Jr, Eds. Water Chlorination: Environmental Impact and
Health Effects. 2nd ed. Ann Arbor, MI: Ann Arbor Science
Publishers; 1978: 395-409.
15. Brenniman GL, Vasilomanolakis-Lagos J, Amrel J, Tsukasa
M, Wolff AH. Case-control of cancer deaths in Illinois com-
munities served by chlorinated or non-chlorinated water. In:
Tolley RL, Brungs WA, Cumming RL, et al., Eds. Water Chlorination:
Environmental Impact and Health Effects. 3rd ed.
Ann Arbor, MI: Ann Arbor Scientific Publishers, 1980:
1043-1057.
16. Howe GR, Hirohata T, Hislop G, et al. Review. Dietary factors
and risk of breast cancer: Combined analysis of 12 case-control
studies. J Natl Cancer Inst 1990; 82: 561-569.
17. Shapiro S. Do trans fatty acids increase the risk of coronary
heart disease? A critique of the epidemiological evidence. Am
J Clin Nutrition. (In press)
18. Hunter DJ, Spiegelman D, Adami H-O, et al. Cohort studies
of fat intake and the risk of breast cancer: A pooled analysis.
N Engl J Med 1996; 334: 356-361.
19. Whittemore AS, Harris R, Itnyre J, et al. Characteristics relating
to ovarian cancer risk: collaborative analysis of 12 US
case-control studies. II. Invasive epithelial ovarian cancers in
white women. Am J Epidemiol 1992; 136: 1184-1203.
20. Harris R, Whittemore AS, Itnyre J, et al. Characteristics relating
to ovarian cancer risk: collaborative analysis of 12 US
case-control studies. III. Epithelial tumors of low malignant
potential in white women. Am J Epidemiol 1992; 136: 1204-1211.
21. Annegers JF, Strom H, Decker DG, et al. Ovarian cancer: Incidence
and case-control study. Cancer 1979; 43: 723-729.
22. Demopoulos RI, Seltzer V, Dubin N, et al. The association
of parity and marital status with the development of ovarian
carcinoma: Clinical implications. Obstet Gynecol 1979; 54:
150-155.
23. Newhouse ML, Pearson RM, Fullerton JM, et al. A case-control
study of carcinoma of the ovary. Br J Prev Soc Med
1977; 31: 148-153.
24. Wynder EL, Dodo H, Barber HRK. Epidemiology of cancer
of the ovary. Cancer 1969; 23: 352-370.
25. Negri E, Franceschi S, Tzonou A, et al. Pooled analysis of 3
European case-control studies: I. Reproductive factors and risk
of epithelial ovarian cancer. Int J Cancer 1991; 49: 50-56.
26. Hartge P, Schiffman MH, Hoover R, et al. A case-control
study of epithelial ovarian cancer. Am J Obstet Gynecol
1989; 161: 10-16.
27. Cramer DW, Hutchison GB, Welch GR, et al. Determinants
of ovarian cancer risk. I. Reproductive experiences and family
history. J Natl Cancer Inst 1983; 71: 711-716.
28. Nasca PC, Greenwald P, Chorost S, et al. An epidemiologic
case-control study of ovarian cancer and reproductive factors.
Am J Epidemiol 1984; 119: 705-713.
29. Shapiro S. Re: "The authors reply" to re: "Characteristics re-
lating to ovarian cancer risk: collaborative analysis of 12 US
case-control studies. II. Invasive epithelial ovarian cancers in
white women." [Letter to the Editor] Am J Epidemiol 1994;
140: 3.
30. Shapiro S. Risk of ovarian cancer after treatment for infertil-
ity. [Letter to the Editor] N Engl J Med 1995; 332: 1301.
31. Whittemore AS. The risk of ovarian cancer after treatment
for infertility. N Engl J Med 1994; 331: 805-806.
32. Posthuma WFM, Westendorp RGJ, Vandenbroucke JP.
Cardioprotective effect of hormone replacement therapy in
postmenopausal women: Is the evidence biased? Br Med J
1994; 308: 1268-1269.
33. Petitti DB. Of babies and bathwater. Am J Epidemiol 1994;
140: 779-782.
34. Chalmers I, Altman DG, Eds. Systematic Reviews. London:
BMJ Publications; 1995.
35. Feinstein AR. Meta-analysis: Statistical alchemy for the 21st
century. J Clin Epidemiol 1995; 48: 71-79.
