Philip Morris
News & Numbers A Guide to Reporting Statistical Claims and Controversies in Health and Other Fields
Fields
- Author
- Cohn, V.
- Mosteller, F.
- Area
- SCIENTIFIC AFFAIRS/BLACK LATERAL OLD S&T
- Type
- PUBL, PUBLICATION, OTHER
- Master ID
- 2023512310/2514
- Related Documents
- 2023512310-2514 Epidemiology and Environmental Tobacco Smoke
- 2023512316-2317 Statistical Significance and Confidence Intervals
- 2023512329-2340 Environmental Tobacco Smoke and Lung Cancer: A Critical Assessment
- 2023512341-2348 What Is the Epidemiologic Evidence for A Passive Smoking - Lung Cancer Association?
- 2023512361-2362
- 2023512364-2440 A Dictionary of Epidemiology
- Document File
- 2023512309/2023512515/Ets Issue Binder: Epidemiology
- Characteristic
- EXTR, EXTRA
- MARG, MARGINALIA
- Litigation
- Okag/Privilege Withdrawn
- Okag/Produced
- Named Organization
- Harvard Univ
- Library of Congress
- Author (Organization)
- Harvard Univ
- Washington Post
- Site
- R529
- Date Loaded
- 24 May 1999
- UCSF Legacy ID
- zjc02a00
Document Images
News &
Numbers
A GUIDE TO REPORTING STATISTICAL CLAIMS AND CONTROVERSIES IN HEALTH
AND OTHER FIELDS
Victor Cohn
SENIOR WRITER AND COLUMNIST, FORMER SCIENCE EDITOR,
Washington Post
FOREWORD BY Frederick Mosteller
ROGER I. LEE PROFESSOR EMERITUS OF MATHEMATICAL STATISTICS,
Harvard University
A Project of the Center for Health Communication
Harvard School of Public Health
IOWA STATE UNIVERSITY PRESS / AMES

© 1989 Victor Cohn. All rights reserved.
Composed by Iowa State University Press
Printed in the United States of America
No part of this book may be reproduced in any form or by any electronic or mechanical
means, including information storage and retrieval systems, without written permission
from the publisher, except for brief passages quoted in a review.
First edition, 1989
Library of Congress Cataloging-in-Publication Data
Cohn, Victor, 1919-
News & numbers.
"A project of the Center for Health Communication, Harvard School of Public
Health."
1. Public health-Statistics. 2. Environmental health-Statistics. 3. Vital
statistics. I. Harvard School of Public Health. Center for Health Communication.
II. Title. III. Title: News and numbers.
RA407.C64 1989 362.1'021 88-6807
ISBN 0-8138-1442-1
ISBN 0-8138-1437-3 (pbk.)
A Note to Readers
THE rules of statistics are the rules of good thinking, codified. They apply to any kind of reporting in which numbers, stated or implied, are involved: political reporting, science reporting, business, economics, sports, or whatever.
This guide is an attempt to explain the role, logic, and
language of statistics, so we reporters can ask better questions
about the many alleged facts or findings that rest, or should rest,
on some credible numbers. Because this manual began as a
project of the Harvard School of Public Health, the reporting of
health and the environment is the major example. But the prin-
ciples and many of the suggested "questions for reporters" can be
used by inquiring reporters in any field. They can help you read
a scientific report or listen to the conflicting claims of politicians,
environmentalists, physicians, scientists, or almost anyone and
weigh and explain them. And the final chapter specifically
shows how these principles apply in all areas.
VICTOR COHN
Contents
FOREWORD BY Frederick Mosteller, ix
ACKNOWLEDGMENTS, xi
1. Facts and Figures: We Can Do Better, 3
2. The Certainty of Uncertainty, 8
3. The Scientific Way, 12
Probability, 14
"Power" and Numbers, 20
Bias and Confounders, 24
Variability, 30
4. Studies, Good and Bad, 35
Experiments versus Seductive Anecdotes, 37
Clinical Trials, 38
What Makes a Study Honest? 40
Epidemiology: Hippocrates to AIDS, 43
5. Questions Reporters Can Ask, 48
6. Tests and Testing, 64
Drugs and Drug Trials, 68
Animals as Models for Us, 72
7. Vital Statistics: The Numbers of Life and Health, 74
Crude Rates versus Rates That Compare, 76
Other Ways to Compare, 78
Reporting Hospital Death Rates, 79
Cancer Rates and Cancer Cures, 86
The Important Questions about Cancer, 88
Shifts, Drifts, and Blips, 96
8. The Statistics of Environment and Risk, 98
Who's Believable? 107
Questions to Ask, 108
Evaluating Environmental Hazards, 116
Advice from Reporters, 121
9. The Statistics of Politics, Economics, and Democracy, 126
The State of the Nation's Statistics, 146
The Bottom Line, 151
WHERE TO LEARN MORE: A Bibliography and Other Sources, 153
NOTES, 157
GLOSSARY/INDEX, 165
Foreword
REPORTERS play an essential role in communicating science to the public. In common with scientists, they desire accuracy. Although health and medicine provide many exciting stories, the biostatistics that scientists must use in their studies presents special problems for reporters. It gives uncommon and misleading meanings to common words like "significant," "consistent," and "power." Mathematical statistics often produces results that are disturbingly counterintuitive, at least at first, to laymen and scientists alike. In vital statistics and epidemiology, definitions often seem arbitrary, and slight changes make considerable differences in the findings.
Science writers often take short courses in special topics such as biostatistics. I have taught in some of these courses and have been impressed by the seriousness of the participants. Nevertheless, they need some of this material in an accessible and permanent form.
Victor Cohn of the Washington Post has prepared this manual to help all reporters cut through these statistical tangles. He wants to give them a guide to the ways that statistics can clarify facts or mystify the reader.
Cohn's book grew out of the Media Project of our Health Science Policy Working Group of the Division of Health Policy Research and Education at Harvard University. I am pleased that faculty members of the Harvard School of Public Health have been able to help him produce this book as a visiting fellow in 1978 and 1984 and as a contributor to the Health Science Policy Working Group.
Through the Media Project, with the help of Jay Winsten, we have also examined sources of pressures on the science writer. In the future we want to use what we have learned through many discussions with science writers to advise scientists on their role in the media.
By such efforts, including this book, and by many similar efforts in this and other fields, scientists and writers may gradually upgrade the whole communication system, scientific and journalistic. Thus we may clear the communication channel between science and the public.
FREDERICK MOSTELLER
Acknowledgments
MY main mentor and guide in the preparation of this book has been Dr. Frederick Mosteller, Roger I. Lee professor emeritus of mathematical statistics and former chairman of the departments of Biostatistics and Health Policy and Management, Harvard School of Public Health. He gave so fully of his time, energy, and knowledge that he should be listed as coauthor but for the fact that I sometimes used a journalist's freewheeling approach rather than a statistician's rigor. This makes any misstatements mine.
The project was supported by the Russell Sage Foundation, and by the Council for the Advancement of Science Writing, which pointed the way in holding seminars on statistics for journalists, including the first of its kind in 1964.
I did much of the work as a visiting fellow at the Harvard School of Public Health, where Dr. Jay Winsten, director of the Center for Health Communication, was another indispensable guide, and Drs. John Bailar III, Nan Laird, Philip Lavin, Thomas A. Louis, and Marvin Zelen were valuable helpers. As were Drs. Gary D. Friedman and Thomas M. Vogt of the Kaiser organizations, Michael Greenberg of Rutgers University, and Peter Montague of Princeton University (on all of whose writings I leaned); Lewis Cope of the Minneapolis Star Tribune; Cass Peterson of the Washington Post; and my daughter, Deborah Runkle, no mean statistician.
I also owe thanks to Harvard's Drs. Peter Braun, Harvey Fineberg, Howard Frazier, Howard Hiatt, William Hsiao, Herb Sherman, and William Stason. And to Drs. Stuart A. Bessler, Syntex Corporation; H. Jack Geiger, City University of New York; Nicole Schupf Geiger, Manhattanville College; Charles Moertel, Mayo Clinic; Arnold Relman, New England Journal of Medicine; Eugene Robin, Stanford University; and Sidney Wolfe, Public Citizen Health Research Group. Also Katherine Wallman, Council of Professional Associations on Federal Statistics; Howard L. Lewis, American Heart Association; Philip Meyer, University of North Carolina; Mildred Spencer Sanes; Earl Ubell, WCBS-TV, New York City; and Philip Hilts, Cristine Russell, and Barry Sussman, Washington Post. I am indebted to my editors at the Washington Post, particularly Abigail Trafford, Ben Cason, Carol Krucoff, Len Downie, and Howard Simons for their understanding and support.
The work was also aided by the Andrew W. Mellon Foundation. The American Cancer Society, American Heart Association, Commonwealth Fund, Gannett Foundation, Henry J. Kaiser Family Foundation, Mayo Medical Resources, Milbank Memorial Fund, Pew Charitable Trusts, Philip L. Graham Fund, Russell Sage Foundation, and John Cowles, Jr., have contributed to this manual's initial distribution.

Facts and Figures:
We Can Do Better
Facts and Figures! Put 'em Down!
-Charles Dickens (in The Chimes)
There are lies, there are damned lies, and there are statistics.
-Disraeli
Almost everyone has heard that "figures don't lie, but liars can figure." We need statistics, but liars give them a bad name, so to be able to tell the liars from the statisticians is crucial.
-Dr. Robert Hooke
WE journalists like to think we deal mainly in facts and ideas, but much of what we report is based on numbers. Politics comes down to votes. Budgets and dollars dominate government. The economy, business, employment, sports: all demand numbers.
The environment, pollutants, toxic chemicals. Again, we see counts and measurements and, most likely, widely varying estimates, some careful, some questionably high or low. An environmentalist says a nuclear power plant or toxic waste dump will cause so many cases of cancer. An industry spokesman denies it. What are their numbers? Where did they get them? How valid are they?
A doctor reports a promising, even exciting new treatment. Is the claim justified or based on a biased or unrepresentative sample? Or too few patients to justify any claim? Science, medicine, technology, the weather, intelligence: all are statistical.
Science is observation, experimentation, measurement, and all
these involve numbers, whether we reporters pay attention to
them or not.
Statistics are used or misused even by people who tell us, "I
don't believe in statistics," then claim that all of us or most people
or many do such and such. The question for reporters is, how
should we not merely repeat such numbers, stated or implied,
but also interpret them to deliver the best possible picture of
reality?
We can be better reporters if we understand how the best statisticians, the best figurers, figure. And if we learn a few questions to help us separate the wheat from the chaff.
I do not say that telling the truth, describing reality, will then become easy, for we are constantly bombarded with sweeping claims in convincing wrappings, and the disputed subjects are endless. Medical and surgical treatments, radiation, pesticides, nuclear power, the probability of environmental disasters, the side effects of medicines: almost nothing seems settled.
Like it or not, we must wade in. Whether we will it or not, we have in effect become part of the regulatory apparatus. Dr. Peter Montague of Princeton University tells us, "The environmental and toxic situation is so complex, we can't possibly have enough officials to monitor it. Reporters help officials decide where to focus their activity."
"Journalists opened up" the Love Canal toxic waste issue by "independent investigation," according to Cornell University's Dr. Dorothy Nelkin. The extensive press coverage contributed to investigations that eventually forced the re-staffing of the Environmental Protection Agency and the creation of a national toxic waste disposal program.
That very coverage, however, may also have stampeded public officials into hasty, ill-conceived studies that left unanswered the crucial question: Did the Love Canal wastes actually cause birth defects and other physical problems?2 The very way we report a medical or environmental controversy can affect the outcome. If we ignore a bad situation, the public may
suffer. If we write "danger," the public may quake. If we write "no danger," the public may be falsely reassured. If we paint an experimental medical treatment too brightly, the public is given false hope.
It is not just what we write, it is what we emphasize. A National Cancer Institute survey indicated that many persons refuse to consider healthy changes in life-style because they think "carcinogens are everywhere in the environment." Such persons probably have read or heard again and again that most cancers are environmentally related, although, in the opinion of most informed scientists, most fatal "environmental" cancers are related mainly to individual behavior, outstandingly smoking, and very possibly diet. By various estimates, perhaps 5 to 15 percent of all cancers are related to exposures to man-made carcinogens: chemicals we have inserted into the workplace, foods, air, and water.
When it comes to such emotionally charged and complex issues, or when it simply comes to running for page one or making the six o'clock news, the best among us sometimes overstate or understate. Philip Meyer, veteran reporter and author of Precision Journalism, writes, "Journalists who misinterpret statistical data usually tend to err in the direction of overinterpretation.... The reason for this professional bias is self-evident; you usually can't write a snappy lead upholding [the negative]. A story purporting to show that apple pie makes you sterile is more interesting than one that says there is no evidence that apple pie changes your life."
We also work fast, sometimes too fast, with severe limits on the space or time we may fill. We find it hard to tell editors or news directors, "I haven't had enough time. I don't have the story yet." Even a long-term project or special may be hurriedly done. In a newsroom "long-term" may mean a few weeks. A major southern newspaper had to print a long, front-page retraction after a series of front-page stories alleged that people who worked at or lived near a plutonium plant suffered in excess numbers from a blood disease. "Our reporters obviously had confused statistics and scientific data," the editor admitted. "We did not ask enough questions."
We tend to oversimplify. We may report "A study showed that black is white" or "So-and-so announced that ...," when a study merely suggested that there was some evidence that such might be the case. We may slight or omit the fact that a scientist calls a result "preliminary." As scientific unsophisticates, we may confuse a study that merely suggests a hypothesis that should be investigated (very frequently the case) with a study that presents strong and conclusive evidence.
We often omit essential perspective, context, or background. Dr. Thomas Vogt of the Kaiser Permanente Center for Health Research tells of seeing the headline "Heart Attacks From Lack of 'C'" and then, two months later, "People Who Take Vitamin C Increase Their Chances of a Heart Attack." Both stories were based on limited, and far from conclusive, animal studies.
Scientists who do poor studies or overstate their results deserve part of the blame. But bad science is no excuse for bad journalism. We tend to rely most on "authorities" who are either most quotable or quickly available or both, and they often tend to be those who get most carried away with their sketchy and unconfirmed but "exciting" data, or have big axes to grind, however lofty their motives. The cautious, unbiased scientist who says, "Our results are inconclusive" or "We don't have enough data yet to make any strong statement" or "I don't know" tends to be omitted or buried someplace down in the story.
We are influenced too by intense and growing competition to tell the story first and tell it most dramatically. I was once asked by a Harvard researcher, "Does competition affect the way you present a story?" I thought and had to answer, "We have to almost overstate. We have to come as close as we can within the boundaries of truth to a dramatic, compelling statement. A weak statement will go no place." Another reporter said, "The fact is, you are going for the strong [lead and story]. And, while not patently absurd, it may not be the lead you would go for a year later."
We reporters are also subject to human hope and human fear. A new "cure" comes along, and we want to believe it. A new alarm is sounded, and we too tremble.
Alarms also make news. We too often obey a sardonic maxim: Bad news is good news; good news is no news. Dr. H. Jack Geiger, a respected former science writer and now a professor of medicine, says,
I know I wrote stories in which I explained or interpreted the results wrongly. I wrote stories that didn't have the disclaimers I should have written. I wrote stories under competitive pressure, when it became clear later that I shouldn't have written them. I wrote stories when I hadn't asked (because I didn't know enough to ask), Was your study capable of getting the answers you wanted? Could it be interpreted to say something else? Did you take into account possible confounding factors?
The Certainty
of Uncertainty
Too much of the science reporting in the press [blurs] what we're sure of and what we're not very sure of and what is inconclusive. The notion of tentativeness tends to drop out of much reporting.
-Dr. Harvey Brooks
The only trouble with a sure thing is the uncertainty.
-Author unknown
THE first thing to understand about science is that it is almost always uncertain. A scientist, seeking to explain or understand something (be it the behavior of an atom or the effect of the toxic chemicals at a Love Canal), usually proposes a hypothesis, then seeks to test it by experiment or observation. If the evidence is strongly supportive, the hypothesis may then become a theory or at some point even a law, like the law of gravity.
A theory may be so solid that it is generally accepted. Example: the theory that cigarette smoking causes lung cancer, for which almost any reasonable person would say the case has been proved, for all practical purposes. The phrase "for all practical purposes" is important, for scientists, being practical people, must often speak at two levels: the strictly scientific level and the level of ordinary reason that we require for daily guidance.
Example: In June 1985, 16 forensic experts examined the bones that were supposedly those of the "Angel of Death," Dr. Josef Mengele. Dr. Lowell Levine, delegated by the Department of Justice, then said, "The skeleton is that of Josef
Mengele within a reasonable scientific certainty," and Dr. Marcos Segre of the University of Sao Paulo explained, "We deal with the law of probabilities. We are scientists and not magicians." Pushed by reporters' questions (after all, this was an important matter, and what should the public believe?), several of the pathologists said they had "absolutely no doubt" of their findings. (Later evidence made the case even stronger.)
But all any scientist can scientifically say, say with certainty in almost any such case, is that there is a very strong probability that such and such is true.
Widely believed theories or conclusions are often proved wholly or partly wrong. "When it comes to almost anything we say," reports Dr. Arnold Relman, editor of the New England Journal of Medicine, "you, the reporter, must realize (and must help the public understand) that we are almost always dealing with an element of uncertainty. Most scientific information is of a probable nature, and we are only talking about probabilities, not certainty. What we are concluding is the best we can do, our best opinion at the moment, and things may be updated in the future."
Example: Until 1980 the American Cancer Society recommended that women have an annual Pap smear to detect cervical cancer. The recommendation was then changed to every three years for many women, after two initial examinations. Statistics had shown that this would be equally effective. The matter is still controversial, and the recommendation has been changed again in the light of new knowledge.
Scientists are often wrong. In science this is not necessarily a failing. When new evidence disproves an old theory, or occasionally shows that some little-believed, even kooky notion is right, the scientific method is doing what it should. It is working.
The public, and even some reporters and especially editors, have a hard time understanding these sometimes drastic revisions. We all hear the question, Why do they say one thing today and another thing tomorrow? I was once on a radio talk show discussing unsettled medical controversies when a testy listener phoned in to exclaim, "'They say' is a damned liar!"
"They," of course, may be different theys who arrive at different conclusions about inconclusive evidence in a thousand areas: the role of fats and cholesterol in the diet, the effects of low-level radioactivity, the cause of the extinction of dinosaurs.
Why so much uncertainty? Science is always a continuing story. Nature is complex, and almost all methods of observation and experiment are imperfect. "There are flaws in all studies," says Harvard's Dr. Marvin Zelen. There may be weaknesses, often unavoidable ones, in the way a study is designed or conducted. Observers are subject to human bias and error. Subjects fluctuate. Measurements fluctuate.
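To see what "subjects fluctuate" means in numbers, here is a minimal sketch, assuming a hypothetical population in which exactly 30 percent of people have some trait. Repeated random samples of 100 people from that one unchanging population give different percentages by chance alone, and the standard textbook formula for the standard error of a proportion, sqrt(p(1 - p)/n), estimates how large that chance wobble typically is. The function names, the 30 percent figure, and the sample sizes are illustrative assumptions, not figures from this book.

```python
import math
import random


def sample_rates(true_rate, sample_size, n_samples, seed=1):
    """Draw several random samples from one hypothetical population and
    return the percentage of each sample that has the trait."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rates = []
    for _ in range(n_samples):
        hits = sum(1 for _ in range(sample_size) if rng.random() < true_rate)
        rates.append(100.0 * hits / sample_size)
    return rates


def standard_error(true_rate, sample_size):
    """Typical size of the chance wobble in a sample proportion."""
    return math.sqrt(true_rate * (1 - true_rate) / sample_size)


if __name__ == "__main__":
    # Same population every time, yet each sample gives a different answer.
    print(sample_rates(0.30, 100, 5))
    # Typical wobble, in percentage points, for samples of 100.
    print(round(100 * standard_error(0.30, 100), 1))
```

Because the standard error shrinks with the square root of the sample size, quadrupling the sample from 100 to 400 halves the typical wobble, one reason larger studies deserve more weight.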
Many studies are thus inconclusive, and virtually no single study proves anything. "Fundamentally," writes Dr. Thomas Vogt, "all scientific investigations require confirmation, and until it is forthcoming all results, no matter how sound they may seem, are preliminary."
Medicine, in particular, is full of disagreement and controversy. "No clinical trial is ever perfect," Harvard's Dr. John Bailar observes. Unlike new drugs, medical treatments and tests and surgical operations need not even be subjected to experimental studies before being applied. "Most treatments escape and will continue to escape rigorous evaluation," Bailar says.
The reasons are many: lack of funds to mount enough trials; lack of enough patients at any one center to mount a meaningful trial; the expense and difficulty of doing multicenter trials; the swift evolution and obsolescence of medical techniques; the fact that, with the best of intentions, medical data (histories, physical examinations, interpretations of tests, descriptions of symptoms and diseases) are notoriously inexact and vary from physician to physician; and the serious ethical obstacles to trying a new procedure when an old one is doing some good, or to experimenting on children, pregnant women, or the mentally ill.
While all studies have flaws, some have more flaws than others. Study after study has found that many articles in the most prestigious medical journals are replete with shaky statistics and lack of any explanation of such crucial matters as patients' complications and the number of patients lost to follow-up. Papers presented at medical meetings, many of them widely reported by the media, are even less reliable. Many papers are mere progress reports on incomplete studies. Some state tentative results that later collapse. Some are given to draw comment or criticism or get others interested in a provocative but still uncertain finding.
The upshot, according to Dr. Gary Friedman of the Kaiser organization's Permanente Medical Group: "Much of health care is based on tenuous evidence and incomplete knowledge.... Seemingly authoritative statements and accepted medical doctrines, perpetuated through textbooks and lectures, often turn out to be supported by the most meager of evidence, if any can be found."
In general, possible risks tend to be underestimated and possible benefits overestimated. For decades surgeons swore that only a radical mastectomy was the treatment for breast cancer. Only recently were clinical trials mounted to show that less drastic treatments seem equally effective. Prefrontal lobotomy, overstrict bed rest, drugs by the carload: medical history is rich in treatments that were given for years without question or statistically rigorous study, only to be proved wrong and discarded.
Occasionally, unscrupulous investigators falsify their results. More often, they may wittingly or unwittingly play down data that contradict their theories, or they may search out statistical methods that give them the results they want. Before ascribing fraud, says Harvard's Dr. Frederick Mosteller, "keep in mind the old saying that most institutions have enough incompetence to explain almost any results."
So some uncertainty almost always prevails. But uncertainty need not stand in the way of good sense. To live (to survive on this globe, to maintain our health, to set public policy, to govern ourselves), we almost always must act on the basis of incomplete or uncertain information. There is a way we can do so.

The Scientific Way
Somehow the wondrous promise of the earth is that there are things beautiful in it, things wondrous and alluring, and by virtue of your trade, you want to understand them.
-Mitchell Feigenbaum
Cornell University physicist and mathematician
The great tragedy of Science: the slaying of a beautiful hypothesis by an ugly fact.
-Thomas Henry Huxley
TO reporters, the world is full of true believers, peddling their "truths." The sincerely misguided and the outright fakers are often highly convincing, also newsy. How can we tell the facts, or the probable facts, from the chaff?
We can borrow from science. We can try to judge all possible claims of fact by the same methods and rules of evidence that scientists use to derive some reasonable guidance in scores of unsettled issues.
As a start, we can ask these questions:
How do you know?
Have the claims been subjected to any studies or experiments?
Were the studies acceptable ones, by general agreement? For example: Were they without any substantial bias?
Have results been fairly consistent from study to study?
Have the findings resulted in a consensus among others in the same field? Do at least the majority of informed persons agree? Or should we withhold judgment until there is more evidence?
Always: Are the conclusions backed by believable statistical evidence?
And what is the degree of certainty or uncertainty? How sure can you be?
Obviously, much of statistics involves attitude or policy rather than numbers. And much, at least much of the statistics that reporters can most readily apply, is good sense.
There are many definitions of statistics as a tool. A few useful ones: The science and art of gathering, analyzing, and interpreting data; a means of deciding whether an effect is real; a way of extracting information from a mass of raw data; a set of mathematical processes derived from probability theory.
Statistics can be manipulated by charlatans, self-deluders, and inexpert statisticians. Deciding on the truth of a matter can be difficult for the best statisticians, and sometimes no decision is possible. Uncertainty will ever rule in some situations and lurk in almost all.
There are rare situations in which no statistics are needed. "Edison had it easy," says Dr. Robert Hooke, a statistician and author. "It doesn't take statistics to see that a light has come on." It did not take statistics to tell 19th-century physicians that Morton's ether anesthesia permitted painless surgery or to tell 20th-century physicians that the first antibiotics cured infections that until then had been highly fatal.
Overwhelmingly, however, the use of statistics, based on probability, is called the soundest method of decision making, and the use of large numbers of cases, statistically analyzed, is called the only means for determining the unknown cause of many events. Birth control pills were tested on several hundred women, yet the pills had to be used for several years by millions before it became unequivocally clear that some women would develop heart attacks or strokes. The pills had to be used for some years more before it became clear that the greatest risk was to women who smoked and women over 35.
The best statisticians, let alone practitioners on the firing line (for example, physicians), often have trouble deciding when a study is adequate or meaningful. Most of us cannot become
14 CHAPTER 3
statisticians, but we can at least learn that there are studies and studies, and the unadorned claim "We made a study" or "We did an experiment" may not mean much. We can learn to ask more pointed questions if we understand some basic concepts and other facts about scientific studies.
These are some bedrock statistical concepts:
Probability
"Power" and numbers
Bias and confounders
Variability
Probability
Scientists cope with uncertainty by measuring probabilities. Since all experimental results and all events can be influenced by chance and almost nothing is 100 percent certain in science and medicine and life, probabilities sensibly describe what has happened and should happen in the future under similar conditions. Aristotle said, "The probable is what usually happens," but he might have added that the improbable happens more often than most of us realize.
The accepted numerical expression of probability in evaluating scientific and medical studies is the P (or probability) value. The P value is one of the most important figures a reporter should look for. It is determined by a statistical formula that takes into account the numbers of subjects or events being compared in order to answer the question: Could a difference or result this great or greater have occurred by chance alone? By more precise definition, the P value expresses the probability that a relationship, effect, or result at least as large as the one observed would turn up by chance if there had actually been no real effect. A low P value means such a chance result would have been unlikely, and so there is less danger that a medical treatment, for example, has been declared beneficial when in truth it was not.
Here is why the P value is used to evaluate results. A
scientific investigator first forms a hypothesis. Then he or she commonly sets out to test it against what is called the null hypothesis: that there is no effect, that nothing will happen. To back the original hypothesis, the results must reject the null hypothesis. The P value, then, is expressed either as an exact number or as <.05, say, or >.05, meaning "less than" or "greater than" a 5 percent probability of seeing such a result if nothing had really happened, that is, if the observed result arose just by chance or, to use a more elegant statistician's phrase, by random variation.
By convention, a P value of .05 or less, meaning there are only 5 or fewer chances in 100 that the result could have happened by chance, is most often regarded as low. This value is usually called statistically significant (though sometimes other values are used). The unadorned term "statistically significant" usually implies that P is .05 or less.
A higher P value, one greater than .05, is usually seen as not statistically significant. The higher the value, the more easily chance alone could explain the result.
In common language, a low chance of chance alone calling the shots replaces the "it's certain" or "close to certain" of ordinary logic. A strong chance that chance could have ruled replaces "it can't be" or "almost certainly can't be."
Why the number .05 or less? Partly for standardization. People have agreed that this is a good cutoff point for most purposes. And partly out of old friend common sense. Frederick Mosteller tells us that if you toss a coin repeatedly in a college class and after each toss ask the class if there is anything suspicious going on, "hands suddenly go up all over the room" after the fifth head or tail in a row. There happens to be only 1 chance in 16 (.0625, not far from .05, or 5 chances in 100) that five heads or tails in a row will show up in five tosses, "so there is some empirical evidence that the rarity of events in the neighborhood of .05 begins to set people's teeth on edge."
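Mosteller's 1-in-16 figure is easy to check by brute force. This minimal Python sketch, assuming only a fair coin, lists all 32 equally likely sequences of five tosses and counts the two that are all heads or all tails.

```python
from fractions import Fraction
from itertools import product


def prob_all_same(n_tosses: int) -> Fraction:
    """Exact chance that n fair-coin tosses all land the same way (all heads or all tails)."""
    outcomes = list(product("HT", repeat=n_tosses))  # every equally likely sequence
    all_same = sum(1 for seq in outcomes if len(set(seq)) == 1)
    return Fraction(all_same, len(outcomes))


p = prob_all_same(5)
print(p, float(p))  # 1/16 0.0625
```

Two favorable sequences out of 32 gives 1/16; a run of six would be 1/32, already well below .05.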
Another common way of reporting probability is to calculate a confidence level as well as a confidence interval (or confidence
limits or range). This is what happens when a political pollster reports that candidate X would now get 50 percent of the vote and thereby lead candidate Y by 3 percentage points, "with a 3-percentage-point margin of error plus or minus and a 95 percent confidence level." In other words, Mr. or Ms. Pollster is 95 percent confident that X's share of the vote would be someplace between 47 and 53 percent. Similarly, candidate Y's share might be 3 percentage points greater (or less) than the figure predicted. In a close election, that margin of error could obviously turn a predicted defeat into victory. And that sometimes happens.
An important point in looking at the results of political polls (and any other statements of confidence): In the reports we read, the "plus or minus 3 (or whatever) percentage points" is often omitted, and the pollster merely mentions a "3-point margin of error." This means there is actually a 6-point range within which the truth probably lurks.
The more people who are questioned in a political poll or the larger the number of subjects in a medical study, the greater the chance of a high confidence level and a narrow, and therefore more reassuring, confidence interval.
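The pollster's arithmetic can be sketched with the usual normal-approximation formula for a proportion, z times the square root of p(1 - p)/n, with z of about 1.96 for 95 percent confidence. This sketch assumes a simple random sample, which real polls only approximate, and uses p = 0.5, the worst case. It suggests why roughly 1,000 respondents yield the familiar 3-point margin, and why quadrupling the sample only halves it.

```python
import math


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion p estimated from a simple random sample of n."""
    return z * math.sqrt(p * (1 - p) / n)


for n in (100, 400, 1_000, 4_000):
    moe = margin_of_error(n)
    print(f"n = {n:>5}: plus or minus {100 * moe:.1f} points (a {200 * moe:.1f}-point range)")
```

The "range" column is the full spread the text warns about: twice the quoted plus-or-minus figure.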
No matter how reassuring they sound, P values and confi-
dence statements cannot be taken as gospel, for .05 is not a
guarantee, just a number. There are several important reasons
for this.
All that P values measure is the probability that the results might have been produced by some sneaky random process. In 20 results where only chance is at work, 1, on the average, will have a reassuring-sounding but misleading P value of <.05. One, in short, may be a false positive.
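A quick simulation makes the 1-in-20 point concrete. In this sketch each hypothetical "study" compares two groups drawn from the same distribution, so there is no real effect, and a simple z-test (assuming a known standard deviation of 1) supplies the P value. About 5 percent of the studies still come out "significant."

```python
import math
import random


def two_sample_p_value(a, b, sigma=1.0):
    """Two-sided z-test P value for a difference in means, assuming a known SD `sigma`."""
    se = sigma * math.sqrt(1 / len(a) + 1 / len(b))
    z = (sum(a) / len(a) - sum(b) / len(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


random.seed(1)
n_studies, n_per_group = 10_000, 30
false_positives = 0
for _ in range(n_studies):
    # Both groups come from the SAME distribution, so any "effect" is pure chance.
    treated = [random.gauss(0, 1) for _ in range(n_per_group)]
    control = [random.gauss(0, 1) for _ in range(n_per_group)]
    if two_sample_p_value(treated, control) < 0.05:
        false_positives += 1

rate = false_positives / n_studies
print(f"{false_positives} of {n_studies} no-effect studies had P < .05 ({rate:.1%})")
```

At that rate, 1,000 trials of treatments with no benefit would be expected to yield about 50 false positives.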
Dr. Marvin Zelen points out that there may be 6,000 to 10,000 clinical (medical) trials of cancer treatment under way today, and if the conventional value of .05 is adopted as the upper permissible limit for false positives, then every 100 studies with no actual benefit may, on average, produce 5 false-positive results. Hence, we may expect 50 false-positive results, on
average, for every 1,000 trials with no beneficial effects. Zelen in fact has said, "We may now have reached an impasse in cancer chemotherapy in which there are large numbers of false-positive therapies in the clinic," leading physicians down many false paths.
Amazingly, most false positives probably remain undetected. Scientists do not profit much professionally by reporting negative results. Journal editors are not keen on publishing them. Nor are scientists keen on doing costly and time-consuming studies that merely confirm someone else's work, so "confirmatory studies are rare," Zelen reports.
Statistical significance alone does not mean there is a cause and effect. Correlation or association is not causation. Remember the rooster who thought his crowing made the sun rise? Unless an association is so powerful and so constantly repeated that the case is overwhelming, association is only a clue, meaning more study or confirmation is needed.
To statisticians, incidentally, there is this important difference between correlation and association: Association means there is at least a possible relation between two variables. A correlation is a measure of the association.
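One common way to put a number on an association is the Pearson correlation coefficient, r, which runs from -1 to +1 and measures how closely two variables follow a straight-line relationship. The sketch below computes it from scratch; the rooster data are invented for illustration. A perfect r of 1 says nothing about whether the crowing causes the sunrise.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient: a measure of linear association between xs and ys."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Invented data: minutes after 5 a.m. at which the sun rose over a week,
# and the rooster, reliably, crowed two minutes earlier each day.
sunrise = [52, 50, 47, 45, 43, 40, 38]
crowing = [50, 48, 45, 43, 41, 38, 36]
print(round(pearson_r(crowing, sunrise), 3))  # 1.0
```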
If the number of subjects is too small, an unimpressive P value may simply mean that there were too few subjects to detect something that might have shown an effect in more subjects. Highly "significant" P values can sometimes adorn negligible differences in large samples.
An impressive P value might also be explained by some other variable or variables (other conditions or associations) not taken into account.
Statistical significance does not mean biological, clinical (that is, medical), or practical significance, though inexperienced reporters sometimes see or hear the word "significant" and jump to that conclusion, even reporting that the scientists called their study "significant." Example: A tiny difference between two large groups in mean hemoglobin concentration, or
red blood count (say, 0.1 g/100 mL, or a tenth of a gram per 100 milliliters), may be statistically significant yet medically meaningless.
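To see how sample size alone can make a trivial difference "significant," this sketch holds the difference at the text's 0.1 g/100 mL and varies only the group size. The person-to-person standard deviation of 1.0 g/100 mL is an assumed round figure, and the test is a simple normal-approximation z-test. The P value shrinks from unremarkable to tiny while the medical importance of the difference does not change at all.

```python
import math


def z_test_p(diff: float, sd: float, n_per_group: int) -> float:
    """Two-sided P value for a difference in means between two equal-sized groups (z-test)."""
    se = sd * math.sqrt(2 / n_per_group)
    return math.erfc(abs(diff) / se / math.sqrt(2))


diff = 0.1  # g/100 mL: the "tiny" difference from the text
sd = 1.0    # ASSUMED person-to-person SD of hemoglobin, g/100 mL
for n in (50, 500, 5_000):
    print(f"{n:>5} per group: P = {z_test_p(diff, sd, n):.2g}")
```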
Eager scientists can consciously or unconsciously manipulate the P value by failing to adjust for other factors, by choosing to compare different end points in a study (say, condition on leaving the hospital rather than length of survival), or by choosing the way the P value is calculated or reported.
There are several mathematical paths to a P value, such as the chi-square (χ²), t, F, r, and paired t tests. All may be legitimate. But be warned: Dr. David Salsburg of Pfizer, Inc., has written in the American Statistician of the unscrupulous practitioner who "engages in a ritual known as 'hunting for P values'" and finds ways to modify the original data to "produce a rich collection of small P values" even if those that result from simply comparing two treatments "never reach the magical .05."
"If you look hard enough through your data," contributes an investigator at a major medical center, "if you do enough subset analyses, if you go through 20 subsets, you can find one" (say, "the effect of chemotherapy on premenopausal women with two to five lymph nodes") "with a P value less than .05. And people do this."
"Statistical tests provide a basis for probability statements," writes Dr. John Bailar, "only when the hypothesis is fully developed before the data are examined.... If even the briefest glance at a study's results moves the investigator to consider a hypothesis not formulated before the study was started, that glance destroys the probability value of the evidence at hand." (At the same time, Bailar adds, "review of data for unexpected clues ... can be an immensely fruitful source of ideas" for new hypotheses "that can be tested in the correct way." And occasionally "findings may be so striking that independent confirmation ... is superfluous.")
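The investigator's "20 subsets" remark follows from simple arithmetic. If each subset analysis were an independent test with no real effect behind it, the chance that at least one reaches P < .05 is 1 - 0.95^k for k tests. Real subsets overlap, so they are not truly independent, and this sketch is only a rough guide.

```python
def chance_of_false_hit(k_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of k independent no-effect tests gives P < alpha."""
    return 1 - (1 - alpha) ** k_tests


for k in (1, 5, 20):
    print(f"{k:>2} analyses: {chance_of_false_hit(k):.0%} chance of at least one P < .05")

# Bonferroni correction: test each of the 20 at .05 / 20 instead of .05
print(f"20 analyses, Bonferroni: {chance_of_false_hit(20, 0.05 / 20):.1%}")
```

With 20 looks the chance of at least one spurious "hit" is about 64 percent. One common (if conservative) remedy, the Bonferroni correction, divides .05 by the number of tests, which brings the overall chance back under 5 percent.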
A rather sophisticated (and possibly touchy) line of questioning that some reporters might want to try if they're skeptical: How did you arrive at your P value? Did you use the test planned in
advance in your protocol or study design, or did you apply several tests, then report the best-sounding one?
And you may think of other questions.
The laws of probability also teach us to expect some unusual, even impossible-sounding events.
We've all taken a trip to New York or London or someplace and bumped into someone from home. The chance of that? I don't know, but if you and I tossed for a drink every day after work, the chance that I would win any given 10 tosses in a row is 1 in 1,024. Yet I would probably do so sometime in a four- or five-year period. What I like to call the Law of Unusual Events (statisticians call it the Law of Small Probabilities) tells us that a few people with apparently fatal illnesses will inexplicably recover, there will be some amazing clusters of cases of cancer or birth defects that will have no common cause, and I may once in a great while bump into a friend far from home.
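The drink-toss claim can be checked with a short dynamic-programming calculation that tracks the length of my current winning streak. The assumptions here are a fair toss (I win half the time) and one toss every day of the year. A single given run of 10 tosses is all wins with probability 1 in 1,024, but over four or five years of daily tosses the chance that such a streak turns up somewhere works out to roughly even odds or a bit better.

```python
def prob_win_streak(n_tosses: int, streak: int, p_win: float = 0.5) -> float:
    """Probability of at least one run of `streak` consecutive wins somewhere in n_tosses."""
    # state[k] = probability that the current winning run has length k and no full streak yet
    state = [0.0] * streak
    state[0] = 1.0
    done = 0.0
    for _ in range(n_tosses):
        nxt = [0.0] * streak
        for k, pr in enumerate(state):
            nxt[0] += pr * (1 - p_win)      # a loss resets the run
            if k + 1 == streak:
                done += pr * p_win          # this win completes the streak
            else:
                nxt[k + 1] += pr * p_win    # a win extends the run
        state = nxt
    return done


print(prob_win_streak(10, 10))  # 0.0009765625, i.e. 1 in 1,024
for years in (4, 5):
    print(years, "years of daily tosses:", round(prob_win_streak(365 * years, 10), 2))
```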
In a large enough population such coincidences are not unusual. They are the rule. They produce striking anecdotes and often striking news stories. In the medical world they produce unreliable, though often cited, testimonial or anecdotal evidence. "The world is large," Vogt notes, "and one can find a large number of people to whom the most bizarre events have occurred. They all have personal explanations. The vast majority are wrong."
"We [reporters] are overly susceptible to anecdotal evidence," Philip Meyer writes. "Anecdotes make good reading, and we are right to use them.... But we often forget to remind our readers (and ourselves) of the folly of generalizing from a few interesting cases.... The statistic is hard to remember. The success stories are not."
A statistic to ask about is the denominator: the number of people or, a statistician would say, the population or domain in whom such an event might happen. Zelen cites this example:
The chance of any youngster between ages five and nine developing leukemia is 3 in 100,000 per year. In a school with 1,000
children of this age group, we would expect only 3 cases in 100 years. But in this nation with thousands of schools, we would occasionally (such is chance) find schools with 3 or more cases in a single year. "Then one is faced with the problem of interpretation," Zelen says. "Is this one of those rare events that is surely going to be observed? Or is it due to some causal factor?"
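Zelen's point can be sketched with the Poisson distribution, a standard model for counts of rare, independent events. A school of 1,000 children is what makes the "3 cases in 100 years" figure work (1,000 × 3/100,000 = 0.03 cases a year). The figure of 10,000 schools is a hypothetical stand-in for "thousands of schools," and a real cluster analysis would also have to allow for differing school sizes and for how many years are scanned.

```python
import math


def poisson_at_least(k: int, mean: float) -> float:
    """P(X >= k) for a Poisson count with the given mean."""
    return 1 - sum(math.exp(-mean) * mean ** i / math.factorial(i) for i in range(k))


rate = 3 / 100_000   # leukemia cases per child per year, from the text
children = 1_000     # school size that yields "3 cases in 100 years"
expected_per_year = rate * children
p_cluster = poisson_at_least(3, expected_per_year)

n_schools = 10_000   # HYPOTHETICAL stand-in for "thousands of schools"
print(f"expected cases per school-year: {expected_per_year:.3f}")
print(f"chance one school sees 3+ cases in a year: {p_cluster:.1e}")
print(f"expected such schools nationwide per year: {p_cluster * n_schools:.3f}")
```

Under these assumptions one school has only about a 4-in-a-million chance of 3 or more cases in a year, yet across 10,000 schools such a "cluster" would be expected from chance alone roughly once every couple of decades.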
A reporter in this instance might ask a statistician at the National Cancer Institute or a medical center: What is the chance of such an event in such a population? How many similar unusual events are probably never reported?
"Power" and Numbers
This gets us to another statistical concept: power.* Statistically, "power" means the probability of finding something if it's there. Example: Given that there is a true effect, say a difference between two medical treatments or an increase in cancer caused by a toxin in a group of workers, how likely are we to find it?
Sample size confers power. Statisticians say, "Funny things can happen in small samples without meaning very much" ... "There is no probability until the sample size is there" ... "Large numbers confer power" ... "Large numbers at least make us sit up and take notice."
All this concern about sample size can also be expressed as the law of large numbers, which says that as the number of cases increases, the probable truth of a conclusion or forecast increases. The validity (truth or accuracy) and reliability (reproducibility) of the statistics begin to converge on the truth.
We already learned this when we talked about probability.
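A simulated coin shows the law of large numbers at work. This sketch tosses a fair virtual coin 100,000 times and reports the share of heads at a few checkpoints; the exact early figures depend on the random seed chosen, but the share drifts ever closer to one-half as the tosses pile up.

```python
import random

random.seed(7)
checkpoints = (10, 100, 1_000, 10_000, 100_000)
heads = 0
for n in range(1, checkpoints[-1] + 1):
    heads += random.random() < 0.5  # one fair toss; True counts as 1
    if n in checkpoints:
        print(f"{n:>7} tosses: {heads / n:.3f} heads")

final_share = heads / checkpoints[-1]
```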
*There is another, unrelated use of the word "power." Scientists commonly speak of increasing or "raising" some quantity by a power of 2 or 3 or 100 or whatever. "Power" here means the product you get when you multiply a number by itself one or more times. Thus, in 2 × 2 = 4, 4 is the second power of 2, or to put it another way, there are two 2's in your equation. This is commonly written 2² and known as 2 to the second power or just 2 to the second. In 2 × 2 × 2 = 8, 2 has been raised to the third power. When you think about 2¹⁰, you see the need for the shorthand.
But by thinking of power as statisticians do (as a function of both sample size and the accuracy of measurement, since that too affects the probability of finding something), we can see that if the number of treated patients is small in a medical study, a shift from success to failure in only a few patients could dramatically decrease the success rate.
If six patients have been treated with a 50 percent success rate, the shift to the failure column of just one would cut the success rate to 33 percent. And the total number is so small in any case that the result has little reliability. The result might be valid or accurate, but it would not be generalizable; it would not have reliability until confirmed by careful studies in larger samples. The larger the sample, and assuming there have been no fatal biases or other flaws, the more confidence a statistician would have in the result.
One canny science reporter, Lewis Cope, says,
I have my own "rule of two." If someone makes some numerical claim, I look at the numbers, then see how much I might change the finding by adding or subtracting two from any of the figures. For example, someone says there are five cases of cancer in a community: Would it seem meaningful if there were three?
Or if there were eight cases this year but four the year before (a 100 percent increase), I ask myself, "If I add two cases to last year's total and subtract two from this year's, is there a chance things haven't changed, except by chance?" This approach will never supplant refined analysis. But by playing around with the numbers this way (I sometimes try three instead of two), a reporter can often spot a potential problem or error.
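Cope's check is simple enough to write down. This sketch is one possible translation of his "rule of two" into code; the function name and the choice to nudge both counts toward each other at once are assumptions made for illustration. It reports the apparent percentage change and what is left of the gap after the most skeptical nudge.

```python
def rule_of_two(last_year: int, this_year: int, nudge: int = 2):
    """Apparent change, and the gap left after nudging each count `nudge` toward the other."""
    reported = (this_year - last_year) / last_year
    smaller, larger = sorted((last_year, this_year))
    gap_after_nudge = (larger - nudge) - (smaller + nudge)  # most skeptical reading
    return reported, gap_after_nudge


reported, gap = rule_of_two(4, 8)
print(f"reported change: {reported:+.0%}; gap left after the nudge: {gap}")
print(rule_of_two(400, 800))  # (1.0, 396): a big count barely notices the nudge
```

For the 4-versus-8 example the reported doubling disappears entirely, while a jump from 400 to 800 survives almost untouched. Trying three instead of two is just `nudge=3`.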
A statistician says, "This can help with small numbers but not large ones." Mosteller contributes "a little trick I use a lot on counts of any size." He explains, "Let's say some political unit has 10,000 crimes or deaths or accidents this year. Has something new happened? The minimum standard deviation [see
page 33] for a number like that is 100, that is, the square root of the original number. That means the number may vary by a minimum of 200 every year without even considering growth, the business cycle, or any other effect. This will supplement your reporter's approach."
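Mosteller's trick rests on a standard property of counts of independent rare events (the Poisson model): the variance roughly equals the count, so the standard deviation is about its square root. This sketch reads his 200-event swing as two standard deviations; that reading is an interpretation for illustration, not his stated method.

```python
import math


def count_noise(count: int):
    """Rough chance variation in a count, assuming Poisson behavior (variance = count)."""
    sd = math.sqrt(count)
    return sd, 2 * sd  # one SD, and a two-SD band for ordinary year-to-year swings


for count in (25, 400, 10_000):
    sd, band = count_noise(count)
    print(f"{count:>6} events: SD about {sd:.0f}; swings of {band:.0f} ({band / count:.0%}) are unremarkable")
```

In relative terms the noise shrinks as counts grow: a 40 percent swing in 25 events can be chance, while for 10,000 events the comparable figure is about 2 percent.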
Looking for error in reported results, statisticians try to spot both false positives and false negatives. The false positive (or Type I or alpha error, in the statistical language you may see) is to find a result or effect where there is none. The false negative (or Type II or beta error) is to miss an effect where there is one. The latter is particularly common when there are small numbers. "There are some very well conducted studies with small numbers, even five patients, in which the results are so clear-cut that you don't have to worry about power," says Dr. Relman. "You still have to worry about applicability to a larger population, but you don't have to doubt that there was an effect. When results are negative, however, you have to ask, How large would the effect have to be to be discovered?"
Many scientific and medical studies are underpowered; that is, they include too few cases. "Whenever you see a negative result," another scientist says, "you should ask, What is the power? What was the chance of finding the result if there was one?" One study found that an astonishing 70 percent of 71 well-regarded clinical trials that reported no effect had too few patients to show a 25 percent difference in outcome. Half of the trials could not have detected a 50 percent difference.
A statistician scanned an article on colon cancer in a leading journal. "If you read the article carefully," he said, "you will see that if one treatment was better than the other (if it would increase median survival by 50 percent, from five to seven and a half years, say), they had only a 60 percent chance of finding it out. That's little better than tossing a coin!"
The weak power of that study would be expressed numerically as .6, or 60 percent. Scan an article's fine print or footnotes, and you will sometimes find such a power statement. Most
authors still don't report one, but the practice is growing, especially when results are negative.
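Power can be estimated before a study starts. This sketch uses a common normal-approximation formula for comparing two success rates; the 40 percent and 60 percent success rates (a 50 percent relative improvement) are hypothetical, chosen only to show how power climbs with the number of patients. Dedicated power software or a statistician would refine such figures.

```python
from statistics import NormalDist


def power_two_proportions(p1: float, p2: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided test comparing two success rates (normal approximation)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    se_null = (2 * p_bar * (1 - p_bar) / n_per_group) ** 0.5         # spread if no real difference
    se_alt = ((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group) ** 0.5  # spread if the difference is real
    diff = abs(p1 - p2)
    # Probability the observed difference lands beyond either critical boundary.
    return (std_normal.cdf((diff - z_crit * se_null) / se_alt)
            + std_normal.cdf((-diff - z_crit * se_null) / se_alt))


# Hypothetical trial: 40% success on the standard treatment vs 60% on the new one.
for n in (25, 50, 100, 200):
    print(f"{n:>3} patients per group: power = {power_two_proportions(0.40, 0.60, n):.2f}")
```

Under these made-up rates, 25 patients per group (50 in all) give less than a 30 percent chance of detecting the improvement, while about 200 per group push power above 95 percent.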
How large is a large enough sample? One statistician calculated that a trial has to have 50 patients before there is even a 30 percent chance of finding a 50 percent difference in results.
Sometimes large populations indeed are needed.¹⁰ If some kind of cancer usually strikes 3 people per 2,000, and you suspect that the rate is quadrupled in people exposed to substance X, you would have to study 4,000 people for the observed excess rate to have a 95 percent chance of reaching statistical significance. The likelihood that a 30-to-39-year-old woman will suffer a myocardial infarction, or heart attack, while taking an oral contraceptive is about 1 in 18,000 per year. To be 95 percent sure of observing at least one such event in a one-year trial, you would have to observe nearly 54,000 women.
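The oral-contraceptive figure follows from one line of algebra. Assuming each woman independently has a 1-in-18,000 chance of a heart attack in a year, the chance that none of n women has one is (1 - 1/18,000)^n; setting that equal to 5 percent and solving for n gives the required number.

```python
import math


def n_to_see_one_event(annual_risk: float, certainty: float = 0.95) -> int:
    """People to follow for one year so that P(at least one event) >= certainty.

    Assumes each person independently has the same annual risk.
    """
    # P(no events among n people) = (1 - risk) ** n; solve (1 - risk) ** n <= 1 - certainty.
    return math.ceil(math.log(1 - certainty) / math.log1p(-annual_risk))


print(n_to_see_one_event(1 / 18_000))
```

It comes out at about 53,900, the text's "nearly 54,000." Because -ln 0.05 is almost exactly 3, the answer is close to 3 × 18,000.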
Even the lack of an effect (statistically sometimes called a zero numerator) can be a trap. Say someone reports, "We have treated 14 leukemic boys for five years with no resulting testicular dysfunction," that is, zero abnormalities in 14. The question remains, how many cases would they have had to treat to have any real chance of seeing an effect? The probability of an effect may be small yet highly important to know about.
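One standard way to answer "how many cases would they have had to treat?" is to turn it around and ask how large the true rate could be and still plausibly produce zero cases. This sketch computes the exact one-sided 95 percent upper limit, 1 - 0.05^(1/n), and compares it with the "rule of three" shortcut, 3/n. It assumes the 14 boys are independent and equally at risk.

```python
def upper_bound_zero_events(n: int, confidence: float = 0.95) -> float:
    """Exact one-sided upper confidence limit on a rate when 0 events are seen in n subjects."""
    # Largest rate r for which seeing zero events still has probability >= 1 - confidence:
    # (1 - r) ** n = 1 - confidence  =>  r = 1 - (1 - confidence) ** (1 / n)
    return 1 - (1 - confidence) ** (1 / n)


exact = upper_bound_zero_events(14)
print(f"0 of 14: true rate could still be as high as {exact:.0%} (rule of three: {3 / 14:.0%})")
```

Zero abnormalities in 14 boys is still consistent with a true rate of roughly 1 in 5.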
All this means you must often ask, What's your denominator? What's the size of your population?* A disease rate of 10 percent in 20 individuals may not mean much. A 10 percent rate in 200 persons would be more impressive. A rate is only a figure. Always try to get both the numerator and the denominator.
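The difference between 10 percent of 20 and 10 percent of 200 shows up in the width of a confidence interval. This sketch uses the Wilson score interval, one standard choice for proportions that behaves better than the simplest formula when counts are small.

```python
import math


def wilson_interval(events: int, n: int, z: float = 1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p = events / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half


for events, n in ((2, 20), (20, 200)):
    lo, hi = wilson_interval(events, n)
    print(f"{events}/{n} = 10%: plausible range {lo:.0%} to {hi:.0%}")
```

Two cases out of 20 are consistent with a true rate anywhere from about 3 to 30 percent; 20 out of 200 pin it down to roughly 7 to 15 percent.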
The most important rule of all about any numbers: Ask for them. When anyone makes an assertion that should include numbers and fails to give them, when anyone says that most people, or even X percent, do such and such, you should ask,
*And know that to a statistician a population does not necessarily mean a group of people. A population is any group or collection of pertinent units, units with one or more pertinent characteristics in common: people, events, objects, records, or physiological values (like blood pressure readings). Statisticians also use the term universe for a whole group of people or units under study.
What are your numbers? After all, some researchers reportedly announced a new treatment for a disease of chickens by saying, "33.3 percent were cured, 33.3 percent died, and the other one got away."
Bias and Confounders
One scientist once said that lefties are overrepresented among baseball's heavy hitters. He saw this as "a possible result of their hemispheric lateralization, the relative roles of the two sides of the brain." A critic who had seen more ball games said some simpler covariables could explain the difference. When they swing, left-handed hitters are already on the move toward first base. And most pitchers are right-handers who throw most often to right-handed hitters.
Scientist A was apparently guilty of bias, meaning the introduction of spurious associations and error by failing to consider other influential factors. The other factors may be called covariables, covariates, intervening or contributing variables, confounding variables, or confounders. A simpler term may be "other explanations."
Statisticians call bias "the most serious and pervasive problem in the interpretation of data from clinical trials" ... "the central issue of epidemiological research" ... "the most common cause of unreliable data." Able and conscientious scientists try to eliminate biases or account for them in some way. But not everybody who makes a scientific, medical, or environmental claim is that skilled. Or that honest. Or that all-powerful. Some biases are unavoidable because of the very difficulty of much research, and the most insidious biases of all, says one statistician, are "those we don't know exist."
Some biases may be uncovered by assiduous investigation. A father noticed that every time one of his 11 kids dropped a piece of bread on the floor, it landed with the buttered side up. "This utterly defies the laws of chance," he exclaimed. Close examination disclosed the cause: The kids were buttering their bread on both sides.
I told this story to one statistician, who said, "I was once called about a person who had won first, second, and third prizes in a church lottery. I was asked to assess the probability that this could have happened. I found out that the winner had bought nearly all the tickets."
He had of course asked the obvious question for both scientist and reporters: Could the relationship described be explained by other factors?
Not everyone will tell you, of course, for bias is a pervasive human failing. As one candid scientist is said to have admitted, "I wouldn't have seen it if I hadn't believed it." Enthusiastic investigators often tell us their findings are exciting. But they may be so exciting that the investigators paint the results in over-rosy hues.
Other powerful human drives (the race for academic promotion and prestige, financial connections) can also create conscious or unconscious conflicts of interest or attitudes that feed bias. Dr. Thomas Chalmers of Mount Sinai Medical Center in New York tells of a drug trial financed by a pharmaceutical firm, in which both the head of the study committee and the main statisticians and analysts were the firm's employees, though not so identified in any credits. He tells of a study of oral drugs for diabetes in which the fact that the first author had previously published 14 articles on the subject, and in 7 had acknowledged support by the drug manufacturers, was "not known to the reader."
In contrast, Chalmers describes a study also financed by a drug firm but with a contract specifying a study protocol designed by independent investigators and monitored by an outside board less likely to be influenced by a desire for a favorable outcome. "It is never possible to eliminate" potential conflicts of interest in biomedical research, he concludes, but they should be disclosed so others can evaluate them.
Even a genius may be biased! Horace Freeland Judson of Johns Hopkins University tells how Isaac Newton experimented with prisms and lenses and developed a theory of color, light,
and the solar spectrum. He did not report seeing some dark lines (absorption lines, which mark varying wavelengths) that his instruments must have shown. A modern scientist argues that Newton's theory, not his instruments, had no place for that evidence: "To the observing scientist, hypothesis is both friend and enemy."
For years technicians making blood counts were guided by textbooks that told them two or more "properly" studied samples from the same blood should not vary beyond narrow "allowable" limits. Reported counts always stayed inside those limits. A Mayo Clinic statistician rechecked and found that at least two thirds of the time the discrepancies exceeded the supposed limits. The technicians had been seeing what they had been told to expect and discounting any differences as mistakes. This also saved them from the additional labor of doing still more counting.
Both the biased observer and the biased subject are common in medicine. A researcher who wants to see a treatment result may see one. A patient may report one out of eagerness to please the researcher. There is also the powerful placebo effect. Summarizing many studies, one scientist found that half the patients with headaches or seasickness, and a third of those suffering from coughs, mood changes, anxiety, the common cold, and even the disabling chest pains of angina pectoris, reported relief from a "nothing pill." A placebo is not truly a nothing pill; the mere expectation of relief seems to trigger important effects within the body. But in a careful study the placebo should not do as well as a test medication; otherwise the test medication is no better than a placebo.
Sampling bias is the bugaboo of both political polls and medical studies. Say you want to know what proportion of the populace has heart disease, so you stand on a corner and ask people as they pass. Your sample is biased, if only because it leaves out those too disabled to get around. Your problem, a statistician would say, is selection. A political pollster who fails to build a valid probability sample, a failure easy to make when questioning only a thousand or
so people from coast to coast, has equally poor selection.
A doctor in a clinic or hospital with an unrepresentative patient population (healthier or sicker or richer or poorer than average) may report results that do not represent the population as a whole. Veterans Administration hospitals, for example, treat relatively few women; their conclusions may apply only to the disproportionate number of lower-income men who typically seek out the VA hospitals' free care. A celebrated Mayo or Cleveland or Ochsner clinic sees both a disproportionate number of difficult cases and a disproportionate number of patients affluent and well enough to travel. The famed Kinsey reports were valuable revelations of sexual behavior but flawed because the samples consisted disproportionately of upper-middle-class men and women and of those willing to talk.
An investigator may also introduce bias by constraining, or distorting, a sample: by failing to reveal nonresponse or by otherwise "throwing away data." A surgeon cites his success rate in those discharged from the hospital after an operation but omits those who died during or just after the procedure. Many people drop out of studies (sometimes they just quit) or they are dropped for various reasons: They could not be evaluated, they came down with some "irrelevant" disorders, they moved away, they died. In fact, many of those not counted may have had unfavorable outcomes had they stayed in the study.
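The arithmetic behind a surgeon's flattering success rate is simple enough to sketch. The short Python example below uses invented counts (100 operations, 20 in-hospital deaths, 64 good outcomes among the 80 patients discharged); the function name and the numbers are illustrative assumptions, not data from any study.

```python
def success_rates(operated, died_in_hospital, good_outcomes):
    """Return (rate among survivors only, rate among everyone operated on)."""
    discharged = operated - died_in_hospital
    reported = good_outcomes / discharged  # deaths quietly left out of the denominator
    honest = good_outcomes / operated      # every patient who had the operation counts
    return reported, honest


# Hypothetical counts, for illustration only.
reported, honest = success_rates(operated=100, died_in_hospital=20, good_outcomes=64)
print(f"survivors only: {reported:.0%}; all patients: {honest:.0%}")
```

Dropping the 20 deaths from the denominator turns a 64 percent success rate into an apparent 80 percent.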
Mosteller tells of a nationwide study of a possibly dangerous anesthetic. The investigators relied on autopsy results at 38 hospitals. Unfortunately, only about 60 percent of the relevant dead had been autopsied, and "anything could have been explained by the missing 40 percent, so that part of the study wound up with a handful of nothing."
The presence of significant nonresponse can often be detected, when reading medical papers, by counting the number of patients treated versus the number of untreated or differently treated controls, the patients with whom the treated patients are compared. If the number of controls is strikingly greater in a randomized clinical trial (though not necessarily in an epidemiological or environmental study), there were probably many dropouts. A well-conducted study should describe and account for them. A study that does not may report a favorable treatment result by ignoring the fate of the dropouts, a confounding variable.
Age, gender, occupation, nationality, race, income, socioeconomic status, health status, and powerful behaviors like smoking are all possible confounding, and frequently ignored, variables. In the 1970s, foes of adding fluoride to city water pointed to crude cancer mortality rates in two groups of 10 U.S. cities. One group had added fluoride to water, the other had not, and from 1950 to 1970 the cancer mortality rate rose faster in the fluoridated cities. The National Cancer Institute pointed out that the two groups were not equal: The difference in cancer deaths was almost entirely explained by differences in age, race, and sex. The age-, race-, and sex-adjusted difference actually showed a small, unexplained lower mortality rate in the fluoridated cities.
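Adjusting for a confounder such as age can be sketched with direct standardization: apply each group's age-specific death rates to one shared standard population, so both groups are compared as if they had the same age mix. The Python sketch below uses invented numbers for two hypothetical groups of cities and adjusts for age only; the National Cancer Institute's analysis also adjusted for race and sex, and its actual method and figures are not reproduced here.

```python
def rate_per_100k(deaths, population):
    return 100_000 * deaths / population


def crude_rate(strata):
    """Overall rate, ignoring age: total deaths over total population."""
    deaths = sum(d for d, _ in strata.values())
    population = sum(p for _, p in strata.values())
    return rate_per_100k(deaths, population)


def age_adjusted_rate(strata, standard):
    """Direct standardization: weight each age-specific rate by a shared standard population."""
    total = sum(standard.values())
    return sum(rate_per_100k(*strata[age]) * standard[age] / total for age in standard)


# Hypothetical (deaths, population) by age band; the fluoridated group is older.
fluoridated = {"under 65": (20, 40_000), "65 and over": (480, 60_000)}
unfluoridated = {"under 65": (42, 70_000), "65 and over": (255, 30_000)}
standard = {"under 65": 50_000, "65 and over": 50_000}

for name, group in (("fluoridated", fluoridated), ("unfluoridated", unfluoridated)):
    crude = crude_rate(group)
    adjusted = age_adjusted_rate(group, standard)
    print(f"{name}: crude {crude:.0f}, age-adjusted {adjusted:.0f} per 100,000")
```

With these made-up numbers the crude rates make the older, fluoridated group look worse (500 versus 297 per 100,000); once both groups are put on the same age footing, the comparison reverses (425 versus 455).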
If you look carefully at the fate of women taking birth control pills, you find that advancing age and smoking are the two great confounders. You must take both into account to find the greatest clusters of ill effects. Smoking has been an important confounder in studies of industrial contaminants like asbestos, in which, again, the smokers suffer a disproportionate number of ill effects.
A 1947 survey of Chicago lawyers showed that those who had mere high school diplomas before entering legal training earned 6.3 percent more, on the average, than college graduates. The confounder here, the real explanation, was age. In 1947 there were still many older lawyers without college degrees, and they were simply older, on the average, and hence more established.
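The lawyers' puzzle is an instance of what statisticians call Simpson's paradox: a comparison that holds within every subgroup can reverse when the subgroups are pooled. The Python sketch below uses invented counts and incomes, not the 1947 survey's data. Within each age band the college graduates earn more, yet the pooled averages favor the high school graduates because most of them sit in the older, better-paid band.

```python
def pooled_mean(groups):
    """groups: iterable of (count, mean_income) pairs; returns the overall mean income."""
    groups = list(groups)
    total = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total


# Hypothetical (number of lawyers, mean income in dollars) per age band.
high_school = {"older": (80, 8_000), "younger": (20, 4_000)}
college = {"older": (20, 8_500), "younger": (80, 4_500)}

for band in ("older", "younger"):
    print(band, "- college graduates earn more:", college[band][1] > high_school[band][1])

print("pooled high school mean:", pooled_mean(high_school.values()))
print("pooled college mean:", pooled_mean(college.values()))
```

Comparing within age bands, in effect holding age constant, is what removes the confounder.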
Occupational studies often confront another seeming paradox: The workers exposed to some possible adverse effect turn out to be healthier than a control group of persons without such exposure. The confounder: the well-known healthy-worker effect:
Workers tend to be healthier and live longer than the population
in general.
Some studies of workers in steel mills showed no overall increase in cancer, despite possible exposures to various carcinogens. It took a look at black workers alone to find excess cancer. They commonly worked at the coke ovens, where carcinogens were emitted. This was a case where the population had to be stratified, or broken up in some meaningful way, to find the facts. Such findings in blacks often may be falsely ascribed to race or genetics, when the real or at least the most important contributing or ruling variables (to a statistician, the independent variables) are occupation and the social and economic plights that put blacks in vulnerable settings. The excess cancer is the dependent variable, the result.
"In a two-variable relationship," Dr. Gary Friedman explains, "one is usually considered the independent variable, which affects the other or dependent variable." Take the fact that more people get colds in winter. Here weather is commonly seen as the underlying, or independent, variable, which affects incidence of the common cold, the dependent variable. Actually, of course, some people, like children in school who are constantly exposed to new viruses, are more vulnerable to colds than others. In the case of these children, then, as in the case of the black workers at the coke ovens, there is often more than one independent variable. Also, some people think that an important underlying reason for the prevalence of colds in winter may be that children are congregated in school, giving colds to each other, thence to their families, thence to their families' coworkers, thence to the coworkers' families, and so on. But cold weather (and home heating?) may still figure, perhaps by drying nasal passages and making them more vulnerable to viruses.
The search for key variables is obviously one of the main pursuits of the epidemiologist, or disease detective, or of any physician who wants to know what has affected a patient, or of any student of society who seeks true causes. Like colds, many medical conditions, such as heart disease, cancers, and probably mental illness, have multiple contributing factors. Where many known, measurable factors are involved, statisticians can use mathematical techniques (the terms you will see include multiple regression, multivariate analysis, discriminant analysis, and factor, cluster, path, and two-stage least-squares analysis) to relate all the variables and try to find which are the truly important predictors. Yet some situations, like the striking decline in U.S. heart disease mortality in recent years, defy such analyses. These years have seen several major changes in American life that may play a role: less smoking among men, consumption of a leaner diet, more recreational exercise (though more sedentary work). Medical care is far better, including the treatment of hypertension, which predisposes people to heart disease. Many of these variables cannot be well measured, and the effect of some is debatable, so (a common situation in science) the truth remains uncertain.
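A toy sketch can show why relating all the variables at once matters. In the invented, noise-free data below, the outcome y depends only on x2 (think of it as age), while x1 merely tracks x2 with a small wobble. Regressing y on x1 alone suggests a strong effect; fitting both together by ordinary least squares (here by solving the normal equations exactly with Python's fractions module) gives x1 no effect at all. The helper names are hypothetical, and real analyses use statistical software and must contend with noise, which this sketch ignores.

```python
from fractions import Fraction


def solve(a, b):
    """Solve the square linear system a x = b exactly by Gauss-Jordan elimination."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def least_squares(predictors, y):
    """Fit y = b0 + b1*x1 + ... by solving the normal equations (X'X) b = X'y."""
    rows = [[1, *values] for values in zip(*predictors)]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Invented data: y depends only on x2; x1 simply tracks x2 with a small wobble.
x2 = [1, 2, 3, 4, 5, 6]
x1 = [1, 3, 3, 5, 5, 7]
y = [2, 4, 6, 8, 10, 12]

print("y on x1 alone:", least_squares([x1], y))       # x1 looks important
print("y on x1 and x2:", least_squares([x1, x2], y))  # x1's coefficient drops to 0
```

Alone, x1 picks up a slope of 19/11 that really belongs to x2; with both in the model, the coefficients come out as 0 for x1 and 2 for x2.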
Variability
Doctors always say, "Most things are better in the morning," and they're mostly right. Most chronic or recurring conditions wax and wane. We tend to wake up at night when the condition is at its worst. Then, no matter what is done by way of treatment the next day, the odds are that we'll feel better.
This is regression toward the mean: the tendency of all values in every field of science (physical, biological, social, and economic) to move toward the average. Tall fathers tend to have shorter sons, and short fathers, taller sons. The students who get the highest grades on an exam tend to get somewhat lower ones the next time. The regression effect is common to all repeated measurements.
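Regression toward the mean falls out of a simple model in which every measurement is a stable true value plus independent luck. The Python simulation below assumes invented numbers (true ability and exam-day luck both normally distributed, each with a standard deviation of 50 points, around an average of 500) and follows the top tenth of students on a first exam to a second one. Nothing about the students changes between exams, yet their average falls back toward the overall mean.

```python
import random
from statistics import mean


def simulate(n=10_000, seed=1):
    """Two exams per hypothetical student: a fixed true ability plus fresh random luck each time."""
    rng = random.Random(seed)
    ability = [rng.gauss(500, 50) for _ in range(n)]
    exam1 = [a + rng.gauss(0, 50) for a in ability]
    exam2 = [a + rng.gauss(0, 50) for a in ability]
    # Select the top tenth of scorers on the first exam...
    cutoff = sorted(exam1)[int(0.9 * n)]
    top = [i for i in range(n) if exam1[i] >= cutoff]
    # ...and compare their average on each exam with the average of everyone.
    return mean(exam1[i] for i in top), mean(exam2[i] for i in top), mean(exam1)


first, second, overall = simulate()
print(f"top tenth: {first:.0f} on exam 1, {second:.0f} on exam 2; everyone: {overall:.0f}")
```

The top group's second-exam average lands between its first-exam average and the overall mean: part of what put them at the top was good luck, and luck does not repeat on cue.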
Regression is part of an even more basic phenomenon: variation, or variability. Virtually everything that is measured varies from measurement to measurement. When repeated, every experiment has at least slightly different results. Take a patient's
blood pressure, pulse rate, or blood count several times in a
row, and the readings will be somewhat different. Take them at
different times of day or on different days, and the readings may
vary greatly.
The important reasons? In part, fluctuating physiology, but also measurement errors, the limits of measurement accuracy, and observer variation. When examining the same patient, no two doctors will report exactly the same results, and the results may be grossly different. If six doctors examine a patient with a faint heart murmur, only one or two may have the skill or keen hearing to detect it. Experimental results so typically differ from one time to the next that scientific and medical fakers (a Boston cancer researcher, for example) have been detected by the unusual regularity of their reported results, with numbers agreeing too well and the same results appearing time after time, with not enough variation from patient to patient.
Biological variation is the most important cause of variation in physiology and medicine. Different patients, and the same patients, react differently to the same treatment. Disease rates differ in different parts of the country and among different populations, and (alas, nothing is simple) there is natural variation within the same population.
Every population, after all, is a collection of individuals, each with many characteristics. Each characteristic, or variable, such as height, has a distribution of values from person to person, and if we would know something about the whole population, we must have some handy summaries of the distribution. We can't get much out of a list of 10,000 measurements, so we need single values that summarize many measurements.
Enter here the familiar average or, more exactly, the mean, median, and mode. These and a few other measures can give us some idea of the look of the whole and its many measurable properties, or parameters.
When most of us speak of an average, we mean simply the mean, or arithmetic mean: the sum of all the values divided by the number of values. The mean is no mean tool; it is a good way to get a typical number, but it has limitations, especially when there are some extreme values. There is said to be a memorial in a Siberian town to a fictitious Count Smerdlovski, the world's champion at Russian roulette. On the average he won, but his actual record was 73 and 1.
If you look at the average salary in a hospital, you won't know that half the personnel may be working for the minimum wage, while a few hundred persons make $100,000 or more a year. You may learn more here from the median, the figure that divides a population into two equal halves. The median can be of value when a group has a few members with extreme values, like the 400-pounder at an obesity clinic whose other patients weigh from 180 to 200 pounds. If he leaves, the patients' mean weight might drop by 10 pounds, but the median might drop just 1 pound.
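Here is the obesity-clinic example as a quick Python check, using the standard library's statistics module. The weights are invented: 19 patients at 180 to 198 pounds plus one 400-pounder. With these particular numbers the mean falls by about 10.5 pounds when he leaves, while the median moves only half a pound.

```python
from statistics import mean, median

# Hypothetical clinic: 19 patients weighing 180 to 198 pounds, plus one 400-pounder.
weights = list(range(180, 199)) + [400]
without_him = weights[:-1]  # the same clinic after the heaviest patient leaves

print(f"mean:   {mean(weights):.2f} -> {mean(without_him):.2f}")
print(f"median: {median(weights)} -> {median(without_him)}")
```

One extreme value drags the mean a long way; the median, which only cares about the middle of the sorted list, barely notices.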
The most frequently occurring number or value in a distribution is called the mode. When the median and the mode are about the same, or even more when mean, median, and mode are roughly equal, you can feel comfortable about knowing the typical value.
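A small Python sketch of that agreement test: for a symmetric, bell-like set of values the mean, median, and mode coincide, while a long right tail pulls them apart. Both lists are invented for illustration.

```python
from statistics import mean, median, mode

symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]  # hypothetical, bell-like
skewed = [1, 1, 1, 2, 2, 3, 4, 8, 20]    # hypothetical, long right tail

for name, data in (("symmetric", symmetric), ("skewed", skewed)):
    print(f"{name}: mean {mean(data):.2f}, median {median(data)}, mode {mode(data)}")
```

In the skewed list the mode is 1, the median 2, and the mean about 4.67: three different answers to "what is typical?", which is the warning sign.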
You still need to know something about the exceptions: in short, the dispersion (or spread or scatter) of the entire distribution. One measure of spread is the range. It tells you the lowest and highest values. It might inform you, for example, that the salaries in that hospital range from $10,000 to $250,000.
You can also divide your values into 100 percentiles so you can say someone or something falls into the 10th or 71st percentile, or into quartiles (fourths) or quintiles (fifths). One useful measure is the interquartile range, the interval between the 75th and 25th percentiles; this is the distribution in the middle, which avoids the extreme values at each end. Or you can divide a distribution into subgroups: those with incomes from $10,000 to $20,000, for example, or ages 20 to 29, 30 to 39, and so on.
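The range, quartiles, and interquartile range can be computed with Python's statistics module (quantiles requires Python 3.8 or later). The salary figures are invented, in thousands of dollars, to echo the hospital example. Note that quantiles uses one of several conventions for placing cut points by default; other software may interpolate slightly differently.

```python
from statistics import quantiles

# Hypothetical hospital salaries, in thousands of dollars.
salaries = [10, 18, 20, 22, 25, 28, 32, 40, 55, 90, 250]

low, high = min(salaries), max(salaries)  # the range: lowest and highest values
q1, q2, q3 = quantiles(salaries, n=4)     # 25th, 50th, and 75th percentile cut points
iqr = q3 - q1                             # spread of the middle half, ignoring both tails

print(f"range: {low} to {high}")
print(f"quartiles: {q1}, {q2}, {q3}; interquartile range: {iqr}")
```

The single $250,000 salary stretches the range to 240 (thousand) but leaves the interquartile range at 35.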
All these values can easily be plotted. With many of the things that scientists, economists, or others measure (IQs, for example, and other test scores), we typically tend to see a familiar, bell-shaped normal distribution, high in the middle, low at each end, or tail. This is the classic Gaussian curve, named after the 19th-century German mathematician Karl Friedrich Gauss. But you may also find that the plot has two or more peaks or clusters, a bimodal or multimodal distribution.
A widely used number, the standard deviation, can reveal a great deal. No matter how it sounds, it is not the average distance from the mean but a more complex figure.* Unlike the range, this handy figure takes full account of every value to tell how spread out things are: how dispersed the measurements. In what one statistician calls a truly remarkable generalization, in most sets of measurements, "and without regard to what is being measured," only 1 measurement in 3 will deviate from the average by more than 1 standard deviation, only 1 in 20 by more than 2 standard deviations, and only 1 in 100 by more than 2.57 standard deviations.
"Once you know the standard deviation in a normal, bell-shaped distribution," according to Thomas Louis, "you can draw the whole picture of the data. You can visualize the shape of the curve without even drawing the picture, since the larger the variation of the numbers, the larger the standard deviation and the more spread out the curve, and vice versa."
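For data that follow the normal curve, the rule of thumb that roughly 1 measurement in 3 lies more than 1 standard deviation from the average, along with its companions at 2 and 2.57 standard deviations, can be checked against the curve itself. The Python sketch below uses the standard library's NormalDist to compute the share of values beyond k standard deviations on either side. It gives about 1 in 3 beyond 1 standard deviation, about 1 in 22 beyond 2 (exactly 1 in 20 corresponds to 1.96 standard deviations), and about 1 in 100 beyond 2.57. Real data are only approximately normal, so treat these as approximations.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def fraction_beyond(k):
    """Share of a normal distribution lying more than k standard deviations from the mean (both tails)."""
    return 2 * (1 - standard_normal.cdf(k))


for k in (1, 1.96, 2, 2.57):
    share = fraction_beyond(k)
    print(f"beyond {k} SD: {share:.4f}, about 1 in {1 / share:.0f}")
```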
*There is more than one way to calculate it, and there are several variations, depending on the statistician's purpose. A common one is to add the squares of the differences between each number and the mean, then divide that sum by the total number of squares (minus 1 if you're looking at a sample of a population rather than the whole population); the result is often referred to as the variance. Then calculate the square root of the result.
Sometimes statisticians calculate the standard deviation of the mean, this because the mean, being an average, is less variable than single measurements. Some call this the standard error or standard error of the mean.
All the above are measures of dispersion.
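The usual recipe for the standard deviation (square the deviations from the mean, add them, divide by the number of values or by one less for a sample, then take the square root) translates directly into a few lines of Python. The sketch below follows that recipe, checks it against the standard library's pstdev and stdev, and computes the standard error of the mean as the sample standard deviation divided by the square root of the number of measurements. The function names and data are invented for illustration.

```python
import math
import statistics


def std_dev(values, sample=False):
    """Square each deviation from the mean, add them, divide, then take the square root."""
    m = sum(values) / len(values)
    squares = [(v - m) ** 2 for v in values]
    divisor = len(values) - 1 if sample else len(values)  # n - 1 when the data are a sample
    variance = sum(squares) / divisor
    return math.sqrt(variance)


def standard_error_of_mean(values):
    """Standard deviation of the mean: sample standard deviation over the square root of n."""
    return std_dev(values, sample=True) / math.sqrt(len(values))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements

print("population SD:", std_dev(data), "check:", statistics.pstdev(data))
print("sample SD:", std_dev(data, sample=True), "check:", statistics.stdev(data))
print("standard error of the mean:", standard_error_of_mean(data))
```

Because the standard error shrinks with the square root of the number of measurements, averaging four times as many readings roughly halves it.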
Example: If the average score of all students who take the SAT college entrance test is relatively low and the spread (the standard deviation) relatively large, this creates a very long-tailed, low-humped curve of test scores, ranging, say, from around 300 to 1500. But if the average score of a group of brighter students entering an elite college is high, the standard deviation of the scores will be less and the curve will be high-humped and short-tailed, going from maybe 900 to 1500.
"If I just told you the means of two such distributions, you might say they were the same," another scientist says. "But if I reported the means and the standard deviations, you'd know they were different, with a lot more variation in one."
From a human standpoint, variation tells us that it takes more than averages to describe individuals. Biologist Stephen Jay Gould learned in 1982 that he had a serious form of cancer. The literature told him the median survival was only eight months after discovery. Three years later he wrote in Discover, "All evolutionary biologists know that means and medians are the abstractions," while variation is "the reality," meaning "half the people will live longer" than eight months.
Since he was young, since his disease had been diagnosed early, and since he would receive the best possible treatment, he decided he had a good chance of being at the far end of the curve. He calculated that the curve must be skewed well to the right, as the left half of the distribution had to be "scrunched up between zero and eight months, but the upper right half [could] extend out for years." He concluded, "I saw no reason why I shouldn't be in that small tail.... I would have time to think, to plan and to fight." Also, since he was being placed on an experimental new treatment, he might, if fortune smiled, "be in the first cohort of a new distribution with . . . a right tail extending to death by natural causes at advanced old age."
Statistics cannot tell us whether fortune will smile, only that such reasoning is sound.
4
Studies, Good and Bad
Why think? Why not try the experiment?
-John Hunter, 18th-century British surgeon
Sit down before fact as a little child, be prepared to give up every preconceived notion, follow humbly wherever and to whatever abysses nature leads, or you shall learn nothing.
-Thomas Henry Huxley
This is the part I always hate.
There is no disease that strikes older people more tragically than Alzheimer's disease, which makes a useless tangle of the brain. At a prestigious New England university a research team imaginatively inserted catheters into the skulls of four patients aged 64 to 73 to deliver a continuous infusion of either a theoretically promising drug or, alternately, an ineffectual saline solution for comparison.
After 18 months the investigators published a paper saying that, according to observations by the patients' families, three patients showed marked improvement and the fourth at least held his own. Fascinating, of course. Some reporters learned of the work and began inquiring. The investigators let a TV crew do a story and also held a news conference, with one patient
brought forth for on-camera testimonials. Except for some
newspapers that decided to print nothing, the story flew far and
wide.
The head investigator, a chief resident in neurosurgery, cautioned that the results, though encouraging, were "very early" and "certainly do not prove this is an effective treatment." He advised healthy skepticism. But headlines unequivocally read: "Alzheimer's Test Found Successful," "Alzheimer's: A New Promise," "First Breakthrough Against Alzheimer's," "Pump Offers Hope," "Possible Alzheimer's Cure." Within two months the medical center logged 2,600 phone calls, mainly from desperate families, and critics began asking why a press conference had been held, since a study of only four patients, with unblinded investigators getting their assessments from hopeful families, meant little.
Harvard's Dr. Jay Winsten concluded that "the decision to hold a press conference ... far outweighed in impact the modulating effect of the investigators' qualifying language. The visual impact of [one] patient's on-camera testimonials all but guaranteed that TV coverage would oversell the research despite any qualifying language."
When dubious claims are made (about Alzheimer's, a new cancer drug, a possible AIDS cure) and the claims get widely reported, there is commonly a lot of postmortem ducking and soul-searching among reporters and editors. Then someone else makes some sensational claim, and the same thing may happen all over again.
The biggest error in medical science, according to Dr. Thomas Chalmers, is "the uncontrolled pilot study in which the investigators try a treatment on 10 patients, and if it seems to work ... are tempted to report it" to fellow scientists, let alone the media.
All science is only a stab at the truth. Even with the best of statistics, "We scientists don't know how to tell the whole truth," Mosteller reminds us. Outside this honest limitation lie vast realms of inadequate science with plausible-sounding yet shaky
statistics. A French physician, Pierre Charles Alexandre Louis, said 150 years ago, "The only reproach which can be made to the numerical method" is that it "requires much more labor and time than the most distinguished members of our profession" often give it. "Some days," says one modern statistician, "I think every idiot in the country who can put his hands on a computer program thinks he's a statistician."
The big problems of statistics, say its best practitioners, have little to do with computations and formulas. They have to do with judgment, we're told: with how to design a study, how to conduct it, then analyze and interpret the results. In a day of frenzied media competition for the public's eye and ear, and many chances to do harm by shaky reporting, journalism too calls for sophisticated judgment. How, then, can we have some hope of telling which studies seem credible, which we should report?
A fundamental principle is that every conscientiously conducted study has a careful design: a method or plan of attack to include the right kind and number of patients or petri dishes and to try to eliminate bias. Different problems require different methods, and one of the most basic questions in science is, Can this kind of experiment, this design, yield the answer?
This is not a simple question for a reporter to answer, but there is much we can know. What kinds of studies, what kinds of numbers and controls and methods, should we look for?
Experiments versus Seductive Anecdotes
Students and eggs can be graded, citizens and cities can be credit-rated, and scientific evidence can be weighed according to what has been called a hierarchy of evidence. Some kinds of studies carry little weight, some more, some a great deal.
Science and medicine started with anecdotes, unreliable as far as generalization is concerned, yet provocative. Anecdotes matured into systematic observations, the most ancient form of science. Observation told the ancients much about the stars, it told the pharaohs' physicians much about the sick, and it is still important, for simple "eyeballing" has developed into data collection and the recording of case histories. These are respectable, yea, indispensable methods, yet still only one part of science. Case histories may not be typical, or they may reflect the beholder. Medicine continues to be plagued by Big Authorities who insist, "I know what I see."
There can be useful, even inspired, observation and analysis of natural experiments: Excess fluoride in some waters hardened teeth, and this observation led to fluoridation of drinking water to prevent tooth decay. There are also man's inadvertent experiments, disastrous and benign, to be studied. Hiroshima triggered wide analysis of the effects of nuclear radiation, invaluable yet frustrating because there were no good measures of exposure levels, a gap that has caused confusion and controversy ever since.
In 1585 or so, Galileo dropped those weights from a tower and helped invent the scientific experiment, a study in which the experimenter controls the conditions (controlled conditions are the heart of the experimental method) and records the effect. Experiments on objects, animals, germs, and people matured into the modern experimental study, in which the experimenter typically changes only one or some other planned number of variables to see the outcome.
Clinical Trials
The experimental method is the essence of experimental medicine's current "gold standard": the controlled, randomized clinical trial. At its best, the investigator tests a treatment or drug or some other intervention by randomly selecting at least two comparable groups, the experimental group that is tested or treated and a control group that is observed for comparison.
True clinical trials are expensive and difficult. It has been estimated that of 100 scheduled trials, 60 are abandoned, not
implemented, or not completed, whether for lack of funds, difficulty in recruiting or keeping patients, toxicity or other problems, or, sometimes, rapid evidence of a difference in effect (making continued denial of effective treatment to a control group unethical). Another 20 trials produce no noteworthy results, and just 20 produce results worth publishing. Clinical trials nonetheless are called the strongest, most precise, most decisive way to evaluate medical interventions and learn true causation. Randomized clinical trials proved that new drugs could cut the heart attack death rate, that treating hypertension could prevent strokes, and that polio, measles, and hepatitis vaccines worked. No doctor observing a limited number of patients could have shown these things.
Types of clinical studies include the following:
Among the most reliable are parallel studies comparing similar groups given different treatments, or a treatment versus no treatment. But such studies are not always possible.
In crossover studies the same patients get two or more treatments in succession and act as their own controls. Similarly, self-controlled studies evaluate an experimental treatment by comparing it with control observations made during periods of no treatment or of some standard treatment. There are pitfalls here. Treatment A might affect the outcome of treatment B, despite the usual use of a washout period between study periods. Patients become acclimated: They may become more tolerant of pain or side effects or, now more health-conscious, may change their ways. The controls (the patients in a control group) don't always behave as expected in parallel studies either: In one large-scale trial of methods to lower blood cholesterol and risk of heart disease, many controls adopted some of the same methods (quitting cigarette smoking, eating fewer fats) and reduced their risk too.
Investigators often use historical controls (meaning comparison with old records: historically the cure rate has been 30 percent, say, and the new therapy cures 60 percent) or other external controls (such as comparison with other studies). These controls are often misleading (the groups compared are frequently not comparable, and the treatments may have been given by different methods), but they are still at times useful.
What Makes a Study Honest?
Obviously all studies, including the best, have potential
pitfalls:
Lack of adequate controls is fatal if you really want to put the results in the bank.
The group or sample studied, 10 people or 10,000, must be large enough to get a valid result and representative enough to apply to a larger population. Because people vary so widely in their reactions, and a few patients can fool you, fair-sized groups of patients are usually needed. And enough of the right kind of subjects are needed for a suitable sample. Picking patients for a medical study is no different from picking citizens to be questioned in a political poll. In both, a sample is studied, and inferences (the outcome of an election, the results in patients in general) are made for a larger population.
To get a large enough sample, medical researchers more and more try to conduct multicenter trials, which are appealing because they can include hundreds of patients, but expensive and tricky because one must try to maintain similar patient selection and quality control at 10 or 100 institutions. Successful multicenter trials established the value of controlling hypertension to prevent strokes. They demonstrated the strong probability that less extensive surgery is as effective as more drastic surgery for many breast cancers.
The sample should be randomized: divided by some random method into comparable experimental and control groups. Randomization can easily be violated. A doctor assigning patients to treatment A or B may, seeing a particular type of patient, say or think, "This patient will be better on B."
If treatment B has been established as better than A, there should be no random study in the first place and certainly no
study of that doctor's patient. When randomization is violated, "the trial's guarantee of lack of bias goes down the drain," says one critique. As a result, patients who consent to randomization are often assigned to study groups according to a list of computer-generated random numbers.
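A minimal Python sketch of allocation by computer-generated random numbers: shuffle the enrolled patients with a seeded pseudorandom generator and split the list in half. The patient IDs, the seed, and the function name are hypothetical; real trials add safeguards this sketch omits, such as keeping the allocation list concealed from the doctors who enroll patients.

```python
import random


def randomize(patient_ids, seed):
    """Shuffle patients with a seeded generator, then assign the first half to A and the rest to B.

    Splitting a shuffled list keeps the groups equal in size, and recording the
    seed lets an auditor regenerate exactly the same allocation list.
    """
    rng = random.Random(seed)
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {pid: ("A" if i < half else "B") for i, pid in enumerate(shuffled)}


patients = [f"P{n:03d}" for n in range(1, 21)]  # hypothetical patient IDs
allocation = randomize(patients, seed=2024)
for pid in patients[:5]:
    print(pid, "->", allocation[pid])
```

Because the list is fixed by the random numbers before anyone sees a patient, a doctor's hunch that "this patient will be better on B" has no way to steer the assignment.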
To combat bias (the influence of confounding variables) and get answers applicable to various populations, the sample or study population must often be stratified, or separated into groups by age, sex, socioeconomic status, and so on. Failure to stratify can hide true associations. The role of high-absorbency tampons in toxic shock syndrome was clarified only when the cases were broken down by precise type of tampon used.
The identification of important subcategories of patients can be tricky indeed. A study of open-heart surgery patients may fail to separate out those who had to wait for their surgery. But some patients die waiting, and those left are relatively stronger patients who do better, on the average, than those treated immediately after diagnosis.
We reporters may also fail to pay attention to stratification, or distribution. In early 1985 the President's Council of Economic Advisers reported that (to quote the page-one lead in a major newspaper) "elderly Americans have achieved economic parity with the rest of the population and no longer are a disadvantaged group." Not for several paragraphs, now on an inside page, did the story note that "there's a lot of variability" and older people are also "more likely ... to have members with incomes below the average of their age group." In short, there are still many elderly trapped in poverty.
To combat bias in investigators or patients, studies should be
blinded to the extent feasible: single-, double-, or, best of all, triple-blinded,
so that neither the doctors nor the nurses administering
a treatment nor the patients nor those who assess the results
know whether today's pill is treatment A, treatment B, or an
ineffective placebo. Otherwise, a doctor or patient who yearns for
a good result may see or feel one when the "right" drug is given.
There is a tale of an overzealous receptionist who, knowing
which patients were getting the real drug and not the placebo,
was so encouraging to these patients that they began saying they
felt good, willy-nilly.
Barring observant receptionists, the use of a placebo (from
the Latin meaning "I shall please") may help maintain blindness.
Placebos actually give some relief in a third of all patients,
on the average, in various conditions. The effect is usually temporary,
however, and a truly effective drug ought to work substantially
better than the placebo.
Blinding is often impossible or unwise. Some treatments
don't lend themselves to it, and some drugs quickly reveal themselves
by various effects. But an unblinded test is a weaker test.
Finally, what makes a study honest is honesty. John Bailar
warns of deliberate or careless deceptions that seem to be universally
accepted today, practices that sometimes have much
value but at other times are "inappropriate and improper and,
to the extent that they are deceptive, unethical." Among them:
the selective reporting of findings, leaving out some that might
not fit the conclusion; the reporting of a single study in multiple
fragments, when the whole might not sound so good; and the
failure to report the low power of some studies, their inability to
detect a result even if one existed.
Dr. Charles Moertel of the Mayo Clinic says,
Probably the majority of cancer patients treated with chemotherapy
today are receiving regimens that have not been proved effective by
randomized trial. ... Many articles published in our major journals
make claims for fantastic therapeutic accomplishments with no randomized
controls. ... Many, if not most, of the randomized studies
... are of such poor quality that their results are unbelievable. ...
Precious few have withstood the scrutiny of carefully designed
confirmatory scientific study.
He calls a multitude of poor methods statistical legerdemain:
"the games we play, trying to squeeze out that little bit of
breakthrough." Why the pressure to play them? "Salvation," Dr.
David Salsburg answers. "Fruit in this world (increases in salary,
prestige, invitations to speak) and beyond this life (continual
references in the citation index)."
Epidemiology: Hippocrates to AIDS
Clinical studies deal with patients. Epidemiology deals with
populations, which sometimes are large groups of patients. Epidemiology
seeks the causes of both health and disease by placing
a population under its own kind of microscope, the epidemiological
investigation.
Epidemiological studies in many ways parallel clinical studies
(some studies are both) and are subject to many of the
same pitfalls and rules, like avoiding bias and stratifying to get
the right answers about the right subgroups. An old saw, in fact,
goes: an epidemiologist is a physician broken down by age and
sex.
Epidemiology in its early days was concerned wholly with
epidemics of typhoid, smallpox, and other infections. But epidemiologists
today also ask, "What should we eat and how should
we live to stay healthy?" and they study large groups to see how
the healthiest and unhealthiest live. Hippocrates has been called
the first environmentalist because he observed that it was
healthier to live in high places than in low ones. Anticipating
today's environmentalists, he blamed bad air and bad water and
may have been partly right. But he failed to stratify; otherwise
he might have noticed that the people who lived high were also
wealthier and better nourished than those who lived low.
In 1740 Percival Pott scored a famous epidemiological
success by observing the high rate of scrotal cancer in London's
chimney sweeps and correctly blaming it on their exposure
to soot-burned organic material, much like a smoked cigarette.
A century later, John Snow, plotting London cholera
cases on a map and noting a cluster around one source of
drinking water, removed the handle from the now famed Broad
Street pump and helped end a deadly epidemic. The 19th-century
French advocate of statistical methods, Pierre Louis,
observed hospital patients and helped stop the use of bleeding as
a treatment. Ignaz Semmelweis showed that doctors' dirty
hands transmitted deadly childbed fever to mothers.
Modern epidemiologists successfully indicted smoking as a
cause of lung cancer and heart disease and identified the association
of fats and cholesterol with clogging of the arteries. They
evaluate vaccines, assess new methods of health care delivery,
and track down the causes of new scourges like AIDS, toxic
shock syndrome, and Legionnaires' disease, all by several
methods. All are valuable. All are full of traps.
Epidemiology, like all of science, started with observational
studies, and these remain important. They are weak and uncertain,
we have noted, when it comes to determining cause and
effect. Yet observation is how we first learned of the unfortunate
effects of toxic rain, Agent Orange, cigarette smoking, and
many sometimes helpful, sometimes harmful medications, and
of the role of certain sexual practices and addicts' use of dirty needles
in spreading AIDS.
Some observational studies are simply descriptive, describing
the incidence, prevalence, and mortality rates of various
diseases, for example. Other, analytic studies seek to analyze or
explain: the Seven-Country Study, for example, which helped
associate high meat and dairy fat and cholesterol consumption
with excess risk of coronary heart disease. Ecological studies look
for links between environmental conditions and illness. Human
migrations, like that of the Japanese who come to the United
States, eat more fat, and get more disease than they did in
Japan, are among valuable natural experiments.
The simplest observational measurement is a count. Sampling
is just a more sophisticated kind of count. You can't count
or question everybody, so you seek a sample that represents the
whole. Many epidemiological surveys rely on samples, among
them government surveys of health and nutritional habits.
Samples and surveys often use questionnaires to get information.
A sample or survey is never more than a snapshot of the
scene at the moment; it can't portray an ever-changing picture
unless frequently repeated. Questionnaires may be no better
than the quality of the answers, written or verbal. One survey
compared patients' reporting of their current chronic illnesses
with those their doctors recorded. The patients failed to mention
almost half of the conditions the doctors detected over the course
of a year. And whether it comes to illness, diets, or drinking,
people tend to put themselves in the best possible light. They
often say both yes and no to the same question in different form.
A survey may stand or fall on the use of sophisticated ways to
get accurate information.
Epidemiologists' studies may also be prevalence studies, case-control
studies, or cohort studies. A prevalence study, also called a
cross-sectional study, is a wide-angle snapshot of a population: a
look at the rate of disease X or at toxic agent X and its possible
effects by age, sex, or other variables. A political poll is such a
study: a cross section of the nation is examined in a period of a
few days.
A case-control study examines cases and controls for a close-up of
a disease's relationship to other factors in a small, intensively
examined group. The nation hears of cases of toxic shock syndrome,
mainly in young women. The federal Centers for Disease
Control launches a field investigation to find a series of patients,
or cases, confirm the diagnosis, then interview them and
their families and other contacts to assemble careful case histories
that cover, hopefully, all possible causes or associations. This
group is then compared with a randomly selected but matched
comparison group, or control group, of healthy young women of like age
and other characteristics.
The results need to be interpreted with great caution, but
the case-control study is often a quick, highly useful, and relatively
easy, low-cost first approach or fishing expedition to assemble
clues about causes or even a working hypothesis. Or it
may test some hypothesis. A case-control study pinpointed the
use of tampons (later found to be certain high-absorbency ones)
as the main villain in toxic shock. The relationship of cigarette
smoking to lung cancer, the association of birth control pills with
blood vessel problems, and the transmission patterns of AIDS
were identified in case-control studies that pointed to the need
for broader investigation.
Cohort or incidence studies are motion pictures. They pick a
group of people, or cohort (a cohort was a unit of a Roman
legion), often stratify or divide them into subgroups, then follow
them over time, often for years, to see how some disease or
diseases develop. These studies are costly and difficult. Subjects
drop out or disappear. Large numbers must be studied to see
rare events. But cohort studies can be powerful instruments and
substitutes for randomized experiments that would be ethically
impossible. You can't ethically expose a group to an agent that
you suspect would cause a disease. You can watch a group so
exposed.
The noted Framingham study of ways of life that might be
associated with developing heart disease has followed more than
5,000 residents of that Massachusetts town since 1948. The
American Cancer Society's 1952-55 study of 187,783 men aged
50 to 69, with 11,780 of them dying during that period, did
much to establish that cigarette smoking was strongly associated
with developing lung cancer.
Many epidemiological, as well as clinical, studies are
handicapped because they must be retrospective. They look back
in time at medical records, vital statistics, or people's recollections
(for example, those collected in interviews in a case-control
study). People who have a disease are questioned to try to find
common habits or exposures. Women with cervical cancer are
interviewed to see how many took possibly guilty hormones and
how many did not. People who live around a Love Canal are
asked if they have been ill.
Retrospective studies are notoriously unreliable. Memories
fail or play tricks. Old records are poor and misleading. Defini-
tions of diseases and methods of diagnosis vary sharply over the
years. The patients you find may not be representative. A retro-
spective study, however intriguing, generally only says that
there may be something here that ought to be investigated.
(There are exceptions. Dr. Gary Friedman writes, "A retrospective
study can be quite reliable if based on data carefully collected
in the past. A revealing study of mortality in radiologists
was a retrospective cohort study based on good data.")
A prospective study, in contrast (like the Framingham and
American Cancer Society studies), looks forward. It focuses
sharply on a selected group who are all followed by the same
statistical and medical techniques. Dr. Eugene Robin at Stanford
tells how four separate retrospective clinical studies affirmed
the accuracy of a test for blood clots in the lungs. When an
adequate prospective clinical trial was done, most of the backward
looks were proved wrong.
Epidemiology also includes experimental studies, the classical
experiments of science on a larger human scale. These are typically
intervention studies. There is some intervention or manipulation;
something is done to some of the subjects.
The massive and hugely successful 1954 field trial of the
Salk polio vaccine was a classic intervention trial and a clinical
trial too, with 401,974 first- to third-graders assigned at random
to either a vaccinated group or a control group injected with a
placebo, or dummy shot, and another 947,171 children
divided between vaccinated second-graders and unvaccinated
first- and third-graders acting as controls. In addition, in all
participating states or counties, the investigators studied and
counted all cases of polio in a grand total of 1,829,916 children:
those who had taken part in the study and those who had not.
In the placebo areas, the study was also triple-blinded: neither
the vaccinators, the subjects, nor the doctors who examined the
subjects later for polio knew which children got which kind of
shot.
Another successful intervention study, a community trial, established
the value of fluoridating water supplies to prevent
tooth decay. Some towns had their water fluoridated; some did
not. Blinding was impossible, but the striking difference in den-
tal caries that resulted could not have been caused by any pla-
cebo effect.
Just because Dr. Famous or Dr. Bigshot says this is what he found doesn't mean
it is necessarily so.
-Dr. Arnold Relman
Ask to see the numbers, not just the pretty colors.
-Dr. Richard Muoin
tiarxvwl . /n,trauan aJ Mmhb,
ikverihin}; R! I.una w rtponcn,
WHAT questions should we reporters ask to make our
news solid, to report the more valid claims and ignore the weak
and phony? When a scientist or physician or anyone else says,
"I've discovered that ...," what should we ask?
In 1949, a year after Britain's National Health Service
("socialized medicine") was launched, my editors sent me to
Britain to see how it was working. A bit stumped, I asked Dr.
Morris Fishbein, the provocative genius who long edited the
Journal of the American Medical Association, "How can I, a reporter,
tell whether a doctor is doing a good job?" He immediately said,
"Ask him how often he has a patient take off his shirt."
His lesson was plain: No physical examination is complete
unless the patient takes off his or her clothes. Most reporters are
not skilled statisticians, but we can ask some similarly revealing
questions. Many of these are not even statistical, just simple
ones that, like Fishbein's, probe soft spots and often disclose
either a conscientious approach or one that can't be trusted.
We can learn here from one method of science. We said
earlier that a properly skeptical scientist, starting a study and
seeking truth, often begins with a null hypothesis (that treatment
A is no better than treatment B, that there's nothing there), then
sees whether or not the evidence disproves it. This approach is
much like the law's presumption of innocence: It is for the prosecutor
to prove beyond reasonable doubt that the suspect is
guilty. A reporter, without being cynical and believing nothing,
should be equally skeptical and greet every claim by saying, in
words or thought, "Show me."
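For readers who want to see the null-hypothesis idea at work, here is a minimal sketch of one common way to test it, a permutation test. The technique is chosen here for illustration; the text does not prescribe it. Assume, hypothetically, that each patient's outcome is recorded as 1 (improved) or 0 (not). If treatment A truly is no better than B, the labels "A" and "B" are arbitrary, so the sketch reshuffles them many times and asks how often chance alone produces a difference as large as the one observed. The function name, the default of 10,000 shuffles, and the sample numbers are assumptions.

```python
import random

def permutation_p_value(a, b, n_shuffles=10_000, seed=0):
    """Two-sided permutation test of the null hypothesis that
    group labels make no difference to the average outcome."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # relabel at random, as if A and B were the same
        diff = abs(sum(pooled[:len(a)]) / len(a) - sum(pooled[len(a):]) / len(b))
        if diff >= observed:
            as_extreme += 1
    # Adding one to both counts keeps the estimate from ever being exactly zero.
    return (as_extreme + 1) / (n_shuffles + 1)

if __name__ == "__main__":
    # Hypothetical outcomes: 14 of 20 improved on A, 8 of 20 on B.
    a = [1] * 14 + [0] * 6
    b = [1] * 8 + [0] * 12
    print(round(permutation_p_value(a, b), 3))
```

A small result says the data would be surprising if the null hypothesis were true; a large one says chance alone could easily explain them.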
If an investigator or claimant is competent and has a good
case, you may have to ask none or very few of these questions,
since a good scientific presentation should answer most of them
for you. The need for a lot of questions could itself tell you
something.
Here are some possible questions, then, some of them simple
and obvious ones, a few more technical for those who might
want to ask them.
How do you know? Have you done a study? Was that an experiment?
What is the evidence? Or is the approach just anecdotal?
Answers like "In my experience ...," "In my hands ...,"
"I've seen 20 cases ...," and "There are four cases in our
block ..." may be interesting, may be worth scientific investigation,
may be worth a cautious news story, but there is not yet
anything like certainty.
What kind of study was it? Was there a systematic research plan or
design? And a protocol or set of rules?
What was the study design or method: observational, experimental,
case-control, prospective, retrospective, or what? (See the previous
chapter for kinds of studies and their uses and limits.) "A lot of
people just scrounge around and try to come up with some
conclusion without any real plan or design at the start," one
medical editor reports. Was the design drawn up before you started your
study? What specific questions or hypotheses did you set out to test or
answer?

Why did you do it that way? Do you think it was the right kind of
study to get the answer to this question or problem?
Was it a true human experiment, if possible, with comparable groups
picked at random for comparison? If not, why not? And what was the
substitute?
If an investigator patiently (you hope) tells you about an
acceptable-sounding design, that's worth a brownie point. If the
answer is "Huh?" or a nasty one, that may tell you something
else.
Are you presenting preliminary data or something fairly conclusive?
Are you presenting a conclusion or a hypothesis for future study?
"Preliminary" and "interesting" can mean "unproved."
If the result is not reasonably conclusive, should there be further studies,
and what kind?
How many subjects, patients, cases, or people are you talking about?
Are these numbers large enough, statistically rigorous enough, to get the
answers you want? Was there an adequate number of patients to show a
difference between treatments? Why are you calling a press conference to
report on four patients?
Small numbers can sometimes carry weight. And they may
sometimes be the only ones possible. "Sometimes small samples
are the best we can do," one researcher says. But larger numbers
are always more likely to pass statistical muster.
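Why larger numbers help can be shown with a standard back-of-the-envelope formula: the approximate 95 percent margin of error for an observed proportion p among n subjects is 1.96 times the square root of p(1 - p)/n. The sketch below assumes the usual normal approximation, which is itself shaky for very small samples, and uses an observed rate of 50 percent as an illustrative worst case; the function name is hypothetical. With four patients the margin is roughly plus or minus 49 percentage points, and each quadrupling of the sample halves it.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for an observed proportion p
    among n subjects (normal approximation; rough for small n)."""
    return z * math.sqrt(p * (1 - p) / n)

if __name__ == "__main__":
    for n in (4, 25, 100, 400, 1600):
        moe = 100 * margin_of_error(0.5, n)
        print(f"n = {n:5d}: 50% observed, give or take {moe:.1f} points")
```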
The number studied can also depend on the subject. A
thorough physiological study of five cases of some difficult disorder
may be important. One new case of smallpox would be a
shocker in a world in which smallpox has supposedly been eliminated.
In June 1981 the federal Centers for Disease Control
reported that five young men, all active homosexuals, had been
treated for Pneumocystis carinii pneumonia at three Los Angeles
hospitals. This alerted the world to what soon became the
AIDS epidemic.
Who were your subjects? How were they selected? What were your
criteria for admission to the study? Were rigorous laboratory tests used to
define the patients, or were clinical diagnoses (necessarily less reliable) used?
Was the assignment of subjects to treatment or other variables
randomized? Randomization should give every patient a 50 percent
chance of being assigned to one group or the other of a two-armed
study (one comparing two groups). Were the patients admitted
to the study before the randomization? This helps eliminate bias.
How was the randomization done?
If the subjects weren't randomized, why not? One statistician says,
"If it is a nonrandomized study, a biased investigator can get
some extraordinary results by carefully picking his subjects."
Was there a control or comparison group? If not, the study will
always be weaker. Who or what were your controls or bases for comparison?
In other words: When you say you have such and such a result,
what are you comparing it with? Are the study or patient group and the
control group similar in all respects but the treatment or other variable
being studied?
Vogt calls "comparison of non-comparable groups probably
... the single most common error in the medical and popular
literature on health and disease."
Do you have reason to believe your subjects and controls were representative
of the general population? Or the particular population, that with
the disease or condition you are interested in? The answers here go a
long way toward answering these questions: To what populations
are the results applicable? Would the association hold for other groups?
If your groups are not comparable to the general population or some
important population, have you taken steps to adjust for this? Either
statistical adjustment or stratification of your sample to find out about
specific groups, or both? Samples can be adjusted for age, for example,
to make an older- or younger-than-average sample more
nearly comparable to the general populace. (More on applicability
and stratification after a bit.)
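Age adjustment is often done by direct standardization: the study's age-specific rates are applied to the age mix of a standard population, so that a sample with unusually many old people is neither penalized nor flattered for its age. The sketch below assumes that method; the age bands, rates, population counts, and the function name are hypothetical, invented purely for illustration.

```python
def direct_age_adjust(rates, standard_pop):
    """Weight age-specific rates by a standard population's age mix.
    Both dicts must use the same age bands."""
    total = sum(standard_pop.values())
    return sum(rates[band] * standard_pop[band] / total for band in rates)

# Hypothetical figures, for illustration only (cases per 1,000).
study_rates = {"under 45": 2.0, "45-64": 10.0, "65 and over": 40.0}
study_counts = {"under 45": 100, "45-64": 300, "65 and over": 600}  # an older-than-average sample
standard_pop = {"under 45": 600, "45-64": 300, "65 and over": 100}  # a younger, general-population mix

crude = sum(study_rates[b] * study_counts[b] for b in study_rates) / sum(study_counts.values())
adjusted = direct_age_adjust(study_rates, standard_pop)
print(f"crude: {crude:.1f} per 1,000; age-adjusted: {adjusted:.1f} per 1,000")
# prints: crude: 27.2 per 1,000; age-adjusted: 8.2 per 1,000
```

The crude rate looks high mostly because the sample is old; the adjusted figure is what the same age-specific rates would produce in the standard population.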
Was the study blind? In a study comparing drugs or other forms of
treatment with a placebo or a dummy treatment, did (1) those administering
the treatment, (2) those getting it, and (3) those assessing the outcome know
who was getting what, or were they indeed blinded, knowing only that they
were comparing A and B (or A, B, and C, perhaps)?
Could those giving or getting the treatment have easily guessed which
was which by a difference in reaction or taste or other results?
Not every study can be a blind study. One researcher says,
"There can be ethical problems in not telling patients what drug
they're taking and the possible side effects. People are not guinea
pigs." True enough, but a blinded study will always carry more
conviction.
Were there other accepted quality controls? For example, making
sure (perhaps by counting pills or studying urine samples) that
the patients supposed to take a pill really took it.
Were you able to follow your protocol or study plan?
If there were questionnaires, interviews, or a survey: Were
the questions likely to elicit accurate, reliable answers? Was it really possible
to get accurate answers to these questions?
Sampling is as common in medical studies as in political
polling. Every study examines a sample, not the whole population.
The sample must be reasonably accurate to give valid
results. But badly worded questions can also distort the results.
Respondents' answers can differ sharply, depending on how
questions are asked. Example: In one study 1,153 subjects were
asked which is safer, a treatment that kills 10 of every
100 patients or a treatment with a 90 percent survival rate.
More people voted for the second way of saying precisely the
same thing.
People commonly give inaccurate answers to sensitive
questions, such as those about sexual behavior. They are noto-
riously inaccurate in reporting their own medical histories, even
those of recent months.
Ask: Did you pretest your questions for effectiveness before doing your
actual survey?
Also: What was your nonresponse rate? Do you report it?
In any study: How many of your study subjects completed the
course? Do you account for those who dropped out and tell why they did?
Every study has dropouts. McMaster University's Dr.
David Sackett says, "Patients do not disappear ... for trivial
reasons. Rather, they leave ... because they refuse therapy,
recover, die, or retire to the Sunbelt with their permanent disability."
If an investigator ignores those who didn't do well and
dropped out, it can make the outcome look better. If those who
died of "other causes" are listed among "survivors" of the disease
being investigated (this is sometimes done on the theory that,
after all, they didn't die of the target cause), it can make a
treatment look better unless there are equal numbers of such
deaths in every branch of the study.
Sackett adds, "The loss to follow-up of 10 per cent of the
original inception cohort is cause for concern. If 20 per cent or
more are not accounted for, the results ... are probably not
worth reading." (On which Dr. Thomas Vogt comments,
"Generally true, but utterly dependent on the situation.")
Professor Warren Burkett of the University of Texas adds a
few related and pointed questions: "Does the paper or publication
contain all results of all experiments? Support for a hypothesis has
sometimes been made to seem stronger by selective reporting
... including only the data that most closely fit the theory. To
what extent has the data offered ... been smoothed from the raw
data? ... It is not unknown for researchers to clip and round
data to make them fit [their] predicted results" (italics mine).
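Sackett's warning about dropouts can be checked with simple arithmetic. The sketch below is hypothetical throughout: the function name and trial numbers are invented, and counting every dropout as a failure is a deliberately pessimistic assumption, not a rule. It compares the success rate among those who completed the study with the rate among everyone enrolled, then applies Sackett's 10 and 20 percent rules of thumb, which, as Vogt notes, depend on the situation.

```python
def follow_up_report(enrolled, completed, successes):
    """Completers-only success rate vs. a worst-case rate that counts
    every dropout as a failure, plus Sackett's rule-of-thumb verdict."""
    lost_fraction = (enrolled - completed) / enrolled
    completers_rate = successes / completed
    all_enrolled_rate = successes / enrolled  # assumes every dropout did badly
    if lost_fraction >= 0.20:
        verdict = "20% or more lost: results probably not worth reading"
    elif lost_fraction >= 0.10:
        verdict = "10% or more lost: cause for concern"
    else:
        verdict = "under 10% lost"
    return completers_rate, all_enrolled_rate, lost_fraction, verdict

if __name__ == "__main__":
    # Hypothetical trial: 100 enrolled, 80 stayed to the end, 60 did well.
    print(follow_up_report(100, 80, 60))
```

Here a claimed 75 percent success rate among completers could be as low as 60 percent of everyone who started.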
How long was the study's follow-up? How long do patients ordinarily
survive with this disease? Were your patients followed long enough to
really know the outcomes, good or bad? And: How thorough was the follow-up?
In one report on amebiasis (a disease caused by an amoeba) the diagnosis was
made by finding the amoeba in one of three consecutive stools,
but a cure was declared after observing just one negative stool.
"It does pay to read with care," a medical professor observes.
Could your results have occurred just by chance? Have any statistical
tests been applied to test this?
Did you calculate a P value? Was it favorable: .05 or less? (Reported
as <.05; see Chapter 3.) P values and confidence statements
need not be regarded as straitjackets, but like jury verdicts,
they indicate reasonable doubt or reasonable certainty.
Remember that positive findings are more likely to be reported
and published than negative findings. Remember that a
favorable-sounding P value of <.05 means only that there is
just 1 chance in 20, or a 5 percent probability, that the statistics
could have come out this way by pure chance when there was
actually no effect. So about 1 in every 20 comparisons of treatments
or exposures that truly have no effect will still come out
"statistically significant": a misleading false positive.
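The 1-in-20 figure describes comparisons in which there truly is no effect, and a small simulation makes the point concrete. The sketch assumes a simple two-group study with a yes-or-no outcome, analyzed by a standard two-proportion z test (normal approximation); the group size of 200, the 30 percent underlying rate, and the function names are hypothetical. Every simulated study compares two groups drawn from the same population, yet roughly 5 percent of them still reach P < .05.

```python
import math
import random

def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided P value for a difference between two proportions
    (pooled z test, normal approximation)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # both groups all-yes or all-no: no difference to test
    z = (x1 / n1 - x2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= |z|) for a standard normal Z

def false_positive_share(n_studies=2000, n=200, true_rate=0.3, seed=1):
    """Simulate studies in which treatment truly has NO effect (both groups
    share true_rate) and return the share that still reach P < .05."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        x1 = sum(rng.random() < true_rate for _ in range(n))
        x2 = sum(rng.random() < true_rate for _ in range(n))
        if two_proportion_p_value(x1, n, x2, n) < 0.05:
            hits += 1
    return hits / n_studies

if __name__ == "__main__":
    print(f"share of no-effect studies with P < .05: {false_positive_share():.3f}")
```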
There are also ways and ways of arriving at P values. For
example, an investigator may choose to report one of several end
points: death, length of survival, blood pressure, other measurements,
or just the patient's condition on leaving the hospital. All
can be important, but a P value can be misleading if the wrong
one is picked or emphasized.
You might want to ask: Are all the important end points and their
P values reported? Also: Was the test giving the P value the appropriate
test, as planned in your written protocol, or did you finally do more than
one kind of test? (And perhaps report only the best answer?) What
were the other values?
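Why reporting only the best of several end points or tests matters can be seen from a standard formula. If k independent comparisons are each tested at the .05 level and none reflects a real effect, the chance that at least one looks "significant" is 1 - 0.95^k. The sketch below assumes the comparisons are independent; end points in a real trial are usually correlated, so treat the figures as a rough guide. The function name is hypothetical.

```python
def chance_of_false_positive(k, alpha=0.05):
    """Probability that at least one of k independent no-effect
    comparisons reaches significance at level alpha."""
    return 1 - (1 - alpha) ** k

if __name__ == "__main__":
    for k in (1, 3, 5, 10, 20):
        print(f"{k:2d} end points: {chance_of_false_positive(k):.0%} chance of a spurious 'hit'")
```

With five end points the chance is about 23 percent; with twenty, about 64 percent.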
Did you collaborate with a statistician in both your design and your
analysis? A statistician's collaboration often may be indicated in a
credit or footnote.
In studies seeking cause and effect, remember that association
is not necessarily causation. Rutgers' Dr. Michael Greenberg
reminds us, "Mathematical methods cannot establish proof of
cause and effect. They can indicate the probability that a relationship
occurred by chance, can sometimes quantify the existing
relationship between actions and effects, and can under the
best circumstances be used to predict the impact of actions even
if the complex phenomena driving them are not understood.
... View mathematical associations with a healthy degree of
skepticism."
A true experiment, controlling all variables, can sometimes
prove cause and effect almost surely. This is easier in physics
and chemistry than in human biology. When, then, does a close
association in an observational study (rather than a controlled
experiment) indicate causation? There are several possible criteria
that you can ask about:
Is the association consistent? Are similar results usually found in
different places and by different research methods?
How strong is the association? If risk is an appropriate way of
describing a particular situation: What is the relative risk, or the risk
ratio? The word "strong" is used here in its mathematical sense.
It mainly means the magnitude of an effect or risk, the odds favoring
the outcome of interest versus no such outcome.
A relative risk, or risk ratio, compares two rates by dividing
one by the other. In an American Cancer Society smoking study
(see page 46), the lung cancer mortality rate in nonsmokers aged
55 to 69 was 19 per 100,000 per year; the rate in smokers was
188 per 100,000. Since 188 divided by 19 equals 9.89, the
smokers were about 9.9 times more likely to die from lung
cancer: their relative risk was 9.9. That's strong!
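The same calculation in code, as a minimal sketch: the function name is hypothetical, and the inputs are the two death rates quoted in the smoking example, which must be expressed in the same units (here, deaths per 100,000 per year).

```python
def relative_risk(rate_exposed, rate_unexposed):
    """Risk ratio: rate in the exposed group divided by rate in the
    unexposed group (both in the same units)."""
    if rate_unexposed <= 0:
        raise ValueError("the comparison rate must be positive")
    return rate_exposed / rate_unexposed

if __name__ == "__main__":
    # Lung cancer deaths per 100,000 per year, smokers vs. nonsmokers aged 55 to 69.
    rr = relative_risk(188, 19)
    print(f"relative risk = {rr:.2f}")  # relative risk = 9.89
```

A relative risk of 1 would mean no difference between the two groups.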
Is there an impressive dose-response, or cause-and-effect, curve: a
curve or gradient that shows that the greater the exposure to the
agent, or cause, the greater the effect? Heavy smokers are indeed
at greater risk than moderate smokers, and moderate
smokers at greater risk than light smokers. (In some cases, and this
is an unsettled matter, there may be a threshold effect, an effect
only after some minimum dose.)
Another way of asking about risk and response: What is the
correlation coefficient, the extent to which a set of measurements of
the association is linear? A perfect linear relationship, or correlation,
between two observations or variables would show up as a
straight, steadily rising set of data points: in everyday language,
a straight line on a graph. A perfect positive correlation, or
linear relationship, is given the value +1; +.5 would be a lesser
but still interesting relationship; -1 or any negative figure
indicates an inverse, or negative, relationship, such as a runner's
speed going down as his weight goes up. A correlation of zero
means no consistent linear association.
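A minimal Python sketch of how the correlation coefficient (here Pearson's r, the most common version) is computed. The function pearson_r and the small data sets are made up for illustration. The third case is a reminder that a correlation near zero rules out only a straight-line trend: the points follow a perfect U-shaped curve, yet r is 0.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: +1 for a perfect rising line, -1 for a
    perfect falling line, 0 when there is no straight-line trend."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))   # rising straight line: 1.0
print(pearson_r(x, [10, 8, 6, 4, 2]))   # falling straight line: -1.0
# A perfect curve (y = x squared) with no straight-line trend: r is 0.
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))
```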
How specific is the association? Does a supposed cause lead to
many supposed effects? Or does an effect depend on many supposed
causes? Such associations are less specific, and thus more
suspect, until positive evidence piles up. Smoking indeed causes
many effects. A lung disease, asbestosis, is most common when
there is exposure to both asbestos and cigarette smoke.
Does the supposed cause precede the effect? Is a supposed biological
association epidemiologically plausible? One strong argument for a
cause-and-effect relationship between high consumption of
saturated fats and cholesterol and coronary heart disease is that
populations on such diets generally develop more such disease
than those on leaner diets.
Does the association make biological sense? Does it agree with
current biological and physiological knowledge? You can't follow
this test out the window. Much biological fact is ill understood.
Also, Mosteller warns, "Someone nearly always will claim to see a
[biological, or physiological] association. But the people who
know the most may not be willing to."
Finally, look for the real why. Ask: Are there other possible
explanations? Did you look for other explanations, confounders or
confounding variables, that may be producing or helping produce the
association? Sometimes we read that married people live longer
than singles. Does marriage really increase life span, or may
medical or other problems make some people less likely to
marry and also die sooner? Maybe the Dutch thought storks
brought babies because better-off families had more chimneys,
more storks, and more babies.
Did you take steps to control or adjust for other possible explanations?
Did you do a stratified analysis, a breakdown of the data by strata
like sex, race, socioeconomic status, geographical area,
occupation? Men commonly have more bronchitis and cirrhosis of the
liver than women because they drink more. They also have
more heart disease, possibly because they've smoked longer,
possibly because some hormones protect women. Only stratified
analyses will bring out such differences.
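To see why stratification matters, consider a Python sketch with invented numbers (hypothetical counts, not from any study), stratified by smoking for the sake of the example. Within each smoking stratum, men and women have identical heart disease rates, yet the crude, unstratified rates differ because more of the men smoke.

```python
# Hypothetical counts, invented for illustration: (cases, people) in each stratum.
data = {
    "men": {"smokers": (60, 600), "nonsmokers": (20, 400)},
    "women": {"smokers": (20, 200), "nonsmokers": (40, 800)},
}


def crude_rate(strata):
    """Overall rate for a group, ignoring the strata entirely."""
    cases = sum(c for c, _ in strata.values())
    people = sum(n for _, n in strata.values())
    return cases / people


for sex, strata in data.items():
    print(f"{sex}: crude rate {crude_rate(strata):.0%}")
    for stratum, (cases, people) in strata.items():
        # Stratum-specific rate: compares like with like.
        print(f"  {stratum}: {cases / people:.0%}")
```

The crude rates come out 8 percent for men and 6 percent for women, but within each stratum both sexes show 10 percent for smokers and 5 percent for nonsmokers. In this made-up case, the apparent sex difference is entirely a smoking difference.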
Did you do an analysis (a regression or some other form of
multivariate analysis) to try to identify the important variable or
variables? Such analyses can often reveal the strongest associations.
They can also be misused, and they are not always needed or appropriate.
Some sophisticated questions, when appropriate: How many such
analyses did you have to run to decide on the appropriate one? Sometimes
the more analyses, the worse the study. How many variables did you
consider? How many of these did you wind up reporting? If an
investigator tries enough variables in a kind of statistical fishing
expedition, he or she is almost bound to find something, true or
untrue.
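How likely is a fishing expedition to catch something? A rough Python sketch, assuming each variable is tested independently at the conventional 5 percent significance level and that no real effect exists anywhere. Real variables are often correlated, so treat the figures as illustrative; the function name is our own.

```python
def chance_of_false_alarm(num_tests, alpha=0.05):
    """Probability that at least one of num_tests independent tests
    comes out 'significant' at level alpha when nothing real is there."""
    return 1 - (1 - alpha) ** num_tests


for k in (1, 5, 20, 50):
    print(f"{k:>2} variables tried: {chance_of_false_alarm(k):.0%} "
          "chance of at least one spurious finding")
```

With 20 variables the chance of at least one spurious "finding" is about 64 percent. One common, conservative safeguard is the Bonferroni correction: testing each of k variables at 0.05 divided by k instead of 0.05.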
In cause-and-effect and other studies, ask: Has there been any
reanalysis of the data? "Results, if possible, should be
method-independent," Greenberg believes. "You should recalculate and
see if the results hold up."
A word of caution: Questions about multivariate analyses
or reanalyses can be tricky. Whether or not to do one kind of
analysis or reanalysis or none at all is often a matter of dispute
among authorities. Launch the subject with some humility. A
reasoned answer, affirmative or negative, may tell you more
than the answer's precise content.
In studies of medical treatments or preventives: How did you
know or decide when your patients were cured or improved? Were there
explicit, objective outcome criteria? That is, were there firm
measurements or test results rather than physicians' observations in
interviews, physical examinations, or chart reviews, all techniques
highly subject to observer variation and inaccuracy? If
improvement or relief from pain (a particularly soft, or hard to
quantify, outcome measure) had to be judged by observers:
Was there some systematic way of making an assessment?
If two or more groups were compared for survival, was their starting
point the same at onset? At diagnosis? At start of treatment? Were they
judged by the same disease definitions at the start and the same measures of
severity and outcome?
Did the intervention have the good results that were intended? Has
there been an evaluation to see whether it was a useful result?
Investigators often report that a drug or other measure has
lowered blood cholesterol levels. Fine, but were they able to
show that it reduced the number of heart attacks? Or was
reduction of a supposed risk factor itself taken to mean the hoped-for
outcome? That may often be necessary, but the issue should be
discussed.
Investigators once reported that a new heart drug reduced
the number of recurrent myocardial infarctions (heart attacks),
fatal and nonfatal. But total mortality for all causes was higher
in the treated group than in a placebo group.
Public health officials may announce the success of a cam-
paign to take high blood pressure measurements: X number of
people were found to be hypertensive and were referred to their
doctors. But how many went to their doctors? How many of
those received optimum treatment? Were their blood pressures
reduced? (If they were, the evidence is strong that they should
suffer fewer strokes.)
In short: What was the bottom line? Did you really do any good?
To whom do your results apply? Can they be generalized to a larger
population? Are your patients like the average doctor's patients? Is there any
basis in these findings for any patient to ask his or her doctor for a change in
treatment? Clinic populations, hospital populations, and the
"worst cases" are not necessarily typical of patients in general,
and improper generalization is unfortunately common in the
medical literature.
Again and again, in many of the cases cited in this chapter,
ask: Do other studies back you up? Are your results consistent with other
clinical and experimental findings? Have your results been replicated or
confirmed or supported by other studies? Or are you alone in being able
to get these results?
Virtually no single study proves anything. Two or 4 or 15
studies add credence, especially if the diagnostic and outcome
criteria and the people studied are similar. Consistency of results
in humans, animals, and laboratory tests also adds credence.
One scientist warns, however, "You have to be wary about a
grab bag of studies with different populations and different
circumstances." To which Harvard's Mosteller adds, "Yes, be wary,
but consistency across such differences cheers me up." And Dr.
John Bailar tells us that, despite possible pitfalls, "meta-analysis of
several low power reports" (that is, statistically analyzing and
integrating their results) "may come to stronger conclusions
than any one of them alone" (italics mine).
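One common way of statistically analyzing and integrating results is inverse-variance, or fixed-effect, pooling; whether Bailar had this particular method in mind is our assumption. The Python sketch below uses three invented studies (hypothetical log relative risks and standard errors), none individually significant at the usual z cutoff of 1.96, whose pooled estimate is.

```python
import math


def pooled_fixed_effect(estimates, std_errors):
    """Inverse-variance fixed-effect pooling.

    Each study is weighted by 1 / SE^2, so more precise studies count more.
    Returns (pooled estimate, pooled standard error).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * est for w, est in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


# Hypothetical log relative risks and standard errors from three small
# studies (invented numbers, for illustration only).
log_rr = [0.30, 0.25, 0.35]
se = [0.20, 0.20, 0.20]

for est, s in zip(log_rr, se):
    print(f"single study: z = {est / s:.2f}")  # each below 1.96

pooled, pooled_se = pooled_fixed_effect(log_rr, se)
print(f"pooled: estimate = {pooled:.2f}, z = {pooled / pooled_se:.2f}")
```

Fixed-effect pooling assumes every study measures the same underlying effect. For the "grab bag" of differing studies the scientist warns about, a random-effects model is the usual alternative.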
Mostly just good-sense questions? Of course. Some of the
most important questions of all for a reporter to ponder are
these: What do I think? Do the conclusions make sense to me? Do the
data really justify the conclusions? If this person has extrapolated
beyond the evidence, has he or she explained why and
made sense?*
*Frederick Mosteller disagrees with my occasional reference to good sense or
common sense. If something is a common-sense idea, he says, "we all would have
thought of it. So it must be uncommon sense after all."
Does the investigator frankly document or discuss the possible biases
and flaws in the study? A good scientific paper should do so. Does
the investigator admit that the conclusion may be tentative or equivocal? Dr.
Robert Boruch of Northwestern University says, "It requires
audacity and some courage to say, 'I don't know.'" Do the authors
use qualifying phrases? If such phrases are important, we are
bound to include them in any responsible story.
Ask the investigators themselves: How much weight should
your work be given? Is it really firm? And how important? An
experienced science reporter says, "I have found that good
researchers generally have an honest and proportionate view of their
own work's importance." But there are many exceptions.
Ask others in the same field: How do other informed people
regard this report and these investigators? Are they speaking in their own
area of expertise, or have they shown real mastery if they have ventured
outside it? Have their past results generally held up? And what are some
good questions I can ask them? True, a lot of brilliant and original
work has been pooh-poohed for a time by others. Still, scientists
survive only by eventually convincing their colleagues.
More formally: Has there been a review of the data and conclusions
by any disinterested parties? Some major clinical studies are
reviewed by independent second parties or committees. Reports
of the National Academy of Sciences must pass muster by a
review committee.
Has there been peer review of the material? That is, has it been
examined by referees who were sent the article by a journal
editor?
And, a very important question: Has the work been published
or accepted by a reputable journal? If not, why not? The New England
Journal of Medicine prints only 15 percent of the papers submitted
to it (many, of course, are rejected because they are not of
enough interest to the journal's readers). Many of these papers have
been given at medical or scientific meetings, yet do not pass peer
reviewers' or the editors' muster. Most are eventually published
elsewhere, many in good journals. But there are journals and journals.
In science as a whole, including biology and often basic
medical sciences, Science and the British Nature are indispensable.
In general medicine and clinical science at the physician's level,
the best, most useful journals are probably New England Journal
of Medicine, Journal of the American Medical Association, Annals of
Internal Medicine, Canadian Medical Association Journal, Journal of
Clinical Investigation, and the British Lancet and British Medical
Journal. There are many equally good specialty journals as well as
mediocre ones. In epidemiology, three good sources are American
Journal of Epidemiology, Journal of Chronic Diseases, and Preventive
Medicine.
Ask people in any field: What are the most reliable journals,
those where you would want your work published?
Some of the most valuable journals to a medical reporter
are not journals of original publication but review publications
like Family Practice and Hospital Practice, which mainly print
summary articles for practitioners. With some strong exceptions, the
free-circulation (also known as controlled-circulation) journals
and medical magazines, which depend wholly on advertising for
revenue, are not as rigorously screened as the traditional
journals. They are often on top of the news, however. All
journals print clinkers sometimes. "Scientific journals are
records of work, not of revealed truth," says the New England
Journal's Dr. Arnold Relman.
Read the entire journal article yourself, if there is one. Ask
the investigator for a copy or phone the journal. Or, assuming
the article has already been published, look for it at a medical
library, which can be found at any medical college, most good
hospitals, and the headquarters of many county medical
societies. Too many news releases tout articles that read far more
conservatively than the PR version. Many scientists go much
further in interviews or news conferences than they are willing
to go in their articles. A reporter asked a scientist, "Does peer
review of an article put you at ease?" He said, "It should help
put you at greater ease, but nothing puts me at ease until I've
read the article."
Most reporters can't be scientific referees, but when you read
an article, look for the following:
A credit or footnote indicating collaboration with a
statistician and a paragraph describing the method of statistical
analysis and its outcomes, such as P value or confidence level, power
to detect treatment effects, and so on. If they're in place, you can
at least assume that some effort was made to apply the rigors of
statistical analysis. If they're missing, should you beware?
Sometimes. Sometimes the statistician is a coauthor whose specialty
isn't identified. And some investigators are well versed in
statistics.
Tables and figures that tell the same story as the conclu-
sions. Sometimes they don't. One statistician told reporters,
"Don't assume that someone can interpret his own data. You
may do better." And "muddle around in the footnotes and
appendices," Mosteller advises. "You might find a few horrors.
That's how people found out that a much publicized study of
public and private schools included only about 12 private,
nonparochial schools."
Other things described in this chapter, such as the protocol
and study design, the criteria for admitting and randomizing
subjects, the therapy actually received (in contrast to that
planned in the protocol), blinding, complications, loss to
follow-up, follow-up time, and any discussion of reservations or
weaknesses.
Ask, when appropriate: Where did the money to support the study
come from? Many honest investigators are financed by companies
that may profit from the outcome. So are some dishonest or
self-deluding investigators. But the peddler of a biased point of view
is as likely to be an antiestablishment crusader (or an academic
ladder-climber) as a corporate darling. Perhaps the best question
to ask yourself is: Is this investigator a scientist or a
salesman? In any case, the public should know any pertinent
connections.
"What proportion of papers will satisfy [all] the requirements
for scientific proof and clinical applicability?" Sackett
writes. "Not very many. ... After all, there are only a handful
of ways to do a study properly but a thousand ways to do it
wrong."
Despite impeccable design, some studies yield answers that
turn out to be wrong. Some fail for lack of understanding of
physiology and disease. Even the soundest studies may provoke
controversy. No study settles anything for all time.
And according to Sackett, some "may meet considerable
resistance when they discredit the only treatment currently
available. ... Clinicians may still elect to do something, even if
it is of no demonstrable benefit. Study results may be rejected,
regardless of their merit, if they threaten the prestige or
livelihood of their audience."
Reporters need to tread a narrow path between believing
everything and believing nothing. Also, we are reporters, and
some of the controversies make important stories.

Tests and Testing
Testing is often the only way to answer our questions, but it doesn't produce
unassailable, universal truths that should be carved on stone tablets. Instead,
testing produces statistics, which must be interpreted.
-Robert Hooke
Who knows when thou mayest be tested?
-Ronald Arthur Hopwood
Do physicians always know what they're doing when they
administer tests? Stanford's Dr. Eugene Robin says many tests
"have not been properly evaluated and in fact may be useless or
harmful." He asks, "Is it common practice in medicine to
perform careful clinical trials before introducing tests that can affect
the welfare of masses of patients? Sadly, the answer is no."
A good test should detect both health and disease and do so
with high accuracy. The measures of the value of a clinical test,
one used for medical diagnosis, are sensitivity and specificity, or,
simply, the ability to avoid false negatives and false positives.
Sensitivity is how well a test identifies a disease or condition in
those who have it: how well it avoids false negatives, or missed
cases. If 100 people with a condition are tested and 90 test
positive, the test's sensitivity is 90 percent. Specificity is how
well a test identifies those who do not have the disease or
condition: how well it rules out false positives, or mistaken
identifications. If 100 healthy people are tested and 90 test
negative, the test's specificity is 90 percent.
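Both definitions reduce to simple fractions of a two-by-two table of test results against true condition. A minimal Python sketch, using the 90-out-of-100 examples above; the function names are our own, not from any medical or statistics library.

```python
def sensitivity(true_pos, false_neg):
    """Share of people WITH the condition whom the test flags as positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of people WITHOUT the condition whom the test clears as negative."""
    return true_neg / (true_neg + false_pos)


# The examples from the text: 100 people with the condition, 90 test positive
# (10 missed cases); 100 healthy people, 90 test negative (10 false alarms).
print(f"sensitivity = {sensitivity(true_pos=90, false_neg=10):.0%}")  # 90%
print(f"specificity = {specificity(true_neg=90, false_pos=10):.0%}")  # 90%
```

Note that each measure uses a different group of people as its denominator: sensitivity only those who have the condition, specificity only those who do not.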
Sensitivity, in short, tells us about disease present. Specificity tells
us about disease absent. A highly unspecific test will produce
many false positives; a highly insensitive test, many false negatives.

---

---
