Philip Morris

News & Numbers: A Guide to Reporting Statistical Claims and Controversies in Health and Other Fields

Date: 19890000/P
Length: 73 pages
2023512442-2023512514

Fields

Author
Cohn, V.
Mosteller, F.
Area
SCIENTIFIC AFFAIRS/BLACK LATERAL OLD S&T
Type
PUBL, PUBLICATION, OTHER
Master ID
2023512310/2514
Related Documents:
Document File
2023512309/2023512515/Ets Issue Binder: Epidemiology
Characteristic
EXTR, EXTRA
MARG, MARGINALIA
Litigation
Okag/Privilege Withdrawn
Okag/Produced
Named Organization
Harvard Univ
Library of Congress
Author (Organization)
Harvard Univ
Washington Post
Site
R529
Date Loaded
24 May 1999
UCSF Legacy ID
zjc02a00

Page 1:
News & Numbers
A GUIDE TO REPORTING STATISTICAL CLAIMS AND CONTROVERSIES IN HEALTH AND OTHER FIELDS
Victor Cohn
SENIOR WRITER AND COLUMNIST, FORMER SCIENCE EDITOR, Washington Post
FOREWORD BY Frederick Mosteller
ROGER I. LEE PROFESSOR EMERITUS OF MATHEMATICAL STATISTICS, Harvard University
A project of the Center for Health Communication, Harvard School of Public Health
IOWA STATE UNIVERSITY PRESS / AMES
Page 2:
© 1989 Victor Cohn. All rights reserved.
Composed by Iowa State University Press. Printed in the United States of America.
No part of this book may be reproduced in any form or by any electronic or mechanical means, including information storage and retrieval systems, without written permission from the publisher, except for brief passages quoted in a review.
First edition, 1989
Library of Congress Cataloging-in-Publication Data
Cohn, Victor, 1919-
News & numbers.
"A project of the Center for Health Communication, Harvard School of Public Health."
1. Public health-Statistics. 2. Environmental health-Statistics. 3. Vital statistics. I. Harvard School of Public Health. Center for Health Communication. II. Title. III. Title: News and numbers.
RA407.C64 1989 362.1'021 88-6807
ISBN 0-8138-1442-1
ISBN 0-8138-1437-3 (pbk.)
Page 3:
A Note to Readers

THE rules of statistics are the rules of good thinking, codified. They apply to any kind of reporting in which numbers - stated or implied - are involved: political reporting, science reporting, business, economics, sports, or whatever.

This guide is an attempt to explain the role, logic, and language of statistics, so we reporters can ask better questions about the many alleged facts or findings that rest, or should rest, on some credible numbers. Because this manual began as a project of the Harvard School of Public Health, the reporting of health and the environment is the major example. But the principles and many of the suggested "questions for reporters" can be used by inquiring reporters in any field. They can help you read a scientific report or listen to the conflicting claims of politicians, environmentalists, physicians, scientists, or almost anyone and weigh and explain them. And the final chapter specifically shows how these principles apply in all areas.

VICTOR COHN
Page 4:
Contents

FOREWORD by Frederick Mosteller, ix
ACKNOWLEDGMENTS, xi
1. Facts and Figures - We Can Do Better, 3
2. The Certainty of Uncertainty, 8
3. The Scientific Way, 12
  Probability, 14
  "Power" and Numbers, 20
  Bias and Confounders, 24
  Variability, 30
4. Studies, Good and Bad, 35
  Experiments versus Seductive Anecdotes, 37
  Clinical Trials, 38
  What Makes a Study Honest? 40
  Epidemiology: Hippocrates to AIDS, 43
5. Questions Reporters Can Ask, 48
6. Tests and Testing, 64
  Drugs and Drug Trials, 68
  Animals as Models for Us, 72
7. Vital Statistics: The Numbers of Life and Health, 74
  Crude Rates versus Rates That Compare, 76
  Other Ways to Compare, 78
  Reporting Hospital Death Rates, 79
  Cancer Rates and Cancer "Cures," 86
  The Important Questions about Cancer, 88
  Shifts, Drifts, and Blips, 96

vii
Page 5:
viii CONTENTS

8. The Statistics of Environment and Risk, 98
  Who's Believable? 107
  Questions to Ask, 108
  Evaluating Environmental Hazards, 116
  Advice from Reporters, 121
9. The Statistics of Politics, Economics, and Democracy, 126
  The State of the Nation's Statistics, 146
  The Bottom Line, 151
WHERE TO LEARN MORE: A Bibliography and Other Sources, 153
NOTES, 157
GLOSSARY/INDEX, 165
Page 6:
Foreword

REPORTERS play an essential role in communicating science to the public. In common with scientists, they desire accuracy. Although health and medicine provide many exciting stories, the biostatistics that scientists must use in their studies presents special problems for reporters. It gives uncommon and misleading meanings to common words like "significant," "consistent," and "power." Mathematical statistics often produces results that are disturbingly counterintuitive, at least at first, to laymen and scientists alike. In vital statistics and epidemiology, definitions often seem arbitrary, and slight changes make considerable differences in the findings.

Science writers often take short courses in special topics such as biostatistics. I have taught in some of these courses and have been impressed by the seriousness of the participants. Nevertheless, they need some of this material in an accessible and permanent form.

Victor Cohn of the Washington Post has prepared this manual to help all reporters cut through these statistical tangles. He wants to give them a guide to the ways that statistics can clarify facts or mystify the reader.

Cohn's book grew out of the Media Project of our Health Science Policy Working Group of the Division of Health Policy Research and Education at Harvard University. I am pleased that faculty members of the Harvard School of Public Health have been able to help him produce this book as a visiting fellow

ix
Page 7:
x FOREWORD

in 1978 and 1984 and as a contributor to the Health Science Policy Working Group.

Through the Media Project, with the help of Jay Winsten, we have also examined sources of pressures on the science writer.1 In the future we want to use what we have learned through many discussions with science writers to advise scientists on their role in the media.

By such efforts, including this book, and by many similar efforts in this and other fields, scientists and writers may gradually upgrade the whole communication system, scientific and journalistic. Thus we may clear the communication channel between science and the public.

FREDERICK MOSTELLER
Page 8:
Acknowledgments

MY main mentor and guide in the preparation of this book has been Dr. Frederick Mosteller, Roger I. Lee professor emeritus of mathematical statistics and former chairman of the departments of Biostatistics and Health Policy and Management, Harvard School of Public Health. He gave so fully of his time, energy, and knowledge that he should be listed as coauthor but for the fact that I sometimes used a journalist's freewheeling approach rather than a statistician's rigor. This makes any misstatements mine.

The project was supported by the Russell Sage Foundation, and by the Council for the Advancement of Science Writing, which pointed the way in holding seminars on statistics for journalists, including the first of its kind in 1964.

I did much of the work as a visiting fellow at the Harvard School of Public Health, where Dr. Jay Winsten, director of the Center for Health Communication, was another indispensable guide, and Drs. John Bailar III, Nan Laird, Philip Lavin, Thomas A. Louis, and Marvin Zelen were valuable helpers. As were Drs. Gary D. Friedman and Thomas M. Vogt of the Kaiser organizations, Michael Greenberg of Rutgers University, and Peter Montague of Princeton University (on all of whose writings I leaned); Lewis Cope of the Minneapolis Star Tribune; Cass Peterson of the Washington Post; and my daughter, Deborah Runkle, no mean statistician.

I also owe thanks to Harvard's Drs. Peter Braun, Harvey
Page 9:
Fineberg, Howard Frazier, Howard Hiatt, William Hsiao, Herb Sherman, and William Stason. And to Drs. Stuart A. Bessler, Syntex Corporation; H. Jack Geiger, City University of New York; Nicole Schupf Geiger, Manhattanville College; Charles Moertel, Mayo Clinic; Arnold Relman, New England Journal of Medicine; Eugene Robin, Stanford University; and Sidney Wolfe, Public Citizen Health Research Group. Also Katherine Wallman, Council of Professional Associations on Federal Statistics; Howard L. Lewis, American Heart Association; Philip Meyer, University of North Carolina; Mildred Spencer Sanes; Earl Ubell, WCBS-TV, New York City; and Philip Hilts, Cristine Russell, and Barry Sussman, Washington Post.

I am indebted to my editors at the Washington Post, particularly Abigail Trafford, Ben Cason, Carol Krucoff, Len Downie, and Howard Simons for their understanding and support.

The work was also aided by the Andrew W. Mellon Foundation. The American Cancer Society, American Heart Association, Commonwealth Fund, Gannett Foundation, Henry J. Kaiser Family Foundation, Mayo Medical Resources, Milbank Memorial Fund, Pew Charitable Trusts, Philip L. Graham Fund, Russell Sage Foundation, and John Cowles, Jr., have contributed to this manual's initial distribution.
Page 10:
Facts and Figures - We Can Do Better

Facts and Figures! Put 'em Down!
- Charles Dickens (in The Chimes)

There are lies, there are damned lies, and there are statistics.
- Disraeli

Almost everyone has heard that "figures don't lie, but liars can figure." We need statistics, but liars give them a bad name, so to be able to tell the liars from the statisticians is crucial.
- Dr. Robert Hooke

WE journalists like to think we deal mainly in facts and ideas, but much of what we report is based on numbers. Politics comes down to votes. Budgets and dollars dominate government. The economy, business, employment, sports - all demand numbers.

The environment, pollutants, toxic chemicals. Again, we see counts and measurements and, most likely, widely varying estimates, some careful, some questionably high or low. An environmentalist says a nuclear power plant or toxic waste dump will cause so many cases of cancer. An industry spokesman denies it. What are their numbers? Where did they get them? How valid are they?

A doctor reports a promising, even exciting new treatment. Is the claim justified or based on a biased or unrepresentative sample? Or too few patients to justify any claim? Science, medicine, technology, the weather, intelligence - all are statistical.
Page 11:
4 CHAPTER 1

Science is observation, experimentation, measurement, and all these involve numbers, whether we reporters pay attention to them or not. Statistics are used or misused even by people who tell us, "I don't believe in statistics," then claim that all of us or most people or many do such and such.

The question for reporters is, how should we not merely repeat such numbers, stated or implied, but also interpret them to deliver the best possible picture of reality?

We can be better reporters if we understand how the best statisticians - the best figurers - figure. And if we learn a few questions to help us separate the wheat from the chaff.

I do not say that telling the truth - describing reality - will then become easy, for we are constantly bombarded with sweeping claims in convincing wrappings, and the disputed subjects are endless. Medical and surgical treatments, radiation, pesticides, nuclear power, the probability of environmental disasters, the side effects of medicines - almost nothing seems settled.

Like it or not, we must wade in. Whether we will it or not, we have in effect become part of the regulatory apparatus. Dr. Peter Montague of Princeton University tells us, "The environmental and toxic situation is so complex, we can't possibly have enough officials to monitor it. Reporters help officials decide where to focus their activity."

"Journalists opened up" the Love Canal toxic waste issue by "independent investigation," according to Cornell University's Dr. Dorothy Nelkin. "The extensive press coverage contributed to investigations that eventually forced the re-staffing of the Environmental Protection Agency and the creation of a national toxic waste disposal program."1

That very coverage, however, may also have stampeded public officials into hasty, ill-conceived studies that left unanswered the crucial question: Did the Love Canal wastes actually cause birth defects and other physical problems?2

The very way we report a medical or environmental controversy can affect the outcome. If we ignore a bad situation, the public may
Page 12:
suffer. If we write "danger," the public may quake. If we write "no danger," the public may be falsely reassured. If we paint an experimental medical treatment too brightly, the public is given false hope.

It is not just what we write, it is what we emphasize. A National Cancer Institute survey indicated that many persons refuse to consider healthy changes in life-style because they think "carcinogens are everywhere in the environment." Such persons probably have read or heard again and again that most cancers are environmentally related, although, in the opinion of most informed scientists, most fatal "environmental" cancers are related mainly to individual behavior, outstandingly smoking, and very possibly diet. By various estimates, perhaps 5 to 15 percent of all cancers are related to exposures to man-made carcinogens - chemicals we have inserted into the workplace, foods, air, and water.3

When it comes to such emotionally charged and complex issues, or when it simply comes to running for page one or making the six o'clock news, the best among us sometimes overstate or understate. Philip Meyer, veteran reporter and author of Precision Journalism, writes, "Journalists who misinterpret statistical data usually tend to err in the direction of overinterpretation.... The reason for this professional bias is self-evident; you usually can't write a snappy lead upholding [the negative]. A story purporting to show that apple pie makes you sterile is more interesting than one that says there is no evidence that apple pie changes your life."4

We also work fast, sometimes too fast, with severe limits on the space or time we may fill. We find it hard to tell editors or news directors, "I haven't had enough time. I don't have the story yet." Even a long-term project or special may be hurriedly done. In a newsroom "long-term" may mean a few weeks. A major southern newspaper had to print a long, front-page retraction after a series of front-page stories alleged that people who worked at or lived near a plutonium plant suffered in excess numbers from a blood disease. "Our reporters obviously had
Page 13:
FACTS AND FIGURES: WE CAN DO BETTER 7

not patently absurd, it may not be the lead you would go for a year later."7

We reporters are also subject to human hope and human fear. A new "cure" comes along, and we want to believe it. A new alarm is sounded, and we too tremble.

Alarms also make news. We too often obey a sardonic maxim: Bad news is good news; good news is no news. Dr. H. Jack Geiger, a respected former science writer and now a professor of medicine, says,

I know I wrote stories in which I explained or interpreted the results wrongly. I wrote stories that didn't have the disclaimers I should have written. I wrote stories under competitive pressure, when it became clear later that I shouldn't have written them. I wrote stories when I hadn't asked - because I didn't know enough to ask - Was your study capable of getting the answers you wanted? Could it be interpreted to say something else? Did you take into account possible confounding factors?8
Page 14:
6 CHAPTER 1

confused statistics and scientific data," the editor admitted. "We did not ask enough questions."5

We tend to oversimplify. We may report, "A study showed that black is white" or "So-and-so announced that ...," when a study merely suggested that there was some evidence that such might be the case. We may slight or omit the fact that a scientist calls a result "preliminary." As scientific unsophisticates, we may confuse a study that merely suggests a hypothesis that should be investigated - very frequently the case - with a study that presents strong and conclusive evidence.

We often omit essential perspective, context, or background. Dr. Thomas Vogt of the Kaiser Permanente Center for Health Research tells of seeing the headline "Heart Attacks From Lack of 'C'" and then, two months later, "People Who Take Vitamin C Increase Their Chances of a Heart Attack."6 Both stories were based on limited, and far from conclusive, animal studies.

Scientists who do poor studies or overstate their results deserve part of the blame. But bad science is no excuse for bad journalism. We tend to rely most on "authorities" who are either most quotable or quickly available or both, and they often tend to be those who get most carried away with their sketchy and unconfirmed but "exciting" data - or have big axes to grind, however lofty their motives. The cautious, unbiased scientist who says, "Our results are inconclusive" or "We don't have enough data yet to make any strong statement" or "I don't know" tends to be omitted or buried someplace down in the story.

We are influenced too by intense and growing competition to tell the story first and tell it most dramatically. I was once asked by a Harvard researcher, "Does competition affect the way you present a story?" I thought and had to answer, "We have to almost overstate. We have to come as close as we can within the boundaries of truth to a dramatic, compelling statement. A weak statement will go no place." Another reporter said, "The fact is, you are going for the strong [lead and story]. And, while
Page 15:
The Certainty of Uncertainty

Too much of the science reporting in the press [blurs] what we're sure of and what we're not very sure of and what is inconclusive. The notion of tentativeness tends to drop out of much reporting.
- Dr. Harvey Brooks

The only trouble with a sure thing is the uncertainty.
- Author unknown

THE first thing to understand about science is that it is almost always uncertain. A scientist, seeking to explain or understand something - be it the behavior of an atom or the effect of the toxic chemicals at a Love Canal - usually proposes a hypothesis, then seeks to test it by experiment or observation. If the evidence is strongly supportive, the hypothesis may then become a theory or at some point even a law, like the law of gravity.

A theory may be so solid that it is generally accepted. Example: the theory that cigarette smoking causes lung cancer, for which almost any reasonable person would say the case has been proved, for all practical purposes. The phrase "for all practical purposes" is important, for scientists, being practical people, must often speak at two levels: the strictly scientific level and the level of ordinary reason that we require for daily guidance.

Example: In June 1985, 16 forensic experts examined the bones that were supposedly those of the "Angel of Death," Dr. Josef Mengele. Dr. Lowell Levine, delegated by the Department of Justice, then said, "The skeleton is that of Josef
Page 16:
Mengele within a reasonable scientific certainty," and Dr. Marcos Segre of the University of Sao Paulo explained, "We deal with the law of probabilities. We are scientists and not magicians." Pushed by reporters' questions - after all, this was an important matter, and what should the public believe? - several of the pathologists said they had "absolutely no doubt" of their findings.1 (Later evidence made the case even stronger.)

But all any scientist can scientifically say - say with certainty in almost any such case - is, there is a very strong probability that such and such is true.

Widely believed theories or conclusions are often proved wholly or partly wrong. "When it comes to almost anything we say," reports Dr. Arnold Relman, editor of the New England Journal of Medicine, "you, the reporter, must realize - and must help the public understand - that we are almost always dealing with an element of uncertainty. Most scientific information is of a probable nature, and we are only talking about probabilities, not certainty. What we are concluding is the best we can do, our best opinion at the moment, and things may be updated in the future."2

Example: Until 1980 the American Cancer Society recommended that women have an annual Pap smear to detect cervical cancer. The recommendation was then changed to every three years for many women, after two initial examinations. Statistics had shown that this would be equally effective.3 The matter is still controversial, and the recommendation has been changed again in the light of new knowledge.

Scientists are often wrong. In science this is not necessarily a failing. When new evidence disproves an old theory, or occasionally shows that some little believed, even kooky notion is right, the scientific method is doing what it should. It is working. The public, and even some reporters and especially editors, have a hard time understanding these sometimes drastic revisions. We all hear the question, Why do they say one thing today and another thing tomorrow? I was once on a radio talk show discussing unsettled medical controversies when a testy
Page 17:
10 CHAPTER 2

listener phoned in to exclaim, "'They say' is a damned liar!"

"They" of course may be different theys who arrive at different conclusions about inconclusive evidence in a thousand areas: the role of fats and cholesterol in the diet, the effects of low-level radioactivity, the cause of the extinction of dinosaurs.

Why so much uncertainty? Science is always a continuing story. Nature is complex, and almost all methods of observation and experiment are imperfect. "There are flaws in all studies," says Harvard's Dr. Marvin Zelen.4 There may be weaknesses, often unavoidable ones, in the way a study is designed or conducted. Observers are subject to human bias and error. Subjects fluctuate. Measurements fluctuate.

Many studies are thus inconclusive, and virtually no single study proves anything. "Fundamentally," writes Dr. Thomas Vogt, "all scientific investigations require confirmation, and until it is forthcoming all results, no matter how sound they may seem, are preliminary."

Medicine, in particular, is full of disagreement and controversy. "No clinical trial is ever perfect," Harvard's Dr. John Bailar observes. Unlike new drugs, medical treatments and tests and surgical operations need not even be subjected to experimental studies before being applied. "Most treatments escape and will continue to escape rigorous evaluation," Bailar says.5

The reasons are many: lack of funds to mount enough trials; lack of enough patients at any one center to mount a meaningful trial; the expense and difficulty of doing multicenter trials; the swift evolution and obsolescence of medical techniques; the fact that, with the best of intentions, medical data - histories, physical examinations, interpretations of tests, descriptions of symptoms and diseases - are notoriously inexact and vary from physician to physician; and the serious ethical obstacles to trying a new procedure when an old one is doing some good, or to experimenting on children, pregnant women, or the mentally ill.

While all studies have flaws, some have more flaws than others. Study after study has found that many articles in the most prestigious medical journals are replete with shaky statistics
Page 18:
and lack of any explanation of such crucial matters as patients' complications and the number of patients lost to follow-up. Papers presented at medical meetings, many of them widely reported by the media, are even less reliable. Many papers are mere progress reports on incomplete studies. Some state tentative results that later collapse. Some are given to draw comment or criticism or get others interested in a provocative but still uncertain finding.6

The upshot, according to Dr. Gary Friedman of the Kaiser organization's Permanente Medical Group: "Much of health care is based on tenuous evidence and incomplete knowledge. ... Seemingly authoritative statements and accepted medical doctrines, perpetuated through textbook and lectures, often turn out to be supported by the most meager of evidence, if any can be found."7

In general, possible risks tend to be underestimated and possible benefits overestimated. For decades surgeons swore that only a radical mastectomy was the treatment for breast cancer. Only recently were clinical trials mounted to show that less drastic treatments seem equally effective. Prefrontal lobotomy, overstrict bed rest, drugs by the carload - medical history is rich in treatments that were given for years without question or statistically rigorous study, only to be proved wrong and discarded.

Occasionally, unscrupulous investigators falsify their results. More often, they may wittingly or unwittingly play down data that contradict their theories, or they may search out statistical methods that give them the results they want. Before ascribing fraud, says Harvard's Dr. Frederick Mosteller, "keep in mind the old saying that most institutions have enough incompetence to explain almost any results."

So some uncertainty almost always prevails. But uncertainty need not stand in the way of good sense. To live - to survive on this globe, to maintain our health, to set public policy, to govern ourselves - we almost always must act on the basis of incomplete or uncertain information. There is a way we can do so.
Page 19:
The Scientific Way

Somehow the wondrous promise of the earth is that there are things beautiful in it, things wondrous and alluring, and by virtue of your trade, you want to understand them.
- Mitchell Feigenbaum, Cornell University physicist and mathematician

The great tragedy of Science - the slaying of a beautiful hypothesis by an ugly fact.
- Thomas Henry Huxley

TO reporters, the world is full of true believers, peddling their "truths." The sincerely misguided and the outright fakers are often highly convincing, also newsy. How can we tell the facts, or the probable facts, from the chaff?

We can borrow from science. We can try to judge all possible claims of fact by the same methods and rules of evidence that scientists use to derive some reasonable guidance in scores of unsettled issues.

As a start, we can ask these questions:

How do you know? Have the claims been subjected to any studies or experiments?

Were the studies acceptable ones, by general agreement? For example: Were they without any substantial bias?

Have results been fairly consistent from study to study?

Have the findings resulted in a consensus among others in the same field? Do at least the majority of informed persons agree? Or should we withhold judgment until there is more evidence?

Always: Are the conclusions backed by believable statistical evidence?

12
Page 20: zjc02a00 Log in for more options!
® ® ® ® THE SCI&VIIFIC WAY 13 And uAet is tke degna of ccriainty or unartuiruy.~ How sure can you be? Obviously, much of statistics involves attitude or policy rather than numbers. And much, at least much of the statistics that reporters can most readily apply,, is good sense.. There are many definitions of statistics as a tool. A few useful ones: The science and art of gathering, analyzing, and interpreting data; a means of deciding whether an effect is real; a way of extracting information from a mass of raw data; a set of mathematical, processes derived from probability ttteory. Statistics can be manipulated by chaiiatans, seif-deluders, and inexpert statisticians. Deciding on the truth of a matter can be difficult for the best statisticians, andsometunes no decision is possible. Uhcertainry will ever rule in some situations and lurk in~ almost all. There are rare situations in which no statistics are needed.. "Edison had it easy," says Dr. Robert Hooke, a statistician and author. "It doesn't take statistics to see that a light has come on."' It did not take statistics to tell 29thrcentury physicians that Mor- tons ether anesthesia permitted painltss surgery or to tell 20th- century physicians that the first antibiotics cured infections that until then had' been highly fatal. Overwheltningly, however, the use of statistics, based on probability, is called the soundest method' of decision making, and the use of large numbers of cases, statistically analyzed, is called the only means for determining the unknown cause of many events. Birth control pills were tested on several hundred women, yet the pills had to be used for several years by millions before it became unequivocally dear that some women would develop heart attacks or strokes. The pills had to be used for some years more before it became dear that the greatest risk was to women who smoked and women over 35. 
The best statisticians, let alone practitioners on the firing line (for example, physicians), often have trouble deciding when a study is adequate or meaningful. Most of us cannot become
statisticians, but we can at least learn that there are studies and studies, and the unadorned claim "We made a study" or "We did an experiment" may not mean much. We can learn to ask more pointed questions if we understand some basic concepts and other facts about scientific studies.
These are some bedrock statistical concepts:
• Probability
• "Power" and numbers
• Bias and confounders
• Variability
Probability
Scientists cope with uncertainty by measuring probabilities. Since all experimental results and all events can be influenced by chance and almost nothing is 100 percent certain in science and medicine and life, probabilities sensibly describe what has happened and should happen in the future under similar conditions. Aristotle said, "The probable is what usually happens," but he might have added that the improbable happens more often than most of us realize.
The accepted numerical expression of probability in evaluating scientific and medical studies is the P (or probability) value. The P value is one of the most important figures a reporter should look for. It is determined by a statistical formula that takes into account the numbers of subjects or events being compared in order to answer the question, could a difference or result this great or greater have occurred by chance alone? By more precise definition, the P value expresses the probability that an observed relationship or effect or result could have seemed to occur by chance if there had actually been no real effect. A low P value means a low probability that this happened, that a medical treatment, for example, might have been declared beneficial when in truth it was not.
Here is why the P value is used to evaluate results. A
scientific investigator first forms a hypothesis. Then he or she commonly sets out to try to disprove it by what is called the null hypothesis: that there is no effect, that nothing will happen. To back the original hypothesis, the results must reject the null hypothesis. The P value, then, is expressed either as an exact number or as <.05, say, or >.05, meaning "less than" or "greater than" a 5 percent probability that nothing has happened, that the observed result could have happened just by chance, or, to use a more elegant statistician's phrase, by random variation.
• By convention, a P value of .05 or less, meaning there are only 5 or fewer chances in 100 that the result could have happened by chance, is most often regarded as low. This value is usually called statistically significant (though sometimes other values are used). The unadorned term "statistically significant" usually implies that P is .05 or less.
• A higher P value, one greater than .05, is usually seen as not statistically significant. The higher the value, the more likely the result is due to chance.
In common language, a low chance of chance alone calling the shots replaces the "it's certain" or "close to certain" of ordinary logic. A strong chance that chance could have ruled replaces "it can't be" or "almost certainly can't be."
Why the number .05 or less? Partly for standardization. People have agreed that this is a good cutoff point for most purposes. And partly out of old friend common sense. Frederick Mosteller tells us that if you toss a coin repeatedly in a college class and after each toss ask the class if there is anything suspicious going on, "hands suddenly go up all over the room" after the fifth head or tail in a row.
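The classroom coin demonstration can be checked by brute force. The short sketch below is not from the book; it assumes a fair coin, and the function name is only illustrative. It lists every equally likely sequence of five tosses and counts the ones that are all heads or all tails.

```python
from fractions import Fraction
from itertools import product

def chance_all_same(n_tosses: int) -> Fraction:
    """Probability that n fair-coin tosses come up all heads or all tails."""
    # Every sequence of H/T of the given length is equally likely for a fair coin.
    outcomes = list(product("HT", repeat=n_tosses))
    # A sequence is "all the same" when it contains only one distinct face.
    same = [seq for seq in outcomes if len(set(seq)) == 1]
    return Fraction(len(same), len(outcomes))

p = chance_all_same(5)
print(p, float(p))  # 1/16 0.0625
```

Two of the 32 possible sequences qualify, which is where the figure of 1 in 16 comes from.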
There happens to be only 1 chance in 16 (.0625, not far from .05, or 5 chances in 100) that five heads or tails in a row will show up in five tosses, "so there is some empirical evidence that the rarity of events in the neighborhood of .05 begins to set people's teeth on edge."
Another common way of reporting probability is to calculate a confidence level, as well as a confidence interval (or confidence
limits or range). This is what happens when a political pollster reports that candidate X would now get 50 percent of the vote and thereby lead candidate Y by 3 percentage points, "with a 3-percentage-point margin of error plus or minus and a 95 percent confidence level." In other words, Mr. or Ms. Pollster is 95 percent confident that X's share of the vote would be someplace between 53 and 47 percent. Similarly, candidate Y's share might be 3 percentage points greater (or less) than the figure predicted. In a close election, that margin of error could obviously turn a predicted defeat into victory. And that sometimes happens.
An important point in looking at the results of political polls (and any other statements of confidence): In the reports we read, the plus or minus 3 (or whatever) percentage points is often omitted, and the pollster merely mentions a "3-point margin of error." This means there is actually a 6-point range within which the truth probably lurks.
The more people who are questioned in a political poll or the larger the number of subjects in a medical study, the greater the chance of a high confidence level and a narrow, and therefore more reassuring, confidence interval.
No matter how reassuring they sound, P values and confidence statements cannot be taken as gospel, for .05 is not a guarantee, just a number. There are several important reasons for this.
• All that P values measure is the probability that the results might have been produced by some sneaky random process. In 20 results where only chance is at work, 1, on the average, will have a reassuring-sounding but misleading P value of <.05. One, in short, may be a false positive. Dr.
Marvin Zelen points out that there may be 6,000 to 10,000 clinical (medical) trials of cancer treatment under way today, and if the conventional value of .05 is adopted as the upper permissible limit for false positives, then every 100 studies with no actual benefit may, on average, produce 5 false-positive results. Hence, we may expect 50 false-positive results, on
average, for every 1,000 trials with no beneficial effects! Zelen in fact has said, "We may now have reached an impasse in cancer chemotherapy in which there are large numbers of false-positive therapies in the clinic," leading physicians down many false paths.
Amazingly, most false positives probably remain undetected. Scientists do not profit much professionally by reporting negative results. Journal editors are not keen on publishing them. Nor are scientists keen on doing costly and time-consuming studies that merely confirm someone else's work, so "confirmatory studies are rare," Zelen reports.
• Statistical significance alone does not mean there is a cause and effect. Correlation or association is not causation. Remember the rooster who thought his crowing made the sun rise? Unless an association is so powerful and so constantly repeated that the case is overwhelming, association is only a clue, meaning more study or confirmation is needed.
To statisticians, incidentally, there is this important difference between correlation and association: Association means there is at least a possible relation between two variables. A correlation is a measure of the association.
• If the number of subjects is too small, an unimpressive P value may simply mean that there were too few subjects to detect something that might have shown an effect in more subjects. Highly "significant" P values can sometimes adorn negligible differences in large samples.
• An impressive P value might also be explained by some other variable or variables (other conditions or associations) not taken into account.
• Statistical significance does not mean biological, clinical (that is, medical), or practical significance, though inexperienced reporters sometimes see or hear the word "significant" and jump to that conclusion, even reporting that the scientists called their study "significant."
Example: A tiny difference between two large groups in mean hemoglobin concentration, or
red blood count (say, 0.1 g/100 mL, or a tenth of a gram per 100 milliliters), may be statistically significant yet medically meaningless.
• Eager scientists can consciously or unconsciously manipulate the P value by failing to adjust for other factors, by choosing to compare different end points in a study (say, condition on leaving the hospital rather than length of survival), or by choosing the way the P value is calculated or reported.
There are several mathematical paths to a P value, such as the chi-square (χ²), t, F, r, and paired t tests. All may be legitimate. But be warned. Dr. David Salsburg of Pfizer, Inc., has written in the American Statistician of the unscrupulous practitioner who "engages in a ritual known as 'hunting for P values'" and finds ways to modify the original data to "produce a rich collection of small P values" even if those that result from simply comparing two treatments "never reach the magical .05."
"If you look hard enough through your data," contributes an investigator at a major medical center, "if you do enough subset analyses, if you go through 20 subsets, you can find one" (say, "the effect of chemotherapy on premenopausal women with two to five lymph nodes") "with a P value less than .05. And people do this."
"Statistical tests provide a basis for probability statements," writes Dr. John Bailar, "only when the hypothesis is fully developed before the data are examined.... If even the briefest glance at a study's results moves the investigator to consider a hypothesis not formulated before the study was started, that glance destroys the probability value of the evidence at hand." (At the same time, Bailar adds, "review of data for unexpected clues ... can be an immensely fruitful source of ideas" for new hypotheses "that can be tested in the correct way." And occasionally "findings may be so striking that independent confirmation ...
is superfluous.")
A rather sophisticated (and possibly touchy) line of questioning that some reporters might want to try if they're skeptical: How did you arrive at your P value? Did you use the test planned in
advance in your protocol or study design, or did you apply several tests, then report the best-sounding one? And you may think of other questions.
The laws of probability also teach us to expect some unusual, even impossible-sounding events.
We've all taken a trip to New York or London or someplace and bumped into someone from home. The chance of that? I don't know, but if you and I tossed for a drink every day after work, the chance that I would ever win 10 times in a row is 1 in 1,024. Yet I would probably do so sometime in a four- or five-year period. What I like to call the Law of Unusual Events (statisticians call it the Law of Small Probabilities) tells us that a few people with apparently fatal illnesses will inexplicably recover, there will be some amazing clusters of cases of cancer or birth defects that will have no common cause, and I may once in a great while bump into a friend far from home.
In a large enough population such coincidences are not unusual. They are the rule. They produce striking anecdotes and often striking news stories. In the medical world they produce unreliable, though often cited, testimonial or anecdotal evidence. "The world is large," Vogt notes, "and one can find a large number of people to whom the most bizarre events have occurred. They all have personal explanations. The vast majority are wrong."
"We [reporters] are overly susceptible to anecdotal evidence," Philip Meyer writes. "Anecdotes make good reading, and we are right to use them.... But we often forget to remind our readers, and ourselves, of the folly of generalizing from a few interesting cases.... The statistic is hard to remember. The success stories are not."
A statistic to ask about is the denominator: the number of people or, a statistician would say, the population or domain, in whom such an event might happen.
Zelen cites this example: The chance of any youngster between ages five and nine developing leukemia is 3 in 100,000 per year. In a school with 100
children of this age group, we would expect only 3 cases in 100 years. But in this nation with thousands of schools, we would occasionally (such is chance) find schools with 3 or more cases in a single year. "Then one is faced with the problem of interpretation," Zelen says. "Is this one of those rare events that is surely going to be observed? Or is it due to some causal factor?"
A reporter in this instance might ask a statistician at the National Cancer Institute or a medical center, What is the chance of such an event in such a population? How many similar unusual events are probably never reported?
"Power" and Numbers
This gets us to another statistical concept: power. Statistically, "power" means the probability of finding something if it's there. Example: Given that there is a true effect, say a difference between two medical treatments or an increase in cancer caused by a toxin in a group of workers, how likely are we to find it?
Sample size confers power. Statisticians say, "Funny things can happen in small samples without meaning very much" ... "There is no probability until the sample size is there" ... "Large numbers confer power" ... "Large numbers at least make us sit up and take notice."
All this concern about sample size can also be expressed as the law of large numbers, which says that as the number of cases increases, the probable truth of a conclusion or forecast increases. The validity (truth or accuracy) and reliability (reproducibility) of the statistics begin to converge on the truth. We already learned this when we talked about probability.
*There is another unrelated use of the word "power." Scientists commonly speak of increasing or "raising" some quantity by a power of 2 or 3 or 100 or whatever. "Power" here means the product you get when you multiply a number by itself one or more times. Thus, in 2 x 2 = 4, 4 is the second power of 2, or to put it another way, there are two 2's in your equation.
This is commonly written 2² and known as 2 to the second power or just 2 to the second. In 2 x 2 x 2 = 8, 2 has been raised to the third power. When you think about 2¹⁰, you see the need for the shorthand.
But by thinking of power as statisticians do (as a function of both sample size and the accuracy of measurement, since that too affects the probability of finding something) we can see that if the number of treated patients is small in a medical study, a shift from success to failure in only a few patients could dramatically decrease the success rate.
If six patients have been treated with a 50 percent success rate, the shift to the failure column of just one would cut the success rate to 33 percent. And the total number is so small in any case that the result has little reliability. The result might be valid or accurate, but it would not be generalizable; it would not have reliability until confirmed by careful studies in larger samples. The larger the sample, and assuming there have been no fatal biases or other flaws, the more confidence a statistician would have in the result.
One canny science reporter, Lewis Cope, says,
I have my own "rule of two." If someone makes some numerical claim, I look at the numbers, then see how much I might change the finding by adding or subtracting two from any of the figures. For example, someone says there are five cases of cancer in a community. Would it seem meaningful if there were three?
Or if there were eight cases this year but four the year before (a 100 percent increase) I ask myself, "If I add two cases to last year's total and subtract two from this year's, is there a chance things haven't changed, except by chance?" This approach will never supplant refined analysis. But by playing around with the numbers this way (I sometimes try three instead of two) a reporter can often spot a potential problem or error.
A statistician says, "This can help with small numbers but not large ones." Mosteller contributes "a little trick I use a lot on counts of any size." He explains, "Let's say some political unit has 10,000 crimes or deaths or accidents this year. Has something new happened?
The minimum standard deviation [see
page 33] for a number like that is 100, that is, the square root of the original number. That means the number may vary by a minimum of 200 every year without even considering growth, the business cycle, or any other effect. This will supplement your reporter's approach."
Looking for error in reported results, statisticians try to spot both false positives and false negatives. The false positive (or Type I or alpha error in statistical language you may see) is to find a result or effect where there is none. The false negative (or Type II or beta error) is to miss an effect where there is one. The latter is particularly common when there are small numbers.
"There are some very well conducted studies with small numbers, even five patients, in which the results are so clear-cut that you don't have to worry about power," says Dr. Relman. "You still have to worry about applicability to a larger population, but you don't have to doubt that there was an effect. When results are negative, however, you have to ask, How large would the effect have to be to be discovered?"
Many scientific and medical studies are underpowered; that is, they include too few cases. "Whenever you see a negative result," another scientist says, "you should ask, What is the power? What was the chance of finding the result if there was one?" One study found that an astonishing 70 percent of 71 well-regarded clinical trials that reported no effect had too few patients to show a 25 percent difference in outcome. Half of the trials could not have detected a 50 percent difference.
A statistician scanned an article on colon cancer in a leading journal. "If you read the article carefully," he said, "you will see that if one treatment was better than the other (if it would increase median survival by 50 percent, from five to seven and a half years, say) they had only a 60 percent chance of finding it out. That's little better than tossing a coin!"
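To make "the chance of finding the result if there was one" concrete, here is a rough sketch of how power might be approximated for a simple trial comparing two success rates. It is not the method used in any study quoted here: it assumes equal-sized groups, a two-sided test at the .05 level, and the usual normal approximation, and the function name and the example success rates are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist

def power_two_proportions(p1: float, p2: float, n_per_group: int,
                          alpha: float = 0.05) -> float:
    """Approximate power of a two-sided z-test comparing two proportions.

    p1 and p2 are the assumed true success rates (each strictly between 0 and 1).
    The small chance of a "significant" result in the wrong direction is ignored.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)  # about 1.96 when alpha is .05
    p_bar = (p1 + p2) / 2
    # Standard error of the observed difference if there were no real effect
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    # Standard error of the observed difference under the assumed true rates
    se_alt = sqrt(p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group)
    # Chance that the observed difference clears the significance threshold
    return 1 - z.cdf((z_crit * se_null - abs(p1 - p2)) / se_alt)

# Hypothetical example: a true improvement from 30 to 45 percent success.
for n in (25, 100, 400):
    print(n, round(power_two_proportions(0.30, 0.45, n), 2))
```

Under these assumptions the same real difference is usually missed with 25 patients per group and almost always found with 400, which is the sense in which sample size confers power.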
The weak power of that study would be expressed numerically as .6, or 60 percent. Scan an article's fine print or footnotes, and you will sometimes find such a power statement. Most
authors still don't report one, but the practice is growing, especially when results are negative.
How large is a large enough sample? One statistician calculated that a trial has to have 50 patients before there is even a 30 percent chance of finding a 50 percent difference in results.
Sometimes large populations indeed are needed. If some kind of cancer usually strikes 3 people per 2,000, and you suspect that the rate is quadrupled in people exposed to substance X, you would have to study 4,000 people for the observed excess rate to have a 95 percent chance of reaching statistical significance. The likelihood that a 30-to-39-year-old woman will suffer a myocardial infarction, or heart attack, while taking an oral contraceptive is about 1 in 18,000 per year. To be 95 percent sure of observing at least one such event in a one-year trial, you would have to observe nearly 54,000 women.
Even the lack of an effect (statistically sometimes called a zero numerator) can be a trap. Say someone reports, "We have treated 14 leukemic boys for five years with no resulting testicular dysfunction"; that is, zero abnormalities in 14. The question remains, how many cases would they have had to treat to have any real chance of seeing an effect? The probability of an effect may be small yet highly important to know about.
All this means you must often ask, What's your denominator? What's the size of your population?* A disease rate of 10 percent in 20 individuals may not mean much. A 10 percent rate in 200 persons would be more impressive. A rate is only a figure. Always try to get both the numerator and the denominator.
The most important rule of all about any numbers: Ask for them. When anyone makes an assertion that should include numbers and fails to give them, when anyone says that most people, or even X percent, do such and such, you should ask,
*And know that to a statistician a population does not necessarily mean a group of people.
Statistically, a population is any group or collection of pertinent units (units with one or more pertinent characteristics in common): people, events, objects, records, test scores, or physiological values (like blood pressure readings). Statisticians also use the term universe for a whole group of people or units under study.
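The oral-contraceptive figure and the zero-numerator trap discussed on this page can both be worked out with a few lines of arithmetic. The sketch below is an illustration rather than the book's own calculation: it assumes each person's outcome is independent and carries the same risk, and the function names are invented here.

```python
import math

def n_for_at_least_one(p_event: float, confidence: float = 0.95) -> int:
    """Smallest number of subjects giving at least the stated chance of
    seeing one or more events, when each subject's risk is p_event."""
    # P(no events in n subjects) = (1 - p_event) ** n; solve for n.
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_event))

def upper_bound_zero_events(n: int, confidence: float = 0.95) -> float:
    """Largest per-subject risk still compatible, at the stated confidence,
    with observing zero events among n subjects."""
    return 1 - (1 - confidence) ** (1 / n)

# Heart attack risk of about 1 in 18,000 per year, observed for one year.
print(n_for_at_least_one(1 / 18_000))

# Zero abnormalities among 14 treated boys.
print(round(upper_bound_zero_events(14), 3))
```

The first result lands a little under 54,000, in line with the figure quoted in the text. The second suggests that a true risk of nearly one in five would still be compatible with seeing no cases at all among 14 patients, which is why a zero numerator on a small denominator proves so little.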
What are your numbers? After all, some researchers reportedly announced a new treatment for a disease of chickens by saying, "33.3 percent were cured, 33.3 percent died, and the other one got away."
Bias and Confounders
One scientist once said that lefties are overrepresented among baseball's heavy hitters. He saw this as "a possible result of their hemispheric lateralization, the relative roles of the two sides of the brain." A critic who had seen more ball games said some simpler covariables could explain the difference. When they swing, left-handed hitters are already on the move toward first base. And most pitchers are right-handers who throw most often to right-handed hitters.
Scientist A was apparently guilty of bias, meaning the introduction of spurious associations and error by failing to consider other influential factors. The other factors may be called covariables, covariates, intervening or contributing variables, confounding variables, or confounders. A simpler term may be "other explanations."
Statisticians call bias "the most serious and pervasive problem in the interpretation of data from clinical trials" ... "the central issue of epidemiological research" ... "the most common cause of unreliable data."
Able and conscientious scientists try to eliminate biases or account for them in some way. But not everybody who makes a scientific, medical, or environmental claim is that skilled. Or that honest. Or that all-powerful. Some biases are unavoidable by the very difficulty of much research, and the most insidious biases of all, says one statistician, are "those we don't know exist."
Some biases may be uncovered by assiduous investigation. A father noticed that every time one of his 11 kids dropped a piece of bread on the floor, it landed with the buttered side up. "This utterly defies the laws of chance," he exclaimed. Close examination disclosed the cause: The kids were buttering their bread on both sides.
I told this story to one statistician, who said, "I was once called about a person who had won first, second, and third prizes in a church lottery. I was asked to assess the probability that this could have happened. I found out that the winner had bought nearly all the tickets."
He had of course asked the obvious question for both scientist and reporter: Could the relationship described be explained by other factors?
Not everyone will tell you, of course, for bias is a pervasive human failing. As one candid scientist is said to have admitted, "I wouldn't have seen it if I hadn't believed it." Enthusiastic investigators often tell us their findings are exciting. But they may be so exciting that the investigators paint the results in over-rosy hues.
Other powerful human drives (the race for academic promotion and prestige, financial connections) can also create conscious or unconscious conflicts of interest or attitudes that feed bias. Dr. Thomas Chalmers of Mount Sinai Medical Center in New York tells of a drug trial financed by a pharmaceutical firm, in which both the head of the study committee and the main statisticians and analysts were the firm's employees, though not so identified in any credits. He tells of a study of oral drugs for diabetes in which the fact that the first author had previously published 14 articles on the subject, and in 7 had acknowledged support by the drug manufacturers, was "not known to the reader."
In contrast, Chalmers describes a study also financed by a drug firm but with a contract specifying a study protocol designed by independent investigators and monitored by an outside board less likely to be influenced by a desire for a favorable outcome. "It is never possible to eliminate" potential conflicts of interest in biomedical research, he concludes, but they should be disclosed so others can evaluate them.
Even a genius may be biased!
Horace Freeland Judson of Johns Hopkins University tells how Isaac Newton experimented with prisms and lenses and developed a theory of color, light,
and the solar spectrum. He did not report seeing some dark lines (absorption lines, which mark varying wavelengths) that his instruments must have shown. A modern scientist argues that Newton's theory, not his instruments, had no place for that evidence: "To the observing scientist, hypothesis is both friend and enemy."
For years technicians making blood counts were guided by textbooks that told them two or more "properly" studied samples from the same blood should not vary beyond narrow "allowable" limits. Reported counts always stayed inside those limits. A Mayo Clinic statistician rechecked and found that at least two thirds of the time the discrepancies exceeded the supposed limits. The technicians had been seeing what they had been told to expect and discounting any differences as mistakes. This also saved them from the additional labor of doing still more counting.
Both the biased observer and the biased subject are common in medicine. A researcher who wants to see a treatment result may see one. A patient may report one out of eagerness to please the researcher. There is also the powerful placebo effect. Summarizing many studies, one scientist found that half the patients with headaches or seasickness (and a third of those suffering from coughs, mood changes, anxiety, the common cold, and even the disabling chest pains of angina pectoris) reported relief from a "nothing pill." A placebo is not truly a nothing pill; the mere expectation of relief seems to trigger important effects within the body. But in a careful study the placebo should not do as well as a test medication; otherwise the test medication is no better than a placebo.
Sampling bias is the bugaboo of both political polls and medical studies. Say you want to know what proportion of the populace has heart disease, so you stand on a corner and ask people as they pass. Your sample is biased, if only because it leaves out those too disabled to get around.
Your problem, a statistician would say, is selection. A political pollster who fails to build a valid probability sample, easy when questioning only a thousand or
so people from coast to coast, has equally poor selection.
A doctor in a clinic or hospital with an unrepresentative patient population (healthier or sicker or richer or poorer than average) may report results that do not represent the population as a whole. Veterans Administration hospitals, for example, treat relatively few women; their conclusions may apply only to the disproportionate number of lower-income men who typically seek out the VA hospitals' free care. A celebrated Mayo or Cleveland or Ochsner clinic sees both a disproportionate number of difficult cases and a disproportionate number of patients affluent and well enough to travel. The famed Kinsey reports were valuable revelations of sexual behavior but flawed because the samples consisted disproportionately of upper middle-class men and women and of those willing to talk.
An investigator may also introduce bias by constraining, or distorting, a sample: by failing to reveal nonresponse or by otherwise "throwing away data." A surgeon cites his success rate in those discharged from the hospital after an operation but omits those who died during or just after the procedure. Many people drop out of studies (sometimes they just quit) or they are dropped for various reasons: They could not be evaluated, they came down with some "irrelevant" disorders, they moved away, they died. In fact, many of those not counted may have had unfavorable outcomes had they stayed in the study.
Mosteller tells of a nationwide study of a possibly dangerous anesthetic. The investigators relied on autopsy results at 38 hospitals. Unfortunately, only about 60 percent of the relevant dead had been autopsied, and "anything could have been explained by the missing 40 percent, so that part of the study wound up with a handful of nothing."
The presence of significant nonresponse can often be detected, when reading medical papers, by counting the number of patients treated
versus the number of untreated or differently treated controls-patients with whom the treated patients are compared. If the number of controls is striking•iy greater in a randomized clinical trial (though not necessarily in an epidemio- ® ®
Page 35: zjc02a00
28 CHAPTER 3

logical or environmental study), there were probably many dropouts. A well-conducted study should describe and account for them. A study that does not may report a favorable treatment result by ignoring the fate of the dropouts - a confounding variable.

Age, gender, occupation, nationality, race, income, socioeconomic status, health status, and powerful behaviors like smoking are all possible confounding - and frequently ignored - variables.

In the 1970s, foes of adding fluoride to city water pointed to crude cancer mortality rates in two groups of 10 U.S. cities. One group had added fluoride to water, the other had not, and from 1950 to 1970 the cancer mortality rate rose faster in the fluoridated cities. The National Cancer Institute pointed out that the two groups were not equal: The difference in cancer deaths was almost entirely explained by differences in age, race, and sex. The age-, race-, and sex-adjusted difference actually showed a small, unexplained lower mortality rate in the fluoridated cities.

If you look carefully at the fate of women taking birth control pills, you find that advancing age and smoking are the two great confounders. You must take both into account to find the greatest clusters of ill effects. Smoking has been an important confounder in studies of industrial contaminants like asbestos, in which, again, the smokers suffer a disproportionate number of ill effects.

A 1947 survey of Chicago lawyers showed that those who had mere high school diplomas before entering legal training earned 6.3 percent more, on the average, than college graduates. The confounder here - the real explanation - was age. In 1947 there were still many older lawyers without college degrees, and they were simply older, on the average, and hence more established.
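The fluoridation dispute turns on the difference between a crude rate and an adjusted one. What follows is a minimal Python sketch of one common adjustment, direct age standardization. The two city groups, their age bands, and every count are hypothetical, invented only to show how an older population can raise a crude death rate even when each age band dies at exactly the same rate. Race and sex, which the National Cancer Institute also took into account, are left out to keep the example short.

```python
# Hypothetical data: for each age band, (population, cancer deaths).
# Both groups have identical death rates within every age band;
# the first group simply has more older residents.
older_cities = {
    "under 45": (500_000, 250),
    "45-64": (300_000, 1_500),
    "65 and over": (200_000, 4_000),
}
younger_cities = {
    "under 45": (700_000, 350),
    "45-64": (200_000, 1_000),
    "65 and over": (100_000, 2_000),
}


def crude_rate(strata):
    """Deaths per 100,000 people, ignoring age altogether."""
    population = sum(pop for pop, _ in strata.values())
    deaths = sum(dead for _, dead in strata.values())
    return 100_000 * deaths / population


def age_adjusted_rate(strata, standard):
    """Direct standardization: apply each age band's own death rate
    to a shared standard population, then total the expected deaths."""
    expected = sum(
        standard[band] * dead / pop for band, (pop, dead) in strata.items()
    )
    return 100_000 * expected / sum(standard.values())


# Use the two groups' combined population as the shared standard.
standard = {
    band: older_cities[band][0] + younger_cities[band][0]
    for band in older_cities
}

for name, group in (("older cities", older_cities),
                    ("younger cities", younger_cities)):
    print(name,
          "crude:", round(crude_rate(group), 1),
          "adjusted:", round(age_adjusted_rate(group, standard), 1))
```

With these invented counts the crude rates are 575 and 335 deaths per 100,000, yet both age-adjusted rates come to 455: the whole gap is an artifact of age.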
Occupational studies often confront another seeming paradox: The workers exposed to some possible adverse effect turn out to be healthier than a control group of persons without such exposure. The confounder: the well-known healthy-worker effect:
Page 36: zjc02a00
THE SCIENTIFIC WAY

Workers tend to be healthier and live longer than the population in general.

Some studies of workers in steel mills showed no overall increase in cancer, despite possible exposures to various carcinogens. It took a look at black workers alone to find excess cancer. They commonly worked at the coke ovens, where carcinogens were emitted. This was a case where the population had to be stratified, or broken up in some meaningful way, to find the facts. Such findings in blacks often may be falsely ascribed to race or genetics, when the real or at least the most important contributing or ruling variables - to a statistician, the independent variables - are occupation and the social and economic plights that put blacks in vulnerable settings. The excess cancer is the dependent variable, the result.

"In a two-variable relationship," Dr. Gary Friedman explains, "one is usually considered the independent variable, which affects the other or dependent variable." Take the fact that more people get colds in winter. Here weather is commonly seen as the underlying, or independent variable, which affects incidence of the common cold, the dependent variable. Actually, of course, some people, like children in school who are constantly exposed to new viruses, are more vulnerable to colds than others. In the case of these children, then, as in the case of the black workers at the coke ovens, there is often more than one independent variable. Also, some people think that an important underlying reason for the prevalence of colds in winter may be that children are congregated in school, giving colds to each other, thence to their families, thence to their families' coworkers, thence to the coworkers' families, and so on. But cold weather - and home heating? - may still figure, perhaps by drying nasal passages and making them more vulnerable to viruses.
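The steel-mill finding can be made concrete with a small Python sketch of stratified analysis. All the figures are hypothetical: an invented mill of 10,000 workers, an assumed 2 percent cancer rate in an unexposed comparison population, and case counts chosen so that a healthy-worker effect in most of the mill masks a doubled risk at the coke ovens.

```python
# Hypothetical mill: number of workers and observed cancer cases by work area.
workers = {"coke ovens": 1_000, "rest of mill": 9_000}
cases = {"coke ovens": 40, "rest of mill": 162}

# Assumed cancer rate in an unexposed comparison population (hypothetical).
COMPARISON_RATE = 0.02


def observed_to_expected(n_workers, n_cases, rate=COMPARISON_RATE):
    """Ratio of observed cases to the number expected at the comparison rate.
    A ratio near 1 suggests no excess; a ratio near 2, a doubled risk."""
    return n_cases / (n_workers * rate)


# Looking at the whole work force at once hides the problem.
overall = observed_to_expected(sum(workers.values()), sum(cases.values()))

# Breaking the work force into strata reveals it.
by_area = {
    area: observed_to_expected(workers[area], cases[area]) for area in workers
}

print(f"all workers: {overall:.2f}")
for area, ratio in by_area.items():
    print(f"{area}: {ratio:.2f}")
```

In this invented mill the overall ratio is about 1.01, which looks like no excess at all, while the coke-oven stratum alone shows a ratio of 2.00 and the rest of the mill 0.90.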
The search for true variables is obviously one of the main pursuits of the epidemiologist, or disease detective - or of any physician who wants to know what has affected a patient, or of any student of society who seeks true causes. Like colds, many
Page 37: zjc02a00
CHAPTER 3

medical conditions, such as heart disease, cancers and probably mental illness, have multiple contributing factors. Where many known, measurable factors are involved, statisticians can use mathematical techniques - the terms you will see include multiple regression, multivariate analysis, and discriminant analysis and factor, cluster, path, and two-stage least-squares analyses - to relate all the variables and try to find which are the truly important predictors.

Yet some situations, like the striking decline in U.S. heart disease mortality in recent years, defy such analyses. These years have seen several major changes in American life that may play a role: less smoking among men, consumption of a leaner diet, more recreational exercise (though more sedentary work). Medical care is far better, including the treatment of hypertension, which disposes people to heart disease. Many of these variables cannot be well measured and the effect of some is debatable, so - a common situation in science - the truth remains uncertain.

Variability

Doctors always say, "Most things are better in the morning," and they're mostly right. Most chronic or recurring conditions wax and wane. We tend to wake up at night when the condition is at its worst. Then, no matter what is done by way of treatment the next day, the odds are that we'll feel better.

This is regression toward the mean: the tendency of all values in every field of science - physical, biological, social, and economic - to move toward the average. Tall fathers tend to have shorter sons, and short fathers, taller sons. The students who get the highest grades on an exam tend to get somewhat lower ones the next time. The regression effect is common to all repeated measurements.

Regression is part of an even more basic phenomenon: variation, or variability. Virtually everything that is measured varies from measurement to measurement. When repeated, every experiment has at least slightly different results. Take a patient's
Page 38: zjc02a00
THE SCIENTIFIC WAY 31

blood pressure, pulse rate, or blood count several times in a row, and the readings will be somewhat different. Take them at different times of day or on different days, and the readings may vary greatly.

The important reasons? In part, fluctuating physiology, but also measurement errors, the limits of measurement accuracy, and observer variation. Examining the same patient, no two doctors will report exactly the same results, and the results may be grossly different. If six doctors examine a patient with a faint heart murmur, only one or two may have the skill or keen hearing to detect it. Experimental results so typically differ from one time to the next that scientific and medical fakers - a Boston cancer researcher, for example - have been detected by the unusual regularity of their reported results, with numbers agreeing too well and the same results appearing time after time, with not enough variation from patient to patient.

Biological variation is the most important cause of variation in physiology and medicine. Different patients, and the same patients, react differently to the same treatment. Disease rates differ in different parts of the country and among different populations, and - alas, nothing is simple - there is natural variation within the same population.

Every population, after all, is a collection of individuals, each with many characteristics. Each characteristic, or variable, such as height, has a distribution of values from person to person, and - if we would know something about the whole population - we must have some handy summaries of the distribution. We can't get much out of a list of 10,000 measurements, so we need single values that summarize many measurements.

Enter here the familiar average or, more exactly, the mean, median, and mode. These and a few other measures can give us some idea of the look of the whole and its many measurable properties, or parameters.

When most of us speak of an average, we mean simply the mean, or arithmetic average, the sum of all the values divided by the number of values. The mean is no mean tool; it is a good way
Page 39: zjc02a00
32 CHAPTER 3

to get a typical number, but it has limitations, especially when there are some extreme values.

There is said to be a memorial in a Siberian town to a fictitious Count Smerdlovski, the world's champion at Russian roulette. On the average he won, but his actual record was 73 and 1.

If you look at the average salary in a hospital, you would not know that half the personnel may be working for the minimum wage, while a few hundred persons make $100,000 or more a year.

You may learn more here from the median, the figure that divides a population into two equal halves. The median can be of value when a group has a few members with extreme values, like the 400-pounder at an obesity clinic whose other patients weigh from 180 to 200 pounds. If he leaves, the patients' mean weight might drop by 10 pounds, but the median might drop just 1 pound.

The most frequently occurring number or value in a distribution is called the mode. When the median and the mode are about the same, or even more when mean, median, and mode are roughly equal, you can feel comfortable about knowing the typical value.

You still need to know something about the exceptions, in short, the dispersion (or spread or scatter) of the entire distribution. One measure of spread is the range. It tells you the lowest and highest values. It might inform you, for example, that the salaries in that hospital range from $10,000 to $250,000.

You can also divide your values into 100 percentiles, so you can say someone or something falls into the 10th or 71st percentile, or into quartiles (fourths) or quintiles (fifths). One useful measure is the interquartile range, the interval between the 75th and 25th percentiles - this is the distribution in the middle, which avoids the extreme values at each end. Or you can divide a distribution into subgroups - those with incomes from $10,000 to $20,000, for example, or ages 20 to 29, 30 to 39, and so on.

All these values can easily be plotted.
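The obesity-clinic example is easy to try for yourself. This short Python sketch uses the standard `statistics` module on a made-up patient list: twenty hypothetical weights between 180 and 200 pounds plus the one 400-pounder. The weights are invented, chosen so the arithmetic matches the drop of 10 pounds in the mean and 1 pound in the median described above.

```python
import statistics

# Hypothetical clinic: twenty patients between 180 and 200 pounds...
others = [180, 182, 183, 184, 185, 186, 187, 188, 189, 189,
          191, 191, 192, 193, 194, 195, 196, 197, 198, 200]
# ...plus one patient with an extreme value.
everyone = others + [400]


def summarize(weights):
    """Return the handy single-number summaries of a list of values."""
    q1, _, q3 = statistics.quantiles(weights, n=4)  # 25th, 50th, 75th percentiles
    return {
        "mean": statistics.mean(weights),
        "median": statistics.median(weights),
        "range": (min(weights), max(weights)),
        "interquartile range": q3 - q1,
    }


with_outlier = summarize(everyone)
without_outlier = summarize(others)

print("mean drops by", with_outlier["mean"] - without_outlier["mean"])
print("median drops by", with_outlier["median"] - without_outlier["median"])
print("range with the outlier:", with_outlier["range"])
print("range without:", without_outlier["range"])
```

One extreme patient moves the mean by 10 pounds and stretches the range from 200 to 400, while the median moves by a single pound. That is why the median is often the safer "typical" figure when a few values are far out of line.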
With many of the things that scientists, economists, or others measure - IQs, for example, and other test scores - we typically tend to see a famil-
Page 40: zjc02a00
THE SCIENTIFIC WAY 33

iar, bell-shaped normal distribution, high in the middle, low at each end, or tail. This is the classic Gaussian curve, named after the 19th-century German mathematician Karl Friedrich Gauss. But you may also find that the plot has two or more peaks or clusters, a bimodal or multimodal distribution.

A widely used number, the standard deviation, can reveal a great deal. No matter how it sounds, it is not the average distance from the mean but a more complex figure.* Unlike the range, this handy figure takes full account of every value to tell how spread out things are - how dispersed the measurements. In what one statistician calls a truly remarkable generalization, in most sets of measurement "and without regard to what is being measured," only 1 measurement in 3 will deviate from the average by more than 1 standard deviation, only 1 in 20 by more than 2 standard deviations, and only 1 in 100 by more than 2.57 standard deviations.

"Once you know the standard deviation in a normal, bell-shaped distribution," according to Thomas Louis, "you can draw the whole picture of the data. You can visualize the shape of the curve without even drawing the picture, since the larger the variation of the numbers, the larger the standard deviation and the more spread out the curve - and vice versa."

*There is more than one way to calculate it, and there are several variations, depending on the statistician's purpose. A common way is to add the squares of the differences between each number and the mean, then divide that number by the total number of squares, often referred to as [illegible] (minus 1 if you're looking at a sample of a population rather than the whole population). Then calculate the square root of the result. Sometimes statisticians calculate the standard deviation of the mean - this because the mean, being an average, is less variable than single measurements. Some call this the standard error or standard error of the mean. All these figures are measures of dispersion.
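The recipe in the note translates almost word for word into code. The Python sketch below follows it step by step, then uses the standard library's `statistics.NormalDist` to check the "1 in 3, 1 in 20, 1 in 100" rule of thumb for a normal curve. The five blood-pressure readings are hypothetical, and the standard error here is computed the usual textbook way, as the sample standard deviation divided by the square root of the number of readings.

```python
import math
import statistics


def standard_deviation(values, sample=False):
    """Add the squared differences from the mean, divide by the number of
    values (or by that number minus 1 for a sample), take the square root."""
    mean = sum(values) / len(values)
    sum_of_squares = sum((v - mean) ** 2 for v in values)
    divisor = len(values) - 1 if sample else len(values)
    return math.sqrt(sum_of_squares / divisor)


def standard_error(values):
    """Standard deviation of the mean: sample SD over the square root of n."""
    return standard_deviation(values, sample=True) / math.sqrt(len(values))


# Hypothetical systolic blood pressure readings from one patient.
readings = [118, 122, 125, 130, 135]
print("population SD:", round(standard_deviation(readings), 2))
print("sample SD:", round(standard_deviation(readings, sample=True), 2))
print("standard error of the mean:", round(standard_error(readings), 2))

# Share of a normal distribution lying more than k standard deviations
# from the mean, counting both tails.
normal = statistics.NormalDist()
for k in (1, 2, 2.57):
    share = 2 * (1 - normal.cdf(k))
    print(f"beyond {k} SD: {share:.3f}")
```

The last loop gives shares of roughly 0.32, 0.05, and 0.01, which is where "1 in 3," "1 in 20," and "1 in 100" come from. These fractions hold for a normal curve, not for every set of measurements.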
Page 41: zjc02a00
4

Studies, Good and Bad

Why think? Why not try the experiment?
-John Hunter, 18th-century British surgeon

Sit down before fact as a little child, be prepared to give up every preconceived notion, follow humbly wherever and to whatever abysses nature leads, or you shall learn nothing.
-Thomas Henry Huxley

This is the part I always hate.

THERE is no disease that strikes older people more tragically than Alzheimer's disease, which makes a useless tangle of the brain. At a prestigious New England university a research team imaginatively inserted catheters into the skulls of four patients aged 64 to 73 to deliver a continuous infusion of either a theoretically promising drug or, alternately, an ineffectual saline solution for comparison.

After 18 months the investigators published a paper saying that according to observations by the patients' families, three patients showed marked improvement and the fourth at least held his own. Fascinating, of course. Some reporters learned of the work and began inquiring. The investigators let a TV crew do a story and also held a news conference, with one patient
Page 42: zjc02a00
Example: If the average score of all students who take the SAT college entrance test is relatively low and the spread - the standard deviation - relatively large, this creates a very long-tailed, low-humped curve of test scores, ranging, say, from around 300 to 1500. But if the average score of a group of brighter students entering an elite college is high, the standard deviation of the scores will be less and the curve will be high-humped and short-tailed, going from maybe 900 to 1500.

"If I just told you the means of two such distributions, you might say they were the same," another scientist says. "But if I reported the means and the standard deviations, you'd know they were different, with a lot more variations in one."

From a human standpoint, variation tells us that it takes more than averages to describe individuals. Biologist Stephen Jay Gould learned in 1982 that he had a serious form of cancer. The literature told him the median survival was only eight months after discovery. Three years later he wrote in Discover, "All evolutionary biologists know that means and medians are the abstractions," while variation is "the reality," meaning "half the people will live longer" than eight months.

Since he was young, since his disease had been diagnosed early, and since he would receive the best possible treatment, he decided he had a good chance of being at the far end of the curve. He calculated that the curve must be skewed well to the right, as the left half of the distribution had to be "scrunched up between zero and eight months, but the upper right half [could] extend out for years."

He concluded, "I saw no reason why I shouldn't be in that small tail.... I would have time to think, to plan and to fight." Also, since he was being placed on an experimental new treatment, he might if fortune smiled "be in the first cohort of a new distribution with . . . a right tail extending to death by natural causes at advanced old age."

Statistics cannot tell us whether fortune will smile, only that such reasoning is sound.
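Gould's reasoning about a right-skewed curve can be checked on a toy example. In the Python sketch below, the fifteen survival times are entirely hypothetical, arranged so that the median is eight months while a few long survivors stretch the right tail out for years.

```python
import statistics

# Hypothetical survival times in months after diagnosis, listed in order.
# Half the values are "scrunched up" at eight months or less;
# the other half trail off far to the right.
survival_months = [2, 3, 4, 5, 6, 7, 8, 8, 9, 12, 18, 30, 60, 120, 240]

median = statistics.median(survival_months)
mean = statistics.mean(survival_months)

# Share of patients in the long right tail, here taken as beyond two years.
beyond_two_years = (
    sum(1 for months in survival_months if months > 24) / len(survival_months)
)

print("median survival (months):", median)
print("mean survival (months):", round(mean, 1))
print("share living beyond two years:", round(beyond_two_years, 2))
```

In this invented list the median is 8 months but the mean is about 35, and roughly a quarter of the patients live beyond two years. A median alone says nothing about how long that tail is, which was Gould's point.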
Page 43: zjc02a00
CHAPTER 4

brought forth for on-camera testimonials. Except for some newspapers that decided to print nothing, the story flew far and wide.

The head investigator, a chief resident in neurosurgery, cautioned that the results, though encouraging, were "very early" and "certainly do not prove this is an effective treatment." He advised healthy skepticism. But headlines unequivocally read: "Alzheimer's Test Found Successful," "Alzheimer's: A New Promise," "First Breakthrough Against Alzheimer's," "Pump Offers Hope," "Possible Alzheimer's Cure."

Within two months the medical center logged 2,600 phone calls, mainly from desperate families, and critics began asking why a press conference had been held, since a study of only four patients - with unblinded investigators getting their assessments from hopeful families - meant little.

Harvard's Dr. Jay Winsten concluded that "the decision to hold a press conference ... far outweighed in impact the modulating effect of the investigators' qualifying language. The visual impact of [one] patient's on-camera testimonials all but guaranteed that TV coverage would oversell the research, despite any qualifying language."

When dubious claims are made - about Alzheimer's, a new cancer drug, a possible AIDS cure - and the claims get widely reported, there is commonly a lot of postmortem clucking and soul-searching among reporters and editors. Then someone else makes some sensational claim, and the same thing may happen all over again.

The biggest error in medical science, according to Dr. Thomas Chalmers, is "the uncontrolled pilot study in which the investigators try a treatment on 10 patients, and if it seems to work ... are tempted to report it" to fellow scientists, let alone the media.

All science is only a stab at the truth. Even with the best of statistics, "We scientists don't know how to tell the whole truth," Mosteller reminds us. Outside this honest limitation lie vast realms of inadequate science with plausible-sounding yet shaky
Page 44: zjc02a00
STUDIES, GOOD AND BAD 37

statistics. A French physician, Pierre Charles Alexandre Louis, said 150 years ago, "The only reproach which can be made to the numerical method" is that it "requires much more labor and time than the most distinguished members of our profession" often give it. "Some days," says one modern statistician, "I think every idiot in the country who can put his hands on a computer program thinks he's a statistician."

The big problems of statistics, say its best practitioners, have little to do with computations and formulas. They have to do with judgment, we're told, with how to design a study, how to conduct it, then analyze and interpret the results. In a day of frenzied media competition for the public's eye and ear - and many chances to do harm by shaky reporting - journalism too calls for sophisticated judgment. How, then, can we have some hope of telling which studies seem credible, which we should report?

A fundamental principle is that every conscientiously conducted study has a careful design: a method or plan of attack to include the right kind and number of patients or petri dishes and to try to eliminate bias. Different problems require different methods, and one of the most basic questions in science is, Can this kind of experiment, this design, yield the answer?

This is not a simple question for a reporter to answer, but there is much we can know. What kinds of studies, what kinds of numbers and controls and methods, should we look for?

Experiments versus Seductive Anecdotes

Students and eggs can be graded, citizens and cities can be credit-rated, and scientific evidence can be weighed according to what has been called a hierarchy of evidence. Some kinds of studies carry little weight, some more, some a great deal.

Science and medicine started with anecdotes, unreliable as far as generalization is concerned, yet provocative. Anecdotes matured into systematic observation, the most ancient form of science. Observation told the ancients much about the stars, it
Page 45: zjc02a00
told the pharaohs' physicians much about the sick, and it is still important, for simple "eyeballing" has developed into data collection and the recording of case histories. These are respectable, yea, indispensable methods yet still only one part of science. Case histories may not be typical, or they may reflect the beholder. Medicine continues to be plagued by Big Authorities who insist, "I know what I see."

There can be useful, even inspired, observation and analysis of natural experiments: Excess fluoride in some waters hardened teeth, and this observation led to fluoridation of drinking water to prevent tooth decay. There are also man's inadvertent experiments, disastrous and benign, to be studied. Hiroshima triggered wide analysis of the effects of nuclear radiation, invaluable yet frustrating because there were no good measures of exposure levels, a gap that has caused confusion and controversy ever since.

In 1585 or so, Galileo dropped those weights from a tower and helped invent the scientific experiment, a study in which the experimenter controls the conditions - controlled conditions are the heart of the experimental method - and records the effect. Experiments on objects, animals, germs, and people matured into the modern experimental study, in which the experimenter typically changes only one or some other planned number of variables to see the outcome.

Clinical Trials

The experimental method is the essence of experimental medicine's current "gold standard": the controlled, randomized clinical trial. At its best, the investigator tests a treatment or drug or some other intervention by randomly selecting at least two comparable groups, the experimental group that is tested or treated and a control group that is observed for comparison.

True clinical trials are expensive and difficult.
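To see what "randomly selecting" two groups can mean in practice, here is a minimal Python sketch of one simple scheme: shuffle the list of consenting patients and split it in half. The patient labels, the `randomize` function, and the seed are all hypothetical, and real trials often use more elaborate allocation methods than this.

```python
import random


def randomize(patient_ids, seed=None):
    """Shuffle the patients and split them into two equal-sized groups.
    Chance, not the doctor's judgment, decides who gets the treatment."""
    rng = random.Random(seed)      # a fixed seed makes the allocation repeatable
    shuffled = list(patient_ids)   # copy, so the original list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Twenty hypothetical patients who have consented to randomization.
patients = [f"patient-{n:02d}" for n in range(1, 21)]
experimental, control = randomize(patients, seed=1989)

print("experimental group:", sorted(experimental))
print("control group:", sorted(control))
```

Because the split depends only on the shuffle, neither the investigator's hopes nor the patient's condition can steer anyone into a particular group. Recording the seed lets an auditor reproduce the allocation later.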
It has been estimated that of 100 scheduled trials, 60 are abandoned, not
Page 46: zjc02a00
implemented, or not completed, whether for lack of funds, difficulty in recruiting or keeping patients, toxicity or other problems, or, sometimes, rapid evidence of a difference in effect (making continued denial of effective treatment to a control group unethical). Another 20 trials produce no noteworthy results, and just 20, results worth publishing. Clinical trials nonetheless are called the strongest, most precise, most decisive way to evaluate medical interventions and learn true causation. Randomized clinical trials proved that new drugs could cut the heart attack death rate, that treating hypertension could prevent strokes, and that polio, measles, and hepatitis vaccines worked. No doctor, observing a limited number of patients, could have shown these things.

Types of clinical studies include the following:

• Among the most reliable are parallel studies comparing similar groups given different treatments, or a treatment versus no treatment. But such studies are not always possible.

• In crossover studies the same patients get two or more treatments in succession and act as their own controls. Similarly, self-controlled studies evaluate an experimental treatment by control observations during periods of no treatment or of some standard treatment.

There are pitfalls here. Treatment A might affect the outcome of treatment B, despite the usual use of a washout period between study periods. Patients become acclimated: They may become more tolerant of pain or side effects or, now more health-conscious, may change their ways. The controls - the patients in a control group - don't always behave in parallel studies either: In one large-scale trial of methods to lower blood cholesterol and risk of heart disease, many controls adopted some of the same methods - quitting cigarette smoking, eating fewer fats - and reduced their risk too.

• Investigators often use historical controls (meaning comparison with old records: historically the cure rate has been 30 percent, say, and the new therapy cures 60 percent) or other external controls (such as comparison with other studies). These
Page 47: zjc02a00
CHAPTER 4

controls are often misleading - the groups compared are frequently not comparable, the treatments may have been given by different methods - but they are still at times useful.

What Makes a Study Honest?

Obviously, all studies, including the best, have potential pitfalls:

• Lack of adequate controls is fatal if you really want to put the results in the bank.

• The group or sample studied, 10 people or 10,000, must be large enough to get a valid result and representative enough to apply to a larger population. Because people vary so widely in their reactions, and a few patients can fool you, fair-sized groups of patients are usually needed. And enough of the right kind of subjects are needed for a suitable sample. Picking patients for a medical study is no different from picking citizens to be questioned in a political poll. In both, a sample is studied, and inferences - the outcome of an election, the results in patients in general - are made for a larger population.

To get a large enough sample, medical researchers more and more try to conduct multicenter trials, which are appealing because they can include hundreds of patients, but expensive and tricky because one must try to maintain similar patient selection and quality control at 10 or 100 institutions. Successful multicenter trials established the value of controlling hypertension to prevent strokes. They demonstrated the strong probability that less extensive surgery is as effective as more drastic surgery for many breast cancers.

• The sample should be randomized - divided by some random method into comparable experimental and control groups. Randomization can easily be violated. A doctor assigning patients to treatment A or B may, seeing a particular type of patient, say or think, "This patient will be better on B." If treatment B has been established as better than A, there should be no random study in the first place and certainly no
Page 48: zjc02a00
STUDIES, GOOD AND BAD 41

study of that doctor's patient. When randomization is violated, "the trial's guarantee of lack of bias goes down the drain," says one critique. As a result, patients who consent to randomization are often assigned to study groups according to a list of computer-generated random numbers.

• To combat bias - the influence of confounding variables - and get answers applicable to various populations, the sample or study population must often be stratified, or separated into groups by age, sex, socioeconomic status, and so on. Failure to stratify can hide true associations. The role of high-absorbency tampons in toxic shock syndrome was clarified only when the cases were broken down by precise type of tampon used.

The identification of important subcategories of patients can be tricky indeed. A study of open-heart surgery patients may fail to separate out those who had to wait for their surgery. But some patients die waiting, and those left are relatively stronger patients who do better, on the average, than those treated immediately after diagnosis.

We reporters may also fail to pay attention to stratification, or distribution. In early 1985 the President's Council of Economic Advisers reported that - to quote the page-one lead in a major newspaper - "elderly Americans have achieved economic parity with the rest of the population and no longer are a disadvantaged group." Not for several paragraphs, now on an inside page, did the story note that "there's a lot of variability," and older people are also "more likely ... to have members with incomes below the average of their age group." In short, there are still many elderly trapped in poverty.

• To combat bias in investigators or patients, studies should be blinded - to the extent feasible, single-, double-, or, best of all, triple-blinded, so that neither the doctors nor the nurses administering a treatment nor the patients nor those who assess the results know whether today's pill is treatment A, treatment B, or an ineffective placebo. Otherwise, a doctor or patient who yearns for a good result may see or feel one when the "right" drug is given. There is a tale of an overzealous receptionist who, knowing
which patients were getting the real drug and not the placebo, was so encouraging to these patients that they began saying they felt good, willy-nilly.

Barring observant receptionists, the use of a placebo (from the Latin meaning "I shall please") may help maintain blindness. Placebos actually give some relief in a third of all patients, on the average, in various conditions. The effect is usually temporary, however, and a truly effective drug ought to work substantially better than the placebo.

Blinding is often impossible or unwise. Some treatments don't lend themselves to it, and some drugs quickly reveal themselves by various effects. But an unblinded test is a weaker test.

• Finally, what makes a study honest is honesty. John Bailar warns of deliberate or careless deceptions that seem to be universally accepted today, practices that sometimes have much value but at other times are "inappropriate and improper and, to the extent that they are deceptive, unethical." Among them: the selective reporting of findings, leaving out some that might not fit the conclusion; the reporting of a single study in multiple fragments, when the whole might not sound so good; and the failure to report the low power of some studies, their inability to detect a result even if one existed.

Dr. Charles Moertel of the Mayo Clinic says,

Probably the majority of cancer patients treated with chemotherapy today are receiving regimens that have not been proved effective by randomized trial. ... Many articles published in our major journals make claims for fantastic therapeutic accomplishments with no randomized controls. ... Many, if not most, of the randomized studies ... are of such poor quality that their results are unbelievable. ... Precious few have withstood the scrutiny of carefully designed confirmatory scientific study.
He calls a multitude of poor methods statistical legerdemain: "the games we play, trying to squeeze out that little bit of breakthrough." Why the pressure to play them? "Salvation," Dr.
David Salsburg answers. "Fruit in this world (increases in salary, prestige, invitations to speak) and beyond this life (continual references in the citation index)."

Epidemiology: Hippocrates to AIDS

Clinical studies deal with patients. Epidemiology deals with populations, which sometimes are large groups of patients. Epidemiology seeks the causes of both health and disease by placing a population under its own kind of microscope, the epidemiological investigation.

Epidemiological studies in many ways parallel clinical studies (some studies are both) and are subject to many of the same pitfalls and rules, like avoiding bias and stratifying to get the right answers about the right subgroups. An old saw, in fact, goes: An epidemiologist is a physician broken down by age and sex.

Epidemiology in its early days was concerned wholly with epidemics of typhoid, smallpox, and other infections. But epidemiologists today also ask, "What should we eat and how should we live to stay healthy?" and they study large groups to see how the healthiest and unhealthiest live.

Hippocrates has been called the first environmentalist because he observed that it was healthier to live in high places than in low ones. Anticipating today's environmentalists, he blamed bad air and bad water and may have been partly right. But he failed to stratify; otherwise he might have noticed that the people who lived high were also wealthier and better nourished than those who lived low.

In 1775 Percival Pott scored a famous epidemiological success by observing the high rate of scrotum cancer in London's chimney sweeps and correctly blaming it on their exposure to soot (burned organic material, much like a smoked cigarette). A century later, John Snow, plotting London cholera cases on a map and noting a cluster around one source of drinking water, removed the handle from the now famed Broad Street pump and helped end a deadly epidemic. The 19th-century French advocate of statistical methods, Pierre Louis, observed hospital patients and helped stop the use of bleeding as a treatment. Ignaz Semmelweis showed that doctors' dirty hands transmitted deadly childbed fever to mothers.

Modern epidemiologists successfully indicted smoking as a cause of lung cancer and heart disease and identified the association of fats and cholesterol with clogging of the arteries. They evaluate vaccines, assess new methods of health care delivery, and track down the causes of new scourges like AIDS, toxic shock syndrome, and Legionnaires' disease, all by several methods. All are valuable. All are full of traps.

• Epidemiology, like all of science, started with observational studies, and these remain important. They are weak and uncertain, we have noted, when it comes to determining cause and effect. Yet observation is how we first learned of the unfortunate effects of toxic rain, Agent Orange, cigarette smoking, and many sometimes helpful, sometimes harmful medications, and of the effects of certain sexual practices and addicts' use of dirty needles on AIDS.

• Some observational studies are simply descriptive, describing the incidence, prevalence, and mortality rates of various diseases, for example. Other, analytic studies seek to analyze or explain: the Seven-Country Study, for example, that helped associate high meat and dairy fat and cholesterol consumption with excess risk of coronary heart disease. Ecological studies look for links between environmental conditions and illness. Human migrations, like that of the Japanese who come to the United States, eat more fat, and get more disease than they did in Japan, are among valuable natural experiments.

• The simplest observational measurement is a count. Sampling is just a more sophisticated kind of count. You can't count or question everybody, so you seek a sample that represents the whole.
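How well a sample stands in for the whole can be tried out on a made-up population. The sketch below assumes a hypothetical 100,000 people of whom 12 percent have some condition, draws simple random samples of three sizes, and reports each estimate with the usual approximate 95 percent margin of error. The function name, the population, and the seed are all illustrative assumptions, not anything taken from the studies discussed here.

```python
import math
import random


def estimate_from_sample(population, sample_size, seed=0):
    """Estimate a proportion from a simple random sample.

    Returns (estimate, margin), where margin is the common approximate
    95 percent margin of error, 1.96 * sqrt(p * (1 - p) / n).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sample = rng.sample(population, sample_size)
    p = sum(sample) / sample_size
    margin = 1.96 * math.sqrt(p * (1 - p) / sample_size)
    return p, margin


# Hypothetical population: 1 means "has the condition", 0 means "does not".
population = [1] * 12_000 + [0] * 88_000

for n in (100, 1_000, 10_000):
    p, margin = estimate_from_sample(population, n)
    print(f"n={n:>6}: estimate {p:.3f} +/- {margin:.3f}")
```

Larger samples narrow the margin, but only roughly with the square root of the sample size, and none of this arithmetic helps if the sample was not drawn at random in the first place.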
Many epidemiological surveys rely on samples, among them government surveys of health and nutritional habits. Samples and surveys often use questionnaires to get information. A sample or survey is never more than a snapshot of the
scene at the moment; it can't portray an ever-changing picture unless frequently repeated. Questionnaires may be no better than the quality of the answers, written or verbal. One survey compared patients' reporting of their current chronic illnesses with those their doctors recorded. The patients failed to mention almost half of the conditions the doctors detected over the course of a year. And whether it comes to illness, diets, or drinking, people tend to put themselves in the best possible light. They often say both yes and no to the same question in different form. A survey may stand or fall on the use of sophisticated ways to get accurate information.

• Epidemiologists' studies may also be prevalence studies, case-control studies, or cohort studies.

A prevalence study, also called a current or cross-sectional study, is a wide-angle snapshot of a population: a look at the rate of disease X or at toxic agent X and its possible effects by age, sex, or other variables. A political poll is such a study: A cross section of the nation is examined in a period of a few days.

A case-control study examines cases and controls for a close-up of a disease's relationship to other factors in a small, intensively examined group. The nation hears of cases of toxic shock syndrome, mainly in young women. The federal Centers for Disease Control launches a field investigation to find a series of patients, or cases, confirm the diagnosis, then interview them and their families and other contacts to assemble careful case histories that cover, hopefully, all possible causes or associations. This group is then compared with a randomly selected but matched comparison group, or control group, of healthy young women of like age and other characteristics.
The results need to be interpreted with great caution, but the case-control study is often a quick, highly useful, and relatively easy, low-cost first approach or fishing expedition to assemble clues about causes or even a working hypothesis. Or it may test some hypothesis. A case-control study pinpointed the use of tampons (later found to be certain high-absorbency ones) as the main villain in toxic shock. The relationship of cigarette
smoking to lung cancer, the association of birth control pills with blood vessel problems, and the transmission patterns of AIDS were identified in case-control studies that pointed to the need for broader investigation.

Cohort or incidence studies are motion pictures. They pick a group of people, or cohort (a cohort was a unit of a Roman legion), often stratify or divide them into subgroups, then follow them over time, often for years, to see how some disease or diseases develop. These studies are costly and difficult. Subjects drop out or disappear. Large numbers must be studied to see rare events. But cohort studies can be powerful instruments and substitutes for randomized experiments that would be ethically impossible. You can't ethically expose a group to an agent that you suspect would cause a disease. You can watch a group so exposed.

The noted Framingham study of ways of life that might be associated with developing heart disease has followed more than 5,000 residents of that Massachusetts town since 1948. The American Cancer Society's 1952-55 study of 187,783 men aged 50 to 69, with 11,780 of them dying during that period, did much to establish that cigarette smoking was strongly associated with developing lung cancer.

• Many epidemiological, as well as clinical, studies are handicapped because they must be retrospective. They look back in time, at medical records, vital statistics, or people's recollections (for example, those collected in interviews in a case-control study). People who have a disease are questioned to try to find common habits or exposures. Women with cervical cancer are interviewed to see how many took possibly guilty hormones and how many did not. People who live around a Love Canal are asked if they have been ill.

Retrospective studies are notoriously unreliable. Memories fail or play tricks. Old records are poor and misleading. Definitions of diseases and methods of diagnosis vary sharply over the years.
The patients you find may not be representative. A retrospective study, however intriguing, generally only says that there may be something here that ought to be investigated.
(There are exceptions. Dr. Gary Friedman writes, "A retrospective study can be quite reliable if based on data carefully collected in the past. A revealing study of mortality in radiologists was a retrospective cohort study based on good data.")

• A prospective study, in contrast, like the Framingham and the American Cancer Society studies, looks forward. It focuses sharply on a selected group who are all followed by the same statistical and medical techniques. Dr. Eugene Robin at Stanford tells how four separate retrospective clinical studies affirmed the accuracy of a test for blood clots in the lungs. When an adequate prospective clinical trial was done, most of the backward looks were proved wrong.

• Epidemiology also includes experimental studies, the classical experiments of science on a larger human scale. These are typically intervention studies. There is some intervention or manipulation; something is done to some of the subjects.

The massive and hugely successful 1954 field trial of the Salk polio vaccine was a classic intervention trial and a clinical trial too, with 401,974 first- to third-graders assigned at random to either a vaccinated group or a control group injected with a placebo, or dummy shot, and another 947,171 children divided between vaccinated second-graders and unvaccinated first- and third-graders acting as controls. In addition, in all participating states or counties, the investigators studied and counted all cases of polio in a grand total of 1,829,916 children: those who had taken part in the study and those who had not. In the placebo areas, the study was also triple-blinded: neither the vaccinators, the subjects, nor the doctors who examined the subjects later for polio knew which children got which kind of shot.

Another successful intervention study, a community trial, established the value of fluoridating water supplies to prevent tooth decay. Some towns had their water fluoridated; some did not.
Blinding was impossible, but the striking difference in dental caries that resulted could not have been caused by any placebo effect.
QUESTIONS REPORTERS CAN ASK

Just because Dr. Famous or Dr. Bigshot says this is what he found doesn't mean it is necessarily so.
- Dr. Arnold Relman

Ask to see the numbers, not just the pretty colors.
- Dr. Richard Margolin, National Institute of Mental Health, describing PET scans to reporters

What questions should we reporters ask to make our news solid, to report the more valid claims and ignore the weak and phony? When a scientist or physician or anyone else says, "I've discovered that ...," what should we ask?

In 1949, a year after Britain's National Health Service ("socialized medicine") was launched, my editors sent me to Britain to see how it was working. A bit stumped, I asked Dr. Morris Fishbein, the provocative genius who long edited the Journal of the American Medical Association, "How can I, a reporter, tell whether a doctor is doing a good job?" He immediately said, "Ask him how often he has a patient take off his shirt." His lesson was plain: No physical examination is complete unless the patient takes off his or her clothes.

Most reporters are not skilled statisticians, but we can ask some similarly revealing questions. Many of these are not even statistical, just simple ones that, like Fishbein's, probe soft spots and often disclose either a conscientious approach or one that can't be trusted.

We can learn here from one method of science. We said
earlier that a properly skeptical scientist, starting a study and seeking truth, often begins with a null hypothesis (that treatment A is no better than treatment B, that there's nothing there), then sees whether or not the evidence disproves it. This approach is much like the law's presumption of innocence: It is for the prosecutor to prove beyond reasonable doubt that the suspect is guilty. A reporter, without being cynical and believing nothing, should be equally skeptical and greet every claim by saying, in words or thought, "Show me."

If an investigator or claimant is competent and has a good case, you may have to ask none or very few of these questions, since a good scientific presentation should answer most of them for you. The need for a lot of questions could itself tell you something.

Here are some possible questions, then, some of them simple and obvious ones, a few more technical for those who might want to ask them.

How do you know? Have you done a study? Was there an experiment? What is the evidence? Or is the approach just anecdotal? Answers like "In my experience ...," "In my hands ...," "I've seen 20 cases ...," and "There are four cases in our block ..." may be interesting, may be worth scientific investigation, may be worth a cautious news story, but there is not yet anything like certainty.

What kind of study was it? Was there a systematic research plan or design? And a protocol or set of rules? What was the study design or method: observational, experimental, case-control, prospective, retrospective, or what? (See the previous chapter for kinds of studies and their uses and limits.) "A lot of people just scrounge around and try to come up with some conclusion without any real plan or design at the start," one medical editor reports.

Was the design drawn before you started your study? What specific questions or hypotheses did you set out to test or answer?
Why did you do it that way? Do you think it was the right kind of study to get the answer to this question or problem? Was it a true human experiment, if possible, with comparable groups picked at random for comparison? If not, why not? And what was the substitute? If an investigator patiently (you hope) tells you about an acceptable-sounding design, that's worth a brownie point. If the answer is "Huh?" or a nasty one, that may tell you something else.

Are you presenting preliminary data or something fairly conclusive? Are you presenting a conclusion or a hypothesis for further study? "Preliminary" and "interesting" can mean "unproved." If the result is not reasonably conclusive, should there be further studies, and what kind?

How many subjects, patients, cases, or people are you talking about? Are these numbers large enough, statistically rigorous enough, to get the answers you want? Was there an adequate number of patients to show a difference between treatments? Why are you calling a press conference to report on four patients?

Small numbers can sometimes carry weight. And they may sometimes be the only ones possible. "Sometimes small samples are the best we can do," one researcher says. But larger numbers are always more likely to pass statistical muster.

The number studied can also depend on the subject. A thorough physiological study of five cases of some difficult disorder may be important. One new case of smallpox would be a shocker in a world in which smallpox has supposedly been eliminated. In June 1981 the federal Centers for Disease Control reported that five young men, all active homosexuals, had been treated for Pneumocystis carinii pneumonia at three Los Angeles hospitals. This alerted the world to what soon became the AIDS epidemic.

Who were your subjects? How were they selected? What were your criteria for admission to the study? Were rigorous laboratory tests used to
define the patients, or were clinical diagnoses (necessarily less reliable) used?

Was the assignment of subjects to treatment or other groups randomized? Randomization should give every patient a 50 percent chance of being assigned to one group or the other of a two-armed study (one comparing two groups). Were the patients admitted to the study before the randomization? This helps eliminate bias. How was the randomization done?

If the subjects weren't randomized, why not? One statistician says, "If it is a nonrandomized study, a biased investigator can get some extraordinary results by carefully picking his subjects."

Was there a control or comparison group? If not, the study will always be weaker. Who or what were your controls or bases for comparison? In other words: When you say you have such and such a result, what are you comparing it with? Are the study or patient group and the control group similar in all respects but the treatment or other variable being studied?

Vogt calls "comparison of non-comparable groups probably ... the single most common error in the medical and popular literature on health and disease."

Do you have reason to believe your subjects and controls were representative of the general population? Or the particular population, those with the disease or condition you are interested in? The answers here go a long way toward answering these questions: To what populations are the results applicable? Would the association hold for other groups?

If your groups are not comparable to the general population or some important population, have you taken steps to adjust for this? Either statistical adjustment or stratification of your sample to find out about specific groups, or both? Samples can be adjusted for age, for example, to make an older- or younger-than-average sample more nearly comparable to the general populace. (More on applicability and stratification after a bit.
)

Was the study blind? In a study comparing drugs or other forms of treatment with a placebo or a dummy treatment, did (1) those administering
the treatment, (2) those getting it, and (3) those assessing the outcome know who was getting what, or were they indeed blinded, knowing only that they were comparing A and B (or A, B, and C, perhaps)? Could those giving or getting the treatment have easily guessed which was which by a difference in reaction or taste or other results?

Not every study can be a blind study. One researcher says, "There can be ethical problems in not telling patients what drug they're taking and the possible side effects. People are not guinea pigs." True enough, but a blinded study will always carry more conviction.

Were there other accepted quality controls? For example, making sure (perhaps by counting pills or studying urine samples) that the patients supposed to take a pill really took it. Were you able to follow your protocol or study plan?

If there were questionnaires, interviews, or a survey: Were the questions likely to elicit accurate, reliable answers? Was it really possible to get accurate answers to these questions?

Sampling is as common in medical studies as in political polling. Every study examines a sample, not the whole population. The sample must be reasonably accurate to give valid results. But badly worded questions can also distort the results. Respondents' answers can differ sharply, depending on how questions are asked. Example: In one study 1,153 subjects were asked which is safer, a treatment that kills 10 of every 100 patients or a treatment with a 90 percent survival rate? More people voted for the second way of saying precisely the same thing.

People commonly give inaccurate answers to sensitive questions, such as those about sexual behavior. They are notoriously inaccurate in reporting their own medical histories, even those of recent months.

Ask: Did you pretest your questions for effectiveness before doing your actual survey? Also: What was your nonresponse rate? Do you report it?
In any study: How many of your study subjects completed the course? Do you account for those who dropped out and tell why they did?

Every study has dropouts. McMaster University's Dr. David Sackett says, "Patients do not disappear ... for trivial reasons. Rather, they leave ... because they refuse therapy, recover, die, or retire to the Sunbelt with their permanent disability." If an investigator ignores those who didn't do well and dropped out, it can make the outcome look better. If those who died of "other causes" are listed among "survivors" of the disease being investigated (this is sometimes done on the theory that, after all, they didn't die of the target cause), it can make a treatment look better unless there are equal numbers of such deaths in every branch of the study.

Sackett adds, "The loss to follow-up of 10 per cent of the original inception cohort is cause for concern. If 20 per cent or more are not accounted for, the results ... are probably not worth reading." (On which Dr. Thomas Vogt comments, "Generally true, but utterly dependent on the situation.")

Professor Warren Burkett of the University of Texas adds a few related and pointed questions: "Does the paper or publication contain all results of all experiments? Support for a hypothesis has sometimes been made to seem stronger by selective reporting ... including only the data that most closely fit the theory. To what extent has the data offered ... been smoothed from the raw data? ... It is not unknown for researchers to clip and round data to make them fit [their] predicted results" (italics mine).

How long was the study's follow-up? How long do patients ordinarily survive with this disease? Were your patients followed long enough to really know the outcomes, good or bad?

And: How thorough was the follow-up?
In one report on amebiasis, a disease caused by an amoeba, the diagnosis was made by finding the amoeba in one of three consecutive stools, but a cure was declared after observing just one negative stool. "It does pay to read with care," a medical professor observes.
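Sackett's rule of thumb on dropouts is simple arithmetic, and a reporter can run it on any paper that gives the number enrolled and the number accounted for at the end. The sketch below is one possible reading of the quoted 10 and 20 percent figures. The function names and the example trial are hypothetical, and the comparisons (`>=`) are an assumption about where the cutoffs fall. As Vogt's comment suggests, the result is a prompt for questions, not a verdict.

```python
def follow_up_loss(enrolled: int, accounted_for: int) -> float:
    """Fraction of the original cohort not accounted for at the end."""
    if enrolled <= 0 or not 0 <= accounted_for <= enrolled:
        raise ValueError("need enrolled > 0 and 0 <= accounted_for <= enrolled")
    return (enrolled - accounted_for) / enrolled


def sackett_flag(loss: float) -> str:
    """Apply the quoted rule of thumb (assumed cutoffs: 10 and 20 percent)."""
    if loss >= 0.20:
        return "probably not worth reading"
    if loss >= 0.10:
        return "cause for concern"
    return "no flag from this rule alone"


# Hypothetical trial: 200 patients enrolled, 170 accounted for at the end.
loss = follow_up_loss(enrolled=200, accounted_for=170)
print(f"lost to follow-up: {loss:.0%} -> {sackett_flag(loss)}")
```

The same two numbers also bound how much the missing patients could change the result: if every one of them did badly, or every one did well, would the conclusion survive?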
Could your results have occurred just by chance? Have any statistical tests been applied to test this?

Did you calculate a P value? Was it favorable, .05 or less? (Reported as <.05; see Chapter 3.) P values and confidence statements need not be regarded as straitjackets, but like jury verdicts, they indicate reasonable doubt or reasonable certainty.

Remember that positive findings are more likely to be reported and published than negative findings.

Remember that a favorable-sounding P value of <.05 means only that there is just 1 chance in 20, or a 5 percent probability, that the statistics could have come out this way by pure chance when there was actually no effect. So about 1 in every 20 tests of something that truly has no effect will still come out statistically significant, a misleading false positive.

There are also ways and ways of arriving at P values. For example, an investigator may choose to report one of several end points: death, length of survival, blood pressure, other measurements, or just the patient's condition on leaving the hospital. All can be important, but a P value can be misleading if the wrong one is picked or emphasized.

You might want to ask: Are all the important end points and their P values reported? Also: Was the test giving the P value the appropriate test, as planned in your written protocol, or did you finally do more than one kind of test? (And perhaps report only the best answer?) What were the other values?

Did you collaborate with a statistician in both your design and your analysis? A statistician's collaboration often may be indicated in a credit or footnote.

In studies seeking cause and effect, remember that association is not necessarily causation. Rutgers' Dr. Michael Greenberg reminds us, "Mathematical methods cannot establish proof of cause and effect.
They can indicate the probability that a relationship occurred by chance, can sometimes quantify the existing relationship between actions and effects, and can under the best circumstances be used to predict the impact of actions even
if the complex phenomena driving them are not understood. ... View mathematical associations with a healthy degree of skepticism."

A true experiment, controlling all variables, can sometimes prove cause and effect almost surely. This is easier in physics and chemistry than in human biology. When, then, does a close association in an observational study (rather than a controlled experiment) indicate causation? There are several possible criteria that you can ask about:

Is the association consistent? Are similar results usually found in different places and by different research methods?

How strong is the association? If risk is an appropriate way of describing a particular situation: What is the relative risk, or the risk ratio? The word "strong" is used here in its mathematical sense. It mainly means the magnitude of an effect or risk, the odds favoring the outcome of interest versus no such outcome.

A relative risk, or risk ratio, compares two rates by dividing one by the other. In an American Cancer Society smoking study (see page 46), the lung cancer mortality rate in nonsmokers aged 55 to 69 was 19 per 100,000 per year; the rate in smokers was 188 per 100,000. Since 188 divided by 19 equals 9.89, the smokers were about 9.9 times as likely to die from lung cancer; their relative risk was 9.9. That's strong!

Is there an impressive dose-response, or cause-and-effect, curve: a curve or gradient that shows that the greater the exposure to the agent, or cause, the greater the effect? Heavy smokers are indeed at greater risk than moderate smokers, and moderate smokers at greater risk than light smokers. (In some cases, and this is an unsettled matter, there may be a threshold effect, an effect only after some minimum dose.)

Another way of asking about risk and response: What is the correlation coefficient, the extent to which a set of measurements of the association is linear?
A perfect linear relationship, or correlation, between two observations or variables would show up as a straight, steadily rising set of data points - in everyday language, a straight line on a graph. A perfect positive correlation, or
Page 63: zjc02a00 Log in for more options!
linear relationship, is given the value +1; +.5 would be a lesser but still interesting relationship; -1 or any negative figure indicates an inverse or negative relationship, such as a runner's speed going down as his weight goes up. A correlation of zero means no consistent association.

How specific is the association? Does a supposed cause lead to many supposed effects? Or does an effect depend on many supposed causes? Such associations are less specific, and thus more suspect, until positive evidence piles up. Smoking indeed causes many effects. A lung disease, asbestosis, is most common when there is exposure to both asbestos and cigarette smoke.

Does the supposed cause precede the effect?

Is a supposed biological association epidemiologically plausible? One strong argument for a cause-and-effect relationship between high consumption of saturated fats and cholesterol and coronary heart disease is that populations on such diets generally develop more such disease than those on leaner diets.

Does the association make biological sense? Does it agree with current biological and physiological knowledge? You can't follow this test out the window. Much biological fact is ill understood. Also, Mosteller warns, "Someone nearly always will claim to see a [biological or physiological] association. But the people who know the most may not be willing to."

Finally, look for the real why. Ask: Are there other possible explanations? Did you look for other explanations - confounders, or confounding variables, that may be producing or helping produce the association? Sometimes we read that married people live longer than singles. Does marriage really increase life span, or may medical or other problems make some people less likely to marry and also die sooner? Maybe the Dutch thought storks brought babies because better-off families had more chimneys, more storks, and more babies.

Did you take steps to control or adjust for other possible explanations?
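A small sketch may make the confounding question concrete. The household counts below are invented purely for illustration (they come from neither the book nor any study), and the names `households`, `rate_ratio`, and `pooled` are hypothetical. The sketch compares a crude storks-and-babies association with the same association inside each wealth group.

```python
# Invented counts, for illustration only (not from the book or any study).
# Each entry is (households, households_with_a_new_baby).
households = {
    "better-off": {"storks": (80, 40), "no storks": (20, 10)},
    "poorer": {"storks": (20, 4), "no storks": (80, 16)},
}


def rate(total, with_baby):
    """Share of households that had a baby."""
    return with_baby / total


def rate_ratio(exposed, unexposed):
    """Rate in the 'storks' group divided by the rate in the 'no storks' group."""
    return rate(*exposed) / rate(*unexposed)


def pooled(group):
    """Add the strata together, ignoring wealth (the crude comparison)."""
    total = sum(stratum[group][0] for stratum in households.values())
    with_baby = sum(stratum[group][1] for stratum in households.values())
    return total, with_baby


crude = rate_ratio(pooled("storks"), pooled("no storks"))
by_stratum = {
    name: rate_ratio(stratum["storks"], stratum["no storks"])
    for name, stratum in households.items()
}

print(f"crude ratio: {crude:.2f}")
for name, value in by_stratum.items():
    print(f"{name}: {value:.2f}")
```

With these made-up numbers the crude ratio is about 1.7, yet inside each wealth group it is 1.0. The apparent stork effect comes entirely from wealth being linked to both storks and babies.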
Did you do a stratified analysis - a breakdown of the data by strata like sex, race, socioeconomic status, geographical area, occupation? Men commonly have more bronchitis and cirrhosis of the
Page 64: zjc02a00 Log in for more options!
liver than women because they drink more. They also have more heart disease, possibly because they've smoked longer, possibly because some hormones protect women. Only stratified analyses will bring out such differences.

Did you do an analysis (a regression or some other form of multivariate analysis) to try to identify the important variable or variables? Such analyses can often reveal the strongest associations. They can also be misused, and they are not always needed or appropriate.

Some sophisticated questions, when appropriate: How many such analyses did you have to run to decide on the appropriate one? Sometimes the more analyses, the worse the study. How many variables did you consider? How many of these did you wind up reporting? If an investigator tries enough variables in a kind of statistical fishing expedition, he or she is almost bound to find something, true or untrue.

In cause-and-effect and other studies, ask: Has there been any reanalysis of the data? "Results, if possible, should be method-independent," Greenberg believes. "You should recalculate and see if the results hold up."

A word of caution: Questions about multivariate analyses or reanalyses can be tricky. Whether or not to do one kind of analysis or reanalysis or none at all is often a matter of dispute among authorities. Launch the subject with some humility. A reasoned answer, affirmative or negative, may tell you more than the answer's precise content.

In studies of medical treatments or preventives: How did you know or decide when your patients were cured or improved? Were there explicit, objective outcome criteria? That is, were there firm measurements or test results rather than physicians' observations in interviews, physical examinations, or chart reviews, all techniques highly subject to great observer variation and inaccuracy?
If improvement or relief from pain - a particularly soft (hard to quantify) outcome measure - had to be judged by observers: Was there some systematic way of making an assessment? If two or more groups were compared for survival, was their starting
Page 65: zjc02a00 Log in for more options!
point the same at onset? At diagnosis? At start of treatment? Were they judged by the same disease definitions at the start and the same measures of severity and outcome?

Did the intervention have the good results that were intended? Has there been an evaluation to see whether it was a useful result? Investigators often report that a drug or other measure has lowered blood cholesterol levels. Fine, but were they able to show that it reduced the number of heart attacks? Or was reduction of a supposed risk factor itself taken to mean the hoped-for outcome? That may often be necessary, but the issue should be discussed.

Investigators once reported that a new heart drug reduced the number of recurrent myocardial infarctions (heart attacks), fatal and nonfatal. But total mortality for all causes was higher in the treated group than in a placebo group.

Public health officials may announce the success of a campaign to take high blood pressure measurements: X number of people were found to be hypertensive and were referred to their doctors. But how many went to their doctors? How many of those received optimum treatment? Were their blood pressures reduced? (If they were, the evidence is strong that they should suffer fewer strokes.) In short: What was the bottom line? Did you really do any good?

To whom do your results apply? Can they be generalized to a larger population? Are your patients like the average doctor's patients? Is there any basis in these findings for any patient to ask his or her doctor for a change in treatment? Clinic populations, hospital populations, and the "worst cases" are not necessarily typical of patients in general, and improper generalization is unfortunately common in the medical literature.

Again and again, in many of the cases cited in this chapter, ask: Do other studies back you up? Are your results consistent with other clinical and experimental findings?
Have your results been replicated or
Page 66: zjc02a00 Log in for more options!
confirmed or supported by other studies? Or have only you been able to get these results?

Virtually no single study proves anything. Two or 4 or 15 studies add credence, especially if the diagnostic and outcome criteria and the people studied are similar. Consistency of results in humans, animals, and laboratory tests also adds credence.

One scientist warns, however, "You have to be wary about a grab bag of studies with different populations and different circumstances." To which Harvard's Mosteller adds, "Yes, be wary, but consistency across such differences cheers me up." And Dr. John Bailar tells us that, despite possible pitfalls, "meta-analysis of several low power reports" - that is, statistically analyzing and integrating their results - "may come to stronger conclusions than any one of them alone" (italics mine).

Mostly just good-sense questions? Of course. Some of the most important questions of all for a reporter to ponder are these: What do I think? Do the conclusions make sense to me? Do the data really justify the conclusions? If this person has extrapolated beyond the evidence, has he or she explained why and made sense?*

Does the investigator frankly document or discuss the possible biases and flaws in the study? A good scientific paper should do so. Does the investigator admit that the conclusion may be tentative or equivocal? Dr. Robert Boruch of Northwestern University says, "It requires audacity and some courage to say, 'I don't know.'" Do the authors use qualifying phrases? If such phrases are important, we are bound to include them in any responsible story.

Ask the investigators themselves: How much weight should your work be given? Is it really firm? And how important? An experienced science reporter says, "I have found that good researchers generally have an honest and proportionate view of their

*Frederick Mosteller disagrees with my occasional reference to good sense or common sense. "If something is a commonsense idea," he says, "we all would have thought of it. So it must be uncommon sense after all." He [...] good sense.
Page 67: zjc02a00 Log in for more options!
own work's importance." But there are many exceptions.

Ask others in the same field: How do other informed people regard this report - and these investigators? Are they speaking in their own area of expertise, or have they shown real mastery if they have ventured outside it? Have their past results generally held up? And what are some good questions I can ask them? True, a lot of brilliant and original work has been pooh-poohed for a time by others. Still, scientists survive only by eventually convincing their colleagues.

More formally: Has there been a review of the data and conclusions by any disinterested parties? Some major clinical studies are reviewed by independent second parties or committees. Reports of the National Academy of Sciences must pass muster by a review committee.

Has there been peer review of the material? That is, has it been examined by referees who were sent the article by a journal editor?

And, a very important question: Has the work been published or accepted by a reputable journal? If not, why not? The New England Journal of Medicine prints only 15 percent of the papers submitted to it (many, of course, are rejected because they are not of enough interest to the journal's readers). Many have been given at medical or scientific meetings, yet do not pass peer reviewers' or the editors' muster. Most are eventually published elsewhere, many in good journals.

But there are journals and journals. In science as a whole, including biology and often basic medical sciences, Science and the British Nature are indispensable. In general medicine and clinical science at the physician's level, the best, most useful journals are probably New England Journal of Medicine, Journal of the American Medical Association, Annals of Internal Medicine, Canadian Medical Journal, Journal of Clinical Investigation, and the British Lancet and British Medical Journal. There are many equally good specialty journals as well as mediocre ones.
In epidemiology, three good sources are American Journal of Epidemiology, Journal of Chronic Diseases, and Preventive Medicine. Ask people in any field: What are the most reliable journals, those where you would want your work published?
Page 68: zjc02a00 Log in for more options!
Some of the most valuable journals to a medical reporter are not journals of original publication but review publications like Family Practice and Hospital Practice, which mainly print summary articles for practitioners. With some strong exceptions, the free-circulation - also known as controlled-circulation - journals and medical magazines, which depend wholly on advertising for revenue, are not as rigorously screened as the traditional journals. They are often on top of the news, however.

All journals print clinkers sometimes. "Scientific journals are records of work, not of revealed truth," says the New England Journal's Dr. Arnold Relman.

Read the entire journal article yourself, if there is one. Ask the investigator for a copy or phone the journal. Or, assuming the article has already been published, look for it at a medical library, which can be found at any medical college, most good hospitals, and the headquarters of many county medical societies. Too many news releases tout articles that read far more conservatively than the PR version. Many scientists go much further in interviews or news conferences than they are willing to go in their articles.

A reporter asked a scientist, "Does peer review of an article put you at ease?" He said, "It should help put you at greater ease, but nothing puts me at ease until I've read the article."

Most reporters can't be scientific referees, but when you read an article, look for the following:

• A credit or footnote indicating collaboration with a statistician, and a paragraph describing the method of statistical analysis and its outcomes, such as P value or confidence level, power to detect treatment effects, and so on. If they're in place, you can at least assume that some effort was made to apply the rigors of statistical analysis. If they're missing, should you beware? Sometimes. Sometimes the statistician is a coauthor whose specialty isn't identified. And some investigators are well versed in statistics.

• Tables and figures that tell the same story as the conclusions. Sometimes they don't. One statistician told reporters,
Page 69: zjc02a00 Log in for more options!
"Don't assume that someone can interpret his own data. You may do better." And "muddle around in the footnotes and appendices," Mosteller advises. "You might find a few horrors. That's how people found out that a much publicized study of public and private schools included only about 12 private, nonparochial schools."

• Other things described in this chapter, such as the protocol and study design, the criteria for admitting and randomizing subjects, the therapy actually received (in contrast to that planned in the protocol), blinding, complications, loss to follow-up, follow-up time, and any discussion of reservations or weaknesses.

Ask, when appropriate: Where did the money to support the study come from? Many honest investigators are financed by companies that may profit from the outcome. So are some dishonest or self-deluding investigators. But the peddler of a biased point of view is as likely to be an antiestablishment crusader - or an academic ladder-climber - as a corporate darling. Perhaps the best question to ask yourself is, Is this investigator a scientist or a salesman? In any case, the public should know any pertinent connections.

"What proportion of papers will satisfy [all] the requirements for scientific proof and clinical applicability?" Sackett writes. "Not very many. ... After all, there are only a handful of ways to do a study properly but a thousand ways to do it wrong."

Despite impeccable design, some studies yield answers that turn out to be wrong. Some fail for lack of understanding of physiology and disease. Even the soundest studies may provoke controversy. No study settles anything for all time. And according to Sackett, some "may meet considerable resistance when they discredit the only treatment currently available. ... Clinicians may still elect to do something, even if it is of no demonstrable benefit. Study results may be rejected,
Page 70: zjc02a00 Log in for more options!
regardless of their merit, if they threaten the prestige or livelihood of their audience."

Reporters need to tread a narrow path between believing everything and believing nothing. Also - we are reporters - some of the controversies make important stories.
Page 71: zjc02a00 Log in for more options!
Tests and Testing

Testing is often the only way to answer our questions, but it doesn't produce unassailable, universal truths that should be carved on stone tablets. Instead, testing produces statistics, which must be interpreted.
- Robert Hooke

Who knows when thou mayest be tested?
- Ronald Arthur Hopwood

Do physicians always know what they're doing when they administer tests? Stanford's Dr. Eugene Robin says many tests "have not been properly evaluated and in fact may be useless or harmful." He asks, "Is it common practice in medicine to perform careful clinical trials before introducing tests that can affect the welfare of masses of patients? Sadly, the answer is no."

A good test should detect both health and disease and do so with high accuracy. The measures of the value of a clinical test, one used for medical diagnosis, are sensitivity and specificity, or, simply, the ability to avoid false negatives and false positives.

Sensitivity is how well a test identifies a disease or condition in those who have it - how well it avoids false negatives, or missed cases. If 100 people with a condition are tested and 90 test positive, the test's sensitivity is 90 percent.

Specificity is how well a test identifies those who do not have the disease or condition - how well it rules out false positives, or mistaken identifications. If 100 healthy people are tested and 90 test negative, the test's specificity is 90 percent.

Sensitivity, in short, tells us about disease present. Specificity tells us about disease absent. A highly unspecific test will produce many false positives; a highly insensitive test, many false negatives. [...]
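The sensitivity and specificity figures defined above can be worked through as a short calculation. This is a minimal sketch: the function names are illustrative choices, and the counts are the ones used in the text's two examples (100 people with the condition, 90 testing positive; 100 healthy people, 90 testing negative).

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Share of people WITH the condition whom the test correctly flags."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives: int, false_positives: int) -> float:
    """Share of people WITHOUT the condition whom the test correctly clears."""
    return true_negatives / (true_negatives + false_positives)


# 100 people with the condition: 90 test positive, so 10 are missed cases.
sens = sensitivity(true_positives=90, false_negatives=10)

# 100 healthy people: 90 test negative, so 10 are mistaken identifications.
spec = specificity(true_negatives=90, false_positives=10)

print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```

The two measures use different groups as their denominators: sensitivity looks only at people who have the condition, specificity only at people who do not.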
Page 72: zjc02a00 Log in for more options!
---
Page 73: zjc02a00 Log in for more options!
---
