Philip Morris

Test of the Linear - No Threshold Theory of Radiation Carcinogenesis

Date: 10 May 1993 (est.)
Length: 19 pages
2501171317-2501171335

Fields

Author
Cohen
Type
SCRT, REPORT, SCIENTIFIC
BIBL, BIBLIOGRAPHY
CHAR, CHART, GRAPH, TABLE, MAPS
LIST, LIST
Area
REIF,HELMUT/OFFICE
Attachment
2501171179/2501171407
Site
E5
Request
Stmn/R2-038
Named Organization
Epa, Environmental Protection Agency
US Bureau of Census
Named Person
Greenland
Morgenstern
Robins
Author (Organization)
Univ Pittsburgh
Master ID
2501171179/1407
Related Documents:
Litigation
Stmn/Produced
Date Loaded
05 Jun 1998
UCSF Legacy ID
yet32e00

Page 1
Test of the linear-no threshold theory of radiation carcinogenesis
Prof. Bernard L. Cohen
Page 2
TEST OF THE LINEAR-NO THRESHOLD THEORY OF RADIATION CARCINOGENESIS

Bernard L. Cohen
University of Pittsburgh
Pittsburgh, PA 15260, U.S.A.

We recently completed a compilation of radon measurements from available sources. It gives the average radon level, r, in homes for 1730 counties, well over half of all U.S. counties and comprising about 90% of the total U.S. population. Plots of age-adjusted lung cancer mortality rates, m, vs. these r are shown in Fig. 1a, c. Rather than showing individual points for each county, we have grouped the counties into intervals of r, shown on the baseline along with the number of counties in each group. For each group we plot the mean value of m, its standard deviation, and the first and third quartiles of the distribution. We see in Fig. 1a, c a clear tendency for m to decrease with increasing r. This is in sharp contrast to the increase expected from the fact that radon can cause lung cancer, shown by the line labelled "theory". One obvious problem is migration: people do not spend their whole lives, and receive all of their radon exposure, in their county of residence at the time of death. However, it is easy to correct the theoretical prediction for this, and the "theory"
Page 3
lines in Fig. 1 have been corrected. As part of this correction, data for Florida, California, and Arizona have been deleted, because many people move to these states after retirement. This reduces the number of counties to 1601. (This deletion does not affect the results.)

A more serious problem is that Fig. 1 is what epidemiologists call an "ecological study". Epidemiologists normally study the relationship between the mortality risk to individuals, m', and their personal exposure, r'. An ecological study like ours instead deals with the relationship between the average risk to groups of individuals (populations of counties) and their average exposure. It is well known to epidemiologists that, in general, the average dose does not determine the average risk, and to assume otherwise is called "the ecological fallacy". However, it is easy to show [2] that "the ecological fallacy" does not apply when testing a linear-no threshold theory: in that theory, the average dose does determine the average risk. This is widely recognized from the fact that "person-rem" determines the number of deaths. Dividing person-rem by population gives the average dose, and dividing the number of deaths by population gives the mortality rate.

Because of "the ecological fallacy", epidemiology textbooks often state that an ecological study cannot determine a causal relationship between risk and exposure. That may be true, but it is irrelevant here. The purpose of our study is not to determine a causal relationship; it is rather to test the linear-no threshold dependence of m on r.
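The claim that average dose determines average risk only for a linear dose-response can be checked numerically. The sketch below is a minimal illustration in plain Python. It assumes hypothetical values for a, b, and a threshold, and none of these come from the paper. It compares two made-up counties that have the same mean exposure but very different spreads. Under the linear individual risk form m' = a(1 + b r') (Eq. (1) of this paper), the county average risk depends only on the mean exposure. Under a hypothetical threshold model it does not. The function names `risk_lnt`, `risk_threshold`, and `county_mean_risk` are illustrative only.

```python
import math
import statistics

# Hypothetical parameters, chosen only to illustrate the argument
# (not values from the paper).
a = 1.0          # baseline individual risk, arbitrary units
b = 0.07         # fractional risk increase per pCi/L
threshold = 2.0  # pCi/L; used only by the comparison threshold model


def risk_lnt(r):
    """Linear-no threshold individual risk: m' = a(1 + b r')."""
    return a * (1 + b * r)


def risk_threshold(r):
    """Hypothetical threshold model: no excess risk below `threshold`."""
    return a * (1 + b * max(0.0, r - threshold))


def county_mean_risk(exposures, risk_fn):
    """Average individual risk in a county, i.e. its mortality rate."""
    return statistics.fmean(risk_fn(r) for r in exposures)


# Two hypothetical counties with the same mean exposure (1.5 pCi/L)
# but very different spreads of individual exposure.
counties = {"narrow": [1.3, 1.7] * 500, "wide": [0.0, 3.0] * 500}

for name, exposures in counties.items():
    mean_r = statistics.fmean(exposures)
    lnt_avg = county_mean_risk(exposures, risk_lnt)
    print(name,
          round(lnt_avg, 4),
          # True when the county average risk equals the risk at the average dose
          math.isclose(lnt_avg, risk_lnt(mean_r)),
          round(county_mean_risk(exposures, risk_threshold), 4))
```

Under the linear model, both counties come out at about 1.105, which equals the risk evaluated at their common mean exposure. The threshold model gives about 1.0 for the narrow county and about 1.035 for the wide one. For the nonlinear model, knowing only the county average dose is therefore not enough to predict the county average risk.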
Page 4
Apart from "the ecological fallacy", Morgenstern, Greenland, and Robins have pointed out other potential problems with ecological studies [3-5], but these have been shown not to be applicable to our work [2,6,7].

The most obvious potential explanation for Fig. 1 is a strong negative correlation between radon exposure, r, and the percentage of the adult population that smokes, S. That is, counties with low r would tend strongly to have high S, and vice versa. This effect is most easily handled by use of the BEIR-IV theory [8], which can be shown to give

$$ m' = a\,(1 + b\,r') \tag{1} $$

Here m' is the lung cancer mortality risk to an individual and r' is that individual's radon exposure. The constants a and b are given separately for males and females, and a is also given separately for smokers and nonsmokers ($a_s$, $a_n$). If we sum over all individuals in a county and divide by the population, Eq. (1) reduces to

$$ m = \left[ S\,a_s + (1 - S)\,a_n \right] (1 + b\,r). \tag{2} $$

Applying our correction for migration and inserting numerical values for $a_s$ and $a_n$ then leads to [9]

$$ m/m_0 = 1 + B\,r \tag{3} $$

where

$$ m_0 = 9 + 0.99\,S \quad \text{(males)}, \qquad m_0 = 3.7 + 0.32\,S \quad \text{(females)}, \qquad B = +7.3 \tag{4} $$
Page 5
with B in units of percent per pCi/L of average radon level, and $m_0$ in units of deaths per year per 100,000. In Eq. (3), $m/m_0$ may be thought of as the lung cancer mortality rate corrected for smoking prevalence. Problems in determining S will be discussed below. We used our best values of S to calculate $m_0$ from Eq. (4) for each county; the results are shown in Fig. 1b, d. We see that correcting for smoking does little to improve the unexpected behavior. We then fitted the data to

$$ m/m_0 = A + B\,r \tag{5} $$

to determine A and B. This gives B = -7.3 ± 0.6 for males and B = -8.3 ± 0.8 for females. The theory prediction of Eq. (4) is B = +7.3, so the discrepancy is about 20 standard deviations. We refer to this as "our discrepancy". The remainder of this paper deals with our attempts to explain it, and each section treats a different approach.

Uncertainties in radon data

Our radon data derive from three independent sources: our own measurements, EPA measurements, and studies by agencies in various individual states. Various checks for consistency among these three sources give satisfactory results [1]. Data from each of these three sources alone give results for B very similar to those from our combined data set. We conclude that uncertainties in our r-values are not responsible for any significant part of our discrepancy. In fact, the simplest correction for these uncertainties would increase our discrepancy by about 8%.

Outliers and sampling issues
Page 6
We investigated the effects of outlying points in our analyses of $m/m_0$ vs. r by using five of the most popular statistical tests to discard either 10 or 20 outliers. In all cases, for both males and females, this increased our discrepancy. Outliers were not discarded.

Ten different random samples each of 200, 400, and 800 of our 1601 counties were analyzed independently. In all cases, the results for B were quite similar to those for our entire data set, B = -7.3 for males and -8.3 for females. For example, for our ten random sets of 200 counties, all B values were between -5.0 and -8.5 for males, and between -4.8 and -12.7 for females. Since 1601/200 ≈ 8, our study might therefore be considered equivalent to eight independent studies, each giving roughly the same discrepancy with theory.

One might wonder how unexpected it is to find a correlation between m and r as strong and statistically robust as the one we find for lung cancer in Fig. 1. To investigate this, we studied the regression of m on r for the 33 principal cancer types. We compared the number of standard deviations by which the slope B differs from zero. For lung cancer it was 2.7 times larger than for any other type, and with just two exceptions it was at least 4 times larger. Double regressions on r and S gave similar results. As expected, the m-S correlation is very large and positive for lung cancer, and the m-r correlation is also large (two-thirds as large as m-S). The only unexpected result was that the m-r correlation is negative rather than positive. We conclude that the strong observed correlation between m and r for lung cancer is unique and remarkable.

Uncertainties in smoking prevalence, S
Page 7
Our S values were derived from a 1985 survey [10] of smoking prevalence in states, S'. These values were corrected for variations with time in national smoking prevalence [11], under the assumption that the ratios of S' among the various states did not vary with time. It was then assumed that differences in S among the counties within a state are due only to urban-rural differences. That is, we take

$$ S = S'\,\frac{1 + k\,\mathrm{PU}}{1 + k\,\mathrm{PU}'} $$

Here PU is the percentage of the county's population that lives in urban areas, and PU' is the same quantity for the state as a whole. The constant k was determined from regressions of m on PU and was found to be similar for all geographic regions.

An alternative method for determining S' values for states was by use of cigarette sales tax collections [12], which are available for every year. This method gives data for the relevant time periods. It also reflects the number of cigarettes smoked rather than just the number of smokers, although it has some recognized disadvantages. When these values of S' were used, our discrepancy was increased. They were not used further.

We also wanted direct data on S for counties in the relevant time period that take due account of smoking intensity (e.g., inhalation, cigarettes per day). For this purpose we developed a smoking variable S derived from lung cancer mortality data. We used the socioeconomic variables (SEV) listed in Table 1, plus S', to predict m-values in a manner independent of radon levels, r. We stratified the counties on r into six separate groups, and for each group independently we studied multiple regressions of m on SEV. We were able to derive a linear combination of S' plus five SEV, with coefficients independent of r, which predict m-values about as well as they can be
Page 8
predicted from SEV. We used the S values derived from this process to calculate $m_0$ from Eq. (4), and then used these to fit the data to Eq. (5). This changes B from -7.3 to -6.0 for males, and from -8.3 to -6.3 for females. This is only a modest reduction in our discrepancy, and it is questionable to use S-values derived from m-values to predict m-values. For both reasons, these S-values were not used in our other studies. But this exercise indicates that the obvious problems in our derivation of S-values are not the cause of our discrepancy.

As an entirely different approach to evaluating the effects of uncertain S-values, we set out to determine how strong a negative r-S correlation would be needed to explain our discrepancy. We reassigned S-values to our 1601 counties in perfect reverse order of their r-values, and used these S-values in our analysis. This "perfect" negative r-S correlation reduced our B-values essentially to zero (+0.7 for males, -0.3 for females), which only cuts our discrepancy in half. The problem is that our distribution of S-values is rather narrow: for males, mean = 51.7, SD = 6.9, min/max = 25/70. We then arbitrarily doubled the width of this distribution by doubling each county's difference from the mean, which gives mean = 51.7, SD = 13.8, min/max = 0/88. With this wider distribution, we can eliminate our discrepancy by reassigning S-values so that the coefficient of correlation (CORR) between S and r is -0.90.

We then considered how strong an r-S correlation is credible. Any such correlation must arise from confounding by socioeconomic variables, so we studied the correlations of our 54 SEV (Table 1) with r. The largest |CORR-r| for any of our SEV is 0.37, the second largest is 0.30, and for 49 of our 54 SEV, |CORR-r| is
Page 9
less than 0.23. For the S-values we are using, CORR-r is -0.28 for males and -0.19 for females. It therefore seems incredible that the true r-S correlation could be as large as needed to explain our discrepancy, even if combined with a large error in the width of our distribution of S-values. We conclude that uncertainties in S-values are not a major cause of our discrepancy.

Confounding by SEV and factors that correlate with them

Suppose a particular socioeconomic variable (SEV) is an important confounding factor (CF). Stratifying our data on that SEV and analyzing each subset separately would then greatly reduce the problem, because all counties in a given subset would have approximately the same value of that SEV. The average of the B-values obtained from the various subsets would give a value of B free from the effects of confounding. The data were stratified on each of our 54 SEV in turn, each time into five quintiles of 1601/5 ≈ 320 counties. This gave 540 subsets (including both sexes), and B was found to be negative for all 540 of them.

Thus, the negative slopes in Fig. 1b, d are found if we consider only the most urban counties, or only the most rural. The same holds for only the richest or only the poorest, and for only those with the best or only those with the poorest medical care, and so on for all 54 SEV. The negative slopes are also found for any of the strata in between. We averaged B-values over the five quintiles to obtain B-values free of confounding. For our 54 SEV, this gives results ranging between -5.6 and -7.7 for males, and between -5.4 and -9.1 for females, reasonably close to
Page 10
our values for the entire data set, -7.3 and -8.3. We conclude that confounding by any one of our SEV can do little to explain our discrepancy. This also excludes, as potential CF, factors that correlate strongly with SEV. For example, air pollution correlates strongly with several of our SEV (e.g., population) and therefore cannot be an important CF.

Confounding by combinations of SEV

This still leaves open the possibility that some combination of SEV can explain our discrepancy. The best way to investigate this is through multiple regression analysis, fitting our data to

$$ m/m_0 = A + B\,r + c_1 X_1 + c_2 X_2 + \dots + c_{54} X_{54} \tag{7} $$

Here $X_1 \dots X_{54}$ are our 54 socioeconomic variables, and $A$, $B$, $c_1 \dots c_{54}$ are constants used to fit the data. With 1601 data points, there is no difficulty in deriving statistically robust estimates of these 56 constants. The results are B = -3.1 ± 0.6 for males and B = -3.5 ± 0.9 for females, reducing our discrepancy by 29% and 31%, respectively.

However, the statistics community generally takes a dim view of using multiple regression on many variables to quantify the causal relationship of one particular variable. In our case, m and r are strongly negatively correlated. Any variable strongly correlated with m will therefore tend to have a correlation of opposite sign with r. In fitting Eq. (7), that variable's term will drain away some of the strength of the Br term, reducing the magnitude of B.
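The "draining" effect described above can be reproduced on synthetic data. The sketch below is a minimal illustration, and everything except the county count is assumed. It uses 1601 hypothetical counties with r drawn uniformly between 0.2 and 4 pCi/L and a built-in slope of -7% per pCi/L in $m/m_0$. It adds one hypothetical covariate X that tracks $m/m_0$ closely but has no causal link to r. The code fits Eq. (5), then a one-covariate version of Eq. (7), by ordinary least squares. The helper `ols` and its normal-equation solver are illustrative implementation choices, not the software used in the study.

```python
import random


def ols(predictors, y):
    """Ordinary least squares with an intercept, via the normal equations.

    `predictors` is a list of per-county predictor lists. Returns
    [intercept, coef_1, coef_2, ...]. Gauss-Jordan elimination with partial
    pivoting is adequate for the handful of predictors used here.
    """
    X = [[1.0] + list(row) for row in predictors]
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    M = [[sum(xr[i] * xr[j] for xr in X) for j in range(k)]
         + [sum(xr[i] * yi for xr, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(M[row][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(k):
            if row != col:
                factor = M[row][col] / M[col][col]
                M[row] = [v - factor * p for v, p in zip(M[row], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]


rng = random.Random(1)
n = 1601  # same county count as the study; all data below are synthetic
r = [rng.uniform(0.2, 4.0) for _ in range(n)]              # hypothetical radon level, pCi/L
y = [1.0 - 0.07 * ri + rng.gauss(0.0, 0.1) for ri in r]    # m/m0 with slope -7% per pCi/L
x = [yi + rng.gauss(0.0, 0.1) for yi in y]                 # hypothetical covariate tracking m/m0

B_alone = ols([[ri] for ri in r], y)[1]                    # Eq. (5): m/m0 = A + B r
B_with_x = ols([[ri, xi] for ri, xi in zip(r, x)], y)[1]   # Eq. (7) with a single X
print(f"B from Eq. (5):   {100 * B_alone:.1f}% per pCi/L")
print(f"B with covariate: {100 * B_with_x:.1f}% per pCi/L")
```

The first fit recovers a slope close to the built-in -7% per pCi/L. Adding the covariate pulls B roughly halfway toward zero with these assumed noise levels, even though the covariate has no causal effect of its own. This is the mechanism by which each added term in Eq. (7) can shrink the magnitude of B.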
