Philip Morris
Test of the Linear - No Threshold Theory of Radiation Carcinogenesis
Fields
- Author
- Cohen
- Type
- SCRT, REPORT, SCIENTIFIC
- BIBL, BIBLIOGRAPHY
- CHAR, CHART, GRAPH, TABLE, MAPS
- LIST, LIST
- Area
- REIF,HELMUT/OFFICE
- Attachment
- 2501171179/2501171407
- Site
- E5
- Request
- Stmn/R2-038
- Named Organization
- Epa, Environmental Protection Agency
- US Bureau of Census
- Named Person
- Greenland
- Morgenstern
- Robins
- Author (Organization)
- Univ Pittsburgh
- Master ID
- 2501171179/1407
Related Documents:
- 2501171179-1183 Is the Concept of Linear Relationship Between Dose and Effect Still A Valid Model for Assessing Risk Related to Low Doses of Carcinogens?
- 2501171184-1186 the Causes and Prevention of Cancer
- 2501171187-1194 How Biologically Based Models May Help Extrapolating Cancer Risk to Low Doses
- 2501171195-1213 A Critical Study of Methods of Assessment of Effects of Low Doses
- 2501171214-1258 Do Rodent Studies Predict Human Cancers?
- 2501171259-1262 the Delaney Clause - Linchpin of the Environmental Policy Edifice
- 2501171263-1269 Toxic Policy at Dead End: the Case of Arsenic
- 2501171270-1286 the Asbestos Example
- 2501171287-1301 the Case of Chlorine and Derivated Products (Vcm)
- 2501171302-1316 the Ddt : Example
- 2501171336-1354 Bladder Cancer in Rats Fed Sodium Saccharin - Mechanistic Data and Their Application in Risk Analysis
- 2501171355-1384 Environmental Tobacco Smoke and Lung Cancer Approaches to Risk Management
- 2501171385-1389 Endeavouring New Shores in the Estimation and Assessment of the Cancer Risk by Environment Materials (Abstract)
- 2501171390-1404 Health Effects of Historical Exposures to Asbestos
- 2501171405-1407 Exposure - Response : Asbestos and Mesothelioma
- Litigation
- Stmn/Produced
- Date Loaded
- 05 Jun 1998
- UCSF Legacy ID
- yet32e00
Test of the linear-no threshold theory
of radiation carcinogenesis
Prof. Bernard L. Cohen

TEST OF THE LINEAR-NO THRESHOLD THEORY OF RADIATION CARCINOGENESIS
Bernard L. Cohen
University of Pittsburgh
Pittsburgh, PA 15260, U.S.A.
We recently completed a compilation of radon measurements from available
sources which gives the average radon level, r, in homes for 1730 counties, well over
half of all U.S. counties and comprising about 90% of the total U.S. population. Plots
of age-adjusted lung cancer mortality rates, m, vs these r are shown in Fig. 1 a, c
where, rather than showing individual points for each county we have grouped them
into intervals of r (shown on the base-line, along with the number of counties in each
group) and we plot the mean value of m for each group, its standard deviation, and
the first and third quartiles of the distribution. We see, in Fig. 1 a, c, a clear tendency
for m to decrease with increasing r, in sharp contrast to the increase expected from
the fact that radon can cause lung cancer, shown by the line labelled "theory".
One obvious problem is migration: people do not spend their whole life and
receive all of their radon exposure in their county of residence at time of death.
However, it is easy to correct the theoretical prediction for this, and the "theory"
lines in Fig. 1 have been corrected. As part of this correction, data for Florida,
California, and Arizona, where many people move after retirement, have been deleted,
reducing the number of counties to 1601. (This deletion does not affect results.)
A more serious problem is that Fig. 1 is what epidemiologists call an "ecological
study". Epidemiologists normally study the relationship between mortality risks to
individuals, m', vs their personal exposure, r', whereas an ecological study like ours
deals with the relationship between the average risk to groups of individuals
(populations of counties) and their average exposure. It is well known to
epidemiologists that, in general, the average dose does not determine the average risk,
and to assume otherwise is called "the ecological fallacy". However, it is easy to
show [2] that, in testing a linear-no threshold theory, "the ecological fallacy" does not
apply; in that theory, the average dose does determine the average risk. This is
widely recognized from the fact that "person-rem" determines the number of deaths.
Dividing person-rem by population gives average dose, and dividing number of deaths
by population gives mortality rate.
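The averaging argument can be checked numerically. The following is a minimal sketch, assuming illustrative values for the constants a and b and made-up individual exposures (none of these numbers come from the study). It suggests that under a linear-no threshold risk the mean risk of a group depends only on its mean exposure, whereas under a non-linear risk (a quadratic form, chosen here purely for contrast) it does not.

```python
from statistics import mean

# Illustrative constants only; not values from the study.
A_RISK = 10.0   # baseline risk a
B_SLOPE = 0.07  # fractional increase in risk per unit exposure, b


def linear_risk(r_individual):
    """Linear-no threshold individual risk: m' = a(1 + b r')."""
    return A_RISK * (1.0 + B_SLOPE * r_individual)


def quadratic_risk(r_individual):
    """A hypothetical non-linear risk, used only for contrast."""
    return A_RISK * (1.0 + B_SLOPE * r_individual ** 2)


# Two hypothetical groups with the same mean exposure (2.0) but different spreads.
group_narrow = [1.5, 2.0, 2.5]
group_wide = [0.0, 1.0, 5.0]

for risk in (linear_risk, quadratic_risk):
    m_narrow = mean(risk(r) for r in group_narrow)
    m_wide = mean(risk(r) for r in group_wide)
    print(risk.__name__, round(m_narrow, 4), round(m_wide, 4))
```

For the linear form the two group means coincide and equal the risk evaluated at the mean exposure; for the quadratic form they differ, which is the situation the ecological fallacy warns about.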
Because of the "ecological fallacy", epidemiology textbooks often state that an
ecological study cannot determine a causal relationship between risk and exposure.
That may be true, but it is irrelevant here because the purpose of our study is not to
determine a causal relationship; it is rather to test the linear-no threshold dependence
of m on r.
Apart from "the ecological fallacy", other potential problems with ecological
studies have been pointed out by Morgenstern, Greenland, and Robins [3,4,5] but these
have been shown not to be applicable to our work [2,6,7].
The most obvious potential explanation for Fig. 1 is that there is a strong
negative correlation between the percentage of adult population that smokes, S, and
radon exposure, r; i.e. that counties with low r tend strongly to have high S, and
vice versa. This effect is most easily handled by use of the BEIR-IV theory [8] which can be
shown to give
m' = a(1 + b r')     (1)
where m' is the lung cancer mortality risk to an individual, r' is that individual's radon
exposure, and a and b are constants with a given separately for smokers and
non-smokers (a_s, a_n) and for males and females. If we sum over all individuals in a county
and divide by the population, Eq. (1) reduces to
m = [S a_s + (1 - S) a_n](1 + b r)     (2)
Applying our correction for migration and inserting numerical values for a_s and a_n then
leads to [9]
m/m_0 = 1 + B r     (3)
where
m_0 = 9 + 0.99 S for males
m_0 = 3.7 + 0.32 S for females     (4)
B = +7.3
with B in units of percent per pCi/L of average radon level, and m_0 in units of
deaths per year per 100,000. In Eq. (3), m/m_0 may be thought of as the lung cancer
mortality rate corrected for smoking prevalence.
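Eqs. (3) and (4) can be written out as a short calculation. This is a sketch only, assuming that S enters Eq. (4) as a percentage, that the male relation reads m_0 = 9 + 0.99 S by analogy with the female one, and that B is converted from percent to a fraction before use. The function names m0 and predicted_m and the example county are hypothetical.

```python
B_THEORY = 7.3  # percent per pCi/L, from Eq. (4)


def m0(smoking_percent, sex):
    """Eq. (4): smoking-dependent rate m_0, in deaths per year per 100,000."""
    if sex == "male":
        return 9.0 + 0.99 * smoking_percent
    if sex == "female":
        return 3.7 + 0.32 * smoking_percent
    raise ValueError("sex must be 'male' or 'female'")


def predicted_m(radon_pci_per_l, smoking_percent, sex):
    """Eq. (3): m = m_0 (1 + B r), with B converted from percent to a fraction."""
    return m0(smoking_percent, sex) * (1.0 + B_THEORY / 100.0 * radon_pci_per_l)


# Hypothetical county: 50% of adult males smoke, average radon level 2 pCi/L.
print(m0(50.0, "male"))                # about 58.5
print(predicted_m(2.0, 50.0, "male"))  # about 67.0
```

Dividing an observed m by m0 for the same county gives the smoking-corrected ratio m/m_0 that the theory predicts should rise by 7.3% per pCi/L.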
Problems in determining S will be discussed below. Using our best values to
calculate m_0 from Eq. (4) for each county leads to results shown in Fig. 1 b, d. We
see that correcting for smoking does little to improve the unexpected behavior. Fitting
the data to
m/m_0 = A + B r     (5)
to determine A and B gives B = -7.3 ± 0.6 for males and B = -8.3 ± 0.8 for females, as
compared with the Eq. (4) theory prediction B = +7.3, a discrepancy of about 20
standard deviations. We refer to this as "our discrepancy", and the remainder of this
paper deals with our attempts to explain it, each section treating a different approach.
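The size of the discrepancy follows from simple arithmetic on the quoted slopes. A minimal sketch, reading the quoted uncertainties (0.6 and 0.8) as one standard deviation each; the function name is hypothetical:

```python
B_THEORY = 7.3  # percent per pCi/L, Eq. (4)

# Fitted slopes and their standard deviations as quoted in the text.
fits = {"male": (-7.3, 0.6), "female": (-8.3, 0.8)}


def discrepancy_in_sd(b_fit, sd):
    """Gap between theory and fitted slope, in units of the fit's standard deviation."""
    return (B_THEORY - b_fit) / sd


for sex, (b_fit, sd) in fits.items():
    print(sex, round(discrepancy_in_sd(b_fit, sd), 1))
```

This gives roughly 24 standard deviations for males and 19.5 for females, consistent with the "about 20" stated above.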
Uncertainties in radon data
Our radon data derive from three independent sources: our own
measurements, EPA measurements, and studies by agencies in various individual
states. Various checks for consistency among these three sources give satisfactory
results [1]. Data from each of these three sources alone give results for B very similar
to those from our combined data set. We conclude that uncertainties in our r-values
are not responsible for any significant part of our discrepancy. In fact the simplest
correction for these uncertainties would increase our discrepancy by about 8%.
Outliers and sampling issues
The effect of outlying points in our analyses of data on m/m_0 vs r was
investigated by using five of the most popular statistical tests to discard either 10 or
20 outliers. In all cases, for both males and females, this increased our discrepancy.
Outliers were not discarded.
Ten different random samples each of 200, 400, and 800 of our 1601 counties
were analyzed independently. In all cases, results for B were quite similar to those
for our entire data set, B = -7.3 for males and -8.3 for females. For example, for our
ten random sets of 200 counties, all B values were between -5.0 and -8.5 for males,
and between -4.8 and -12.7 for females. Our study might therefore be considered
equivalent to eight independent studies, each giving roughly the same discrepancy
with theory.
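The random-sample check can be sketched as follows. This is an illustration of the general procedure, not the code used in the study: it assumes simple unweighted least squares, and the function names and arguments (r for county radon levels, ratio for the corresponding m/m_0 values) are hypothetical.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x for paired sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def subsample_slopes(r, ratio, sample_size, n_samples=10, seed=0):
    """Fit the slope B on n_samples random subsets of counties.

    Each subset is drawn without replacement; different subsets may overlap.
    """
    rng = random.Random(seed)
    indices = range(len(r))
    slopes = []
    for _ in range(n_samples):
        chosen = rng.sample(indices, sample_size)
        slopes.append(ols_slope([r[i] for i in chosen], [ratio[i] for i in chosen]))
    return slopes
```

With the county arrays in hand, subsample_slopes(r, ratio, 200) would return ten slopes whose spread can be compared with the slope for the full data set, and likewise for samples of 400 and 800.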
One might wonder how unexpected it is to find such a strong and statistically
robust correlation between m and r as we find for lung cancer in Fig. 1. To
investigate this, we studied the regression of m on r for the 33 principal cancer types.
The number of standard deviations by which the slope B differs from zero was 2.7
times larger for lung cancer than for any other type, and with just two exceptions it
was at least 4 times larger. Double regressions on r and S gave similar results; as
expected, the m-S correlation is very large and positive for lung cancer, and the m-r
correlation is large (two-thirds as large as m-S). The only unexpected result was that
the m-r correlation is negative rather than positive. We conclude that the strong
observed correlation between m and r for lung cancer is quite unique and remarkable.
Uncertainties in smoking prevalence, S
Our S values were derived from a 1985 survey [10] of smoking prevalence in
states, S', corrected for variations with time in national smoking prevalence [11] under
the assumption that the ratio of S' for various states did not vary with time. It was
then assumed that variations in S among the counties within a state are due only to urban-rural
differences. That is, we take S = S'(1 + k PU)/(1 + k PU'), where PU is the percent of
the population that lives in urban areas for the county, PU' is the same quantity for
the state as a whole, and k is a constant determined from regressions of m on PU (k
was found to be similar for all geographic regions).
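The urban-rural adjustment can be written as a one-line function. A minimal sketch, reading the adjustment as S = S'(1 + k PU)/(1 + k PU'); the value of k is not given in the text, so the one used below is purely illustrative, as are the state and county figures.

```python
def county_smoking(state_smoking, county_urban_pct, state_urban_pct, k):
    """County smoking prevalence S = S' (1 + k PU) / (1 + k PU')."""
    return state_smoking * (1.0 + k * county_urban_pct) / (1.0 + k * state_urban_pct)


K_ILLUSTRATIVE = 0.002  # hypothetical value; the text does not quote k

# Hypothetical state with S' = 30% and 60% urban population.
print(county_smoking(30.0, 60.0, 60.0, K_ILLUSTRATIVE))  # county as urban as the state
print(county_smoking(30.0, 90.0, 60.0, K_ILLUSTRATIVE))  # more urban county
```

A county with the same urban fraction as its state receives the state value unchanged; for positive k, a more urban county receives a higher S.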
An alternative method for determining S' values for states was by use of
cigarette sales tax collections [12] which are available for every year. This has the
advantage of giving data for the relevant time periods and also reflects the number of
cigarettes smoked rather than just the number of smokers, although it also has some
recognized disadvantages. When these values of S' were used, our discrepancy was
increased. They were not used further.
As an approach to getting direct data on S for counties in the relevant time
period with due consideration for intensity of smoking (e.g. inhalation, cigarettes per
day), we developed a smoking variable S derived from lung cancer mortality data. We
utilized socioeconomic variables (SEV) listed in Table 1 plus S' to predict m-values in
a manner independent of radon levels, r. We stratified on r into six separate groups
of counties, and for each group independently, studied multiple regressions of m on
SEV. We were able to derive a linear combination of S' plus five SEV with
coefficients independent of r, which predict m-values about as well as they can be
predicted from SEV. When S values derived from this process are used to calculate
m_0 from Eq. (4), and these are then used to fit the data to Eq. (5), B values are
changed from -7.3 to -6.0 for males, and from -8.3 to -6.3 for females. Since this
represents only a modest reduction in our discrepancy, and since it is questionable to
use S-values derived from m-values to predict m-values, these S-values were not used
in our other studies. But this exercise indicates that the obvious problems in our
derivation of S-values are not the cause of our discrepancy.
As an entirely different approach to evaluating effects of uncertain S-values, we
then set out to determine how strong a negative r-S correlation would be needed to
explain our discrepancy. We re-assigned S-values for our 1601 counties in perfect
reverse order of their r-values, and used these S-values in our analysis. This "perfect"
negative r-S correlation reduced our B-values essentially to zero (+0.7 for males, -0.3
for females), only cutting our discrepancy in half. The problem is that our distribution
of S-values is rather narrow: for males, mean = 51.7, SD = 6.9, min/max = 25/70. If
we arbitrarily double the width of this distribution by doubling the difference from the
mean for each county to give mean = 51.7, SD = 13.8, min/max = 0/88, we are able
to eliminate our discrepancy by reassigning S-values in a manner that gives the
coefficient of correlation (CORR) between S and r to be -0.90.
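The two manipulations of the S-values described above can be sketched as follows. The function names are hypothetical, and the clipping to the range 0 to 100 is an assumption inferred from the reported minimum of 0 (doubling the distance of 25 from the mean 51.7 would otherwise give a negative percentage). The sketch covers only the perfect reverse ordering and the widening, not the later reassignment tuned to give a correlation of -0.90.

```python
from statistics import mean


def reassign_in_reverse_order(r_values, s_values):
    """Give the county with the lowest r the highest S, the next lowest r the next highest S, and so on."""
    order_by_r = sorted(range(len(r_values)), key=lambda i: r_values[i])
    s_descending = sorted(s_values, reverse=True)
    reassigned = [0.0] * len(s_values)
    for rank, county in enumerate(order_by_r):
        reassigned[county] = s_descending[rank]
    return reassigned


def double_width(s_values, low=0.0, high=100.0):
    """Double each county's distance from the mean, clipping to a valid percentage."""
    centre = mean(s_values)
    return [min(high, max(low, centre + 2.0 * (s - centre))) for s in s_values]
```

Applied to a distribution with mean 51.7, double_width maps a maximum of 70 to about 88 and clips the image of the minimum of 25 at 0, in line with the quoted min/max of 0/88.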
We then consider the question of how strong an r-S correlation is credible.
Since any such correlation must arise from confounding by socioeconomic variables,
we studied correlations of our 54 SEV (Table 1) with r. The largest |CORR-r| for any
of our SEV is 0.37, the second largest is 0.30, and for 49 of our 54 SEV, |CORR-r| is
less than 0.23. For the S-values we are using, CORR-r is -0.28 for males and -0.19
for females. It therefore seems incredible that the true r-S correlation can be of the
magnitude necessary to explain our discrepancy, even if coupled with a large error in
the width of our distribution of S-values. We conclude that uncertainties in S-values
are not a major cause of our discrepancy.
Confounding by SEV and factors that correlate with them
If a particular socioeconomic variable (SEV) is an important confounding factor
(CF), stratifying our data on it into subsets and analyzing each subset separately
would greatly reduce the problem as all counties in a given subset would have
approximately the same value of that SEV. The average of the B-values obtained from
the various subsets would then give a value of B free from the effects of confounding.
The data were stratified into five quintiles of 1601/5 ≈ 320 counties on the
basis of each of our 54 SEV in turn. This gave 540 subsets (including both sexes),
and for all 540 of them, B was found to be negative. Thus, the negative slopes in Fig.
1 b, d are found if we consider only the most urban counties, or if we consider only
the most rural; if we consider only the richest, or only the poorest; if we consider only
those with the best medical care, or only those with the poorest medical care; etc., for
our 54 SEV. They are also found if we consider any of the strata in between.
Following up on our method of averaging B-values over the five quintiles to
obtain B-values free of confounding gives, for our 54 SEV, results ranging between
-5.6 and -7.7 for males, and between -5.4 and -9.1 for females, reasonably close to
our values for the entire data set, -7.3 and -8.3. We conclude that confounding by
any one of our SEV can do little to explain our discrepancy.
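The stratify-and-average procedure can be sketched for a single SEV as below. This is an illustration under stated assumptions rather than the analysis code of the study: it uses unweighted least squares, forms strata of equal size by rank with the last stratum absorbing the remainder, and all names are hypothetical.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x for paired sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def stratified_mean_slope(r, ratio, sev, n_strata=5):
    """Average the within-stratum slopes of ratio on r after stratifying on one SEV."""
    order = sorted(range(len(sev)), key=lambda i: sev[i])
    size = len(order) // n_strata
    slopes = []
    for q in range(n_strata):
        # The last stratum absorbs any remainder (1601 is not divisible by 5).
        end = (q + 1) * size if q < n_strata - 1 else len(order)
        members = order[q * size:end]
        slopes.append(ols_slope([r[i] for i in members], [ratio[i] for i in members]))
    return sum(slopes) / n_strata, slopes
```

Repeating the call for each of the 54 SEV and for each sex would yield the 540 within-stratum slopes and the 108 averaged B-values discussed above.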
This also excludes factors that correlate strongly with SEV as potential CF. For
example, air pollution correlates strongly with several of our SEV (e.g. population) and
therefore cannot be an important CF.
Confounding by combinations of SEV
This still leaves open the possibility that some combination of SEV can explain
our discrepancy. The best way to investigate this is through multiple regression
analysis, fitting our data to
m/m_0 = A + B r + c_1 X_1 + c_2 X_2 + ... + c_54 X_54     (7)
where X_1 ... X_54 are our 54 socioeconomic variables and A, B, c_1 ... c_54 are constants
used to fit the data. With 1601 data points, there is no difficulty in deriving
statistically robust estimates of these 56 constants. The results are B = -3.1 ± 0.6 for
males, and B = -3.5 ± 0.9 for females, reducing our discrepancy by 29% and 31%
respectively.
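The quoted reductions can be reproduced from the slopes alone. A minimal sketch, taking the discrepancy to be the gap between the theory slope B = +7.3 and the fitted slope; the function name is hypothetical.

```python
B_THEORY = 7.3  # percent per pCi/L, Eq. (4)


def discrepancy_reduction(b_before, b_after):
    """Fraction of the theory-minus-fit gap removed when the slope moves from b_before to b_after."""
    return (b_after - b_before) / (B_THEORY - b_before)


print(round(100 * discrepancy_reduction(-7.3, -3.1)))  # males: 29
print(round(100 * discrepancy_reduction(-8.3, -3.5)))  # females: 31
```

The same function applied to a slope of essentially zero gives a reduction of about one half, which matches the earlier remark that a perfect negative r-S correlation only cuts the discrepancy in half.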
However, the statistics community generally takes a dim view of using multiple
regression on many variables to quantify the causal relationship of one particular
variable. In our case, the strong negative correlation between m and r would cause
any variable strongly correlated with m to have a correlation of opposite sign with r.
In fitting Eq. (7), its term will therefore drain away some of the strength of the Br
term, reducing the value of B.
