**Page 1: wpd61e00**

A Study of the Models Used in the
Analysis of Certain Medical Data
Ingram Olkin
Stanford University
1. An Overview
In the early years of statistical analysis of biological data,
a number of single-variable models were studied in great detail,
e.g., the normal, logistic, gamma, and negative binomial. More
recently, with the advent of computer technology and because most
data involve many variables, it has become more common to use
multivariate models rather than univariate models. This led to
the development of multivariate analogues of well-known univariate
distributions. It is important to recognize that a single, unique
multivariate extension of a univariate model does not exist. In
fact, there may be many such distributions. Very little research
has been done to compare the possible multivariate extensions.
Yet each such extension could yield a different interpretation
of actual data.
The multivariate normal distribution is the most widely accepted
extension of the univariate normal distribution. Several
multivariate extensions of the exponential distribution, the gamma
distribution, and the beta distribution have been studied by
Marshall and Olkin (1967a), (1967b), and by Olkin (1964). In the
context of medical studies, Cornfield (1962) provides a multiple
cross-classification model. A general survey of multivariate
distributions is provided by Johnson and Kotz (1972).
The purpose of the present study is to review this general
field of multivariate models and their relation to epidemiological
studies.
2. Some Detailed Comments -- The Logistic Model
In studies such as the Framingham heart-disease study, a common
problem is to describe the way in which a set of variables
X_1, ..., X_n influences a binary variable Y. Here each X_i
represents a descriptive health statistic (such as cholesterol),
while Y assumes the value 1 if the patient contracts the disease,
or 0 if not. On the assumption that the two populations (for Y=0
and Y=1) are normally distributed, the risk is described by a
logistic function. This function has also been widely used by
biologists studying growth of populations and species interaction,
and by economists. Unfortunately, the assumption of normality is
rarely satisfied, even approximately. Hence other models describing
the influence of the X variables on Y must be considered.
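The step from normal populations to a logistic risk function can be
sketched as follows. This is an illustration under stated
assumptions, not a derivation taken from the text: the two
populations are assumed to share a common covariance matrix, and the
symbols p, mu_0, mu_1, Sigma, alpha, and beta are introduced here
for the purpose.

```latex
% Assumptions (not in the original text):
%   X | Y=k  ~  N_n(\mu_k, \Sigma),  k = 0, 1   (common covariance \Sigma)
%   p = P(Y = 1)                                 (prior probability of disease)
\begin{align*}
P(Y=1 \mid x)
  &= \frac{p\, f_1(x)}{p\, f_1(x) + (1-p)\, f_0(x)}
   = \frac{1}{1 + \exp\{-(\alpha + \beta' x)\}}, \\[4pt]
\beta  &= \Sigma^{-1}(\mu_1 - \mu_0), \\
\alpha &= \log\frac{p}{1-p}
          - \tfrac{1}{2}\,(\mu_1 + \mu_0)'\,\Sigma^{-1}(\mu_1 - \mu_0).
\end{align*}
```

The quadratic terms in x cancel only because the covariance matrices
are taken to be equal. With unequal covariances the exponent would
contain a quadratic form in x, which is one way the simple logistic
form can fail.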
There are several possible approaches for modification and
generalization. One approach is to continue along the lines which
lead to the logistic model without the assumption of normality.
Another approach starts from the observation that the logistic is
a distribution in one variable, so a multivariate logistic may be
the correct form for several variables. A few multivariate logistic
distributions have been proposed, but little is known about their
properties. Finally, because the risk of contracting a disease
should increase with the level of certain health measurements (such
as cholesterol count), this fact should be taken into account when
examining the data. One method for doing this is called isotonic
regression, and it may prove quite useful in analyzing the data of
the Framingham heart study.
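As a rough illustration of the isotonic regression idea, the sketch
below fits a nondecreasing sequence to observed rates by least
squares, using the pool-adjacent-violators algorithm. The function
name and the data are hypothetical and are not taken from the
Framingham study. The groups are assumed to be ordered from lowest
to highest cholesterol.

```python
def isotonic_regression(y, w=None):
    """Least-squares nondecreasing fit via pool-adjacent-violators."""
    n = len(y)
    if w is None:
        w = [1.0] * n
    # Each block holds [weighted mean, total weight, number of points].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Pool adjacent blocks while the ordering constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, c1 + c2])
    # Expand each pooled block back to one fitted value per observation.
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit


# Hypothetical observed disease proportions in five cholesterol groups.
rates = [0.02, 0.05, 0.04, 0.09, 0.08]
fitted = isotonic_regression(rates)
```

Where the observed rates dip (groups 2 to 3 and 4 to 5), the
algorithm replaces the offending neighbors by their average, so the
fitted risk never decreases as cholesterol rises. Passing group
sizes as the weights w would give larger groups more influence.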
The problem of constructing general joint distributions of
variables is an important one. The key point is how to build into
the model the dependency between the variables. Obviously, if one
chooses convenient mathematical functions, the result may not
conform to reality. Thus, one needs to abstract properties from
applications and then to construct a connection that maintains these
properties. Another procedure is to use a physical model to
generate the dependence. Both procedures need to be investigated
in developing a model for a joint distribution.
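One familiar example of a physical model generating dependence is a
common-shock construction of the kind associated with the
multivariate exponential distribution of Marshall and Olkin. Two
components each fail from their own shock or from a shared shock,
whichever comes first. The sketch below is illustrative only. The
function name, the rate values, and the sample size are assumptions
made here.

```python
import random


def common_shock_pair(lam1, lam2, lam12, rng):
    """Draw (X, Y) with exponential marginals from a common-shock model."""
    z1 = rng.expovariate(lam1)    # shock affecting component 1 only
    z2 = rng.expovariate(lam2)    # shock affecting component 2 only
    z12 = rng.expovariate(lam12)  # shock affecting both components
    return min(z1, z12), min(z2, z12)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
lam1, lam2, lam12 = 1.0, 2.0, 0.5  # hypothetical shock rates
sample = [common_shock_pair(lam1, lam2, lam12, rng) for _ in range(20000)]

mean_x = sum(x for x, _ in sample) / len(sample)
mean_y = sum(y for _, y in sample) / len(sample)
# Fraction of pairs in which the shared shock struck first (X == Y).
tie_fraction = sum(x == y for x, y in sample) / len(sample)
```

Each margin is exponential (with rates lam1 + lam12 and
lam2 + lam12), exactly as it would be for two independent
exponential variables with those rates. The joint behavior differs,
since X and Y coincide with positive probability. This is one
concrete sense in which a univariate model has more than one
multivariate extension.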
References
1. Cornfield, J. (1962) Joint dependence of risk of coronary
heart disease on serum cholesterol and systolic blood pressure:
a discriminant function analysis. Federation Proceedings 21,
Part II, Supplement No. 11, 58-61.
2. Johnson, N. L. and Kotz, S. (1972) Distributions in Statistics:
Continuous Multivariate Distributions.
3. Marshall, A. W. and Olkin, I. (1967a) A multivariate exponential
distribution. J. Amer. Statist. Assoc. 30-44.
4. Marshall, A. W. and Olkin, I. (1967b) A generalized bivariate
exponential distribution. J. Applied Prob. 291-302.
5. Olkin, I. (1964) Multivariate beta distributions and independence
properties of the Wishart distribution. Ann. Math. Statist. 35,
261-269.