Inputs:
The CDP software requires a
single parameter file to specify hyperparameters, file locations,
initial values, etc. This file is called parameters.txt.
To get a default parameters.txt file, from a command
prompt in the same directory as the executable, issue the command
"cdp -default", where cdp is the name of your CDP
executable. Note that the generated file is named
default.parameters.txt; you must rename it to
parameters.txt before running an analysis.
The format of each line of the file is simply NAME =
VALUE, and lines beginning with a # sign are ignored as
comments. The order of the entries does not matter, and the values are
not case sensitive.
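The line format described above can be read with a few lines of Python. This is a minimal sketch, not part of CDP itself: the function name parse_parameters and the example values are hypothetical, and values are kept as raw strings rather than converted to numbers.

```python
def parse_parameters(text):
    """Parse NAME = VALUE lines into a dict, skipping comments and blank lines."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Lines beginning with '#' are comments; blank lines carry no setting.
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected NAME = VALUE, got: {raw!r}")
        # Values stay as strings; multi-valued entries such as m0 stay space-separated.
        params[name.strip()] = value.strip()
    return params


# Illustrative content only; these values are not CDP defaults.
example = """\
# Data Section
N = 10
D = 2
DataFile = data.txt
m0 = 0 0
"""

params = parse_parameters(example)
print(params)
```

Because the order of entries does not matter, a dictionary is a natural container for the parsed settings.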
In your own analyses, it is only necessary to specify
values for parameters whose values you want to change from the default
setting appearing in the default.parameters.txt file (see the Examples page for examples of this). In
particular, we recommend using the default settings for the sampling
steps (all parameters sampled) and initial
values sections (initial values are obtained by sampling from the
prior distributions).
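As a sketch of the override-only approach, the following Python helper renders a small parameters file that contains just the settings being changed. The helper name render_parameters and all values shown are hypothetical illustrations, not recommended settings.

```python
def render_parameters(overrides):
    """Render a dict of overrides as NAME = VALUE lines."""
    lines = ["# Only settings that differ from default.parameters.txt"]
    for name, value in overrides.items():
        # Multi-valued entries (e.g. m0) are written separated by spaces.
        if isinstance(value, (list, tuple)):
            value = " ".join(str(v) for v in value)
        lines.append(f"{name} = {value}")
    return "\n".join(lines) + "\n"


# Illustrative values only.
text = render_parameters({
    "N": 10,
    "D": 2,
    "DataFile": "data.txt",
    "J": 20,
    "m0": [0.0, 0.0],
})
print(text)
```

Saving the resulting text as parameters.txt next to the executable would leave every other setting at its default.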
The values to be specified are the following:
Data Section
- N: Integer, the number of observations/data points
- D: Integer, the dimension of each observation
- DataFile: String, the path to a file containing the data. Each
row of this file should correspond to a single observation. Entries
should be separated by spaces, not commas. For example, if you have
10 observations of a 2-dimensional response, the file should have 10
rows with each row consisting of two numbers separated by a space.
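The data file layout above can be produced and sanity-checked as follows. This is a sketch under assumptions: check_data_file is a hypothetical helper, the data values are made up, and the choice to ignore blank lines is the checker's own and may not match how CDP reads the file.

```python
import os
import tempfile


def check_data_file(path, n, d):
    """Return True if the file has n non-empty rows of d space-separated numbers."""
    rows = 0
    with open(path) as fh:
        for line in fh:
            if not line.strip():
                continue  # this checker ignores blank lines
            fields = line.split()
            if len(fields) != d:
                return False
            try:
                for field in fields:
                    float(field)
            except ValueError:
                return False  # e.g. a comma-separated entry such as "1.0,2.0"
            rows += 1
    return rows == n


# Ten made-up observations of a 2-dimensional response.
observations = [(0.1 * i, 1.0 - 0.1 * i) for i in range(10)]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.txt")
    with open(path, "w") as fh:
        for x, y in observations:
            fh.write(f"{x} {y}\n")  # one observation per row, space-separated
    ok = check_data_file(path, n=10, d=2)

print(ok)
```

Running such a check before an analysis catches a mismatch between N, D and the actual contents of DataFile.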
Prior Section
- J: Integer, the maximum number of clusters, i.e. the truncation
point of the countably infinite top-layer mixture.
- T: Integer, the maximum number of normal components per cluster
(i.e. the truncation points of the countably infinite bottom-layer
mixtures)
- m0: D Doubles separated by spaces, the mean of the
normal prior on cluster location parameters m_j (i.e. m_j ~ N(m0,Phi0))
- phi0: Double, specifies the diagonal entries of the
covariance of the normal prior on cluster location parameters m_j
(i.e. Phi0 = phi0*I)
- lambda0: Double, specifies the diagonal entries of the
scale matrix parameter of the Wishart prior on cluster shape parameters Phi_j
(i.e. Phi_j ~ Wishart(nu0+D,lambda0*I/(nu0+D)). Note that under this
parametrization of the Wishart distribution, lambda0*I is the expected
value of each Phi_j).
- nu0: Integer, the positive degrees of freedom of the
Wishart prior on cluster shape parameters Phi_j.
- gamma: Positive Double, part of the
normal-inverse-Wishart prior on bottom level component locations and
shapes. Specifies how spread out about
the cluster location m_j the components within cluster j can be:
mu_j,t ~ N(m_j,gamma*Sigma_j,t)
- nu: Integer, the positive degrees of freedom of the
Inverse Wishart prior on bottom level component shape parameters
Sigma_j,t (i.e. Sigma_j,t ~ Inv-Wishart(nu+2,nu*Phi_j). Note that under this
parametrization of the Inv-Wishart distribution, Phi_j is the expected
value of Sigma_j,t).
- e0: Double, shape parameter of the gamma prior on
top-level DP scale parameter alpha0 (alpha0 ~ Gamma(e0,f0)). Note
that higher values of alpha0 result in greater numbers of clusters.
- f0: Double, scale parameter of the gamma prior on
top-level DP scale parameter alpha0 (alpha0 ~ Gamma(e0,f0)). Note
that higher values of alpha0 result in greater numbers of clusters.
- ee: Double, shape parameter of the gamma priors on
bottom-level DP scale parameters alpha_j (alpha_j ~ Gamma(ee,ff)). Note
that higher values of alpha_j result in greater numbers of normal
components in cluster j.
- ff: Double, scale parameter of the gamma priors on
bottom-level DP scale parameters alpha_j (alpha_j ~ Gamma(ee,ff)). Note
that higher values of alpha_j result in greater numbers of normal
components in cluster j.
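To get a feel for what a given choice of hyperparameters implies, the prior means can be computed directly. The sketch below assumes the shape/scale convention stated above for the gamma priors (mean = shape * scale) and the usual Wishart(df, scale matrix) convention (mean = df * scale matrix). The function name and the input values are hypothetical.

```python
def implied_prior_means(e0, f0, ee, ff, lambda0, nu0, d):
    """Prior means implied by the gamma and Wishart hyperparameters."""
    wishart_df = nu0 + d
    wishart_scale_diag = lambda0 / (nu0 + d)  # diagonal of lambda0*I/(nu0+D)
    return {
        "alpha0": e0 * f0,    # top-level DP scale parameter
        "alpha_j": ee * ff,   # bottom-level DP scale parameters
        "Phi_j_diag": wishart_df * wishart_scale_diag,  # diagonal of E[Phi_j]
    }


# Illustrative values only.
means = implied_prior_means(e0=2.0, f0=0.5, ee=4.0, ff=0.25,
                            lambda0=1.0, nu0=2, d=2)
print(means)
```

The Phi_j entry recovers lambda0, in line with the note that lambda0*I is the expected value of each Phi_j.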
MCMC Section
- burnin: Integer, the number of initial MCMC iterations to be discarded
- iter: Integer, the number of MCMC iterations to be collected after
the burnin phase.
- seed: Integer, the random number seed (for repeatability)
The following two sections of the file contain advanced/debugging
options that most users will not need to alter. The Sampling steps
section allows you to specify which parameters are to be sampled,
and which are to be held at fixed values. The initial values section
allows you to specify initial values for all sampled parameters.
This is useful for extending a previous MCMC run.
Sampling Steps Section
- samplem: Binary Integer (1|0), specifies whether (1) or
not (0) the top level location variables should be sampled.
- samplePhi: Binary Integer (1|0), specifies whether (1) or
not (0) the top level shape variables should be sampled.
- samplew: Binary Integer (1|0), specifies whether (1) or
not (0) the top level component membership variables should be sampled.
- sampleq: Binary Integer (1|0), specifies whether (1) or
not (0) the top level component weights should be sampled.
- samplealpha0: Binary Integer (1|0), specifies whether (1)
or not (0) the top level DP scale parameter should be sampled.
- samplemu: Binary Integer (1|0), specifies whether (1) or
not (0) the bottom level location variables should be sampled.
- sampleSigma: Binary Integer (1|0), specifies whether (1) or
not (0) the bottom level shape variables should be sampled.
- samplek: Binary Integer (1|0), specifies whether (1) or
not (0) the bottom level membership variables should be sampled.
- samplep: Binary Integer (1|0), specifies whether (1) or
not (0) the bottom level component weights should be sampled.
- samplealpha: Binary Integer (1|0), specifies whether (1)
or not (0) the bottom level DP scale parameters should be sampled.
Initial Values Section (see the Outputs page for the proper formatting of
these files)
- Alpha0file: String, file containing an initial value for
the top level DP scale parameter.
- Mfile: String, file containing initial values for the top
level location parameters.
- Phifile: String, file containing initial values for the
top level shape parameters.
- Wfile: String, file containing initial values for the top
level membership variables.
- Qfile: String, file containing initial values for the top
level component weights.
- qVfile: String, file containing initial values for the
top level stick breaking parameters.
- Alphafile: String, file containing initial values for the
bottom level DP scale parameters.
- Mufile: String, file containing initial values for the
bottom level component location parameters.
- Sigmafile: String, file containing initial values for the
bottom level component shape parameters.
- Kfile: String, file containing initial values for the
bottom level component membership variables.
- Pfile: String, file containing initial values for the
bottom level component weights.
- pVfile: String, file containing initial values for the
bottom level stick breaking parameters.