Outputs:
The CDP software creates a number of
output files in the directory in which it is run. These are the
following:
Sampled Mixture Model Parameters
One important feature of the model is the posterior predictive
distribution on top-level components. This distribution is
essentially a mixture of T distributions dependent on the sampled
values of m, Phi, and q (see
discussion paper for more details).
- postm.txt: posterior samples of the top-level location variables.
For each of the iter MCMC iterations, all J cluster
location parameters (i.e. all m_j) are printed. Thus the first
J rows contain m_1 ... m_J for iteration 1, the next
J rows contain m_1 ... m_J for iteration 2, and so on.
In R,
you can load these values into an iter x J x D
dimensional array using
m = aperm(array(scan("postm.txt"),c(D,J,iter)),3:1).
- postPhi.txt: posterior samples of the top-level shape
variables (i.e. Phi_j). Each row consists of a D x D
matrix printed out in row major order. The first J rows
contain Phi_1 ... Phi_J for iteration 1, the next J rows
contain Phi_1 ... Phi_J for iteration 2, and so on.
In R, you can
load these values into an iter x J x D x
D dimensional array using
Phi = aperm(array(scan("postPhi.txt"),c(D,D,J,iter)),4:1)
- postq.txt: posterior samples of the top-level weights.
The first row consists of J weights (q_1, q_2, ..., q_J) for
iteration 1, and so on.
In R, you can load these values into an iter x J
dimensional array using
q = aperm(array(scan("postq.txt"),c(J,iter)),2:1)
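As a quick check of the layout described above, the following R sketch writes a tiny synthetic file in the postm.txt format and reloads it with the same aperm(array(scan(...))) idiom. The dimensions (iter = 2, J = 3, D = 2), the temporary file, and the encoded values are made up for illustration; only the loading expression comes from the text.

```r
# Hypothetical toy dimensions, not CDP defaults.
iter <- 2; J <- 3; D <- 2

# Build a fake file: one row per (iteration, cluster), D columns per row.
# Each entry encodes its own indices as 100*i + 10*j + d.
rows <- expand.grid(j = 1:J, i = 1:iter)   # j varies fastest, as in postm.txt
fake <- t(sapply(seq_len(nrow(rows)), function(r)
  100 * rows$i[r] + 10 * rows$j[r] + 1:D))
path <- tempfile(fileext = ".txt")
write.table(fake, path, row.names = FALSE, col.names = FALSE)

# Same idiom as in the text: scan() reads row by row, so D varies fastest.
m <- aperm(array(scan(path, quiet = TRUE), c(D, J, iter)), 3:1)

dim(m)      # iter x J x D
m[2, 3, ]   # m_3 at iteration 2
```

With this layout, m[i, j, ] is the location m_j sampled at iteration i. The same reasoning suggests that Phi[i, j, , ] is the D x D matrix Phi_j at iteration i and that q[i, j] is the weight q_j at iteration i.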
Another important feature of the model is the posterior predictive
distribution based on the bottom-level components. This distribution
is essentially a mixture of normal distributions, based on the sampled
values of mu, Sigma, and p (see discussion paper for more details).
- postmu.txt: posterior samples of the bottom-level
location variables. For each
iteration of MCMC, all J x T component location
parameters are printed. The first J x T
rows contain mu_1,1, ... mu_1,T, mu_2,1, ... mu_2,T,
... mu_J,1,...mu_J,T for the first iteration of MCMC and so on.
In R, you can load these values into an iter x
JT x D dimensional array using
mu = aperm(array(scan("postmu.txt"),c(D,J*T,iter)),3:1)
- postSigma.txt: posterior samples of the bottom-level
shape variables (i.e. all Sigma_j,t). Each row consists of a D x D
matrix printed out in row major order. The first J x
T rows contain Sigma_1,1, ...Sigma_1,T, Sigma_2,1,
... Sigma_2,T, ... Sigma_J,1, ... Sigma_J,T from the first iteration
of MCMC, and so on.
In R, you can load these values into an iter x
JT x D x D dimensional array using
Sigma = aperm(array(scan("postSigma.txt"),c(D,D,J*T,iter)),4:1)
- postp.txt: posterior samples of the bottom-level
component weights. Each row consists of the T weights
associated with a particular top-level component. The first
J rows contain p_1,1 ... p_1,T, p_2,1 ... p_2,T,
... p_J,1 ... p_J,T for the first iteration of MCMC, and so on.
Note: these weights have been scaled so that p_1,1 + p_1,2 +
... + p_J,T = 1. In other words, the values printed in this file are
actually p_j,t' where p_j,t' = p_j,t * q_j. This was done to
facilitate the interpretation of the predictive distribution at
a specific iteration of MCMC as a J x T component
mixture of normal distributions.
In R, you can load these values into an iter x
JT dimensional array using
p = aperm(array(scan("postp.txt"),c(J*T,iter)),2:1)
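Because the rescaled weights in postp.txt sum to one within each iteration, the bottom-level predictive density at a point x can be approximated by averaging the J x T component normal mixture over the saved iterations. The R sketch below assumes mu, Sigma, and p have the array shapes described above and that each Sigma_j,t is a covariance matrix (an assumption; check the discussion paper). The helper names dmvn and predictive_density are hypothetical, the toy inputs are made up, and no burn-in or thinning is applied.

```r
# Multivariate normal density in base R (Sigma assumed to be a covariance matrix).
dmvn <- function(x, mean, Sigma) {
  D <- length(x)
  R <- chol(Sigma)                               # Sigma = t(R) %*% R
  z <- backsolve(R, x - mean, transpose = TRUE)  # solves t(R) %*% z = x - mean
  exp(-0.5 * sum(z^2) - sum(log(diag(R))) - 0.5 * D * log(2 * pi))
}

# Average over iterations of the J*T component mixture density at x.
# mu: iter x JT x D, Sigma: iter x JT x D x D, p: iter x JT (rescaled weights).
predictive_density <- function(x, mu, Sigma, p) {
  iter <- dim(mu)[1]; K <- dim(mu)[2]; D <- dim(mu)[3]
  dens <- numeric(iter)
  for (i in seq_len(iter)) {
    for (h in seq_len(K)) {
      S <- matrix(Sigma[i, h, , ], D, D)
      dens[i] <- dens[i] + p[i, h] * dmvn(x, mu[i, h, ], S)
    }
  }
  mean(dens)
}

# Toy input: one iteration, two standard-normal components in D = 2.
mu <- array(0, c(1, 2, 2))
Sigma <- array(0, c(1, 2, 2, 2))
Sigma[1, 1, , ] <- diag(2)
Sigma[1, 2, , ] <- diag(2)
p <- matrix(0.5, 1, 2)
val <- predictive_density(c(0, 0), mu, Sigma, p)
```

For real output, the same call would take the arrays loaded from postmu.txt, postSigma.txt, and postp.txt; evaluating it over a grid of x values gives a density surface.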
Other useful summaries of the model are the observation-specific
mean cluster locations and the observation-specific mean component
locations. These may be regarded as compressed representations of the
original data (see discussion paper for more details).
- postxmbar.txt: Mean cluster location for each
observation. Each row i contains the mean value of m_{w_i} for
observation i, averaged over the course of the MCMC.
In R, you can load these values into an N x D
dimensional matrix using
xm = as.matrix(read.table("postxmbar.txt"))
- postxmubar.txt: Mean component location for each
observation. Each row i contains the mean value of mu_{w_i,k_i} for
observation i, averaged over the course of the MCMC.
In R, you can load these values into an N x D
dimensional matrix using
xmu = as.matrix(read.table("postxmubar.txt"))
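One simple way to compare the two summaries is the per-observation distance between the mean component location and the mean cluster location. The R sketch below uses two made-up temporary files in place of postxmbar.txt and postxmubar.txt (N = 3, D = 2); the variable name offset and the distance summary are illustrative choices, not something CDP itself computes.

```r
# Toy stand-ins for postxmbar.txt and postxmubar.txt; values are made up.
f_m <- tempfile(); f_mu <- tempfile()
write.table(rbind(c(0, 0), c(0, 0), c(5, 5)), f_m,
            row.names = FALSE, col.names = FALSE)
write.table(rbind(c(3, 4), c(0, 1), c(5, 5)), f_mu,
            row.names = FALSE, col.names = FALSE)

# read.table() returns a data frame; as.matrix() gives a numeric N x D matrix.
xm  <- unname(as.matrix(read.table(f_m)))
xmu <- unname(as.matrix(read.table(f_mu)))

# Row-wise Euclidean distance between the two summaries.
offset <- sqrt(rowSums((xmu - xm)^2))
```

Converting to a matrix keeps later arithmetic and plotting calls (for example plot(xm) when D = 2) straightforward.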
Last Parameter Values
The following files contain the last sampled values of all model
parameters. The format of these files is the same as that expected
when specifying initial values of the MCMC as described on the Inputs page.
- lastm.txt: J x D matrix, with row j
representing m_j.
- lastPhi.txt: J x (D*D) matrix, with row j
representing Phi_j in row-major order.
- lastq.txt: A single row of J values,
representing q_1 ... q_J.
- lastqV.txt: A single row of J values,
representing the draws from the Beta distribution from which the weights
q are derived.
- lastw.txt: A single column of N values w_1 ... w_N,
indicating the association of an observation i with a top-level mixture
component w_i (i.e. if w_i = 2, observation i is assumed to be a
realization from g_2).
- lastmu.txt: (J*T) x D dimensional
matrix, with row T*(j-1)+t representing mu_j,t.
- lastSigma.txt: (J*T) x (D*D)
dimensional matrix, with row T*(j-1)+t representing
Sigma_j,t.
- lastk.txt: A single column of N values k_1
... k_N, indicating the association of an observation i with mixture
component k_i in cluster w_i.
- lastp.txt: A J x T matrix, with row j
containing the T component weights for top-level mixture j
(i.e. p_j,1, ... p_j,T).
Note: Unlike the values printed in postp.txt, these values
are not preprocessed, so for each j, p_j,1 + p_j,2 + ... + p_j,T = 1.
- lastpV.txt: A J x T matrix, with row j
containing the T draws from the Beta distribution from which
the weights p_j,1 ... p_j,T are derived.
- lastalpha.txt: A single row of J values,
representing the cluster-specific DP scale parameters alpha_1 ... alpha_J.
- lastalpha0.txt: A single value representing the top level
DP scale parameter alpha_0.
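The row-indexing and row-major conventions above can be sketched in R as follows. The toy matrices stand in for the contents of lastSigma.txt, lastp.txt, and lastq.txt (which could presumably be read with as.matrix(read.table(...))); the dimensions, the values, and the helper name get_Sigma are hypothetical.

```r
# Toy dimensions and values, made up for illustration.
# Note: naming a variable T follows the text but masks R's shorthand for TRUE,
# so logical values are spelled out below.
J <- 2; T <- 3; D <- 2
lastSigma <- matrix(seq_len(J * T * D * D), nrow = J * T, byrow = TRUE)
lastp <- rbind(c(0.2, 0.3, 0.5), c(0.6, 0.3, 0.1))  # each row sums to 1
lastq <- c(0.25, 0.75)

# Sigma_{j,t} sits in row T*(j-1)+t and is stored in row-major order.
get_Sigma <- function(lastSigma, j, t, T, D) {
  matrix(lastSigma[T * (j - 1) + t, ], D, D, byrow = TRUE)
}
S <- get_Sigma(lastSigma, j = 2, t = 1, T = T, D = D)

# Rescale to the postp.txt convention p'_{j,t} = p_{j,t} * q_j. Multiplying a
# J x T matrix by a length-J vector recycles q down the rows, and
# as.vector(t(.)) flattens in the order p'_1,1 ... p'_1,T, p'_2,1 ... p'_J,T.
p_scaled <- as.vector(t(lastp * lastq))
```

The same row index T*(j-1)+t applies to lastmu.txt, where each row is simply the D-vector mu_j,t.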