mbsc - R functions for model-based subspace clustering
mbsc is a set of R functions
that fit a multivariate Dirichlet-process mixture model
to identify clusterings based on subsets of attributes,
as described in Hoff (2004).
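The Dirichlet-process prior on clusterings can be illustrated with the Chinese restaurant process. The R sketch below is not taken from mbsc.r, and the names rcrp and expected_K are hypothetical. Each new case joins an existing cluster with probability proportional to that cluster's size, or opens a new cluster with probability proportional to the concentration parameter alpha, so larger alpha favors more clusters. (The alpha reported in OUT is presumably this concentration parameter, and K presumably the number of clusters.)

```r
# Illustrative sketch (not from mbsc.r): draw a partition of n cases
# from a Chinese restaurant process with concentration parameter alpha.
rcrp <- function(n, alpha) {
  groups <- integer(n)
  groups[1] <- 1L
  if (n > 1) {
    for (i in 2:n) {
      # cluster sizes among the first i - 1 cases (labels are 1..K)
      counts <- tabulate(groups[1:(i - 1)])
      # join cluster k w.p. counts[k] / (i - 1 + alpha),
      # or open cluster K + 1 w.p. alpha / (i - 1 + alpha)
      probs <- c(counts, alpha) / (i - 1 + alpha)
      groups[i] <- sample.int(length(probs), 1, prob = probs)
    }
  }
  groups
}

# Prior expected number of clusters: sum over i of alpha / (alpha + i - 1)
expected_K <- function(n, alpha) sum(alpha / (alpha + seq_len(n) - 1))
```

For example, after set.seed(1), mean(replicate(2000, max(rcrp(50, 1)))) should be close to expected_K(50, 1), which is about 4.5.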
- Usage: mbsc(Y,...)
- Data argument: Y is a numeric matrix in which each row holds
the attribute measurements for a single case.
Data missing at random are allowed.
- Optional arguments (with defaults):
- groups=1:dim(Y)[1] : the starting value of the cluster membership
function (by default, each case starts in its own cluster).
- ps2=cbind(rep(1/2,dim(Y)[2]),apply(Y,2,var)/2) : parameters
in the inverse-gamma prior for the error variance.
- pmu=cbind(apply(Y,2,mean),apply(Y,2,var)) : parameters
in the normal prior for the baseline means.
- pp0=matrix(c(1,1,1,1,1,1),byrow=T,ncol=2,nrow=3) :
parameters of the priors for p(s=1), 1/t2s, and alpha/(alpha+1).
- nscan=10000 : number of scans of the Markov chain.
- verb=T : whether to print output as the chain progresses.
- odens=round(nscan/1000) : how often (in scans) to save output;
the default saves roughly 1000 samples.
- seed=1 : random seed.
- nsm=5 : number of split-merge proposals to make per scan
(may be zero). Split-merge proposals use an approach similar to,
but faster than, those of Jain and Neal (2004) and Dahl (2003).
If the posterior distribution of clusterings is roughly unimodal,
this can be set to zero; if the chain has trouble mixing between
modes, it should be nonzero.
- Output: A list with the following objects:
- OUT : values of the scan number (ns), log posterior (lpp),
maximum log posterior so far (lpp.max),
t2s, theta, mean(s2), alpha, and K, saved every
odens scans.
- GROUPS : posterior samples of the group membership function,
saved every odens scans.
- MAPE : maximum a posteriori estimate; a list giving the
model parameters at the scan with the largest value of lpp.
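The defaults above and the GROUPS output can be explored with a short R sketch. The function names default_settings, coclustering and n_clusters are hypothetical and not part of mbsc.r. default_settings() recomputes the documented defaults so they can be inspected or passed explicitly; it adds na.rm = TRUE (an assumption on our part, not necessarily what mbsc.r does) so the values stay finite when Y has missing entries. coclustering() builds a posterior similarity matrix from saved memberships, assuming (hypothetically) that GROUPS has one row per saved scan and one column per case.

```r
# Recompute the documented defaults of mbsc(); na.rm = TRUE is an
# assumption added here so the values are finite when Y contains NA.
default_settings <- function(Y, nscan = 10000) {
  v <- apply(Y, 2, var, na.rm = TRUE)
  m <- apply(Y, 2, mean, na.rm = TRUE)
  list(
    groups = 1:dim(Y)[1],                       # each case in its own cluster
    ps2    = cbind(rep(1/2, dim(Y)[2]), v / 2), # inverse-gamma prior, error variance
    pmu    = cbind(m, v),                       # normal prior, baseline means
    pp0    = matrix(1, nrow = 3, ncol = 2),     # priors for p(s=1), 1/t2s, alpha/(alpha+1)
    odens  = round(nscan / 1000)                # evaluates to 0 when nscan <= 500
  )
}

# Posterior co-clustering matrix: entry (i, j) is the fraction of saved
# scans in which cases i and j share a cluster.
# Assumes (hypothetically) one row of GROUPS per saved scan, one column per case.
coclustering <- function(GROUPS) {
  GROUPS <- as.matrix(GROUPS)
  n <- ncol(GROUPS)
  P <- matrix(0, n, n)
  for (s in seq_len(nrow(GROUPS))) {
    g <- GROUPS[s, ]
    P <- P + outer(g, g, "==")  # 1 where cases i and j share a label
  }
  P / nrow(GROUPS)
}

# Number of distinct clusters in each saved scan (same layout assumption).
n_clusters <- function(GROUPS) {
  apply(as.matrix(GROUPS), 1, function(g) length(unique(g)))
}
```

For example, after fit <- mbsc(Y), image(coclustering(fit$GROUPS)) would show how often each pair of cases was clustered together, provided GROUPS has the assumed layout. For short test runs (nscan <= 500), passing odens explicitly avoids the zero default.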
Installation:
- Download the text files
mbsc.r and mbsc.c
- Execute the shell command
R CMD SHLIB mbsc.c
This should create the file mbsc.so.
- Start an R session in the directory containing mbsc.r and mbsc.so
and type
source("mbsc.r")
Running the MCMC may take a long time, so you may want to
run it in batch mode (for example, with R CMD BATCH).
Feedback: Let me know if you use this package, have suggestions,
or encounter bugs. The more feedback I get, the more I will feel
compelled to improve the software.
email: hoff@stat.washington.edu