Model-based subspace clustering

mbsc - R functions for model-based subspace clustering

mbsc is a set of R functions that fit a multivariate Dirichlet-process mixture model to identify clusterings of cases that are defined by subsets of the attributes, as described in Hoff (2004).
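The partition prior induced by a Dirichlet process can be described by the Chinese restaurant process. Each new case joins an existing cluster with probability proportional to that cluster's size, or it starts a new cluster with probability proportional to a concentration parameter alpha. The sketch below is a generic illustration using a hypothetical helper `rcrp`, not code from mbsc.r. It draws cluster labels from that prior, which may help build intuition for how alpha relates to the number of clusters K reported in the output. It assumes that mbsc's alpha plays the role of this concentration parameter.

```r
# Sketch (hypothetical helper, not part of mbsc.r): draw cluster labels for
# n cases from a Chinese restaurant process with concentration alpha.
rcrp <- function(n, alpha) {
  groups <- integer(n)
  groups[1] <- 1L                            # first case opens cluster 1
  if (n > 1) {
    for (i in 2:n) {
      counts <- tabulate(groups[1:(i - 1)])  # sizes of the existing clusters
      probs <- c(counts, alpha) / (i - 1 + alpha)
      # pick an existing cluster, or the new cluster (label K + 1)
      groups[i] <- sample.int(length(probs), 1, prob = probs)
    }
  }
  groups
}

set.seed(1)
g <- rcrp(100, alpha = 1)
table(g)           # cluster sizes
length(unique(g))  # number of clusters K in this draw
```

Larger values of alpha tend to produce more, smaller clusters; rerunning with, say, `alpha = 5` illustrates this.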

Usage: mbsc(Y, ...)
Data argument: Y is a numeric matrix in which each row holds the attribute measurements for a single case. Missing data are allowed, provided they are missing at random.
Optional arguments (with defaults):
- groups=1:dim(Y)[1] : the starting value of the cluster membership function (by default, each case starts in its own cluster).
- ps2=cbind(rep(1/2,dim(Y)[2]),apply(Y,2,var)/2) : parameters of the inverse-gamma prior for the error variances, one row per attribute.
- pmu=cbind(apply(Y,2,mean),apply(Y,2,var)) : parameters of the normal prior for the baseline means, one row per attribute (by default, the sample mean and variance of each column of Y).
- pp0=matrix(c(1,1,1,1,1,1),byrow=T,ncol=2,nrow=3) : parameters of the priors for p(s=1), 1/t2s and alpha/(alpha+1), one row per quantity.
- nscan=10000 : number of scans of the Markov chain.
- verb=T : whether to print output as the chain progresses.
- odens=round(nscan/1000) : how often (in scans) to save output; the default keeps about 1000 saved scans.
- seed=1 : random seed.
- nsm=5 : number of split-merge proposals to make per scan (this can be zero). Split-merge proposals are made using an approach similar to, but faster than, those of Jain and Neal (2004) and Dahl (2003). If the posterior over clusterings for your data is roughly unimodal, this can be set to zero; if the chain has trouble mixing between modes, it should be nonzero.
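To see what the default prior settings look like for a given data set, the default expressions above can be evaluated directly in base R. The sketch below does this on a small simulated matrix (toy data, not from the source). It shows, for example, that ps2 and pmu have one row per attribute and that odens keeps every 10th scan when nscan=10000. The variable names simply mirror the argument names.

```r
# Toy data: 20 cases (rows) by 3 attributes (columns), no missing values.
set.seed(1)
Y <- matrix(rnorm(60), nrow = 20, ncol = 3)

# Default argument values, evaluated exactly as written in the argument list.
groups <- 1:dim(Y)[1]                                    # each case in its own cluster
ps2 <- cbind(rep(1/2, dim(Y)[2]), apply(Y, 2, var) / 2)  # inverse-gamma prior parameters
pmu <- cbind(apply(Y, 2, mean), apply(Y, 2, var))        # normal prior parameters
pp0 <- matrix(c(1, 1, 1, 1, 1, 1), byrow = TRUE, ncol = 2, nrow = 3)
nscan <- 10000
odens <- round(nscan / 1000)  # 10: save every 10th scan, i.e. 1000 saved scans

dim(ps2)  # 3 x 2: one row per attribute
dim(pmu)  # 3 x 2: one row per attribute

# If Y has missing values, apply(Y, 2, var) returns NA for the affected
# columns. One option (an assumption; how mbsc.r handles this is not
# documented here) is to pass explicit priors computed with na.rm = TRUE:
# ps2 <- cbind(rep(1/2, ncol(Y)), apply(Y, 2, var, na.rm = TRUE) / 2)
# pmu <- cbind(apply(Y, 2, mean, na.rm = TRUE), apply(Y, 2, var, na.rm = TRUE))
```

With mbsc.r sourced, a call that overrides some of these defaults would look like `mbsc(Y, nscan = 20000, odens = 20, verb = F)`.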
Output: a list with the following components:
- OUT : values of scan number (ns), log-posterior (lpp), maximum log-posterior so far (lpp.max), t2s, theta, mean(s2), alpha, and K, saved every odens scans.
- GROUPS : posterior samples of the group membership function, saved every odens scans.
- MAPE : (maximum a posteriori estimate) a list giving the model parameters at the scan with the highest value of lpp.
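A common way to summarize posterior samples of cluster memberships is a co-clustering matrix: for each pair of cases, the fraction of saved scans in which they share a cluster. The sketch below assumes that GROUPS is a matrix with one row per saved scan and one column per case (an assumption about its layout). It uses a small made-up matrix in place of real mbsc output, and the helper `coclust` is hypothetical.

```r
# Made-up stand-in for the GROUPS output: 4 saved scans (rows) x 5 cases (columns).
GROUPS <- rbind(c(1, 1, 2, 2, 3),
                c(1, 1, 2, 2, 2),
                c(1, 1, 1, 2, 2),
                c(1, 1, 2, 2, 3))

# Hypothetical helper: entry [i, j] is the fraction of saved scans in which
# cases i and j have the same group label.
coclust <- function(G) {
  n <- ncol(G)
  P <- matrix(0, n, n)
  for (s in seq_len(nrow(G))) {
    P <- P + outer(G[s, ], G[s, ], "==")  # 1 where labels match in scan s
  }
  P / nrow(G)
}

P <- coclust(GROUPS)
round(P, 2)
```

Here cases 1 and 2 share a cluster in every scan (P[1, 2] is 1), while cases 3 and 4 share one in three of the four scans (P[3, 4] is 0.75). Because only matches between labels are counted, the summary does not depend on how clusters happen to be numbered in each scan.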

Installation:

Download the text files mbsc.r and mbsc.c.
Compile the C code with the shell command
R CMD SHLIB mbsc.c
This should create the file mbsc.so.
Start an R session in the directory containing mbsc.r and mbsc.so, and type
source("mbsc.r")

Running the MCMC can take a long time, so you may want to run it in batch mode (for example, with R CMD BATCH).

Feedback: Let me know if you use this package, have suggestions, or encounter bugs. The more feedback I get, the more I will feel compelled to improve the software.

email: hoff@stat.washington.edu