Adrian Dobra, Quanli Wang and Mike West
NOTE: IF YOU HAVE DOWNLOADED THE SOFTWARE BEFORE August 15, 2004, PLEASE DOWNLOAD IT AGAIN AND READ THE INSTRUCTIONS BELOW.
MetageneCreator identifies overlapping clusters of genes and generates the meta-genes associated with these clusters in arbitrarily large datasets.
After unzipping the package, edit the file “parameters.m” to specify the following parameters:
· “covfiles” The directory containing the covariance models generated with HdBCS. MetageneCreator can run without the covariance models, but the resulting clusters will not overlap and are likely to be smaller. An example is available for download.
· “workdirectory” This is the directory where your dataset is located and the directory in which the output files are saved.
· “datafile” The name of your gene expression dataset. Rows correspond to samples and columns correspond to variables. It is assumed that the expression levels are on a log2 scale.
· “annotationfile” A file giving a short description of each probe in “datafile”. Row k of “annotationfile” describes the variable in column k of “datafile”, and that variable (probe) is assigned ID k.
· “resultsfile” The file in which your clusters are saved. The first column gives the ID of each probe, the second column gives the identifier of the group the probe belongs to, and the third column gives the description of the probe if an “annotationfile” was specified.
· “metavarsfile” The name of the file in which the meta-genes are saved. As in “datafile”, rows correspond to samples; columns correspond to meta-genes. The meta-genes are quantile-normalized and scaled to sample mean zero and sample variance one.
· “metavarslabelfile” The name of the file containing the label attached to each meta-gene. Labels of groups with at least two variables start with “M”; labels of single-variable groups start with “V”.
· “maxgroupsize” The maximum number of variables processed at each iteration. Set this as large as the memory and speed of your computer allow.
· “maxclustersize” The maximum size allowed for a cluster.
· “minpvexplained” A cluster is saved only if its first principal component (that is, its meta-gene) explains at least “minpvexplained” percent of the variation within the group. A higher value of this parameter increases the number of clusters produced.
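The parameters “minpvexplained” and “metavarsfile” refer to two computations on each group of genes: the percentage of within-group variation captured by the first principal component, and the scaling of the resulting meta-gene to sample mean zero and sample variance one. The Python sketch below illustrates both under stated assumptions; it is not MetageneCreator's own MATLAB code, and all function names are hypothetical. It assumes the principal component is taken from the sample covariance matrix of the group (found here by power iteration, chosen only for brevity), and it omits the quantile-normalization step because its exact form is not described in these instructions.

```python
from statistics import mean, stdev

# Hypothetical helpers illustrating "minpvexplained" and the meta-gene
# scaling; not taken from MetageneCreator. X is a list of rows:
# rows are samples, columns are the genes of one group (as in "datafile").

def covariance_matrix(X):
    """Sample covariance (n - 1 denominator) between the columns of X."""
    n, p = len(X), len(X[0])
    means = [mean(row[j] for row in X) for j in range(p)]
    C = [[0.0] * p for _ in range(p)]
    for a in range(p):
        for b in range(a, p):
            s = sum((row[a] - means[a]) * (row[b] - means[b]) for row in X)
            C[a][b] = C[b][a] = s / (n - 1)
    return C

def first_pc(C, iters=500):
    """Leading eigenvalue and unit eigenvector of a covariance matrix,
    found by power iteration (an assumed method, not the package's)."""
    p = len(C)
    v = [float(i + 1) for i in range(p)]  # asymmetric start vector
    norm = sum(x * x for x in v) ** 0.5
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:  # no variation along v: nothing to explain
            return 0.0, v
        v = [x / norm for x in w]
    # Rayleigh quotient v' C v gives the eigenvalue for a unit vector v
    lam = sum(v[i] * sum(C[i][j] * v[j] for j in range(p)) for i in range(p))
    return lam, v

def percent_explained(X):
    """Percentage of total within-group variance captured by the first PC."""
    C = covariance_matrix(X)
    total = sum(C[i][i] for i in range(len(C)))
    if total == 0.0:
        return 0.0
    lam, _ = first_pc(C)
    return 100.0 * lam / total

def metagene_scores(X):
    """Scores of the samples on the first PC (the meta-gene, up to sign)."""
    _, v = first_pc(covariance_matrix(X))
    p = len(v)
    means = [mean(row[j] for row in X) for j in range(p)]
    return [sum((row[j] - means[j]) * v[j] for j in range(p)) for row in X]

def standardize(values):
    """Rescale to sample mean 0 and sample variance 1 (n - 1 denominator)."""
    m, s = mean(values), stdev(values)
    return [(x - m) / s for x in values]
```

In this sketch a group would be kept only when percent_explained(X) is at least “minpvexplained”, and the saved meta-gene would be standardize(metagene_scores(X)), applied after the omitted quantile-normalization step. Because the sign of a principal component is arbitrary, a meta-gene computed this way may come out negated relative to another implementation.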
To run the program, either type “metagenecreator” in a MATLAB session after setting the current directory to the directory containing the package, or run it in batch mode with the following command:
matlab < metagenecreator.m > mylogfile.log &
· Dobra, A., Wang, Q. and West, M. (2004). Graphical model-based gene clustering and metagene expression analysis. Manuscript submitted to Bioinformatics.