MetageneCreator

Adrian Dobra, Quanli Wang and Mike West

adobra@stat.duke.edu

 

NOTE: IF YOU HAVE DOWNLOADED THE SOFTWARE BEFORE August 15, 2004, PLEASE DOWNLOAD IT AGAIN AND READ THE INSTRUCTIONS BELOW.

 

Overview

 

MetageneCreator identifies overlapping clusters of genes and generates the meta-genes associated with these clusters in arbitrarily large datasets.
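
As noted under Input Parameters, the meta-gene of a cluster is its first principal component, and a cluster is kept only if that component explains enough of the variation within the group. The following is a minimal MATLAB sketch of that idea on simulated data. It is not the MetageneCreator code: the toy data, the variable names and the threshold value are assumptions made for illustration, and the quantile normalization applied by the package is omitted.

```matlab
% Sketch only: illustrates "meta-gene = first principal component" on toy data.
rng(1);                                     % reproducible simulated data
nSamples = 50;
signal = randn(nSamples, 1);                % shared expression pattern
% One toy cluster of 4 genes: rows are samples, columns are variables.
X = signal * [1.0 0.9 0.8 1.1] + 0.3 * randn(nSamples, 4);

Xc = X - repmat(mean(X, 1), nSamples, 1);   % center each gene (column)
[U, S, ~] = svd(Xc, 'econ');                % principal components via the SVD
s2 = diag(S) .^ 2;                          % variation carried by each component
pvExplained = 100 * s2(1) / sum(s2);        % percent explained by the first component

metagene = U(:, 1) * S(1, 1);               % first principal component, one score per sample
metagene = (metagene - mean(metagene)) / std(metagene);   % sample mean 0, sample variance 1

minpvexplained = 70;                        % hypothetical threshold, in percent
keepCluster = pvExplained >= minpvexplained;
```

In this sketch a tightly co-expressed group yields a large pvExplained and would be kept, while a loose group would fall below the threshold.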

 

Download

 

HERE you can download MetageneCreator. To run this package, you need MATLAB and the Statistics Toolbox.

 

Input Parameters

 

After you unzip the package, you need to edit the file “parameters.m”. You have to specify the following information:

 

·        covfiles: The directory where the covariance models generated with HdBCS are located. MetageneCreator can run without the covariance models, but the resulting clusters will not overlap and are likely to be smaller. HERE you can download an example.

·        workdirectory: The directory where your dataset is located and in which the output files are saved.

·        datafile: The name of your gene expression dataset. Rows correspond to samples and columns correspond to variables. It is assumed that the expression levels are on a log2 scale.

·        annotationfile: This file gives a short description of each probe from the datafile. The description of the variable from column k in datafile is found on row k in annotationfile. The ID of this variable (probe) is k.

·        resultsfile: The file where your clusters are saved. The first column gives the ID of each probe. The second column gives the identifier of the group each probe belongs to. The third column gives the description of each probe, if you have supplied one through the parameter annotationfile.

·        metavarsfile: The name of the file in which the meta-genes are saved. As before, rows correspond to samples and columns correspond to meta-genes. The meta-genes are quantile-normalized and scaled (sample mean equal to zero and sample variance equal to one).

·        metavarslabelfile: The file with the label attached to each meta-gene. If a group has at least two variables, its name starts with “M”. Otherwise its name starts with “V”.

·        maxgroupsize: The maximum number of variables to be processed at each iteration. Set this number as large as the capabilities of your computer allow.

·        maxclustersize: The maximum size allowed for a cluster.

·        minpvexplained: A cluster is not saved unless the first principal component (the meta-gene) explains at least minpvexplained percent of the variation within the group. A higher value of this parameter increases the number of clusters produced.
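
As a rough illustration, the settings above might be written in “parameters.m” along the following lines. This is a hypothetical sketch: every path, file name and numeric value is a placeholder, and the exact syntax expected by the “parameters.m” shipped with the package may differ, so follow the layout of that file where it disagrees.

```matlab
% parameters.m -- hypothetical example; all paths, names and values are placeholders.
covfiles          = '/home/user/hdbcs/models/';  % directory with the HdBCS covariance models
workdirectory     = '/home/user/mydata/';        % dataset location and output directory
datafile          = 'expression.txt';            % samples in rows, variables in columns, log2 scale
annotationfile    = 'annotation.txt';            % row k describes the variable in column k of datafile
resultsfile       = 'clusters.txt';              % output: probe ID, group identifier, description
metavarsfile      = 'metagenes.txt';             % output: samples in rows, meta-genes in columns
metavarslabelfile = 'metagenelabels.txt';        % output: labels starting with "M" or "V"
maxgroupsize      = 1000;                        % variables processed per iteration (placeholder)
maxclustersize    = 50;                          % maximum cluster size (placeholder)
minpvexplained    = 70;                          % percent explained by the first principal component (placeholder)
```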

 

Running MetageneCreator

 

You can either type “metagenecreator” in a MATLAB session, after setting the current directory to the unzipped package, or run the program in batch mode with the following command:

 

matlab < metagenecreator.m > mylogfile.log &
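
For the interactive route, a session could look like the sketch below. The directory path is hypothetical, and the sketch assumes that “metagenecreator.m” and the edited “parameters.m” sit in the same unzipped folder.

```matlab
% Hypothetical interactive session; replace the path with your own.
cd('/home/user/MetageneCreator');   % folder assumed to hold metagenecreator.m and parameters.m
metagenecreator                     % runs the script by name
```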

 

References

       

·        Dobra, A., Wang, Q. and West, M. (2004). Graphical model-based gene clustering and metagene expression analysis. Manuscript submitted to Bioinformatics.