Running PROPA requires the following input files:
- parameters.txt - parameter file containing
all the information about the data, prior specification, etc. Before
running PROPA a user must create this file in a predefined format as
described in detail below.
- geneid.txt - full gene list file
containing Gene IDs of unique genes from a gene expression data set.
There should be no duplicate Gene IDs in this list.
- p.txt - data file containing
association probabilities. This file is a numerical matrix whose rows and
columns represent genes and experimental factors/phenotypes,
respectively. The value at row i and column j is the probability of
association between gene i and factor j. The genes (rows) should be in
the same order as in geneid.txt, and the number of rows must equal the
number of Gene IDs in geneid.txt. Association probabilities can be estimated through a
regression analysis with BFRM.
- geneset.txt - reference gene sets file.
Each line is one gene set composed of the Gene IDs of member genes.
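The consistency rules above can be checked before running PROPA. The following is a minimal sketch, not part of PROPA itself: the function names are hypothetical, and it assumes that geneid.txt holds one Gene ID per line and that p.txt is tab-separated.

```python
from pathlib import Path


def read_gene_ids(path):
    """Return the non-empty lines of the gene list file (assumed: one Gene ID per line)."""
    return [line.strip() for line in Path(path).read_text().splitlines() if line.strip()]


def check_inputs(geneid_path, p_path):
    """Check the rules stated above and return (number of genes, number of factors)."""
    gene_ids = read_gene_ids(geneid_path)
    if len(set(gene_ids)) != len(gene_ids):
        raise ValueError("duplicate Gene IDs in " + str(geneid_path))

    rows = [line.split("\t") for line in Path(p_path).read_text().splitlines() if line.strip()]
    if not rows:
        raise ValueError("the probability file is empty")
    if len(rows) != len(gene_ids):
        raise ValueError("row count of the probability file does not match the gene list")

    n_factors = len(rows[0])
    for i, row in enumerate(rows, start=1):
        if len(row) != n_factors:
            raise ValueError(f"row {i} has {len(row)} columns, expected {n_factors}")
        for field in row:
            prob = float(field)  # raises ValueError on non-numeric or empty fields
            if not 0.0 <= prob <= 1.0:
                raise ValueError(f"row {i}: {field!r} is not a probability")
    return len(gene_ids), n_factors
```

Calling check_inputs("geneid.txt", "p.txt") either raises a ValueError describing the first problem found or returns the two counts, the second of which is the value expected for NFactors.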
With the input files in place, a user can run PROPA
- from a command window:
- on Windows: propa.exe parameters.txt
- on Unix: ./propa parameters.txt
- in Matlab: !propa.exe parameters.txt
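For scripted analyses, the same command can be launched from another program. The sketch below uses Python's standard subprocess module and assumes the PROPA executable sits in the current working directory; the helper names propa_command and run_propa are hypothetical.

```python
import subprocess
import sys


def propa_command(parameter_file, windows=None):
    """Build the command line shown above for the current (or given) platform."""
    if windows is None:
        windows = sys.platform.startswith("win")
    executable = "propa.exe" if windows else "./propa"
    return [executable, parameter_file]


def run_propa(parameter_file):
    """Run PROPA and raise CalledProcessError if it exits with a non-zero status."""
    return subprocess.run(propa_command(parameter_file), check=True)
```

For example, run_propa("parameters.txt") is equivalent to typing the command in a command window.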
Generating a parameter file:
How to generate a default parameter file?
A default parameter file, default.parameters.txt, can be used as a template. The examples on this site show parameter settings for three selected data analyses.
What is the parameter file format?
- The parameter file is a text file that can be created with any text editor, with each line representing a parameter name/value pair. Only predefined names are accepted; the program will report an error and quit if it finds an unknown parameter name. Empty lines are accepted and ignored. Any line starting with # is treated as a comment and ignored.
- At each line the name/value pair takes the format
Parameter name = value
The value can either be a number (integer/double) or a character string.
- Parameter names are not case sensitive, and white space is allowed within a name if that is more convenient to the user. As an example, the following parameter names are valid and represent the same information:
GeneSetFile
Gene set file
- The parameters are not ordered and any order convenient to the user will be accepted. If a parameter name appears more than once, the last appearance will be used.
- There are default values for all parameters used by PROPA. If a parameter name is not specified in the parameter file, the default value will automatically be used.
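The format rules above can be summarized in a short parser. This is an illustrative sketch of the rules as described, not PROPA's own parsing code. The names parse_parameters, normalize and convert are hypothetical. The numeric defaults are the ones documented in the parameter list; the defaults for the file and count parameters are not stated here, so they appear as None placeholders.

```python
# Parameter names in normalized form (lower case, no white space).
# None marks a default that this page does not state.
KNOWN_DEFAULTS = {
    "pfile": None, "geneidfile": None, "nfactors": None,
    "genesetfile": None, "ngenesets": None,
    "ra": 0.7, "phia": 8.0, "rb": 0.01, "phib": 3.0,
    "nburnin": 200, "niter": 5000,
}


def normalize(name):
    """Lower-case a parameter name and drop all white space inside it."""
    return "".join(name.split()).lower()


def convert(value):
    """Read a value as an integer, then as a double, else keep it as a string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_parameters(text):
    """Apply the rules above: skip blanks and comments, reject unknown names, last value wins."""
    params = dict(KNOWN_DEFAULTS)
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {number}: expected 'name = value'")
        key = normalize(name)
        if key not in params:
            raise ValueError(f"line {number}: unknown parameter {name.strip()!r}")
        params[key] = convert(value.strip())  # a later line overwrites an earlier one
    return params
```

Because names are normalized before lookup, "GeneSetFile", "genesetfile" and "Gene set file" all update the same entry, and any parameter that does not appear keeps its default.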
What are the parameters?
The parameters defined in the parameter file specify the data settings, the prior information, and the MCMC settings. The default parameter file is organized into "sections" as follows:
- Data section
PFile: String. The name of the file that contains the association probabilities. This data file must be a flat text file in which:
- each line represents a unique gene (with a unique Gene ID) and each column represents a factor to be annotated,
- factors within a line are separated by tabs,
- all fields are numeric, with no string values of any kind,
- there are no missing values.
GeneIdFile: String. The name of the file that contains Gene IDs of the genes in the data set specified by PFile.
NFactors: Integer. The number of experimental/phenotype factors to be analyzed. The value should match the number of columns in the data file.
GeneSetFile: String. The name of the file that contains the reference gene sets. This file must be a flat text file in which:
- each line represents a gene set specified by the index of genes as listed in PFile and NVariables (integers),
- entries within a line are separated by tabs,
- all fields are integers, with no string values of any kind.
NGeneSets: Integer. The number of pathway gene sets to be used in the analysis. The value should match the number of lines in the gene set file specified by GeneSetFile.
- Prior section
rA, PhiA: Double. Hyper-parameter values of the beta(rA, PhiA) prior on the pathway membership probability for genes in a reference gene set. The default values are (0.7, 8).
rB, PhiB: Double. Hyper-parameter values of the beta(rB, PhiB) prior on the pathway membership probability for genes not in a reference gene set. The default values are (0.01, 3).
- MCMC section
NBurnin: Integer. The number of burn-in iterations in the MCMC. Default 200.
NIter: Integer. The number of MCMC iterations. Default 5000.
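Putting the three sections together, a parameter file can be produced from the data files themselves, so that NFactors and NGeneSets always match their files. The sketch below is one possible way to do this, not a tool shipped with PROPA: the function names are hypothetical, the file names are the ones used on this page, and the prior and MCMC values are the documented defaults.

```python
def count_columns(data_text):
    """Number of tab-separated columns in the first non-empty line."""
    for line in data_text.splitlines():
        if line.strip():
            return len(line.split("\t"))
    raise ValueError("data file is empty")


def count_lines(text):
    """Number of non-empty lines, i.e. one per gene set."""
    return sum(1 for line in text.splitlines() if line.strip())


def format_parameters(p_text, geneset_text):
    """Return the text of a parameter file for the given data and gene set contents."""
    settings = [
        # Data section
        ("PFile", "p.txt"),
        ("GeneIdFile", "geneid.txt"),
        ("NFactors", count_columns(p_text)),
        ("GeneSetFile", "geneset.txt"),
        ("NGeneSets", count_lines(geneset_text)),
        # Prior section (documented defaults)
        ("rA", 0.7), ("PhiA", 8),
        ("rB", 0.01), ("PhiB", 3),
        # MCMC section (documented defaults)
        ("NBurnin", 200), ("NIter", 5000),
    ]
    lines = ["# PROPA parameter file"]
    lines += [f"{name} = {value}" for name, value in settings]
    return "\n".join(lines) + "\n"
```

The returned text can be saved as parameters.txt and passed to PROPA on the command line.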
by: Haige Shen, Quanli Wang & Mike West