1. Preliminaries
Before running BFRM 2.0 the user must create a parameter file in the
predefined format. This is a text-based file containing all the
information about the model, data, prior specification, etc.
(described in detail later). With this parameter file in place, the
user can run BFRM 2.0 by typing at the command line:
- Bfrm parameter.txt
2. How to generate a default parameter file?
A default parameter file, default.parameters.txt, is provided to be
used as a template. The examples on this site cover three selected
data analyses.
3. What is the parameter file format?
- The parameter file is a text file that can be
generated by any text editor with each line
representing a parameter name/value pair. Only
predefined names are accepted and the program will
report an error and quit if any unknown
parameter/name is found. Empty lines are accepted and
ignored. Any line starting with “#” will
be treated as a comment and therefore ignored.
- On each line the name/value pair takes the format
- Parameter name = value
The value can be either a number (integer/double) or a character
string.
- Parameter names are not case sensitive and white space is allowed
within the name if that is more convenient for the user. As an
example, the following parameter names are valid and represent the
same information:
- ResponseMaskFile
- response mask file
- responsemaskfile
- The parameters are
not ordered and any order convenient to the user will
be accepted. If a parameter name appears more than
once, the last appearance will be used.
- There are default
values for all parameters used by BFRM 2.0. If a
parameter name is not specified in the parameter
file, the default value will automatically be
used.
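The format rules above can be sketched as a small Python reader. This is only an illustration of the file format, not BFRM's own parser: the function names `normalize` and `parse_parameters`, the short `KNOWN` list of accepted names, and the file name `ymask.txt` are all assumptions made for the example.

```python
def normalize(name):
    """Names are not case sensitive and may contain white space."""
    return "".join(name.split()).lower()

# Hypothetical subset of the accepted names; the real list is longer.
KNOWN = {normalize(n) for n in (
    "NObservations", "NVariables", "NLatentFactors",
    "DataFile", "ResponseMaskFile", "BurnIn", "NMCSamples",
)}

def parse_parameters(text, known=KNOWN):
    """Return {normalized name: value}; a repeated name keeps its last value."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # empty lines and comments are ignored
            continue
        name, sep, value = line.partition("=")
        key = normalize(name)
        if not sep or key not in known:
            raise ValueError(f"unknown or malformed parameter line: {raw!r}")
        value = value.strip()
        # Try integer, then double; otherwise keep the character string.
        for cast in (int, float):
            try:
                value = cast(value)
                break
            except ValueError:
                pass
        params[key] = value  # a later appearance overwrites an earlier one
    return params

example = """
# data settings
NObservations = 50
response mask file = ymask.txt
BurnIn = 1000
burnin = 2000
"""
params = parse_parameters(example)
```

In this sketch any parameter that is absent from the file is simply missing from the returned dictionary; a caller would fill in the documented default values afterwards.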
4. What are the parameters?
The parameters defined in the parameter file are used to indicate the
data settings, the model settings, the prior information and print
controls. The default parameter file is organized into “sections” as
follows:
- Nobservations:
Integer. The total number of samples (observations)
in the dataset.
- NVariables:
Integer. The total number of variables in the
dataset, including X variables (genes) and Y
variables (response variables).
- NbinaryResponses: Integer. The
total number of binary response variables in the
model.
- NcategoricalResponses:
Integer. The total number of categorical response
variables in the model.
- NsurvivalResponses: Integer.
The total number of survival response variables in
the model.
- NcontinuousResponses: Integer.
The total number of continuous response variables in
the model.
- NDesignVariables: Integer. The
number of design covariates (not including any
assay-artifact control variables – see next
item), including the intercept. If the user wants to
fit a model without intercept, this value should be
set to 0. The default value for this one is 1, just
the intercept.
- NcontrolVariables: Integer.
The total number of “assay-artifact”
control covariates to be used in analysis of
Affymetrix expression data, based on the housekeeping
genes on the array. The default value is 0.
- NlatentFactors: Integer. This
parameter has two possible interpretations:
- if choosing to fit a static factor model to a
specified set of variables, this is the number of
latent factors in the model;
- if choosing the evolutionary variable selection
and factor model search method, this represents the
starting number of latent factors in the model.
The default value for this parameter is 0.
- DataFile:
String. The name of the file that contains the
“data” Y and X (in this order). This must
be a flat text file with:
- each line representing a variable and each column
representing an observation,
- tabs separating observations within a line,
- all fields numeric; string values of any kind are not allowed,
- missing values in the dataset indicated by a
specific numeric value (such as 0 or -999); a
second input file, discussed later, indicates
which observations are missing.
- HFile: String.
The name of the file that contains the (intercept,
design, covariate) data for the analysis. This file
is a flat text file with each line representing an
observation and each column representing a variable.
The columns in the H file must be in the order
intercept, design variables, control variables. If
NDesignVariables is set to 1, meaning that there is
no design or control variable other than the
intercept, HFile can be omitted.
- ResponseMaskFile: String. The
name of the text file (mask) indicating the missing
and/or censored observations in the responses Y. It
is only necessary if at least one response variable
has missing or censored (in the survival case)
observations. This file is a flat text file with each
row representing a response variable and each column
representing the status of each observation. Each
observation should take value 0 for observed, 1 for
missing (non-observed) and 2 for censored.
- XmaskFile:
String. The name of the text file (mask) indicating
the missing observations in the X variables. It is
only necessary if there are missing observations in
X. This file is a flat text file with each row
representing a variable and each column representing
the status of each observation. Each observation
should take 0 for observed and 1 for missing
values.
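As an illustration of the file layouts described above, the following sketch writes a small data matrix and the matching X mask using only Python's standard library. The file names, the `-999` missing-value code and the helpers `write_matrix` and `mask_from` are assumptions made for the example; the toy data contain no response variables, so the data file holds X rows only.

```python
import os
import tempfile

MISSING = -999  # assumed numeric code for missing values

def write_matrix(path, rows):
    """Write one variable per line, observations separated by tabs."""
    with open(path, "w") as fh:
        for row in rows:
            fh.write("\t".join(str(v) for v in row) + "\n")

def mask_from(rows, missing=MISSING):
    """0 = observed, 1 = missing, in the same layout as the data."""
    return [[1 if v == missing else 0 for v in row] for row in rows]

# Three X variables (rows) measured on four observations (columns).
x = [
    [7.1, 8.3, MISSING, 6.9],
    [5.0, 5.2, 4.8, 5.1],
    [MISSING, 9.4, 9.0, 9.7],
]

outdir = tempfile.mkdtemp()
data_path = os.path.join(outdir, "data.txt")   # hypothetical DataFile name
mask_path = os.path.join(outdir, "xmask.txt")  # hypothetical XmaskFile name
write_matrix(data_path, x)
write_matrix(mask_path, mask_from(x))
```

A response mask could be produced the same way, with the additional code 2 for censored survival observations.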
5. Prior section
- ShapeOfB:
Integer. This parameter defines the constraints
placed on the factor loadings matrix B. It takes
either 0 (no constraint) or 2 (upper triangular of B
set to zero) as its value. For identification
purposes 2 is the default value for this
parameter.
- NonGaussianFactors: Integer.
This parameter indicates whether a Gaussian model (0)
or Dirichlet Process (1) model is used to model the
latent factors. The default value is 1, which means a
DP model will be used.
- PriorPsia,
PriorPsib: double. Hyper-parameter values for
the inverse-Gamma(a,b) prior for elements of Psi, the
vector of residual variances for all X variables. The
default values are (2,0.005) for Affymetrix data
under the standard analysis of RMA (log base 2)
expression indices.
- PriorSurvivalPpsia,PriorSurvivalPpsiab:
double. Hyper-parameter values for the
inverse-gamma(a,b) prior for residual variances of an
included survival response variable; right censored
survival data are modelled as log-normal, linear
regressions. The default values are (2,0.5).
- PriorRhoMean,
PriorRhoN: double. Hyper-parameter values
for the Beta(PriorRhoMean*PriorRhoN,
(1-PriorRhoMean)*PriorRhoN) prior for the sparsity
base rate parameters, the elements of the vector
Rho. The default values are (0.001, 200).
- PriorPiMean,
PriorPiN: double. Hyper-parameter values for
the Beta(PriorPiMean*PriorPiN,
(1-PriorPiMean)*PriorPiN) prior for the hierarchical
components of the prior on non-zero inclusion
probabilities. The default values are (0.9,
10.0).
- PriorTauDesigna,PriorTauDesignb:
double. Hyper-parameter values for the
inverse-Gamma(a,b) prior for the variances
Tau of the design/control
factor effects. The default values are (5,1).
- PriorTauResponseBinarya,PriorTauResponseBinaryb:
double. Hyper-parameter values for the
inverse-Gamma(a,b) prior for the variances Tau of the
binary response factors. The default values are
(5,1). This is only necessary if binary responses are
present in the model.
- PriorTauResponseCategoricala,PriorTauResponseCategoricalb:
double. Hyper-parameter values for the
inverse-Gamma(a,b) prior for the variances Tau of the
categorical response factors. The default values are
(5,1). This is only necessary if categorical
responses are present in the model.
- PriorTauResponseSurvivala,PriorTauResponseSurvivalb:
double. Hyper-parameter values for the
inverse-Gamma(a,b) prior for the variances Tau of the
survival response factors. The default values
are (5,1). This is only necessary if survival
responses are present in the model.
- PriorTauResponseContinuousa,PriorTauResponseContinuousb:
double. Hyper-parameter values for the
inverse-Gamma(a,b) prior for the variances Tau of the
continuous response factors. The default values are
(5,1). This is only necessary if continuous responses
are present in the model.
- PriorTauLatenta,PriorTauLatentb:
double. Hyper-parameter values for the
inverse-Gamma(a,b) prior of the variances Tau for the
latent factors. The default values are (5,1).
- PriorInterceptMean,
PriorInterceptVar: Prior mean and variance for
the intercept (baseline level) of X variables. The
default values are (8,100) based on the prototype of
Affymetrix gene expression X variables.
- PriorContinuousMean,
PriorContinuousVar: Prior mean and variance
for the intercept (baseline) of any continuous
response variables. The default values are (0,1)
consistent with standardised response data.
- PriorSurvivalMean,
PriorSurvivalVar: Prior mean and variance for
the intercept (baseline) of any survival response
variables. The default values are (2,10) consistent
with standardised response data.
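The Beta priors above are specified through a mean and a prior "sample size" N rather than through the two shape parameters directly. The sketch below converts the quoted defaults into shape parameters and evaluates the mean of an inverse-Gamma(a,b) prior, which is b/(a-1) for a > 1. It assumes that b is a scale parameter, which the text does not state explicitly, and the function names are illustrative only.

```python
def beta_shapes(mean, n):
    """Shape parameters of Beta(mean*n, (1-mean)*n)."""
    return mean * n, (1.0 - mean) * n

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

def inverse_gamma_mean(a, b):
    """Mean of inverse-Gamma(shape a, scale b); defined only for a > 1."""
    if a <= 1:
        raise ValueError("mean is undefined for a <= 1")
    return b / (a - 1.0)

rho_a, rho_b = beta_shapes(0.001, 200)  # defaults quoted for Rho
pi_a, pi_b = beta_shapes(0.9, 10.0)     # defaults quoted for Pi
tau_mean = inverse_gamma_mean(5, 1)     # defaults quoted for Tau
```

Under these assumptions the Rho defaults correspond to roughly Beta(0.2, 199.8), a prior concentrated near zero, which is consistent with its role as a sparsity base rate.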
6. Evolutionary variable and factor model search section
- Evol: Integer.
This parameter takes either 0 or 1. Setting evol to 1
activates the evolving mode in BFRM. The default
value is 0.
- EvolVarIn:
Integer. This parameter is only necessary if Evol is
set to 1. It indicates the number of variables
(elements of X) used to initialize the evolutionary
analysis.
- EvolVarInFile:
String. The name of the file listing the indices of
the variables (of X) that are included in this
initializing set (the first X variable is indexed by
1, and so on). If this file is missing, then the
indices default to 1, i.e. only the first X variable
is assumed to be in the initial model.
- EvolIncludeVariableThreshold:
Double. This parameter sets the threshold for
bringing a new variable into the model. In
considering whether to add in new variables (genes)
at a given evolutionary analysis step, variables are
ranked according to their approximate posterior
probability of inclusion at that stage. One of the
two elements of the decision to include some of the
most highly ranked variables is then a threshold on
this posterior inclusion probability –
variables with probabilities below that threshold
will not be included. The default value is 0.75.
- EvolMaxiumVariablesPerIteration:
Integer. This parameter sets the maximum number of
variables that can be added to the model at each
iteration. The default value is 5. If A variables
currently exceed EvolIncludeVariableThreshold,
then the most highly ranked min{A,
EvolMaxiumVariablesPerIteration} of them are added.
This number may be zero, which is one way the
evolutionary analysis may terminate.
- EvolIncludeFactorThreshold:
Double. This parameter sets the probability threshold
for adding a new latent factor into the model. A new
latent factor will be added if and only if at least
EvolMinumVariablesInFactor variables (genes) have a
posterior probability of association with that factor
that exceeds this threshold. The default
value is 0.75.
- EvolMinumVariablesInFactor:
Integer. This parameter sets the minimum number of
variables (genes) showing significant association
with a factor in order that the factor be included in
the model. The default value is 5.
- EvolMaximumVariablesPerFactor:
Integer. This parameter sets the maximum number of
variables that can be weighted on any one factor in
the evolutionary inclusion steps. This allows the
user to limit the number of variables brought into
the model for each factor and hence to explore more
effectively other factor dimensions. The default
value is 15.
- EvolMaximumFactors: Integer.
This parameter sets the maximum number of latent
factors that the final model can have. The default
value is 5.
- EvolMaximumVariables: Integer.
This parameter sets the maximum number of variables
the final model can have. The default value is
100.
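The inclusion rules in this section can be sketched as follows. This is one reading of the description above, not BFRM's code: the function names and arguments are hypothetical, the input is assumed to be a mapping from variable index to approximate posterior inclusion probability, and treating "exceeds" as a strict inequality is also an assumption.

```python
def select_new_variables(probs, threshold=0.75, max_per_iteration=5):
    """Return the indices of the variables to add at one evolutionary step.

    probs: dict mapping variable index -> approximate posterior
    inclusion probability (an assumed input format).
    """
    # Keep the A candidates whose probability exceeds the threshold ...
    candidates = [(p, i) for i, p in probs.items() if p > threshold]
    # ... rank them from most to least probable (ties broken by index) ...
    candidates.sort(key=lambda t: (-t[0], t[1]))
    # ... and add the top min{A, max_per_iteration} of them.
    return [i for _, i in candidates[:max_per_iteration]]

def factor_qualifies(loading_probs, threshold=0.75, min_variables=5):
    """True if enough variables are associated with the factor."""
    return sum(p > threshold for p in loading_probs) >= min_variables

probs = {1: 0.99, 2: 0.40, 3: 0.80, 4: 0.76,
         5: 0.95, 6: 0.91, 7: 0.78, 8: 0.10}
chosen = select_new_variables(probs)
```

Here six variables exceed the threshold, so only the five most probable are selected; an empty result corresponds to the case in which no variable is added at that step.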
7. MCMC section
- BurnIn:
Integer. The number of burn-in iterations in the
MCMC. Default 2000.
- NMCSamples: Integer. The number
of MCMC iterations. Default 5000.
8. Monitoring section
- PrintIteration: Integer. A
number defining how often an MCMC iteration is printed
to the screen. Default 100.
9. Dirichlet Process parameters
- PriorAlphaa,
PriorAlphab: doubles. Prior parameters for the
Gamma prior for Alpha. Default (1,1).