This file describes how to compile and use SimTree.

----------
1. Compile

A Makefile is provided. Simply run

  make 

to compile this program. Run

   make clean 

to delete the generated files. The executable will be stored in
Release/bin/. The default name is SimTree. Make sure this directory
exists before running make. If it does not, create it first:

   mkdir Release
   mkdir Release/bin

------------------
2. Running SimTree

---
2.1 Script for running SimTree

Simply run the script file Run.sh, located in the same directory as
the source code. This script runs the following commands:

       cd Release/bin
       ./SimTree 1234 ../../wisconsin.txt ../wis_Results

The first argument is the seed number for the random number generator. 
The second argument specifies the data file. The third argument specifies
the prefix for the output files. 
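
To repeat the run with several seeds, the three arguments can be
assembled programmatically. The following is a minimal Python sketch
and is not part of SimTree. The helper names build_command and
run_seeds, and the idea of putting the seed into the output prefix,
are illustrative assumptions.

```python
import subprocess


def build_command(seed, data_file, prefix, exe="./SimTree"):
    """Return the argument list: executable, seed, data file, output prefix."""
    return [exe, str(seed), data_file, prefix]


def run_seeds(seeds, data_file="../../wisconsin.txt", workdir="Release/bin"):
    """Run SimTree once per seed from workdir (paths as in Run.sh)."""
    for seed in seeds:
        # Hypothetical naming scheme: one output prefix per seed.
        prefix = f"../wis_Results_seed{seed}_"
        subprocess.run(build_command(seed, data_file, prefix),
                       cwd=workdir, check=True)
```

Calling run_seeds([1234, 5678]) from the source directory would then
start two runs, one after the other, provided the executable has been
built.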

---
2.2 Specify the tree type

In the current version, the tree type is specified by editing
start.cpp. Please follow these steps:

(i)   Comment out lines L92 to L115;
(ii)  Uncomment the lines for the desired tree type:
         Classification tree: lines L92 to L94
         Regression tree:     lines L113 to L115
         Survival tree:       lines L109 to L111

------------------
3. Input data file

The first few lines of an input file, e.g. wisconsin.txt, look like:

    1
    683
    9
    0.5 0.1 0.1 0.1 0.2 0.1 0.3 0.1 0.1 0
    0.5 0.4 0.4 0.5 0.7 1 0.3 0.2 0.1 0
    ...

The first line (1) specifies the type of tree. 1 means classification
tree. For detailed specification, see Observation.h.
The second line specifies the number of observations.
The third line specifies the number of predictors.
The data start on the fourth line, one observation per line, with the
p predictor values followed by the response y:

	 x_1 x_2 x_3 ... x_p y
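
The layout above can be checked before a run with a small reader. The
following Python sketch is not part of SimTree. It assumes
whitespace-separated numeric values and exactly one response column
per row; the function name parse_simtree_input is hypothetical.

```python
def parse_simtree_input(text):
    """Parse the three header lines and the data rows of an input file."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    tree_type = int(lines[0])   # line 1: type of tree (1 = classification)
    n_obs = int(lines[1])       # line 2: number of observations
    n_pred = int(lines[2])      # line 3: number of predictors
    rows = [[float(v) for v in ln.split()] for ln in lines[3:3 + n_obs]]
    if len(rows) != n_obs:
        raise ValueError(f"expected {n_obs} rows, found {len(rows)}")
    for row in rows:
        if len(row) != n_pred + 1:
            raise ValueError("each row needs p predictors followed by y")
    X = [row[:-1] for row in rows]   # x_1 ... x_p
    y = [row[-1] for row in rows]    # last column
    return tree_type, n_obs, n_pred, X, y


# The two data rows shown in the wisconsin.txt excerpt, with the
# observation count set to 2 so that the example is self-consistent.
sample = """1
2
9
0.5 0.1 0.1 0.1 0.2 0.1 0.3 0.1 0.1 0
0.5 0.4 0.4 0.5 0.7 1 0.3 0.2 0.1 0
"""
tree_type, n_obs, n_pred, X, y = parse_simtree_input(sample)
print(tree_type, n_obs, n_pred, y)
```

A row with the wrong number of values raises ValueError, which catches
a mismatch between the third header line and the data.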

---------------
4. Output files

---
4.1 Summary

On running the executable, several output files will be created in
Release/. These files have the prefix specified in Run.sh. In this
example, the prefix is wis_Results. The following files are created:

	 wis_Results0_ColTable.txt
	 wis_Results0_Count.txt
	 wis_Results0_Error.txt
	 wis_Results0_leaveone.txt
	 wis_Results0_logLikelihood.txt
	 wis_Results0_potential.txt
	 wis_Results0_tree.txt

---
4.2 Output Details

a)    wis_Results0_ColTable.txt

A p*p matrix, where p is the number of predictors. The entry in the
ith row and jth column gives the number of times the ith and jth
predictors occur together in the tree samples, i.e., the pairwise
inclusion frequencies in the MCMC samples.
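
To find the predictor pairs that appear together most often, the
matrix can be scanned as follows. This Python sketch is not part of
SimTree. It assumes whitespace-separated entries and a symmetric
matrix (only the upper triangle is used), and it reports 0-based
indices. The function names and the 3*3 matrix are made up for
illustration.

```python
def read_matrix(text):
    """Read a whitespace-separated square matrix of counts."""
    rows = [[float(v) for v in ln.split()]
            for ln in text.splitlines() if ln.strip()]
    if any(len(r) != len(rows) for r in rows):
        raise ValueError("matrix is not square")
    return rows


def top_pairs(matrix, k=3):
    """Return the k pairs (i, j, count), i < j, with the largest counts."""
    p = len(matrix)
    pairs = [(matrix[i][j], i, j) for i in range(p) for j in range(i + 1, p)]
    pairs.sort(key=lambda t: -t[0])   # largest count first
    return [(i, j, count) for count, i, j in pairs[:k]]


# Made-up 3*3 example; a real file has one row per predictor.
demo = """5 2 0
2 7 4
0 4 3
"""
print(top_pairs(read_matrix(demo), k=2))
```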

b)   wis_Results0_Count.txt

This file contains three columns. The first column is the index number
of tree samples. The second is the index number of predictors. The
third is the count, standing for the number of occurrences of the
predictor in this tree sample. 
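
The per-sample counts can be summed into one total per predictor. This
Python sketch is not part of SimTree. It assumes three
whitespace-separated integer columns per row; the function name
total_counts and the rows in the example are made up.

```python
from collections import defaultdict


def total_counts(text):
    """Sum the count (column 3) per predictor index (column 2)."""
    totals = defaultdict(int)
    for line in text.splitlines():
        if not line.strip():
            continue
        _sample_idx, predictor_idx, count = line.split()[:3]
        totals[int(predictor_idx)] += int(count)
    return dict(totals)


# Made-up rows: tree sample index, predictor index, count.
demo = """0 1 2
0 4 1
1 1 1
1 3 2
"""
print(total_counts(demo))
```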

c)  wis_Results0_Error.txt

This file contains information about the misclassification rate. Each
row corresponds to a tree sample. The first column is the size of the
tree and the second is the number of misclassifications.
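
Since the file stores counts, a rate has to be derived from them. The
Python sketch below is not part of SimTree. It assumes two
whitespace-separated columns and that the appropriate denominator is
the number of observations given on the second line of the input file.
The function name and the example rows are made up.

```python
def misclassification_rates(text, n_obs):
    """Return (tree size, misclassified / n_obs) for each tree sample."""
    rates = []
    for line in text.splitlines():
        if not line.strip():
            continue
        size, n_wrong = line.split()[:2]
        rates.append((int(size), float(n_wrong) / n_obs))
    return rates


# Made-up rows: tree size, number of misclassifications.
demo = """5 20
7 10
"""
print(misclassification_rates(demo, n_obs=100))
```

For the wisconsin.txt example, n_obs would be 683.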

d)   wis_Results0_leaveone.txt

This file contains information about the results from leave-one-out
cross validation. The first column is the index of tree samples. The
second one is the index of the held-out observation. The third column
is the predicted mean.

e)   wis_Results0_logLikelihood.txt

This file contains the log integrated likelihood for each tree
sample. The first column is the size of the tree and the second is the
log integrated likelihood.

f)   wis_Results0_potential.txt

This file contains the log posterior probability for each tree
sample. The first column is the size of the tree and the second is the
log posterior probability.
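
One common use of this file is to locate the tree sample with the
highest log posterior probability. This Python sketch is not part of
SimTree. It assumes two whitespace-separated columns and that the rows
appear in the order in which the tree samples were drawn; the returned
index is 0-based. The function name and the example rows are made up.

```python
def best_sample(text):
    """Return (row index, tree size, log posterior) of the best row."""
    best = None
    rows = (ln for ln in text.splitlines() if ln.strip())
    for idx, line in enumerate(rows):
        size, logpost = line.split()[:2]
        cand = (idx, int(size), float(logpost))
        if best is None or cand[2] > best[2]:
            best = cand
    return best


# Made-up rows: tree size, log posterior probability.
demo = """5 -120.5
7 -98.25
6 -101.0
"""
print(best_sample(demo))
```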

g)   wis_Results0_tree.txt

The tree samples, each described as a directed graph. For each
directed graph defined in this file, one can use dot from graphviz to
produce a graphics file, such as eps or jpg. For example,

   dot -Tps -o tree.eps tree.txt
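
If the file holds several graph definitions, they can be separated
before being passed to dot. The Python sketch below is not part of
SimTree. It assumes that each tree is written as a block of the form
"digraph name { ... }" and that no braces occur inside quoted labels;
the function name split_digraphs and the two small graphs are made up.

```python
def split_digraphs(text):
    """Split DOT text into one string per top-level digraph block."""
    graphs, depth, start = [], 0, None
    for i, ch in enumerate(text):
        if ch == "{":
            if depth == 0:
                # Back up to the 'digraph' keyword that opens this block.
                start = text.rfind("digraph", 0, i)
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0 and start is not None and start >= 0:
                graphs.append(text[start:i + 1])
                start = None
    return graphs


# Made-up input with two small graphs.
demo = """digraph t0 {
  1 -> 2;
}
digraph t1 {
  1 -> 2;
  1 -> 3;
}
"""
for graph in split_digraphs(demo):
    print(graph.splitlines()[0])
```

Each returned string can be written to its own file and converted with
dot as shown above.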


-------
5. Misc

The number of iterations and the type of tree are controlled in
start.cpp.

----------
6. Example

We illustrate the usage of SimTree with the example data set wisconsin.txt.

(i) Edit start.cpp. Comment out lines L92 to L115, then uncomment
    lines L92 to L94 (classification tree);

(ii) In the directory containing the source code, run
    
      make

(iii) In the same directory, run

     ./Run.sh

(iv) The iteration number is shown on screen. When the run finishes,
the resulting files will be in Release/.

(v) Referring to the description of the result files, extract the
information of interest. For example, one can obtain the sizes of the
tree samples and their log integrated likelihoods, and then draw
histograms of these statistics.
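
Step (v) can be sketched for the tree sizes as follows. This Python
code is not part of SimTree. It assumes that the first
whitespace-separated column of the logLikelihood file is the tree
size, as described in section 4.2, and it prints a simple text
histogram. The function names and the example rows are made up.

```python
from collections import Counter


def size_histogram(text):
    """Count how often each tree size (column 1) occurs."""
    sizes = [int(ln.split()[0]) for ln in text.splitlines() if ln.strip()]
    return Counter(sizes)


def format_histogram(hist):
    """Render the counts as one line of '#' marks per tree size."""
    return "\n".join(f"{size:3d} {'#' * hist[size]}" for size in sorted(hist))


# Made-up rows: tree size, log integrated likelihood.
demo = """5 -120.5
7 -98.25
5 -119.0
6 -101.0
"""
print(format_histogram(size_histogram(demo)))
```

For a real run, the text would be read from
Release/wis_Results0_logLikelihood.txt.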