This file describes how to use SimTree.

----------
1. Compile

A Makefile is provided. Run make to compile the program, and run make clean to delete the generated files. The executable is stored in Release/bin/; its default name is SimTree. Make sure this directory exists before running make. If it does not, create it first:

    mkdir Release
    mkdir Release/bin

------------------
2. Running SimTree

--- 2.1 Script for running SimTree

Run the script Run.sh from the directory that contains the source code. The script runs the following commands:

    cd Release/bin
    ./SimTree 1234 ../../wisconsin.txt ../wis_Results

The first argument is the seed for the random number generator, the second specifies the data file, and the third specifies the prefix for the output files.

--- 2.2 Specify the tree type

In the current version, the tree type is selected by editing start.cpp. Follow these steps:

(i) Comment out lines L92 to L115 of start.cpp.
(ii) Uncomment the lines for the desired tree type:
     Classification tree: lines L92 to L94
     Regression tree: lines L113 to L115
     Survival tree: lines L109 to L111

------------------
3. Input data file

The first few lines of an input file, e.g. wisconsin.txt, look like this:

    1
    683
    9
    0.5 0.1 0.1 0.1 0.2 0.1 0.3 0.1 0.1 0
    0.5 0.4 0.4 0.5 0.7 1 0.3 0.2 0.1 0
    ...

The first line specifies the type of tree; 1 means a classification tree. For the full specification, see Observation.h. The second line gives the number of observations, and the third line gives the number of predictors. The data start on the fourth line, one observation per line, in the format:

    x_1 x_2 x_3 ... x_p y

---------------
4. Output files

--- 4.1 Summary

Running the executable creates several output files in Release/. Their names begin with the prefix specified in Run.sh; in this example, the prefix is wis_Results.
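Because Run.sh first changes into Release/bin and then passes the relative prefix ../wis_Results, the prefix is resolved relative to Release/bin, which is why the files end up in Release/. The minimal Python sketch below is not part of SimTree, and all of its function names are hypothetical. It resolves the prefix the same way and reports which expected output files are still missing after a run. The suffixes are taken from the file list in this section; the 0 between the prefix and the suffix is copied from the example file names, and this README does not say what it denotes, so it is exposed as an assumed `tag` parameter.

```python
import os

# Output file suffixes, taken from the file list in Section 4.1 of this README.
OUTPUT_SUFFIXES = [
    "ColTable", "Count", "Error", "leaveone",
    "logLikelihood", "potential", "tree",
]


def output_stem(run_dir: str, prefix: str) -> str:
    """Resolve the output prefix relative to the directory SimTree runs in.

    Run.sh does `cd Release/bin` and passes `../wis_Results`, so the
    prefix is interpreted relative to Release/bin.
    """
    return os.path.normpath(os.path.join(run_dir, prefix))


def expected_outputs(run_dir: str, prefix: str, tag: str = "0") -> list[str]:
    """List the output paths a run is expected to create.

    `tag` is the "0" seen after the prefix in the example file names.
    Its meaning is not documented, so treat it as an assumption.
    """
    stem = output_stem(run_dir, prefix)
    return [f"{stem}{tag}_{suffix}.txt" for suffix in OUTPUT_SUFFIXES]


def missing_outputs(run_dir: str, prefix: str) -> list[str]:
    """Return the expected output files that do not exist (yet)."""
    return [p for p in expected_outputs(run_dir, prefix) if not os.path.isfile(p)]


if __name__ == "__main__":
    # Run from the source directory, mirroring the paths used in Run.sh.
    for path in missing_outputs("Release/bin", "../wis_Results"):
        print("missing:", path)
```

An empty result from `missing_outputs` suggests the run completed and wrote all seven files.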
The following files are created:

    wis_Results0_ColTable.txt
    wis_Results0_Count.txt
    wis_Results0_Error.txt
    wis_Results0_leaveone.txt
    wis_Results0_logLikelihood.txt
    wis_Results0_potential.txt
    wis_Results0_tree.txt

--- 4.2 Output details

a) wis_Results0_ColTable.txt
   A p*p matrix, where p is the number of predictors. The entry in row i and column j is the number of times the ith and jth predictors occur together in the tree samples, i.e., the pairwise inclusion frequencies in the MCMC run.

b) wis_Results0_Count.txt
   Three columns: the index of the tree sample, the index of the predictor, and the count, i.e., the number of times that predictor occurs in that tree sample.

c) wis_Results0_Error.txt
   Misclassification results. Each row corresponds to one tree sample: the first column is the size of the tree and the second is the number of misclassified observations.

d) wis_Results0_leaveone.txt
   Results of leave-one-out cross-validation. The first column is the index of the tree sample, the second is the index of the held-out observation, and the third is the predicted mean.

e) wis_Results0_logLikelihood.txt
   The log integrated likelihood of each tree sample. The first column is the size of the tree and the second is the log integrated likelihood.

f) wis_Results0_potential.txt
   The log posterior probability of each tree sample. The first column is the size of the tree and the second is the log posterior probability.

g) wis_Results0_tree.txt
   The tree samples, described as directed graphs. Each graph defined in this file can be rendered to a graphics file, such as EPS or JPG, with dot from Graphviz. For example:

       dot -Tps -o tree.eps tree.txt

-------
5. Misc

The number of iterations and the tree type are controlled in start.cpp.

----------
6. Example

We illustrate the use of SimTree with the example data file wisconsin.txt, whose first line (1) declares a classification tree.

(i) Edit start.cpp: comment out lines L92 to L115, then uncomment lines L92 to L94 (classification tree).
(ii) In the directory containing the source code, run make.
(iii) In the same directory, run ./Run.sh.
(iv) The iteration number is shown on screen. When the run finishes, the result files are in Release/.
(v) Using the description of the result files in Section 4.2, extract the information of interest. For example, one can collect the tree sizes and log integrated likelihoods of the tree samples and draw histograms of these statistics.
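As a starting point for step (v), here is a minimal Python sketch that reads the result files described in Section 4.2 and computes a few summaries: a histogram of tree sizes, the misclassification rate of each tree sample, and per-predictor totals from the Count file. It assumes each file contains only whitespace-separated numbers with no header line, which this README does not state explicitly; the function names are hypothetical and not part of SimTree. A plain-text histogram is used to keep the sketch dependency-free; a plotting library could be used instead.

```python
import os
from collections import Counter, defaultdict


def read_rows(path):
    """Read a whitespace-separated numeric file into a list of float rows.

    Assumes no header line; blank lines are skipped.
    """
    with open(path) as f:
        return [[float(x) for x in line.split()] for line in f if line.strip()]


def tree_size_histogram(rows):
    """Count tree samples by size.

    Applies to the logLikelihood, potential and Error files, whose first
    column is the size of the tree.
    """
    return Counter(int(row[0]) for row in rows)


def misclassification_rates(error_rows, n_obs):
    """Turn the Error file's misclassification counts into rates.

    n_obs is the number of observations (line 2 of the input file).
    """
    return [row[1] / n_obs for row in error_rows]


def predictor_totals(count_rows):
    """Sum the Count file's third column over all tree samples, per predictor."""
    totals = defaultdict(int)
    for _sample, predictor, count in count_rows:
        totals[int(predictor)] += int(count)
    return dict(totals)


def print_histogram(hist, width=50):
    """Print a text histogram, one bar per tree size, scaled to `width` chars."""
    if not hist:
        return
    peak = max(hist.values())
    for size in sorted(hist):
        bar = "#" * max(1, round(width * hist[size] / peak))
        print(f"{size:4d} {hist[size]:8d} {bar}")


if __name__ == "__main__":
    # Path of the example run in Section 6, relative to the source directory.
    path = "Release/wis_Results0_logLikelihood.txt"
    if os.path.exists(path):
        print_histogram(tree_size_histogram(read_rows(path)))
    else:
        print("not found:", path)
```

For the wisconsin example, whose input file declares 683 observations, `misclassification_rates(read_rows("Release/wis_Results0_Error.txt"), 683)` gives one rate per tree sample.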