Tutorial

 

·        Input files

·        Loading a project

·        Configuring GraphExplore

·        Graphical layouts

·        Using GraphExplore

·        Working with clusters

·        Comparing two graphs

·        Menu description

 

Input Files

 

The user needs to provide up to three input files: a node-names file with information about the objects, a graph file with information about the associations among the objects and a cluster file that identifies groups (clusters) of objects. The node-names and the graph files are required, while the cluster file is optional.

 

Here is an example of a valid node-names file called [description.txt]. The first column represents a unique identifier assigned to each object. This identifier takes the values 1, 2, ... up to the total number of objects, in this case 10. The second column gives a name associated with each object, while the third and fourth columns hold descriptions, notes, URLs, etc. associated with these objects. The columns in this file are delimited by one tab character, but in general the columns can also be separated by one space [blank], one comma [,] or one semicolon [;]. Note that the fourth column contains information only for object 8. The format of the node-names file is much more general, and you can customize it when a new project is opened (see below). The only requirement is the presence of the unique identifier column, which does not have to be the first column in the file.

 

Here is an example of a graph file describing associations among these ten objects [data.txt]. The graph file is required to have two, three or five columns that can be separated by a comma or by a combination of tabs and/or blanks. Each row represents an association or link between two objects specified by their unique identifiers (first column in file description.txt). The identifiers are given in the first two columns of the graph file. The other three columns are weights that identify the nature of the association between the two objects. The third column represents the weight/likelihood of an undirected link (a line) between the two objects; the fourth column gives the weight of a forward arrow, while the fifth column gives the weight of a backward arrow. For example, the first line of data.txt says that there is a direct link between objects 1 and 2. The weight of a line between 1 and 2 is 0.3 (i.e., 1 -- 2), the weight of a forward arrow between 1 and 2 is 0.1 (i.e., 1 -> 2) and the weight of a backward arrow between 1 and 2 is 0.1 (i.e., 1 <- 2). If the graph file has two columns, all the associations between objects are undirected and have the same weight, equal to one. If the graph file has three columns, the associations between objects are all undirected, but they can have different weights, specified by the third column.
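The column rules above can be sketched in a few lines of Python. This is only an illustration of the file format as described in this tutorial, not GraphExplore's own reader; the function name parse_graph_line is hypothetical, and filling the arrow weights with zero for two- and three-column files is an assumption.

```python
import re

def parse_graph_line(line):
    """Return (first_id, second_id, line_w, forward_w, backward_w) for one row."""
    # Columns may be separated by a comma or by any run of tabs and blanks.
    fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
    if len(fields) not in (2, 3, 5):
        raise ValueError(f"expected 2, 3 or 5 columns, got {len(fields)}")
    first_id, second_id = int(fields[0]), int(fields[1])
    if len(fields) == 2:
        # Two columns: an undirected link with weight one.
        return (first_id, second_id, 1.0, 0.0, 0.0)
    if len(fields) == 3:
        # Three columns: an undirected link with its own weight.
        return (first_id, second_id, float(fields[2]), 0.0, 0.0)
    # Five columns: line, forward-arrow and backward-arrow weights.
    line_w, forward_w, backward_w = (float(f) for f in fields[2:5])
    return (first_id, second_id, line_w, forward_w, backward_w)
```

For the first line of data.txt quoted above, parse_graph_line("1 2 0.3 0.1 0.1") yields the identifiers 1 and 2 followed by the three weights.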

 

GraphExplore displays the type of link (i.e., a line, a forward arrow or a backward arrow) that has the maximum weight. For example, GraphExplore displays a line between objects 1 and 2, a backward arrow between objects 1 and 3, and a forward arrow between objects 1 and 5. The total weight of a direct association between two objects is defined as the sum of columns three, four and five. In the running example, the weight associated with a direct link between objects 1 and 2 is 0.5 and the weight associated with a direct link between objects 1 and 3 is 0.3. GraphExplore is able to display only the associations with a total weight between a minimum and a maximum threshold.
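The two rules in this paragraph (draw the link type with the largest weight, and sum the three weights to get the total) can be written out as a short Python sketch. The function names are hypothetical, and breaking ties in favor of the line is an assumption, since the tutorial does not say what happens when two weights are equal.

```python
def link_type(line_w, forward_w, backward_w):
    """Return the link type with the largest weight."""
    weights = {"line": line_w, "forward": forward_w, "backward": backward_w}
    # max() returns the first key on ties, so a tie favors "line" (assumption).
    return max(weights, key=weights.get)

def total_weight(line_w, forward_w, backward_w):
    """Total weight of a direct association: the sum of the three columns."""
    return line_w + forward_w + backward_w

# Weights for objects 1 and 2 are taken from the first line of data.txt.
print(link_type(0.3, 0.1, 0.1))
print(round(total_weight(0.3, 0.1, 0.1), 2))
```

For objects 1 and 2 this selects a line with a total weight of 0.5, in agreement with the running example.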

 

The third input file is optional and defines clusters in the set of objects [clusters.txt]. The cluster file is required to have two columns that can be separated by a comma or by a combination of tabs and/or blanks. Each line specifies the cluster an object belongs to. The unique identifier of the object is in the first column, while the unique identifier of a cluster is in the second column. The unique identifier of a cluster can be a number or a string. In the example, objects 1 and 7 belong to cluster 1; objects 2, 3 and 6 belong to cluster 2; objects 4 and 5 belong to cluster 3, while objects 8, 9 and 10 are in cluster 4. GraphExplore can display clusters of objects as vertices with the same color and/or shape (see below). Three clusterings can be represented simultaneously in the graphical displays: a clustering given by the color of the vertices, a clustering given by the shape of the vertices and the clustering specified when the project is loaded.
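As a rough illustration of the cluster file format, the Python sketch below groups object identifiers by cluster. The function name parse_clusters is hypothetical, and the sample text simply restates the cluster memberships listed above, one object per line; it is not a copy of clusters.txt.

```python
import re
from collections import defaultdict

def parse_clusters(text):
    """Map each cluster identifier (kept as a string) to its object identifiers."""
    members = defaultdict(list)
    for line in text.splitlines():
        fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"expected 2 columns, got {len(fields)}")
        object_id, cluster_id = int(fields[0]), fields[1]
        members[cluster_id].append(object_id)
    return dict(members)

# Memberships as listed in the tutorial text.
SAMPLE = "1 1\n2 2\n3 2\n4 3\n5 3\n6 2\n7 1\n8 4\n9 4\n10 4"
print(parse_clusters(SAMPLE))
```

Cluster identifiers are kept as strings because the tutorial allows them to be either numbers or strings.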

 

 

Loading a Project

 

You start working in GraphExplore by opening a new project: go to the File > Project menu and load the node-names, graph and cluster files.

 

 

You have to specify the format of the node-names file by clicking on the Header button. The following dialog box opens:

 

 

 

Column Separator lets you indicate the delimiter used to separate columns in the node-names file. If you check the Header Line String box, you can give a name to each column in the node-names file. Index Column tells GraphExplore what column gives the unique identifiers of each object. In the running example, this is the first column in the file [description.txt].

 

Besides its unique identifier, each object can have one or more names stored in separate columns. The primary name is given in Name Column, while the second, third or fourth name of each object can be specified in the Alternate Name Column controls. In the running example, the objects have only one name, stored in the second column of the file [description.txt]. You can query the graph by using the unique identifier or one of the names you defined for each object.

 

GraphExplore is able to dynamically retrieve information about the objects in the graph from the Internet. To do this, you need to check the Primary Link ID column and specify which column has to be used for the search. This can be a new column or one of the columns that give names for the objects. You also need to enter the URL of the site you want to search in the Template edit box. To define a valid query, this URL needs to contain a <placeholder> field that will be replaced with the object names given in the Primary Link ID column. You can specify a second template to be used in combination with object names from the Secondary Link ID column. In the running example, the names from the second column in the file [description.txt] were used to search for information about the objects using the template: http://finance.yahoo.com/q/ecn?s=<placeholder>

 

You can provide a particular website that is relevant for each object by checking the HTTP Link Column field. The valid websites in this column are accessed instead of searching the Internet using the names and templates from the Primary/Secondary Link ID Column fields. In the running example, the fourth column in file [description.txt] contains specific websites relevant for some of the objects in the graph. Actually, a website is given only for object 8 whose name is CAGP. When you click on object 8 in a graphical display generated by GraphExplore, this is the website that will be shown. For the other objects, GraphExplore substitutes their names in the placeholder found in the template for the Primary Link ID Column and shows the corresponding URL.
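The lookup order described above (a specific website from the HTTP Link Column first, the template otherwise) can be sketched as follows. The function name resolve_url is hypothetical. Treating any non-empty value as a valid website and URL-encoding the name with urllib.parse.quote are assumptions, because the tutorial does not say how GraphExplore validates links or encodes names.

```python
from urllib.parse import quote

PLACEHOLDER = "<placeholder>"

def resolve_url(name, http_link, template):
    """Return the URL to open when an object is clicked."""
    if http_link:
        # A website given in the HTTP Link Column takes precedence.
        return http_link
    # Otherwise substitute the object's name into the template.
    return template.replace(PLACEHOLDER, quote(name))

TEMPLATE = "http://finance.yahoo.com/q/ecn?s=<placeholder>"
print(resolve_url("INTC", "", TEMPLATE))
print(resolve_url("CAGP", "http://www.cagp.duke.edu", TEMPLATE))
```

With the running example, INTC resolves through the template, while CAGP (object 8) resolves to its own website.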

 

GraphExplore saves the format you have specified in the Header dialog box in the node-names file (description.txt in the running example):

 

GraphExplore Header Start
Separator := \t
Index    := 1
Name := 2
PrimaryLinkID := 2
PrimaryLinkTemplate := http://finance.yahoo.com/q/ecn?s=<placeholder>
Http := 4
Note := 3
GraphExplore Header End
1      INTC Intel Corp
2      NT    NorTel
3      UTSI  UTStar Com
4      RHAT Red Hat
5      MSFT Microsoft
6      LU     Lucent
7      AMD  Advanced Micro Devices
8      CAGP CAGP http://www.cagp.duke.edu
9      SOHU
10     SINA

 

This way you do not need to provide the format of the node-names file every time you load the project.
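If you want to read such a file outside GraphExplore, the saved header can be parsed with a short script. The Python sketch below is an illustration based only on the listing above. The function names are hypothetical, and it assumes that a tab separator is stored as the two characters \t and that the column numbers in the header are 1-based.

```python
def column(fields, settings, key):
    """Return the field named by a 1-based header entry, or '' if absent."""
    if key not in settings:
        return ""
    position = int(settings[key]) - 1
    return fields[position].strip() if position < len(fields) else ""

def parse_node_file(text):
    """Return (settings, objects) from a node-names file with a saved header."""
    settings, rows, in_header = {}, [], False
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped == "GraphExplore Header Start":
            in_header = True
        elif stripped == "GraphExplore Header End":
            in_header = False
        elif in_header:
            key, _, value = stripped.partition(":=")
            settings[key.strip()] = value.strip()
        else:
            rows.append(line)
    # Assumption: a tab separator is written as backslash followed by "t".
    stored = settings.get("Separator", "\\t")
    separator = "\t" if stored == "\\t" else stored
    objects = {}
    for row in rows:
        fields = row.split(separator)
        objects[column(fields, settings, "Index")] = {
            "name": column(fields, settings, "Name"),
            "note": column(fields, settings, "Note"),
            "http": column(fields, settings, "Http"),
        }
    return settings, objects
```

Rows with fewer columns than the header names, such as those for objects 9 and 10, simply get empty strings for the missing fields.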

 

 

Configuring GraphExplore

 

Before you start using GraphExplore, you need to make sure the program is properly installed on your system. Go to Tools > Preference Editor to bring up the following dialog box:

 

 

In order to function properly, GraphExplore uses web browsers (e.g., Internet Explorer, Netscape) and external programs that generate graphical layouts (e.g., neato from AT&T’s GraphViz). In addition, GraphExplore creates temporary files in which it stores information used by the external programs. The paths that point to the web browsers, graphical layout programs and temporary files have to be valid paths. Note that neato is only one of the graphical layout programs that are called. If you do not wish to use neato or if you do not have neato installed in your system, leave the corresponding path field blank.

 

Graphical Layouts

 

GraphExplore creates the image of a (sub)graph by employing the neato utility from AT&T’s GraphViz open source graph drawing software or in-house modified versions of the layout libraries available from the Java Universal Network/Graph Framework (JUNG). GraphExplore saves the graph in a temporary text file using GraphViz’s DOT format, calls an external library to generate an image file in SVG format, then shows that image file in the Graph area of the current display panel (see below). The libraries from JUNG are included in GraphExplore’s distribution package and do not need to be installed separately on your computer. Since they are written in Java, they can be expected to function on almost any platform. Unfortunately, the installation of neato is more problematic and the version that comes with GraphExplore might not work. If this is the case, you might want to re-install neato directly from AT&T’s GraphViz, then tell GraphExplore where neato is located by going to Tools > Preference Editor. However, GraphExplore functions properly without neato by employing the JUNG-based layouts. The original JUNG libraries were modified by Quanli Wang (quanli@stat.duke.edu) to fit the purpose of the application. Due to these changes, the original JUNG libraries (e.g., Jakarta, Colt) are no longer part of GraphExplore’s distribution package.

 

You can see which layout engine is currently used by GraphExplore to generate graphical displays by selecting Layout from the menu:

 

 

You can select another engine by checking the corresponding box in the menu. If the first option is selected, GraphExplore will use neato while allowing the vertices in the graph to overlap. The second option specifies neato too, but this time no overlap between the vertices in the resulting graph is allowed. The other five options that follow are different JUNG layouts. You should try several engines to find the best format for a particular graphical structure. After selecting a new layout engine, do not forget to go to Graphs > Subgraph to actually create the new graphical display.
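To give a feel for what the neato-based engines do, here is a minimal Python sketch that writes a graph in DOT format and calls neato to produce an SVG file. It is not GraphExplore's own code, and the exact DOT text GraphExplore writes is not documented here. The function names are hypothetical; the dir and overlap attributes and the -Tsvg and -o options are standard GraphViz features, and using overlap to switch between the two neato menu options is an assumption.

```python
import subprocess

# GraphViz "dir" values for the three link types named in this tutorial.
DIR_ATTR = {"line": "none", "forward": "forward", "backward": "back"}

def to_dot(names, edges, allow_overlap=True):
    """Build DOT text from {id: name} and a list of (id, id, link_type)."""
    lines = ["digraph G {"]
    # overlap=false asks neato to remove overlaps between vertices.
    lines.append(f'  overlap={"true" if allow_overlap else "false"};')
    for node_id, name in sorted(names.items()):
        lines.append(f'  {node_id} [label="{name}"];')
    for first_id, second_id, kind in edges:
        lines.append(f"  {first_id} -> {second_id} [dir={DIR_ATTR[kind]}];")
    lines.append("}")
    return "\n".join(lines)

def render_svg(dot_path, svg_path, neato="neato"):
    """Run neato on a DOT file; `neato` may be a full path to the executable."""
    subprocess.run([neato, "-Tsvg", dot_path, "-o", svg_path], check=True)

print(to_dot({1: "INTC", 2: "NT"}, [(1, 2, "line")], allow_overlap=False))
```

Calling render_svg requires a working neato installation, which is why the path to neato must be valid in Tools > Preference Editor.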

 

Using GraphExplore

 

Once you have loaded a project, properly configured GraphExplore and specified a valid layout engine, you can start building graphs of interest by querying the graph. Here are the main features of GraphExplore’s work area as well as the functions you can access from the program’s toolbar.

 

 

 

To find relevant information about a particular set of objects, you need to type a query in the Query edit box. The sub-graph associated with the objects consistent with your query is compiled using the current layout engine and shown in the Main Display area. The objects can be searched by their numerical unique identifiers or by their names. GraphExplore first tries to match the unique identifiers, then the names from the Name Column, then the names specified in the Alternate Name Columns. You can use a wildcard (*) if you are not sure about the name of an object. For example, to display all the objects whose names contain an “n”, whose names begin with “s” or whose names are exactly “cagp”, type the following query in the Query box:

 

s*

*n*

cagp
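The effect of this query can be reproduced with a small Python sketch based on the standard fnmatch module. It is only an approximation of GraphExplore's matching: the function name find_targets is hypothetical, both sides are lowercased so that matching ignores case, alternate names are left out, and fnmatch also gives special meaning to ? and [ ], which GraphExplore may not.

```python
from fnmatch import fnmatchcase

# Identifiers and names from the running example (description.txt).
NAMES = {1: "INTC", 2: "NT", 3: "UTSI", 4: "RHAT", 5: "MSFT",
         6: "LU", 7: "AMD", 8: "CAGP", 9: "SOHU", 10: "SINA"}

def find_targets(patterns, names):
    """Return identifiers whose identifier or name matches any pattern."""
    targets = []
    for node_id, name in names.items():
        candidates = (str(node_id), name.lower())
        if any(fnmatchcase(candidate, pattern.lower())
               for pattern in patterns for candidate in candidates):
            targets.append(node_id)
    return targets

print([NAMES[i] for i in find_targets(["s*", "*n*", "cagp"], NAMES)])
```

With the ten objects of the running example, the three query lines select INTC, NT, CAGP, SOHU and SINA.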

 

The query is not case sensitive. If you simply type a “*” in the Query area, you will obtain an image of the entire graph. Once the query is ready, you generate a graph by choosing Graphs > Subgraph from the menu, by typing Ctrl+Enter or by pressing the Subgraph button in the toolbar. You can find out what each button in the toolbar does by positioning the mouse cursor on that button; a tooltip will appear shortly. The result of the previous query is presented below:

 

 

If too many objects are relevant for a particular query, you might experience a long waiting time and might want to stop the current process. To do that you can choose Tools > Stop from the menu or press the red button on the toolbar. You can also choose Tools > Run garbage collector to free up all the memory that might still be allocated to the process that has not completed.

 

The objects that are consistent with the current query are called targets. In the figure above there are five target objects. The targets might appear to be disconnected because all the objects around them were filtered out by the query. Such objects are called linkers and they are displayed to connect the targets. For example, you might want to learn how the objects SINA and SOHU are connected to the other three targets. To display the most relevant linkers that connect SINA and SOHU to the rest of the graph, go to the drop-down box to the right of the Print button in the toolbar and change the “0” to “1”. You will display linkers on paths of length two connecting any two targets. The selected linkers are UTSI, LU and RHAT.

 

 

 

If you go to the same drop-down box and change the linker relevance from “1” to “3”, you display all the linkers on paths of length at most three between any two target nodes.

 

 

You can include in the resulting sub-graph only edges with weight between a minimum and a maximum threshold. In the example below you have chosen to display only edges with weight between 0.2 and 0.4. This essentially creates another graph with fewer connections; this is the graph that is currently being queried.
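The thresholding step amounts to a simple filter on the total weight of each edge. The Python sketch below shows the idea. The function name is hypothetical, treating both thresholds as inclusive is an assumption, and the weights for the edge between objects 1 and 3 are made up so that they sum to the total of 0.3 quoted earlier.

```python
def filter_by_weight(edges, minimum, maximum):
    """Keep edges whose total weight lies between the two thresholds."""
    kept = []
    for first_id, second_id, line_w, forward_w, backward_w in edges:
        total = line_w + forward_w + backward_w
        if minimum <= total <= maximum:  # inclusive bounds (assumption)
            kept.append((first_id, second_id, line_w, forward_w, backward_w))
    return kept

EDGES = [
    (1, 2, 0.3, 0.1, 0.1),    # first line of data.txt, total weight 0.5
    (1, 3, 0.05, 0.05, 0.2),  # made-up split of the total weight 0.3
]
print(filter_by_weight(EDGES, 0.2, 0.4))
```

With thresholds 0.2 and 0.4, the edge between objects 1 and 2 is dropped and the edge between objects 1 and 3 is kept.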

 

 

 

If you right-click in the Main Display area outside any object (vertex or edge), you obtain the following floating menu:

 

 

The Edge Report option has already been discussed. The Node List option lets you display and save in a text file information about some or all the objects in the Main Display area.

 

 

The Shortest Path option lets you identify multiple paths up to a certain maximum length connecting two objects that are disconnected in the current graph. This option is extremely useful if your graph contains many objects and you need to identify objects that link two target nodes. Please note that the length of a certain path is defined as the number of edges on this path. The weight of each edge as well as the type of an edge (line, forward or backward arrow) is not taken into account when the total cost of a path is determined.
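The kind of search behind this option can be illustrated with a depth-first enumeration of simple paths whose length, counted in edges, does not exceed a given maximum. This Python sketch is not GraphExplore's algorithm; the function names and the small example graph are hypothetical, and edges are treated as undirected and unweighted, in line with the definition of path length given above.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map from (id, id) pairs."""
    adjacency = {}
    for first_id, second_id in edges:
        adjacency.setdefault(first_id, set()).add(second_id)
        adjacency.setdefault(second_id, set()).add(first_id)
    return adjacency

def paths_up_to(adjacency, start, goal, max_edges):
    """Return all simple paths from start to goal with at most max_edges edges."""
    found = []

    def extend(path):
        node = path[-1]
        if node == goal:
            found.append(list(path))
            return
        if len(path) - 1 == max_edges:
            return  # the path already uses the maximum number of edges
        for neighbor in sorted(adjacency.get(node, ())):
            if neighbor not in path:
                path.append(neighbor)
                extend(path)
                path.pop()

    extend([start])
    return sorted(found, key=len)  # shortest paths first

# Hypothetical graph, not taken from data.txt.
ADJACENCY = build_adjacency([(1, 2), (2, 3), (1, 3), (3, 4)])
print(paths_up_to(ADJACENCY, 1, 4, 3))
```

In this small graph, allowing three edges finds both the path through object 3 alone and the longer path through objects 2 and 3.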

 

 

The Neighbor Graph option lets you query the graph to identify objects that are closely connected with the objects that are currently in the Main Display area. Assume you would like to learn about the objects in the graph around CAGP. You type CAGP in the Query box, select Graphs > Subgraph from the menu, bring up the floating menu by right-clicking in the white background of the Main Display, then select Neighbor Graph from that menu. You will bring up the following dialog box:

 

 

Say you want to display objects that are on paths of length at most two from CAGP. Then type 2 in the edit box above and click OK. You will obtain the following sub-graph:
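In code terms, a neighbor query of this kind is a breadth-first search that stops at a given distance. The Python sketch below illustrates the idea on a made-up graph; the function name, the node labels and the adjacency map are hypothetical and do not come from data.txt.

```python
from collections import deque

def neighborhood(adjacency, start, max_edges):
    """Return {node: distance} for nodes within max_edges edges of start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_edges:
            continue  # do not expand beyond the requested distance
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance

# Hypothetical undirected graph around a start node "A".
ADJACENCY = {"A": {"B"}, "B": {"A", "C", "D"}, "C": {"B"},
             "D": {"B", "E"}, "E": {"D"}}
print(neighborhood(ADJACENCY, "A", 2))
```

Node E is three edges away from A, so it is left out when the maximum distance is 2.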

 

 

 

Working with Clusters

 

A very important and attractive feature of GraphExplore is that it lets you choose specific colors and shapes for each object in the graph. You can access these functions from the menu by going to Options > Node > Color or Options > Node > Shape. Moreover, GraphExplore lets you assign the same shapes and/or colors to groups/clusters of objects. These groups can be different than the clusters you loaded with a new project. Therefore objects with the same shape can identify one clustering, objects with the same color can identify another clustering and both of these clusterings can be different than the clusters loaded with the project. This degree of flexibility is necessary to create meaningful displays of objects having different functions.

 

You can begin by typing “*” in the Query box and selecting Graphs > Subgraph to create a display of the entire graph. Note that all the objects are targets and have the same default color (light blue).

 

 

Bring up the colors dialog box by going to Options > Node > Color. You can either edit the default colors assigned to targets and linkers, or specify a custom color format.

 

 

If you click on the drop-down list under the “Customized Node Coloring” field, you can choose from the following options:

 

 

A color file has a two-column format: the first column represents the unique identifier of each object, while the second column designates the corresponding color.

Your new color choices will become visible once you re-construct the graph. If you bring up the shapes dialog box (Options > Node > Shape), you can change the default shapes associated with the target and linker objects or specify custom shapes that identify a clustering of your choice or the default clustering you loaded with the project. This way you can effectively work with and graphically represent three different clusterings at the same time.
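The idea of showing one clustering through colors and another through shapes can be sketched as follows. This Python fragment is only an illustration: the function name, the color and shape names and the second clustering are hypothetical, while the first clustering repeats the cluster memberships given in the Input Files section.

```python
def style_by_cluster(cluster_of, choices):
    """Give every object in the same cluster the same style value."""
    cluster_ids = sorted(set(cluster_of.values()))
    # Reuse the choices cyclically if there are more clusters than choices.
    style_of_cluster = {cluster_id: choices[i % len(choices)]
                        for i, cluster_id in enumerate(cluster_ids)}
    return {node: style_of_cluster[cluster_id]
            for node, cluster_id in cluster_of.items()}

# Clustering loaded with the project (from the Input Files section).
PROJECT = {1: "1", 7: "1", 2: "2", 3: "2", 6: "2",
           4: "3", 5: "3", 8: "4", 9: "4", 10: "4"}
# A second, made-up clustering used for the shapes.
OTHER = {n: ("odd" if n % 2 else "even") for n in PROJECT}

colors = style_by_cluster(PROJECT, ["lightblue", "orange", "green", "pink"])
shapes = style_by_cluster(OTHER, ["circle", "box"])
print(colors[1], shapes[1])
```

Each object ends up with a color from one clustering and a shape from another, which is how two additional clusterings can be shown alongside the one loaded with the project.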

 

Comparing Two Graphs

 

GraphExplore can help you learn the differences and/or common elements between two (sub-)graphs you generated from two graphs of objects. Assume you have used GraphExplore to generate the two (sub-)graphs and have saved them in SVG format. From the menu, select Graphs > Compare Two Graphs to bring up the following dialog box:

 

 

Enter the names of the corresponding SVG files, specify the colors you want to see for the elements of the two graphs and choose the type of operation you want to perform (intersection or union). Click OK, and the resulting union or intersection graph will be generated in a graph panel.
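Conceptually, the comparison is a set operation on the edges of the two graphs. The Python sketch below shows the idea on plain edge lists; it does not read SVG files and is not GraphExplore's implementation. The function name and the example edges are hypothetical, edges are treated as undirected, and self-loops are not handled.

```python
def compare_graphs(edges_a, edges_b, operation):
    """Return {edge: origin} for the union or intersection of two edge lists."""
    first = {frozenset(edge) for edge in edges_a}
    second = {frozenset(edge) for edge in edges_b}
    if operation == "union":
        selected = first | second
    elif operation == "intersection":
        selected = first & second
    else:
        raise ValueError("operation must be 'union' or 'intersection'")
    origin = {}
    for edge in selected:
        key = tuple(sorted(edge))
        if edge in first and edge in second:
            origin[key] = "both"
        elif edge in first:
            origin[key] = "first"
        else:
            origin[key] = "second"
    return origin

# Made-up edge lists for two graphs.
print(compare_graphs([(1, 2), (1, 3)], [(2, 1), (1, 5)], "union"))
```

The origin recorded for each edge is what a display could use to pick the color chosen for the first graph, the second graph or their common elements.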

 

Menu Description

 

At this point you should have a good idea of the main features offered by GraphExplore. The descriptions of all the menu options that follow should give you an idea of the remaining features that might not have been mentioned so far in this Help file.