The user needs to provide up to three input files: a node-names file
with information about the objects, a graph file with
information about the associations among the objects and a cluster file
that identifies groups (clusters) of objects. The node-names and the graph
files are required, while the cluster file is optional.
Here is an example of a valid node-names file called [description.txt]. The first
column represents a unique identifier assigned to each object. This identifier
takes values 1, 2, ... up to the total number of objects, in this case 10. The
second column gives a name associated with each object, while the third and
fourth columns hold descriptions, notes, URLs, etc. associated with these
objects. The columns in this file are delimited by one tab character,
but in general the columns can also be separated by one space [blank],
one comma [,] or one semi-colon [;]. Note that the fourth
column contains information only for object 8. The format of the node-names
file is flexible, and you can customize it when a new
project is opened (see below). The only requirement is the presence of the
unique identifier column. This column does not have to be the first column in
the file.
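As a rough illustration of the layout just described, the following Python sketch reads a tab-delimited node-names file into a dictionary. The function name read_node_names and its parameters are hypothetical and are not part of GraphExplore; the sample rows are taken from the running example.

```python
import csv
import io


def read_node_names(text, delimiter="\t", index_col=1, name_col=2):
    """Return {identifier: name}; column numbers are 1-based (assumed convention)."""
    nodes = {}
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        ident = int(row[index_col - 1])
        # Rows may have fewer columns than others (e.g. no note, no URL).
        name = row[name_col - 1] if len(row) >= name_col else ""
        nodes[ident] = name
    return nodes


sample = "1\tINTC\tIntel Corp\n8\tCAGP\tCAGP\thttp://www.cagp.duke.edu\n9\tSOHU\n"
nodes = read_node_names(sample)
print(nodes)
```

Passing delimiter=" ", "," or ";" covers the other separators mentioned above, provided the names themselves do not contain that character.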
Here
is an example of a graph file describing associations among these ten objects
[data.txt]. The graph file is
required to have two, three or five columns that can be separated by a comma or
by a combination of tabs and/or blanks. Each row represents an association or
link between two objects specified by their unique identifiers (first column in
file description.txt). The
identifiers are given in the first two columns of the graph file. The other
three columns are weights that identify the nature of the association between
the two objects. The third column represents the weight/likelihood of an
undirected link (a line) between the two objects; the fourth column gives the
weight of a forward arrow, while the fifth column gives the weight of a
backward arrow. For example, the first line of data.txt says that there is a
direct link between objects 1 and 2: the weight of a line between 1 and 2 is
0.3 (i.e., 1—2), the weight of a forward arrow between 1 and 2 is 0.1 (i.e.,
1→2) and the weight of a backward arrow between 1 and 2 is 0.1 (i.e., 1←2). If
the graph file has two columns, all the associations between objects are
undirected and have the same weight equal to one. If the graph file has three
columns, the associations between objects are all undirected, but they could
have different weights specified by the third column.
GraphExplore displays the type of link
(i.e., a line, a forward arrow or a backward arrow) having the maximum weight.
For example, GraphExplore displays a line between objects 1 and 2, a backward
arrow between objects 1 and 3, and a forward arrow between objects 1 and 5.
The total weight of a direct association between two objects is defined as the
sum of columns three, four and five. In the running example, the total weight
of the direct link between objects 1 and 2 is 0.5 and the total weight of the
direct link between objects 1 and 3 is 0.3. GraphExplore can display
only the associations with a total weight between a minimum and a maximum
threshold.
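The rules above (two, three or five columns; display the link type with the largest weight; total weight as the sum of the three weights) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, ties between link types are resolved in favor of the line (the text does not say how GraphExplore breaks ties), and the row for objects 1 and 3 uses made-up individual weights that merely agree with the backward arrow and the total of 0.3 quoted above.

```python
import re


def parse_edge(line):
    """Parse one graph-file row into (a, b, line_w, forward_w, backward_w)."""
    fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
    if len(fields) not in (2, 3, 5):
        raise ValueError(f"expected 2, 3 or 5 columns, got {len(fields)}")
    a, b = int(fields[0]), int(fields[1])
    if len(fields) == 2:
        weights = (1.0, 0.0, 0.0)  # undirected link with weight one
    elif len(fields) == 3:
        weights = (float(fields[2]), 0.0, 0.0)  # undirected, weighted
    else:
        weights = tuple(float(f) for f in fields[2:5])
    return (a, b) + weights


def displayed_link(edge):
    """Return the link type with the largest weight (first one wins on ties)."""
    _, _, line_w, forward_w, backward_w = edge
    kinds = [("line", line_w), ("forward", forward_w), ("backward", backward_w)]
    return max(kinds, key=lambda pair: pair[1])[0]


def total_weight(edge):
    """Sum of columns three, four and five."""
    return sum(edge[2:])


def within_thresholds(edge, minimum, maximum):
    return minimum <= total_weight(edge) <= maximum


first = parse_edge("1 2 0.3 0.1 0.1")       # first line of data.txt, as quoted above
second = parse_edge("1 3 0.05 0.05 0.2")    # hypothetical weights for objects 1 and 3
for edge in (first, second):
    print(edge[:2], displayed_link(edge), round(total_weight(edge), 3))
```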
The third input file is optional and defines clusters in the set of objects
[clusters.txt]. The cluster file
is required to have two columns that can be separated by a comma or by a
combination of tabs and/or blanks. Each line specifies the cluster an object
belongs to. The unique identifier of the object is in the first column, while
the unique identifier of a cluster is in the second column. The unique
identifier of a cluster can be a number or a string. In the example, objects 1
and 7 belong to cluster 1; objects 2, 3 and 6
belong to cluster 2; objects 4 and 5 belong to cluster 3,
while objects 8, 9 and 10 are in cluster 4.
GraphExplore can display clusters of objects as vertices with the same color
and/or shape (see below). Three clusterings can be represented simultaneously in
the graphical displays: a clustering given by the color of the vertices, a
clustering given by the shape of the vertices and the clustering specified when
the project is loaded.
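For illustration, here is a minimal Python sketch that groups objects by cluster from a two-column cluster file. The function name read_clusters is hypothetical, and the sample rows are reconstructed from the memberships listed above (the row order in the actual clusters.txt is an assumption). Cluster identifiers are kept as strings because they may be numbers or strings.

```python
import re
from collections import defaultdict


def read_clusters(text):
    """Return {cluster_id: [object ids]} from rows of 'object cluster'."""
    clusters = defaultdict(list)
    for line in text.splitlines():
        fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
        if len(fields) != 2:
            continue  # ignore blank or malformed rows in this sketch
        clusters[fields[1]].append(int(fields[0]))
    return dict(clusters)


sample = "1 1\n2 2\n3 2\n4 3\n5 3\n6 2\n7 1\n8 4\n9 4\n10 4\n"
clusters = read_clusters(sample)
print(clusters)
```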
You start working in GraphExplore
by opening a new project. Go to the File > Project menu and load the
node-names, the graph and the cluster files.
Specify the format of the node-names file by clicking on the Header button. The
following dialog box opens:
Column Separator lets you indicate the
delimiter used to separate columns in the node-names file. If you check the Header
Line String box, you can give a name to each column in the node-names
file. Index Column tells GraphExplore which column gives the unique
identifier of each object. In the running example, this is the first column in
the file [description.txt].
Besides its unique identifier, each object can have one or more names
stored in separate columns. The primary name is given in Name Column,
while the second, third or fourth name of each object can be specified in the
Alternate Name Column controls. In the running example, the objects have only
one name, stored in the second column of the file [description.txt].
You can query the graph by using the unique identifier or one of the names
you defined for each object.
GraphExplore can dynamically retrieve information about the objects in the
graph from the Internet. To do this, you need to check the Primary
Link ID column and specify which column is to be used for the search. This
can be a new column or one of the columns that give names for the objects. You
also need to enter the URL of the site
you want to search in the Template edit box. To define a valid query,
this URL needs to contain a <placeholder> field that will be replaced with the
object names given in the Primary Link ID column. You can specify a second
template to be used in combination with object names from the Secondary Link
ID column. In the running example, the names from the second column in the
file [description.txt] were used to search for information about the
objects using the template: http://finance.yahoo.com/q/ecn?s=<placeholder>
You can provide a particular website that is relevant for each object by checking
the HTTP Link Column field. The valid websites in this column are
accessed instead of searching the Internet using the names and templates from
the Primary/Secondary Link ID Column fields. In the running example, the
fourth column in file [description.txt] contains specific
websites relevant for some of the objects in the graph. In fact, a website
is given only for object 8, whose name is CAGP. When you click on object 8
in a graphical display generated by GraphExplore, this is the website
that will be shown. For the other objects, GraphExplore substitutes
their names for the placeholder found in the template for the Primary Link ID
Column and shows the corresponding URL.
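The lookup order described above (a website from the HTTP Link Column if one is given, otherwise the template with the object name substituted) can be sketched in Python. The function resolve_url is hypothetical, and two details are assumptions of this sketch rather than documented behavior: any non-empty value in the HTTP link column is treated as valid, and the name is URL-encoded before substitution.

```python
from urllib.parse import quote

PLACEHOLDER = "<placeholder>"


def resolve_url(name, template, http_link=""):
    """Prefer the object's own website; otherwise fill in the template."""
    if http_link.strip():
        return http_link.strip()
    # URL-encoding the name is an assumption made for safety in this sketch.
    return template.replace(PLACEHOLDER, quote(name))


template = "http://finance.yahoo.com/q/ecn?s=<placeholder>"
print(resolve_url("INTC", template))
print(resolve_url("CAGP", template, "http://www.cagp.duke.edu"))
```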
GraphExplore saves the format you have specified in the Header
dialog box in the node-names file (description.txt in the
running example):
GraphExplore Header Start
Separator := \t
Index := 1
Name := 2
PrimaryLinkID := 2
PrimaryLinkTemplate := http://finance.yahoo.com/q/ecn?s=<placeholder>
Http := 4
Note := 3
GraphExplore Header End
1	INTC	Intel Corp
2	NT	NorTel
3	UTSI	UTStar Com
4	RHAT	Red Hat
5	MSFT	Microsoft
6	LU	Lucent
7	AMD	Advanced Micro Devices
8	CAGP	CAGP	http://www.cagp.duke.edu
9	SOHU
10	SINA
This way you do not need
to provide the format of the node-names file every time you load the project.
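To show how such a saved header could be read back, here is a small Python sketch that separates the settings from the data rows. It assumes one "Key := value" setting per line between the "GraphExplore Header Start" and "GraphExplore Header End" markers; the function split_header is hypothetical and is not GraphExplore's own parser.

```python
def split_header(lines):
    """Return (settings, data_rows) from the lines of a node-names file."""
    settings, data, in_header = {}, [], False
    for line in lines:
        stripped = line.strip()
        if stripped == "GraphExplore Header Start":
            in_header = True
        elif stripped == "GraphExplore Header End":
            in_header = False
        elif in_header:
            if ":=" in stripped:
                key, _, value = stripped.partition(":=")
                settings[key.strip()] = value.strip()
        elif stripped:
            data.append(line)
    return settings, data


sample = """GraphExplore Header Start
Separator := \\t
Index := 1
Name := 2
PrimaryLinkID := 2
PrimaryLinkTemplate := http://finance.yahoo.com/q/ecn?s=<placeholder>
Http := 4
Note := 3
GraphExplore Header End
1\tINTC\tIntel Corp
8\tCAGP\tCAGP\thttp://www.cagp.duke.edu
"""
settings, data = split_header(sample.splitlines())
# The header stores the tab as the two characters "\t"; map it to a real tab.
raw = settings["Separator"]
separator = {"\\t": "\t"}.get(raw, raw)
print(settings["Index"], settings["Name"], [row.split(separator) for row in data])
```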
Before you start using
GraphExplore, make sure the program is properly installed on your
system. Go to Tools > Preference Editor to
bring up the following dialog box:
In order to function properly, GraphExplore uses web browsers (e.g., Internet
Explorer, Netscape) and external programs that generate graphical layouts (e.g.,
neato from AT&T’s GraphViz).
In addition, GraphExplore creates temporary files in which it stores
information used by the external programs. The paths that point to the web
browsers, graphical layout programs and temporary files have to be valid paths.
Note that neato is only one of the graphical layout programs that
are called. If you do not wish to use neato or if you do not have
neato installed in your system, leave the corresponding path
field blank.
GraphExplore creates the image of a (sub)graph by employing the
neato utility from AT&T’s GraphViz
open source graph drawing software or in-house modified versions of the layout
libraries available from the Java Universal
Network/Graph Framework (JUNG). GraphExplore saves the graph
in a temporary text file using GraphViz’s DOT format, calls an external library
to generate an image file in SVG format, then shows that image file in the Graph
area of the current display panel (see below). The libraries from JUNG
are included in GraphExplore’s distribution package and do not need to be
separately installed on your computer. Since they are written in Java, they
should run on almost any platform. Unfortunately, the installation
of neato is more problematic and the version that comes with
GraphExplore might not work. If this is the case, you might want to re-install neato
directly from AT&T’s
GraphViz, then tell GraphExplore where neato is located by
going to Tools > Preference Editor. However, GraphExplore
functions properly without neato by employing the
JUNG-based layouts. The original JUNG libraries were modified by Quanli Wang
(quanli@stat.duke.edu) to fit the
purpose of the application. Due to these changes, the original JUNG libraries
(e.g., Jakarta, Colt) are no longer part of GraphExplore’s distribution package.
You can see which layout engine is currently used by GraphExplore to
generate graphical displays by selecting Layout from the menu:
You can select another
engine by checking the corresponding box in the menu. If the first option is
selected, GraphExplore will use neato while allowing the
vertices in the graph to overlap. The second option specifies neato
too, but this time no overlap between the vertices in the resulting graph is
allowed. The other five options that follow are different JUNG layouts. You
should try several engines to find the best format for a particular graphical
structure. After selecting a new layout engine, do not forget to go to Graphs > Subgraph to actually create the new graphical display.
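To give a feel for the DOT-to-SVG step used with neato, the Python sketch below writes a tiny graph in DOT format and builds a neato command line with and without vertex overlap. It is an illustration, not GraphExplore's actual code: the function names are hypothetical, the three edges are the ones named earlier in the running example, and the exact attributes GraphExplore writes are not documented here. The -Tsvg, -o and -Goverlap=false options are standard GraphViz command-line options.

```python
def to_dot(edges, names):
    """Build DOT text; each edge is (a, b, kind) with kind line/forward/backward."""
    lines = ["digraph G {"]
    for ident, name in sorted(names.items()):
        lines.append(f'  {ident} [label="{name}"];')
    for a, b, kind in edges:
        if kind == "line":
            lines.append(f"  {a} -> {b} [dir=none];")  # undirected link
        elif kind == "forward":
            lines.append(f"  {a} -> {b};")
        else:  # backward arrow: points from b to a
            lines.append(f"  {b} -> {a};")
    lines.append("}")
    return "\n".join(lines)


def neato_command(dot_path, svg_path, allow_overlap=True):
    """Command line for neato; overlap=false asks it to separate the vertices."""
    cmd = ["neato", "-Tsvg"]
    if not allow_overlap:
        cmd.append("-Goverlap=false")
    return cmd + [dot_path, "-o", svg_path]


names = {1: "INTC", 2: "NT", 3: "UTSI", 5: "MSFT"}
edges = [(1, 2, "line"), (1, 3, "backward"), (1, 5, "forward")]
dot = to_dot(edges, names)
print(dot)
print(neato_command("graph.dot", "graph.svg", allow_overlap=False))
```

If neato is installed, the DOT text can be saved to graph.dot and the command passed to subprocess.run(cmd, check=True) to produce the SVG file.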
Once you have loaded a project, properly configured GraphExplore and
specified a valid layout engine, you can start building graphs of interest by
querying the graph. Here are the main features of GraphExplore’s
work area as well as the functions you can access from the program’s toolbar.
To find out relevant information about a particular set of objects, type a
query in the Query edit box. The sub-graph associated with the
objects consistent with your query is compiled using the current layout engine
and shown in the Main display area. The objects can be searched by their
numerical unique identifiers or by their names. GraphExplore first tries to
match the unique identifiers, then the names from the Name Column, then
the names specified in the Alternate Name Columns. You can use a
wildcard (*) if you are not sure about the name of an object. For example, to
display all the objects whose names contain an “n”, whose names begin with “s”
or whose names are exactly “cagp”, type the following query in the Query
box:
s*
*n*
cagp
The query is not case
sensitive. If you simply type a “*” in the Query area, you will obtain
an image with the entire graph. Once the query is ready, you generate a graph
by choosing Graphs > Subgraph from
the menu, by typing Ctrl+Enter or by pressing
the Subgraph button in the toolbar. You can find out what each button in
the toolbar does by positioning the mouse cursor on that button; a tooltip
appears after a moment. The result of the previous query is presented below:
If too many objects are relevant for a particular query, you might experience a
long waiting time and might want to stop the current process. To do that,
choose Tools > Stop from the
menu or press the red button on the toolbar. You can also choose Tools > Run
garbage collector to free up any
memory that might still be allocated to the process that has not completed.
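The case-insensitive wildcard matching used in the Query box can be approximated in Python with the standard fnmatch module. This is a simplified sketch: match_query is a hypothetical helper, it checks only identifiers and primary names (not the alternate name columns), and fnmatch also interprets "?" and "[...]" specially, which GraphExplore may not do.

```python
from fnmatch import fnmatchcase


def match_query(patterns, nodes):
    """Return identifiers whose id or lower-cased name matches any pattern."""
    targets = []
    for ident, name in sorted(nodes.items()):
        for pattern in patterns:
            p = pattern.strip().lower()
            if p == str(ident) or fnmatchcase(name.lower(), p):
                targets.append(ident)
                break
    return targets


nodes = {1: "INTC", 2: "NT", 3: "UTSI", 4: "RHAT", 5: "MSFT",
         6: "LU", 7: "AMD", 8: "CAGP", 9: "SOHU", 10: "SINA"}
targets = match_query(["s*", "*n*", "cagp"], nodes)
print([nodes[t] for t in targets])
```

With the names of the running example, the three query lines s*, *n* and cagp select INTC, NT, CAGP, SOHU and SINA.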
The objects that are consistent with the current query are called targets.
In the figure above there are five target objects. The targets might appear to
be disconnected because all the objects around them were filtered out by the
query. Such filtered-out objects are called linkers when they are displayed to
connect the targets. For example, you might want to learn how the objects SINA
and SOHU are connected to the other three targets. To display the most relevant
linkers that connect SINA and SOHU to the rest of the graph, go to the
drop-down box to the right of the Print button in the toolbar and change
the “0” to “1”. This displays the linkers on paths of length two connecting any
two targets. The selected linkers are UTSI, LU and RHAT.
If you go to the same drop-down
box and change the linker relevance from “1” to “3”, you display all the
linkers on paths of length at most three between any two target nodes.
You can include in the
resulting sub-graph only edges with weight between a minimum and a maximum
threshold. In the example below you have chosen to display only edges with
weight between 0.2 and 0.4. This essentially creates another graph with fewer
connections; this is the graph that is currently being queried.
If you right-click in the Main Display area outside
any object (vertex or edge), you obtain the following floating menu:
The Edge Report option has already been
discussed. The Node List option lets you
display and save in a text file information about some or all the objects in
the Main Display area.
The Shortest Path option lets you identify
multiple paths up to a certain maximum length connecting two objects that are
disconnected in the current graph. This option is extremely useful if your
graph contains many objects and you need to identify objects that link two
target nodes. Please note that the length of a path is defined as the
number of edges on this path. Neither the weight nor the type of an
edge (line, forward or backward arrow) is taken into account when the total
cost of a path is determined.
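The notion of path length used here (number of edges, ignoring weights and edge types) can be illustrated with a short Python sketch that enumerates all simple paths between two objects up to a maximum length. The functions undirected and paths_up_to are hypothetical, the toy edge list is made up for the example, and the search strategy GraphExplore actually uses is not documented here.

```python
def undirected(edges):
    """Build an adjacency map that ignores edge direction and weight."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def paths_up_to(adjacency, start, goal, max_len):
    """Return all simple paths from start to goal with at most max_len edges."""
    found = []

    def extend(path):
        node = path[-1]
        if node == goal:
            found.append(list(path))
            return
        if len(path) - 1 == max_len:
            return  # already used the maximum number of edges
        for nxt in sorted(adjacency.get(node, ())):
            if nxt not in path:
                path.append(nxt)
                extend(path)
                path.pop()

    extend([start])
    return sorted(found, key=len)  # shortest paths first


toy_edges = [(1, 2), (2, 3), (3, 4), (1, 4), (4, 5)]  # made-up graph
adjacency = undirected(toy_edges)
paths = paths_up_to(adjacency, 1, 3, max_len=2)
print(paths)
```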
The Neighbor Graph option lets you query the
graph to identify objects that are closely connected with the objects that
are currently in the Main Display area. Assume you would like to learn
about the objects in the graph around CAGP. Type CAGP in the Query box, select
Graphs > Subgraph from the menu, bring up the floating menu by
right-clicking in the white background of the Main Display, then select
Neighbor Graph from that menu. This brings up the following
dialog box:
Say you want to display objects that are on paths of length at most two from
CAGP. Type 2 in the edit box above and click OK.
You will obtain the following sub-graph:
An important feature of GraphExplore is that it lets
you choose specific colors and shapes for each object in the graph. You can
access these functions from the menu by going to Options > Node >
Color or Options > Node >
Shape. Moreover, GraphExplore
lets you assign the same shapes and/or colors to groups/clusters of objects.
These groups can be different from the clusters you loaded with a new project.
Therefore objects with the same shape can identify one clustering, objects with
the same color can identify another clustering and both of these clusterings
can be different from the clusters loaded with the project. This degree of
flexibility is necessary to create meaningful displays of objects having
different functions.
You can begin by typing “*” in the Query
box and selecting Graphs > Subgraph to
create a display of the entire graph. Note that all the objects are targets
and have the same default color (light blue).
Bring up the colors dialog
box by going to Options > Node > Color.
You can either edit the default colors assigned to targets and linkers, or
specify a custom color format.
If
you click on the drop-down list under the “Customized Node Coloring”
field, you can choose from the following options:
A color file has a two-column
format: the first column represents the unique identifier of each object, while
the second column designates the corresponding colors.
Your new color choices
will become visible once you re-construct the graph. If you bring up the shapes
dialog box (Options > Node > Shape),
you can change the default shapes associated with the target and linker objects
or specify custom shapes that identify a clustering of your choice or the
default clustering you loaded with the project. This way you can effectively
work with and graphically represent three different clusterings at the same time.
GraphExplore can help you
learn the differences and/or common elements between two (sub-)graphs you
generated from two graphs of objects. Assume you
have used GraphExplore to generate the two (sub-)graphs and have saved
them in SVG format. From the menu,
select Graphs > Compare Two Graphs to bring
up the following dialog box:
Input the names of the
corresponding SVG files, specify the colors you want to see for the elements of
the two graphs as well as the type of operation you want to perform
(intersection or union), click OK and the resulting union/intersection graph
will be generated in a graph panel.
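Conceptually, the union and intersection operations act on the sets of edges of the two graphs. The Python sketch below illustrates this on plain edge lists; it does not read SVG files, the function compare is hypothetical, and treating every edge as undirected is a simplifying assumption of the sketch.

```python
def normalise(edges):
    """Treat each edge as an unordered pair so (2, 3) equals (3, 2)."""
    return {frozenset(edge) for edge in edges}


def compare(edges_a, edges_b, operation):
    """Return the sorted union or intersection of two undirected edge lists."""
    a, b = normalise(edges_a), normalise(edges_b)
    if operation == "union":
        result = a | b
    elif operation == "intersection":
        result = a & b
    else:
        raise ValueError("operation must be 'union' or 'intersection'")
    return sorted(tuple(sorted(edge)) for edge in result)


graph_a = [(1, 2), (2, 3)]   # made-up edges of the first graph
graph_b = [(3, 2), (4, 5)]   # made-up edges of the second graph
common = compare(graph_a, graph_b, "intersection")
combined = compare(graph_a, graph_b, "union")
print(common, combined)
```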
At this point you should have
a good idea about the main features offered by GraphExplore. The
descriptions of all the menu options that follow should give you an idea about
the remaining features that might not have been mentioned so far in this Help
file.