graph1 graph2


Background

For this homework we are going to explore the world of data structures and algorithms. This is a huge subject and we will only be able to examine a tiny subset of a tiny subset of this area. Specifically, we will focus on the mathematical concept of a graph - a collection of nodes (vertices) some subset of which are connected to one another by edges. For this homework we will specify a simple data structure in R to represent these graphs and you will be responsible for implementing a number of common and useful algorithms for working with this kind of data.


Graphs

  • A graph G = (V,E) is simply a set of vertices V, and a set of edges E between those vertices. It may be more convenient to think about a graph as being a network.

  • Edges in graphs can be directed or undirected, the difference being whether the relationship is mutual or one-sided.

  • Edges can be weighted, the value of the weight represents some quantitative measure. Most often these weights are thought of as being the distance between the connected vertices.

  • A path is a sequence of edges that connects two vertices. There may be many paths, or no paths, between two vertices in a graph.


Data Structure

We will simplify things somewhat by using only a single data structure for the most general case - a labeled and weighted directed graph. We will represent this as a list of lists in R. The primary list will consist of the graphs vertices which are identified by a unique label. These labels should be stored in the primary list’s names attribute - a graph object without these labels is considered invalid.

The secondary lists stored in primary list contain the properties of each vertex, specifically an integer vector named edges which stores the indices of vertices connected to the current vertex. Since our data structure is for a directed graph, we will assume that all of these connects are from the current vertex to the listed vertices. It is allowed to have edges that connect a vertex back to itself. The second element for each secondary list is a numeric vector named weights that contains the weights of each connection listed in the edges vector. We again assume that all edges are weighted, if a weight is not specified it should be assumed to be 1 and that all weights should be strictly greater than 0.

Below are example representations of the two graphs at the top of this document:

graph1 = list(A = list(edges   = c(2L),
                       weights = c(1 )),
              B = list(edges   = c(3L),
                       weights = c(1 )),
              C = list(edges   = c(5L),
                       weights = c(1 )),
              D = list(edges   = c(2L),
                       weights = c(1 )),
              E = list(edges   = c(4L,6L),
                       weights = c(1,1  )),
              F = list(edges   = c(),
                       weights = c())
             )
str(graph1)
## List of 6
##  $ A:List of 2
##   ..$ edges  : int 2
##   ..$ weights: num 1
##  $ B:List of 2
##   ..$ edges  : int 3
##   ..$ weights: num 1
##  $ C:List of 2
##   ..$ edges  : int 5
##   ..$ weights: num 1
##  $ D:List of 2
##   ..$ edges  : int 2
##   ..$ weights: num 1
##  $ E:List of 2
##   ..$ edges  : int [1:2] 4 6
##   ..$ weights: num [1:2] 1 1
##  $ F:List of 2
##   ..$ edges  : NULL
##   ..$ weights: NULL
graph2 = list(A = list(edges   = c(2L),
                       weights = c(14)),
              B = list(edges   = c(3L,4L),
                       weights = c(23,13)),
              D = list(edges   = c(1L),
                       weights = c(5) ),
              F = list(edges   = c(1L,5L),
                       weights = c(43,33)),
              N = list(edges   = c(1L,2L,4L),
                       weights = c(33,22,11))
             )
str(graph2)
## List of 5
##  $ A:List of 2
##   ..$ edges  : int 2
##   ..$ weights: num 14
##  $ B:List of 2
##   ..$ edges  : int [1:2] 3 4
##   ..$ weights: num [1:2] 23 13
##  $ D:List of 2
##   ..$ edges  : int 1
##   ..$ weights: num 5
##  $ F:List of 2
##   ..$ edges  : int [1:2] 1 5
##   ..$ weights: num [1:2] 43 33
##  $ N:List of 2
##   ..$ edges  : int [1:3] 1 2 4
##   ..$ weights: num [1:3] 33 22 11

Requirements

Functions

Below are a list of functions that you will need to implement for this assignment. For each function you will be given a name, the input arguments, the output type, and a description of the required behavior of the function. Your implementation of these functions may only use R’s base functionality, meaning you are not allowed to directly use any additional packages in your implementation. However, you may use other packages for testing purposes.

Be aware that there are we established algortihms to solve most of the problems listed below - don’t try to reinvent the wheel use what is known to work. Your task is translate the existing algorithms to our specific data structure and R.

Extra Credit

  • Function - min_span_tree

    • Input - g, a graph object.

    • Output - a graph object (undirected) containing the minimum spanning tree

    • Description - A tree is an undirected graph in which any two vertices are connected by exactly one path (no simple cycles). Therefore, a minimum spanning tree is the tree that connects all vertices in a graph with the shortest possible total of edges, using the existing edges. If given a directed graph return an error. Note that there may not be a unique solution for any given graph, you are only required to return one tree.

  • Function - plot_graph

    • Input - g, a graph object

    • Output - an R plot showing all vertices (labeled) and edges.

    • Description - This function should be able to take any graph object and produce a reasonably attractive visual representation of that graph. Your algorithm should make use edge weights to layout the distance between vertices.


Testing

As this and next week progresses I will be adding tests (using the testthat package) for these functions to every team’s repository. We will be using the wercker continuous integration tool, so that you will be given real time feedback on your progress via github. This process and these tools will be discussed in more detail in class in the coming weeks.

Work Product

For this assignment you will need to create the following files:

Submission and Grading

This homework is due by midnight Saturday, September 26th at 2 pm. You are to complete the assignment as a group and to keep everything (code, write ups, etc.) on your team’s hw2 github repository (commit early and often). All team members are expected to contribute equally to the completion of this assignment and group assessments will be given at its completion - anyone judged to not have sufficient contributed to the final product will have their grade penalized. While different teams members may have different coding backgrounds and abilities, it is the responsibility of every team member to understand how and why all code in the assignment works.