graph1 graph2


Background

For this homework we are going to explore the world of data structures and algorithms. This is a huge subject and we will only be able to examine a tiny subset of a tiny subset of this area. Specifically, we will focus on the mathematical concept of a graph - a collection of nodes (vertices) some subset of which are connected to one another by edges. For this homework we will specify a simple data structure in R to represent these graphs and you will be responsible for implementing a number of common and useful algorithms for working with this kind of data.


Graphs

  • A graph G = (V,E) is simply a set of vertices V, and a set of edges E between those vertices. It may be more convenient to think about a graph as being a network.

  • Edges in graphs can be directed or undirected, the difference being whether the relationship is mutual or one-sided.

  • Edges can be weighted, the value of the weight represents some quantitative measure. Most often these weights are thought of as being the distance between the connected vertices.

  • A path is a sequence of edges that connects two vertices. There may be many paths, or no paths, between two vertices in a graph.


Data Structure

We will simplify things somewhat by using only a single data structure for the most general case - a labeled and weighted directed graph. We will represent this as a list of lists in R. The primary list will consist of the graphs vertices which are identified by a unique label. These labels should be stored in the primary list’s names attribute - a graph object without these labels is considered invalid.

The secondary lists, stored in primary list, contain the properties of each vertex, specifically an integer vector named edges which stores the indices of vertices connected to the current vertex. Since our data structure is for a directed graph, we will assume that all of these connections are from the current vertex to the listed vertices. It possible (and allowed) to have a edge that connects a vertex back to itself. The second element for each secondary list is a numeric vector named weights that contains the weights of each connection listed in the edges vector. We again assume that all edges are weighted, if a weight is not specified it should be assumed to be 1 and that all weights should be strictly greater than 0. The ordering of the edges and weights in this list does not matter - it is allowed to list edges then weights, or weights then edges.

Below are example representations of the two graphs at the top of this document:

graph1 = list(A = list(edges   = c(2L),
                       weights = c(1 )),
              B = list(edges   = c(3L),
                       weights = c(1 )),
              C = list(edges   = c(5L),
                       weights = c(1 )),
              D = list(edges   = c(2L),
                       weights = c(1 )),
              E = list(edges   = c(4L,6L),
                       weights = c(1,1  )),
              F = list(edges   = c(),
                       weights = c())
             )
str(graph1)
## List of 6
##  $ A:List of 2
##   ..$ edges  : int 2
##   ..$ weights: num 1
##  $ B:List of 2
##   ..$ edges  : int 3
##   ..$ weights: num 1
##  $ C:List of 2
##   ..$ edges  : int 5
##   ..$ weights: num 1
##  $ D:List of 2
##   ..$ edges  : int 2
##   ..$ weights: num 1
##  $ E:List of 2
##   ..$ edges  : int [1:2] 4 6
##   ..$ weights: num [1:2] 1 1
##  $ F:List of 2
##   ..$ edges  : NULL
##   ..$ weights: NULL
graph2 = list(A = list(edges   = c(2L),
                       weights = c(14)),
              B = list(edges   = c(3L,4L),
                       weights = c(23,13)),
              D = list(edges   = c(1L),
                       weights = c(5) ),
              F = list(edges   = c(1L,5L),
                       weights = c(43,33)),
              N = list(edges   = c(1L,2L,4L),
                       weights = c(33,22,11))
             )
str(graph2)
## List of 5
##  $ A:List of 2
##   ..$ edges  : int 2
##   ..$ weights: num 14
##  $ B:List of 2
##   ..$ edges  : int [1:2] 3 4
##   ..$ weights: num [1:2] 23 13
##  $ D:List of 2
##   ..$ edges  : int 1
##   ..$ weights: num 5
##  $ F:List of 2
##   ..$ edges  : int [1:2] 1 5
##   ..$ weights: num [1:2] 43 33
##  $ N:List of 2
##   ..$ edges  : int [1:3] 1 2 4
##   ..$ weights: num [1:3] 33 22 11

Requirements

Functions

Below are a list of functions that you will need to implement for this assignment. For each function you will be given a name, the input arguments, the output type, and a description of the required behavior of the function. Your implementation of these functions should only use R’s base functionality, meaning you are not allowed to directly use any additional packages in your implementation. However, you may use other packages for testing purposes.

Be aware that there are well established algortihms to solve all of the problems listed below - don’t try to reinvent the wheel use what is known to work. Your task is translate the existing algorithms to our specific data structure and R.

Extra Credit

  • Function - min_span_tree

    • Input - g, a graph object.

    • Output - a graph object (undirected) containing the minimum spanning tree, throw an error if g is not a valid undirected graph

    • Description - A tree is an undirected graph in which any two vertices are connected by exactly one path (no simple cycles). Therefore, a minimum spanning tree is the tree that connects all vertices in a graph with the shortest possible total of edges, using the existing edges. If given a directed graph return an error. Note that there may not be a unique solution for any given graph, you are only required to return one tree.

  • Function - plot_graph

    • Input - g, a graph object

    • Output - an R plot showing all vertices (labeled) and edges.

    • Description - This function should be able to take any graph object and produce a reasonably attractive visual representation of that graph. Attractive here means that there is a minimum of overlap of edges and vertices in your plot. Your algorithm should make use edge weights to layout the distance between vertices.


Testing

As this and next week progresses I will be adding tests (using the testthat package) for these functions to every team’s repository. We will be using the wercker continuous integration tool to give you real time feedback on your progress via github. This process and these tools will be discussed in more detail in class in the coming weeks.

Work Product

For this assignment you will need to create the following files:

Note that code efficiency, quality, and formating will be a graded component of this assignment. If you do not know what good R code should look like please read Google’s and/or Hadley Wickham’s R style guides. Myself and the TAs will give feedback as the assignment progresses.

Submission and Grading

This homework is due by 11 pm Monday, February 15th. You are to complete the assignment as a group and to keep everything (code, write ups, etc.) on your team’s hw2 github repository (commit early and often). All team members are expected to contribute equally to the completion of this assignment and group assessments will be given at its completion - anyone judged to not have sufficient contributed to the final product will have their grade penalized. While different teams members may have different coding backgrounds and abilities, it is the responsibility of every team member to understand how and why all code in the assignment works.