Testing DiGraph (#73)

by Robert Feldt

In this week's Ruby Quiz you will not only have fun and (hopefully) learn something; you will also contribute to a research project evaluating automated testing techniques. So please read on and then take the quiz!

The goal of this quiz is to write a good and extensive test suite for a Ruby DiGraph (directed graph) class. The new (hotshot, and annoying ;)) quality manager at your work has challenged all the developers. He is planning major cutbacks since he claims that automated testing tools can do as good or better a job! The focus of the testing is on the following two methods of the DiGraph class:

# Return the length of the longest simple path (an arc/edge can only
# occur once in the path) that includes <node>.

# Returns the strongly connected component (itself a DirectedGraph)
# that includes <node>.

Any Ruby object can be a node in a graph and you create graphs by giving a number of edges. Each edge is an Array with maximum two nodes where the first node is the source node.

The quiz has two phases: first a black-box phase and then a white-box phase. In the black-box phase you do not have access to the source code but do your testing over the network via drb. When you are satisfied with your tests for this phase you submit them, get the source code and start the white-box phase. Now you can extend your test suite given the actual code, fix problems and even refactor the code as you see fit (as long as you do not change the interface).

You need to download the file "rubyquiz73.rb" to participate in the distributed testing:


After downloading and saving that file, here is how you get a reference to the class under test (CUT):

require 'test/unit'
require 'rubyquiz73'

# You must insert your email address as <youremail> in this method call!
DiGraph = RubyQuiz73.class_under_test("<youremail>")

class TestDiGraph < Test::Unit::TestCase
def test_01_digraph_creation
dg1 = DiGraph.new
assert_kind_of(DiGraph, dg1)
assert_equal(0, dg1.size)

def test_02_size
dg2 = DiGraph.new([1,2], [2,3])
assert_equal(3, dg2.size)
assert_equal(2, dg2.num_edges)

# Add/write your own tests here...

Note that since we use drb for the distributed testing and we had to make some performance trade-offs not every assertion or code might work exactly as it would if run locally. However, most "normal" things will work.

When you consider yourself ready with the blackbox phase of the testing you should submit your test suite. You do this by issuing the command:

ruby rubyquiz73.rb submit1 <test_filename.rb>

and giving the path to your test file. The script will get back the source code for the class being tested and save it. You can now look at the source code and add tests as you see fit. You can also alter and refactor the source code as long as you do not change the interface. When you are done with this, whitebox phase, you submit your test file and source file like so (you can add additional files if you have split your tests over several files):

ruby rubyquiz73.rb submit2 <test_filename.rb> <src_filename.rb>

[Editor's Note: Please also send in your tests to Ruby Talk, after the spoiler period, for use in the summary. --JEG2]

You can also get some help information by issuing:

ruby rubyquiz73.rb help

You are encouraged to briefly document (in comments or by other means) how and why you arrived at and included the test cases you've chosen and why you think your tests are thorough. You are also encouraged to add tests for additional methods of the DiGraph class as you see fit. Note that the devious Quality Manager has not eliminated all problems with the given code so you are expected to find problems/faults!

If the specifications in the RDoc comments above are not complete enough then please make additional, sound assumptions and make them clear in your tests / documentation.

Digraph Definition

strongly connected component

Quiz Summary

The first step to solving this quiz is to come up with some graphs that you can test. I started by thinking of some trivial patterns I could easily hand figure the answers for. A good example is the graph:

A -> B -> C -> D

Finding the longest simple path for that is not hard, it is always all of the edges. That's just a straight line basically and it will include all the nodes. Building these kinds of graphs programatically for any size is also easy:

>> Array.new(3) { |i| [(?A + i).chr, (?A + i + 1).chr] }
=> [["A", "B"], ["B", "C"], ["C", "D"]]

Himadri ChoudHury took a different approach, generating completely random graphs. This has the advantage of testing far more cases including isolated nodes and circular paths. Here's the generation code:

# @dg is the generated DiGraph. Store the paths in a hash called @paths
# @paths is a hash. Each element of the hash is an array that contains
# all the paths from that node
# @nodes is an array which contains a list of all the nodes

# Randomly generate @dg and the corresponding @paths
def generate_dg
nodes = Array.new
# 10 nodes total
10.times do
nodes << rand(10)
@paths = Hash.new
nodes.each do |n|
num_paths_from_each_node = rand(3) + 1
next_nodes = Array.new
num_paths_from_each_node.times do
next_nodes << nodes[rand(nodes.length)]
@paths[n] = next_nodes
arr = Array.new
@paths.each do |key,vals|
@paths[key].each do |val|
arr << [key,val]
@dg = DiGraph.new(*arr)
@nodes = @paths.keys

(Note: those bare instance variables nixed in with the comments reference variables on the Class object and not the instances themselves. They read and discard nil values having no effect.)

The code begins by generating up to ten (uniq!() can reduce the count) random nodes as simple Integers. After that, each node is randomly connected to between one and three other nodes. A Hash is created during this process to represent the connections. This turns out to be very helpful later on. Finally, the paths are converted into the Array representation DiGraph expects and the graph is created. The last line also stores the nodes for easy access.

Here's a random graph (viewed from the paths Hash) created from this code, so you can see how they come out:

{ 5 => [8, 6],
0 => [0, 6],
6 => [1, 7, 6],
1 => [8],
7 => [7, 9, 4],
8 => [6, 1],
9 => [4, 8, 0],
4 => [5] }

Now, in order to use these random graphs, we really need code that can tell us the right answer for the data. Essentially, this means that we require our own implementation of the test methods, to prove that we get the same answers as the DiGraph implementation. Here is one of those methods, used to test max_length_of_simple_path_including_node():

# Depth first search for the longest simple path starting from 'node'
# Simple path means a path that doesn't contain any duplicate edges
# Note: I'm not using the definition of simply connected based on no
# duplicate nodes
def search(node)
longest_path = 0
if (@paths[node])
@paths[node].each_index do |next_idx|
next_node = @paths[node][next_idx]
tmp = 1 + search(next_node)
if (longest_path < tmp)
longest_path = tmp
return longest_path

This method is a simple depth-first search, as the comment says, counting the steps taken to reach the farthest node without crossing a single edge twice. The implementation is just a recursive search from each node reachable from the start node. At each step, the last edge crossed is removed from the paths Hash, the recursive search is performed, and then the edge is restored. The returned result here is just a count, not the actual path.

The tests using that are pretty trivial:

def test_03_max_length_of_simple_path_including_node
@nodes.each do |node|
longest_path = search(node)
# ...
assert_equal( longest_path,
@dg.max_length_of_simple_path_including_node(node) )

Unfortunately, this is where we get to execution problems with the quiz. The above code passes some tests, due to bugs in the DiGraph implementation, but it does not correctly implement the max_length_of_simple_path_including_node() method described in the quiz.

Here's an example of where it gets wrong answers, using my trivial one-way path described at the beginning of this quiz:

>> @paths = {"A" => ["B"], "B" => ["C"], "C" => ["D"]}
=> {"A"=>["B"], "B"=>["C"], "C"=>["D"]}
>> search("A")
=> 3
>> search("B")
=> 2
>> search("C")
=> 1
>> search("D")
=> 0

Those are the longest simple paths *starting* from a given node, but not the longest given paths *including* the given node (which would all be 3).

These errors made it tricky to correctly evaluate the expected results of the methods and I apologize for this. I ran into the same issue with my own tests, as anyone following my posts to Ruby Talk saw in painful detail.

Luckily, it's easy to get to a real solution of the first method from here. First, we can modify Himadri's search() method to return the path instead of just a count:

def search(node, longest_path = [node])
if (@paths[node])
@paths[node].each_index do |next_idx|
next_node = @paths[node][next_idx]
tmp = search(next_node, longest_path + [next_node])
if (longest_path.size < tmp.size)
longest_path = tmp
return longest_path

Then, using that method, we can exhaustively search all the long paths for the biggest one containing our node:

def max_length_of_simple_path_including_node(n)
all_paths = @paths.keys.map { |node| search(node) }
all_paths = all_paths.select { |path| path.include? n }
all_paths = all_paths.sort_by { |path| -path.size }
all_paths.empty? ? 0 : all_paths.shift.size - 1

Line by line that makes a list of the longest paths starting from each node, reduces that list to those including the desired node, sorts the remaining paths biggest to smallest, and finally returns an edge count of the largest path or 0 if there aren't any paths left.

The other method is left as an exercise for the interested reader.

Many, many thanks to the submitters who braved the waters and came up with some basic tests, right or wrong.

Tomorrow, we will programatically generate 1,000 monkeys in the hopes that they can recreate the works of Shakespeare...