We need definitions for other graph-related terminology such as trees and forest and so on. There tend to be widely accepted definitions for undirected graphs but there seem to be more options and variations for these definitions as applied to directed graphs.
I have therefore drawn up two sets of definitions in order to make these issues clearer:
Definitions for undirected graph
- vertex
- edge
- graph
- path or route - a sequence of alternating vertex and edges, starting and ending with a vertex, where every edge connects the nodes it is between. In certain circumstances we will specify a path by its nodes only, or by its edges only, where this can be done unambiguously.
- connected graph - A graph where there is a path from every vertex to every other vertex.
- loop - a path from a vertex back to itself which does not use other nodes or edges more than once.
- tree - A graph which is connected and has no loops (adding any edge to a tree will create a loop). Note: this is different from Tree domain in pan-Axiom because no vertex is designated as the root.
- forest - A graph which has no loops (and in not necessarily connected)(a disjoint union of trees).
- spanning tree of G - A subgraph of G which is a tree and contains all the vertices of G.
- spanning forest of G - A subgraph of G which is a forest and contains all the vertices of G.
Definitions for directed graph
- arrow - an edge with a specified direction
- directed acyclic graph (DAG) - A graph without any cycles, in a directed graph this is not necessarily a tree. A DAG defines a partial order.
- tree - This is where the definition starts to get more fuzzy
for the directed case. There are lots of sub-definitions:
- polytree - a directed graph with at most one undirected path between any two vertices.
- oriented tree - same as polytree
- directed tree - a directed graph which would be a tree if the directions on the arrows were ignored.
- arborescence - arrows are all directed towards a particular vertex, or all directed away from a particular vertex.
- rooted tree - one vertex has been designated the root and the arrows are all towards or all away from the root.
- free tree - tree without any designated root.
- tree-order - partial ordering on the vertices of a tree.
Design Issues
So far, only finite graphs are implemented, these all extend the category FiniteGraph (FGRPH). I would like this to extend a more general Graph (GRPH) category which would include functions such as 'neighbours' and 'distance' which are common to both finite and infinite graphs. However, at this stage, I don't know how infinite graphs would be represented (presumably some inductively defined data structure). Therefore I don't know how vertices of the graph would be referred too, in the most general case (can they still be indexed), so I don't know if the same signature can be used for 'distance' in the finite and infinite cases. I have therefore left this Graph (GRPH) category as a possible future enhancement.
Probably the most efficient general representation coding of a graph is as a set of external 'objects' (graph is defined over a set) together with pairs of indices to represent the arrows.
Because the arrows are defined in terms of indices this means that the order of the objects becomes important and therefore must be held in a list/array rather than a set.
The issue of list verses array is a performance issue, more of that later.
It seems to me that there is a complimentary coding of graph where the arrows are a set of external objects and each node would be defined as two lists of indices: a list of incoming nodes and a list of outgoing nodes. I have not implemented this complimentary coding because its more complicated and I can't think of a use for it. However I have implemented two special related cases:
- FunctionGraph
- MultifunctionGraph
In 'FunctionGraph' each vertex has one outgoing arrow and In 'MultifunctionGraph' each vertex has exactly 'n' outgoing arrows where 'n' is the same for every vertex. These limitations mean that they can be coded more efficiently by having the arrow information directly associated with each vertex. As its name suggests 'FunctionGraph' can represent a function, a mapping where the domain and codomain are the same finite set of objects over which it is defined.
'MultifunctionGraph' can represent multiple function graphs such as Cayley graphs. Although these are special cases of the general graph, and could have used the general coding it is more efficient coding them in this way. They have a number of functions which are not applicable to the general case.
The graph structures are concerned with finite graphs, that is finite numbers of vertices. It would be interesting to find ways to encode repetitive or repeating structures in an efficient way.
Now I come to an issue which gives me a lot of concern: the representation so far includes only the information required to define the pure mathematical structures but there is other information associated with graphs which helps to make it more human understandable such as names for arrows and diagram coordinates. I think this information is very important because there are great benefits to human users in being able to 'see' the structure in an accessible way. As a general rule I think it is important to separate out these issues and as far as I can I put the graphics stuff into the graphics framework: 'scene.spad.pamphlet'. However, in this case, I have not managed to separate out these issues.
I did consider creating a wrapper domain, which would be external to the graph code, that would allow annotation and coordinate information to be associated with a vertex.
objectWrapper(S) .... Rep := Record(value:S,name:String,posX:NNI,posY:NNI)
However, if we want to do this with arrow information (which we do), then we would have to have a separate external arrow domain. There might be some benefits to this but its also quite messy, if we continue to use indexes then they have no meaning outside the graph and if we specify to objects directly then there are efficiency issues. If we do all that then, the Rep will be free of annotation and coordinate information, but the graph code still needs to know about coordinate information. For example, if we are taking the product of two graphs, then we can combine the coordinate of the two operands to generate useful coordinate information for the product graph as a whole.
So, in the end, I could not separate out the graphical/annotation information. It was just easier and more efficient to leave this information in Rep even though it goes against certain principles.
There is another issue about what goes into Rep, at the moment only certain combinations of graph type types are possible, for instance, we can't have a function 'FunctionGraph' with weighted arrows, it would be more general to allow all combinations of graph? If we do then the memory usage goes up a bit but if we don't then we need lots of combinations of domains:
- Weighted nodes + directed graph
- Weighted arrows + directed graph
- Weighted nodes + weighted arrows + directed graph
- Weighted nodes + undirected graph
- Weighted arrows + undirected graph
- Weighted nodes + weighted arrows + undirected graph
- Weighted nodes + function graph
- ... and so on
There are other types of graph that I would like to experiment with, for instance, what I think of as a multilevel graph which perhaps could be implemented in equivalent ways:
- As a graph whose elements are themselves graphs, where the outer graph can see inside (has arrows between the nodes of) the inner graph.
- As a set of graphs with mappings between them, where the whole system is treated as a graph.
- As a graph which also has arrows between the arrows.
More about 'multilevel graph' here.
How should these graph structures fit in and relate to the rest of the Axiom code library? It seems to me that structures like Tree, POSET/lattice are special cases of graph? Also there may be more general structures like Closed Cartesian Category (CCC) that should be categories in the code library?
Performance Issues
There is plenty of scope to improve the code. I have not tested how well it scales up (in terms of memory usage and runtime) but I suspect not very well at the moment. I think there is a lot of scope to improve the algorithms used.
There is the issue (pointed out by Waldek) of using lists verses arrays in Rep. If we are thinking about how well this would scale up then I guess:
- mutable (using List) would be more efficient for constructing very large graphs.
- immutable (using array) would be more efficient for deconstructing/using the graphs once they have been constructed.
I guess it would be nice to have both mutable and immutable graphs but that would add even more permutations of domain types.
To Do
- Check out definitions of 'spanningForestNode' and 'spanningTreeArrow' to make sure there is no source of confusion.
- At the moment the loop detection uses 'spanningForestNode' as it is. I realise this is very inefficient (in terms of memory and runtime), I want to improve the loop detection and routing algorithms (perhaps use Floyd's algorithm or something line that) and not use spanningForestNode for this purpose. Although I think spanningForestNode should still be provided as it represents an important concept.
- Need to improve SVG output. While this is useful, at the moment we need to tweak the SVG files produced using a program such as Inkscape, It would be possible to produce SVG output that is more readable but this first requires improvements to the graphics framework: scene.spad.pamphlet
- I would like two-way graphical interaction by the user, so for instance, a user could drag a vertex and its arrows would move with it. This type of interaction is not possible by writing code in SPAD.
- Add domain parameter of type OrderedAbelianMonoid to weights to be other than NNI, in particular doubleFloat.
- Allow nodes to have weights in addition to arrows.
- I would also like to experiment with the connection between (duality?) this code and the computing framework. That is use graphs to implement finite-state machine, Turing machine, etc.