Minimum spanning tree

A minimum spanning tree (MST) or minimum weight spanning tree is a subset of the edges of a connected, edge-weighted undirected graph that connects all the vertices together, without any cycles and with the minimum possible total edge weight. That is, it is a spanning tree whose sum of edge weights is as small as possible. More generally, any edge-weighted undirected graph (not necessarily connected) has a minimum spanning forest, which is a union of the minimum spanning trees for its connected components.

There are quite a few use cases for minimum spanning trees. One example would be a telecommunications company trying to lay cable in a new neighborhood. If it is constrained to bury the cable only along certain paths (e.g. roads), then there would be a graph containing the points (e.g. houses) connected by those paths. Some of the paths might be more expensive, because they are longer, or require the cable to be buried deeper; these paths would be represented by edges with larger weights. Currency is an acceptable unit for edge weight - there is no requirement for edge lengths to obey normal rules of geometry such as the triangle inequality. A spanning tree for that graph would be a subset of those paths that has no cycles but still connects every house; there might be several spanning trees possible. A minimum spanning tree would be one with the lowest total cost, representing the least expensive path for laying the cable.
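As a concrete illustration of the definition, the sketch below builds a small graph for the cable-laying example and extracts its minimum spanning tree. It assumes the networkx library is available; the houses, routes, and costs are invented for illustration.

    import networkx as nx

    G = nx.Graph()
    # Each edge is a possible cable route between two houses, weighted by cost.
    G.add_weighted_edges_from([
        ("A", "B", 4), ("A", "C", 1), ("B", "C", 2),
        ("B", "D", 5), ("C", "D", 8), ("C", "E", 10), ("D", "E", 3),
    ])

    mst = nx.minimum_spanning_tree(G)   # Kruskal's algorithm by default
    print(sorted(mst.edges(data="weight")))
    print("total cost:", sum(w for _, _, w in mst.edges(data="weight")))

Here the minimum spanning tree keeps the cheap routes A-C, B-C, D-E and B-D, connecting all five houses for a total cost of 11.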


Properties

Possible multiplicity

If there are n vertices in the graph, then each spanning tree has n - 1 edges.

There may be several minimum spanning trees of the same weight; in particular, if all the edge weights of a given graph are the same, then every spanning tree of that graph is minimum.

Uniqueness

If each edge has a distinct weight, then there is a unique minimum spanning tree. This is true in many realistic situations, such as the telecommunications company example above, where it is unlikely that any two paths have exactly the same cost. This generalizes to spanning forests as well.

Proof:

  1. Assume the contrary, that there are two different MSTs A and B.
  2. Since A and B differ despite containing the same nodes, there is at least one edge that belongs to one but not the other. Among such edges, let e1 be the one with least weight; this choice is unique because the edge weights are all distinct. Without loss of generality, assume e1 is in A.
  3. As B is an MST, B ∪ {e1} must contain a cycle C.
  4. As a tree, A contains no cycles, therefore C must have an edge e2 that is not in A.
  5. Since e1 was chosen as the unique lowest-weight edge among those belonging to exactly one of A and B, the weight of e2 must be greater than the weight of e1.
  6. Replacing e2 with e1 in B therefore yields a spanning tree with a smaller weight.
  7. This contradicts the assumption that B is an MST.

More generally, if the edge weights are not all distinct then only the (multi-)set of weights in minimum spanning trees is certain to be unique; it is the same for all minimum spanning trees.

Minimum-cost subgraph

If the weights are positive, then a minimum spanning tree is in fact a minimum-cost subgraph connecting all vertices, since subgraphs containing cycles necessarily have more total weight.

Cycle property

For any cycle C in the graph, if the weight of an edge e of C is larger than the individual weights of all other edges of C, then this edge cannot belong to an MST.

Proof: Assume the contrary, i.e. that e belongs to an MST T1. Then deleting e will break T1 into two subtrees with the two ends of e in different subtrees. The remainder of C reconnects the subtrees, hence there is an edge f of C, other than e, with ends in different subtrees; replacing e with f in T1 yields a spanning tree T2 whose weight is less than that of T1, because the weight of f is less than the weight of e.

Cut property

For any cut C of the graph, if the weight of an edge e in the cut-set of C is strictly smaller than the weights of all other edges of the cut-set of C, then this edge belongs to all MSTs of the graph.

Proof: Assume that there is an MST T that does not contain e. Adding e to T will produce a cycle that crosses the cut once at e and crosses back at another edge e'. Deleting e' gives a spanning tree T \ {e'} ∪ {e} of strictly smaller weight than T. This contradicts the assumption that T was an MST.

By a similar argument, if more than one edge is of minimum weight across a cut, then each such edge is contained in some minimum spanning tree.
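Both properties can be checked directly on a small example by enumerating all spanning trees. The following sketch (graph and cut chosen arbitrarily for illustration) verifies that the unique lightest edge crossing a cut appears in every minimum spanning tree.

    from itertools import combinations

    n = 4
    edges = [(0, 1, 4), (0, 2, 1), (1, 2, 2), (1, 3, 5), (2, 3, 8)]

    def spans(tree):
        # n - 1 edges form a spanning tree exactly when they reach every vertex
        reach = {0}
        changed = True
        while changed:
            changed = False
            for u, v, _ in tree:
                if (u in reach) != (v in reach):
                    reach.update((u, v))
                    changed = True
        return len(reach) == n

    trees = [t for t in combinations(edges, n - 1) if spans(t)]
    best = min(sum(w for _, _, w in t) for t in trees)
    msts = [t for t in trees if sum(w for _, _, w in t) == best]

    cut = {0, 1}                # one side of the cut; the other side is {2, 3}
    crossing = [e for e in edges if (e[0] in cut) != (e[1] in cut)]
    lightest = min(crossing, key=lambda e: e[2])
    print(all(lightest in t for t in msts))   # True, as the cut property predicts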

Minimum-cost edge

If the minimum-cost edge e of a graph is unique, then this edge is included in any MST.

Proof: if e were not included in the MST, removing any of the (larger-cost) edges in the cycle formed after adding e to the MST would yield a spanning tree of smaller weight.

Contraction

If T is a tree of MST edges, then we can contract T into a single vertex while maintaining the invariant that the MST of the contracted graph plus T gives the MST for the graph before contraction.


Algorithms

In all of the algorithms below, m is the number of edges in the graph and n is the number of vertices.

Classic algorithms

The first algorithm for finding a minimum spanning tree was developed by the Czech scientist Otakar Borůvka in 1926 (see Borůvka's algorithm). Its purpose was to provide efficient electrical coverage of Moravia. The algorithm proceeds in a sequence of stages. In each stage, called a Borůvka step, it identifies a forest F consisting of the minimum-weight edge incident to each vertex in the graph G, then forms the graph G1 = G \ F as the input to the next step. Here G \ F denotes the graph derived from G by contracting the edges in F (by the Cut property, these edges belong to the MST). Each Borůvka step takes linear time. Since the number of vertices is reduced by at least half in each step, Borůvka's algorithm takes O(m log n) time.
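The following is a plain-Python sketch of this loop of Borůvka steps on an explicit edge list; the graph is a small invented example and distinct edge weights are assumed, as in Borůvka's original setting. Components are tracked with a union-find structure instead of literally contracting the graph.

    def boruvka_mst(n, edges):
        """n vertices labelled 0..n-1; edges is a list of (u, v, weight)."""
        parent = list(range(n))

        def find(x):                       # union-find with path halving
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        mst, components = [], n
        while components > 1:
            # one Boruvka step: cheapest edge leaving each current component
            cheapest = [None] * n
            for u, v, w in edges:
                ru, rv = find(u), find(v)
                if ru == rv:
                    continue
                if cheapest[ru] is None or w < cheapest[ru][2]:
                    cheapest[ru] = (u, v, w)
                if cheapest[rv] is None or w < cheapest[rv][2]:
                    cheapest[rv] = (u, v, w)
            progressed = False
            for e in filter(None, cheapest):
                ru, rv = find(e[0]), find(e[1])
                if ru != rv:               # merge the two components
                    parent[ru] = rv
                    mst.append(e)
                    components -= 1
                    progressed = True
            if not progressed:             # the graph is disconnected
                break
        return mst

    print(boruvka_mst(4, [(0, 1, 4), (0, 2, 1), (1, 2, 2), (1, 3, 5), (2, 3, 8)]))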

A second algorithm is Prim's algorithm, which was invented by Jarník in 1930 and rediscovered by Prim in 1957 and Dijkstra in 1959. Basically, it grows the MST (T) one edge at a time. Initially, T contains an arbitrary vertex. In each step, T is augmented with a least-weight edge (x, y) such that x is in T and y is not yet in T. By the Cut property, all edges added to T are in the MST. Its run-time is either O(m log n) or O(m + n log n), depending on the data structures used.
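Below is a short sketch of Prim's algorithm using a binary heap, which gives the O(m log n) variant; the adjacency list is invented example data. Replacing the binary heap with a Fibonacci heap yields the O(m + n log n) bound.

    import heapq

    def prim_mst(adj, start=0):
        """adj maps each vertex to a list of (weight, neighbour) pairs."""
        visited = {start}
        heap = [(w, v, start) for w, v in adj[start]]   # edges leaving the tree
        heapq.heapify(heap)
        mst = []
        while heap and len(visited) < len(adj):
            w, v, u = heapq.heappop(heap)
            if v in visited:                 # edge no longer crosses the cut
                continue
            visited.add(v)
            mst.append((u, v, w))
            for w2, x in adj[v]:
                if x not in visited:
                    heapq.heappush(heap, (w2, x, v))
        return mst

    adj = {
        0: [(4, 1), (1, 2)],
        1: [(4, 0), (2, 2), (5, 3)],
        2: [(1, 0), (2, 1), (8, 3)],
        3: [(5, 1), (8, 2)],
    }
    print(prim_mst(adj))     # [(0, 2, 1), (2, 1, 2), (1, 3, 5)]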

A third algorithm commonly in use is Kruskal's algorithm, which also takes O(m log n) time.
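A sketch of Kruskal's algorithm follows: sort the edges once, then scan them with a union-find (disjoint-set) structure, keeping an edge whenever it joins two different trees. The edge list is the same invented example used above.

    def kruskal_mst(n, edges):
        parent = list(range(n))
        rank = [0] * n

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]   # path halving
                x = parent[x]
            return x

        def union(a, b):
            ra, rb = find(a), find(b)
            if ra == rb:
                return False                     # would create a cycle
            if rank[ra] < rank[rb]:
                ra, rb = rb, ra
            parent[rb] = ra
            if rank[ra] == rank[rb]:
                rank[ra] += 1
            return True

        mst = []
        for u, v, w in sorted(edges, key=lambda e: e[2]):   # O(m log m) sort
            if union(u, v):
                mst.append((u, v, w))
        return mst

    print(kruskal_mst(4, [(0, 1, 4), (0, 2, 1), (1, 2, 2), (1, 3, 5), (2, 3, 8)]))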

A fourth algorithm, not as commonly used, is the reverse-delete algorithm, which is the reverse of Kruskal's algorithm. Its runtime is O(m log n (log log n)^3).
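As a rough sketch of the idea (not the fast version quoted above), reverse-delete can be written as: walk the edges from heaviest to lightest and delete an edge whenever the graph stays connected without it. Connectivity is checked here with a plain depth-first search, so this illustration runs in O(m^2) time; the graph is invented example data.

    def connected(n, edges):
        adj = [[] for _ in range(n)]
        for u, v, _ in edges:
            adj[u].append(v)
            adj[v].append(u)
        seen, stack = {0}, [0]
        while stack:
            for x in adj[stack.pop()]:
                if x not in seen:
                    seen.add(x)
                    stack.append(x)
        return len(seen) == n

    def reverse_delete_mst(n, edges):
        kept = sorted(edges, key=lambda e: e[2], reverse=True)
        i = 0
        while i < len(kept):
            trial = kept[:i] + kept[i + 1:]     # tentatively drop edge i
            if connected(n, trial):
                kept = trial                     # still connected: delete it
            else:
                i += 1                           # bridge edge: must stay
        return kept

    print(reverse_delete_mst(4, [(0, 1, 4), (0, 2, 1), (1, 2, 2), (1, 3, 5), (2, 3, 8)]))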

All four of these are greedy algorithms. Since they run in polynomial time, the problem of finding such trees is in FP, and related decision problems, such as determining whether a particular edge is in the MST or determining whether the minimum total weight exceeds a certain value, are in P.

Faster algorithms

Several researchers have tried to find more computationally efficient algorithms.

In a comparison model, in which the only allowed operations on edge weights are pairwise comparisons, Karger, Klein & Tarjan (1995) found a linear time randomized algorithm based on a combination of Borůvka's algorithm and the reverse-delete algorithm.

The fastest non-randomized comparison-based algorithm with known complexity, by Bernard Chazelle, is based on the soft heap, an approximate priority queue. Its running time is O(m α(m,n)), where α is the classical functional inverse of the Ackermann function. The function α grows extremely slowly, so that for all practical purposes it may be considered a constant no greater than 4; thus Chazelle's algorithm takes very close to linear time.

Linear-time algorithms in special cases

Dense graphs

If the graph is dense (i.e. m/n ≥ log log log n), then a deterministic algorithm by Fredman and Tarjan finds the MST in time O(m). The algorithm executes a number of phases. Each phase executes Prim's algorithm many times, each for a limited number of steps. The run-time of each phase is O(m + n). If the number of vertices before a phase is n′, the number of vertices remaining after a phase is at most n′/2^(m/n′). Hence, at most log* n phases are needed, which gives a linear run-time for dense graphs.

There are other algorithms that work in linear time on dense graphs.

Integer weights

If the edge weights are integers represented in binary, then deterministic algorithms are known that solve the problem in O(m + n) integer operations. Whether the problem can be solved deterministically for a general graph in linear time by a comparison-based algorithm remains an open question.

Decision trees

Given graph G where the nodes and edges are fixed but the weights are unknown, it is possible to construct a binary decision tree (DT) for calculating the MST for any permutation of weights. Each internal node of the DT contains a comparison between two edges, e.g. "Is the weight of the edge between x and y larger than the weight of the edge between w and z?". The two children of the node correspond to the two possible answers "yes" or "no". In each leaf of the DT, there is a list of edges from G that correspond to an MST. The runtime complexity of a DT is the largest number of queries required to find the MST, which is just the depth of the DT. A DT for a graph G is called optimal if it has the smallest depth of all correct DTs for G.

For every integer r, it is possible to find optimal decision trees for all graphs on r vertices by brute-force search. This search proceeds in two steps.

A. Generating all potential DTs

  • There are 2^(r choose 2) different graphs on r vertices.
  • For each graph, an MST can always be found using r(r-1) comparisons, e.g. by Prim's algorithm.
  • Hence, the depth of an optimal DT is less than r^2.
  • Hence, the number of internal nodes in an optimal DT is less than 2^(r^2).
  • Every internal node compares two edges. The number of edges is at most r^2, so the number of different comparisons is at most r^4.
  • Hence, the number of potential DTs is less than (r^4)^(2^(r^2)) = r^(2^(r^2 + 2)).

B. Identifying the correct DTs

To check if a DT is correct, it should be checked on all possible permutations of the edge weights.

  • The number of such permutations is at most (r^2)!.
  • For each permutation, solve the MST problem on the given graph using any existing algorithm, and compare the result to the answer given by the DT.
  • The running time of any MST algorithm is at most r^2, so the total time required to check all permutations is at most (r^2 + 1)!.

Hence, the total time required for finding an optimal DT for all graphs with r vertices is 2^(r choose 2) · r^(2^(r^2 + 2)) · (r^2 + 1)!, which is less than 2^(2^(r^2 + o(r))).

Optimal algorithm

Seth Pettie and Vijaya Ramachandran have found a provably optimal deterministic comparison-based minimum spanning tree algorithm. The following is a simplified description of the algorithm.

  1. Let r = log log log n, where n is the number of vertices. Find all optimal decision trees on r vertices. This can be done in time O(n) (see Decision trees above).
  2. Partition the graph into components with at most r vertices in each component. This partition can be done in time O(m).
  3. Use the optimal decision trees to find an MST for each component.
  4. Contract each connected component spanned by the MSTs to a single vertex.
  5. It is possible to prove that the resulting graph has at most n/r vertices. Hence, the graph is dense and we can use any algorithm which works on dense graphs in time O(m) (see Dense graphs above).

The runtime of all steps in the algorithm is O(m), except for the step of using the decision trees. We do not know the runtime of this step, but we know that it is optimal: no algorithm can do better than the optimal decision tree.

Thus, this algorithm has the peculiar property that it is provably optimal although its runtime complexity is unknown.

Parallel and distributed algorithms

Research has also considered parallel algorithms for the minimum spanning tree problem. With a linear number of processors it is possible to solve the problem in O(log n) time. Bader & Cong (2006) demonstrate an algorithm that can compute MSTs 5 times faster on 8 processors than an optimized sequential algorithm.

Other specialized algorithms have been designed for computing minimum spanning trees of a graph so large that most of it must be stored on disk at all times. These external storage algorithms, for example as described in "Engineering an External Memory Minimum Spanning Tree Algorithm" by Roman Dementiev et al., can operate, by the authors' claims, as little as 2 to 5 times slower than a traditional in-memory algorithm. They rely on efficient external storage sorting algorithms and on graph contraction techniques for reducing the graph's size efficiently.

The problem can also be approached in a distributed manner. If each node is considered a computer and no node knows anything except its own connected links, one can still calculate the distributed minimum spanning tree.


MST on complete graphs

Alan M. Frieze showed that, given a complete graph on n vertices with edge weights that are independent identically distributed random variables with distribution function F satisfying F′(0) > 0, the expected weight of the MST approaches ζ(3)/F′(0) as n approaches infinity, where ζ is the Riemann zeta function. Frieze and Steele also proved convergence in probability. Svante Janson proved a central limit theorem for the weight of the MST.
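The limit can be observed numerically. The sketch below draws complete graphs with i.i.d. Uniform[0, 1] edge weights (so F′(0) = 1) and averages the MST weight, which should come out close to ζ(3) ≈ 1.202; the graph size and number of trials are arbitrary illustrative choices.

    import random

    def mst_weight_complete(n):
        # O(n^2) Prim's algorithm, appropriate for a complete graph
        w = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                w[i][j] = w[j][i] = random.random()
        in_tree = [False] * n
        best = [float("inf")] * n
        best[0], total = 0.0, 0.0
        for _ in range(n):
            u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
            in_tree[u] = True
            total += best[u]
            for v in range(n):
                if not in_tree[v] and w[u][v] < best[v]:
                    best[v] = w[u][v]
        return total

    trials = [mst_weight_complete(300) for _ in range(20)]
    print(sum(trials) / len(trials))    # typically close to zeta(3) = 1.2020...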

For uniform random weights in [0, 1], the exact expected size of the minimum spanning tree has been computed for small complete graphs.


Applications

Minimum spanning trees have direct applications in the design of networks, including computer networks, telecommunications networks, transportation networks, water supply networks, and electrical grids (which they were first invented for, as mentioned above). They are invoked as subroutines in algorithms for other problems, including the Christofides algorithm for approximating the traveling salesman problem, approximating the multi-terminal minimum cut problem (which is equivalent in the single-terminal case to the maximum flow problem), and approximating the minimum-cost weighted perfect matching.

Other practical applications based on minimum spanning trees include:

  • Taxonomy.
  • Cluster analysis: clustering points in the plane, single-linkage clustering (a method of hierarchical clustering), graph-theoretic clustering, and clustering gene expression data.
  • Constructing trees for broadcasting in computer networks.
  • Image registration and segmentation - see minimum spanning tree-based segmentation.
  • Curvilinear feature extraction in computer vision.
  • Handwriting recognition of mathematical expressions.
  • Circuit design: implementing efficient multiple constant multiplications, as used in finite impulse response filters.
  • Regionalisation of socio-geographic areas, the grouping of areas into homogeneous, contiguous regions.
  • Comparing ecotoxicology data.
  • Topological observability in power systems.
  • Measuring homogeneity of two-dimensional materials.
  • Minimax process control.
  • Minimum spanning trees can also be used to describe financial markets. A correlation matrix can be created by calculating a coefficient of correlation between any two stocks. This matrix can be represented topologically as a complex network, and a minimum spanning tree can be constructed to visualize relationships, as sketched below.
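As a sketch of this construction (the tickers and correlation values are invented, and the distance transform d = sqrt(2·(1 − ρ)) is one common choice rather than the only one), the code below turns a correlation matrix into distances and extracts the tree with Prim's algorithm:

    from math import sqrt

    tickers = ["AAA", "BBB", "CCC", "DDD"]
    corr = [
        [1.0, 0.8, 0.3, 0.1],
        [0.8, 1.0, 0.4, 0.2],
        [0.3, 0.4, 1.0, 0.7],
        [0.1, 0.2, 0.7, 1.0],
    ]

    n = len(tickers)
    # map correlation to a distance: highly correlated stocks end up close together
    dist = [[sqrt(2.0 * (1.0 - corr[i][j])) for j in range(n)] for i in range(n)]

    # Prim's algorithm on the complete distance graph
    in_tree = [False] * n
    best = [(float("inf"), -1)] * n      # (distance to the tree, attaching vertex)
    best[0] = (0.0, -1)
    edges = []
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v][0])
        in_tree[u] = True
        if best[u][1] >= 0:
            edges.append((tickers[best[u][1]], tickers[u], round(best[u][0], 3)))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v][0]:
                best[v] = (dist[u][v], u)

    print(edges)     # tree edges linking the most strongly correlated stocks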

Related problems

The problem of finding the Steiner tree of a subset of the vertices, that is, the minimum-weight tree that spans the given subset (possibly passing through additional vertices), is known to be NP-complete.

A related problem is the k-minimum spanning tree (k-MST), which is the tree that spans some subset of k vertices in the graph with minimum weight.

A set of k-smallest spanning trees is a subset of k spanning trees (out of all possible spanning trees) such that no spanning tree outside the subset has smaller weight. (Note that this problem is unrelated to the k-minimum spanning tree.)

The Euclidean minimum spanning tree is a spanning tree of a graph with edge weights corresponding to the Euclidean distance between vertices which are points in the plane (or space).

The rectilinear minimum spanning tree is a spanning tree of a graph with edge weights corresponding to the rectilinear distance between vertices which are points in the plane (or space).

In the distributed model, where each node is considered a computer and no node knows anything except its own connected links, one can consider distributed minimum spanning tree. The mathematical definition of the problem is the same but there are different approaches for a solution.

The capacitated minimum spanning tree is a tree that has a marked node (origin, or root) in which each of the subtrees attached to the root contains no more than c nodes; c is called the tree capacity. Solving CMST optimally is NP-hard, but good heuristics such as Esau-Williams and Sharma produce solutions close to optimal in polynomial time.

The degree constrained minimum spanning tree is a minimum spanning tree in which each vertex is connected to no more than d other vertices, for some given number d. The case d = 2 is a special case of the traveling salesman problem, so the degree constrained minimum spanning tree is NP-hard in general.

For directed graphs, the minimum spanning tree problem is called the Arborescence problem and can be solved in quadratic time using the Chu-Liu/Edmonds algorithm.

A maximum spanning tree is a spanning tree with weight greater than or equal to the weight of every other spanning tree. Such a tree can be found with algorithms such as Prim's or Kruskal's after multiplying the edge weights by -1 and solving the MST problem on the new graph. A path in the maximum spanning tree is the widest path in the graph between its two endpoints: among all possible paths, it maximizes the weight of the minimum-weight edge. Maximum spanning trees find applications in parsing algorithms for natural languages and in training algorithms for conditional random fields.
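The negation trick in the paragraph above can be applied to any MST routine. The sketch below assumes networkx is available and uses an invented graph: the weights are negated, a minimum spanning tree is computed, and the chosen edges are read back with their original weights.

    import networkx as nx

    G = nx.Graph()
    G.add_weighted_edges_from([
        (0, 1, 4), (0, 2, 1), (1, 2, 2), (1, 3, 5), (2, 3, 8),
    ])

    # copy the graph with negated weights and take its minimum spanning tree
    H = nx.Graph()
    H.add_weighted_edges_from((u, v, -w) for u, v, w in G.edges(data="weight"))
    negated_mst = nx.minimum_spanning_tree(H)

    max_tree = [(u, v, G[u][v]["weight"]) for u, v in negated_mst.edges()]
    print(max_tree)   # the same edges nx.maximum_spanning_tree(G) would select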

The dynamic MST problem concerns the update of a previously computed MST after an edge weight change in the original graph or the insertion/deletion of a vertex.

The minimum labeling spanning tree problem is to find a spanning tree that uses the fewest distinct labels, when each edge in a graph is associated with a label from a finite label set instead of a weight.

A bottleneck edge is the highest-weighted edge in a spanning tree. A spanning tree is a minimum bottleneck spanning tree (or MBST) if the graph does not contain a spanning tree with a smaller bottleneck edge weight. An MST is necessarily an MBST (provable by the cut property), but an MBST is not necessarily an MST.


Further reading

  • Otakar Borůvka on the Minimum Spanning Tree Problem (translation of both 1926 papers, comments, history) (2000) Jaroslav Nešetřil, Eva Milková, Helena Nešetřilová. (Section 7 gives his algorithm, which looks like a cross between Prim's and Kruskal's.)
  • Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, Second Edition. MIT Press and McGraw-Hill, 2001. ISBN 0-262-03293-7. Chapter 23: Minimum Spanning Trees, pp. 561-579.
  • Eisner, Jason (1997). State-of-the-art algorithms for minimum spanning trees: A tutorial discussion. Manuscript, University of Pennsylvania, April. 78 pp.
  • Kromkowski, John David. "Still Unmelted after All These Years", in Annual Editions, Race and Ethnic Relations, 17/e (2009 McGraw Hill) (Using minimum spanning tree as method of demographic analysis of ethnic diversity across the United States).

External links

  • Implemented in BGL, the Boost Graph Library
  • The Stony Brook Algorithm Repository - Minimum Spanning Tree codes
  • Implemented in QuickGraph for .Net

Source of the article : Wikipedia
