Design an Example of a Graph Where the Shortest Path Tree Is Longer Than the Minimum Spanning Tree.

Chapter 4. Pathfinding and Graph Search Algorithms

Graph search algorithms explore a graph either for general discovery or explicit search. These algorithms carve paths through the graph, but there is no expectation that those paths are computationally optimal. We will cover Breadth First Search and Depth First Search because they are fundamental for traversing a graph and are often a required first step for many other types of analysis.

Pathfinding algorithms build on top of graph search algorithms and explore routes between nodes, starting at one node and traversing through relationships until the destination has been reached. These algorithms are used to identify optimal routes through a graph for uses such as logistics planning, least cost call or IP routing, and gaming simulation.

Specifically, the pathfinding algorithms we'll cover are:

Shortest Path, with two useful variations (A* and Yen's)

Finding the shortest path or paths between two chosen nodes

All Pairs Shortest Path and Single Source Shortest Path

For finding the shortest paths between all pairs or from a chosen node to all others

Minimum Spanning Tree

For finding a connected tree structure with the smallest cost for visiting all nodes from a chosen node

Random Walk

Because it's a useful preprocessing/sampling step for machine learning workflows and other graph algorithms

In this chapter we'll explain how these algorithms work and show examples in Spark and Neo4j. In cases where an algorithm is only available in one platform, we'll provide just that one example or illustrate how you can customize our implementation.

Figure 4-1 shows the key differences between these types of algorithms, and Table 4-1 is a quick reference to what each algorithm computes with an example use.


Figure 4-1. Pathfinding and search algorithms

Table 4-1. Overview of pathfinding and graph search algorithms

| Algorithm type | What it does | Example use | Spark example | Neo4j example |
|---|---|---|---|---|
| Breadth First Search | Traverses a tree structure by fanning out to explore the nearest neighbors and then their sublevel neighbors | Locating neighbor nodes in GPS systems to identify nearby places of interest | Yes | No |
| Depth First Search | Traverses a tree structure by exploring as far as possible down each branch before backtracking | Discovering an optimal solution path in gaming simulations with hierarchical choices | No | No |
| Shortest Path (variations: A*, Yen's) | Calculates the shortest path between a pair of nodes | Finding driving directions between two locations | Yes | Yes |
| All Pairs Shortest Path | Calculates the shortest path between all pairs of nodes in the graph | Evaluating alternate routes around a traffic jam | Yes | Yes |
| Single Source Shortest Path | Calculates the shortest path between a single root node and all other nodes | Least cost routing of phone calls | Yes | Yes |
| Minimum Spanning Tree | Calculates the path in a connected tree structure with the smallest cost for visiting all nodes | Optimizing connected routing, such as laying cable or garbage collection | No | Yes |
| Random Walk | Returns a list of nodes along a path of specified size by randomly choosing relationships to traverse | Augmenting training for machine learning or data for graph algorithms | No | Yes |

First we'll take a look at the dataset for our examples and walk through how to import the data into Apache Spark and Neo4j. For each algorithm, we'll start with a short description of the algorithm and any pertinent information on how it operates. Most sections also include guidance on when to use related algorithms. Finally, we provide working sample code using the sample dataset at the end of each algorithm section.

Let's get started!

Example Data: The Transport Graph

All connected data contains paths between nodes, which is why search and pathfinding are the starting points for graph analytics. Transportation datasets illustrate these relationships in an intuitive and accessible way. The examples in this chapter run against a graph containing a subset of the European road network. You can download the nodes and relationships files from the book's GitHub repository.

Table 4-2. transport-nodes.csv

| id | latitude | longitude | population |
|---|---|---|---|
| Amsterdam | 52.379189 | 4.899431 | 821752 |
| Utrecht | 52.092876 | 5.104480 | 334176 |
| Den Haag | 52.078663 | 4.288788 | 514861 |
| Immingham | 53.61239 | -0.22219 | 9642 |
| Doncaster | 53.52285 | -1.13116 | 302400 |
| Hoek van Holland | 51.9775 | 4.13333 | 9382 |
| Felixstowe | 51.96375 | 1.3511 | 23689 |
| Ipswich | 52.05917 | 1.15545 | 133384 |
| Colchester | 51.88921 | 0.90421 | 104390 |
| London | 51.509865 | -0.118092 | 8787892 |
| Rotterdam | 51.9225 | 4.47917 | 623652 |
| Gouda | 52.01667 | 4.70833 | 70939 |

Table 4-3. transport-relationships.csv

| src | dst | relationship | cost |
|---|---|---|---|
| Amsterdam | Utrecht | EROAD | 46 |
| Amsterdam | Den Haag | EROAD | 59 |
| Den Haag | Rotterdam | EROAD | 26 |
| Amsterdam | Immingham | EROAD | 369 |
| Immingham | Doncaster | EROAD | 74 |
| Doncaster | London | EROAD | 277 |
| Hoek van Holland | Den Haag | EROAD | 27 |
| Felixstowe | Hoek van Holland | EROAD | 207 |
| Ipswich | Felixstowe | EROAD | 22 |
| Colchester | Ipswich | EROAD | 32 |
| London | Colchester | EROAD | 106 |
| Gouda | Rotterdam | EROAD | 25 |
| Gouda | Utrecht | EROAD | 35 |
| Den Haag | Gouda | EROAD | 32 |
| Hoek van Holland | Rotterdam | EROAD | 33 |

Figure 4-2 shows the target graph that we want to construct.


Figure 4-2. The transport graph

For simplicity we consider the graph in Figure 4-2 to be undirected because most roads between cities are bidirectional. We'd get slightly different results if we evaluated the graph as directed because of the small number of one-way streets, but the overall approach remains similar. However, both Spark and Neo4j operate on directed graphs. In cases like this where we want to work with undirected graphs (e.g., bidirectional roads), there is an easy way to accomplish that:

  • For Spark, we'll create two relationships for each row in transport-relationships.csv: one going from dst to src and one from src to dst.

  • For Neo4j, we'll create a single relationship and then ignore the relationship direction when we run the algorithms.

Having understood those little modeling workarounds, we can now get on with loading graphs into Spark and Neo4j from the example CSV files.

Importing the Data into Apache Spark

Starting with Spark, we'll first import the packages we need from Spark and the GraphFrames package:

    from pyspark.sql.types import *
    from graphframes import *

The following function creates a GraphFrame from the example CSV files:

    def create_transport_graph():
        # Explicit schema so that coordinates and population load as numbers.
        node_fields = [
            StructField("id", StringType(), True),
            StructField("latitude", FloatType(), True),
            StructField("longitude", FloatType(), True),
            StructField("population", IntegerType(), True)
        ]
        # `spark` is the SparkSession provided by the pyspark shell.
        nodes = spark.read.csv("data/transport-nodes.csv", header=True,
                               schema=StructType(node_fields))

        rels = spark.read.csv("data/transport-relationships.csv", header=True)
        # Swap src and dst to produce the reverse of every relationship.
        reversed_rels = (rels.withColumn("newSrc", rels.dst)
                         .withColumn("newDst", rels.src)
                         .drop("dst", "src")
                         .withColumnRenamed("newSrc", "src")
                         .withColumnRenamed("newDst", "dst")
                         .select("src", "dst", "relationship", "cost"))

        relationships = rels.union(reversed_rels)

        return GraphFrame(nodes, relationships)

Loading the nodes is easy, but for the relationships we need to do a little preprocessing so that we can create each relationship twice.

Now let's call that function:

    g = create_transport_graph()

Importing the Data into Neo4j

Now for Neo4j. We'll start by creating a database that we'll use for the examples in this chapter:

    :use system;                 (1)
    CREATE DATABASE chapter4;    (2)
    :use chapter4;               (3)

(1) Switch to the system database.

(2) Create a new database with the name chapter4. This operation is asynchronous so we may have to wait a couple of seconds before switching to the database.

(3) Switch to the chapter4 database.

Now let's load the nodes:

    WITH 'https://github.com/neo4j-graph-analytics/book/raw/master/data/' AS base
    WITH base + 'transport-nodes.csv' AS uri
    LOAD CSV WITH HEADERS FROM uri AS row
    MERGE (place:Place {id: row.id})
    SET place.latitude = toFloat(row.latitude),
        place.longitude = toFloat(row.longitude),
        place.population = toInteger(row.population);

And now the relationships:

    WITH 'https://github.com/neo4j-graph-analytics/book/raw/master/data/' AS base
    WITH base + 'transport-relationships.csv' AS uri
    LOAD CSV WITH HEADERS FROM uri AS row
    MATCH (origin:Place {id: row.src})
    MATCH (destination:Place {id: row.dst})
    MERGE (origin)-[:EROAD {distance: toInteger(row.cost)}]->(destination);

Although we're storing directed relationships, we'll ignore the direction when we execute algorithms later in the chapter.

Breadth First Search

Breadth First Search (BFS) is one of the fundamental graph traversal algorithms. It starts from a chosen node and explores all of its neighbors at one hop away before visiting all the neighbors at two hops away, and so on.

The algorithm was first published in 1959 by Edward F. Moore, who used it to find the shortest path out of a maze. It was then developed into a wire routing algorithm by C. Y. Lee in 1961, as described in "An Algorithm for Path Connections and Its Applications".

BFS is most commonly used as the basis for other more goal-oriented algorithms. For example, Shortest Path, Connected Components, and Closeness Centrality all use the BFS algorithm. It can also be used to find the shortest path between nodes.

Figure 4-3 shows the order in which we would visit the nodes of our transport graph if we were performing a breadth first search that started from the Dutch city, Den Haag (in English, The Hague). The numbers next to the city name indicate the order in which each node is visited.


Figure 4-3. Breadth First Search starting from Den Haag. Node numbers indicate the order traversed.

We first visit all of Den Haag's direct neighbors, before visiting their neighbors, and their neighbors' neighbors, until we've run out of relationships to traverse.
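The level-by-level expansion described above can be sketched in plain Python. This is a minimal illustration, not the Spark or Neo4j implementation: the edge list is copied from Table 4-3 with the costs omitted, and the names `EDGES`, `build_adjacency`, and `bfs_hops` are hypothetical helpers introduced only for this sketch.

```python
from collections import deque

# Relationships from Table 4-3, treated as undirected (costs omitted).
EDGES = [
    ("Amsterdam", "Utrecht"), ("Amsterdam", "Den Haag"),
    ("Den Haag", "Rotterdam"), ("Amsterdam", "Immingham"),
    ("Immingham", "Doncaster"), ("Doncaster", "London"),
    ("Hoek van Holland", "Den Haag"), ("Felixstowe", "Hoek van Holland"),
    ("Ipswich", "Felixstowe"), ("Colchester", "Ipswich"),
    ("London", "Colchester"), ("Gouda", "Rotterdam"),
    ("Gouda", "Utrecht"), ("Den Haag", "Gouda"),
    ("Hoek van Holland", "Rotterdam"),
]


def build_adjacency(edges):
    """Map every node to the list of its neighbors, in both directions."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
        adjacency.setdefault(dst, []).append(src)
    return adjacency


def bfs_hops(adjacency, start):
    """Return the number of hops from start to every reachable node."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in hops:  # first visit is the shortest in hops
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops


hops = bfs_hops(build_adjacency(EDGES), "Den Haag")
for place, distance in sorted(hops.items(), key=lambda item: item[1]):
    print(distance, place)
```

With this data, Den Haag's four direct neighbors sit at one hop, Ipswich at three, and Colchester and London at four. The order of visits within a level depends on how neighbors are stored, so it may not match the numbering in the figure.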

Breadth First Search with Apache Spark

Spark's implementation of the Breadth First Search algorithm finds the shortest path between two nodes by the number of relationships (i.e., hops) between them. You can explicitly name your target node or add criteria to be met.

For example, we can use the bfs function to find the first medium-sized (by European standards) city that has a population of between 100,000 and 300,000 people. Let's first check which places have a population matching those criteria:

    (g.vertices
     .filter("population > 100000 and population < 300000")
     .sort("population")
     .show())

This is the output we'll see:

| id | latitude | longitude | population |
|---|---|---|---|
| Colchester | 51.88921 | 0.90421 | 104390 |
| Ipswich | 52.05917 | 1.15545 | 133384 |

There are only two places matching our criteria, and we'd expect to reach Ipswich first based on a breadth first search.

The following code finds the shortest path from Den Haag to a medium-sized city:

    from_expr = "id='Den Haag'"
    to_expr = "population > 100000 and population < 300000 and id <> 'Den Haag'"
    result = g.bfs(from_expr, to_expr)

result contains columns that describe the nodes and relationships between the two cities. We can run the following code to see the list of columns returned:

    print(result.columns)

This is the output we'll see:

['from', 'e0', 'v1', 'e1', 'v2', 'e2', 'to']

Columns beginning with e represent relationships (edges) and columns beginning with v represent nodes (vertices). We're only interested in the nodes, so let's filter out any columns that begin with e from the resulting DataFrame:

    columns = [column for column in result.columns if not column.startswith("e")]
    result.select(columns).show()

If we run the code in pyspark we'll see this output:

| from | v1 | v2 | to |
|---|---|---|---|
| [Den Haag, 52.078… | [Hoek van Holland… | [Felixstowe, 51.9… | [Ipswich, 52.0591… |

As expected, the bfs algorithm returns Ipswich! Remember that this function is satisfied when it finds the first match, and as you can see in Figure 4-3, Ipswich is evaluated before Colchester.
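To see why Ipswich is found before Colchester, here is a plain-Python sketch of a breadth first search that stops at the first node satisfying a predicate. It does not use GraphFrames and is not how `bfs` is implemented internally. `first_match_path` and `is_medium_sized` are hypothetical names, and the data is copied from Tables 4-2 and 4-3.

```python
from collections import deque

# Populations from Table 4-2.
POPULATION = {
    "Amsterdam": 821752, "Utrecht": 334176, "Den Haag": 514861,
    "Immingham": 9642, "Doncaster": 302400, "Hoek van Holland": 9382,
    "Felixstowe": 23689, "Ipswich": 133384, "Colchester": 104390,
    "London": 8787892, "Rotterdam": 623652, "Gouda": 70939,
}

# Relationships from Table 4-3, treated as undirected (costs omitted).
EDGES = [
    ("Amsterdam", "Utrecht"), ("Amsterdam", "Den Haag"),
    ("Den Haag", "Rotterdam"), ("Amsterdam", "Immingham"),
    ("Immingham", "Doncaster"), ("Doncaster", "London"),
    ("Hoek van Holland", "Den Haag"), ("Felixstowe", "Hoek van Holland"),
    ("Ipswich", "Felixstowe"), ("Colchester", "Ipswich"),
    ("London", "Colchester"), ("Gouda", "Rotterdam"),
    ("Gouda", "Utrecht"), ("Den Haag", "Gouda"),
    ("Hoek van Holland", "Rotterdam"),
]

adjacency = {}
for src, dst in EDGES:
    adjacency.setdefault(src, []).append(dst)
    adjacency.setdefault(dst, []).append(src)


def is_medium_sized(place):
    return 100000 < POPULATION[place] < 300000


def first_match_path(adjacency, start, matches):
    """Return the fewest-hops path from start to the first matching node."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node != start and matches(node):
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in adjacency[node]:
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None  # no reachable node satisfies the predicate


path = first_match_path(adjacency, "Den Haag", is_medium_sized)
print(path)
```

Ipswich is three hops from Den Haag and Colchester is four, so a search that expands one hop at a time reaches Ipswich first regardless of neighbor order.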

Depth First Search

Depth First Search (DFS) is the other fundamental graph traversal algorithm. It starts from a chosen node, picks one of its neighbors, and then traverses as far as it can along that path before backtracking.

DFS was originally invented by French mathematician Charles Pierre Trémaux as a strategy for solving mazes. It provides a useful tool to simulate possible paths for scenario modeling. Figure 4-4 shows the order in which we would visit the nodes of our transport graph if we were performing a DFS that started from Den Haag.


Figure 4-4. Depth First Search starting from Den Haag. Node numbers indicate the order traversed.

Notice how different the node order is compared to BFS. For this DFS, we start by traversing from Den Haag to Amsterdam, and are then able to get to every other node in the graph without needing to backtrack at all!
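A depth first traversal of the same graph can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions: neighbors are explored in the order the relationships appear in Table 4-3, and `build_adjacency` and `dfs_order` are hypothetical helper names. Neither Spark nor Neo4j is involved.

```python
# Relationships from Table 4-3 in file order, treated as undirected.
EDGES = [
    ("Amsterdam", "Utrecht"), ("Amsterdam", "Den Haag"),
    ("Den Haag", "Rotterdam"), ("Amsterdam", "Immingham"),
    ("Immingham", "Doncaster"), ("Doncaster", "London"),
    ("Hoek van Holland", "Den Haag"), ("Felixstowe", "Hoek van Holland"),
    ("Ipswich", "Felixstowe"), ("Colchester", "Ipswich"),
    ("London", "Colchester"), ("Gouda", "Rotterdam"),
    ("Gouda", "Utrecht"), ("Den Haag", "Gouda"),
    ("Hoek van Holland", "Rotterdam"),
]


def build_adjacency(edges):
    """Map every node to its neighbors, preserving file order."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
        adjacency.setdefault(dst, []).append(src)
    return adjacency


def dfs_order(adjacency, start):
    """Return nodes in the order a recursive DFS first visits them."""
    order = []
    visited = set()

    def visit(node):
        visited.add(node)
        order.append(node)
        for neighbor in adjacency[node]:
            if neighbor not in visited:
                visit(neighbor)  # go as deep as possible before returning

    visit(start)
    return order


adjacency = build_adjacency(EDGES)
order = dfs_order(adjacency, "Den Haag")
print(" -> ".join(order))
```

With neighbors stored in file order, the traversal begins Den Haag, Amsterdam, Utrecht, and every consecutive pair in `order` is directly connected, which is what "no backtracking" means here. A different neighbor order could produce a different but equally valid DFS order.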

We can see how search algorithms lay the groundwork for moving through graphs. Now let's look at the pathfinding algorithms that find the cheapest path in terms of the number of hops or weight. Weights can be anything measured, such as time, distance, capacity, or cost.

Shortest Path

The Shortest Path algorithm calculates the shortest (weighted) path between a pair of nodes. It's useful for user interactions and dynamic workflows because it works in real time.

Pathfinding has a history dating back to the 19th century and is considered to be a classic graph problem. It gained prominence in the early 1950s in the context of alternate routing; that is, finding the second-shortest route if the shortest route is blocked. In 1956, Edsger Dijkstra created the best-known of these algorithms.

Dijkstra's Shortest Path algorithm operates by first finding the lowest-weight relationship from the start node to directly connected nodes. It keeps track of those weights and moves to the "closest" node. It then performs the same calculation, but now as a cumulative total from the start node. The algorithm continues to do this, evaluating a "wave" of cumulative weights and always choosing the lowest weighted cumulative path to advance along, until it reaches the destination node.
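The procedure described above can be sketched in plain Python with a priority queue. This is a minimal textbook-style version for illustration, not the implementation used by Neo4j or Spark. The weighted edges are copied from Table 4-3, and `build_weighted_adjacency` and `dijkstra` are hypothetical names.

```python
import heapq

# (src, dst, cost in km) from Table 4-3, treated as undirected.
EDGES = [
    ("Amsterdam", "Utrecht", 46), ("Amsterdam", "Den Haag", 59),
    ("Den Haag", "Rotterdam", 26), ("Amsterdam", "Immingham", 369),
    ("Immingham", "Doncaster", 74), ("Doncaster", "London", 277),
    ("Hoek van Holland", "Den Haag", 27),
    ("Felixstowe", "Hoek van Holland", 207),
    ("Ipswich", "Felixstowe", 22), ("Colchester", "Ipswich", 32),
    ("London", "Colchester", 106), ("Gouda", "Rotterdam", 25),
    ("Gouda", "Utrecht", 35), ("Den Haag", "Gouda", 32),
    ("Hoek van Holland", "Rotterdam", 33),
]


def build_weighted_adjacency(edges):
    adjacency = {}
    for src, dst, cost in edges:
        adjacency.setdefault(src, []).append((dst, cost))
        adjacency.setdefault(dst, []).append((src, cost))
    return adjacency


def dijkstra(adjacency, source, target):
    """Return [(node, cumulative cost), ...] along the cheapest path."""
    best = {source: 0}      # lowest cumulative cost found so far
    previous = {}           # node -> predecessor on its best path
    settled = set()
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)  # always advance the closest node
        if node in settled:
            continue
        settled.add(node)
        if node == target:
            break
        for neighbor, weight in adjacency[node]:
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))
    if target not in settled:
        return None  # target is unreachable
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    path.reverse()
    return [(node, best[node]) for node in path]


route = dijkstra(build_weighted_adjacency(EDGES), "Amsterdam", "London")
for place, cost in route:
    print(place, cost)
```

On this dataset the sketch routes from Amsterdam through Den Haag, Hoek van Holland, Felixstowe, Ipswich, and Colchester to London, with a cumulative cost of 453.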

Note

You'll notice in graph analytics the use of the terms weight, cost, distance, and hop when describing relationships and paths. "Weight" is the numeric value of a particular property of a relationship. "Cost" is used similarly, but we'll see it more often when considering the total weight of a path.

"Distance" is often used within an algorithm as the name of the relationship property that indicates the cost of traversing between a pair of nodes. It's not required that this be an actual physical measure of distance. "Hop" is commonly used to express the number of relationships between two nodes. You may see some of these terms combined, as in "It's a five-hop distance to London" or "That's the lowest cost for the distance."

When Should I Use Shortest Path?

Use Shortest Path to find optimal routes between a pair of nodes, based on either the number of hops or any weighted relationship value. For example, it can provide real-time answers about degrees of separation, the shortest distance between points, or the least expensive route. You can also use this algorithm to simply explore the connections between particular nodes.

Example use cases include:

  • Finding directions between locations. Web-mapping tools such as Google Maps use the Shortest Path algorithm, or a close variant, to provide driving directions.

  • Finding the degrees of separation between people in social networks. For example, when you view someone's profile on LinkedIn, it will indicate how many people separate you in the graph, as well as listing your mutual connections.

  • Finding the number of degrees of separation between an actor and Kevin Bacon based on the movies they've appeared in (the Bacon Number). An example of this can be seen on the Oracle of Bacon website. The Erdös Number Project provides a similar graph analysis based on collaboration with Paul Erdös, one of the most prolific mathematicians of the twentieth century.

Tip

Dijkstra's algorithm does not support negative weights. The algorithm assumes that adding a relationship to a path can never make a path shorter, an invariant that would be violated with negative weights.
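A tiny example makes the broken invariant concrete. The three-node directed graph below is invented purely for illustration and is not part of the transport dataset. `dijkstra_cost` is a hypothetical minimal Dijkstra that, like the standard algorithm, treats a node's cost as final the moment it is taken from the queue.

```python
import heapq

# Hypothetical directed graph with one negative weight (C -> B).
GRAPH = {
    "A": [("B", 2), ("C", 3)],
    "B": [],
    "C": [("B", -2)],
}


def dijkstra_cost(graph, source, target):
    """Return the cost Dijkstra reports for source -> target."""
    heap = [(0, source)]
    settled = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            return cost  # treated as final once popped
        if node in settled:
            continue
        settled.add(node)
        for neighbor, weight in graph[node]:
            if neighbor not in settled:
                heapq.heappush(heap, (cost + weight, neighbor))
    return None


def path_cost(graph, path):
    """Sum the weights along an explicit path such as ["A", "C", "B"]."""
    return sum(dict(graph[a])[b] for a, b in zip(path, path[1:]))


reported = dijkstra_cost(GRAPH, "A", "B")
actual = path_cost(GRAPH, ["A", "C", "B"])
print("Dijkstra reports:", reported)
print("Cost via C:", actual)
```

The direct relationship A to B costs 2, so B is settled at cost 2 before C is ever expanded. The route through C costs 3 + (-2) = 1, which is cheaper, but the algorithm has already committed to its answer.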

Shortest Path with Neo4j

The Neo4j Graph Data Science library has a built-in procedure that we can use to compute both unweighted and weighted shortest paths. Let's first learn how to compute unweighted shortest paths.

Neo4j's Shortest Path algorithm takes in a config map with the following keys:

startNode

The node where our shortest path search begins.

endNode

The node where our shortest path search ends.

nodeProjection

Enables the mapping of specific kinds of nodes into the in-memory graph. We can declare one or more node labels.

relationshipProjection

Enables the mapping of relationship types into the in-memory graph. We can declare one or more relationship types along with direction and properties.

relationshipWeightProperty

The relationship property that indicates the cost of traversing between a pair of nodes. The cost is the number of kilometers between two locations.

To have Neo4j's Shortest Path algorithm ignore weights we won't set the relationshipWeightProperty key. The algorithm will then assume a default weight of 1.0 for each relationship.

The following query computes the unweighted shortest path from Amsterdam to London:

    MATCH (source:Place {id: "Amsterdam"}),
          (destination:Place {id: "London"})
    CALL gds.alpha.shortestPath.stream({
      startNode: source,
      endNode: destination,
      nodeProjection: "*",
      relationshipProjection: {
        all: {
          type: "*",
          orientation: "UNDIRECTED"
        }
      }
    })
    YIELD nodeId, cost
    RETURN gds.util.asNode(nodeId).id AS place, cost;

In this query we are passing nodeProjection: "*", which means that all node labels should be considered. The relationshipProjection is a bit more complicated. We're using the advanced configuration mode, which enables a more flexible definition of the relationship types to consider during the traversal. Let's break down the values that we passed in:

type: "*"

All relationship types should be considered.

orientation: "UNDIRECTED"

Each relationship in the underlying graph is projected in both directions.

Note

More detailed documentation about node and relationship projections can be found in the Native Projection chapter of the Graph Data Science user manual.

This query returns the following output:

| place | cost |
|---|---|
| Amsterdam | 0.0 |
| Immingham | 1.0 |
| Doncaster | 2.0 |
| London | 3.0 |

Here, the cost is the cumulative total for relationships (or hops). This is the same path as we see using Breadth First Search in Spark.

We could even work out the total distance of following this path by writing a bit of postprocessing Cypher. The following procedure calculates the shortest unweighted path and then works out what the actual cost of that path would be:

    MATCH (source:Place {id: "Amsterdam"}),
          (destination:Place {id: "London"})
    CALL gds.alpha.shortestPath.stream({
      startNode: source,
      endNode: destination,
      nodeProjection: "*",
      relationshipProjection: {
        all: {
          type: "*",
          orientation: "UNDIRECTED"
        }
      }
    })
    YIELD nodeId, cost
    WITH collect(gds.util.asNode(nodeId)) AS path
    UNWIND range(0, size(path)-1) AS index
    WITH path[index] AS current, path[index+1] AS next
    WITH current, next, [(current)-[r:EROAD]-(next) | r.distance][0] AS distance
    WITH collect({current: current, next: next, distance: distance}) AS stops
    UNWIND range(0, size(stops)-1) AS index
    WITH stops[index] AS location, stops, index
    RETURN location.current.id AS place,
           reduce(acc=0.0,
                  distance in [stop in stops[0..index] | stop.distance] |
                  acc + distance) AS cost;

If the previous code feels a bit unwieldy, notice that the tricky part is figuring out how to massage the data to include the cost over the whole journey. This is helpful to keep in mind when we need the cumulative path cost.
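The same running-total idea is easier to follow in a few lines of plain Python. This sketch only mirrors the cumulative step of the postprocessing for the hop-based path Amsterdam, Immingham, Doncaster, London. It does not call Neo4j, `cumulative_costs` is a hypothetical helper, and the three distances are copied from Table 4-3.

```python
from itertools import accumulate

# Distances (km) from Table 4-3 for the legs of the fewest-hops path.
DISTANCE_KM = {
    ("Amsterdam", "Immingham"): 369,
    ("Immingham", "Doncaster"): 74,
    ("Doncaster", "London"): 277,
}


def cumulative_costs(path, distances):
    """Pair each stop on the path with the total distance travelled so far."""

    def leg(a, b):
        # Relationships are undirected, so accept either orientation.
        return distances[(a, b)] if (a, b) in distances else distances[(b, a)]

    legs = [leg(a, b) for a, b in zip(path, path[1:])]
    # initial=0 gives the starting city a cost of zero (Python 3.8+).
    return list(zip(path, accumulate(legs, initial=0)))


stops = cumulative_costs(
    ["Amsterdam", "Immingham", "Doncaster", "London"], DISTANCE_KM
)
for place, cost in stops:
    print(place, cost)
```

Each stop is paired with the sum of all legs before it, which is what the `reduce` over `stops[0..index]` expresses in the Cypher query.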

The query returns the following result:

| place | cost |
|---|---|
| Amsterdam | 0.0 |
| Immingham | 369.0 |
| Doncaster | 443.0 |
| London | 720.0 |

Figure 4-6 shows the unweighted shortest path from Amsterdam to London, routing us through the fewest number of cities. It has a total cost of 720 km.


Figure 4-6. The unweighted shortest path between Amsterdam and London

Choosing a route with the fewest number of nodes visited might be very useful in situations such as subway systems, where fewer stops are highly desirable. However, in a driving scenario, we're probably more interested in the total cost using the shortest weighted path.

Shortest Path (Weighted) with Neo4j

We can execute the Weighted Shortest Path algorithm to find the shortest path between Amsterdam and London like this:

    MATCH (source:Place {id: "Amsterdam"}),
          (destination:Place {id: "London"})
    CALL gds.alpha.shortestPath.stream({
      startNode: source,
      endNode: destination,
      nodeProjection: "*",
      relationshipProjection: {
        all: {
          type: "*",
          properties: "distance",
          orientation: "UNDIRECTED"
        }
      },
      relationshipWeightProperty: "distance"
    })
    YIELD nodeId, cost
    RETURN gds.util.asNode(nodeId).id AS place, cost;

We are now passing the optional relationshipWeightProperty, which is the name of the relationship property that indicates the cost of traversing between a pair of nodes.

The cost is the number of kilometers between two locations. The query returns the following result:

| place | cost |
|---|---|
| Amsterdam | 0.0 |
| Den Haag | 59.0 |
| Hoek van Holland | 86.0 |
| Felixstowe | 293.0 |
| Ipswich | 315.0 |
| Colchester | 347.0 |
| London | 453.0 |

The quickest route takes us via Den Haag, Hoek van Holland, Felixstowe, Ipswich, and Colchester! The cost shown is the cumulative total as we progress through the cities. First we go from Amsterdam to Den Haag, at a cost of 59. Then we go from Den Haag to Hoek van Holland, at a cumulative cost of 86, and so on. Finally, we arrive in London, from Colchester, for a total cost of 453 km.

Remember that the unweighted shortest path had a total cost of 720 km, so we've been able to save 267 km by taking weights into account when computing the shortest path.

Shortest Path (Weighted) with Apache Spark

In the Breadth First Search with Apache Spark section we learned how to find the shortest path between two nodes. That shortest path was based on hops and therefore isn't the same as the shortest weighted path, which would tell us the shortest total distance between cities.

If we want to find the shortest weighted path (in this case, distance) we need to use the cost property, which is used for various types of weighting. This option is not available out of the box with GraphFrames, so we need to write our own version of Weighted Shortest Path using its aggregateMessages framework. Most of our algorithm examples for Spark use the simpler process of calling on algorithms from the library, but we have the option of writing our own functions. More information on aggregateMessages can be found in the "Message passing via AggregateMessages" section of the GraphFrames user guide.

Tip

When available, we recommend leveraging preexisting, tested libraries. Writing our own functions, especially for more complex algorithms, requires a deeper understanding of our data and calculations.

The following example should be treated as a reference implementation, and would need to be optimized before running on a larger dataset. Those that aren't interested in writing their own functions can skip this example.

Before we create our function, we'll import some libraries that we'll use:

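    from graphframes import GraphFrame
    from graphframes.lib import AggregateMessages as AM
    from pyspark.sql import functions as F
    # Needed by the UDF below; skip if already imported earlier in the session.
    from pyspark.sql.types import ArrayType, StringType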

The aggregateMessages module is part of the GraphFrames library and contains some useful helper functions.

Now let's write our function. We first create a user-defined function that we'll use to build the paths between our source and destination:

    # Append a node ID to the path accumulated so far.
    add_path_udf = F.udf(lambda path, id: path + [id], ArrayType(StringType()))

And now for the main function, which calculates the shortest path starting from an origin and returns as soon as the destination has been visited:

    def shortest_path(g, origin, destination, column_name="cost"):
        # `spark` and `sc` are the session and context provided by the pyspark shell.
        # If the destination doesn't exist, return an empty result.
        if g.vertices.filter(g.vertices.id == destination).count() == 0:
            return (spark.createDataFrame(sc.emptyRDD(), g.vertices.schema)
                    .withColumn("path", F.array()))

        # Every node starts unvisited, at infinite distance (0 for the origin).
        vertices = (g.vertices.withColumn("visited", F.lit(False))
                    .withColumn("distance", F.when(g.vertices["id"] == origin, 0)
                                .otherwise(float("inf")))
                    .withColumn("path", F.array()))
        cached_vertices = AM.getCachedDataFrame(vertices)
        g2 = GraphFrame(cached_vertices, g.edges)

        while g2.vertices.filter('visited == False').first():
            # Pick the closest unvisited node.
            current_node_id = (g2.vertices.filter('visited == False')
                               .sort("distance").first().id)

            # Only the current node sends (distance, path) messages to its neighbors.
            msg_distance = AM.edge[column_name] + AM.src['distance']
            msg_path = add_path_udf(AM.src["path"], AM.src["id"])
            msg_for_dst = F.when(AM.src['id'] == current_node_id,
                                 F.struct(msg_distance, msg_path))
            new_distances = g2.aggregateMessages(F.min(AM.msg).alias("aggMess"),
                                                 sendToDst=msg_for_dst)

            new_visited_col = F.when(
                g2.vertices.visited | (g2.vertices.id == current_node_id),
                True).otherwise(False)
            # Keep a received distance (and its path) only if it is an improvement.
            new_distance_col = F.when(
                new_distances["aggMess"].isNotNull() &
                (new_distances.aggMess["col1"] < g2.vertices.distance),
                new_distances.aggMess["col1"]).otherwise(g2.vertices.distance)
            new_path_col = F.when(
                new_distances["aggMess"].isNotNull() &
                (new_distances.aggMess["col1"] < g2.vertices.distance),
                new_distances.aggMess["col2"].cast("array<string>")
            ).otherwise(g2.vertices.path)

            new_vertices = (g2.vertices.join(new_distances, on="id",
                                             how="left_outer")
                            .drop(new_distances["id"])
                            .withColumn("visited", new_visited_col)
                            .withColumn("newDistance", new_distance_col)
                            .withColumn("newPath", new_path_col)
                            .drop("aggMess", "distance", "path")
                            .withColumnRenamed('newDistance', 'distance')
                            .withColumnRenamed('newPath', 'path'))
            cached_new_vertices = AM.getCachedDataFrame(new_vertices)
            g2 = GraphFrame(cached_new_vertices, g2.edges)

            # Stop as soon as the destination has been visited.
            if g2.vertices.filter(g2.vertices.id == destination).first().visited:
                return (g2.vertices.filter(g2.vertices.id == destination)
                        .withColumn("newPath", add_path_udf("path", "id"))
                        .drop("visited", "path")
                        .withColumnRenamed("newPath", "path"))

        # The destination was never reached.
        return (spark.createDataFrame(sc.emptyRDD(), g.vertices.schema)
                .withColumn("path", F.array()))

Warning

If we store references to any DataFrames in our functions, we need to cache them using the AM.getCachedDataFrame function or we'll encounter a memory leak during execution. In the shortest_path function we use this function to cache the vertices and new_vertices DataFrames.

If we wanted to find the shortest path between Amsterdam and Colchester we could call that function like so:

    result = shortest_path(g, "Amsterdam", "Colchester", "cost")
    result.select("id", "distance", "path").show(truncate=False)

which would return the following result:

| id | distance | path |
|---|---|---|
| Colchester | 347.0 | [Amsterdam, Den Haag, Hoek van Holland, Felixstowe, Ipswich, Colchester] |

The total distance of the shortest path between Amsterdam and Colchester is 347 km and takes us via Den Haag, Hoek van Holland, Felixstowe, and Ipswich. By contrast, the shortest path in terms of number of relationships between the locations, which we worked out with the Breadth First Search algorithm (refer back to Figure 4-4), would take us via Immingham, Doncaster, and London.
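For readers who want to see the same logic without Spark, the following is a minimal plain-Python sketch of a weighted shortest path search that, like the function above, returns as soon as the destination has been visited. It is not the GraphFrames implementation. The first six edge weights are derived from the cumulative costs in the Neo4j result table; the three legs of the Immingham route are assumed values chosen only so that they sum to the 720 km quoted earlier.

```python
import heapq


def dijkstra_path(edges, origin, destination):
    """Return (distance, path) for the cheapest route, or (inf, []) if none."""
    graph = {}
    for a, b, km in edges:  # treat every road as two-way
        graph.setdefault(a, []).append((b, km))
        graph.setdefault(b, []).append((a, km))

    queue = [(0, origin, [origin])]  # (distance so far, node, path so far)
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node in visited:
            continue
        visited.add(node)
        if node == destination:  # stop as soon as the destination is visited
            return dist, path
        for neighbor, km in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (dist + km, neighbor, path + [neighbor]))
    return float("inf"), []


edges = [
    # Derived from the cumulative costs in the result table.
    ("Amsterdam", "Den Haag", 59), ("Den Haag", "Hoek van Holland", 27),
    ("Hoek van Holland", "Felixstowe", 207), ("Felixstowe", "Ipswich", 22),
    ("Ipswich", "Colchester", 32), ("Colchester", "London", 106),
    # ASSUMED split of the 720 km hop-based route via Immingham and Doncaster.
    ("Amsterdam", "Immingham", 369), ("Immingham", "Doncaster", 74),
    ("Doncaster", "London", 277),
]

distance, path = dijkstra_path(edges, "Amsterdam", "Colchester")
print(distance, path)
```

Using a priority queue means the closest unvisited node is always expanded next, which is the same selection rule the Spark version applies by sorting on the distance column.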

Shortest Path Variation: A*

The A* Shortest Path algorithm improves on Dijkstra's by finding shortest paths more quickly. It does this by allowing the inclusion of extra information that the algorithm can use, as part of a heuristic function, when determining which paths to explore next.

The algorithm was invented by Peter Hart, Nils Nilsson, and Bertram Raphael and described in their 1968 paper "A Formal Basis for the Heuristic Determination of Minimum Cost Paths".

The A* algorithm operates by determining which of its partial paths to expand at each iteration of its main loop. It does so based on an estimate of the cost (heuristic) still left to reach the goal node.

Warning

Be thoughtful in the heuristic employed to estimate path costs. Underestimating path costs may unnecessarily include some paths that could have been eliminated, but the results will still be accurate. However, if the heuristic overestimates path costs, it may skip over actual shorter paths (incorrectly estimated to be longer) that should have been evaluated, which can lead to inaccurate results.

A* selects the path that minimizes the following function:

`f(n) = g(n) + h(n)`

where:

  • g(n) is the cost of the path from the starting point to node n.

  • h(n) is the estimated cost of the path from node n to the destination node, as computed by a heuristic.
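The selection rule f(n) = g(n) + h(n) can be sketched in a few lines of Python. This is an illustrative toy, not Neo4j's implementation: the graph, the node names, and both heuristic tables are hypothetical. It also shows the effect described in the warning above, where a heuristic that overestimates the remaining cost makes the search return a longer path.

```python
import heapq


def a_star(graph, h, start, goal):
    """Expand the partial path with the smallest f(n) = g(n) + h(n)."""
    frontier = [(h[start], 0, start, [start])]  # (f, g, node, path)
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        for neighbor, cost in graph[node]:
            new_g = g + cost
            # Only keep a partial path if it improves on the best known g.
            if new_g < best_g.get(neighbor, float("inf")):
                best_g[neighbor] = new_g
                heapq.heappush(
                    frontier,
                    (new_g + h[neighbor], new_g, neighbor, path + [neighbor]))
    return float("inf"), []


# Hypothetical directed toy graph: S-A-G costs 4, S-B-G costs 5.
graph = {"S": [("A", 1), ("B", 2)], "A": [("G", 3)], "B": [("G", 3)], "G": []}

# Never above the true remaining cost, so the result is optimal.
admissible = {"S": 3, "A": 3, "B": 3, "G": 0}
# h(A) is far too high, so the cheaper route through A is never expanded.
overestimate = {"S": 3, "A": 10, "B": 3, "G": 0}

print(a_star(graph, admissible, "S", "G"))
print(a_star(graph, overestimate, "S", "G"))
```

With the admissible table the search returns the 4-unit route through A; with the overestimating table it settles for the 5-unit route through B.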

Note

In Neo4j's implementation, geospatial distance is used as the heuristic. In our example transportation dataset we use the latitude and longitude of each location as part of the heuristic function.
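The exact geospatial formula Neo4j uses is not given here. One common way to turn latitude and longitude into a distance estimate is the haversine (great-circle) formula, sketched below as an assumption rather than as the library's method. The function name and the Earth radius constant are illustrative choices.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# One degree of longitude along the equator.
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 2))
```

A straight-line estimate like this suits A* when the relationship weights are real travel distances, because a route between two places cannot be shorter than the great-circle distance between them, so the heuristic does not overestimate.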

A* with Neo4j

Neo4j's A* algorithm takes in a config map with the following keys:

startNode

The node where our shortest route search begins.

endNode

The node where our shortest route search ends.

nodeProjection

Enables the mapping of specific kinds of nodes into the in-memory graph. We can declare one or more node labels.

relationshipProjection

Enables the mapping of relationship types into the in-memory graph. We can declare one or more relationship types along with direction and properties.

relationshipWeightProperty

The relationship property that indicates the cost of traversing between a pair of nodes. The cost is the number of kilometers between two locations.

propertyKeyLat

The name of the node property used to represent the latitude of each node as part of the geospatial heuristic calculation.

propertyKeyLon

The name of the node property used to represent the longitude of each node as part of the geospatial heuristic calculation.

The following query executes the A* algorithm to find the shortest path between Den Haag and London:

    MATCH (source:Place {id: "Den Haag"}),
          (destination:Place {id: "London"})
    CALL gds.alpha.shortestPath.astar.stream({
      startNode: source,
      endNode: destination,
      nodeProjection: "*",
      relationshipProjection: {
        all: {
          type: "*",
          properties: "distance",
          orientation: "UNDIRECTED"
        }
      },
      relationshipWeightProperty: "distance",
      propertyKeyLat: "latitude",
      propertyKeyLon: "longitude"
    })
    YIELD nodeId, cost
    RETURN gds.util.asNode(nodeId).id AS place, cost;

Running this procedure gives the following result:

| place | cost |
|---|---|
| Den Haag | 0.0 |
| Hoek van Holland | 27.0 |
| Felixstowe | 234.0 |
| Ipswich | 256.0 |
| Colchester | 288.0 |
| London | 394.0 |

We'd get the same result using the Shortest Path algorithm, but on more complex datasets the A* algorithm will be faster as it evaluates fewer paths.

Shortest Path Variation: Yen's k-Shortest Paths

Yen's k-Shortest Paths algorithm is similar to the Shortest Path algorithm, but rather than finding just the shortest path between a pair of nodes, it also calculates the second shortest path, third shortest path, and so on up to k-1 deviations of shortest paths.

Jin Y. Yen invented the algorithm in 1971 and described it in "Finding the K Shortest Loopless Paths in a Network". This algorithm is useful for getting alternative paths when finding the absolute shortest path isn't our only goal. It can be particularly helpful when we need more than one backup plan!

Yen's with Neo4j

Yen's algorithm takes in a config map with the following keys:

startNode

The node where our shortest path search begins.

endNode

The node where our shortest path search ends.

nodeProjection

Enables the mapping of specific kinds of nodes into the in-memory graph. We can declare one or more node labels.

relationshipProjection

Enables the mapping of relationship types into the in-memory graph. We can declare one or more relationship types along with direction and properties.

relationshipWeightProperty

The relationship property that indicates the cost of traversing between a pair of nodes. The cost is the number of kilometers between two locations.

k

The maximum number of shortest paths to find.

The following query executes Yen's algorithm to find the shortest paths between Gouda and Felixstowe:

    MATCH (start:Place {id: "Gouda"}),
          (end:Place {id: "Felixstowe"})
    CALL gds.alpha.kShortestPaths.stream({
      startNode: start,
      endNode: end,
      nodeProjection: "*",
      relationshipProjection: {
        all: {
          type: "*",
          properties: "distance",
          orientation: "UNDIRECTED"
        }
      },
      relationshipWeightProperty: "distance",
      k: 5
    })
    YIELD index, sourceNodeId, targetNodeId, nodeIds, costs, path
    RETURN index,
           [node in gds.util.asNodes(nodeIds[1..-1]) | node.id] AS via,
           reduce(acc=0.0, cost in costs | acc + cost) AS totalCost;

After we get back the shortest paths, we look up the associated node for each node ID using the gds.util.asNodes function, and then filter out the start and end nodes from the resulting collection. We also calculate the total cost for each path by summing the returned costs.

Running this procedure gives the following result:

| index | via | totalCost |
|---|---|---|
| 0 | [Rotterdam, Hoek van Holland] | 265.0 |
| 1 | [Den Haag, Hoek van Holland] | 266.0 |
| 2 | [Rotterdam, Den Haag, Hoek van Holland] | 285.0 |
| 3 | [Den Haag, Rotterdam, Hoek van Holland] | 298.0 |
| 4 | [Utrecht, Amsterdam, Den Haag, Hoek van Holland] | 374.0 |

Figure 4-7 shows the shortest path between Gouda and Felixstowe.

Figure 4-7. The shortest path between Gouda and Felixstowe

The shortest path in Figure 4-7 is interesting in comparison to the results ordered by total cost. It illustrates that sometimes you may want to consider several shortest paths or other parameters. In this example, the second-shortest route is only 1 km longer than the shortest one. If we prefer the scenery, we might choose the slightly longer route.
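The ranking above can be reproduced on a small scale with a brute-force sketch in Python. To be clear, this is not Yen's algorithm: it simply enumerates every loopless path and keeps the k cheapest, which is only practical for tiny graphs. Most edge weights below are derived from the totals in the result tables in this section; the Gouda to Utrecht and Utrecht to Amsterdam legs are an assumed split, since only their 81 km sum can be inferred.

```python
def k_shortest_simple_paths(edges, start, end, k):
    """Brute force: enumerate every loopless path, then keep the k cheapest."""
    graph = {}
    for a, b, km in edges:  # roads are two-way
        graph.setdefault(a, {})[b] = km
        graph.setdefault(b, {})[a] = km

    found = []

    def walk(node, path, costs):
        if node == end:
            found.append((sum(costs), path))  # total cost = sum of leg costs
            return
        for neighbor, km in graph[node].items():
            if neighbor not in path:  # loopless: never revisit a node
                walk(neighbor, path + [neighbor], costs + [km])

    walk(start, [start], [])
    found.sort()
    # Like nodeIds[1..-1] in the query: drop the start and end nodes.
    return [(index, path[1:-1], total)
            for index, (total, path) in enumerate(found[:k])]


edges = [
    ("Gouda", "Rotterdam", 25), ("Gouda", "Den Haag", 32),
    ("Rotterdam", "Den Haag", 26), ("Rotterdam", "Hoek van Holland", 33),
    ("Den Haag", "Hoek van Holland", 27),
    ("Hoek van Holland", "Felixstowe", 207),
    ("Amsterdam", "Den Haag", 59),
    # ASSUMED split: only the 81 km sum of these two legs can be inferred.
    ("Gouda", "Utrecht", 35), ("Utrecht", "Amsterdam", 46),
]

results = k_shortest_simple_paths(edges, "Gouda", "Felixstowe", 5)
for index, via, total in results:
    print(index, via, total)
```

Yen's algorithm reaches the same ranking far more efficiently by deriving each candidate as a deviation from a path it has already found, rather than enumerating everything.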

All Pairs Shortest Path

The All Pairs Shortest Path (APSP) algorithm calculates the shortest (weighted) path between all pairs of nodes. It's more efficient than running the Single Source Shortest Path algorithm for every pair of nodes in the graph.

APSP optimizes operations by keeping track of the distances calculated so far and running on nodes in parallel. Those known distances can then be reused when calculating the shortest path to an unseen node. You can follow the example in the next section to get a better understanding of how the algorithm works.

Note

Some pairs of nodes might not be reachable from each other, which means that there is no shortest path between these nodes. The algorithm doesn't return distances for these pairs of nodes.

A Closer Look at All Pairs Shortest Path

The calculation for APSP is easiest to understand when you follow a sequence of operations. The diagram in Figure 4-8 walks through the steps for node A.

Figure 4-8. The steps to calculate the shortest path from node A to all other nodes, with updates shaded.

Initially the algorithm assumes an infinite distance to all nodes. When a start node is selected, then the distance to that node is set to 0. The calculation then proceeds as follows:

  1. From start node A we evaluate the cost of moving to the nodes we can reach and update those values. Looking for the smallest value, we have a choice of B (cost of 3) or C (cost of 1). C is selected for the next phase of traversal.

  2. Now from node C, the algorithm updates the cumulative distances from A to nodes that can be reached directly from C. Values are only updated when a lower cost has been found:

    A=0, B=3, C=1, D=8, E=∞

  3. Then B is selected as the next closest node that hasn't already been visited. It has relationships to nodes A, D, and E. The algorithm works out the distance to those nodes by summing the distance from A to B with the distance from B to each of those nodes. Note that the lowest cost from the start node A to the current node is always preserved as a sunk cost. The distance (d) calculation results:

    d(A,A) = d(A,B) + d(B,A) = 3 + 3 = 6
    d(A,D) = d(A,B) + d(B,D) = 3 + 3 = 6
    d(A,E) = d(A,B) + d(B,E) = 3 + 1 = 4

    • In this step the distance from node A to B and back to A, shown as d(A,A) = 6, is greater than the shortest distance already computed (0), so its value is not updated.

    • The distances for nodes D (6) and E (4) are less than the previously calculated distances, so their values are updated.

  4. E is selected next. Only the cumulative total for reaching D (5) is now lower, and therefore it is the only one updated.

  5. When D is finally evaluated, there are no new minimum path weights; nothing is updated, and the algorithm terminates.
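The steps above can be replayed with a short Python sketch. The edge weights are inferred from the numbers in the walkthrough (for example, C to D must be 7 for D to reach 8 in step 2); edges the text does not mention are left out, so this graph is an assumption and may differ from the figure. The function name is illustrative.

```python
import heapq

INF = float("inf")


def single_source_distances(graph, start):
    """Return final distances from `start` plus a snapshot per visited node."""
    dist = {node: INF for node in graph}  # assume infinite distance at first
    dist[start] = 0
    visited, snapshots = set(), []
    queue = [(0, start)]
    while queue:
        d, node = heapq.heappop(queue)  # closest unvisited node
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph[node].items():
            if d + weight < dist[neighbor]:  # update only on a lower cost
                dist[neighbor] = d + weight
                heapq.heappush(queue, (dist[neighbor], neighbor))
        snapshots.append((node, dict(dist)))
    return dist, snapshots


# Weights INFERRED from the walkthrough; the figure itself is not reproduced.
graph = {
    "A": {"B": 3, "C": 1},
    "B": {"A": 3, "D": 3, "E": 1},
    "C": {"A": 1, "D": 7},
    "D": {"B": 3, "C": 7, "E": 1},
    "E": {"B": 1, "D": 1},
}

final, steps = single_source_distances(graph, "A")
for node, snapshot in steps:
    print(node, snapshot)

# All pairs: repeat the same search with every node as the start node.
all_pairs = {source: single_source_distances(graph, source)[0]
             for source in graph}
```

The snapshots are produced in the order A, C, B, E, D, matching the five steps, and the final loop shows the simplest (non-parallel) way to extend the single-node calculation to every pair.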

Tip

Even though the All Pairs Shortest Path algorithm is optimized to run calculations in parallel for each node, this can still add up for a very large graph. Consider using a subgraph if you only need to evaluate paths between a subcategory of nodes.

When Should I Use All Pairs Shortest Path?

All Pairs Shortest Path is commonly used for understanding alternate routing when the shortest route is blocked or becomes suboptimal. For example, this algorithm is used in logical route planning to ensure the best multiple paths for diversity routing. Use All Pairs Shortest Path when you need to consider all possible routes between all or most of your nodes.

Example use cases include:

  • Optimizing the location of urban facilities and the distribution of goods. One example of this is determining the traffic load expected on different segments of a transportation grid. For more information, see R. C. Larson and A. R. Odoni's book, Urban Operations Research (Prentice-Hall).

  • Finding a network with maximum bandwidth and minimal latency as part of a data center design algorithm. There are more details about this approach in the paper "REWIRE: An Optimization-Based Framework for Data Center Network Design", by A. R. Curtis et al.

All Pairs Shortest Path with Apache Spark

Spark's shortestPaths function is designed for finding the shortest paths from all nodes to a set of nodes called landmarks. If we wanted to find the shortest path from every location to Colchester, Immingham, and Hoek van Holland, we would write the following query:

    result = g.shortestPaths(["Colchester", "Immingham", "Hoek van Holland"])
    result.sort(["id"]).select("id", "distances").show(truncate=False)

If we run that code in pyspark we'll see this output:

| id | distances |
|---|---|
| Amsterdam | [Immingham → 1, Hoek van Holland → 2, Colchester → 4] |
| Colchester | [Colchester → 0, Hoek van Holland → 3, Immingham → 3] |
| Den Haag | [Hoek van Holland → 1, Immingham → 2, Colchester → 4] |
| Doncaster | [Immingham → 1, Colchester → 2, Hoek van Holland → 4] |
| Felixstowe | [Hoek van Holland → 1, Colchester → 2, Immingham → 4] |
| Gouda | [Hoek van Holland → 2, Immingham → 3, Colchester → 5] |
| Hoek van Holland | [Hoek van Holland → 0, Immingham → 3, Colchester → 3] |
| Immingham | [Immingham → 0, Colchester → 3, Hoek van Holland → 3] |
| Ipswich | [Colchester → 1, Hoek van Holland → 2, Immingham → 4] |
| London | [Colchester → 1, Immingham → 2, Hoek van Holland → 4] |
| Rotterdam | [Hoek van Holland → 1, Immingham → 3, Colchester → 4] |
| Utrecht | [Immingham → 2, Hoek van Holland → 3, Colchester → 5] |

The number next to each location in the distances column is the number of relationships (roads) between cities we need to traverse to get there from the source node. In our example, Colchester is one of our destination cities and you can see it has 0 nodes to traverse to get to itself but 3 hops to make from Immingham and Hoek van Holland. If we were planning a trip, we could use this information to help maximize our time at our chosen destinations.
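To see where these hop counts come from, here is a plain-Python sketch that runs a breadth-first search outward from each landmark. It is not how GraphFrames implements shortestPaths. The road list is an assumption: it is not given in this section and is written out here as a set of two-way roads consistent with the hop counts in the table above.

```python
from collections import deque

# ASSUMED two-way road list for the transport dataset (not shown in this
# section); chosen to be consistent with the hop counts in the table.
roads = [
    ("Amsterdam", "Utrecht"), ("Amsterdam", "Den Haag"),
    ("Den Haag", "Rotterdam"), ("Amsterdam", "Immingham"),
    ("Immingham", "Doncaster"), ("Doncaster", "London"),
    ("Hoek van Holland", "Den Haag"), ("Felixstowe", "Hoek van Holland"),
    ("Ipswich", "Felixstowe"), ("Colchester", "Ipswich"),
    ("London", "Colchester"), ("Gouda", "Rotterdam"),
    ("Gouda", "Utrecht"), ("Den Haag", "Gouda"),
    ("Hoek van Holland", "Rotterdam"),
]


def hops_to_landmarks(roads, landmarks):
    """Map every node to {landmark: number of relationships to traverse}."""
    graph = {}
    for a, b in roads:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    result = {node: {} for node in graph}
    for landmark in landmarks:
        # Roads are two-way, so hops from the landmark equal hops to it.
        hops = {landmark: 0}
        queue = deque([landmark])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in hops:
                    hops[neighbor] = hops[node] + 1
                    queue.append(neighbor)
        for node, count in hops.items():
            result[node][landmark] = count
    return result


distances = hops_to_landmarks(
    roads, ["Colchester", "Immingham", "Hoek van Holland"])
for place in sorted(distances):
    print(place, distances[place])
```

Because every relationship counts as one hop, a breadth-first search is enough here; weights only matter once we ask for distances in kilometers.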

All Pairs Shortest Path with Neo4j

Neo4j has a parallel implementation of the All Pairs Shortest Path algorithm, which returns the distance between every pair of nodes.

The All Pairs Shortest Path algorithm takes in a config map with the following keys:

nodeProjection

Enables the mapping of specific kinds of nodes into the in-memory graph. We can declare one or more node labels.

relationshipProjection

Enables the mapping of relationship types into the in-memory graph. We can declare one or more relationship types along with direction and properties.

relationshipWeightProperty

The relationship property that indicates the cost of traversing between a pair of nodes. The cost is the number of kilometers between two locations.

If we don't set relationshipWeightProperty then the algorithm will calculate the unweighted shortest paths between all pairs of nodes.

The following query does this:

    CALL gds.alpha.allShortestPaths.stream({
      nodeProjection: "*",
      relationshipProjection: {
        all: {
          type: "*",
          properties: "distance",
          orientation: "UNDIRECTED"
        }
      }
    })
    YIELD sourceNodeId, targetNodeId, distance
    WHERE sourceNodeId < targetNodeId
    RETURN gds.util.asNode(sourceNodeId).id AS source,
           gds.util.asNode(targetNodeId).id AS target,
           distance
    ORDER BY distance DESC
    LIMIT 10;

This algorithm returns the shortest path between every pair of nodes twice, once with each of the nodes as the source node. This would be helpful if you were evaluating a directed graph of one-way streets. However, we don't need to see each path twice, so we filter the results to only keep one of them by using the sourceNodeId < targetNodeId predicate.
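The effect of that predicate, together with the ordering and limit in the query, can be mimicked in a few lines of Python. The rows below are hypothetical stand-ins for the streamed results, with made-up node IDs and distances.

```python
# Hypothetical streamed rows: (sourceNodeId, targetNodeId, distance).
# On an undirected projection each pair appears once per direction.
rows = [
    (0, 1, 2.0), (1, 0, 2.0),
    (0, 2, 5.0), (2, 0, 5.0),
    (1, 2, 3.0), (2, 1, 3.0),
]

# WHERE sourceNodeId < targetNodeId: keep one row per unordered pair.
unique = [row for row in rows if row[0] < row[1]]

# ORDER BY distance DESC LIMIT 10.
top = sorted(unique, key=lambda row: row[2], reverse=True)[:10]
print(top)
```

Comparing the two IDs is a cheap way to drop the mirrored row without having to remember which pairs have already been seen.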

The inquiry returns the following results:

source      target      distance
Colchester  Utrecht     5.0
London      Rotterdam   5.0
London      Gouda       5.0
Ipswich     Utrecht     5.0
Colchester  Gouda       5.0
Colchester  Den Haag    4.0
London      Utrecht     4.0
London      Den Haag    4.0
Colchester  Amsterdam   4.0
Ipswich     Gouda       4.0

This output shows the 10 pairs of locations that have the most relationships between them because we asked for results in descending order (DESC).
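To make the unweighted case concrete, here is a minimal Python sketch. It is not the Neo4j implementation: it simply runs a breadth-first search from every node to count hops, then keeps each pair once, mirroring the sourceNodeId < targetNodeId filter. The toy graph and the name all_pairs_hops are assumptions made for illustration.

```python
from collections import deque

# Hypothetical undirected toy graph (not the road dataset used in this chapter).
GRAPH = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C"],
}

def all_pairs_hops(graph):
    """Return {(source, target): hops} for every reachable ordered pair."""
    result = {}
    for source in graph:
        hops = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in hops:
                    hops[neighbor] = hops[node] + 1
                    queue.append(neighbor)
        for target, count in hops.items():
            if target != source:
                result[(source, target)] = count
    return result

pairs = all_pairs_hops(GRAPH)
# Every pair shows up twice (A->D and D->A), so keep one copy per pair.
unique_pairs = {p: d for p, d in pairs.items() if p[0] < p[1]}
# Sort with the largest hop count first, like ORDER BY distance DESC.
longest_first = sorted(unique_pairs.items(), key=lambda item: -item[1])
```

Comparing node names here plays the same role as comparing node ids in the query: any consistent ordering is enough to drop the mirrored copy of each pair.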

If we want to calculate the shortest weighted paths, we should set relationshipWeightProperty to the property name that contains the cost to be used in the shortest path calculation. This property will then be evaluated to work out the shortest weighted path between each pair of nodes.

The following query does this:

    CALL gds.alpha.allShortestPaths.stream({
      nodeProjection: "*",
      relationshipProjection: {
        all: {
          type: "*",
          properties: "distance",
          orientation: "UNDIRECTED"
        }
      },
      relationshipWeightProperty: "distance"
    })
    YIELD sourceNodeId, targetNodeId, distance
    WHERE sourceNodeId < targetNodeId
    RETURN gds.util.asNode(sourceNodeId).id AS source,
           gds.util.asNode(targetNodeId).id AS target,
           distance
    ORDER BY distance DESC
    LIMIT 10;

The query returns the following result:

source            target            distance
Doncaster         Hoek van Holland  529.0
Rotterdam         Doncaster         528.0
Gouda             Doncaster         524.0
Felixstowe        Immingham         511.0
Den Haag          Doncaster         502.0
Ipswich           Immingham         489.0
Utrecht           Doncaster         489.0
London            Utrecht           460.0
Colchester        Immingham         457.0
Immingham         Hoek van Holland  455.0

Now we're seeing the 10 pairs of locations furthest from each other in terms of the total distance between them. Notice that Doncaster shows up frequently along with several cities in the Netherlands. It looks like it would be a long drive if we wanted to take a road trip between those areas.
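For readers who want to see weighted all-pairs distances computed by hand, the following Python sketch uses the Floyd-Warshall algorithm. This is a swapped-in textbook technique chosen for brevity, not a description of how the Neo4j procedure works internally. The edge list and its weights are made up for illustration.

```python
import math

# Hypothetical weighted, undirected edges; the distances are invented.
EDGES = [("A", "B", 4.0), ("B", "C", 1.0), ("A", "C", 7.0), ("C", "D", 2.0)]

def floyd_warshall(edges):
    """Return dist[u][v], the smallest total weight between every pair."""
    nodes = sorted({n for u, v, _ in edges for n in (u, v)})
    dist = {u: {v: (0.0 if u == v else math.inf) for v in nodes} for u in nodes}
    for u, v, w in edges:
        # Undirected graph: record the weight in both directions.
        dist[u][v] = min(dist[u][v], w)
        dist[v][u] = min(dist[v][u], w)
    for k in nodes:
        for i in nodes:
            for j in nodes:
                through_k = dist[i][k] + dist[k][j]
                if through_k < dist[i][j]:
                    dist[i][j] = through_k
    return dist

dist = floyd_warshall(EDGES)
```

In this toy graph the direct A-C relationship costs 7.0, but going through B costs 4.0 + 1.0 = 5.0, so the weighted result prefers the two-hop route even though the unweighted result would prefer the single hop.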

Single Source Shortest Path

The Single Source Shortest Path (SSSP) algorithm, which came into prominence at around the same time as Dijkstra's Shortest Path algorithm, acts as an implementation for both problems.

The SSSP algorithm calculates the shortest (weighted) path from a root node to all other nodes in the graph, as demonstrated in Figure 4-9.

Figure 4-9. The steps of the Single Source Shortest Path algorithm

It proceeds as follows:

  1. It begins with a root node from which all paths will be measured. In Figure 4-9 we've selected node A as the root.

  2. The relationship with the smallest weight coming from that root node is selected and added to the tree, along with its connected node. In this case, that's d(A,D)=1.

  3. The next relationship with the smallest cumulative weight from our root node to any unvisited node is selected and added to the tree in the same way. Our choices in Figure 4-9 are d(A,B)=8, d(A,C)=5 directly or 4 via A-D-C, and d(A,E)=5. So, the route via A-D-C is chosen and C is added to our tree.

  4. The process continues until there are no more nodes to add and we have our single source shortest path.
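The steps above can be sketched in plain Python with a priority queue (Dijkstra's algorithm). This is an illustrative sketch rather than the chapter's Spark or Neo4j code. The A-B, A-C, and A-D weights come from the step descriptions; the D-C=3 and D-E=4 weights are my reading of the cumulative totals quoted there, and the C-E=2 relationship is purely hypothetical.

```python
import heapq

# A-B, A-C, A-D are taken from the steps; D-C and D-E are inferred from the
# cumulative totals (assumption); C-E is a hypothetical extra relationship.
EDGES = [("A", "B", 8), ("A", "C", 5), ("A", "D", 1),
         ("D", "C", 3), ("D", "E", 4), ("C", "E", 2)]

def build_adjacency(edges):
    """Turn an undirected edge list into {node: [(neighbor, weight), ...]}."""
    adjacency = {}
    for u, v, w in edges:
        adjacency.setdefault(u, []).append((v, w))
        adjacency.setdefault(v, []).append((u, w))
    return adjacency

def sssp(adjacency, root):
    """Return (distance, parent) dicts describing the shortest path tree."""
    distance = {root: 0}
    parent = {root: None}
    visited = set()
    heap = [(0, root)]
    while heap:
        dist_u, u = heapq.heappop(heap)
        if u in visited:
            continue  # stale queue entry; a shorter route was already used
        visited.add(u)
        for v, w in adjacency[u]:
            candidate = dist_u + w  # cumulative weight from the root
            if v not in visited and candidate < distance.get(v, float("inf")):
                distance[v] = candidate
                parent[v] = u
                heapq.heappush(heap, (candidate, v))
    return distance, parent

distance, parent = sssp(build_adjacency(EDGES), "A")
```

With these assumed weights, C ends up attached to D rather than directly to A, because 1 + 3 = 4 is smaller than the direct weight of 5, which matches step 3.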

When Should I Use Single Source Shortest Path?

Use Single Source Shortest Path when you need to evaluate the optimal route from a fixed start point to all other individual nodes. Because the route is chosen based on the total path weight from the root, it's useful for finding the best path to each node, but not necessarily when all nodes need to be visited in a single trip.

For example, SSSP is helpful for identifying the main routes to use for emergency services where you don't visit every location on each incident, but not for finding a single route for garbage collection where you need to visit each house in one trip. (In the latter case, you'd use the Minimum Spanning Tree algorithm, covered later.)

Example use cases include:

  • Detecting changes in topology, such as link failures, and suggesting a new routing structure in seconds

  • Using Dijkstra as an IP routing protocol for use in autonomous systems such as a local area network (LAN)

Single Source Shortest Path with Apache Spark

We can adapt the shortest_path function that we wrote to calculate the shortest path between two locations to instead return the shortest path from one location to all others. Note that we're using Spark's aggregateMessages framework again to customize our function.

We'll first import the same libraries as before:

    from graphframes import GraphFrame
    from graphframes.lib import AggregateMessages as AM
    from pyspark.sql import functions as F
    from pyspark.sql.types import ArrayType, StringType

And we'll use the identical user-defined function to construct paths:

    add_path_udf = F.udf(lambda path, id: path + [id], ArrayType(StringType()))

Now for the main function, which calculates the shortest path starting from an origin:

    def sssp(g, origin, column_name="cost"):
        # Every node starts unvisited, at infinite distance, with an empty path;
        # only the origin starts at distance 0.
        vertices = g.vertices \
            .withColumn("visited", F.lit(False)) \
            .withColumn("distance",
                F.when(g.vertices["id"] == origin, 0).otherwise(float("inf"))) \
            .withColumn("path", F.array())
        cached_vertices = AM.getCachedDataFrame(vertices)
        g2 = GraphFrame(cached_vertices, g.edges)

        # Keep going while at least one node is still unvisited.
        while g2.vertices.filter('visited == False').first():
            # Pick the unvisited node with the smallest distance so far.
            current_node_id = g2.vertices.filter('visited == False') \
                .sort("distance").first().id

            # The current node offers each neighbor a (distance, path) candidate.
            msg_distance = AM.edge[column_name] + AM.src['distance']
            msg_path = add_path_udf(AM.src["path"], AM.src["id"])
            msg_for_dst = F.when(AM.src['id'] == current_node_id,
                                 F.struct(msg_distance, msg_path))
            new_distances = g2.aggregateMessages(
                F.min(AM.msg).alias("aggMess"), sendToDst=msg_for_dst)

            new_visited_col = F.when(
                g2.vertices.visited | (g2.vertices.id == current_node_id),
                True).otherwise(False)
            # Accept a candidate only if it improves on the stored distance.
            new_distance_col = F.when(
                new_distances["aggMess"].isNotNull() &
                (new_distances.aggMess["col1"] < g2.vertices.distance),
                new_distances.aggMess["col1"]) \
                .otherwise(g2.vertices.distance)
            new_path_col = F.when(
                new_distances["aggMess"].isNotNull() &
                (new_distances.aggMess["col1"] < g2.vertices.distance),
                new_distances.aggMess["col2"].cast("array<string>")) \
                .otherwise(g2.vertices.path)

            new_vertices = g2.vertices \
                .join(new_distances, on="id", how="left_outer") \
                .drop(new_distances["id"]) \
                .withColumn("visited", new_visited_col) \
                .withColumn("newDistance", new_distance_col) \
                .withColumn("newPath", new_path_col) \
                .drop("aggMess", "distance", "path") \
                .withColumnRenamed('newDistance', 'distance') \
                .withColumnRenamed('newPath', 'path')
            cached_new_vertices = AM.getCachedDataFrame(new_vertices)
            g2 = GraphFrame(cached_new_vertices, g2.edges)

        # Append each node's own id so the path ends at the node itself.
        return g2.vertices \
            .withColumn("newPath", add_path_udf("path", "id")) \
            .drop("visited", "path") \
            .withColumnRenamed("newPath", "path")

If we want to find the shortest path from Amsterdam to all other locations we can call the function like this:

    via_udf = F.udf(lambda path: path[1:-1], ArrayType(StringType()))

    result = sssp(g, "Amsterdam", "cost")
    (result
     .withColumn("via", via_udf("path"))
     .select("id", "distance", "via")
     .sort("distance")
     .show(truncate=False))

We define another user-defined function to filter out the start and end nodes from the resulting path. If we run that code we'll see the following output:

id                distance  via
Amsterdam         0.0       []
Utrecht           46.0      []
Den Haag          59.0      []
Gouda             81.0      [Utrecht]
Rotterdam         85.0      [Den Haag]
Hoek van Holland  86.0      [Den Haag]
Felixstowe        293.0     [Den Haag, Hoek van Holland]
Ipswich           315.0     [Den Haag, Hoek van Holland, Felixstowe]
Colchester        347.0     [Den Haag, Hoek van Holland, Felixstowe, Ipswich]
Immingham         369.0     []
Doncaster         443.0     [Immingham]
London            453.0     [Den Haag, Hoek van Holland, Felixstowe, Ipswich, Colchester]

In these results we see the physical distances in kilometers from the root node, Amsterdam, to all other cities in the graph, ordered by shortest distance.

Single Source Shortest Path with Neo4j

Neo4j implements a variation of SSSP, called the Delta-Stepping algorithm, that divides Dijkstra's algorithm into a number of phases that can be executed in parallel.
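As a rough illustration of the phase idea, here is a sequential Python sketch of the classical Delta-Stepping scheme. It is not Neo4j's implementation and it does not run anything in parallel. I am assuming the textbook meaning of delta as a bucket width: nodes are grouped into buckets by tentative distance, relationships no heavier than delta are relaxed repeatedly within a bucket, and heavier ones are relaxed once the bucket is settled. The graph and its weights are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical undirected graph as {node: [(neighbor, weight), ...]}.
ADJACENCY = {
    "A": [("B", 8), ("C", 5), ("D", 1)],
    "B": [("A", 8)],
    "C": [("A", 5), ("D", 3), ("E", 2)],
    "D": [("A", 1), ("C", 3), ("E", 4)],
    "E": [("D", 4), ("C", 2)],
}

def delta_stepping(adjacency, source, delta):
    """Sequential sketch of Delta-Stepping; assumes non-negative weights."""
    dist = defaultdict(lambda: math.inf)
    buckets = defaultdict(set)  # bucket index -> nodes with that tentative range

    def relax(node, new_dist):
        if new_dist < dist[node]:
            if dist[node] != math.inf:
                buckets[int(dist[node] // delta)].discard(node)
            dist[node] = new_dist
            buckets[int(new_dist // delta)].add(node)

    relax(source, 0.0)
    while any(buckets.values()):
        i = min(index for index, nodes in buckets.items() if nodes)
        settled = set()
        # Phase 1: relax light relationships until bucket i stops refilling.
        while buckets[i]:
            frontier = buckets[i]
            buckets[i] = set()
            settled |= frontier
            for u in frontier:
                for v, w in adjacency[u]:
                    if w <= delta:
                        relax(v, dist[u] + w)
        # Phase 2: relax heavy relationships once for everything settled.
        for u in settled:
            for v, w in adjacency[u]:
                if w > delta:
                    relax(v, dist[u] + w)
    return dict(dist)

distances = delta_stepping(ADJACENCY, "A", delta=3)
```

In a parallel implementation, the relaxations inside each phase are the part that can be distributed across workers. The choice of delta changes how much work lands in each bucket but not the resulting distances.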

The Single Source Shortest Path algorithm takes in a config map with the following keys:

startNode

The node where our shortest path search begins.

nodeProjection

Enables the mapping of specific kinds of nodes into the in-memory graph. We can declare one or more node labels.

relationshipProjection

Enables the mapping of relationship types into the in-memory graph. We can declare one or more relationship types along with direction and properties.

relationshipWeightProperty

The relationship property that indicates the cost of traversing between a pair of nodes. The cost is the number of kilometers between two locations.

delta

The grade of concurrency to use

The following query executes the Delta-Stepping algorithm:

    MATCH (n:Place {id: "London"})
    CALL gds.alpha.shortestPath.deltaStepping.stream({
      startNode: n,
      nodeProjection: "*",
      relationshipProjection: {
        all: {
          type: "*",
          properties: "distance",
          orientation: "UNDIRECTED"
        }
      },
      relationshipWeightProperty: "distance",
      delta: 1.0
    })
    YIELD nodeId, distance
    WHERE gds.util.isFinite(distance)
    RETURN gds.util.asNode(nodeId).id AS destination, distance
    ORDER BY distance;

The query returns the following output:

destination       distance
London            0.0
Colchester        106.0
Ipswich           138.0
Felixstowe        160.0
Doncaster         277.0
Immingham         351.0
Hoek van Holland  367.0
Den Haag          394.0
Rotterdam         400.0
Gouda             425.0
Amsterdam         453.0
Utrecht           460.0

In these results we see the physical distances in kilometers from the root node, London, to all other cities in the graph, ordered by shortest distance.

Minimum Spanning Tree

The Minimum (Weight) Spanning Tree algorithm starts from a given node and finds all its reachable nodes and the set of relationships that connect the nodes together with the minimum possible weight. It traverses to the next unvisited node with the lowest weight from any visited node, avoiding cycles.

The first known Minimum Weight Spanning Tree algorithm was developed by the Czech scientist Otakar Borůvka in 1926. Prim's algorithm, invented in 1957, is the simplest and best known.

Prim's algorithm is similar to Dijkstra's Shortest Path algorithm, but rather than minimizing the total length of a path ending at each relationship, it minimizes the length of each relationship individually. Unlike Dijkstra's algorithm, it tolerates negative-weight relationships.

The Minimum Spanning Tree algorithm operates as demonstrated in Figure 4-10.

Figure 4-10. The steps of the Minimum Spanning Tree algorithm

The steps are as follows:

  1. It begins with a tree containing only one node. In Figure 4-10 we start with node A.

  2. The relationship with smallest weight coming from that node is selected and added to the tree (along with its connected node). In this case, A-D.

  3. This process is repeated, always choosing the minimal-weight relationship that joins any node not already in the tree. If you compare our example here to the SSSP example in Figure 4-9 you'll notice that in the fourth graph the paths become different. This is because SSSP evaluates the shortest path based on cumulative totals from the root, whereas Minimum Spanning Tree only looks at the cost of the next step.

  4. When there are no nodes to add, the tree is a minimum spanning tree.
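The steps above can be sketched in Python as Prim's algorithm with a priority queue. This is a minimal illustration, not the Neo4j procedure. The A-B, A-C, and A-D weights follow the SSSP walkthrough earlier in the chapter; D-C=3 and D-E=4 are inferred from the cumulative totals given there, and C-E=2 is a hypothetical relationship added so that the two trees differ.

```python
import heapq

# A-B, A-C, A-D follow the chapter's walkthrough; D-C and D-E are inferred
# (assumption); C-E is hypothetical and added to make the trees diverge.
EDGES = [("A", "B", 8), ("A", "C", 5), ("A", "D", 1),
         ("D", "C", 3), ("D", "E", 4), ("C", "E", 2)]

def prim(edges, start):
    """Return the spanning tree as a list of (from, to, weight) tuples."""
    adjacency = {}
    for u, v, w in edges:
        adjacency.setdefault(u, []).append((w, u, v))
        adjacency.setdefault(v, []).append((w, v, u))
    in_tree = {start}
    tree = []
    heap = list(adjacency[start])
    heapq.heapify(heap)
    while heap and len(in_tree) < len(adjacency):
        w, u, v = heapq.heappop(heap)  # lightest relationship leaving the tree
        if v in in_tree:
            continue  # would create a cycle
        in_tree.add(v)
        tree.append((u, v, w))
        for edge in adjacency[v]:
            if edge[2] not in in_tree:
                heapq.heappush(heap, edge)
    return tree

tree = prim(EDGES, "A")
total = sum(w for _, _, w in tree)
```

On this assumed graph the spanning tree reaches E through C (weight 2) for a total weight of 14. A shortest path tree rooted at A would instead reach E through D, because 1 + 4 = 5 beats 1 + 3 + 2 = 6, giving a heavier tree with total weight 16. That is the difference between minimizing each step and minimizing the cumulative distance from the root.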

There are also variants of this algorithm that find the maximum-weight spanning tree (highest-cost tree) and the k-spanning tree (tree size limited).

When Should I Use Minimum Spanning Tree?

Use Minimum Spanning Tree when you need the best route to visit all nodes. Because the route is chosen based on the cost of each next step, it's useful when you must visit all nodes in a single walk. (Review the previous section on "Single Source Shortest Path" if you don't need a path for a single trip.)

You can use this algorithm for optimizing paths for connected systems like water pipes and circuit design. It's also employed to approximate some problems with unknown compute times, such as the Traveling Salesman Problem and certain types of rounding problems. Although it may not always find the absolute optimal solution, this algorithm makes potentially complicated and compute-intensive analysis much more approachable.

Example use cases include:

  • Minimizing the travel cost of exploring a country. "An Application of Minimum Spanning Trees to Travel Planning" describes how the algorithm analyzed airline and sea connections to do this.

  • Visualizing correlations between currency returns. This is described in "Minimum Spanning Tree Application in the Currency Market".

  • Tracing the history of infection transmission in an outbreak. For more information, see "Use of the Minimum Spanning Tree Model for Molecular Epidemiological Investigation of a Nosocomial Outbreak of Hepatitis C Virus Infection".

Warning

The Minimum Spanning Tree algorithm only gives meaningful results when run on a graph where the relationships have different weights. If the graph has no weights, or all relationships have the same weight, then any spanning tree is a minimum spanning tree.

Minimum Spanning Tree with Neo4j

Let's see the Minimum Spanning Tree algorithm in action. The Minimum Spanning Tree algorithm takes in a config map with the following keys:

startNodeId

The id of the node where our spanning tree search begins.

nodeProjection

Enables the mapping of specific kinds of nodes into the in-memory graph. We can declare one or more node labels.

relationshipProjection

Enables the mapping of relationship types into the in-memory graph. We can declare one or more relationship types along with direction and properties.

relationshipWeightProperty

The relationship property that indicates the cost of traversing between a pair of nodes. The cost is the number of kilometers between two locations.

writeProperty

The name of the relationship type written back as a result

weightWriteProperty

The name of the weight property on the writeProperty relationship type written back

The following query finds a spanning tree starting from Amsterdam:

    MATCH (n:Place {id: "Amsterdam"})
    CALL gds.alpha.spanningTree.minimum.write({
      startNodeId: id(n),
      nodeProjection: "*",
      relationshipProjection: {
        EROAD: {
          type: "EROAD",
          properties: "distance",
          orientation: "UNDIRECTED"
        }
      },
      relationshipWeightProperty: "distance",
      writeProperty: 'MINST',
      weightWriteProperty: 'cost'
    })
    YIELD createMillis, computeMillis, writeMillis, effectiveNodeCount
    RETURN createMillis, computeMillis, writeMillis, effectiveNodeCount;

The parameters passed to this algorithm are:

Place

The node labels to consider when computing the spanning tree

EROAD

The relationship types to consider when computing the spanning tree

distance

The name of the relationship property that indicates the cost of traversing between a pair of nodes

id(n)

The internal node id of the node from which the spanning tree should start

This query stores its results in the graph. If we want to return the minimum weight spanning tree we can run the following query:

    MATCH path = (n:Place {id: "Amsterdam"})-[:MINST*]-()
    WITH relationships(path) AS rels
    UNWIND rels AS rel
    WITH DISTINCT rel AS rel
    RETURN startNode(rel).id AS source,
           endNode(rel).id AS destination,
           rel.cost AS cost;

And this is the output of the query:

source            destination       cost
Amsterdam         Utrecht           46.0
Utrecht           Gouda             35.0
Gouda             Rotterdam         25.0
Rotterdam         Den Haag          26.0
Den Haag          Hoek van Holland  27.0
Hoek van Holland  Felixstowe        207.0
Felixstowe        Ipswich           22.0
Ipswich           Colchester        32.0
Colchester        London            106.0
London            Doncaster         277.0
Doncaster         Immingham         74.0

If we were in Amsterdam and wanted to visit every other place in our dataset during the same trip, Figure 4-11 demonstrates the shortest continuous route to do so.

Figure 4-11. A minimum weight spanning tree from Amsterdam

Random Walk

The Random Walk algorithm provides a set of nodes on a random path in a graph. The term was first mentioned by Karl Pearson in 1905 in a letter to Nature magazine titled "The Problem of the Random Walk". Although the concept goes back even further, it's only more recently that random walks have been applied to network science.

A random walk, in general, is sometimes described as being similar to how a drunk person traverses a city. They know what direction or end point they want to reach but may take a very circuitous route to get there.

The algorithm starts at one node and somewhat randomly follows one of the relationships forward or backward to a neighbor node. It then does the same from that node and so on, until it reaches the set path length. (We say somewhat randomly because the number of relationships a node has, and its neighbors have, influences the probability a node will be walked through.)
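The procedure described above fits in a few lines of Python. The sketch below is a plain uniform random walk on an undirected toy graph. The graph, the function name random_walk, and its parameters are assumptions made for illustration, not part of any library.

```python
import random

# Hypothetical undirected toy graph as an adjacency list.
GRAPH = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B"],
    "D": ["B"],
}

def random_walk(graph, start, steps, rng=None):
    """Follow `steps` randomly chosen relationships starting from `start`."""
    rng = rng or random.Random()
    walk = [start]
    for _ in range(steps):
        neighbors = graph[walk[-1]]
        if not neighbors:
            break  # dead end: nowhere left to go
        walk.append(rng.choice(neighbors))
    return walk

# Passing a seeded generator makes a particular walk repeatable.
walk = random_walk(GRAPH, "A", steps=5, rng=random.Random(42))
```

Because the next hop is drawn uniformly from the current node's neighbors, well-connected nodes such as B in this toy graph tend to be visited more often, and nothing stops the walk from stepping straight back to the node it just left.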

When Should I Use Random Walk?

Use the Random Walk algorithm as part of other algorithms or data pipelines when you need to generate a mostly random set of connected nodes.

Example use cases include:

  • As part of the node2vec and graph2vec algorithms, which create node embeddings. These node embeddings could then be used as the input to a neural network.

  • As part of the Walktrap and Infomap community detection. If a random walk returns a small set of nodes repeatedly, then it indicates that node set may have a community structure.

  • As part of the training process of machine learning models. This is described further in David Mack's article "Review Prediction with Neo4j and TensorFlow".

You can read about more use cases in a paper by N. Masuda, M. A. Porter, and R. Lambiotte, "Random Walks and Diffusion on Networks".

Random Walk with Neo4j

Neo4j has an implementation of the Random Walk algorithm. It supports two modes for choosing the next relationship to follow at each stage of the algorithm:

random

Randomly chooses a relationship to follow

node2vec

Chooses relationship to follow based on computing a probability distribution of the previous neighbors

The Random Walk procedure takes in a config map with the following keys:

start

The id of the node where our random walk begins.

nodeProjection

Enables the mapping of specific kinds of nodes into the in-memory graph. We can declare one or more node labels.

relationshipProjection

Enables the mapping of relationship types into the in-memory graph. We can declare one or more relationship types along with direction and properties.

walks

The number of paths returned

The following query performs a random walk starting from London:

    MATCH (source:Place {id: "London"})
    CALL gds.alpha.randomWalk.stream({
      start: id(source),
      nodeProjection: "*",
      relationshipProjection: {
        all: {
          type: "*",
          properties: "distance",
          orientation: "UNDIRECTED"
        }
      },
      steps: 5,
      walks: 1
    })
    YIELD nodeIds
    UNWIND gds.util.asNodes(nodeIds) AS place
    RETURN place.id AS place

It returns the following result:

place
London
Doncaster
Immingham
Amsterdam
Utrecht
Amsterdam

At each stage of the random walk the next relationship is chosen randomly. This means that if we rerun the algorithm, even with the same parameters, we likely won't get the same result. It's also possible for a walk to go back on itself, as we can see in Figure 4-12 where we go from Amsterdam to Den Haag and back.

Figure 4-12. A random walk starting from London

Summary

Pathfinding algorithms are useful for understanding the way that our data is connected. In this chapter we started out with the fundamental Breadth and Depth First algorithms, before moving onto Dijkstra and other shortest path algorithms. We also looked at variants of the shortest path algorithms optimized for finding the shortest path from one node to all other nodes or between all pairs of nodes in a graph. We finished with the Random Walk algorithm, which can be used to find arbitrary sets of paths.

Next we'll learn about Centrality algorithms that can be used to find influential nodes in a graph.

Source: https://www.oreilly.com/library/view/graph-algorithms/9781492047674/ch04.html