Recursive query example

 

A recursive query is one that is defined by a Union All with an initialization fullselect that seeds the recursion and an iterative fullselect that contains a direct reference to itself in the FROM clause.

There are additional restrictions as to what can be specified in the definition of a recursive query and those restrictions can be found in the SQL Programming. A key restriction is that query functions like grouping, aggregation or distinct that require a materialization of all the qualifying records before performing the function cannot be allowed within the iterative fullselect itself and must be requested in the main query, allowing the recursion to complete.

The following is an example of a recursive query over a table called flights, that contains information about departure and arrival cities. The query returns all the flight destinations available by recursion from the two specified cities (New York and Chicago) and the number of connections and total cost to arrive at that final destination.

Because this example uses the recursion process to also accumulate information like the running cost and number of connections, four values are actually put on the queue entry. These values are:

Typically the data needed for the queue entry is less then the full record (sometimes much less) although that is not the case for this example.

CREATE TABLE flights 
	( 
		departure CHAR (10) NOT NULL WITH DEFAULT,   
		arrival CHAR (10) NOT NULL WITH DEFAULT,  
		carrier CHAR (15) NOT NULL WITH DEFAULT, 
		flight_num CHAR (5) NOT NULL WITH DEFAULT,   
		ticket INT NOT NULL WITH DEFAULT)


WITH destinations (departure, arrival, connects, cost ) AS ( SELECT f.departure,f.arrival, 0, ticket FROM flights f WHERE f.departure = 'Chicago' OR f.departure = 'New York' UNION ALL SELECT r.departure, b.arrival, r.connects + 1, r.cost + b.ticket FROM destinations r, flights b WHERE r.arrival = b.departure )

SELECT DISTINCT departure, arrival, connects, cost FROM destinations

The following is the initialization fullselect of the above query. It seeds the rows that will start the recursion process. It provides the initial destinations (arrival cities) that are a direct flight from Chicago or New York.

SELECT  f.departure,f.arrival, 0, ticket 
	FROM flights f 
	WHERE f.departure='Chicago' OR
      	f.departure='New York'

The following is the iterative fullselect of the above query. It contains a single reference in the FROM clause to the destinations recursive common table expression and will source further recursive joins to the same flights table. The arrival values of the parent row (initially direct flights from New York or Chicago) are joined with the departure value of the subsequent child rows. It is important to identify the correct parent/child relationship on the recursive join predicate or infinite recursion can occur. Other local predicates can also be used to limit the recursion. For example, if you want a limit of at most 3 connecting flights, a local predicate using the accumulating connection count, r.connects<=3, can be specified.

SELECT
  r.departure, b.arrival, r.connects + 1 ,
  r.cost + b.ticket 
  FROM  destinations r, flights b 
	WHERE r.arrival=b.departure

The main query is the query that references the recursive common table expression or view. It is in the main query where requests like grouping, ordering and distinct will be specified.

SELECT DISTINCT departure, arrival, connects, cost 
	FROM destinations 

 

Implementation considerations

To implement a source for the recursion, a new temporary data object is provided called a queue. As rows meet the requirements of either the initialization fullselect or the iterative fullselect and are pulled up through the union all, values necessary to feed the continuing recursion process are captured and placed in an entry on the queue , an enqueue operation. At query runtime, the queue data source then takes the place of the recursive reference in the common table expression or view. The iterative fullselect processing ends when the queue is exhausted of entries or a fetch N rows limitation has been met. Because the recursive queue feeds the recursion process and holds transient data, the join between the dequeue of these queue entries and the rest of the fullselect tables will always be a constrained join, with the queue on the left.

Visual Explain diagram

 

Parent topic:

Recursive query optimization