Sunday, January 06, 2008

New challenges, new synthax

This post wants to be:

1. A quick glance at the new "common table expression" (aka hierarchical queries) in Firebird 2.1
2. A call to action for other opensource databases

So, on with 1., Firebird recently added another great feature, common table expressions (CTE) which, to my eye at least, boils down to hierarchical queries.
This is basically the ability to efficiently and easily query hierarchical data stored in the most intuitive way, the adjacency list, because the nested set (see Celko's articles and book) might be the most efficient, but it misses one important thing, widespread adoption ...

A recursive CTE is basically made of two parts, a base query which gives you the starting point and a recursive query which is the action to be performed in a "loop".
You see that the structure is:

WITH RECURSIVE which tells us that this CTE will use recursion
n(....) AS ( which describes the output of the base query, n is just an alias, you can use whatever name you like
SELECT ... which is the base query and sets the starting point, this query will be merged with results of the recursive query through an union
UNION ALL which marks the merge described above
SELECT ... which is the recursive query, note that it's joined to the base query through it's alias
) which closes the recursive block
SELECT ... which ends up the CTE and builds it's output, note that in the second example I applied a further where condition to this last query, something that might be non optimal.

I'll use an example table which mimics a BOM (Bill Of Materials) taken from the web.

  1. CREATE TABLE BILL_OF_MATERIAL(
  2. PART_NO Varchar(50) NOT NULL,
  3. PARENT_PART_NO Varchar(50),
  4. REVISION CHAR(1),
  5. QUANTITY Integer,
  6. UOM CHAR(2) NOT NULL,
  7. DESCRIPTION Varchar(255),
  8. MAKE_BUY CHAR(1) NOT NULL,
  9. CONSTRAINT PK_BOM PRIMARY KEY (PART_NO)
  10. );
  11. CREATE INDEX IDX_PARENT_PART_NO ON BILL_OF_MATERIAL (PARENT_PART_NO);
  12. GRANT DELETE, INSERT, REFERENCES, SELECT, UPDATE
  13. ON BILL_OF_MATERIAL TO "SYSDBA" WITH GRANT OPTION;

Then I'll populate it with some sample rows, the result should be


A query to retrieve BOM for a specific item:

  1. WITH RECURSIVE n(lvl, part_no, description) AS (SELECT 1 lvl, part_no, description
  2. FROM BILL_OF_MATERIAL
  3. WHERE description = 'Field Adapter'
  4. UNION ALL
  5. SELECT n.lvl + 1, nplus1.part_no, nplus1.description
  6. FROM BILL_OF_MATERIAL AS nplus1, n
  7. WHERE n.part_no = nplus1.parent_part_no)
  8. SELECT lvl, part_no, description FROM n;

A query to retrieve all buy parts for an item:

  1. WITH RECURSIVE n(part_no, description, quantity, make_buy) AS (SELECT part_no, description, quantity, make_buy
  2. FROM bill_of_material
  3. WHERE description = 'Field Adapter'
  4. UNION ALL
  5. SELECT nplus1.part_no, nplus1.description, nplus1.quantity, nplus1.make_buy
  6. FROM BILL_OF_MATERIAL AS nplus1, n
  7. WHERE n.part_no = nplus1.parent_part_no)
  8. SELECT part_no, description, quantity FROM n WHERE make_buy = 'B';

And the result is:



About 2., I'd love to see other mainstream opensource databases run at this, not because MsSQL (standard WITH ... synthax), Oracle (proprietary) and DB2 (standard WITH ... synthax) support it, but because it helps solving real life problems which are becoming more and more common.
I mean, PostgreSQL has a patch floating around which reproduces Oracle's non standard synthax, this patch has been somewhat maintained for a long time, this means that there is active interest, why don't they add it to the codebase? Just for the sake of standards?
And what about MySQL?

As always comments and corrections are more than welcome!

BTW: for comparison, you can find an example about BOM and nested sets here.

UPDATE: Looks like PostgreSQL users and developers felt the hitch, a patch to add CTE has been committed on 04-10-2008 see here.