[DDC-349] Add support for specifying precedence in joins in DQL Created: 18/Feb/10  Updated: 01/May/13

Status: Open
Project: Doctrine 2 - ORM
Component/s: DQL
Affects Version/s: 2.0-ALPHA4
Fix Version/s: None
Security Level: All

Type: Improvement Priority: Major
Reporter: Dennis Verspuij Assignee: Roman S. Borschel
Resolution: Unresolved Votes: 1
Labels: None

Attachments: Text File DDC349Test.patch    
Issue Links:
Duplicate
is duplicated by DDC-1256 Generated SQL error with DQL WITH and... Resolved

 Description   

This request is in followup to my doctrine-user message "Doctrine 2.0: Nested joins'.
I am a bit surprised by the responses in that defining precedences in joins by placing parenthesis around join expressions is not well-known. Although not in the original SQL92 specification it is a major and important feature offered by all the RDBMS's that Doctrine 2 supports, and oftenly performs better than using subselects or alike. Doctrine 1 did not support it, but imho Doctrine 2 should support it to be a mature allround ORM.

As a short example the following is a SQL statement with a nested join, where the nesting is absolutely necessary to return only a's together with either both b's and c's or no b's and c's at all:

SELECT *
FROM a A
LEFT JOIN (
b B
INNER JOIN c C ON C.b_id = B.id
) ON B.a_id = A.id

In order for Doctrine 2 to support this the BNF should be something like:
Join ::= ["LEFT" ["OUTER"] | "INNER"] "JOIN" ( "(" JoinAssociationPathExpression ["AS"] AliasIdentificationVariable Join ")" | JoinAssociationPathExpression ["AS"] AliasIdentificationVariable ) [("ON" | "WITH") ConditionalExpression]
instead of the current:
Join ::= ["LEFT" ["OUTER"] | "INNER"] "JOIN" JoinAssociationPathExpression ["AS"] AliasIdentificationVariable [("ON" | "WITH") ConditionalExpression]

This would allow DQL like:

SELECT A, B, C
FROM a A
LEFT JOIN (
A.b B
INNER JOIN B.c C
) WITH B.something = 'value' AND C.something = 'othervalue'

What further needs to be done is that the DQL parser loosly couples the ConditionalExpression to any of the previously parsed JoinAssociationPathExpression's instead of tieing it explicitely to the JoinAssociationPathExpression that preceedes it according to the old BNF notation. The new BNF should however not require any changes to the hydrator. Therefore I have the feeling that improving the DQL parser for nested joins does not require extensive work, while the benefit of running these kind of queries is considerable.

As an extra substantiation here are links to (BNF) FROM clause documentations of the RDBMS's that Doctrine 2 supports, they all show support for nested joins:
MySQL: http://dev.mysql.com/doc/refman/5.0/en/join.html
PostgreSQL: http://www.postgresql.org/docs/8.4/interactive/sql-select.html#SQL-FROM and http://www.postgresql.org/docs/8.1/interactive/explicit-joins.html
MSSQL: http://msdn.microsoft.com/en-us/library/ms177634.aspx
Oracle: http://download.oracle.com/docs/cd/E11882_01/server.112/e10592/statements_10002.htm#CHDDCHGF
SQLite: http://www.sqlite.org/syntaxdiagrams.html#single-source

I surely hope you will consider implementing this improvement because it would save me and others from the hassle of writing raw SQL queries or executing multiple (thus slow) queries in DQL for doing the same. Thanks anyway for the great product so far!



 Comments   
Comment by Guilherme Blanco [ 13/Apr/10 ]

This seems to be a valid issue to me.

This implementation is the actual solution to associations retrieval that are inherited (type joined).

Example:

/** Joined */
class Base {}

class Foo extends Base {}

class Bar {
    public $foo;
}

// This causes the CTI to link as INNER JOIN, which makes the result become 0
// il if you have no Foo's defined (although it should ignore this)
$q = $this->_em->createQuery('SELECT b, f FROM Bar b LEFT JOIN b.foo f'); 
Comment by Roman S. Borschel [ 13/Apr/10 ]

Yes, this is a possible solution for DDC-512 but on the SQL level. I still don't see this as appropriate for DQL, it just doesnt make sense to me, DQL joins object associations, there is no precedence.

Comment by Roman S. Borschel [ 13/Apr/10 ]

So, no, this has nothing to do with DDC-512. DDC-512 can even be fixed differently as outlined in my comments there.

Comment by Roman S. Borschel [ 13/Apr/10 ]

On a side note I would still like to know/see the following for this issue:

  • Some realisitic DQL examples where this feature would be essential, i.e. there is no other way to do it.
    This also means explaining what the impact on the resulting object graph is and why it makes sense.
  • Which other ORMs support this on the OQL/Criteria level?

So far, my stance on this issue is:

1) It doesnt make sense (semantically) in DQL
2) Its rarely needed
3) When you really need it you can use a NativeQuery anyway and use this nesting in SQL, where it probably belongs and makes more sense
4) It would (unnecessarily) complicate DQL

Thus I am currently leaning towards "Wont fix" for this issue.

Comment by Dennis Verspuij [ 13/Apr/10 ]

Hi Roman. I understand your doubts, and I have been breaking my head over
creating a realistic example the last few hours that would hopefully convince
you for implementing this feature. But actually I cannot find one that you wouldn't
consider to be trivial. I do have a number of very complex optimized queries written
for sportskickoff dot com (using Doctrine 1.2) but they are probably hard to understand
because they may not be selfdescribing. Below is one example literally ripped from
the application. Still they often can be broken down to my example query in this
ticket's description, but applied grouping, additional other joins on the root component
and/or other criteria made them impossible to rewrite using subselects or choosing
another root component. Most often they just performed way best using the nested
syntax and saved me a number of additional queries.

SELECT A.id, A.username, A.balance, COALESCE(SUM(B.stake), 0) AS sumstake, COUNT(B.id) AS nrbets
FROM account A
LEFT JOIN (
bet B
INNER JOIN game G ON G.id = :GAMEID AND B.timestampcompletion BETWEEN G.timestampstart AND G.timestampend
) ON B.accountid = A.id AND B.timestampcompletion IS NOT NULL
WHERE A.Status & :ACTIVEORDISQUALIFIED = :ACTIVE
GROUP BY A.id, A.username, A.balance
ORDER BY A.balance DESC, sumstake ASC, nrbets ASC, A.username ASC

But let's put it another way. I would also like this feature to be supported in DQL
because I just do not want to use native queries. Why would I want to use native
queries if it can be done using DQL? In DQL I work with class names and field
names, and they may differ from the underlying table and column names. Doctrine
takes care of that mapping based on my schema/annotations and I do not
have to "know" these mappings. In native queries I suddenly do have to "know"
these mappings. I use Doctrine because it makes my application portable and
enables me to work with my database in an OOP way like I do in my model,
abstracting things. The need for native queries partly reverts the benefits Doctrine
offers in the first place.

Btw, I recall to have successfully used the nested join syntax in HQL (.NET Hibernate)
but I cannot find examples on the web or a BNF notation.

Furthermore, in reply to your stances:
1) It indeed doesnt make sense (semantically) in DQL, it only makes the result
set different, but not the way data is hydrated into objects;
2) Its indeed rarely needed for inserting, updating and populating basic lists but
it allows you to better select what combinations of associated rows are joined
and which not in more optimized queries without having to use native queries,
or because they perform better than using subseletcs and alike.
3) Not having to use native queries is just an extra reason for using Doctrine and
maintains the abstraction the ORM provides througout on'es whole application
4) Why would it complicate DQL, if people do not know about or understand
the feature it wouldn't matter because not using parenthesises is the default
way to specify joins?

Well, this is it, can't find any more words to promote and make you enthusiastic.... lol.

Comment by Dennis Verspuij [ 13/Apr/10 ]

Ok, I have not given up yet... , here's a "stupid" example.

Imagine a book store that sells books of various authors and keeps track of those sales.
Let's say you would have an admin page that lists all authors, and for each author
its also shows the books and their sales dates since january 1st, but only for those
books that were actually sold and contain an A in its name. An optimized SQL query
to fetch all the information at once would be something like:

SELECT A., B., S.*
FROM author A
LEFT JOIN (
book B
INNER JOIN sale S ON S.book_id = B.id AND S.dt >= '2010-01-01'
) ON B.author_id = A.id AND A.name LIKE '%A%'

In DQL it would then be something like:

SELECT A., B., S.*
FROM author A
LEFT JOIN (
book B
INNER JOIN sale S WITH S.dt >= '2010-01-01'
) WITH A.name LIKE '%A%'

If the database would contain thousands of books, but sales for just a
few books, this will definitely perform better than using subselects.
Off course one would like to fetch array graphs instead of objects for
further optimization, but this hopefully shows my point.

I have attached a test casefor a similar query, though without the additional
join constraints for clarity. I surely hope you can consider it.

One last note, you shouldn't be afraid that nesting joins is not in the
ansi SQL spec. Select queries are about record sets and products
between these sets, tables are just the basic means of providing record
sets to the query. This is an important terminological difference to think about.
Specifying precedence with parenthesis around joins is a logical and
natural evolution of the ansi sql standard. For example views are a good
proof of this concept, I could define book B INNER JOIN sale S as a view
and LEFT JOIN that to authors to get effectively the same result
set as the above example. The database server would internally perform the
same query (though may additionally take indexes on the view into account).
That said, rdbm's that support this syntax would certainly never drop the
feature, as its not a feature but just plain logical and smart querying!

P.S. I had a hard time finding out how to run the test cases, I could not find
it in the Doctrine 2 documentation, development wiki, cookbook or any other
place, while finally it was as easy as running phpunit Doctrine_Tests_AllTests
from within the tests/ directory, or just phpunit Doctrine_Tests_ORM_Functional_Ticket_DDC349Test
for my test. Could you please add some info about this somewhere, it might
save others some googling.

Comment by Dennis Verspuij [ 13/Apr/10 ]

Test case as SVN patch using a parenthesized join.
Just remove the parenthesises from the query to have it fail...

Comment by Roman S. Borschel [ 29/May/10 ]

@"The need for native queries partly reverts the benefits Doctrine offers in the first place."

That is something I hugely disagree with. Neither SQL abstraction, nor database vendor independence is the main purpose of an ORM like Doctrine 2.
It is the state management of your objects, the transparent change tracking, lazy-loading and synchronization of the object state with the database state and nothing of this gets lost when using native queries.

We could rip out DQL and any other querying mechanism except a basic find() (and lazy-loading, of course), only providing the native query facility and even only supporting MySQL and would still retain all the core ORM functionality.

NativeQuery is one of the best and core "features" of the project. It is even the foundation for DQL. A DQL query is nothing more than an additional (beautiful) abstraction but what comes out is a native query + a ResultSetMapping, the same thing you can build yourself in the first place, even using the mapping metadata to construct the query. Nothing forces you to hardcode table and column names in native queries if you don't want that. Just use the mapping metadata, DQL does the same.

SQL abstraction and database vendor independence is icing on the cake, not the heart of the ORM.

Generated at Wed Nov 26 14:16:33 UTC 2014 using JIRA 6.2.3#6260-sha1:63ef1d6dac3f4f4d7db4c1effd405ba38ccdc558.