Doctrine 2 - ORM / DDC-349

Add support for specifying precedence in joins in DQL

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.0-ALPHA4
    • Fix Version/s: None
    • Component/s: DQL
    • Security Level: All
    • Labels:
      None

      Description

      This request is a follow-up to my doctrine-user message "Doctrine 2.0: Nested joins".
      I am a bit surprised by the responses, in that defining precedence in joins by placing parentheses around join expressions appears not to be well known. Although not in the original SQL92 specification, it is a major and important feature offered by all the RDBMSs that Doctrine 2 supports, and it often performs better than using subselects or the like. Doctrine 1 did not support it, but IMHO Doctrine 2 should support it to be a mature all-round ORM.

      As a short example the following is a SQL statement with a nested join, where the nesting is absolutely necessary to return only a's together with either both b's and c's or no b's and c's at all:

      SELECT *
      FROM a A
      LEFT JOIN (
      b B
      INNER JOIN c C ON C.b_id = B.id
      ) ON B.a_id = A.id
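
To make the difference concrete, here is a minimal sketch (not part of the original report) that runs the nested join and its two flat alternatives against an in-memory SQLite database through PDO. The rows and the result aliases a_pk, b_pk and c_pk are invented for illustration, and the pdo_sqlite extension is assumed to be available.

```php
<?php
// Illustrative sketch only: the rows below are invented, and pdo_sqlite is assumed.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->exec('CREATE TABLE a (id INTEGER PRIMARY KEY)');
$pdo->exec('CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER)');
$pdo->exec('CREATE TABLE c (id INTEGER PRIMARY KEY, b_id INTEGER)');

// a=1 has a b that has a c; a=2 has a b without any c; a=3 has no b at all.
$pdo->exec('INSERT INTO a (id) VALUES (1), (2), (3)');
$pdo->exec('INSERT INTO b (id, a_id) VALUES (10, 1), (20, 2)');
$pdo->exec('INSERT INTO c (id, b_id) VALUES (100, 10)');

$select = 'SELECT A.id AS a_pk, B.id AS b_pk, C.id AS c_pk FROM a A ';

// Nested: the INNER JOIN is evaluated first, then LEFT JOINed to a as one unit,
// so every a is returned and a b only shows up together with a c.
$nested = $pdo->query($select . '
    LEFT JOIN (
        b B
        INNER JOIN c C ON C.b_id = B.id
    ) ON B.a_id = A.id
    ORDER BY A.id')->fetchAll(PDO::FETCH_ASSOC);

// Flat LEFT JOINs: a=2 comes back with its b but without a c.
$flatLeft = $pdo->query($select . '
    LEFT JOIN b B ON B.a_id = A.id
    LEFT JOIN c C ON C.b_id = B.id
    ORDER BY A.id')->fetchAll(PDO::FETCH_ASSOC);

// Flat LEFT + INNER JOIN: every a without a complete b/c pair is dropped.
$flatInner = $pdo->query($select . '
    LEFT JOIN b B ON B.a_id = A.id
    INNER JOIN c C ON C.b_id = B.id
    ORDER BY A.id')->fetchAll(PDO::FETCH_ASSOC);

foreach (['nested' => $nested, 'flatLeft' => $flatLeft, 'flatInner' => $flatInner] as $name => $rows) {
    echo $name, ': ', json_encode($rows), PHP_EOL;
}
```

Only the nested form keeps all three a rows while never exposing a b that lacks a c, which is the behaviour described above.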

      In order for Doctrine 2 to support this the BNF should be something like:
      Join ::= ["LEFT" ["OUTER"] | "INNER"] "JOIN" ( "(" JoinAssociationPathExpression ["AS"] AliasIdentificationVariable Join ")" | JoinAssociationPathExpression ["AS"] AliasIdentificationVariable ) [("ON" | "WITH") ConditionalExpression]
      instead of the current:
      Join ::= ["LEFT" ["OUTER"] | "INNER"] "JOIN" JoinAssociationPathExpression ["AS"] AliasIdentificationVariable [("ON" | "WITH") ConditionalExpression]

      This would allow DQL like:

      SELECT A, B, C
      FROM a A
      LEFT JOIN (
      A.b B
      INNER JOIN B.c C
      ) WITH B.something = 'value' AND C.something = 'othervalue'

      What further needs to be done is that the DQL parser loosely couples the ConditionalExpression to any of the previously parsed JoinAssociationPathExpressions, instead of tying it explicitly to the JoinAssociationPathExpression that precedes it, as the old BNF notation does. The new BNF should, however, not require any changes to the hydrator. Therefore I have the feeling that improving the DQL parser for nested joins does not require extensive work, while the benefit of being able to run these kinds of queries is considerable.
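
As a rough illustration of the proposed production, the following sketch shows how a recursive-descent recognizer could handle the parenthesized form. It is not Doctrine's actual parser: the function names (tokenize, keywordAt, expectToken, parsePathAndAlias, parseJoin) and the array-shaped result are invented here, and the ConditionalExpression is only collected as raw text instead of being parsed and resolved against the aliases.

```php
<?php
// Hypothetical sketch, not Doctrine's parser: recognizes the proposed Join
// production and returns a nested array. Conditions are kept as raw text.

function tokenize(string $dql): array
{
    // Parentheses, single-quoted strings, or runs of other non-space characters.
    preg_match_all("/\\(|\\)|'[^']*'|[^\\s()]+/", $dql, $matches);
    return $matches[0];
}

function keywordAt(array $tokens, int $pos): string
{
    return strtoupper($tokens[$pos] ?? '');
}

function expectToken(array $tokens, int &$pos, string $expected): void
{
    if (keywordAt($tokens, $pos) !== $expected) {
        throw new RuntimeException("Expected '$expected' at token $pos");
    }
    $pos++;
}

function parsePathAndAlias(array $tokens, int &$pos): array
{
    if (!isset($tokens[$pos])) {
        throw new RuntimeException('Expected a join association path');
    }
    $path = $tokens[$pos];
    $pos++;
    if (keywordAt($tokens, $pos) === 'AS') {
        $pos++; // the AS keyword is optional
    }
    if (!isset($tokens[$pos])) {
        throw new RuntimeException('Expected an alias');
    }
    $alias = $tokens[$pos];
    $pos++;
    return [$path, $alias];
}

function parseJoin(array $tokens, int &$pos): array
{
    $type = 'INNER';
    if (keywordAt($tokens, $pos) === 'LEFT') {
        $type = 'LEFT';
        $pos++;
        if (keywordAt($tokens, $pos) === 'OUTER') {
            $pos++;
        }
    } elseif (keywordAt($tokens, $pos) === 'INNER') {
        $pos++;
    }
    expectToken($tokens, $pos, 'JOIN');

    $nested = null;
    if (keywordAt($tokens, $pos) === '(') {
        // New alternative: "(" path alias Join ")"
        $pos++;
        [$path, $alias] = parsePathAndAlias($tokens, $pos);
        $nested = parseJoin($tokens, $pos);
        expectToken($tokens, $pos, ')');
    } else {
        [$path, $alias] = parsePathAndAlias($tokens, $pos);
    }

    // Optional ON/WITH: collect tokens up to the group's closing parenthesis,
    // the next join keyword, or the end of input.
    $condition = [];
    if (in_array(keywordAt($tokens, $pos), ['ON', 'WITH'], true)) {
        $pos++;
        $depth = 0;
        while ($pos < count($tokens)) {
            $token = $tokens[$pos];
            if ($token === '(') {
                $depth++;
            } elseif ($token === ')') {
                if ($depth === 0) {
                    break;
                }
                $depth--;
            } elseif ($depth === 0 && in_array(strtoupper($token), ['LEFT', 'INNER', 'JOIN'], true)) {
                break;
            }
            $condition[] = $token;
            $pos++;
        }
    }

    return [
        'type' => $type,
        'path' => $path,
        'alias' => $alias,
        'nested' => $nested,
        'condition' => implode(' ', $condition),
    ];
}

$pos = 0;
$ast = parseJoin(
    tokenize("LEFT JOIN ( A.b B INNER JOIN B.c C ) WITH B.something = 'value' AND C.something = 'othervalue'"),
    $pos
);
print_r($ast);
```

In this sketch the WITH condition ends up attached to the outer group, so a real implementation would still have to decide which alias each part of the condition belongs to when generating SQL.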

      As further substantiation, here are links to the FROM clause documentation (with BNF) of the RDBMSs that Doctrine 2 supports; they all show support for nested joins:
      MySQL: http://dev.mysql.com/doc/refman/5.0/en/join.html
      PostgreSQL: http://www.postgresql.org/docs/8.4/interactive/sql-select.html#SQL-FROM and http://www.postgresql.org/docs/8.1/interactive/explicit-joins.html
      MSSQL: http://msdn.microsoft.com/en-us/library/ms177634.aspx
      Oracle: http://download.oracle.com/docs/cd/E11882_01/server.112/e10592/statements_10002.htm#CHDDCHGF
      SQLite: http://www.sqlite.org/syntaxdiagrams.html#single-source

      I surely hope you will consider implementing this improvement because it would save me and others from the hassle of writing raw SQL queries or executing multiple (thus slow) queries in DQL for doing the same. Thanks anyway for the great product so far!

          Activity

          Roman S. Borschel added a comment -

          @"The need for native queries partly reverts the benefits Doctrine offers in the first place."

          That is something I hugely disagree with. Neither SQL abstraction, nor database vendor independence is the main purpose of an ORM like Doctrine 2.
          It is the state management of your objects, the transparent change tracking, lazy-loading and synchronization of the object state with the database state and nothing of this gets lost when using native queries.

          We could rip out DQL and any other querying mechanism except a basic find() (and lazy-loading, of course), only providing the native query facility and even only supporting MySQL and would still retain all the core ORM functionality.

          NativeQuery is one of the best and core "features" of the project. It is even the foundation for DQL. A DQL query is nothing more than an additional (beautiful) abstraction but what comes out is a native query + a ResultSetMapping, the same thing you can build yourself in the first place, even using the mapping metadata to construct the query. Nothing forces you to hardcode table and column names in native queries if you don't want that. Just use the mapping metadata, DQL does the same.

          SQL abstraction and database vendor independence is icing on the cake, not the heart of the ORM.
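
A minimal sketch of that idea, assuming hypothetical Author, Book and Sale entities: the native SQL uses the nested join from this ticket, takes its table and column names from the mapping metadata, and is hydrated through a ResultSetMapping. The entity names, field names, the Author::$books association and the join column names (book_id, author_id) are assumptions made for illustration only.

```php
<?php
use Doctrine\ORM\EntityManager;
use Doctrine\ORM\Query\ResultSetMapping;

/**
 * Hypothetical example: Author, Book and Sale are assumed entity classes
 * (Author has "id" and "name", Book has "id" and "title", and Author::$books
 * is a one-to-many association). The join column names are assumptions too.
 */
function findAuthorsWithSoldBooks(EntityManager $em)
{
    $author = $em->getClassMetadata('Author');
    $book   = $em->getClassMetadata('Book');
    $sale   = $em->getClassMetadata('Sale');

    // Table and column names are looked up in the mapping metadata.
    $sql = sprintf(
        'SELECT a.%s AS a_id, a.%s AS a_name, b.%s AS b_id, b.%s AS b_title
         FROM %s a
         LEFT JOIN (
             %s b
             INNER JOIN %s s ON s.book_id = b.%s
         ) ON b.author_id = a.%s',
        $author->getColumnName('id'),
        $author->getColumnName('name'),
        $book->getColumnName('id'),
        $book->getColumnName('title'),
        $author->getTableName(),
        $book->getTableName(),
        $sale->getTableName(),
        $book->getColumnName('id'),
        $author->getColumnName('id')
    );

    // Describe how the result columns map back onto entities and fields.
    $rsm = new ResultSetMapping();
    $rsm->addEntityResult('Author', 'a');
    $rsm->addFieldResult('a', 'a_id', 'id');
    $rsm->addFieldResult('a', 'a_name', 'name');
    $rsm->addJoinedEntityResult('Book', 'b', 'a', 'books');
    $rsm->addFieldResult('b', 'b_id', 'id');
    $rsm->addFieldResult('b', 'b_title', 'title');

    return $em->createNativeQuery($sql, $rsm)->getResult();
}
```

Calling findAuthorsWithSoldBooks($em) with a configured EntityManager would return managed Author objects, so change tracking and lazy-loading keep working as described above.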

          Dennis Verspuij added a comment -

          Test case as SVN patch using a parenthesized join.
          Just remove the parentheses from the query to have it fail...

          Dennis Verspuij added a comment - - edited

          Ok, I have not given up yet... here's a "stupid" example.

          Imagine a book store that sells books of various authors and keeps track of those sales.
          Let's say you have an admin page that lists all authors, and for each author
          it also shows the books and their sales dates since January 1st, but only for those
          books that were actually sold and contain an A in their name. An optimized SQL query
          to fetch all the information at once would be something like:

          SELECT A.*, B.*, S.*
          FROM author A
          LEFT JOIN (
          book B
          INNER JOIN sale S ON S.book_id = B.id AND S.dt >= '2010-01-01'
          ) ON B.author_id = A.id AND A.name LIKE '%A%'

          In DQL it would then be something like:

          SELECT A.*, B.*, S.*
          FROM author A
          LEFT JOIN (
          book B
          INNER JOIN sale S WITH S.dt >= '2010-01-01'
          ) WITH A.name LIKE '%A%'

          If the database contains thousands of books, but sales for just a
          few books, this will definitely perform better than using subselects.
          Of course one would like to fetch array graphs instead of objects for
          further optimization, but this hopefully shows my point.

          I have attached a test case for a similar query, though without the additional
          join constraints for clarity. I surely hope you can consider it.

          One last note: you shouldn't be afraid that nesting joins is not in the
          ANSI SQL spec. Select queries are about record sets and products
          between these sets; tables are just the basic means of providing record
          sets to the query. This is an important terminological difference to think about.
          Specifying precedence with parentheses around joins is a logical and
          natural evolution of the ANSI SQL standard. Views are a good
          proof of this concept: I could define book B INNER JOIN sale S as a view
          and LEFT JOIN that to authors to get effectively the same result
          set as the above example. The database server would internally perform the
          same query (though it may additionally take indexes on the view into account).
          That said, RDBMSs that support this syntax would certainly never drop the
          feature, as it's not a feature but just plain logical and smart querying!
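
The view argument can be checked with a small sketch like the one below, which compares the nested join with a LEFT JOIN onto an equivalent view in an in-memory SQLite database. The schema, the view name sold_book and the sample rows are invented for illustration, and the pdo_sqlite extension is assumed.

```php
<?php
// Illustrative sketch: invented schema and rows, in-memory SQLite via PDO.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->exec('CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT)');
$pdo->exec('CREATE TABLE book (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT)');
$pdo->exec('CREATE TABLE sale (id INTEGER PRIMARY KEY, book_id INTEGER, dt TEXT)');

// Author 1 has one sold and one unsold book; author 2 has only an unsold book.
$pdo->exec("INSERT INTO author (id, name) VALUES (1, 'Alice'), (2, 'Bob')");
$pdo->exec("INSERT INTO book (id, author_id, title) VALUES (10, 1, 'A'), (11, 1, 'B'), (20, 2, 'C')");
$pdo->exec("INSERT INTO sale (id, book_id, dt) VALUES (1, 10, '2010-02-01')");

// The inner part of the nested join, defined as a view.
$pdo->exec('CREATE VIEW sold_book AS
    SELECT B.id AS book_id, B.author_id AS author_id, S.dt AS sold_on
    FROM book B
    INNER JOIN sale S ON S.book_id = B.id');

$nested = $pdo->query('
    SELECT A.id AS author_id, B.id AS book_id, S.dt AS sold_on
    FROM author A
    LEFT JOIN (
        book B
        INNER JOIN sale S ON S.book_id = B.id
    ) ON B.author_id = A.id
    ORDER BY A.id, B.id')->fetchAll(PDO::FETCH_ASSOC);

$viaView = $pdo->query('
    SELECT A.id AS author_id, V.book_id AS book_id, V.sold_on AS sold_on
    FROM author A
    LEFT JOIN sold_book V ON V.author_id = A.id
    ORDER BY A.id, V.book_id')->fetchAll(PDO::FETCH_ASSOC);

echo json_encode($nested), PHP_EOL;
echo json_encode($viaView), PHP_EOL;
var_dump($nested == $viaView);
```

Both queries return every author once per sold book, plus a row with empty book columns for authors without any sale.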

          P.S. I had a hard time finding out how to run the test cases. I could not find
          it in the Doctrine 2 documentation, development wiki, cookbook or any other
          place, while in the end it was as easy as running phpunit Doctrine_Tests_AllTests
          from within the tests/ directory, or just phpunit Doctrine_Tests_ORM_Functional_Ticket_DDC349Test
          for my test. Could you please add some info about this somewhere? It might
          save others some googling.

          Dennis Verspuij added a comment -

          Hi Roman. I understand your doubts, and I have been racking my brain for
          the last few hours to create a realistic example that would hopefully convince
          you to implement this feature. But actually I cannot find one that you wouldn't
          consider to be trivial. I do have a number of very complex optimized queries written
          for sportskickoff dot com (using Doctrine 1.2), but they are probably hard to understand
          because they may not be self-describing. Below is one example literally ripped from
          the application. They can often be broken down to my example query in this
          ticket's description, but applied grouping, additional joins on the root component
          and/or other criteria made them impossible to rewrite using subselects or by choosing
          another root component. Most often they simply performed best using the nested
          syntax and saved me a number of additional queries.

          SELECT A.id, A.username, A.balance, COALESCE(SUM(B.stake), 0) AS sumstake, COUNT(B.id) AS nrbets
          FROM account A
          LEFT JOIN (
          bet B
          INNER JOIN game G ON G.id = :GAMEID AND B.timestampcompletion BETWEEN G.timestampstart AND G.timestampend
          ) ON B.accountid = A.id AND B.timestampcompletion IS NOT NULL
          WHERE A.Status & :ACTIVEORDISQUALIFIED = :ACTIVE
          GROUP BY A.id, A.username, A.balance
          ORDER BY A.balance DESC, sumstake ASC, nrbets ASC, A.username ASC

          But let's put it another way. I would also like this feature to be supported in DQL
          because I just do not want to use native queries. Why would I want to use native
          queries if it can be done using DQL? In DQL I work with class names and field
          names, and they may differ from the underlying table and column names. Doctrine
          takes care of that mapping based on my schema/annotations and I do not
          have to "know" these mappings. In native queries I suddenly do have to "know"
          these mappings. I use Doctrine because it makes my application portable and
          enables me to work with my database in an OOP way like I do in my model,
          abstracting things. The need for native queries partly reverts the benefits Doctrine
          offers in the first place.

          Btw, I recall having successfully used the nested join syntax in HQL (.NET Hibernate)
          but I cannot find examples on the web or a BNF notation.

          Furthermore, in reply to your stances:
          1) It indeed doesn't make sense (semantically) in DQL; it only makes the result
          set different, but not the way data is hydrated into objects.
          2) It's indeed rarely needed for inserting, updating and populating basic lists, but
          it allows you to better select which combinations of associated rows are joined
          and which are not in more optimized queries without having to use native queries,
          or because they perform better than using subselects and the like.
          3) Not having to use native queries is just an extra reason for using Doctrine and
          maintains the abstraction the ORM provides throughout one's whole application.
          4) Why would it complicate DQL? If people do not know about or understand
          the feature it wouldn't matter, because not using parentheses is the default
          way to specify joins.

          Well, this is it, can't find any more words to promote and make you enthusiastic.... lol.

          Roman S. Borschel added a comment - - edited

          On a side note I would still like to know/see the following for this issue:

          • Some realistic DQL examples where this feature would be essential, i.e. there is no other way to do it.
            This also means explaining what the impact on the resulting object graph is and why it makes sense.
          • Which other ORMs support this on the OQL/Criteria level?

          So far, my stance on this issue is:

          1) It doesn't make sense (semantically) in DQL
          2) It's rarely needed
          3) When you really need it you can use a NativeQuery anyway and use this nesting in SQL, where it probably belongs and makes more sense
          4) It would (unnecessarily) complicate DQL

          Thus I am currently leaning towards "Won't Fix" for this issue.

          Roman S. Borschel added a comment -

          So, no, this has nothing to do with DDC-512. DDC-512 can even be fixed differently as outlined in my comments there.

          Roman S. Borschel added a comment -

          Yes, this is a possible solution for DDC-512, but on the SQL level. I still don't see this as appropriate for DQL; it just doesn't make sense to me. DQL joins object associations; there is no precedence.

          Guilherme Blanco added a comment - - edited

          This seems to be a valid issue to me.

          This implementation is the actual solution for retrieving associations whose target is part of an inheritance hierarchy (type JOINED).

          Example:

          /** Joined */
          class Base {}
          
          class Foo extends Base {}
          
          class Bar {
              public $foo;
          }
          
          // This causes the CTI to link as INNER JOIN, which makes the result become
          // empty if you have no Foo's defined (although it should ignore this)
          $q = $this->_em->createQuery('SELECT b, f FROM Bar b LEFT JOIN b.foo f'); 
          

            People

            • Assignee: Roman S. Borschel
            • Reporter: Dennis Verspuij
            • Votes: 1
            • Watchers: 1