Doctrine 2 - ORM
  1. Doctrine 2 - ORM
  2. DDC-176

Query::iterate is broken when using JOIN

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0-ALPHA4
    • Component/s: ORM
    • Security Level: All
    • Labels:
      None

      Description

      These two queries return different result counts:

      $q = $em->createQuery('SELECT c, t FROM Entity\General\Country c LEFT JOIN c.Translations t ON t.id = c.id');
      $results = $q->execute();
      echo count($results) . " Results" . PHP_EOL;
      
      $q = $em->createQuery('SELECT c, t FROM Entity\General\Country c LEFT JOIN c.Translations t ON t.id = c.id');
      $results = $q->iterate();
      $count = 0;
      while ($results->next()) { $count++; }
      echo $count . " Results" . PHP_EOL;
      

      After investigating a bit into it, I found out that $results->next() delivers "null" results when the joined table has more than one value (i.e. in this case a Country has more then one Translation).

      When there is more than 1 Translation, the iterator adds one Translation to the Country and then delivers "null" for the remaining Translations (which is why the while loop stops after the first Country with more than one Translation)...

        Issue Links

          Activity

          Hide
          Roman S. Borschel added a comment -

          This is actually known. The bug here rather is that an exception should be thrown if iterate() is used on a query with fetch-joined collections.

          What is a fetch-joined collections? "select a, b from A a join a.bs b" where "bs" is a collection and it is put in the select clause, so appears in the result.

          iterate() just cant work with those queries. The SQL result set of such queries can be in any order with elements of a collection spread across the result set.
          iterate() can only see 1 row at a time and this is not enough to properly parse results where you have collections in the result set.

          It only works with a) no fetch-joins or b) only single-valued fetch-joins

          Show
          Roman S. Borschel added a comment - This is actually known. The bug here rather is that an exception should be thrown if iterate() is used on a query with fetch-joined collections. What is a fetch-joined collections? "select a, b from A a join a.bs b" where "bs" is a collection and it is put in the select clause, so appears in the result. iterate() just cant work with those queries. The SQL result set of such queries can be in any order with elements of a collection spread across the result set. iterate() can only see 1 row at a time and this is not enough to properly parse results where you have collections in the result set. It only works with a) no fetch-joins or b) only single-valued fetch-joins
          Hide
          Roman S. Borschel added a comment -

          I linked this to 2 other issues you might be interested in also.

          Show
          Roman S. Borschel added a comment - I linked this to 2 other issues you might be interested in also.
          Hide
          Nico Kaiser added a comment -

          Hm. Can't iterate() "look ahead" at the next row and see if it's relevant to the current row (and if, load all joined entities into the current one)?

          I would assume that execute() and iterate() both work the same with any query...

          Show
          Nico Kaiser added a comment - Hm. Can't iterate() "look ahead" at the next row and see if it's relevant to the current row (and if, load all joined entities into the current one)? I would assume that execute() and iterate() both work the same with any query...
          Hide
          Roman S. Borschel added a comment -

          As soon as we "look ahead" we need to look ahead until the end of the result set, because like I said the rows can quickly be out of order. Noone guaruantees you that all the rows that contain collection values are in order.

          As a result, we need to process the whole result as soon as we start with the first row, so we're back to normal hydration.

          iterate() is a way to keep memory consumption better under control when querying large result sets (thousands of rows). But it just does not work with any query. When it comes to raw performance, it is slower. It just helps to keep memory usage low but only if you detach every object after it has been hydrated.

          So this will keep the peek memory usage very low, even if the result set is very large:

          $q = $em->createQuery('select u from MyProject\Model\User u');
          $iterableResult = $q->iterate();
          while ($iterableResult as $row) {
             $user = $row[0];
             // do stuff with $user
            $em->detach($user); // detach so that it can be GC'ed
          }
          
          Show
          Roman S. Borschel added a comment - As soon as we "look ahead" we need to look ahead until the end of the result set, because like I said the rows can quickly be out of order. Noone guaruantees you that all the rows that contain collection values are in order. As a result, we need to process the whole result as soon as we start with the first row, so we're back to normal hydration. iterate() is a way to keep memory consumption better under control when querying large result sets (thousands of rows). But it just does not work with any query. When it comes to raw performance, it is slower. It just helps to keep memory usage low but only if you detach every object after it has been hydrated. So this will keep the peek memory usage very low, even if the result set is very large: $q = $em->createQuery('select u from MyProject\Model\User u'); $iterableResult = $q->iterate(); while ($iterableResult as $row) { $user = $row[0]; // do stuff with $user $em->detach($user); // detach so that it can be GC'ed }
          Hide
          Roman S. Borschel added a comment -
          Show
          Roman S. Borschel added a comment - See also this blog post: http://www.doctrine-project.org/blog/doctrine2-batch-processing
          Hide
          Roman S. Borschel added a comment - - edited

          Here is an example from our hydration tests:

              /**
               * Tests that the hydrator does not rely on a particular order of the rows
               * in the result set.
               *
               * DQL:
               * select c, b from Doctrine\Tests\Models\Forum\ForumCategory c inner join c.boards b
               * order by c.position asc, b.position asc
               *
               * Checks whether the boards are correctly assigned to the categories.
               *
               * The 'evil' result set that confuses the object population is displayed below.
               *
               * c.id  | c.position | c.name   | boardPos | b.id | b.category_id (just for clarity)
               *  1    | 0          | First    | 0        |   1  | 1
               *  2    | 0          | Second   | 0        |   2  | 2   <--
               *  1    | 0          | First    | 1        |   3  | 1
               *  1    | 0          | First    | 2        |   4  | 1
               */
          

          The first category (1) has boards: (1, 3, 4)
          The second category (2) has only one board: (2)

          You see in that due to the ordering by positions a row of the second category appears in the middle of a series of rows of the first category and its boards.
          And thats just one possible example. Now if we would look ahead during iterate() we dont know where to stop. If we would stop when the ID of the root object changes, you would get the first category with only 1 board in the collection, which is wrong.

          Show
          Roman S. Borschel added a comment - - edited Here is an example from our hydration tests: /** * Tests that the hydrator does not rely on a particular order of the rows * in the result set. * * DQL: * select c, b from Doctrine\Tests\Models\Forum\ForumCategory c inner join c.boards b * order by c.position asc, b.position asc * * Checks whether the boards are correctly assigned to the categories. * * The 'evil' result set that confuses the object population is displayed below. * * c.id | c.position | c.name | boardPos | b.id | b.category_id (just for clarity) * 1 | 0 | First | 0 | 1 | 1 * 2 | 0 | Second | 0 | 2 | 2 <-- * 1 | 0 | First | 1 | 3 | 1 * 1 | 0 | First | 2 | 4 | 1 */ The first category (1) has boards: (1, 3, 4) The second category (2) has only one board: (2) You see in that due to the ordering by positions a row of the second category appears in the middle of a series of rows of the first category and its boards. And thats just one possible example. Now if we would look ahead during iterate() we dont know where to stop. If we would stop when the ID of the root object changes, you would get the first category with only 1 board in the collection, which is wrong.
          Hide
          Benjamin Eberlei added a comment -

          An exception with detailed information is now thrown on iterating fetch join collections.

          Show
          Benjamin Eberlei added a comment - An exception with detailed information is now thrown on iterating fetch join collections.

            People

            • Assignee:
              Benjamin Eberlei
              Reporter:
              Nico Kaiser
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: