Doctrine 2 - ORM
  1. Doctrine 2 - ORM
  2. DDC-1958

pager produces wrong results on postgresql

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 2.3
    • Fix Version/s: 2.3.1
    • Component/s: Tools
    • Security Level: All
    • Labels:
    • Environment:
      * Postgres 9.1, 9.2
      * PHP 5.4

      Description

      The query build by pager to get the subset of PKs to fetch produces wrong results on potgresql (and probably any database), that conforms to the SQL standard. The standard says, that if you wish to have the results in specific order, then you have to specify that by using an ORDER BY clause. If such a clause is not present the database can return the results in whatever order it sees fit.

      Testcase fixtures:

      CREATE TABLE test (
          id integer,
          name text
      );
      
      INSERT INTO test VALUES (1, 'c');
      INSERT INTO test VALUES (2, 'a');
      INSERT INTO test VALUES (3, 'e');
      INSERT INTO test VALUES (4, 'b');
      INSERT INTO test VALUES (5, 'd');
      INSERT INTO test VALUES (6, 'a');
      INSERT INTO test VALUES (7, 'g');
      INSERT INTO test VALUES (8, 'h');
      INSERT INTO test VALUES (9, 'e');
      INSERT INTO test VALUES (10, 'j');
      

      Passing f.e.

      $qb = $this->repository
          ->createQueryBuilder('t')
          ->select('t')
          ->setFirstResult(0)
          ->setMaxResults(5)
          ->addOrderBy('t.name', 'ASC')
      

      to pager produces SQL like this modified for readability

      SELECT DISTINCT id FROM (
          SELECT id, name FROM test ORDER BY name
        ) dctrn_result
        LIMIT 5 OFFSET 0
      

      Now there is nothing wrong with this modified query per se, but there is no ORDER BY clause in the outer query so according to the standard the DB can choose whatever order it seems fit. Now mysql chooses the same order, but postgresql does not and it's probably not the only DB doing so.

      If you are interested in the results, this is the output I'm seeing:

      • postgresql: 8,4,1,5,3
      • mysql : 2,6,4,1,5

      I and my coworker came to the standard compliant solution it was also tested on the dataset above on both postgresql and mysql and it produced equal results. We have found only one corner case this won't work and IMHO that can't be fixed. The problem is when you do a sort on a field from a table that is in 1:n relation to the main table.. e.g tables posts and tags, where one post can have a multiple tags and you want your results sorted by a tag.

      Recipe for a correct query is:

      • remember the ORDER BY fields from original query and then remove them
      • wrap the original query with a DISTINCT query, but add the fields from ORDER BY to the SELECT part of that query and add the whole ORDER BY to the end of it, also add the PK to the order by clause, and add the LIMIT clause
      • wrap the resulting query into another query and select just the id.

      so if I take the example from above the SQL should look like this:

      SELECT id FROM (
        SELECT DISTINCT id, name FROM (
          SELECT id, name FROM test
        ) dctrn_result_inner
        ORDER BY name, id LIMIT 5 OFFSET 0
      ) dctrn_result
      

        Issue Links

          Activity

          Miha Vrhovnik created issue -
          Miha Vrhovnik made changes -
          Field Original Value New Value
          Summary wrong results produced in postgresql pager produces wrong results on postgresql
          Miha Vrhovnik made changes -
          Environment * Postgres 9.2
          * PHP 5.4
          * Postgres 9.1, 9.2
          * PHP 5.4
          Description I'm trying to use the pager with postgres, but the order is all messed up and it even depends on the limit part of the query.

                  $qb = $this->repository
                          ->createQueryBuilder('a')
                          ->select('a', 't')
                          ->join('a.permissions', 'p')
                          ->leftJoin('a.tags', 't')
                          ->where('p.account = :account')
                          ->andWhere('p.flag IN (:flags)')
                          ->setParameter('account', $idAccount)
                          ->setParameter('flags', array(PermissionFlagEnum::Owner))
                          ->setFirstResult($resultOffset)
                          ->setMaxResults($resultLimit)
                          ->addOrderBy('a.id', 'DESC')

          I'm going to list just the 2nd quiery because this is the one that fails.

          SELECT DISTINCT id_asset5 FROM (SELECT a0_.name AS name0, a0_.type AS type1, a0_.is_public AS is_public2, a0_.created_at AS created_at3, a0_.updated_at AS updated_at4, a0_.id_asset AS id_asset5, a1_.tag AS tag6, a1_.tag_canonical AS tag_canonical7, a1_.created_at AS created_at8, a1_.updated_at AS updated_at9, a1_.id_asset_tag AS id_asset_tag10 FROM asset a0_ INNER JOIN asset_permission a2_ ON a0_.id_asset = a2_.ref_asset LEFT JOIN asset_tag a1_ ON a0_.id_asset = a1_.ref_asset WHERE a2_.ref_account = ? AND a2_.flag IN (?) ORDER BY a0_.id_asset DESC) dctrn_result LIMIT 20 OFFSET 0

          This is what I had debugged. The inner query returns the results in the correct order, but the DISTINCT part of outer query returns the 20 results in a completely random order.
          Removing the DISTINCT keyword shows the ids in correct order, but that's not solution as there can be multiple ids and that would also result in the wrong results being returned.

          The solution for postgres I have found to be working is to rewrite the SELECT part of the subselect to:
          SELECT * FROM (SELECT DISTINCT a0_.id_asset FROM asset a0_ INNER JOIN asset_permission a2_ ON a0_.id_asset = a2_.ref_asset LEFT JOIN asset_tag a1_ ON a0_.id_asset = a1_.ref_asset WHERE a2_.ref_account = ? AND a2_.flag IN (?) ORDER BY a0_.id_asset DESC) dctrn_result LIMIT 20 OFFSET 0;
          The query build by pager to get the subset of PKs to fetch produces wrong results on potgresql (and probably any database), that conforms to the SQL standard. The standard says, that if you wish to have the results in specific order, then you have to specify that by using an ORDER BY clause. If such a clause is not present the database can return the results in whatever order it sees fit.

          Testcase fixtures:
          {{
          CREATE TABLE test (
              id integer,
              name text
          );

          INSERT INTO test VALUES (1, 'c');
          INSERT INTO test VALUES (2, 'a');
          INSERT INTO test VALUES (3, 'e');
          INSERT INTO test VALUES (4, 'b');
          INSERT INTO test VALUES (5, 'd');
          INSERT INTO test VALUES (6, 'a');
          INSERT INTO test VALUES (7, 'g');
          INSERT INTO test VALUES (8, 'h');
          INSERT INTO test VALUES (9, 'e');
          INSERT INTO test VALUES (10, 'j');

          }}

          Passing f.e. {{
          $qb = $this->repository
              ->createQueryBuilder('t')
              ->select('t')
              ->setFirstResult(0)
              ->setMaxResults(5)
              ->addOrderBy('t.name', 'ASC')
          }}
          to pager produces SQL like this modified for readability {{ SELECT DISTINCT id FROM (
              SELECT id, name FROM test ORDER BY name
            ) dctrn_result
            LIMIT 5 OFFSET 0
          }}
          Now there is nothing wrong with this modified query per se, but there is no ORDER BY clause in the outer query so according to the standard the DB can choose whatever order it seems fit. Now mysql chooses the same order, but postgresql does not and it's probably not the only DB doing so.

          If you are interested in the results, this is the output I'm seeing:
          * postgresql: 8,4,1,5,3
          * mysql : 2,6,4,1,5

          I and my coworker came to the standard compliant solution it was also tested on the dataset above on both postgresql and mysql and it produced equal results. We have found only one corner case this won't work and IMHO that can't be fixed. The problem is when you do a sort on a field from a table that is in 1:n relation to the main table.. e.g tables posts and tags, where one post can have a multiple tags and you want your results sorted by a tag.

          Recipe for a correct query is:
          * remember the ORDER BY fields from original query and then remove them
          * wrap the original query with a DISTINCT query, but add the fields from ORDER BY to the SELECT part of that query and add the whole ORDER BY to the end of it, also add the PK to the order by clause, and add the LIMIT clause
          * wrap the resulting query into another query and select just the id.

          so if I take the example from above the SQL should look like this:
          {{
          SELECT id FROM (
            SELECT DISTINCT id, name FROM (
              SELECT id, name FROM test
            ) dctrn_result_inner
            ORDER BY name, id LIMIT 5 OFFSET 0
          ) dctrn_result

          }}
          Benjamin Eberlei made changes -
          Description The query build by pager to get the subset of PKs to fetch produces wrong results on potgresql (and probably any database), that conforms to the SQL standard. The standard says, that if you wish to have the results in specific order, then you have to specify that by using an ORDER BY clause. If such a clause is not present the database can return the results in whatever order it sees fit.

          Testcase fixtures:
          {{
          CREATE TABLE test (
              id integer,
              name text
          );

          INSERT INTO test VALUES (1, 'c');
          INSERT INTO test VALUES (2, 'a');
          INSERT INTO test VALUES (3, 'e');
          INSERT INTO test VALUES (4, 'b');
          INSERT INTO test VALUES (5, 'd');
          INSERT INTO test VALUES (6, 'a');
          INSERT INTO test VALUES (7, 'g');
          INSERT INTO test VALUES (8, 'h');
          INSERT INTO test VALUES (9, 'e');
          INSERT INTO test VALUES (10, 'j');

          }}

          Passing f.e. {{
          $qb = $this->repository
              ->createQueryBuilder('t')
              ->select('t')
              ->setFirstResult(0)
              ->setMaxResults(5)
              ->addOrderBy('t.name', 'ASC')
          }}
          to pager produces SQL like this modified for readability {{ SELECT DISTINCT id FROM (
              SELECT id, name FROM test ORDER BY name
            ) dctrn_result
            LIMIT 5 OFFSET 0
          }}
          Now there is nothing wrong with this modified query per se, but there is no ORDER BY clause in the outer query so according to the standard the DB can choose whatever order it seems fit. Now mysql chooses the same order, but postgresql does not and it's probably not the only DB doing so.

          If you are interested in the results, this is the output I'm seeing:
          * postgresql: 8,4,1,5,3
          * mysql : 2,6,4,1,5

          I and my coworker came to the standard compliant solution it was also tested on the dataset above on both postgresql and mysql and it produced equal results. We have found only one corner case this won't work and IMHO that can't be fixed. The problem is when you do a sort on a field from a table that is in 1:n relation to the main table.. e.g tables posts and tags, where one post can have a multiple tags and you want your results sorted by a tag.

          Recipe for a correct query is:
          * remember the ORDER BY fields from original query and then remove them
          * wrap the original query with a DISTINCT query, but add the fields from ORDER BY to the SELECT part of that query and add the whole ORDER BY to the end of it, also add the PK to the order by clause, and add the LIMIT clause
          * wrap the resulting query into another query and select just the id.

          so if I take the example from above the SQL should look like this:
          {{
          SELECT id FROM (
            SELECT DISTINCT id, name FROM (
              SELECT id, name FROM test
            ) dctrn_result_inner
            ORDER BY name, id LIMIT 5 OFFSET 0
          ) dctrn_result

          }}
          The query build by pager to get the subset of PKs to fetch produces wrong results on potgresql (and probably any database), that conforms to the SQL standard. The standard says, that if you wish to have the results in specific order, then you have to specify that by using an ORDER BY clause. If such a clause is not present the database can return the results in whatever order it sees fit.

          Testcase fixtures:
          {code}
          CREATE TABLE test (
              id integer,
              name text
          );

          INSERT INTO test VALUES (1, 'c');
          INSERT INTO test VALUES (2, 'a');
          INSERT INTO test VALUES (3, 'e');
          INSERT INTO test VALUES (4, 'b');
          INSERT INTO test VALUES (5, 'd');
          INSERT INTO test VALUES (6, 'a');
          INSERT INTO test VALUES (7, 'g');
          INSERT INTO test VALUES (8, 'h');
          INSERT INTO test VALUES (9, 'e');
          INSERT INTO test VALUES (10, 'j');
          {code}

          Passing f.e.

          {code}
          $qb = $this->repository
              ->createQueryBuilder('t')
              ->select('t')
              ->setFirstResult(0)
              ->setMaxResults(5)
              ->addOrderBy('t.name', 'ASC')
          {code}

          to pager produces SQL like this modified for readability

          {code}
          SELECT DISTINCT id FROM (
              SELECT id, name FROM test ORDER BY name
            ) dctrn_result
            LIMIT 5 OFFSET 0
          {code}

          Now there is nothing wrong with this modified query per se, but there is no ORDER BY clause in the outer query so according to the standard the DB can choose whatever order it seems fit. Now mysql chooses the same order, but postgresql does not and it's probably not the only DB doing so.

          If you are interested in the results, this is the output I'm seeing:

          * postgresql: 8,4,1,5,3
          * mysql : 2,6,4,1,5

          I and my coworker came to the standard compliant solution it was also tested on the dataset above on both postgresql and mysql and it produced equal results. We have found only one corner case this won't work and IMHO that can't be fixed. The problem is when you do a sort on a field from a table that is in 1:n relation to the main table.. e.g tables posts and tags, where one post can have a multiple tags and you want your results sorted by a tag.

          Recipe for a correct query is:

          * remember the ORDER BY fields from original query and then remove them
          * wrap the original query with a DISTINCT query, but add the fields from ORDER BY to the SELECT part of that query and add the whole ORDER BY to the end of it, also add the PK to the order by clause, and add the LIMIT clause
          * wrap the resulting query into another query and select just the id.

          so if I take the example from above the SQL should look like this:

          {code}
          SELECT id FROM (
            SELECT DISTINCT id, name FROM (
              SELECT id, name FROM test
            ) dctrn_result_inner
            ORDER BY name, id LIMIT 5 OFFSET 0
          ) dctrn_result
          {code}
          Hide
          Jean-Philippe THEVENOUX added a comment - - edited

          I reproduce same problem with Postgres 7.4, Doctrine 2.3 whereas with doctrine 2.2, all is fine
          Hope there'll a fix in next doctrine version

          Show
          Jean-Philippe THEVENOUX added a comment - - edited I reproduce same problem with Postgres 7.4, Doctrine 2.3 whereas with doctrine 2.2, all is fine Hope there'll a fix in next doctrine version
          Benjamin Eberlei made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Fix Version/s 2.3.1 [ 10323 ]
          Resolution Fixed [ 1 ]
          Hide
          Raymond Kolbe added a comment -

          http://www.doctrine-project.org/jira/browse/DDC-1800 This relates.

          I just published a PR for an Oracle fix, but your solution appears to work for Oracle as well (issue is the same).

          Show
          Raymond Kolbe added a comment - http://www.doctrine-project.org/jira/browse/DDC-1800 This relates. I just published a PR for an Oracle fix, but your solution appears to work for Oracle as well (issue is the same).
          Hide
          Bojidar Hristov added a comment -

          Same bug occurs in MariaDB 5.5.

          I commented out check for PostgreSQL and it works fine. Can you fix it for MariaDB too? Thanks

          Show
          Bojidar Hristov added a comment - Same bug occurs in MariaDB 5.5. I commented out check for PostgreSQL and it works fine. Can you fix it for MariaDB too? Thanks
          Hide
          Guilherme Santos added a comment - - edited

          I make a ORDER BY with a HIDDEN field and this still happen to me! Like this:
          $qb->addSelect('CASE WHEN p.description LIKE :description THEN 0 ELSE 1 END HIDDEN relevance')->addOrderBy('relevance');

          This order by is ignored causing the same error!

          In this line https://github.com/doctrine/doctrine2/blob/master/lib/Doctrine/ORM/Query/SqlWalker.php#L1310
          you don't add to ResultSetMapping so when you verifying in function preserveSqlOrdering the field doesn't exists!

          Show
          Guilherme Santos added a comment - - edited I make a ORDER BY with a HIDDEN field and this still happen to me! Like this: $qb->addSelect('CASE WHEN p.description LIKE :description THEN 0 ELSE 1 END HIDDEN relevance')->addOrderBy('relevance'); This order by is ignored causing the same error! In this line https://github.com/doctrine/doctrine2/blob/master/lib/Doctrine/ORM/Query/SqlWalker.php#L1310 you don't add to ResultSetMapping so when you verifying in function preserveSqlOrdering the field doesn't exists!
          Hide
          Liam O'Boyle added a comment -

          I've just been bitten by the "corner case" described above, "the problem is when you do a sort on a field from a table that is in 1:n relation to the main table.. e.g tables posts and tags, where one post can have a multiple tags and you want your results sorted by a tag.".

          This is a pretty significant bug, as the end result is that data that should come back from the query doesn't. While there probably isn't a good universal workaround, the MySQL behaviour before this was already correct because the outer query was returning the ids in the same order as the internal query (even though it isn't required to by the standard). Is it possible to avoid having this apply to MySQL so that it doesn't introduce an additional bug in an attempt to fix an issue that doesn't apply to that platform anyway?

          Show
          Liam O'Boyle added a comment - I've just been bitten by the "corner case" described above, "the problem is when you do a sort on a field from a table that is in 1:n relation to the main table.. e.g tables posts and tags, where one post can have a multiple tags and you want your results sorted by a tag.". This is a pretty significant bug, as the end result is that data that should come back from the query doesn't. While there probably isn't a good universal workaround, the MySQL behaviour before this was already correct because the outer query was returning the ids in the same order as the internal query (even though it isn't required to by the standard). Is it possible to avoid having this apply to MySQL so that it doesn't introduce an additional bug in an attempt to fix an issue that doesn't apply to that platform anyway?
          Hide
          Miha Vrhovnik added a comment - - edited

          @Liam As you can se above the same applies to mariadb and if you look at the issues on the githubs doctrine project page you'll see that there is the same problem with newer mysql releases. AS I've written above. this corner case cannot be solved.

          Show
          Miha Vrhovnik added a comment - - edited @Liam As you can se above the same applies to mariadb and if you look at the issues on the githubs doctrine project page you'll see that there is the same problem with newer mysql releases. AS I've written above. this corner case cannot be solved.
          Hide
          Liam O'Boyle added a comment -

          Thanks Miha. I couldn't find this on the github page so didn't realise that it was affecting some newer MySQL releases (it didn't seem to affect mine, 5.5). If that's the case, then as you point out it can't even be fixed for MySQL.

          Perhaps the lack of support could be more explicit instead? If you attempt to use the paginator with two FROM tables then a RuntimeException is thrown, if we did the same when the ORDER BY conditions applied to tables joined via a 1:m relationship then at least users would know that things were going wrong rather than getting strangely unpredictable results.

          Show
          Liam O'Boyle added a comment - Thanks Miha. I couldn't find this on the github page so didn't realise that it was affecting some newer MySQL releases (it didn't seem to affect mine, 5.5). If that's the case, then as you point out it can't even be fixed for MySQL. Perhaps the lack of support could be more explicit instead? If you attempt to use the paginator with two FROM tables then a RuntimeException is thrown, if we did the same when the ORDER BY conditions applied to tables joined via a 1:m relationship then at least users would know that things were going wrong rather than getting strangely unpredictable results.
          Marco Pivetta made changes -
          Link This issue is referenced by DDC-3336 [ DDC-3336 ]

          This list may be incomplete, as errors occurred whilst retrieving source from linked applications:

          • Request to http://www.doctrine-project.org/fisheye/ failed: Error in remote call to 'FishEye 0 (http://www.doctrine-project.org/fisheye/)' (http://www.doctrine-project.org/fisheye) [AbstractRestCommand{path='/rest-service-fe/search-v1/crossRepositoryQuery', params={query=DDC-1958, expand=changesets[0:20].revisions[0:29],reviews}, methodType=GET}] : Received status code 503 (Service Temporarily Unavailable)

            People

            • Assignee:
              Benjamin Eberlei
              Reporter:
              Miha Vrhovnik
            • Votes:
              1 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: