Doctrine 2 - ORM
  1. Doctrine 2 - ORM
  2. DDC-2381

Pagination query can be simplified when simple joins are applied

    Details

    • Type: Improvement Improvement
    • Status: Open
    • Priority: Minor Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.3, 2.4
    • Fix Version/s: None
    • Component/s: ORM
    • Security Level: All
    • Labels:

      Description

      Hi.
      In mysql db table i have > 200,000 items.
      I use native doctrine pagination for paging the items list.
      But generated query that gets ids for items list in paging works more then 150 sec on my workstation

      SELECT DISTINCT id0 FROM (SELECT m0_.id AS id0, m0_.title AS title1, m0_.text AS text2, m0_.price AS price3, m0_.originalPrice AS originalPrice4, m0_.condition_type AS condition_type5, m0_.image_1 AS image_16, m0_.image_2 AS image_27, m0_.image_3 AS image_38, m0_.image_4 AS image_49, m0_.image_5 AS image_510, m0_.video AS video11, m0_.contact_email AS contact_email12, m0_.contact_name AS contact_name13, m0_.contact_phone AS contact_phone14, m0_.contact_type AS contact_type15, m0_.published AS published16, m0_.type AS type17, m0_.status AS status18, m0_.highlight AS highlight19, m0_.urgent AS urgent20, m0_.topads AS topads21, m0_.period AS period22, m0_.hits AS hits23, m0_.ip AS ip24, m0_.created_at AS created_at25, m0_.updated_at AS updated_at26 FROM milla_message m0_ INNER JOIN milla_currency m1_ ON m0_.currency_id = m1_.id INNER JOIN milla_category m2_ ON m0_.category_id = m2_.id INNER JOIN milla_region m3_ ON m0_.region_id = m3_.id INNER JOIN milla_city m4_ ON m0_.city_id = m4_.id WHERE m0_.status = 1 ORDER BY m0_.published DESC) dctrn_result LIMIT 20 OFFSET 0

      source code
      https://github.com/doctrine/doctrine2/blob/master/lib/Doctrine/ORM/Tools/Pagination/LimitSubqueryOutputWalker.php#L141

      why SELECT DISTINCT %s FROM (%s) dctrn_result ???
      why not SELECT DISTINCT m0_.id AS id0 FROM milla_message m0_ WHERE m0_.status = 1 ORDER BY m0_.published DESC LIMIT 20 OFFSET 0

      1. EXPLAIN.htmL
        7 kB
        Sergey Gerdel
      2. EXPLAIN1.html
        3 kB
        Sergey Gerdel
      3. EXPLAIN2.html
        5 kB
        Sergey Gerdel
      4. EXPLAIN3.htm
        4 kB
        Sergey Gerdel
      5. EXPLAIN4.htm
        3 kB
        Sergey Gerdel

        Activity

        Sergey Gerdel created issue -
        Sergey Gerdel made changes -
        Field Original Value New Value
        Description Hi.
        In mysql db table i have > 200,000 items.
        I use native doctrine pagination for paging the items list.
        But generated query that gets ids for items list in paging works more then 150 sec on my workstation

        SELECT DISTINCT id0 FROM (SELECT m0_.id AS id0, m0_.title AS title1, m0_.text AS text2, m0_.price AS price3, m0_.originalPrice AS originalPrice4, m0_.condition_type AS condition_type5, m0_.image_1 AS image_16, m0_.image_2 AS image_27, m0_.image_3 AS image_38, m0_.image_4 AS image_49, m0_.image_5 AS image_510, m0_.video AS video11, m0_.contact_email AS contact_email12, m0_.contact_name AS contact_name13, m0_.contact_phone AS contact_phone14, m0_.contact_type AS contact_type15, m0_.published AS published16, m0_.type AS type17, m0_.status AS status18, m0_.highlight AS highlight19, m0_.urgent AS urgent20, m0_.topads AS topads21, m0_.period AS period22, m0_.hits AS hits23, m0_.ip AS ip24, m0_.created_at AS created_at25, m0_.updated_at AS updated_at26 FROM milla_message m0_ INNER JOIN milla_currency m1_ ON m0_.currency_id = m1_.id INNER JOIN milla_category m2_ ON m0_.category_id = m2_.id INNER JOIN milla_region m3_ ON m0_.region_id = m3_.id INNER JOIN milla_city m4_ ON m0_.city_id = m4_.id WHERE m0_.status = 1 ORDER BY m0_.published DESC) dctrn_result LIMIT 20 OFFSET 0

        source code
        https://github.com/doctrine/doctrine2/blob/master/lib/Doctrine/ORM/Tools/Pagination/LimitSubqueryOutputWalker.php#L141

        why SELECT DISTINCT %s FROM (%s) dctrn_result ???
        why not SELECT DISTINCT m0_.id AS id0, m0_.id AS id1 FROM milla_message m0_ WHERE m0_.status = 1 ORDER BY m0_.published DESC LIMIT 20 OFFSET 0
        Hi.
        In mysql db table i have > 200,000 items.
        I use native doctrine pagination for paging the items list.
        But generated query that gets ids for items list in paging works more then 150 sec on my workstation

        SELECT DISTINCT id0 FROM (SELECT m0_.id AS id0, m0_.title AS title1, m0_.text AS text2, m0_.price AS price3, m0_.originalPrice AS originalPrice4, m0_.condition_type AS condition_type5, m0_.image_1 AS image_16, m0_.image_2 AS image_27, m0_.image_3 AS image_38, m0_.image_4 AS image_49, m0_.image_5 AS image_510, m0_.video AS video11, m0_.contact_email AS contact_email12, m0_.contact_name AS contact_name13, m0_.contact_phone AS contact_phone14, m0_.contact_type AS contact_type15, m0_.published AS published16, m0_.type AS type17, m0_.status AS status18, m0_.highlight AS highlight19, m0_.urgent AS urgent20, m0_.topads AS topads21, m0_.period AS period22, m0_.hits AS hits23, m0_.ip AS ip24, m0_.created_at AS created_at25, m0_.updated_at AS updated_at26 FROM milla_message m0_ INNER JOIN milla_currency m1_ ON m0_.currency_id = m1_.id INNER JOIN milla_category m2_ ON m0_.category_id = m2_.id INNER JOIN milla_region m3_ ON m0_.region_id = m3_.id INNER JOIN milla_city m4_ ON m0_.city_id = m4_.id WHERE m0_.status = 1 ORDER BY m0_.published DESC) dctrn_result LIMIT 20 OFFSET 0

        source code
        https://github.com/doctrine/doctrine2/blob/master/lib/Doctrine/ORM/Tools/Pagination/LimitSubqueryOutputWalker.php#L141

        why SELECT DISTINCT %s FROM (%s) dctrn_result ???
        why not SELECT DISTINCT m0_.id AS id0 FROM milla_message m0_ WHERE m0_.status = 1 ORDER BY m0_.published DESC LIMIT 20 OFFSET 0
        Hide
        Marco Pivetta added a comment -

        Not a blocker

        Show
        Marco Pivetta added a comment - Not a blocker
        Marco Pivetta made changes -
        Assignee Benjamin Eberlei [ beberlei ] Marco Pivetta [ ocramius ]
        Priority Blocker [ 1 ] Major [ 3 ]
        Hide
        Marco Pivetta added a comment -

        What's the result of `EXPLAIN` on a query without the subquery?

        Show
        Marco Pivetta added a comment - What's the result of `EXPLAIN` on a query without the subquery?
        Hide
        Sergey Gerdel added a comment -

        explain without the subquery

        Show
        Sergey Gerdel added a comment - explain without the subquery
        Sergey Gerdel made changes -
        Attachment EXPLAIN1.html [ 11515 ]
        Hide
        Marco Pivetta added a comment -

        Sergey Gerdel that's not the same query.

        Show
        Marco Pivetta added a comment - Sergey Gerdel that's not the same query.
        Sergey Gerdel made changes -
        Attachment EXPLAIN2.html [ 11516 ]
        Sergey Gerdel made changes -
        Attachment EXPLAIN2.html [ 11517 ]
        Sergey Gerdel made changes -
        Attachment EXPLAIN2.html [ 11516 ]
        Hide
        Marco Pivetta added a comment -

        Sergey Gerdel this is still using

        Using index; Using temporary; Using filesort
        

        Check your indexes

        Show
        Marco Pivetta added a comment - Sergey Gerdel this is still using Using index; Using temporary; Using filesort Check your indexes
        Hide
        Sergey Gerdel added a comment -

        Not in the index problem

        SELECT DISTINCT id0 FROM (SELECT m0_.id AS id0, m0_.title AS title1, m0_.text AS text2, m0_.price AS price3, m0_.originalPrice AS originalPrice4, m0_.condition_type AS condition_type5, m0_.image_1 AS image_16, m0_.image_2 AS image_27, m0_.image_3 AS image_38, m0_.image_4 AS image_49, m0_.image_5 AS image_510, m0_.video AS video11, m0_.contact_email AS contact_email12, m0_.contact_name AS contact_name13, m0_.contact_phone AS contact_phone14, m0_.contact_type AS contact_type15, m0_.published AS published16, m0_.type AS type17, m0_.status AS status18, m0_.highlight AS highlight19, m0_.urgent AS urgent20, m0_.topads AS topads21, m0_.period AS period22, m0_.hits AS hits23, m0_.ip AS ip24, m0_.created_at AS created_at25, m0_.updated_at AS updated_at26 FROM milla_message m0_ WHERE m0_.status = 1 ORDER BY m0_.published DESC) dctrn_result LIMIT 20 OFFSET 0

        Time: 104.614s explain 3

        SELECT DISTINCT m0_.id AS id0 FROM milla_message m0_ WHERE m0_.status = 1 ORDER BY m0_.published DESC LIMIT 20 OFFSET 0;

        Time: 0.001s explain 4

        Show
        Sergey Gerdel added a comment - Not in the index problem SELECT DISTINCT id0 FROM (SELECT m0_.id AS id0, m0_.title AS title1, m0_.text AS text2, m0_.price AS price3, m0_.originalPrice AS originalPrice4, m0_.condition_type AS condition_type5, m0_.image_1 AS image_16, m0_.image_2 AS image_27, m0_.image_3 AS image_38, m0_.image_4 AS image_49, m0_.image_5 AS image_510, m0_.video AS video11, m0_.contact_email AS contact_email12, m0_.contact_name AS contact_name13, m0_.contact_phone AS contact_phone14, m0_.contact_type AS contact_type15, m0_.published AS published16, m0_.type AS type17, m0_.status AS status18, m0_.highlight AS highlight19, m0_.urgent AS urgent20, m0_.topads AS topads21, m0_.period AS period22, m0_.hits AS hits23, m0_.ip AS ip24, m0_.created_at AS created_at25, m0_.updated_at AS updated_at26 FROM milla_message m0_ WHERE m0_.status = 1 ORDER BY m0_.published DESC) dctrn_result LIMIT 20 OFFSET 0 Time: 104.614s explain 3 SELECT DISTINCT m0_.id AS id0 FROM milla_message m0_ WHERE m0_.status = 1 ORDER BY m0_.published DESC LIMIT 20 OFFSET 0; Time: 0.001s explain 4
        Sergey Gerdel made changes -
        Attachment EXPLAIN3.htm [ 11518 ]
        Attachment EXPLAIN4.htm [ 11519 ]
        Hide
        Marco Pivetta added a comment -

        Sergey Gerdel the ORM cannot simplify a complex query that way. There may be a conditional on one of the joined results, or generally usage of one of the joined results.

        Things that could be optimized here are:

        • Removal of the `ORDER BY` clause when grouping (check ORM master, I think somebody already did that)
        • Trying to simplify the query by doing some serious hacking on the AST.

        The problem I see here is that the chance to spawn random bugs because of the optimization is very high, and you'd have to rewrite `walkSelectStatement`

        Show
        Marco Pivetta added a comment - Sergey Gerdel the ORM cannot simplify a complex query that way. There may be a conditional on one of the joined results, or generally usage of one of the joined results. Things that could be optimized here are: Removal of the `ORDER BY` clause when grouping (check ORM master, I think somebody already did that) Trying to simplify the query by doing some serious hacking on the AST. The problem I see here is that the chance to spawn random bugs because of the optimization is very high, and you'd have to rewrite `walkSelectStatement`
        Hide
        Marco Pivetta added a comment -

        Marking as improvement

        Show
        Marco Pivetta added a comment - Marking as improvement
        Marco Pivetta made changes -
        Summary Pagination query goes work for 150sec Pagination query can be simplified when simple joins are applied
        Issue Type Bug [ 1 ] Improvement [ 4 ]
        Priority Major [ 3 ] Minor [ 4 ]
        Hide
        Sergey Gerdel added a comment -

        Minor?
        i have 100 sec for this query.
        200k items are selected for temporary table. wtf?

        OK. Programmers may be mistaken in parser
        expect ORDER BY m0_.published DESC LIMIT 20 OFFSET 0) dctrn_result
        Time: 0.001s

        reality ORDER BY m0_.published DESC) dctrn_result LIMIT 20 OFFSET 0

        Show
        Sergey Gerdel added a comment - Minor? i have 100 sec for this query. 200k items are selected for temporary table. wtf? OK. Programmers may be mistaken in parser expect ORDER BY m0_.published DESC LIMIT 20 OFFSET 0) dctrn_result Time: 0.001s reality ORDER BY m0_.published DESC) dctrn_result LIMIT 20 OFFSET 0
        Hide
        Marco Pivetta added a comment - - edited

        Sergey Gerdel this problem does not introduce security issues and can be worked around by you while using your own pagination logic. It does not stop you from doing anything, that's why it's minor.

        Show
        Marco Pivetta added a comment - - edited Sergey Gerdel this problem does not introduce security issues and can be worked around by you while using your own pagination logic. It does not stop you from doing anything, that's why it's minor.
        Hide
        Sergey Gerdel added a comment -

        ok)
        i have already created my own paginator.
        at last
        please see how to fix this problem
        https://github.com/Sergic/doctrine2/commit/2733c815387273d3bd199a68acb717e0cbc8ccfe

        Show
        Sergey Gerdel added a comment - ok) i have already created my own paginator. at last please see how to fix this problem https://github.com/Sergic/doctrine2/commit/2733c815387273d3bd199a68acb717e0cbc8ccfe
        Hide
        Tom Pryor added a comment -

        I've also run into this problem which makes Doctrine's Paginator useless for large datasets. The actual query takes 0.002ms but the SELECT DISTINCT query doctrine executes takes over 30s because MySQL creates a temporary table with 200k+ records.

        You don't need to remove the joined tables from the paginator query (I have conditions on the joined tables anyway), this has a negligible impact performance, but rather it is caused by SELECT DISTINCT and ORDER BY which no index configuration can solve. Rather, perhaps a flag could be added to the paginator to indicate my query does not fetch join any has many collections (i.e each row returned will be unique) negating the need for the SELECT DISTINCT. The pagination would only then need to perform the original query with LIMIT and OFFSET applied along with a separate COUNT query on the primary key, both of which are very fast as they'd use the indexes setup for the the original query.

        Show
        Tom Pryor added a comment - I've also run into this problem which makes Doctrine's Paginator useless for large datasets. The actual query takes 0.002ms but the SELECT DISTINCT query doctrine executes takes over 30s because MySQL creates a temporary table with 200k+ records. You don't need to remove the joined tables from the paginator query (I have conditions on the joined tables anyway), this has a negligible impact performance, but rather it is caused by SELECT DISTINCT and ORDER BY which no index configuration can solve. Rather, perhaps a flag could be added to the paginator to indicate my query does not fetch join any has many collections (i.e each row returned will be unique) negating the need for the SELECT DISTINCT. The pagination would only then need to perform the original query with LIMIT and OFFSET applied along with a separate COUNT query on the primary key, both of which are very fast as they'd use the indexes setup for the the original query.
        Hide
        Christophe Coevoet added a comment -

        This flag already exists for the select query. See the second argument of the constructor.

        For the count query, you should call $paginator->setUseOutputWalkers(false) to make it use a DQL AST walker instead of the SQL Output walker (the AST walker does not support counting on queries using HAVING which is why it is not selected by default)

        Show
        Christophe Coevoet added a comment - This flag already exists for the select query. See the second argument of the constructor. For the count query, you should call $paginator->setUseOutputWalkers(false) to make it use a DQL AST walker instead of the SQL Output walker (the AST walker does not support counting on queries using HAVING which is why it is not selected by default)
        Hide
        Robert (Jamie) Munro added a comment -

        I think this is more important than "minor", as I've experienced this when upgrading from 2.2. My site became unusably slow.

        I can't easily work around it because I am using Symfony bundles that use this, I am not using this directly. None of the workarounds mentioned so far seem to have helped.

        Show
        Robert (Jamie) Munro added a comment - I think this is more important than "minor", as I've experienced this when upgrading from 2.2. My site became unusably slow. I can't easily work around it because I am using Symfony bundles that use this, I am not using this directly. None of the workarounds mentioned so far seem to have helped.

        This list may be incomplete, as errors occurred whilst retrieving source from linked applications:

        • Request to http://www.doctrine-project.org/fisheye/ failed: Error in remote call to 'FishEye 0 (http://www.doctrine-project.org/fisheye/)' (http://www.doctrine-project.org/fisheye) [AbstractRestCommand{path='/rest-service-fe/search-v1/crossRepositoryQuery', params={query=DDC-2381, expand=changesets[0:20].revisions[0:29],reviews}, methodType=GET}] : Received status code 503 (Service Temporarily Unavailable)

          People

          • Assignee:
            Marco Pivetta
            Reporter:
            Sergey Gerdel
          • Votes:
            2 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

            • Created:
              Updated: