[DDC-2381] Pagination query can be simplified when simple joins are applied Created: 31/Mar/13  Updated: 29/Nov/13

Status: Open
Project: Doctrine 2 - ORM
Component/s: ORM
Affects Version/s: 2.3, 2.4
Fix Version/s: None
Security Level: All

Type: Improvement Priority: Minor
Reporter: Sergey Gerdel Assignee: Marco Pivetta
Resolution: Unresolved Votes: 2
Labels: paginator

Attachments: HTML File EXPLAIN.htmL     HTML File EXPLAIN1.html     HTML File EXPLAIN2.html     HTML File EXPLAIN3.htm     HTML File EXPLAIN4.htm    

 Description   

Hi.
In mysql db table i have > 200,000 items.
I use native doctrine pagination for paging the items list.
But generated query that gets ids for items list in paging works more then 150 sec on my workstation

SELECT DISTINCT id0 FROM (SELECT m0_.id AS id0, m0_.title AS title1, m0_.text AS text2, m0_.price AS price3, m0_.originalPrice AS originalPrice4, m0_.condition_type AS condition_type5, m0_.image_1 AS image_16, m0_.image_2 AS image_27, m0_.image_3 AS image_38, m0_.image_4 AS image_49, m0_.image_5 AS image_510, m0_.video AS video11, m0_.contact_email AS contact_email12, m0_.contact_name AS contact_name13, m0_.contact_phone AS contact_phone14, m0_.contact_type AS contact_type15, m0_.published AS published16, m0_.type AS type17, m0_.status AS status18, m0_.highlight AS highlight19, m0_.urgent AS urgent20, m0_.topads AS topads21, m0_.period AS period22, m0_.hits AS hits23, m0_.ip AS ip24, m0_.created_at AS created_at25, m0_.updated_at AS updated_at26 FROM milla_message m0_ INNER JOIN milla_currency m1_ ON m0_.currency_id = m1_.id INNER JOIN milla_category m2_ ON m0_.category_id = m2_.id INNER JOIN milla_region m3_ ON m0_.region_id = m3_.id INNER JOIN milla_city m4_ ON m0_.city_id = m4_.id WHERE m0_.status = 1 ORDER BY m0_.published DESC) dctrn_result LIMIT 20 OFFSET 0

source code
https://github.com/doctrine/doctrine2/blob/master/lib/Doctrine/ORM/Tools/Pagination/LimitSubqueryOutputWalker.php#L141

why SELECT DISTINCT %s FROM (%s) dctrn_result ???
why not SELECT DISTINCT m0_.id AS id0 FROM milla_message m0_ WHERE m0_.status = 1 ORDER BY m0_.published DESC LIMIT 20 OFFSET 0



 Comments   
Comment by Marco Pivetta [ 31/Mar/13 ]

Not a blocker

Comment by Marco Pivetta [ 31/Mar/13 ]

What's the result of `EXPLAIN` on a query without the subquery?

Comment by Sergey Gerdel [ 31/Mar/13 ]

explain without the subquery

Comment by Marco Pivetta [ 31/Mar/13 ]

Sergey Gerdel that's not the same query.

Comment by Marco Pivetta [ 31/Mar/13 ]

Sergey Gerdel this is still using

Using index; Using temporary; Using filesort

Check your indexes

Comment by Sergey Gerdel [ 31/Mar/13 ]

Not in the index problem

SELECT DISTINCT id0 FROM (SELECT m0_.id AS id0, m0_.title AS title1, m0_.text AS text2, m0_.price AS price3, m0_.originalPrice AS originalPrice4, m0_.condition_type AS condition_type5, m0_.image_1 AS image_16, m0_.image_2 AS image_27, m0_.image_3 AS image_38, m0_.image_4 AS image_49, m0_.image_5 AS image_510, m0_.video AS video11, m0_.contact_email AS contact_email12, m0_.contact_name AS contact_name13, m0_.contact_phone AS contact_phone14, m0_.contact_type AS contact_type15, m0_.published AS published16, m0_.type AS type17, m0_.status AS status18, m0_.highlight AS highlight19, m0_.urgent AS urgent20, m0_.topads AS topads21, m0_.period AS period22, m0_.hits AS hits23, m0_.ip AS ip24, m0_.created_at AS created_at25, m0_.updated_at AS updated_at26 FROM milla_message m0_ WHERE m0_.status = 1 ORDER BY m0_.published DESC) dctrn_result LIMIT 20 OFFSET 0

Time: 104.614s explain 3

SELECT DISTINCT m0_.id AS id0 FROM milla_message m0_ WHERE m0_.status = 1 ORDER BY m0_.published DESC LIMIT 20 OFFSET 0;

Time: 0.001s explain 4

Comment by Marco Pivetta [ 01/Apr/13 ]

Sergey Gerdel the ORM cannot simplify a complex query that way. There may be a conditional on one of the joined results, or generally usage of one of the joined results.

Things that could be optimized here are:

  • Removal of the `ORDER BY` clause when grouping (check ORM master, I think somebody already did that)
  • Trying to simplify the query by doing some serious hacking on the AST.

The problem I see here is that the chance to spawn random bugs because of the optimization is very high, and you'd have to rewrite `walkSelectStatement`

Comment by Marco Pivetta [ 01/Apr/13 ]

Marking as improvement

Comment by Sergey Gerdel [ 07/Apr/13 ]

Minor?
i have 100 sec for this query.
200k items are selected for temporary table. wtf?

OK. Programmers may be mistaken in parser
expect ORDER BY m0_.published DESC LIMIT 20 OFFSET 0) dctrn_result
Time: 0.001s

reality ORDER BY m0_.published DESC) dctrn_result LIMIT 20 OFFSET 0

Comment by Marco Pivetta [ 07/Apr/13 ]

Sergey Gerdel this problem does not introduce security issues and can be worked around by you while using your own pagination logic. It does not stop you from doing anything, that's why it's minor.

Comment by Sergey Gerdel [ 08/Apr/13 ]

ok)
i have already created my own paginator.
at last
please see how to fix this problem
https://github.com/Sergic/doctrine2/commit/2733c815387273d3bd199a68acb717e0cbc8ccfe

Comment by Tom Pryor [ 11/Oct/13 ]

I've also run into this problem which makes Doctrine's Paginator useless for large datasets. The actual query takes 0.002ms but the SELECT DISTINCT query doctrine executes takes over 30s because MySQL creates a temporary table with 200k+ records.

You don't need to remove the joined tables from the paginator query (I have conditions on the joined tables anyway), this has a negligible impact performance, but rather it is caused by SELECT DISTINCT and ORDER BY which no index configuration can solve. Rather, perhaps a flag could be added to the paginator to indicate my query does not fetch join any has many collections (i.e each row returned will be unique) negating the need for the SELECT DISTINCT. The pagination would only then need to perform the original query with LIMIT and OFFSET applied along with a separate COUNT query on the primary key, both of which are very fast as they'd use the indexes setup for the the original query.

Comment by Christophe Coevoet [ 11/Oct/13 ]

This flag already exists for the select query. See the second argument of the constructor.

For the count query, you should call $paginator->setUseOutputWalkers(false) to make it use a DQL AST walker instead of the SQL Output walker (the AST walker does not support counting on queries using HAVING which is why it is not selected by default)

Comment by Robert (Jamie) Munro [ 29/Nov/13 ]

I think this is more important than "minor", as I've experienced this when upgrading from 2.2. My site became unusably slow.

I can't easily work around it because I am using Symfony bundles that use this, I am not using this directly. None of the workarounds mentioned so far seem to have helped.

Generated at Wed Aug 27 10:56:52 UTC 2014 using JIRA 6.2.3#6260-sha1:63ef1d6dac3f4f4d7db4c1effd405ba38ccdc558.