Doctrine 2 - ORM
  1. Doctrine 2 - ORM
  2. DDC-821

Consider adding Query-Join as another join method for DQL

    Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.x
    • Component/s: ORM
    • Security Level: All
    • Labels:
      None

      Description

      Some ORM systems support an alternative to fetch-join queries, called a "query-join". See http://www.avaje.org/ebean/introquery_joinquery.html.

      A query-join accomplishes the same as a fetch-join (hydrating a larger object graph across all associations types) but executes more than one SQL query in sequence in order to hydrate the requested portions of the graph in a result set. The first query retrieves data from the base entity/table and the next queries retrieve the data for the requested associations.

      In some cases this approach is more efficient to a fetch-join:

      (1) No data duplication in the SQL result as occurs in a fetch-join on to-many associations. Instead, this data is loaded through a second query. This saves network traffic, memory and general overhead in hydrating the returned results. In the case where large TEXT data is included in result sets, the savings here may be substantial.

      (2) setFirstResult() and setMaxResult() are again effective (for pagination) and more importantly more efficient on these query-joins. The current DoctrineExtension solution to enable pagination on fetch-joins requires a series of queries to determine the target primary keys of the root entity of the query. The primary key lookup query requires DISTINCT or GROUP BY – which often triggers filesorts, temporary tables, etc (at least on MySQL) and greatly slows down the query. Query joins would not require this.

      Possible implementation example:

      // existing fetch-join
      $query = $em->createQuery('SELECT c, o FROM Customers c JOIN c.orders o');
      $query->setFirstResult(10)->setMaxResult(20); // doesn't do what you'd hope it would do, no ability to use this for pagination
      $customersAndOrders = $query->getResult(); // array of Customer objects with Orders hydrated
      
      // proposed query-join
      $query = $em->createQuery('SELECT c, o FROM Customers c QUERY JOIN c.orders o');
      $query->setFirstResult(10)->setMaxResult(20); // now works for pagination
      $customersAndOrders = $query->getResult(); // array of Customer objects with Orders hyrdated
      // this would execute a series of queries -- i.e. in SQL
      // SELECT ... FROM customers LIMIT 10, 20
      // SELECT ... FROM orders WHERE customer_id IN (.....)
      

      and/or, could there be a way to trigger a "query-join" against an existing array of entities? for example

      $query = $em->createQuery('SELECT c FROM Customers c'); // single query to fetch customers
      $customers = $query->getResult(); // array of Customer objects
      $em->join($customers, 'orders'); // fetch and hydrate the 'orders' association on each Customer using a single query
      

      Perhaps at some point in the future Doctrine/DBAL could even make use of asynchronous queries (i.e. mysqlnd supports this) to allow these query-joins to run in parallel and the result would be more efficient paginated resultsets.

      Thoughts/feedback?

        Activity

        Marc Hodgins created issue -
        Marc Hodgins made changes -
        Field Original Value New Value
        Description Some ORM systems support an alternative to fetch-join queries, called a "query-join". See [http://www.avaje.org/ebean/introquery_joinquery.html].

        A query-join accomplishes the same as a fetch-join (hydrating a larger object graph across all associations types) but executes more than one SQL query in sequence in order to hydrate the requested portions of the graph in a result set. The first query retrieves data from the base entity/table and the next queries retrieve the data for the requested associations.

        In some cases this approach is more efficient to a fetch-join:

        (1) *No data duplication in the SQL result* as occurs in a fetch-join on to-many associations. Instead, this data is loaded through a second query. This saves network traffic, memory and general overhead in hydrating the returned results. In the case where large TEXT data is included in result sets, the savings here may be substantial.

        (2) *setFirstResult() and setMaxResult() are again effective (for pagination)* and more importantly more efficient on these query-joins. The current DoctrineExtension solution to enable pagination on fetch-joins requires a series of queries to determine the target primary keys of the root entity of the query. The primary key lookup query requires DISTINCT or GROUP BY -- which often triggers filesorts, temporary tables, etc (at least on MySQL) and greatly slows down the query. Query joins would not require this.

        Possible implementation example:
        {code}

        // existing fetch-join
        $query = $em->createQuery('SELECT c, o FROM Customers c JOIN c.orders o');
        $query->setFirstResult(10)->setMaxResult(20); // doesn't do what you'd hope it would do
        $customersAndOrders = $query->getResult(); // array of Customer objects with Orders hydrated

        // proposed query-join
        $query = $em->createQuery('SELECT c, o FROM Customers c QUERY JOIN c.orders o');
        // this would have executed a series of queries -- i.e. in SQL
        // SELECT ... FROM customers
        // SELECT ... FROM orders WHERE customer_id IN (.....)
        $query->setFirstResult(10)->setMaxResult(20); // now works for pagination
        $customersAndOrders = $query->getResult(); // array of Customer objects with Orders hyrdated
        {code}

        and/or, could there be a way to trigger a "query-join" against an existing array of entities? for example

        {code}
        $query = $em->createQuery('SELECT c FROM Customers c'); // single query to fetch customers
        $customers = $query->getResult(); // array of Customer objects
        $em->join($customers, 'orders'); // fetch and hydrate the 'orders' association on each Customer using a single query
        {code}

        Perhaps at some point in the future Doctrine/DBAL could even make use of asynchronous queries (i.e. mysqlnd supports this) to allow these query-joins to run in parallel and the result would be more efficient paginated resultsets.

        Thoughts/feedback?
        Some ORM systems support an alternative to fetch-join queries, called a "query-join". See [http://www.avaje.org/ebean/introquery_joinquery.html].

        A query-join accomplishes the same as a fetch-join (hydrating a larger object graph across all associations types) but executes more than one SQL query in sequence in order to hydrate the requested portions of the graph in a result set. The first query retrieves data from the base entity/table and the next queries retrieve the data for the requested associations.

        In some cases this approach is more efficient to a fetch-join:

        (1) *No data duplication in the SQL result* as occurs in a fetch-join on to-many associations. Instead, this data is loaded through a second query. This saves network traffic, memory and general overhead in hydrating the returned results. In the case where large TEXT data is included in result sets, the savings here may be substantial.

        (2) *setFirstResult() and setMaxResult() are again effective (for pagination)* and more importantly more efficient on these query-joins. The current DoctrineExtension solution to enable pagination on fetch-joins requires a series of queries to determine the target primary keys of the root entity of the query. The primary key lookup query requires DISTINCT or GROUP BY -- which often triggers filesorts, temporary tables, etc (at least on MySQL) and greatly slows down the query. Query joins would not require this.

        Possible implementation example:
        {code}

        // existing fetch-join
        $query = $em->createQuery('SELECT c, o FROM Customers c JOIN c.orders o');
        $query->setFirstResult(10)->setMaxResult(20); // doesn't do what you'd hope it would do, no ability to use this for pagination
        $customersAndOrders = $query->getResult(); // array of Customer objects with Orders hydrated

        // proposed query-join
        $query = $em->createQuery('SELECT c, o FROM Customers c QUERY JOIN c.orders o');
        $query->setFirstResult(10)->setMaxResult(20); // now works for pagination
        $customersAndOrders = $query->getResult(); // array of Customer objects with Orders hyrdated
        // this would execute a series of queries -- i.e. in SQL
        // SELECT ... FROM customers LIMIT 10, 20
        // SELECT ... FROM orders WHERE customer_id IN (.....)
        {code}

        and/or, could there be a way to trigger a "query-join" against an existing array of entities? for example

        {code}
        $query = $em->createQuery('SELECT c FROM Customers c'); // single query to fetch customers
        $customers = $query->getResult(); // array of Customer objects
        $em->join($customers, 'orders'); // fetch and hydrate the 'orders' association on each Customer using a single query
        {code}

        Perhaps at some point in the future Doctrine/DBAL could even make use of asynchronous queries (i.e. mysqlnd supports this) to allow these query-joins to run in parallel and the result would be more efficient paginated resultsets.

        Thoughts/feedback?
        Hide
        Benjamin Eberlei added a comment -

        There is another approach for this using several subqueries to build an IN clause, the Paginator extension supports this: http://github.com/beberlei/DoctrineExtensions

        I rather go the extension approach than changing the DQL for this feature.

        Show
        Benjamin Eberlei added a comment - There is another approach for this using several subqueries to build an IN clause, the Paginator extension supports this: http://github.com/beberlei/DoctrineExtensions I rather go the extension approach than changing the DQL for this feature.
        Hide
        Benjamin Eberlei added a comment -

        I just saw your second example, that is rather cool though and gets +1 from me.

        I had the same idea for "not initialized proxies", i.e.

        $em->getUnitOfWork()->initializeProxies('Customer');
        
        Show
        Benjamin Eberlei added a comment - I just saw your second example, that is rather cool though and gets +1 from me. I had the same idea for "not initialized proxies", i.e. $em->getUnitOfWork()->initializeProxies('Customer');
        Hide
        Marc Hodgins added a comment -

        Second example is a duplicate of DDC-734

        Show
        Marc Hodgins added a comment - Second example is a duplicate of DDC-734
        Benjamin Eberlei made changes -
        Workflow jira [ 11962 ] jira-feedback [ 13879 ]
        Benjamin Eberlei made changes -
        Workflow jira-feedback [ 13879 ] jira-feedback2 [ 15743 ]
        Benjamin Eberlei made changes -
        Workflow jira-feedback2 [ 15743 ] jira-feedback3 [ 18000 ]
        Guilherme Blanco made changes -
        Assignee Roman S. Borschel [ romanb ] Guilherme Blanco [ guilhermeblanco ]
        Hide
        Guilherme Blanco added a comment -

        Fixed based on this related issue: http://www.doctrine-project.org/jira/browse/DDC-734

        Show
        Guilherme Blanco added a comment - Fixed based on this related issue: http://www.doctrine-project.org/jira/browse/DDC-734
        Guilherme Blanco made changes -
        Status Open [ 1 ] Closed [ 6 ]
        Resolution Fixed [ 1 ]

        This list may be incomplete, as errors occurred whilst retrieving source from linked applications:

        • Request to http://www.doctrine-project.org/fisheye/ failed: Error in remote call to 'FishEye 0 (http://www.doctrine-project.org/fisheye/)' (http://www.doctrine-project.org/fisheye) [AbstractRestCommand{path='/rest-service-fe/search-v1/crossRepositoryQuery', params={query=DDC-821, expand=changesets[0:20].revisions[0:29],reviews}, methodType=GET}] : Received status code 503 (Service Temporarily Unavailable)

          People

          • Assignee:
            Guilherme Blanco
            Reporter:
            Marc Hodgins
          • Votes:
            2 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved: