Doctrine Performance Revisited

Posted on March 17, 2010 by beberlei

In our everlasting quest to provide a powerful, flexible and yet performant ORM experience, we are often confronted with benchmarks. We have been talking about performance topics since last year in talks at several conferences, and Roman has shared his opinion on such benchmarks on this blog.

Recently Francois Zaninotto, lead developer of the soon-to-be-released Propel 1.5 (currently in beta), wrote a blog post comparing performance, mainly of the different Propel 1.x versions with and without caching, against a PDO benchmark. The benchmark also contains a test for Doctrine 1.2.

It is important to note that the PDO test only shows the "baseline" performance; that is, it does not even remotely "do the same thing" as the others: no object creation, no hydration of objects from result rows, no identity management, no change tracking, nothing. So don't misread the numbers. If you wanted a raw PDO/SQL "benchmark" to produce at least remotely the same result as the ORMs provide, you would need quite a lot of custom code and, if you don't want to copy/paste all day, some abstraction.
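To make the gap concrete, here is a minimal sketch of just two of the things the PDO baseline skips: hydrating rows into objects and keeping an identity map. The `Author` class and the `hydrateAuthors()` function are hypothetical names invented for this illustration, not code from the benchmark or from any ORM, and the sketch is written for current PHP versions rather than the PHP 5.3 used in the tests.

```php
<?php
// Hypothetical sketch: Author and hydrateAuthors() are illustrative names,
// not part of Doctrine, Propel or PDO.

final class Author
{
    public int $id;
    public string $firstName;
    public string $lastName;
}

/**
 * Turns raw result rows into Author objects and reuses an existing
 * instance whenever the same primary key shows up again (identity map).
 */
function hydrateAuthors(array $rows, array &$identityMap): array
{
    $authors = [];
    foreach ($rows as $row) {
        $id = (int) $row['id']; // type conversion: column value to integer
        if (!isset($identityMap[$id])) {
            $author = new Author();
            $author->id = $id;
            $author->firstName = $row['first_name']; // column name to field name
            $author->lastName = $row['last_name'];
            $identityMap[$id] = $author;
        }
        $authors[] = $identityMap[$id];
    }
    return $authors;
}

// Rows shaped like the output of PDOStatement::fetchAll(PDO::FETCH_ASSOC);
// the id is kept as a string here to show the type conversion.
$rows = [
    ['id' => '1', 'first_name' => 'John', 'last_name' => 'Doe'],
    ['id' => '1', 'first_name' => 'John', 'last_name' => 'Doe'],
];
$identityMap = [];
$authors = hydrateAuthors($rows, $identityMap);

var_dump($authors[0] === $authors[1]); // bool(true): one object per primary key
```

Change tracking, relations and generic mapping are still missing from this sketch, which is exactly the point: the baseline numbers do not pay for any of it.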

The following scenarios are compared in the benchmark:

Scenario 1: Create a new Model object, set its columns, and save it. Tests Model object speed, and INSERT SQL generation.

Scenario 2: Look up a record by its primary key. Tests basic query and hydration.

Scenario 3: Look up a record using a complex query. Tests object query speed.

Scenario 4: Look up 5 records on a simple criterion. Tests hydration speed.

Scenario 5: Look up a record and hydrate it together with its related record in another table. Tests join hydration speed.

I reproduced the complete table of results here for comparison, since my machine produces very different overall times than Francois's. Each scenario is executed several times and the sum of the execution times is printed. After each run the identity maps are wiped so that objects are not reused. All the tests use an SQLite in-memory database, run on PHP 5.3 and, of course, use an opcode cache (APC).

A first version of the corresponding Doctrine 2 benchmarks was added to the SVN repository today by Roman. They can all be run directly on your machine after checkout.

Here are my results:

| Test suite                   | Insert | findPk | complex | hydrate | with |
|------------------------------|--------|--------|---------|---------|------|
| PDOTestSuite                 | 132    | 149    | 112     | 107     | 109  |
| Propel14TestSuite            | 953    | 436    | 133     | 270     | 280  |
| Propel15aLa14TestSuite       | 926    | 428    | 143     | 264     | 282  |
| Propel15TestSuite            | 923    | 558    | 171     | 356     | 385  |
| Propel15WithCacheTestSuite   | 932    | 463    | 189     | 342     | 327  |
| Doctrine12TestSuite          | 1673   | 2661   | 449     | 1710    | 1832 |
| Doctrine12WithCacheTestSuite | 1903   | 1179   | 550     | 957     | 722  |
| Doctrine2TestSuite           | 165    | 426    | 412     | 1048    | 1042 |
| Doctrine2WithCacheTestSuite  | 176    | 423    | 148     | 606     | 383  |

These are the key observations for the Doctrine 2 results.

Doctrine 2 Insert Performance

Doctrine 2's insert time is close to the PDO baseline, and this is mainly a result of the rather strange test, which is basically a mass-insert. All the insert tests seem to use a single database transaction, so the test is comparable to a mass-insert in a single request. As such, the result is not surprising, since we know that Doctrine 2 can batch inserts effectively. Mind you, mass-inserts are not really the focus of an ORM and not a realistic scenario in most applications, so take this test with a grain of salt. If you're looking for the ORM with the fastest mass-inserts, you can stop now, you found it.

Doctrine 2 Find By Primary Key Performance

Finding an entity by primary key in Doctrine 2 seems to be roughly three times as slow as handcrafted PDO (which doesn't do anything besides executing the query, mind you...). The good results in this test, especially compared to Doctrine 1, come from the fact that there is not much abstraction going on for all kinds of find*() operations. SQL is created and executed, and the results are turned into objects without much hoopla.

Doctrine 2 Complex Query Performance

The complex query is a scalar count query. See the Doctrine 2 code for this scenario:

    <?php
    $authors = $this->em->createQuery(
        'SELECT count(a.id) AS num FROM Author a WHERE a.id > ?1 OR CONCAT(a.firstName, a.lastName) = ?2'
    )->setParameter(1, $this->authors[array_rand($this->authors)]->id)
     ->setParameter(2, 'John Doe')
     ->getSingleScalarResult();

The getSingleScalarResult() method that executes the query uses a very minimalistic hydration mode that only grabs the first value of the first result column. Therefore, in combination with the DQL-to-SQL query parser cache (Doctrine2WithCacheTestSuite), we get a result almost as fast as the handcrafted PDO scenario, because we essentially fetch the transformed SQL query for this DQL from the cache, execute it and grab the value.
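The idea behind such a parser cache can be sketched in a few lines. This is not Doctrine's implementation: `QueryCacheSketch` is a hypothetical class, and the "translator" passed to it is a trivial string replacement standing in for the real (expensive) DQL parser. The sketch only shows that the translation cost is paid once per distinct DQL string.

```php
<?php
// Hypothetical sketch of a DQL-to-SQL cache; not Doctrine's actual classes.

final class QueryCacheSketch
{
    /** @var callable(string): string translates a DQL string into SQL */
    private $translate;

    /** @var array<string, string> translated SQL, keyed by the DQL string */
    private array $cache = [];

    /** Number of times the translator actually had to run. */
    public int $misses = 0;

    public function __construct(callable $translate)
    {
        $this->translate = $translate;
    }

    public function getSql(string $dql): string
    {
        if (!isset($this->cache[$dql])) {
            $this->misses++;
            $this->cache[$dql] = ($this->translate)($dql);
        }
        return $this->cache[$dql];
    }
}

// Stand-in for the parser: a plain string replacement, NOT real DQL parsing.
$cache = new QueryCacheSketch(function (string $dql): string {
    return str_replace('Author a', 'author a', $dql);
});

$dql = 'SELECT count(a.id) FROM Author a';
$first = $cache->getSql($dql);  // translator runs
$second = $cache->getSql($dql); // served from the cache
```

After the first call, every further execution of the same DQL string costs one array lookup plus the SQL execution itself, which is why the cached "complex" scenario lands so close to the PDO number.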

Hydration Performance (Scenario 4 and 5)

In the field of hydration, Doctrine 2 is either equally fast or seems "only" up to 40% slower than Propel 1.4 or Propel 1.5, based on the two scenarios. The main reason is simply that, since Doctrine 2 provides transparent persistence, it cannot offer lazy-loading through base classes; instead it needs to inject proxy objects as stubs into the entities. That simply means Doctrine needs to create more objects than Propel, that's it. Note that once the objects are actually lazy-loaded, Propel needs to create these objects, too. The difference is that Doctrine needs to create them beforehand. When they lazy-load, no new object is created; the proxies simply populate themselves with the data.
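A stripped-down illustration of such a proxy follows. It is a sketch under assumptions, not the proxy code Doctrine 2 generates: `Author`, `AuthorProxy` and the loader callback are hypothetical, and the callback stands in for the database query. The proxy object exists from the start, knows only its identifier, and fills in its own fields on first access.

```php
<?php
// Hypothetical sketch of a lazy-load proxy; not Doctrine's generated proxy code.

class Author
{
    protected int $id;
    protected ?string $name;

    public function __construct(int $id, ?string $name = null)
    {
        $this->id = $id;
        $this->name = $name;
    }

    public function getId(): int
    {
        return $this->id;
    }

    public function getName(): ?string
    {
        return $this->name;
    }
}

final class AuthorProxy extends Author
{
    /** @var callable(int): array fetches the row for an id */
    private $loader;
    private bool $loaded = false;

    public function __construct(int $id, callable $loader)
    {
        parent::__construct($id);
        $this->loader = $loader;
    }

    public function getName(): ?string
    {
        $this->load();
        return parent::getName();
    }

    private function load(): void
    {
        if ($this->loaded) {
            return;
        }
        $row = ($this->loader)($this->id);
        $this->name = $row['name']; // the proxy populates itself, no new object
        $this->loaded = true;
    }
}

$queries = 0;
$proxy = new AuthorProxy(1143, function (int $id) use (&$queries): array {
    $queries++; // stands in for a SELECT ... WHERE id = ?
    return ['name' => 'John1142'];
});

$id = $proxy->getId();         // the identifier is known up front, nothing is loaded
$queriesAfterGetId = $queries;
$name = $proxy->getName();     // first access triggers the load
$nameAgain = $proxy->getName(); // already loaded, no second query
```

Because the proxy extends the entity class, code that type-hints `Author` keeps working, while the entity itself needs no ORM base class.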

A main difference, however, is that the hydration code of Doctrine is completely generic. That means this same code can handle all kinds of different SQL results correctly, no matter how many nested joins, scalar values, aggregate values there are in the result and it can even deal with strangely ordered collections in result sets (You get such stuff with multiple order by clauses on different fields which order in different directions. Combine such ordering with joining collections and you get a pretty funky SQL result set).

The general algorithmic approach of the Doctrine 1.2 hydrators was reused in Doctrine 2. However, optimizations in the data structures and use of the fastest internal PHP functions (as fast as you can get with PHP, you know ;)) made it possible to optimize the code to yield the results shown.

It is perhaps interesting that Doctrine 2 without caching is, all in all, still a lot faster than Doctrine 1 with caching, so this looks like a good improvement. Furthermore, the query cache in Doctrine 2 is very effective and removes almost all of the overhead of DQL. The query cache is what allows us to provide this extremely powerful abstraction that is immensely flexible. If you don't like DQL yet, you should read up on domain-specific languages and object query languages in particular. It's a gem and a cornerstone of this project, and if you don't like it we can't help you.

Hydration with non Object Results

Putting aside the boring Propel comparisons, let's get to something Doctrine-specific. Because we know that read performance is very important and object instances are not necessary all the time, Doctrine 2, just like Doctrine 1, provides many different levels of abstraction between objects and raw PDO/SQL result sets that you can move up and down as you wish.

The two main intermediate levels are array graphs and flat, scalar result sets (which are still not the same as the raw SQL result sets, because type conversions and column-name-to-field-name conversions still take place).

These alternative result formats perform as follows:

| Test suite                 | Insert | findPk | complex | hydrate | with |
|----------------------------|--------|--------|---------|---------|------|
| Dc2ArrayHydrateTestSuite   | 172    | 421    | 145     | 332     | 285  |
| Dc2ScalarHydrateTestSuite  | 175    | 424    | 145     | 251     | 245  |
| Dc2WithoutProxiesTestSuite | 174    | 423    | 148     | 483     | 628  |

The "Without Proxies" variant still creates object instances; however, it does not replace loose ends of the object graph with lazy-load proxies. Be careful with such optimizations in practice, because partial objects can be fragile to work with. The important point here is that different levels of optimization are available when needed, before you finally have to drop all abstraction and deal with PDO/SQL directly (which is not bad, you know, just often not very convenient, flexible and/or robust against refactorings or schema changes).

The Array Hydration (getArrayResult()) returns a nested array structure that is comparable to an object graph. Most of the time you can think of it as a performant read-only "view" of an object graph. In the case of Books with Authors the result looks like:

    array(1) {
      [0]=>
      &array(5) {
        ["id"]=>
        int(1)
        ["title"]=>
        string(6) "Hello0"
        ["isbn"]=>
        string(4) "1234"
        ["price"]=>
        float(0)
        ["author"]=>
        &array(4) {
          ["id"]=>
          int(1143)
          ["firstName"]=>
          string(8) "John1142"
          ["lastName"]=>
          string(7) "Doe1142"
          ["email"]=>
          NULL
        }
      }
    }

These array graphs can be built from basically any query. They are backed by roughly the same algorithm that allows arbitrary object hydration with any number of joins and even scalar and aggregate values in between.
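For the simple Books-with-Authors case, the shape of that transformation can be sketched by hand. This is deliberately not the generic hydrator: `hydrateBooksWithAuthors()` and the `b_`/`a_` column aliases are assumptions made up for this illustration, and the function only handles a single to-one join, whereas the real algorithm is driven by mapping metadata and copes with collections and scalar values as well.

```php
<?php
// Hypothetical, hand-written sketch for one specific query shape;
// the real hydrator is generic and metadata-driven.

/**
 * Converts flat joined rows (one row per book, with its author's columns)
 * into a nested array graph.
 */
function hydrateBooksWithAuthors(array $rows): array
{
    $books = [];
    foreach ($rows as $row) {
        $books[] = [
            // column names become field names, values get their PHP types
            'id' => (int) $row['b_id'],
            'title' => $row['b_title'],
            'price' => (float) $row['b_price'],
            'author' => [
                'id' => (int) $row['a_id'],
                'firstName' => $row['a_first_name'],
                'lastName' => $row['a_last_name'],
            ],
        ];
    }
    return $books;
}

// One flat row as a joined SQL result might deliver it (aliases are assumed).
$rows = [[
    'b_id' => '1', 'b_title' => 'Hello0', 'b_price' => '0',
    'a_id' => '1143', 'a_first_name' => 'John1142', 'a_last_name' => 'Doe1142',
]];

$graph = hydrateBooksWithAuthors($rows);
```

No objects, proxies or identity map are involved here, which is where the array and scalar result formats save their time.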

In the case where your objects implement ArrayAccess, you can often use object and array results interchangeably without the need to update view code.
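As a small sketch of that interchangeability, assuming a hypothetical `Book` class and a hypothetical `renderTitle()` view helper (neither comes from Doctrine or the benchmark), the same view code can consume an object or an array row. The signatures follow current PHP versions.

```php
<?php
// Hypothetical sketch: Book and renderTitle() are illustrative names only.

final class Book implements ArrayAccess
{
    private array $fields;

    public function __construct(string $title, string $isbn)
    {
        $this->fields = ['title' => $title, 'isbn' => $isbn];
    }

    public function getTitle(): string
    {
        return $this->fields['title'];
    }

    public function offsetExists(mixed $offset): bool
    {
        return isset($this->fields[$offset]);
    }

    public function offsetGet(mixed $offset): mixed
    {
        return $this->fields[$offset] ?? null;
    }

    public function offsetSet(mixed $offset, mixed $value): void
    {
        throw new LogicException('This sketch only supports read access.');
    }

    public function offsetUnset(mixed $offset): void
    {
        throw new LogicException('This sketch only supports read access.');
    }
}

/** View code that only uses array syntax, so both result formats fit. */
function renderTitle(array|ArrayAccess $book): string
{
    return '<h1>' . htmlspecialchars($book['title']) . '</h1>';
}

$fromObject = renderTitle(new Book('Hello0', '1234'));
$fromArray = renderTitle(['title' => 'Hello0', 'isbn' => '1234']);
```

Both calls produce the same markup, so a slow page can be switched from object results to array results without touching the template.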

Conclusion

What all of this means is mainly that if you have an application that looks (almost) exactly like the benchmarking code used here, then you (maybe) have some useful numbers to look at; otherwise ... not.

Apart from that, we hope this convinces you that we're not wasting your CPU cycles on purpose. Doctrine 2 is a huge balancing act between flexibility, features and performance, and it has worked out well so far.