Working with Large Collections in Doctrine2

Posted about 1 year ago by beberlei

If you access a collection of Entity A pointing to Entity B, Doctrine2 always initializes the complete collection for you. For small collections of up to around 100 entities this is not a problem. However, as soon as collections get much bigger than this, you can run into serious trouble.

By default, the only collection operation Doctrine2 can optimize for you is adding new entities, which does not initialize the collection. That only gets you bigger collections, though; reading them is still a pain.

We have already received requests from several development teams for better functionality in this regard, and we are planning a solution to this problem that does not burden your domain model with technical details. However, this solution is currently scheduled only for the 2.1 release of Doctrine.

Until then, I have written a very small extension for Doctrine2 that allows you to work with large collections. It has two methods that compute the following data for any given PersistentCollection:

  • The total number of elements in the collection
  • A slice of entities from the collection, using an SQL LIMIT (or an alternative)

You can get this extension from the DoctrineExtensions GitHub repository.

Working with a LargeCollection

The LargeCollection class is a handler to work with large PersistentCollections. You can instantiate it by passing an EntityManager instance:

<?php

use DoctrineExtensions\LargeCollections\LargeCollection;

$lc = new LargeCollection($em);

LargeCollection only works with instances of PersistentCollection, not with other implementations of the Doctrine\Common\Collections\Collection interface. That means you can only pass it collections whose owning entities have already been persisted or have been retrieved from the EntityManager.

You can compute the total number of elements in a given collection by passing it to the count method:

<?php

$size = $lc->count($article->getComments());
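
For comparison, the same number can also be obtained with a plain DQL query instead of the extension. This is only a minimal sketch, not the extension's actual code: the Comment entity, its article association, its id field, and both helper functions are assumptions made for illustration.

```php
<?php

use Doctrine\ORM\EntityManager;

/**
 * Build a DQL COUNT query for the entities that point at one owner.
 * Assumes (hypothetically) that the counted entity has an "id" field.
 */
function buildCountDql($entityClass, $association)
{
    return sprintf(
        'SELECT COUNT(c.id) FROM %s c WHERE c.%s = :ownerId',
        $entityClass,
        $association
    );
}

/**
 * Count the related entities in the database without
 * initializing the collection in memory.
 */
function countRelated(EntityManager $em, $entityClass, $association, $ownerId)
{
    return (int) $em->createQuery(buildCountDql($entityClass, $association))
        ->setParameter('ownerId', $ownerId)
        ->getSingleScalarResult();
}
```

Under those assumptions, countRelated($em, 'Comment', 'article', $articleId) would return the number of comments for one article without loading any of them.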

You can retrieve a slice of entities from the collection by calling:

<?php

$slice = $lc->getSliceQuery($article->getComments(), $limit = 30);
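
A slice like this boils down to a query with an offset and a maximum number of results. The sketch below shows that idea with Query::setFirstResult() and Query::setMaxResults(). The Comment entity, its article association and id field, and the two helper functions are hypothetical, and this is not the extension's own implementation.

```php
<?php

use Doctrine\ORM\EntityManager;

/**
 * Translate a 1-based page number into the zero-based offset
 * that setFirstResult() expects. Pages below 1 are treated as page 1.
 */
function pageOffset($page, $perPage)
{
    return (max(1, (int) $page) - 1) * (int) $perPage;
}

/**
 * Fetch one page of (hypothetical) Comment entities for an article.
 */
function fetchCommentPage(EntityManager $em, $articleId, $page, $perPage = 30)
{
    $dql = 'SELECT c FROM Comment c WHERE c.article = :articleId ORDER BY c.id ASC';

    return $em->createQuery($dql)
        ->setParameter('articleId', $articleId)
        ->setFirstResult(pageOffset($page, $perPage)) // rows to skip
        ->setMaxResults($perPage)                     // rows to return
        ->getResult();
}
```

The ORDER BY is there because, without a stable order, consecutive pages are not guaranteed to be disjoint.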

As you can see, this is very simple to use, but some pieces are still missing:

  • In your domain models you sometimes don't want to return the Collection instance, but call toArray() instead to encapsulate the Collections API inside the entity. For this, two new methods are required to access the persistent collections from inside an entity.
  • The remove, removeElement, contains and containsKey methods could also be added to the large collection handler, making direct calls to the underlying UnitOfWork API.
  • A method that returns an IterableResult for any given collection. This would allow you to iterate over the complete collection row by row, which would avoid the memory-limit problems that hydrating the complete collection can cause.
  • link()/unlink() methods as described in DDC-128
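
The row-by-row idea from the list above can already be tried by hand with Query::iterate(), which yields each hydrated entity wrapped in an array. The following is a rough sketch under assumptions: the eachEntity() helper and the Comment entity in the usage comment are hypothetical, and nothing here is part of the extension.

```php
<?php

/**
 * Walk over rows as produced by Query::iterate(), where each row is an
 * array holding the entity at index 0. $process is called for every
 * entity; $release (optional) is called afterwards so the entity can be
 * detached and memory use stays flat.
 */
function eachEntity($rows, $process, $release = null)
{
    $count = 0;
    foreach ($rows as $row) {
        $entity = $row[0];
        call_user_func($process, $entity);
        if ($release !== null) {
            call_user_func($release, $entity);
        }
        $count++;
    }

    return $count; // number of entities processed
}

// Hypothetical usage with an EntityManager $em and a Comment entity:
//
// $query = $em->createQuery('SELECT c FROM Comment c WHERE c.article = :articleId')
//     ->setParameter('articleId', $articleId);
// eachEntity(
//     $query->iterate(),
//     function ($comment) { /* work with one comment */ },
//     function ($comment) use ($em) { $em->detach($comment); }
// );
```

Keeping the loop separate from the query makes it easy to try the helper with a plain array of rows before pointing it at a real result.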

I hope I got your attention, and maybe someone is interested in extending LargeCollection a little further.


Comments (6)

LargeCollection Posted by Marc about 1 year ago.

Thanks for the LargeCollection extension. I really was looking for the functionality to limit a result with Doctrine2.

After looking through the API a few times, I really thought such methods weren't there yet (I was looking for limit($limit, $offset = 0) or something similar).

Who would have thought that the methods are called setFirstResult() and setMaxResults() - I'm really on the blind side sometimes.

Thanks for the indirect tip!

Marc

ugh Posted by jon about 1 year ago.

Everyone knows that typical web sites need to work with large collections of data at one point or another. It's the typical response of the doctrine team to completely overlook this use case, and state it isn't planned for a future release, leaving those using doctrine in production applications without a solution and up shit creek.

re jon Posted by jwage about 1 year ago.

The blog post clearly states that the functionality is planned for 2.1 and until then you can use the extension.

Btw, you are welcome for this free software that people work on for free in their spare time.

re jon Posted by beberlei about 1 year ago.

All that the LargeCollection "extension" does is execute DQL queries, something that you can already do manually without these convenience methods.

Is Doctrine for Large Sites Posted by Ki about 1 year ago.

I have an active site with text search on close to 1 million active entries in MySQL. Still trying to learn whether Doctrine is right for me. Running on CI. Does anybody have experience with implementing Doctrine in large environments?

How about sorting? Posted by makm about 1 year ago.

Hmm... I think LargeCollection needs OrderBy! ;)
