[MODM-92] Changing all data in embedMany collections can create duplicate embedded objects Created: 22/Oct/10  Updated: 23/Nov/10  Resolved: 23/Nov/10

Status: Resolved
Project: Doctrine MongoDB ODM
Component/s: Persister, UnitOfWork
Affects Version/s: None
Fix Version/s: 1.0.0BETA2

Type: Bug Priority: Major
Reporter: Jeremy Mikola Assignee: Jonathan H. Wage
Resolution: Fixed Votes: 0
Labels: None

MongoDB 1.6.3, with Mongo ODM master as of 10/22/2010


Repeatable in the following use case, when the default pushPull strategy is used for an embedMany collection:

  • Persist an object with a single embedded object in its embedMany collection. Flush/clear/refetch.
  • Clear the collection and add a new embedded object. Flush/clear/refetch.
  • The containing object's collection now has two copies of the replacement object from the previous step.

I investigated this as best I could, and traced the problem to BasicDocumentPersister::prepareUpdateData() and the data it receives from UnitOfWork's changeset.

The UoW changeset simply reports the old/new embedded collections as arrays of raw data. This makes BasicDocumentPersister unable to distinguish whether a $set or $pullAll/$pushAll is an appropriate course of action, since it cannot discern whether the embedded objects are actually different (by SPL object hash) or merely changed. Following the logic in prepareUpdateData(), we end up doing two things when processing the update for the collection field:

  • Since old !== new, and we're a collection field type, generate pull/push commands based on the delete/insert diffs.
  • Since we're embedded and new evalutes to true, recursively call prepareUpdateData() on ourself, which will then go on to generate set commands.

This chain of commands will then reach BasicDocumentPersister::update() later on and given the logic check within:

> if ((isset($update[$this->cmd . 'pushAll']) || isset($update[$this->cmd . 'pullAll'])) && isset($update[$this->cmd . 'set'])) ...

The $set command will be executed first, followed by $pullAll/$pushAll later. This $set actually changes the very data that $pullAll expects to remove, so $pullAll will do nothing but $pushAll will still insert a copy of our new data. The end result is that we now have two copies of our new data in the collection.

Comment by Jeremy Mikola [ 22/Oct/10 ]

Failing test case here: http://github.com/jmikola/mongodb-odm/commit/336a700ac6fe0cdf013fc864d8e70f43cb5b73e2
Pull request here: http://github.com/doctrine/mongodb-odm/pull/37

Temporary fix is to switch the mapping strategy to "set" instead of the default "pushPull"

I spoke with Bulat and believe the long-term fix for this will be having UoW preserve the SPL object hash of embedded objects in changsets, so that we can compare something more than arrays when deciding what MongoDB commands to utilize. For cases such as this, the $set is superfluous in my opinion - it should only be used if the same object remained in the collection and had its value(s) changed.

Comment by Jonathan H. Wage [ 23/Nov/10 ]

Thanks! This is now fixed with the refactoring we did the persisters last week!

Generated at Sun Jan 25 14:24:36 UTC 2015 using JIRA 6.2.3#6260-sha1:63ef1d6dac3f4f4d7db4c1effd405ba38ccdc558.