This is reproducible with the following steps when the default pushPull strategy is used for an embedMany collection:
- Persist an object with a single embedded object in its embedMany collection. Flush/clear/refetch.
- Clear the collection and add a new embedded object. Flush/clear/refetch.
- The containing object's collection now has two copies of the replacement object from the previous step.
I investigated this as best I could and traced the problem to BasicDocumentPersister::prepareUpdateData() and the data it receives from the UnitOfWork's changeset.
The UoW changeset reports the old and new embedded collections only as arrays of raw data. As a result, BasicDocumentPersister cannot decide whether a $set or a $pullAll/$pushAll is appropriate, because it cannot tell whether the embedded objects were replaced (different SPL object hashes) or merely modified. Following the logic in prepareUpdateData(), we end up doing two things when processing the update for the collection field:
- Since old !== new, and we're a collection field type, generate pull/push commands based on the delete/insert diffs.
- Since we're embedded and new evaluates to true, recursively call prepareUpdateData() on the field itself, which then goes on to generate $set commands.
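To show how both branches fire for the same field, here is a minimal Python model of the logic described above. It is a sketch under stated assumptions and not Doctrine's code. `prepare_update_data` is a hypothetical stand-in for BasicDocumentPersister::prepareUpdateData(), and the field name `items` is illustrative. The recursive branch is assumed to pair old and new elements by position and to emit one positional `$set` per changed key.

```python
# Hypothetical model of the two branches in prepareUpdateData() described
# above. Names and data shapes are illustrative, not Doctrine's API.

def prepare_update_data(field, old, new):
    """Build a MongoDB-style update document for one embedMany field."""
    update = {}

    # Branch 1: old !== new on a collection field, so emit pull/push
    # commands from the delete/insert diffs. Only raw data is compared,
    # so a replaced object and a modified object look the same here.
    if old != new:
        deleted = [doc for doc in old if doc not in new]
        inserted = [doc for doc in new if doc not in old]
        if deleted:
            update.setdefault("$pullAll", {})[field] = deleted
        if inserted:
            update.setdefault("$pushAll", {})[field] = inserted

    # Branch 2: the field is embedded and `new` is truthy, so recurse into
    # each element. Assumption: elements are paired by position and every
    # changed key becomes a positional $set.
    if new:
        for i, (old_doc, new_doc) in enumerate(zip(old, new)):
            for key, value in new_doc.items():
                if old_doc.get(key) != value:
                    update.setdefault("$set", {})[f"{field}.{i}.{key}"] = value

    return update


# Step 2 of the reproduction: the single embedded object A is replaced by B.
update = prepare_update_data("items", [{"name": "A"}], [{"name": "B"}])
print(update)
```

The resulting update document contains `$pullAll`, `$pushAll` and `$set` for the same `items` array at once.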
These commands later reach BasicDocumentPersister::update(), which contains this check:
> if ((isset($update[$this->cmd . 'pushAll']) || isset($update[$this->cmd . 'pullAll'])) && isset($update[$this->cmd . 'set'])) ...
Because of this check, the $set command is executed first, followed by $pullAll/$pushAll. The $set modifies the very data that $pullAll expects to remove, so $pullAll matches nothing, while $pushAll still appends a copy of the new data. The end result is two copies of the new data in the collection.
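The ordering effect can be reproduced with a small simulation. The sketch below applies the three operators to an in-memory document in a chosen order. `apply_update` is a hypothetical helper that only handles `$set` paths of the form `field.index.key`, so it models the scenario above rather than MongoDB's full update semantics.

```python
import copy

# Hypothetical simulation of how the stored array evolves when the update
# document is applied in a given order. Only handles $set paths of the form
# "field.index.key"; this is an illustration, not MongoDB's full semantics.

def apply_update(doc, update, order=("$set", "$pullAll", "$pushAll")):
    """Return a copy of `doc` with the update operators applied in `order`."""
    doc = copy.deepcopy(doc)
    for op in order:
        for path, value in update.get(op, {}).items():
            if op == "$set":
                field, index, key = path.split(".")
                doc[field][int(index)][key] = value
            elif op == "$pullAll":
                # Remove every element equal to one of the listed values.
                doc[path] = [item for item in doc[path] if item not in value]
            elif op == "$pushAll":
                # Append all listed values to the end of the array.
                doc[path].extend(copy.deepcopy(value))
    return doc


stored = {"items": [{"name": "A"}]}
update = {
    "$set": {"items.0.name": "B"},
    "$pullAll": {"items": [{"name": "A"}]},
    "$pushAll": {"items": [{"name": "B"}]},
}

# $set first: element 0 becomes B, $pullAll finds no A, $pushAll adds B.
result = apply_update(stored, update)
print(result)  # {'items': [{'name': 'B'}, {'name': 'B'}]}

# For comparison, running $pullAll/$pushAll before $set leaves one copy.
fixed = apply_update(stored, update, order=("$pullAll", "$pushAll", "$set"))
print(fixed)  # {'items': [{'name': 'B'}]}
```

The comparison run only shows that the duplicate comes from issuing overlapping $set and pull/push commands in this order. It is not meant as the fix, which more likely belongs in prepareUpdateData() not emitting both for the same field.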