[DDC-763] Cascade merge on associated entities can insert too many rows through "Persistence by Reachability" Created: 23/Aug/10  Updated: 04/Jul/11

Status: Open
Project: Doctrine 2 - ORM
Component/s: ORM
Affects Version/s: None
Fix Version/s: 2.x
Security Level: All

Type: Improvement Priority: Major
Reporter: Dave Keen Assignee: Benjamin Eberlei
Resolution: Unresolved Votes: 2
Labels: None

Attachments: 0149-DDC-763.patch, DDC763Test.php, multipleaddmerge.diff

 Description   

I think that the UnitOfWork needs to maintain a map of spl_object_hash($newEntity) => $managedEntity for entities that were persisted via reachability during a merge. doMerge should then only call persistNew if the original entity has not already been persisted; if it has, it should merge into the managed entity from the map instead. The map should be maintained until a flush() or until the UnitOfWork is cleared. The reasoning is as follows.

Imagine we have a simple doctor object with no associations:

$doctor = new Doctor();
$em->persist($doctor);
$em->persist($doctor);
$em->flush();

After the first persist() $doctor is MANAGED so the second persist has no effect and this results in a single Doctor row.

If we do the same thing using merge and persistence by reachability:

$doctor = new Doctor();
$em->merge($doctor);
$em->merge($doctor);
$em->flush();

we get two Doctor rows inserted.

Obviously in this particular case we should use the return value from the first merge() as the parameter of the second merge which would give correct behaviour.
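To make the difference between these two cases concrete, here is a minimal PHP 8 sketch. It is a toy model, not Doctrine's code: ToyEntityManager and its $inserts counter are hypothetical. It only captures the behaviour described above: merge() of a NEW entity persists a managed copy and returns it, while the argument itself stays NEW, and merge() of an already MANAGED entity just returns it.

```php
<?php
// Toy model, not Doctrine's code. merge() of a NEW entity persists a managed
// copy and returns it; the argument itself stays NEW. merge() of an entity
// that is already MANAGED just returns it.
final class Doctor
{
}

final class ToyEntityManager
{
    public int $inserts = 0;
    private SplObjectStorage $managed; // holds the managed copies

    public function __construct()
    {
        $this->managed = new SplObjectStorage();
    }

    public function merge(object $entity): object
    {
        if ($this->managed->contains($entity)) {
            return $entity; // already MANAGED: nothing to do
        }
        $copy = clone $entity;         // NEW: create a managed copy...
        $this->managed->attach($copy);
        $this->inserts++;              // ...and schedule it for INSERT
        return $copy;
    }
}

$em = new ToyEntityManager();
$doctor = new Doctor();
$em->merge($doctor);
$em->merge($doctor);
echo $em->inserts, PHP_EOL; // 2: $doctor itself never becomes MANAGED

$em = new ToyEntityManager();
$doctor = new Doctor();
$managedDoctor = $em->merge($doctor);
$em->merge($managedDoctor); // pass the return value of the first merge()
echo $em->inserts, PHP_EOL; // 1
```

Passing the return value only works when the caller holds the second reference; it is not possible when that reference sits inside an association that merge() reaches by cascading.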

However, now imagine one Doctor has many Patients and many Patients have one Doctor, all the associations have cascade merge enabled, and further assume that $d1 (Doctor id=1) is already in the database. We now attempt to create two patients and assign them to the existing doctor:

$d1 = new Doctor(); $d1->id = 1; // This is a DETACHED entity

$p1 = new Patient();
$p2 = new Patient();

$d1->patients->add($p1); $p1->doctor = $d1;
$d1->patients->add($p2); $p2->doctor = $d1;

$em->merge($p1);
$em->merge($p2);

$em->flush();

This actually results in 4 rows being added to the 'patients' table instead of 2. I think this is because $p1 and $p2 are each persisted twice: once as the root object of a merge() and again via the patient->doctor->patients collection. Since the cascade merging happens internally, we can't replace the collection contents with the managed return values without walking the object graph ourselves (in which case there is no point in using cascade merge in the first place). Maintaining a map in the UnitOfWork would allow doMerge to ensure it doesn't persist the same entity twice.
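The following PHP 8 toy model reproduces the arithmetic of this example. It rests on stated simplifications: ToyUnitOfWork, patientInserts() and the identity-map shortcut standing in for loading the detached Doctor are hypothetical, not Doctrine's UnitOfWork. As in Doctrine, $visited is reset on every top-level merge(). The proposed $mergedEntities map, keyed by spl_object_hash() of the original object, survives across merge() calls until clear().

```php
<?php
// Toy model, not Doctrine's implementation: cascade merge across a
// bidirectional Doctor 1:N Patient association, counting Patient INSERTs.
final class Doctor
{
    public ?int $id = null;
    /** @var Patient[] */
    public array $patients = [];
}

final class Patient
{
    public ?int $id = null;
    public ?Doctor $doctor = null;
}

final class ToyUnitOfWork
{
    public int $patientInserts = 0;
    /** @var array<string, object> "Class#id" => managed copy of a DETACHED entity */
    private array $identityMap = [];
    /** @var array<string, object> proposed map: spl_object_hash(original) => managed copy */
    private array $mergedEntities = [];

    public function __construct(private bool $useMergeMap) {}

    public function merge(object $entity): object
    {
        $visited = []; // fresh for every top-level merge()
        return $this->doMerge($entity, $visited);
    }

    public function clear(): void
    {
        $this->identityMap = $this->mergedEntities = [];
    }

    private function doMerge(object $entity, array &$visited): object
    {
        $oid = spl_object_hash($entity);
        if (isset($visited[$oid])) {
            return $visited[$oid]; // cycle inside this merge() call
        }
        if ($this->useMergeMap && isset($this->mergedEntities[$oid])) {
            return $this->mergedEntities[$oid]; // merged by an earlier merge() call
        }
        $class = get_class($entity);
        if ($entity->id !== null) {
            // DETACHED: reuse the managed copy (stands in for a SELECT); no INSERT
            $managed = $this->identityMap["$class#{$entity->id}"] ??= new $class();
            $managed->id = $entity->id;
        } else {
            $managed = new $class(); // NEW: persist a fresh managed copy
            if ($entity instanceof Patient) {
                $this->patientInserts++;
            }
        }
        $visited[$oid] = $managed;
        $this->mergedEntities[$oid] = $managed;

        // Cascade merge to both sides of the association.
        if ($entity instanceof Doctor) {
            foreach ($entity->patients as $patient) {
                $managed->patients[] = $this->doMerge($patient, $visited);
            }
        } elseif ($entity instanceof Patient && $entity->doctor !== null) {
            $managed->doctor = $this->doMerge($entity->doctor, $visited);
        }
        return $managed;
    }
}

function patientInserts(bool $useMergeMap): int
{
    $d1 = new Doctor();
    $d1->id = 1; // DETACHED: the row already exists
    $p1 = new Patient();
    $p2 = new Patient();
    $d1->patients = [$p1, $p2];
    $p1->doctor = $d1;
    $p2->doctor = $d1;

    $uow = new ToyUnitOfWork($useMergeMap);
    $uow->merge($p1);
    $uow->merge($p2);
    return $uow->patientInserts;
}

echo patientInserts(false), PHP_EOL; // 4
echo patientInserts(true), PHP_EOL;  // 2
```

Without the map, the second merge() starts with an empty $visited array, so $p2, and then $p1 (reached again through $d1->patients), are persisted a second time.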

I'm not sure, but this might be relevant for cascade persist too.

P.S. Another bug report on this can be found at http://code.google.com/p/flextrine2/issues/detail?id=32 (it basically says the same thing with different entities).



 Comments   
Comment by Benjamin Eberlei [ 29/Aug/10 ]

@Roman In my opinion, a possible fix for this is another map in UnitOfWork, $mergedEntities = array(), and a patch like this:

diff --git a/lib/Doctrine/ORM/UnitOfWork.php b/lib/Doctrine/ORM/UnitOfWork.php
index 242d84b..1d0d8b3 100644
--- a/lib/Doctrine/ORM/UnitOfWork.php
+++ b/lib/Doctrine/ORM/UnitOfWork.php
@@ -1340,6 +1340,10 @@ class UnitOfWork implements PropertyChangedListener
             return; // Prevent infinite recursion
         }
 
+        if (isset($this->mergedEntities[$oid])) {
+            return $this->mergedEntities[$oid];
+        }
+
         $visited[$oid] = $entity; // mark visited
 
         $class = $this->em->getClassMetadata(get_class($entity));
@@ -1468,6 +1472,8 @@ class UnitOfWork implements PropertyChangedListener
 
         $this->cascadeMerge($entity, $managedCopy, $visited);
 
+        $this->mergedEntities[$oid] = $managedCopy;
+
         return $managedCopy;
     }

Comment by Dave Keen [ 29/Aug/10 ]

I have tested this patch with my application and it fixes the problem in all my relevant test cases apart from one. The failing test case persists a bidirectional many-to-many relationship, so the associations interweave with each other (each Movie's artists also reference that Movie in their own movies collection).

I wonder if perhaps doMerge needs to continue cascading even if it finds an item in $this->mergedEntities.

This is the Flextrine code that fails - it results in no entries in movie_artist. This might also be related to DDC-758?

var m1:Movie = new Movie();
m1.title = "Movie 1";

var m2:Movie = new Movie();
m2.title = "Movie 2";

var a1:Artist = new Artist();
a1.name = "Artist 1";

var a2:Artist = new Artist();
a2.name = "Artist 2";

m1.artists.addItem(a1); a1.movies.addItem(m1);
m1.artists.addItem(a2); a2.movies.addItem(m1);

m2.artists.addItem(a1); a1.movies.addItem(m2);
m2.artists.addItem(a2); a2.movies.addItem(m2);

// These translate to cascade merges on the server
em.persist(m1);
em.persist(m2);
em.persist(a1);
em.persist(a2);

// Now flush
em.flush();

Comment by Dave Keen [ 29/Aug/10 ]

P.S. This test passes if I translate em.persist() to $em->persist() (without cascading) on the server instead of translating it to a cascade merge; not sure if that helps.

Comment by Roman S. Borschel [ 30/Aug/10 ]

I'd really like to avoid introducing an additional instance variable just to solve this issue but I did not find the time yet to really look into it.

Does someone have a unit test for this already and can attach it to the issue?

Comment by Roman S. Borschel [ 31/Aug/10 ]

Rescheduling for RC1.

Comment by Dave Keen [ 13/Sep/10 ]

Here is a functional test case containing three tests:

testMultiMerge tests basic merging of two new entities, checking that only a single entity ends up in the database. This passes with Benjamin's patch.

testMultiCascadeMerge tests the more complex case of merging a OneToMany association. This also passes with Benjamin's patch.

testManyToManyPersistByReachability tests the ManyToMany case described above, and this fails with Benjamin's patch, probably because doMerge doesn't cascade down entities that it has already merged, so some ManyToMany associations are being ignored. It's a bit hard to be certain what is causing this, as even without Benjamin's patch this test would fail due to DDC-758.

Comment by Benjamin Eberlei [ 15/Sep/10 ]

@Roman I thought about this issue; it's not possible without that additional map of merged entities. There is no way we can get that information from other sources.

The problem is rather that the use case probably only applies to mass-merging scenarios and client-server serialization.

Comment by Dave Keen [ 21/Sep/10 ]

Added another failing test case: adding the same entity from both ends of a bidirectional many-to-many association, to check that Doctrine doesn't cause an integrity constraint violation by trying to insert the same row twice.

Comment by Dave Keen [ 21/Sep/10 ]

Attached a patch for this issue.

Comment by Benjamin Eberlei [ 22/Sep/10 ]

Can you explain why all the additional changes are necessary compared to my patch?

Comment by Dave Keen [ 22/Sep/10 ]

It fixes the two additional test cases - testManyToManyPersistByReachability and testManyToManyDuplicatePersistByReachability.

testManyToManyPersistByReachability was failing with your original patch because there are ManyToMany cases where an entity may already have been merged, but it's still necessary to add it to an association and continue to cascade. Running the following with the original patch misses some of the associations.

$m1 = new Movie();
$m1->title = "Movie 1";

$m2 = new Movie();
$m2->title = "Movie 2";

$a1 = new Artist();
$a1->name = "Artist 1";

$a2 = new Artist();
$a2->name = "Artist 2";

$m1->artists->add($a1); $a1->movies->add($m1);
$m1->artists->add($a2); $a2->movies->add($m1);
$m2->artists->add($a1); $a1->movies->add($m2);
$m2->artists->add($a2); $a2->movies->add($m2);

$em->merge($a1);
$em->merge($a2);
$em->flush();

The other change in my patch is to protect against this case. It ensures that the following code doesn't add the same entity twice to a collection.

$em->merge($m1);
$em->merge($m2);
$em->merge($a1);
$em->merge($a2);
$em->flush();
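The duplicate guard described here can be sketched as follows. This is a hypothetical PHP 8 illustration, not the contents of the attached patches: ManagedCollection and addIfMissing() are invented names, and stdClass stands in for a managed Artist. The idea is that when merge() copies an association into a managed collection, it skips any instance the collection already holds, so the same many-to-many join row is not scheduled for insertion twice.

```php
<?php
// Hypothetical sketch, not the attached patch: when merge() copies an
// association into a managed collection, skip an entity that the collection
// already holds, so no duplicate join-table row is scheduled.
final class ManagedCollection implements Countable
{
    /** @var object[] */
    private array $elements = [];

    // Unguarded add: the same instance can end up in the collection twice.
    public function add(object $element): void
    {
        $this->elements[] = $element;
    }

    // Guarded add: returns false if this exact instance is already present.
    public function addIfMissing(object $element): bool
    {
        if (in_array($element, $this->elements, true)) { // strict: identity check
            return false;
        }
        $this->elements[] = $element;
        return true;
    }

    public function count(): int
    {
        return count($this->elements);
    }
}

// The same managed Artist copy reached twice, once from each end of the
// bidirectional association.
$managedArtist = new stdClass();
$unguarded = new ManagedCollection();
$guarded = new ManagedCollection();
foreach ([1, 2] as $ignored) {
    $unguarded->add($managedArtist);
    $guarded->addIfMissing($managedArtist);
}
echo count($unguarded), ' vs ', count($guarded), PHP_EOL; // 2 vs 1
```

Inside Doctrine itself such a check would presumably go through the collection's contains() method rather than a custom class.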

Comment by Benjamin Eberlei [ 31/Oct/10 ]

I wonder whether the issue here is rather multiple calls to merge() that contain different parts of the same object graph.

There should be a very simple fix for this: call ->clear() after each merge.

I am not sure whether this patch drags us into a black hole of issues with merging.

Comment by Dave Keen [ 31/Oct/10 ]

Calling ->clear() and ->flush() after each merge is a workaround for the simple case, but unless I am misunderstanding, I don't think it's a solution for cases where the merging happens automatically in cascadeMerge. I've actually encountered this issue in another project and scenario, to do with creating REST APIs and merging JSON objects into entities, and applying the patch fixed it. So a) I think this issue might be more common than we first thought, and b) the patch basically seems to work (plus it doesn't introduce any failing cases in the existing test suite). I can still find one edge case, to do with cascade merging of interlinked many-to-many associations, that this doesn't fix, but I was planning to open that as a new ticket after this one. My feeling is that the current merge already has issues and this definitely improves it.

Comment by Benjamin Eberlei [ 01/Nov/10 ]

It cannot happen inside a single merge: a single merge uses the $visited array to avoid infinite recursion, so each entity can only be merged once inside a single merge operation.

Comment by Benjamin Eberlei [ 10/Nov/10 ]

Added a note to the documentation about using EntityManager#clear between merges of entities that share subgraphs and use cascade merge.

Handling this issue in UnitOfWork will be declared an improvement rather than a bug, and scheduled for later releases. The required changes to the core are too dangerous and too big.

Comment by Dave Keen [ 11/Nov/10 ]

Where in the docs is that?

Just to summarize, the equivalent operation to having multiple merges and a single flush is to call merge followed by flush each time, with the whole thing surrounded by a transaction? Does this have a big impact on performance?

Comment by Dave Keen [ 11/Nov/10 ]

Ben - even given the decision not to implement this (and I do understand your thinking, as it is a major change), is there any reason not to implement the bit that ensures that the same entity isn't added to a collection twice during a merge? I can't think of a situation where this should be allowed, and I have a use case where I get 'DUPLICATE KEY' errors if this isn't there.

Please see attached patch.

Comment by Benjamin Eberlei [ 11/Nov/10 ]

Which part of that huge patch is that? Can you extract it into another ticket, if that's possible?

Comment by Benjamin Eberlei [ 11/Nov/10 ]

I added it to "Working with Objects" and the description of Merge. It's not yet live on the site.

Using this current workaround has a performance impact, since more SELECT statements have to be issued against the database.

Comment by Dave Keen [ 11/Nov/10 ]

Apologies for not being clear: only the third patch (multipleaddmerge.diff) is relevant to the 'DUPLICATE KEY' error I am now talking about, but I'll put it in another ticket if you prefer.

Comment by Benjamin Eberlei [ 11/Nov/10 ]

Please add a new ticket; the patch looks good.

Comment by Dave Keen [ 11/Nov/10 ]

Created as DDC-875

Generated at Fri Aug 01 12:11:36 UTC 2014 using JIRA 6.2.3#6260-sha1:63ef1d6dac3f4f4d7db4c1effd405ba38ccdc558.