Doctrine 2 - ORM
  1. Doctrine 2 - ORM
  2. DDC-763

Cascade merge on associated entities can insert too many rows through "Persistence by Reachability"

    Details

    • Type: Improvement Improvement
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 2.x
    • Component/s: ORM
    • Security Level: All
    • Labels:
      None

      Description

      I think that the UnitOfWork needs to maintain a map of spl_object_hash($newEntity)->$managedEntity for entities that were persisted via reachability during a merge. doMerge should then only call persistNew if the original entity has not already been persisted (if it has already been persisted it should merge the managed entity from the map). The map should be maintained until a flush() or until the UnitOfWork is cleared. The reasoning is as follows.

      Imagine we have a simple doctor object with no associations:

      $doctor = new Doctor();
      $em->persist($doctor);
      $em->persist($doctor);
      $em->flush();
      

      After the first persist() $doctor is MANAGED so the second persist has no effect and this results in a single Doctor row.

      If we do the same thing using merge and persistence by reachability:

      $doctor = new Doctor();
      $em->merge($doctor);
      $em->merge($doctor);
      $em->flush();
      

      we get 2 Doctor rows being added.

      Obviously in this particular case we should use the return value from the first merge() as the parameter of the second merge which would give correct behaviour.

      However, now imagine one Doctor has many Patients and many Patients have one Doctor, all the associations have cascade merge enabled, and further assume that $d1 (Doctor id=1) is already in the database. We now attempt to create two patients and assign them to the existing doctor:

      $d1= new Doctor(); $d1->id = 1; // This is a DETACHED entity
      
      $p1 = new Patient();
      $p2 = new Patient();
      
      $d1->patients->add($p1); $p1->doctor = $d1;
      $d1->patients->add($p2); $p2->doctor = $d1;
      
      $em->merge($p1);
      $em->merge($p2);
      
      $em->flush();
      

      This actually results in 4 rows being added to the 'patients' table instead of 2, I think because $p1 and $p2 are getting persisted both as the root objects and then again from the patient->doctor->patients array. Since the cascade merging happens internally we can't replace the array contents with the managed return values without walking through the object graph (in which case there is no point in using cascade merge in the first place). Maintaining a map in UnitOfWork will allow doMerge to ensure it doesn't persist the same entities twice.

      I'm not sure, but this might be relevant for cascade persist too.

      P.S. Another bug report on this can be found at http://code.google.com/p/flextrine2/issues/detail?id=32 (it basically says the same thing with different entities).

      1. 0149-DDC-763.patch
        16 kB
        Dave Keen
      2. DDC763Test.php
        7 kB
        Dave Keen
      3. multipleaddmerge.diff
        1 kB
        Dave Keen

        Activity

        Dave Keen created issue -
        Dave Keen made changes -
        Field Original Value New Value
        Description I think that the UnitOfWork needs to maintain a map of spl_object_hash($newEntity)->$managedEntity for entities that were persisted via reachability during a merge. doMerge should then only call persistNew is the original entity has not already been persisted (if so it merged the managed entity from the map). The map should be maintained until a flush() or until the UnitOfWork is cleared. The reasoning is as follows.

        Imagine we have a simple doctor object with no associations:

        $doctor = new Doctor();
        $em->persist($doctor);
        $em->persist($doctor);
        $em->flush();

        After the first persist() $doctor is MANAGED so the second persist has no effect and this results in a single Doctor row.

        If we do the same thing using merge and persistence by reachability:

        $doctor = new Doctor();
        $em->merge($doctor);
        $em->merge($doctor);
        $em->flush();

        we get 2 Doctor rows being added.

        Obviously in this particular case we should use the return value from the first merge() as the parameter of the second merge which would give correct behaviour.

        However, now imagine one Doctor has many Patients and many Patients have one Doctor, all the associations have cascade merge enabled, and further assume that $d1 (Doctor id=1) is already in the database. We now attempt to create two patients and assign them to the existing doctor:

        $d1= new Doctor(); $d1->id = 1; // This is a DETACHED entity

        $p1 = new Patient();
        $p2 = new Patient();

        $d1->patients->add($p1); $p1->doctor = $d1;
        $d1->patients->add($p2); $p2->doctor = $d1;

        $em->merge($p1);
        $em->merge($p2);

        $em->flush();

        This actually results in 4 rows being added to the 'patients' table instead of 2, I think because $p1 and $p2 are getting persisted both as the root objects and then again from the patient->doctor->patients array. Since the cascade merging happens internally we can't replace the array contents with the managed return values without walking through the object graph (in which case there is no point in using cascade merge in the first place). Maintaining a map in UnitOfWork will allow doMerge to ensure it doesn't persist the same entities twice.

        I'm not sure, but this might be relevant for cascade persist too.

        P.S. Another bug report on this can be found at http://code.google.com/p/flextrine2/issues/detail?id=32 (it basically says the same thing with different entities).
        I think that the UnitOfWork needs to maintain a map of spl_object_hash($newEntity)->$managedEntity for entities that were persisted via reachability during a merge. doMerge should then only call persistNew is the original entity has not already been persisted (if so it merged the managed entity from the map). The map should be maintained until a flush() or until the UnitOfWork is cleared. The reasoning is as follows.

        Imagine we have a simple doctor object with no associations:

        {code}
        $doctor = new Doctor();
        $em->persist($doctor);
        $em->persist($doctor);
        $em->flush();
        {code}

        After the first persist() $doctor is MANAGED so the second persist has no effect and this results in a single Doctor row.

        If we do the same thing using merge and persistence by reachability:

        {code}
        $doctor = new Doctor();
        $em->merge($doctor);
        $em->merge($doctor);
        $em->flush();
        {code}

        we get 2 Doctor rows being added.

        Obviously in this particular case we should use the return value from the first merge() as the parameter of the second merge which would give correct behaviour.

        However, now imagine one Doctor has many Patients and many Patients have one Doctor, all the associations have cascade merge enabled, and further assume that $d1 (Doctor id=1) is already in the database. We now attempt to create two patients and assign them to the existing doctor:

        {code}
        $d1= new Doctor(); $d1->id = 1; // This is a DETACHED entity

        $p1 = new Patient();
        $p2 = new Patient();

        $d1->patients->add($p1); $p1->doctor = $d1;
        $d1->patients->add($p2); $p2->doctor = $d1;

        $em->merge($p1);
        $em->merge($p2);

        $em->flush();
        {code}

        This actually results in 4 rows being added to the 'patients' table instead of 2, I think because $p1 and $p2 are getting persisted both as the root objects and then again from the patient->doctor->patients array. Since the cascade merging happens internally we can't replace the array contents with the managed return values without walking through the object graph (in which case there is no point in using cascade merge in the first place). Maintaining a map in UnitOfWork will allow doMerge to ensure it doesn't persist the same entities twice.

        I'm not sure, but this might be relevant for cascade persist too.

        P.S. Another bug report on this can be found at http://code.google.com/p/flextrine2/issues/detail?id=32 (it basically says the same thing with different entities).
        Dave Keen made changes -
        Description I think that the UnitOfWork needs to maintain a map of spl_object_hash($newEntity)->$managedEntity for entities that were persisted via reachability during a merge. doMerge should then only call persistNew is the original entity has not already been persisted (if so it merged the managed entity from the map). The map should be maintained until a flush() or until the UnitOfWork is cleared. The reasoning is as follows.

        Imagine we have a simple doctor object with no associations:

        {code}
        $doctor = new Doctor();
        $em->persist($doctor);
        $em->persist($doctor);
        $em->flush();
        {code}

        After the first persist() $doctor is MANAGED so the second persist has no effect and this results in a single Doctor row.

        If we do the same thing using merge and persistence by reachability:

        {code}
        $doctor = new Doctor();
        $em->merge($doctor);
        $em->merge($doctor);
        $em->flush();
        {code}

        we get 2 Doctor rows being added.

        Obviously in this particular case we should use the return value from the first merge() as the parameter of the second merge which would give correct behaviour.

        However, now imagine one Doctor has many Patients and many Patients have one Doctor, all the associations have cascade merge enabled, and further assume that $d1 (Doctor id=1) is already in the database. We now attempt to create two patients and assign them to the existing doctor:

        {code}
        $d1= new Doctor(); $d1->id = 1; // This is a DETACHED entity

        $p1 = new Patient();
        $p2 = new Patient();

        $d1->patients->add($p1); $p1->doctor = $d1;
        $d1->patients->add($p2); $p2->doctor = $d1;

        $em->merge($p1);
        $em->merge($p2);

        $em->flush();
        {code}

        This actually results in 4 rows being added to the 'patients' table instead of 2, I think because $p1 and $p2 are getting persisted both as the root objects and then again from the patient->doctor->patients array. Since the cascade merging happens internally we can't replace the array contents with the managed return values without walking through the object graph (in which case there is no point in using cascade merge in the first place). Maintaining a map in UnitOfWork will allow doMerge to ensure it doesn't persist the same entities twice.

        I'm not sure, but this might be relevant for cascade persist too.

        P.S. Another bug report on this can be found at http://code.google.com/p/flextrine2/issues/detail?id=32 (it basically says the same thing with different entities).
        I think that the UnitOfWork needs to maintain a map of spl_object_hash($newEntity)->$managedEntity for entities that were persisted via reachability during a merge. doMerge should then only call persistNew if the original entity has not already been persisted (if it has already been persisted it should merge the managed entity from the map). The map should be maintained until a flush() or until the UnitOfWork is cleared. The reasoning is as follows.

        Imagine we have a simple doctor object with no associations:

        {code}
        $doctor = new Doctor();
        $em->persist($doctor);
        $em->persist($doctor);
        $em->flush();
        {code}

        After the first persist() $doctor is MANAGED so the second persist has no effect and this results in a single Doctor row.

        If we do the same thing using merge and persistence by reachability:

        {code}
        $doctor = new Doctor();
        $em->merge($doctor);
        $em->merge($doctor);
        $em->flush();
        {code}

        we get 2 Doctor rows being added.

        Obviously in this particular case we should use the return value from the first merge() as the parameter of the second merge which would give correct behaviour.

        However, now imagine one Doctor has many Patients and many Patients have one Doctor, all the associations have cascade merge enabled, and further assume that $d1 (Doctor id=1) is already in the database. We now attempt to create two patients and assign them to the existing doctor:

        {code}
        $d1= new Doctor(); $d1->id = 1; // This is a DETACHED entity

        $p1 = new Patient();
        $p2 = new Patient();

        $d1->patients->add($p1); $p1->doctor = $d1;
        $d1->patients->add($p2); $p2->doctor = $d1;

        $em->merge($p1);
        $em->merge($p2);

        $em->flush();
        {code}

        This actually results in 4 rows being added to the 'patients' table instead of 2, I think because $p1 and $p2 are getting persisted both as the root objects and then again from the patient->doctor->patients array. Since the cascade merging happens internally we can't replace the array contents with the managed return values without walking through the object graph (in which case there is no point in using cascade merge in the first place). Maintaining a map in UnitOfWork will allow doMerge to ensure it doesn't persist the same entities twice.

        I'm not sure, but this might be relevant for cascade persist too.

        P.S. Another bug report on this can be found at http://code.google.com/p/flextrine2/issues/detail?id=32 (it basically says the same thing with different entities).
        Roman S. Borschel made changes -
        Fix Version/s 2.0-BETA4 [ 10072 ]
        Roman S. Borschel made changes -
        Fix Version/s 2.0-RC1 [ 10091 ]
        Fix Version/s 2.0-BETA4 [ 10072 ]
        Dave Keen made changes -
        Attachment PersistByReachabilityMergeTest.php [ 10785 ]
        Dave Keen made changes -
        Attachment PersistByReachabilityMergeTest.php [ 10785 ]
        Dave Keen made changes -
        Attachment DDC763Test.php [ 10793 ]
        Dave Keen made changes -
        Attachment DDC763Test.php [ 10793 ]
        Dave Keen made changes -
        Comment [ Also to confirm, Benjamin's fix seems to work fine, and passes the test case. The many-many issue I mentioned above is definitely a different bug (DDC-758). ]
        Dave Keen made changes -
        Comment [ Here is a functional unit test case for this issue (both the simple and complex cases). ]
        Dave Keen made changes -
        Attachment DDC763Test.php [ 10795 ]
        Dave Keen made changes -
        Attachment DDC763Test.php [ 10795 ]
        Dave Keen made changes -
        Attachment DDC763Test.php [ 10811 ]
        Dave Keen made changes -
        Attachment 0149-DDC-763.patch [ 10812 ]
        Benjamin Eberlei made changes -
        Issue Type Bug [ 1 ] Improvement [ 4 ]
        Assignee Roman S. Borschel [ romanb ] Benjamin Eberlei [ beberlei ]
        Fix Version/s 2.1 [ 10022 ]
        Fix Version/s 2.0-RC1 [ 10091 ]
        Priority Critical [ 2 ] Major [ 3 ]
        Dave Keen made changes -
        Attachment multipleaddmerge.diff [ 10858 ]
        Benjamin Eberlei made changes -
        Fix Version/s 2.x [ 10090 ]
        Fix Version/s 2.1 [ 10022 ]
        Benjamin Eberlei made changes -
        Workflow jira [ 11812 ] jira-feedback [ 13871 ]
        Benjamin Eberlei made changes -
        Workflow jira-feedback [ 13871 ] jira-feedback2 [ 15735 ]
        Benjamin Eberlei made changes -
        Workflow jira-feedback2 [ 15735 ] jira-feedback3 [ 17992 ]

          People

          • Assignee:
            Benjamin Eberlei
            Reporter:
            Dave Keen
          • Votes:
            2 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated: