Doctrine 1
  1. Doctrine 1
  2. DC-366

Error with tokenizer for JOINs (comments include proposed solutions)

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 1.2.0
    • Fix Version/s: 1.2.2
    • Component/s: Query
    • Labels:
      None
    • Environment:
      Symfony-1.3, reporting below is based on the symfony-1.3.1 sandbox, but same problem occurs in other (non-sandbox) symfony-1.3 environments

      Description

      Using the attached schema.yml works fine with Symfony-1.2 and Doctrine-1.0. It even works when I have tested it with Symfony-1.2 and Doctrine-1.1.

      However, when using Symfony-1.3 and Doctrine-1.2 the following happens:

       
      > php symfony doctrine:build-all-reload
      >> doctrine  Dropping "doctrine" database
      >> doctrine  Creating "dev" environment "doctrine" database
      >> doctrine  generating model classes
      >> file+     C:\Users\XXXXXXXXXXXXXXXX\AppDa...\Temp/doctrine_schema_30398.yml
      >> tokens    C:/web/sf_sandbox-1.3/lib/model/doctrine/base/BaseBar.class.php
      >> tokens    C:/web/sf_sandbox-1.3/lib/model/doctrine/base/BaseFoo.class.php
      >> tokens    C:/web/sf_sandbox-1.3/lib/model...e/base/BaseJoinFooBar.class.php
      >> autoload  Resetting application autoloaders
      >> file-     C:/web/sf_sandbox-1.3/cache/fro.../config/config_autoload.yml.php
      >> doctrine  generating form classes
      >> tokens    C:/web/sf_sandbox-1.3/lib/form/BaseForm.class.php
      >> tokens    C:/web/sf_sandbox-1.3/lib/form/doctrine/BarForm.class.php
      >> tokens    C:/web/sf_sandbox-1.3/lib/form/...rine/base/BaseBarForm.class.php
      >> tokens    C:/web/sf_sandbox-1.3/lib/form/...rine/base/BaseFooForm.class.php
      >> tokens    C:/web/sf_sandbox-1.3/lib/form/...se/BaseJoinFooBarForm.class.php
      >> tokens    C:/web/sf_sandbox-1.3/lib/form/...rine/BaseFormDoctrine.class.php
      >> tokens    C:/web/sf_sandbox-1.3/lib/form/doctrine/FooForm.class.php
      >> tokens    C:/web/sf_sandbox-1.3/lib/form/doctrine/JoinFooBarForm.class.php
      >> autoload  Resetting application autoloaders
      >> file-     C:/web/sf_sandbox-1.3/cache/fro.../config/config_autoload.yml.php
      >> doctrine  generating filter form classes
      >> tokens    C:/web/sf_sandbox-1.3/lib/filte...octrine/BarFormFilter.class.php
      >> tokens    C:/web/sf_sandbox-1.3/lib/filte...ase/BaseBarFormFilter.class.php
      >> tokens    C:/web/sf_sandbox-1.3/lib/filte...ase/BaseFooFormFilter.class.php
      >> tokens    C:/web/sf_sandbox-1.3/lib/filte...eJoinFooBarFormFilter.class.php
      >> tokens    C:/web/sf_sandbox-1.3/lib/filte...aseFormFilterDoctrine.class.php
      >> tokens    C:/web/sf_sandbox-1.3/lib/filte...octrine/FooFormFilter.class.php
      >> tokens    C:/web/sf_sandbox-1.3/lib/filte.../JoinFooBarFormFilter.class.php
      >> autoload  Resetting application autoloaders
      >> file-     C:/web/sf_sandbox-1.3/cache/fro.../config/config_autoload.yml.php
      >> doctrine  generating sql for models
      >> doctrine  Generated SQL successfully for models
      >> doctrine  created tables successfully
      >> doctrine  Loading data fixtures from "C:\web\sf_sandbox-1.3\data/fixtures"
      
      
        Couldn't find class FooBar
      
      

      The attached schema.yml looks like this:

      # /config/doctrine/schema.yml
      Foo:
        columns:
          id:
            type:                    integer(8)
            primary:                 true
            autoincrement:           true
          foo_field:                 string(100)
        relations:
          Bars:
            class:                   Bar
            refClass:                JoinFooBar
            local:                   foo_id
            foreign:                 bar_id
      Bar:
        columns:
          id:
            type:                    integer(8)
            primary:                 true
            autoincrement:           true
          bar_field:               string(30)
      JoinFooBar:
        columns:
          foo_id:
            type:                    integer(8)
            primary:                 true
          bar_id:
            type:                    integer(8)
            primary:                 true
      
      

      As you can see, there is not meant to be any class called FooBar (the reference class for the many-to-many join is called JoinFooBar) so where is the name "FooBar" being picked up from and why is the code complaining that it cannot be found?

      The fixture file is empty so there shouldn't be anything causing problems there.

      This bug is making it impossible to upgrade our existing projects to Symfony-1.3 + Doctrine-1.2 so I'd be most grateful if you could look into this.

      If the schema is changed from:

      ...
          bar_field:               string(30)
      JoinFooBar:
        columns:
      ...
      

      To:

      ...
          bar_field:               string(30)
      FooBar:
        columns:
      ...
      

      it seems to work, (that error disappears - even though the refClass is still set to *Join*FooBar) but we cannot easily justify changing all the classnames of all our Join tables in the projects we want to upgrade!

      I hope you can see some simple solution to this and I'm happy to answer any questions you have.

      (To reproduce this bug use the symfony sandbox, create a database and then use the attached schema.yml file.)

      C

      1. jira-ticket-dc366.patch
        2 kB
        Christian Seaman
      2. schema.yml
        0.7 kB
        Christian Seaman

        Activity

        Hide
        Christian Seaman added a comment -

        As reported here:
        http://forum.symfony-project.org/index.php/t/24441/

        It seems that the problem is with the doctrine:build-all-reload in symfony and not anything deeper.

        As such, I think it would make sense to disactivate that task in symfony (rather than marking it as deprecated) since it is broken and a note in it directing people to use the new tasks would be far more useful.

        C

        Show
        Christian Seaman added a comment - As reported here: http://forum.symfony-project.org/index.php/t/24441/ It seems that the problem is with the doctrine:build-all-reload in symfony and not anything deeper. As such, I think it would make sense to disactivate that task in symfony (rather than marking it as deprecated) since it is broken and a note in it directing people to use the new tasks would be far more useful. C
        Hide
        Christian Seaman added a comment -

        As noted above, this is not a Critical error with Doctrine, but rather a problem with a symfony task.

        Show
        Christian Seaman added a comment - As noted above, this is not a Critical error with Doctrine, but rather a problem with a symfony task.
        Hide
        Christian Seaman added a comment -

        On further inspection, this is not an error with the symfony task, but a problem with the Tokenizer class, in particular where Doctrine_Query_From::parse() calls Doctrine_Query_Tokenizer::bracketExplode() with the delimiter set to 'JOIN'.

        Show
        Christian Seaman added a comment - On further inspection, this is not an error with the symfony task, but a problem with the Tokenizer class, in particular where Doctrine_Query_From::parse() calls Doctrine_Query_Tokenizer::bracketExplode() with the delimiter set to 'JOIN'.
        Hide
        Christian Seaman added a comment -

        OK... I think I have found and solved this one but a more experienced member of the Doctrine team should review what I propose before adding it to the codebase.

        The problem occurs because Doctrine_Query_Tokenizer::getSplitRegExpFromArray() is being passed a string with no word boundaries (the parameter being passed to it by Doctrine_Query_From::parse() is "JOIN"). getSplitRegExpFromArray() then treats this string as a case insensitive regex so if any of your class or field names contain this string they are treated as a split.

        E.g. "JoinFooBar JOIN BanjoIndia" would be split into array ("", "FooBar", "Ban", "dia").

        This is clearly wrong and would mean that no table or fieldnames could contain the letters j-o-i-n in that order.

        The proposed solution is as follows:

        Current version of Doctrine_Query_Tokenizer::getSplitRegExpFromArray() (Doctrine 1.2, as of r7034):

         
            private function getSplitRegExpFromArray(array $d){
                $d = array_map('preg_quote', $d);
        
                if (in_array(' ', $d)) {
                    $d[] = '\s';
                }
        
                return '#(' . implode('|', $d) . ')#';
            }
        
        

        Proposed modification checks each delimiter given. If that delimiter consists only of \w characters (i.e. [0-9a-zA-Z_]) then a word boundary is required before and after that delimiter if it's going to be used as a match:

         
            private function getSplitRegExpFromArray(array $d){
                foreach ($d as $key => $string) {
                    $escapedString = preg_quote($string);
                    if (preg_match('#^\w+$#', $string)) $escapedString = "\W$escapedString\W";
                    $d[$key] = $escapedString;
                }
        
                if (in_array(' ', $d)) {
                    $d[] = '\s';
                }
        
                return '#(' . implode('|', $d) . ')#';
            }
        
        

        Now, this seems to work for me in brief testing for my own purposes. However, the test cases in TokenizerTestCase.php (particularly testBracketExplode()) are far from exhaustive so I'm not sure if this proposed solution would suit all cases. E.g. do you ever need the delimiter to match at the very beginning or end of a string? At the moment it would not match but you could deal with this by using this instead:

         if (preg_match('#^\w+$#', $string)) $escapedString = "(^|\W)$escapedString($|\W)";

        An alternative solution, but far less robust, would be just to modify Doctrine_Query_From::parse() from

            public function parse($str, $return = false)
            {
                $str = trim($str);
                $parts = $this->_tokenizer->bracketExplode($str, 'JOIN');
        
                $from = $return ? array() : null;
        ...
        
        

        to this, with spaces around the JOIN string passed to bracketExplode():

            public function parse($str, $return = false)
            {
                $str = trim($str);
                $parts = $this->_tokenizer->bracketExplode($str, ' JOIN ');
        
                $from = $return ? array() : null;
        ...
        
        

        This is probably going to run marginally faster, but it leaves the risk that some change in future will bring up the same problem again, so the former (fixed regexp) solution would seem to be better for stability.

        I hope you can pick this up, review it and implement the most suitable change (and update the test cases too) - at the moment this is a big bug since any schema which defines table or fieldnames that contain the consecutive letters j-o-i-n will break.

        C

        Show
        Christian Seaman added a comment - OK... I think I have found and solved this one but a more experienced member of the Doctrine team should review what I propose before adding it to the codebase. The problem occurs because Doctrine_Query_Tokenizer::getSplitRegExpFromArray() is being passed a string with no word boundaries (the parameter being passed to it by Doctrine_Query_From::parse() is "JOIN"). getSplitRegExpFromArray() then treats this string as a case insensitive regex so if any of your class or field names contain this string they are treated as a split. E.g. "JoinFooBar JOIN BanjoIndia" would be split into array ("", "FooBar", "Ban", "dia"). This is clearly wrong and would mean that no table or fieldnames could contain the letters j-o-i-n in that order. The proposed solution is as follows: Current version of Doctrine_Query_Tokenizer::getSplitRegExpFromArray() (Doctrine 1.2, as of r7034): private function getSplitRegExpFromArray(array $d){ $d = array_map('preg_quote', $d); if (in_array(' ', $d)) { $d[] = '\s'; } return '#(' . implode('|', $d) . ')#'; } Proposed modification checks each delimiter given. If that delimiter consists only of \w characters (i.e. [0-9a-zA-Z_] ) then a word boundary is required before and after that delimiter if it's going to be used as a match: private function getSplitRegExpFromArray(array $d){ foreach ($d as $key => $string) { $escapedString = preg_quote($string); if (preg_match('#^\w+$#', $string)) $escapedString = "\W$escapedString\W"; $d[$key] = $escapedString; } if (in_array(' ', $d)) { $d[] = '\s'; } return '#(' . implode('|', $d) . ')#'; } Now, this seems to work for me in brief testing for my own purposes. However, the test cases in TokenizerTestCase.php (particularly testBracketExplode()) are far from exhaustive so I'm not sure if this proposed solution would suit all cases. E.g. do you ever need the delimiter to match at the very beginning or end of a string? At the moment it would not match but you could deal with this by using this instead: if (preg_match('#^\w+$#', $string)) $escapedString = "(^|\W)$escapedString($|\W)"; An alternative solution, but far less robust, would be just to modify Doctrine_Query_From::parse() from public function parse($str, $return = false) { $str = trim($str); $parts = $this->_tokenizer->bracketExplode($str, 'JOIN'); $from = $return ? array() : null; ... to this, with spaces around the JOIN string passed to bracketExplode(): public function parse($str, $return = false) { $str = trim($str); $parts = $this->_tokenizer->bracketExplode($str, ' JOIN '); $from = $return ? array() : null; ... This is probably going to run marginally faster, but it leaves the risk that some change in future will bring up the same problem again, so the former (fixed regexp) solution would seem to be better for stability. I hope you can pick this up, review it and implement the most suitable change (and update the test cases too) - at the moment this is a big bug since any schema which defines table or fieldnames that contain the consecutive letters j-o-i-n will break. C
        Hide
        Matthias Steinböck added a comment - - edited

        I found the same bug.

        The error i got was "Couldn't find class ed". at first i wondered where i used "ed" but i did nowhere. the code i used was:

        $from = 'Media';
        $alias = 'Content';
        
        $qry = Doctrine_Query::create();
        $qry->from("$from orig");
        $qry->leftJoin("orig.$alias joined"); // here the error occoures using "joined"
        

        I located the Bug in Doctrine_Query_From using $this->_tokenizer->bracketExplode for checking the right arm of the join so i only can confirm this bug.

        Show
        Matthias Steinböck added a comment - - edited I found the same bug. The error i got was "Couldn't find class ed". at first i wondered where i used "ed" but i did nowhere. the code i used was: $from = 'Media'; $alias = 'Content'; $qry = Doctrine_Query::create(); $qry->from( "$from orig" ); $qry->leftJoin( "orig.$alias joined" ); // here the error occoures using "joined" I located the Bug in Doctrine_Query_From using $this->_tokenizer->bracketExplode for checking the right arm of the join so i only can confirm this bug.
        Hide
        Christian Seaman added a comment -

        Matthias,

        feel free to use the fix detailed above. I have been using this in my local copy of Doctrine and it seems to work well so far.

        We just need a member of Doctrine's dev team to decide which of the fixes above makes most sense and then to update the code for the next Doctrine release.

        The fix I am using at the moment is the first one:

                  if (preg_match('#^\w+$#', $string)) $escapedString = "\W$escapedString\W";
        
        Show
        Christian Seaman added a comment - Matthias, feel free to use the fix detailed above. I have been using this in my local copy of Doctrine and it seems to work well so far. We just need a member of Doctrine's dev team to decide which of the fixes above makes most sense and then to update the code for the next Doctrine release. The fix I am using at the moment is the first one: if (preg_match('#^\w+$#', $string)) $escapedString = "\W$escapedString\W";
        Hide
        Jonathan H. Wage added a comment -

        Hi, before I can test your proposed solution. I need to see a patch with your changes. I can't very easily and reliably copy and paste your changes from the comments.

        Show
        Jonathan H. Wage added a comment - Hi, before I can test your proposed solution. I need to see a patch with your changes. I can't very easily and reliably copy and paste your changes from the comments.
        Hide
        Christian Seaman added a comment -

        Hi Jon,

        Thanks for looking into this.

        I will generate a patch and attach it to this ticket for your review.

        Feel free to rip it apart and change the test cases or regex used.

        C

        Show
        Christian Seaman added a comment - Hi Jon, Thanks for looking into this. I will generate a patch and attach it to this ticket for your review. Feel free to rip it apart and change the test cases or regex used. C
        Hide
        Christian Seaman added a comment -

        Patch with suggested fix for the tokenizer and basic test case.

        Show
        Christian Seaman added a comment - Patch with suggested fix for the tokenizer and basic test case.

          People

          • Assignee:
            Jonathan H. Wage
            Reporter:
            Christian Seaman
          • Votes:
            1 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved: