Uploaded image for project: 'Doctrine DBAL'
  1. Doctrine DBAL
  2. DBAL-96

Make approach towards identifier quoting consistent

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 2.6
    • Component/s: Platforms, Schema Managers
    • Labels:
      None

      Description

      • Make the use of `` a general approach for explicit quoting of identifiers
      • introduce AbstractPlatform::getRegularSQLIdentifierCase($identifier)
      • Introduce AbstractPlatform::isRegularIdentifier($identifier)
      • Fix Schema Assets not to lower-case, but to check for explicit quoting before.
      • Filter values of identifiers passed to all platform functions when they are used in information schema queries according to `` explicit quoting rules.

      Problem: Schema is independent of a vendor, this means we have to pick a behavior, i propose SQL-92

      This means:

      • strtoupper() ALL tables, column, index, foreign key names that are not quoted by ``
      • For any Quoted identifiers by `` the case is kept.
      • We can introduce a validator to detect a schema that cannot be implemented with a given vendor platform.

      In conjunction with the SQL reserved keywords tickets we can then improve the DatabaseDriver considerably to detect identifier casings

        Issue Links

          Activity

          Hide
          ancpru Andreas Prucha added a comment -

          IMO the big problem is, that behaviour across the RDBMs may be completely different:

          Some preserve case and are case-insenstive if not quoted (the nicest approach)
          Some do not preserve case and normalize to lower or uppercase and are case-insensitive
          And some do not preserve case and are case-sensitive (the worst)

          The biggest issue arises if the DB needs to be used outside the Doctrine-Environment and all identifiers need to be quoted in statements.

          Show
          ancpru Andreas Prucha added a comment - IMO the big problem is, that behaviour across the RDBMs may be completely different: Some preserve case and are case-insenstive if not quoted (the nicest approach) Some do not preserve case and normalize to lower or uppercase and are case-insensitive And some do not preserve case and are case-sensitive (the worst) The biggest issue arises if the DB needs to be used outside the Doctrine-Environment and all identifiers need to be quoted in statements.
          Hide
          thinkscape Arthur Bodera added a comment - - edited

          Sebastien, ad 3. that is incorrect. Read the ticket more closely, look at the PR, look inside schema tool and platform classes. There is already a lot of quoting+unquoting being performed in 2.* and a lot of assumptions. Having quoting enabled across the board might actually increase performance in some cases, because there will be less scanning for keywords (see platform classes) and possibly less quoting/unquoting across Schema*.

          The problem is, the quoting right now works in some places and in some platforms and is being performed only when schema/schematool/dql needs it, but is being ignored in all other cases. This means that columns like "group" or table names like "platform" will fail randomly depending on platform/rdbms you actually use. It's a nightmare with cross-platform apps and a struggle for single-platform apps, where your tables are named according to domain-rules and happen to overlap with some rdbms.

          Quoting identifiers being "a bloat" is similar to saying, that implicit quoting values is a bloat. Although from security standpoint the former is much rarer, it's the same for portability and stability of the DBAL across platforms.

          Show
          thinkscape Arthur Bodera added a comment - - edited Sebastien, ad 3. that is incorrect. Read the ticket more closely, look at the PR, look inside schema tool and platform classes. There is already a lot of quoting+unquoting being performed in 2.* and a lot of assumptions. Having quoting enabled across the board might actually increase performance in some cases, because there will be less scanning for keywords (see platform classes) and possibly less quoting/unquoting across Schema*. The problem is, the quoting right now works in some places and in some platforms and is being performed only when schema/schematool/dql needs it, but is being ignored in all other cases. This means that columns like "group" or table names like "platform" will fail randomly depending on platform/rdbms you actually use. It's a nightmare with cross-platform apps and a struggle for single-platform apps, where your tables are named according to domain-rules and happen to overlap with some rdbms. Quoting identifiers being "a bloat" is similar to saying, that implicit quoting values is a bloat. Although from security standpoint the former is much rarer, it's the same for portability and stability of the DBAL across platforms.
          Hide
          lavoiesl Sebastien Lavoie added a comment - - edited

          My 2 cents:

          1. Users should not have to worry about platform-specific quoting when using the query builder or helpers, the DBAL should do that for you.
          2. Users should be able to explicitly quote using a standard quote (`), the quote would then be converted to the platform’s quote upon SQL generation, without any case change.
          3. DBAL should not needlessly quote, it adds bloat and it has been said in DBAL-40 that there is a performance hit.
          4. DBAL should not change the case without the user’s knowledge.
          5. A connection configuration option (normalize_case) could be added:
          • uppercase: always convert unquoted identifiers to uppercase
          • lowercase: always convert unquoted identifiers to lowercase
          • platform: will use the default value for specific platform. For the case of case-sensitive platform, even when unquoted (MySQL on UNIX), do nothing.
          • null (default): no normalization
          6. Future versions of DBAL could change the default value to platform, but this would greatly reduce the risk of causing BC breaks at the beginning, giving time to test everything.
          7. When using Doctrine\DBAL\Connection::query directly, you must do the quoting yourself since the SQL is executed directly.

          Show
          lavoiesl Sebastien Lavoie added a comment - - edited My 2 cents: 1. Users should not have to worry about platform-specific quoting when using the query builder or helpers, the DBAL should do that for you. 2. Users should be able to explicitly quote using a standard quote (`), the quote would then be converted to the platform’s quote upon SQL generation, without any case change. 3. DBAL should not needlessly quote, it adds bloat and it has been said in DBAL-40 that there is a performance hit. 4. DBAL should not change the case without the user’s knowledge. 5. A connection configuration option (normalize_case) could be added: • uppercase: always convert unquoted identifiers to uppercase • lowercase: always convert unquoted identifiers to lowercase • platform: will use the default value for specific platform. For the case of case-sensitive platform, even when unquoted (MySQL on UNIX), do nothing. • null (default): no normalization 6. Future versions of DBAL could change the default value to platform, but this would greatly reduce the risk of causing BC breaks at the beginning, giving time to test everything. 7. When using Doctrine\DBAL\Connection::query directly, you must do the quoting yourself since the SQL is executed directly.
          Hide
          deeky666 Steve Müller added a comment -

          Benjamin Eberlei this is an interesting approach and I like it. But I have some complaints about it.
          1. I doubt users will be happy about forced default casing rules (ALL upper or ALL lower). Therefore we should think about adding a simple configuration option in DBAL allowing to override the default casing behaviour to the user's preference.
          2. Using a consistent default casing means we ALWAYS have to quote identifiers as otherwise the underlying database could silently change the case again (don't know if this is an issue).
          3. Introducing this approach in 2.x branch is a BC break as it breaks users' mixed-case identifier mappings.

          For 2.x we should maybe at least make use of Identifier class throughout the platforms where necessary.

          Show
          deeky666 Steve Müller added a comment - Benjamin Eberlei this is an interesting approach and I like it. But I have some complaints about it. 1. I doubt users will be happy about forced default casing rules (ALL upper or ALL lower). Therefore we should think about adding a simple configuration option in DBAL allowing to override the default casing behaviour to the user's preference. 2. Using a consistent default casing means we ALWAYS have to quote identifiers as otherwise the underlying database could silently change the case again (don't know if this is an issue). 3. Introducing this approach in 2.x branch is a BC break as it breaks users' mixed-case identifier mappings. For 2.x we should maybe at least make use of Identifier class throughout the platforms where necessary.

            People

            • Assignee:
              beberlei Benjamin Eberlei
              Reporter:
              beberlei Benjamin Eberlei
            • Votes:
              2 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated: