[DBAL-40] Transparent table&column names escaping Created: 05/Aug/10 Updated: 28/Mar/15
|Affects Version/s:||2.3, 2.4, 2.4.1|
|Reporter:||Jan Tichý||Assignee:||Benjamin Eberlei|
Hello, I would like to re-open the discussion about automatic transparent escaping of all table/column names sent from DBAL to the database. It was already discussed in http://www.doctrine-project.org/jira/browse/DDC-88 without any satisfactory result.
Why do I have to quote any reserved word used in a table or column name? Why doesn't Doctrine do this automatically for all table and column names?
Before you start to explain how complicated it is and what problems you will be faced with, try to look at the excellent DIBI database layer and how it acts in this respect; its behaviour is very cool. Unfortunately, at the moment the full documentation is in Czech only, but here is a brief automatic Google translation to English - http://dibiphp.com/en/quick-start.
My suggestion for the Doctrine 2 ORM/DBAL solution is:
1. The developer should never care about any escaping or about avoiding reserved words - it is not his business, the DBAL should solve it transparently and safely.
2. So there should be no need, and not even a possibility, to add any quotation chars in @column or @table annotations or in DQL queries. The ORM layer has nothing to do with escaping; it is entirely the business of the DBAL layer. The current possibility of manually escaping names in the mentioned annotations is totally wrong and should be discontinued.
3. DBAL should escape ALL table and column names transparently and automatically. There should be no option to enable or disable the escaping, there is no reason for disabling it.
4. The escaping should be performed just in the final translation of DBAL queries to native SQL query, not earlier. This is the right place to do that.
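The four points above can be sketched as follows. This is only an illustration of "quote every identifier at the final SQL generation step", written in Python for brevity; `QUOTE_CHARS`, `quote_identifier` and `build_select` are hypothetical names and not part of Doctrine DBAL. The quote characters per platform are the commonly documented ones (backticks for MySQL, double quotes for PostgreSQL and Oracle, square brackets for SQL Server).

```python
# Hypothetical sketch: none of these names are Doctrine APIs.
# Opening and closing quote characters per platform.
QUOTE_CHARS = {
    "mysql": ("`", "`"),
    "postgresql": ('"', '"'),
    "oracle": ('"', '"'),
    "sqlserver": ("[", "]"),
}


def quote_identifier(name: str, platform: str) -> str:
    """Quote one identifier, doubling any embedded closing quote character."""
    opening, closing = QUOTE_CHARS[platform]
    return opening + name.replace(closing, closing * 2) + closing


def build_select(table: str, columns: list[str], platform: str) -> str:
    """Build the native SQL; identifiers are quoted only here, at the last step."""
    cols = ", ".join(quote_identifier(c, platform) for c in columns)
    return f"SELECT {cols} FROM {quote_identifier(table, platform)}"


print(build_select("order", ["id", "group"], "mysql"))
print(build_select("order", ["id", "group"], "postgresql"))
```

The caller passes plain names such as `order` and never sees a quote character, which is the behaviour the proposal asks for. The replies below explain why the maintainers consider this a poor default.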
So what do you think about that?
|Comment by Roman S. Borschel [ 05/Aug/10 ]|
My point of view (and the reason for the current implementation) is as follows:
So, supporting selective quoting in the name of a (slightly) better interoperability with legacy schemas looked (and still looks) like the best solution for us. The support is limited, explicit, does not require much implementation or overhead and does not unnecessarily bloat the SQL.
There is only one solution for reserved words: not using them. Quoting is a workaround, not a solution and especially not a good one.
ps: I really wish quoting reserved words would not be available in SQL. It's not available in most programming languages and no one cares; people just don't use reserved words, because they simply can't.
|Comment by Jan Tichý [ 05/Aug/10 ]|
Hi Roman, thank you very much for your response! I strongly disagree with most of your points.
There is no doubt that using reserved words is bad practice - FROM THE VIEW OF THE DATABASE SYSTEM.
But we are discussing ORM and DBAL. One of the biggest goals of ORM/DBAL is to provide transparent usage of the storage behind the scenes. No matter if it is MySQL or PostgreSQL or maybe even something completely different.
The ORM/DBAL layer should shield me from the specifics of the particular storage as much as possible. I don't want to remember (and I never should have to) that I cannot create an entity Order because "order" is a reserved word in some weird technology far away from me as an ORM programmer.
This is strictly consistent with what you have written above in your PS - "It's not available in most programming languages and no one cares, people just don't use reserved words, because they simply can't" - just consider Doctrine 2 to be another programming language - and there is no real systematic reason in Doctrine 2 itself to prevent developers from creating entities named "Order".
Here is an analogy - it is the same as saying that you cannot use associative arrays in PHP because the C language or assembler behind PHP doesn't support associative arrays. Yes, they don't support them, but it is the responsibility of PHP to provide them. In the same way I don't want to respect these weird limitations of the particular RDBMS behind Doctrine 2. It is Doctrine's responsibility to transparently cover the limitation.
Moreover, the list of reserved keywords differs from one RDBMS to another, so the naming of entities depends strongly on the current database server.
Moreover, I only realize that I have used a reserved keyword when the database engine returns an error, not earlier.
I suppose there is probably no risk of SQL injection here, but I feel the current Doctrine 2 behaviour is "vulnerable" in a very similar way, on principle. Simply - you are sending an unescaped piece of SQL query to the database without any guarantee of what it is. And sometimes it fails, sometimes not. From this view I don't consider overall escaping to be overkill at all, I consider it to be a necessity.
I am strongly convinced that a developer working on top of the DBAL or even the ORM layer should never have to think about such naming limitations and shouldn't even need to know anything about reserved words in his particular DBMS.
Now to the mentioned problems with case sensitivity. Since Doctrine 2 entity names are case insensitive, I believe that all table definitions and SQL queries coming from Doctrine 2 to the database should act as case insensitive too, and that the only practicable way is to normalize (lowercase) all table and column names on the DBAL side before they are passed to the database as an SQL query.
|Comment by Benjamin Eberlei [ 05/Aug/10 ]|
There is actually a very good reason for not quoting. Oracle columns behave differently in their internal structure when escaped.
With quoting, a column "foo" would be lower-cased IN the database and even returned that way from result sets. Without quoting it would be a column "FOO". We would essentially need to implement lots of glue code just to get this annoying Oracle feature to work, and I think Postgres has the same issue with lower-cased columns.
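The folding behaviour described here can be modelled in a few lines. This is a simplified sketch in Python, not DBAL code: it assumes the documented rules that Oracle folds unquoted identifiers to upper case, PostgreSQL folds them to lower case, and both keep quoted identifiers exactly as written. `stored_name` and `same_object` are hypothetical helper names.

```python
def stored_name(written: str, platform: str) -> str:
    """Return the name the catalog records for an identifier as written in SQL."""
    if len(written) >= 2 and written[0] == '"' and written[-1] == '"':
        # Quoted: case is preserved, doubled quotes collapse to one.
        return written[1:-1].replace('""', '"')
    if platform == "oracle":
        return written.upper()  # unquoted names fold to upper case
    if platform == "postgresql":
        return written.lower()  # unquoted names fold to lower case
    raise ValueError(f"unknown platform: {platform}")


def same_object(created_as: str, referenced_as: str, platform: str) -> bool:
    """True if a name referenced one way resolves to an object created another way."""
    return stored_name(created_as, platform) == stored_name(referenced_as, platform)


print(stored_name('foo', 'oracle'))               # FOO
print(stored_name('"foo"', 'oracle'))             # foo
print(same_object('"foo"', 'foo', 'oracle'))      # False
print(same_object('"foo"', 'foo', 'postgresql'))  # True
```

Under these rules a column created as `"foo"` on Oracle can no longer be reached by a plain `foo` from any other tool, which is the glue-code problem mentioned above.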
|Comment by Roman S. Borschel [ 05/Aug/10 ]|
@"Hi Roman, thank you very much for your response! I strongly disagree with most of your points"
I guess we can agree to disagree then
@"But we are discussing ORM and DBAL. One of the biggest goals of ORM/DBAL is to provide transparent usage of the storage behind the scenes. No matter if it is MySQL or PostgreSQL or maybe even something completely different."
Actually, no, "hiding" the storage completely from the developer is not the goal just as it is not the goal to "hide" SQL. There is an object model on one side and a relational database on the other side. The goal is to provide a mapping between them which is not the same as "hiding" one from the other. In order to create good applications that use ORM technology you need to know both very well, OOP and relational databases. The goal is not to make relational database knowledge "unnecessary". This only results in inefficient use of the databases. The goal is to give people who know both sides equally well a tool to map between the two. Not even "portability" between different relational database vendors is a main goal of an ORM technology, it is just obvious to provide assistance with that as part of the mapping.
@"and there is no real systematic reason in Doctrine 2 itself to prevent developers from creating entities named "Order"."
No one prevents you from naming domain classes anything you want. Class naming is different from table naming. That the table name defaults to the class name is just that, a default, that can and should be changed if necessary.
@"Moreover, the list of reserved keywords differs from one RDBMS to another, so the naming of entities depends strongly on the current database server."
Correct, and if you want to create a portable application that works on, and will be deployed on, a set of different vendors, you need to have some knowledge of these databases and consider their characteristics. An ORM/DBAL technology does not give you any guarantee of complete and transparent portability between vendors, and especially not that it will perform equally well on all of them. The ORM/DBAL technology helps you for the most part in a lot of cases with portability issues, but it is no free ticket.
@"I suppose there is probably no risk of SQL injection here, but I feel the current Doctrine 2 behaviour is "vulnerable" in a very similar way, on principle. Simply - you are sending an unescaped piece of SQL query to the database without any guarantee of what it is. And sometimes it fails, sometimes not. From this view I don't consider overall escaping to be overkill at all, I consider it to be a necessity."
Do not confuse identifier quoting with quoting/escaping of special characters as it is used for security reasons on input. Identifier quoting is absolutely not a necessity, it is a workaround for using otherwise reserved words as schema element names. Speaking of goals, it is not a "goal" of ORM/DBAL technology to completely remove the possibility of SQL injection either. You can't. It'll always be possible with wrong usage.
@"I am strongly convinced that a developer working on top of the DBAL or even the ORM layer should never have to think about such naming limitations and shouldn't even need to know anything about reserved words in his particular DBMS."
And I am strongly convinced that a developer working with a DBAL/ORM should know the underlying databases pretty well.
I think you're really not aware of all the consequences that quoting every identifier has across different database vendors. If not for developers using Doctrine, you at least cause pain for any developer or application that does not access the database through Doctrine and thus feels the full pain of the case-sensitivity and mandatory quoting you enforced on the whole schema. Ubiquitous access to the data is actually a strong point of a relational database, and it is far from uncommon that the same database is accessed by many parties.
I think the approach taken by DIBI is a bad idea and even worse if there is no way to turn this behavior off. Do they have Oracle or DB2 users? I'm wondering what the sysadmins behind these databases might think if they see this quoting nightmare since to my knowledge this is considered bad practice among them as well.
Yes, we're disagreeing on many points but if you really think identifier quoting is a good idea then you're ignoring a whole lot of prior experience (not only mine).
|Comment by Lukas Kahwe [ 05/Aug/10 ]|
I was one of the lead developers of MDB2 and we ran into tons of issues when we overly aggressively did identifier quoting by default. Even the option caused lots of headaches. Furthermore, I agree that the ORM is not about turning an RDBMS into an object database, but instead about making a mapping possible. In this vein, using reserved words or making all identifiers case sensitive will be a big pain for the people who work one level lower, aka the DBAs. Heck, even as a developer I frequently work on the DB's command line.
Now as for helping people prevent issues with reserved words: back then I added some reserved word checking to MDB2_Schema. Obviously it's hard to really keep track of all the different reserved words for all RDBMS. Maybe it's possible to work with this guy on this: http://www.petefreitag.com/item/290.cfm This way it could be possible to validate whether the names chosen in the models will cause issues with a certain list of RDBMS.
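A minimal sketch of such a reserved word check, in Python for brevity. It is not the MDB2_Schema implementation. `RESERVED_SAMPLE` is a tiny, hand-picked and deliberately incomplete sample used only for illustration; a real check would need the full keyword lists from each vendor's documentation. `find_conflicts` is a hypothetical name.

```python
# Illustrative, incomplete sample only; NOT authoritative keyword lists.
RESERVED_SAMPLE = {
    "mysql": {"ORDER", "GROUP", "SELECT", "TABLE", "KEY"},
    "postgresql": {"ORDER", "GROUP", "SELECT", "TABLE", "USER"},
    "oracle": {"ORDER", "GROUP", "SELECT", "TABLE", "USER", "COMMENT", "UID"},
}


def find_conflicts(names, platforms):
    """Map each name to the platforms whose sample list contains it.

    Names that collide with nothing in the sample are left out of the result.
    """
    conflicts = {}
    for name in names:
        hits = sorted(p for p in platforms if name.upper() in RESERVED_SAMPLE[p])
        if hits:
            conflicts[name] = hits
    return conflicts


model_names = ["order", "comment", "customer"]
print(find_conflicts(model_names, ["mysql", "postgresql", "oracle"]))
```

Run against the model's table and column names at schema validation time, a check like this reports the problem before the database engine does, and it shows which target platforms are affected.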
|Comment by Benjamin Eberlei [ 07/Aug/10 ]|
Reserved word checking sounds like a fair compromise!
|Comment by Jan Tichý [ 30/Aug/10 ]|
Hello, thank you all for your responses.
This helped me understand much about the basic objectives of Doctrine 2 - especially that it is designed mainly to "make a mapping possible" only, not to be as transparent a layer as possible between database and application. And even if I don't like this conception (because I personally think an ORM should provide all such features - like automatic escaping of reserved keywords - to make the particular database as transparent as possible), at the same time I fully understand all the mentioned arguments for doing things this way. Thank you again.
|Comment by Damian Boune [ 17/Jan/11 ]|
I would like to state an agreement with the OP.
I understand where there are difficulties in handling reserved words and backtick/quoting, and certainly one should always avoid the use of reserved words in their own schema designs. This is a given when one is able to exert control.
At present I am working on a project in which I am dealing with an outside database where I have no control over the schema, nor am I able to push the remote into making the most sensible changes to their schema. I must live with what they provide.
DBAL presents me with a set of invaluable tools that can not be used as-is, because it lacks the ability to handle quoting when generating schema sql. I'm sure there are some other places where I will find this lacking as well. This is disappointing.
Regardless of what we as developers should do when designing our own schema, we still need to be able to work and play with others who may not follow the same common sense conventions.
|Comment by Francesco Montefoschi [ 03/Feb/11 ]|
"its hard to really keep track of all of the different reserved words for all RDBMS"
That's the main point for me.
|Comment by Adrian Rudnik [ 26/Apr/11 ]|
@Damian thanks for the hint. I just ran into a similar situation.
Not every project is a startup. I tried to use Doctrine 2 on a customer's database for a small web UI. Well, I told them to rename their `iso3166-1` table and `alpha-2` field, then we had a good laugh. We made the mapping possible, but I'll remember the one thing I learned: Doctrine did not help, guide, prevent or care at all. It did not even hesitate to spew invalid SQL snippets when asked to dump. It's okay for me, but I expected something more resilient from a DBAL.
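To see why names like `iso3166-1` and `alpha-2` produce invalid SQL when emitted unquoted, here is a small self-contained demonstration. It uses Python's built-in `sqlite3` module, with SQLite standing in for the customer's actual RDBMS (which the comment does not name); the failure mode for a hyphenated, unquoted identifier is a syntax error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Unquoted: the hyphen is parsed as a minus sign, so the statement is invalid.
try:
    conn.execute("CREATE TABLE iso3166-1 (alpha-2 TEXT)")
    unquoted_ok = True
except sqlite3.OperationalError:
    unquoted_ok = False

# Quoted: the same names are accepted as ordinary identifiers.
conn.execute('CREATE TABLE "iso3166-1" ("alpha-2" TEXT)')
conn.execute('INSERT INTO "iso3166-1" ("alpha-2") VALUES (?)', ("CZ",))
rows = conn.execute('SELECT "alpha-2" FROM "iso3166-1"').fetchall()

print(unquoted_ok)
print(rows)
```

A schema dump that drops the quotes around such names yields statements of the first kind.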
|Comment by Robert (Jamie) Munro [ 02/Feb/13 ]|
What do you mean by "Quoting everything is like hitting all the SQL with a huge big hammer"? Is there a performance hit?
I have always quoted all names when working with Postgres. Not quoting them has always felt like not quoting strings in PHP (e.g. $foo[bar] instead of $foo['bar']): unless the string is a keyword or defined as a constant somewhere, you don't need to (although you will get a "Use of undefined constant" warning). In the early days of PHP, not quoting array keys was common practice in examples.
|Comment by Marco Pivetta [ 02/Feb/13 ]|
If you want quoting by default on everything we have a quoting strategy (in ORM) that you can use. I don't think quoting everything by default is a viable solution. Back in `Zend_Db` times this was eating up a lot of performance for no real reason. Users having a clean schema without horrors like columns called `order` or `group` should not be penalized because of users not using valid naming schemes.
|Comment by Steve Müller [ 24/Jun/13 ]|
Hello, if I understand correctly, the issue of quoting reserved keywords automatically is solved in https://github.com/doctrine/dbal/pull/302. For everything other than reserved keywords, you can still decide manually whether an identifier is quoted by passing it with or without quotes.
|Comment by Arthur Bodera [ 26/Sep/13 ]|
It's still broken in 2.4.
PR 302 only selectively fixes indexes, PK and FK, but ALTER and all CRUD will still fail (and schema tool will produce invalid sql).
There is no performance hit, as all operations already hit `DefaultQuoteStrategy`.
Currently you have the following workarounds:
Here is a class you can use: https://gist.github.com/Thinkscape/6713196
|Comment by Arthur Bodera [ 26/Sep/13 ]|
QuoteStrategies are not used for ALTER queries. This means that using the EagerQuoteStrategy mentioned above won't fix invalid ALTER queries generated by schema tool.
For ALTER to work, we need this merged:
|Comment by Doctrine Bot [ 26/Sep/13 ]|
A related Github Pull-Request [GH-379] was closed:
|Comment by Sebastien Lavoie [ 28/Mar/15 ]|
My 2 cents from DBAL-96:
1. Users should not have to worry about platform-specific quoting when using the query builder or helpers, the DBAL should do that for you.
|Comment by Arthur Bodera [ 28/Mar/15 ]|
Sebastien, ad 3. that is incorrect. Read the ticket more closely, look at the PR, look inside schema tool and platform classes. There is already a lot of quoting+unquoting being performed in 2.* and a lot of assumptions. Having quoting enabled across the board might actually increase performance in some cases, because there will be less scanning for keywords (see platform classes) and possibly less quoting/unquoting across Schema*.
The problem is, the quoting right now works in some places and in some platforms and is being performed only when schema/schematool/dql needs it, but is being ignored in all other cases. This means that columns like "group" or table names like "platform" will fail randomly depending on platform/rdbms you actually use. It's a nightmare with cross-platform apps and a struggle for single-platform apps, where your tables are named according to domain-rules and happen to overlap with some rdbms.
Calling identifier quoting "bloat" is like calling implicit quoting of values bloat. Although from a security standpoint the former matters much more rarely, it matters just as much for the portability and stability of the DBAL across platforms.