[DDC-1096] "You may have observed that in a mixed result, the object always ends up on index 0 of a result row." Created: 03/Apr/11  Updated: 14/Nov/11  Resolved: 31/Oct/11

Status: Resolved
Project: Doctrine 2 - ORM
Component/s: ORM
Affects Version/s: 2.0.2
Fix Version/s: 2.0.4
Security Level: All

Type: Improvement Priority: Critical
Reporter: Daniel Alvarez Arribas Assignee: Guilherme Blanco
Resolution: Duplicate Votes: 0
Labels: None

Issue Links:
Reference
relates to DDC-1424 Make order in DQL SELECT clause matte... Resolved

 Description   

The DQL reference at

http://www.doctrine-project.org/docs/orm/2.0/en/reference/dql-doctrine-query-language.html

defines that

"You may have observed that in a mixed result, the object always ends up on index 0 of a result row."

I find this too important to state it in such a parenthetic way.

The reordering is counterintuitive behaviour at its worst, and grossly in violation of the principle of least astonishment. I bet noone would even guess that this could be intended behaviour. I would not consider this a classic bug, as it is documented behaviour, but as a way of deliberately doing it wrong, making mixed results totally counterintuitive to use. From a usability standpoint, this is as bad as it gets.

I cannot think of any convincing reason why the ORM should decide on its own, and without any notice, to change the explicitly specified ordering from the select list, because it is absolutely confusing and IMO there is no advantage at all in doing so.

I explicitly told Doctrine 2 to put the entity element on position 3, so why should it end up at position 0 without further notice? If position 0 is a must (which I do not think it should be), then the user should not be able to specify entity elements at any other position than that. As long as other positions are permissible, I would rather have the element stay at the exact position declared, and not be shuffled around.



 Comments   
Comment by Daniel Alvarez Arribas [ 03/Apr/11 ]

And: Is the entry in the manual bad humor? I mean the way the DQL reference puts it ("You may have obverved...") - it even assumes it already took the user some experimentation. Really funny. I wish I had the time back I spent debugging.

Comment by Benjamin Eberlei [ 03/Apr/11 ]

I don't understand your confusion with this. How would you expect it to work? Scalar values have a name, so they are named. Objects dont have a "name" or alias and therefore key-based. Its just an convention.

Yes the sentence may be formulated strangely, but its still stating how it works.

Comment by Daniel Alvarez Arribas [ 03/Apr/11 ]

I would expect this to work using regular numeric indexing. If Doctrine can put something at position 0, it should as well be able to put it at position 3. I do not see where this requires string keys.

Comment by Daniel Alvarez Arribas [ 03/Apr/11 ]

The confusion is due to the total deviation from regular query semantics. The behaviour to be expected is that when selecting Elements (A, B, C), A is listed first in the result rows, B comes second, and then C. Now, in case of mixed results, it can end up (A, C, B), (C, A, B), or (B, A, C), depending on the types of A, B, and C. Now, tell me this isn' confusing.

Comment by Daniel Alvarez Arribas [ 03/Apr/11 ]

To support my point, Scalar expressions or Literals do not have a "name" either.

Comment by Benjamin Eberlei [ 03/Apr/11 ]

Its a requirement to put it on 0 the way how the hydration algorithm works. You would have to add some additional variables to the ResultSetMapping to save the name of an alias or the order with regards to scalar values, which adds additional variables that need to be cached and serialized/unserialized and going by a convention is just easier here.

Scalar expressions and literals can use AS name, and start by a scalar counter of 1 per the requirement that objects start at 0. Multiple objects on the main level come in multiple result rows, so there is no clash possible here.

This is quite different of how sql works with regards to conventions, but fetching an alias which is actually a set of many columns required this shift in conventions in our opinion.

Comment by Daniel Alvarez Arribas [ 03/Apr/11 ]

Multiple rows... geez, I did not think it would get any worse

In a word: counterintuitive. I am getting what you are explaining, nevertheless I perceive it as a way-too-complicated approach, without recompensating the user with advantages.

I would suggest not relying on names at all for positioning. Names should be names and positions should be positions. In fact, names are not even unique.

Consider this query:

SELECT A.somefield, A.somefield, A.somefield FROM \A

I would expect it to return any number of rows with three elements each. It does not. It returns any number of rows with only one element each. That's because the implicit "names" clash internally, and are aliased to the same value, which apparently is not only used to identify the content, but also the whole output field. So in fact the query shown above is equivalent to

SELECT A.somefield FROM \A

which is clearly not what the user asked for. Maybe it is a matter of "hydration", but in this case it would be more straightforward to give the fields unique indexes automatically like "somefield2", "somefield3".

I run into peculiarities like this one several times a week with Doctrine 2. I am not claiming that there is anything buggy about Doctrine 2 in this case. But I definitely have to say Doctrine 2 could behave much more intuitively. Even if it is intended behaviour, it is a nightmare to understand, and it is not something anyone would come up with intuitively by himself.

In the original query example, I did not ask Doctrine to change the order specified, and would never ever ask for the result being split up into multiple rows. From a user perspective, the behavior is in no way desirable.

From what I would expect, I would simply propose to always keep the ordering of fields in the result just as the user specified it, and always return one row for one match. I do not see any advantages in doing it any different, if not for purely technical reasons related to the existing implementation. What can I say - embrace change

Comment by Benjamin Eberlei [ 03/Apr/11 ]

1. The example with a.somefield, a.somefield, a.somefield breaks down with standard SQL aswell leading to exactly the same result, just one "somefield". This is why i would expect everyone coming from an SQL background working with scalar fields to alias everything, which ends with you getting all named fields and a zero key for the object.

2. You can write your own hydrator if you wish to handle the multiple rows per object in the from statement. In that hydrator you could even handle positioning differently (I don't know if thats possible with the current information though).

You can register it with the Configuration and pass its name to "getResult()" on any AbstractQuery. The default hydration has to stay this way obviously for BC reasons. Only negative thing about writing your own hydrator is that you have to copy paste lots of code because all the logic is inlined into one big method for performance reasons.

I will update the docs about mixed results as they clearly lack detail and statements with regard to the conventions.

Comment by Benjamin Eberlei [ 03/Apr/11 ]

Clarified in the docs.

Comment by Daniel Alvarez Arribas [ 04/Apr/11 ]

If I select the same field multiple times in SQL, I get multiple elements per row in the result set. That's an of-course.

Here's the result from PostgreSQL 8.4.7

NameOfMyDatabase=# select dbID, dbID From WPOMOrder limit 1;
 dbid | dbid
------+------
  363 |  363
(1 row)

And here's the result from MySQL Server 5.0

mysql> select dbID, dbID From WPOMOrder limit 1;
+------+------+
| dbID | dbID |
+------+------+
|  363 |  363 |
+------+------+
1 row in set (0.00 sec)

There is no intuitiveness whatsoever about the merge of multiple columns into a single one, neither in SQL, nor by any reasonable common-sense standard. If I specify it twice, I get it twice. And that's the only thing to expect, having explicitly defined it in the query. Now, imagine a user has to "register with the configuration" or "write his own "hydrator""...

Even using Aliasing, the result will still have two columns:

Again, PostgreSQL 8.4.7:

NameOfMyDatabase=# select dbID as A, dbID as A From WPOMOrder limit 1;
  a  |  a
-----+-----
 363 | 363
(1 row)

and here's the result from MySQL Server 5.0:

mysql> select dbID as a, dbID as a From WPOMOrder limit 1;
+-----+-----+
| a   | a   |
+-----+-----+
| 363 | 363 |
+-----+-----+
1 row in set (0.00 sec)
Comment by Benjamin Eberlei [ 04/Apr/11 ]

Well yes, but using mysql_fetch_assoc/pg_fetch_assoc you will getone field again of course. ORM has no concept of numeric fetching at least not with the current hydrators.

Comment by Daniel Alvarez Arribas [ 05/Apr/11 ]

Well, that's what this issue is all about: That a concept of numeric fetching would be highly desirable. That there currently is no such concept is obvious.

In case of the db drivers, there is almost always an option to simply return the row as-is, using numeric indexing, e. g. mysql_fetch_row, or pg_fetch_row. This avoids any unwanted merging of columns by name. If the user chooses the *_fetch_assoc functions instead, name clashes are to be expected. But you do not have to, by design, contrary to Doctrine 2, where this IMO highly broken behaviour is currently inherent.

Comment by Daniel Alvarez Arribas [ 05/Apr/11 ]

Not only that, what bugs me primarily is that it reorders the resulting columns at its own liking.

Comment by Benjamin Eberlei [ 05/Apr/11 ]

It seems that everything that does not work as you want is broken. However there is a perfectly legitmate way to work with the hydrator and youre bloating a problem that is just not there, since DQL is primarily about fetching Objects the numerical fetch mechanism is not nearly as important as in the RMDBS world.

Quoting a sentence from the docs that has nothing to do with numerical fetching as ticket title, saying nothing about that you want this as a feature in the ticket description imho doesnt constitute "this is what is this issue about", i.e. a feature request about adding a numerical fetch mode. You can add a feature request ticket for this and I'll look into it.

Additionally there is a consistent way how Doctrine names "unnamed" columns, its was just not documented good enough yet.

Comment by Daniel Alvarez Arribas [ 06/Apr/11 ]

I changed the issue type to "improvement" to not cause offence. The recent discussion was clearly heading in the wrong direction.

Seriously, if the notion of "index 0" has nothing to do with numerical indexing, I don't know what does. The statements "As long as other positions are permissible, I would rather have the element stay at the exact position declared, and not be shuffled around." and "From what I would expect, I would simply propose to always keep the ordering of fields in the result just as the user specified it" are, to me, as clear as it gets about desiring the described characteristic.

I certainly do not mean any offense. All I want is provide you with feedback. Of course, this is not always possible without voicing a different opinion. Issues are, by nature, mostly things that are wanted differently than they are. As I said right up-front, I did and do not mean to report this as a classical bug, because I knew from the documentation that it was intended, but rather as a severely couterintuitive behaviour, which I still insist it is. I did not enter it as an explicit feature request here, because maintaining the ordering specified is already implemented (in case of non-"mixed" results), and the solution of this issue would rather be to simply always maintain the ordering given, and getting rid of a special case of deliberately overriding the given order for "mixed" results. So I could not see the new feature here. Now, considering it, I think "improvement" nails it.

Getting back to what I originally wished would work differently, I wish that selecting the columns (A, B, C) would return result rows of the form (A, B, C) always, and under no circumstances of the form (A, C, B), (C A B) or (B A C). Column names would be just that, and a column having no name would still be rendered at the column position specified by the ordering in the select expression.

Comment by Benjamin Eberlei [ 06/Apr/11 ]

Do you mean the order regardless of keys?

Say SELECT A B C=> $a = array_shift($result); $b = array_shift($result); $c = array_shift($result) ?

Comment by Daniel Alvarez Arribas [ 07/Apr/11 ]

Yes, that's exactly what I mean (ignoring the shifting side effect on the result of course).

It would be:

$A = reset($result);
$B = next($result);
$C = next($result);

And never, e. g.:

$C = reset($result);
$A = next($result);
$B = next($result);

If you'd rely on numeric indexes, it would be:

Basically: SELECT A, B, C -> $A = $result[0], $B = $result[1], $C = $result[2].

Or: list ($A, $B, $C) = $result;

And never: SELECT A, B, C -> $A = $result[2], $B = $result[0], $C = $result[1].

Or: list ($C, $A, $B) = $result;

Numeric indexing is, though, of course not guaranteed to correctly identify positioning under all circumstances due to the associative-table nature of PHP arrays. E. g. if you populate an array with non-continuous numeric keys, say array (3 = > 'someValue', 5 = 'someOtherValue'), you end up with an array with numeric key 3 at position 0, where the key is clearly not suitable to identify the element's position. But provided you assign numeric keys to the elements in such a way that the numeric keys coincide with the element positions in the array (i. e. the element at position 0 has numeric key 0, and the element at position 1 has numeric key 1), then, of course, you could use numeric keys internally to identify element positions. Then again, you do not need to, the approach of using the de-facto element positions should be more fail-safe.

Comment by Daniel Alvarez Arribas [ 07/Apr/11 ]

Which reminds me... I will have to double-check that it is not the list() calls I do on the result that are screwing things up.

Comment by Daniel Alvarez Arribas [ 08/May/11 ]

Sorry for the late update.

I double checked that the effect is not caused by using the list construct on the result in application code.

For DQL queries of the following form:

SELECT someEntityReference.someStringValuedField, someEntityReference
    FROM \SomeEntity someEntityReference

the fields in the result rows will have got shuffled around as soon as the result is returned by Doctrine 2, i. e. before any list() function or the like gets called on it in application code.

Calling gettype() on the result elements, like:

$result = $aQueryOfTheFormShownAbove->getResult();

$firstResultElement = reset($result);

foreach ($firstResultElement as $field) {
  
   echo gettype($field) . "\n";
}

will output:

object
string

whereas the DQL query explicitly asked for a result of the form (string, object), not (object, string).

To my best knowledge, the foreach construct will rely only on the actual positioning regardless of the key values in the input array.

So the situation is just as I have previously described.

Comment by Benjamin Eberlei [ 31/Oct/11 ]

Result keys Will be implemented as part of DDC-1424

Comment by Guilherme Blanco [ 14/Nov/11 ]

Improvement implemented in https://github.com/doctrine/doctrine2/commit/81cc6d9da83d217ea62bd467053fd9885d1083cd

Generated at Thu Sep 18 03:51:22 UTC 2014 using JIRA 6.2.3#6260-sha1:63ef1d6dac3f4f4d7db4c1effd405ba38ccdc558.