SQL: What is the default Order By of queries? - mysql

What is the default order of a query when no ORDER BY is used?

There is no such order present. Taken from http://forums.mysql.com/read.php?21,239471,239688#msg-239688
Do not depend on order when ORDER BY is missing.
Always specify ORDER BY if you want a particular order -- in some situations the engine can eliminate the ORDER BY because of how it
does some other step.
GROUP BY forces ORDER BY. (This is a violation of the standard. It can be avoided by using ORDER BY NULL.)
SELECT * FROM tbl -- this will do a "table scan". If the table has
never had any DELETEs/REPLACEs/UPDATEs, the records will happen to be
in the insertion order, hence what you observed.
If you had done the same statement with an InnoDB table, they would
have been delivered in PRIMARY KEY order, not INSERT order. Again,
this is an artifact of the underlying implementation, not something to
depend on.

There's none. Depending on what you query and how your query was optimised, you can get any order. There's even no guarantee that two queries which look the same will return results in the same order: if you don't specify it, you cannot rely on it.

I've found SQL Server to be almost random in its default order (depending on age and complexity of the data), which is good as it forces you to specify all ordering.
(I vaguely remember Oracle being similar to SQL Server in this respect.)
MySQL by default seems to order by the record structure on disk, (which can include out-of-sequence entries due to deletions and optimisations) but it often initially fools developers into not bother using order-by clauses because the data appears to default to primary-key ordering, which is not the case!
I was surprised to discovere today, that MySQL 5.6 and 4.1 implicitly sub-order records which have been sorted on a column with a limited resolution in the opposite direction. Some of my results have identical sort-values and the overall order is unpredictable. e.g. in my case it was a sorted DESC by a datetime column and some of the entries were in the same second so they couldn't be explicitly ordered. On MySQL 5.6 they select in one order (the order of insertion), but in 4.1 they select backwards! This led to a very annoying deployment bug.
I have't found documentation on this change, but found notes on on implicit group order in MySQL:
By default, MySQL sorts all GROUP BY col1, col2, ... queries as if you specified ORDER BY col1, col2, ... in the query as well.
However:
Relying on implicit GROUP BY sorting in MySQL 5.5 is deprecated. To achieve a specific sort order of grouped results, it is preferable to use an explicit ORDER BY clause.
So in agreement with the other answers - never rely on default or implicit ordering in any database.

The default ordering will depend on indexes used in the query and in what order they are used. It can change as the data/statistics change and the optimizer chooses different plans.
If you want the data in a specific order, use ORDER BY

Related

What does MySQL do if no aggregating function specified?

Recently I found that despite the fact that patientID is duplicating in my Samples table, the following query works
SELECT * FROM Samples GROUP BY patientID
and returns multiple values for multiple columns.
What aggregation function it uses by default?
First, this is badly formed SQL and you should simply not use it.
But what does it do? It returns a result set with one row per PatientId. The additional columns specified by the SELECT * come from indeterminate rows in the data. There is no guarantee that the extra columns even come from the same row.
In practice, the values seem to come from the first row encountered. However, MySQL is quite explicit that you cannot depend on this behavior. In general, you should avoid using aggregation statements that have unaggregated columns in the SELECT that are not in the GROUP BY. Other databases do not support this syntax (unless the GROUP BY keys form a unique/primary key on the data being aggregated).
MySQL doesn't appear to use an aggregation function at all. The records chosen in this case are indeterminate, as the documentation states:
In this case, the server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate, which is probably not what you want.
But you might be wondering why this feature even exists in the first place. If you are writing a query where you know that all the values in a column be the same, then this feature can possibly save you some work by not having to write a join or subquery to make the GROUP BY strictly compliant.
You have tuo use an aggregate function such as SUM, AVG, COUNT
dev.mysql GROUP BY
None. If the ONLY_FULL_GROUP_BY sql mode is not enabled, then MySQL allows
MySQL extension to the standard SQL use of GROUP BY permits the select list, HAVING condition, or ORDER BY list to refer to nonaggregated columns even if the columns are not functionally dependent on GROUP BY columns. This causes MySQL to accept the preceding query. In this case, the server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate, which is probably not what you want.
This sql mode is enabled by default rom v5.7.5 only.
Since you've not specified the version of the MySQL server, there are two possible answers.
Prior MySQL 5.7.5, the above query is valid, but with the following comment for all the columns not listed in GROUP BY nor aggregated:
The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.
(https://dev.mysql.com/doc/refman/5.6/en/group-by-handling.html)
Since MySQL 5.7.5, this behaviour was changed and MySQL implements the SQL99 standard:
SQL99 and later permits such nonaggregates per optional feature T301 if they are functionally dependent on GROUP BY columns
(https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html)
So some columns could be valid, however the query itself is not valid, since not all columns are functionally dependent on the patientID column (there could be both blood and skin sample).
In general, it is a bad behaviour to use SELECT *, and to not define what to do with all the resulting columns in an aggregating query.
TL;DR;
MySQL prior 5.7.5 will execute the query and the result is unpredictable, MySQL after 5.7.5 will throw an error.

Indeterminate sort ordering - MySQL vs Postgres

I have the following query
SELECT *
FROM table_1
INNER JOIN table_2 ON table_1.orders = table_2.orders
ORDER BY table_2.purchasetime;
The above query result is indeterminate i.e it can change with different queries when the purchase time is of same value as per the MySQL manual itself.To overcome this we give sort ordering on a unique column and combine it with the regular sort ordering.
The customer does not want to see different results with different page refreshes so we have put in the above fix specifically for MySQL which is unnecessary and needs extra compound indexes for both asc and desc.
I am not sure whether the same is applicable for postgres.So far I have not been able to reproduce the scenario.I would appreciate if someone could answer this for postgres or point me in the right direction.
Edit 1 : The sort column is indexed.So assuming the disk data has no ordering, but in the case of index (btree data structure) a constant ordering might be possible with postgres ?
No, it will not be different in PostgreSQL (or, in fact, in any other relational database that I know of).
See http://www.postgresql.org/docs/9.4/static/queries-order.html :
After a query has produced an output table (after the select list has been processed) it can optionally be sorted. If sorting is not chosen, the rows will be returned in an unspecified order. The actual order in that case will depend on the scan and join plan types and the order on disk, but it must not be relied on. A particular output ordering can only be guaranteed if the sort step is explicitly chosen.
Even if by accident you manage to find a PostgreSQL version and index that will guarantee the order in all the test you run, please don't rely on it. Any database upgrade, data change or a change in the Maya calendar or the phase of the moon can suddenly upset your sorting order. And debugging it then is a true and terrible pain in the neck.
Your concern seems to be that order by table_2.purchasetime is indeterminate when there are multiple rows with the same value.
To fix this -- in any database or really any computer language -- you need a stable sort. You can turn any sort into a stable sort by adding a unique key. So, adding a unique column (typically an id of some sort) fixes this in both MySQL and Postgres (and any other database).
I should note that instability in sorts can be a very subtle problem, one that only shows up under certain circumstances. So, you could run the same query many times and it is fine. Then you insert or delete a record (perhaps even one not chosen by the query) and the order changes.

What drive the natural result order for an unordered MySQL request

How does mysql return lines when there is no ORDER BY in the request?
What drives the natural order?
There can obviously be many different queries but let's say a simple
select column from table where date < NOW()
There is no natural predictable order when you don't specify one.
Be very careful with this. For all SQL there is no defined implied order. Never count on this. Even if you see a specific behavior at a point in time, that could change in a future release or even with the adding of an index. If you are expecting an order and counting on it, the specify it explicitly.
Problem is that "natural order" of results is often affected completely or partly by the access plan the DB engine uses. For instance, if you do a group by FieldA there is a good chance (not a guarantee) that the results will come back in FieldA sequence. If you do a very simple select chances are the results will be in the sequence they are stored in the database, which may or may not be the order of the IDs or the primary key. IF you don't specify the order it is giving the DB engine the option to do whatever is most convenient for it at the time based on how it got the results. So really does become unpredictable and open to change.
Wish I could explain better, but trying to convey the real randomness of the process form an observer viewpoint.
If the query is using an index, it will prefer the ordering of that index. Group by forces an ordering. This is why combining group by and order can have a performance penalty.
In your case, if you have an index on date, it will probably order by that, hard to say how it handles tie breaks though. For more information, as usual explain the query.
Of course there's a caveat to ordering on the index used as well. If the index is on an autoincremented field and the data was added with prespecified ids, you may find it prefers the order the data was added in.

Rewrite a group-by over a randomly-ordered sub-query using only one select

Here's the thing. I'm having 3 tables, and I'm doing this query:
select t.nomefile, t.tipo_nome, t.ordine
from
(select nomefile, tipo_nome, categorie.ordine
from t_gallerie_immagini_new as immagini
join t_gallerie_new as collezioni on collezioni.id=immagini.id_ref
join t_gallerie_tipi as categorie on collezioni.type=categorie.id
order by RAND()
) as t
group by t.tipo_nome
order by t.ordine
It's applied to 3 tables, all in relationship 1-N, which need to be joined and then take 1 random result from each different result in the higher level table. This query works just fine, the problem is that I'm being asked to rewrite this query USING ONLY ONE SELECT. I've come with another way of doing this with only one select, the thing is that according to SQL sintax the GROUP BY must be before the ORDER BY, so it's pointless to order by random when you already have only the first record for each value in the higher level table.
Someone has a clue on how to write this query using only one select?
Generally, if I am not much mistaken, an ORDER BY clause in the subquery of a query like this has to do with a technique that allows you to pull non-GROUP BY columns (in the outer query) according the order specified. And so you may be out of luck here, because that means the subquery is important to this query.
Well, because in this specific case the order chosen is BY RAND() and not by a specific column/set of columns, you may have a very rough equivalent by doing both the joins and the grouping on the same level, like this:
select nomefile, tipo_nome, categorie.ordine
from t_gallerie_immagini_new as immagini
join t_gallerie_new as collezioni on collezioni.id=immagini.id_ref
join t_gallerie_tipi as categorie on collezioni.type=categorie.id
group by tipo_nome
order by categorie.ordine
You must understand, though, why this is not an exact equivalent. The thing is, MySQL does allow you to pull non-GROUP BY columns in a GROUP BY query, but if they are not correlated to the GROUP BY columns, then the values returned would be... no, not random, the term used by the manual is indeterminate. On the other hand, the technique mentioned in the first paragraph takes advantage of the fact that if the row set is ordered explicitly and unambiguously prior to grouping, then the non-GROUP BY column values will always be the same*. So indeterminateness has to do with the fact that "normally" rows are not ordered explicitly before grouping.
Now you can probably see the difference. The original version orders the rows explicitly. Even if it's BY RAND(), it is intentionally so, to ensure (as much as possible) different results in the output most of the times. But the modified version is "robbed" of the explicit ordering, and so you are likely to get identical results for many executions in a row, even if they are kind of "random".
So, in general, I consider your problem unsolvable for the above stated reasons, and if you choose to use something like the suggested modified version, then just be aware that it is likely to behave slightly differently from the original.
* The technique may not be well documented, by the way, and may have been found rather empirically than by following manuals.
I was not able to understand the reasons behind the request to rewrite this query, however, i found out that there is a solution which uses the "select" word only once. Here's the query:
SELECT g.type, SUBSTRING_INDEX(GROUP_CONCAT(
i.nomefile ORDER BY
RAND()),',',1) nomefile
FROM t_gallerie_new g JOIN t_gallerie_immagini_new i ON g.id=i.id_ref
GROUP BY g.type;
for anyone interested in this question.
NOTE: The use of GROUP_CONCAT has a couple of downsides: It is not recommended to use this keyword when using medium/large tables since it could increase the server side payload. Also, there is a limit to the size of the string returned by GROUP_CONTACT, by default 1024, so, it's necessary to modify a parameter in the mySql server to be able to receive a bigger string from this instruction.

SQL row return order

I have only used SQL rarely until recently when I began using it daily. I notice that if no "order by" clause is used:
When selecting part of a table the rows returned appear to be in the same order as they appear if I select the whole table
The order of rows returned by a selecting from a join seemes to be determined by the left most member of a join.
Is this behaviour a standard thing one can count on in the most common databases (MySql, Oracle, PostgreSQL, Sqlite, Sql Server)? (I don't really even know whether one can truly count on it in sqlite). How strictly is it honored if so (e.g. if one uses "group by" would the individual groups each have that ordering)?
If no ORDER BY clause is included in the query, the returned order of rows is undefined.
Whilst some RDBMSes will return rows in specific orders in some situations even when an ORDER BY clause is omitted, such behaviour should never be relied upon.
Section 20.2 <direct select statement: multiple rows>, subsection "General Rules" of
the SQL-92 specification:
4) If an <order by clause> is not specified, then the ordering of
the rows of Q is implementation-dependent.
If you want order, include an ORDER BY. If you don't include an ORDER BY, you're telling SQL Server:
I don't care what order you return the rows, just return the rows
Since you don't care, SQL Server is going to decide how to return the rows what it deems will be the most efficient manner possible right now (or according to the last time the plan for this specific query was cached). Therefore you should not rely on the behavior you observe. It can change from one run of the query to the next, with data changes, statistics changes, index changes, service packs, cumulative updates, upgrades, etc. etc. etc.
For PostgreSQL, if you omit the ORDER BY clause you could run the exact same query 100 times while the database is not being modified, and get one run in the middle in a different order than the others. In fact, each run could be in a different order.
One reason this could happen is that if the plan chosen involves a sequential scan of a table's heap, and there is already a seqscan of that table's heap in process, your query will start it's scan at whatever point the other scan is already at, to reduce the need for disk access.
As other answers have pointed out, if you want the data in a certain order, specify that order. PostgreSQL will take the requested order into consideration in choosing a plan, and may use an index that provides data in that order, if that works out to be cheaper than getting the rows some other way and then sorting them.
GROUP BY provides no guarantee of order; PostgreSQL might sort the data to do the grouping, or it might use a hash table and return the rows in order of the number generated by the hashing algorithm (i.e., pretty random). And that might change from one run to the next.
It never ceased to amaze me when I was a DBA that this feature of SQL was so often thought of as quirky. Consider a simple program that runs against a text file and produces some output. If the program never changes, and the data never changes, you'd expect the output to never change.
As for this:
If no ORDER BY clause is included in the query, the returned order of rows is undefined.
Not strictly true - on every RDBMS I've ever worked on (Oracle, Informix, SQL Server, DB2 to name a few) a DISTINCT clause also has the same effect as an ORDER BY as finding unique values involves a sort by definition.
EDIT (6/2/14):
Create a simple table
For DISTINCT and ORDER BY, both the plan and the cost is the same since it is ostensibly the same operation to be performed
And not surprisingly, the effect is thus the same