Is it safe to use LIMIT without ORDER BY - mysql

I'm using InnoDB. I need a query to get 10 records from a table, in any order.
Is it safe to use LIMIT without ORDER BY? Would it be faster?

If you don't use ORDER BY, the server doesn't sort your records, so data retrieval will generally be faster: a plain LIMIT is cheaper than the same query with ORDER BY. But do note that, in that case, the rows will not come back in any particular order.
As far as safety is concerned, I'm not sure which kind of safety you have in mind, as there is no potential harm in a query that uses only LIMIT and no ORDER BY clause.
You can also look at the article: ORDER BY … LIMIT Performance Optimization

It depends what you consider safe: if you want a consistent result (meaning you get the same rows every time, as long as the table's content doesn't change), or if you want a specific result (biggest, newest, oldest, whatever), then you need ORDER BY. If by safety you mean that the query won't crash, and you don't care which X rows you get, then yes, using LIMIT alone is fine (this is actually done automatically by many SQL tools, like MySQL Workbench, to speed things up).
In terms of speed, omitting ORDER BY will make the query faster for two reasons:
Ordering takes time.
LIMIT lets the server stop as soon as it has found the first X rows. You can see that queries with LIMIT 10 run faster than LIMIT 100000 on large tables. With ORDER BY, the server must go through all matching rows, so it can't stop in the middle.
So yes, using LIMIT without ORDER BY will make it faster.
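For illustration, a sketch against a hypothetical orders table (the table and its total column are assumptions, not from the question):
-- The server can stop as soon as it has found 10 rows:
SELECT * FROM orders LIMIT 10;
-- With ORDER BY on an unindexed column, every matching row must be
-- read and sorted before the first 10 can be returned:
SELECT * FROM orders ORDER BY total DESC LIMIT 10;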

Yes, you can, and yes, it would be faster (assuming the order does not matter to you). ORDER BY requires sorting, which means the database has to do more work to produce the result. Most commonly LIMIT is used together with ORDER BY, since you usually want some ordering constraint on which 10 records you get (say, the most recent, the highest rank of some sort, etc.).

It's not safe.
Without an ORDER BY, the results may not be consistent across consecutive executions of the same query.
This can break LIMIT-based pagination: the same row can appear on more than one page while other rows never appear at all.
For example (course_id is the primary key):
Get the first page (10 rows):
select course_id,grade_id from sc_base_course where agency_id = 10000 limit 0,10;
+-----------+----------+
| course_id | grade_id |
+-----------+----------+
|        13 |        1 |
|         6 |        3 |
|        12 |        4 |
|         8 |        2 |
|         7 |        2 |
|         9 |        4 |
|        16 |        1 |
|         1 |        2 |
|        17 |        1 |
|        14 |        5 |
+-----------+----------+
Get the second page (only 7 rows come back):
select course_id,grade_id from sc_base_course where agency_id = 10000 limit 10,10;
+-----------+----------+
| course_id | grade_id |
+-----------+----------+
|        11 |        4 |
|        12 |        4 |
|        13 |        1 |
|        14 |        5 |
|        15 |        1 |
|        16 |        1 |
|        17 |        1 |
+-----------+----------+
Note that course_id 12, 13, 14, 16 and 17 already appeared on the first page, while other rows never show up at all.
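Ordering by the primary key makes consecutive pages deterministic and disjoint. A minimal sketch of the fix, using the same table:
-- Page 1:
select course_id, grade_id from sc_base_course
where agency_id = 10000 order by course_id limit 0, 10;
-- Page 2; it can no longer repeat rows from page 1:
select course_id, grade_id from sc_base_course
where agency_id = 10000 order by course_id limit 10, 10;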

The 3 answers so far are good. But they are not totally correct.
Contrary to the other answers, sometimes leaving out ORDER BY will not make it faster. The optimizer might happen to generate the rows in that order anyway. Examples:
GROUP BY usually orders the results. So, if the ORDER BY would match the GROUP BY, there is no extra effort.
ORDER BY the_primary_key might match the order of fetching. This is almost guaranteed to be the case for InnoDB without a WHERE clause.
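You can check whether a given ORDER BY costs anything extra with EXPLAIN; a sketch, reusing the table from the pagination answer above:
-- If the Extra column does not show "Using filesort", the rows are
-- already read in the requested order and the sort costs nothing:
EXPLAIN SELECT * FROM sc_base_course ORDER BY course_id LIMIT 10;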
"Safe" does not compute.

Related

MySQL left join / explain why the original order of the first table is not kept? [duplicate]

I have a MySQL db with a table 'difficulties' with a few records. If I do "select * from difficulties" I get them back in the order they were added, ordered by primary key id:
mysql> select * from difficulties;
+----+-------+-----------+--------+----------+-----------+
| id | value | name      | letter | low_band | high_band |
+----+-------+-----------+--------+----------+-----------+
|  1 |     1 | very_easy | VE     |        1 |         1 |
|  2 |     2 | easy      | E      |        2 |         5 |
|  3 |     3 | medium    | M      |        6 |        10 |
|  4 |     4 | hard      | H      |       11 |        12 |
|  5 |     0 | na        | NA     |        0 |         0 |
+----+-------+-----------+--------+----------+-----------+
However, if I do "select name from difficulties" I get them back in a different order:
mysql> select name from difficulties;
+-----------+
| name      |
+-----------+
| easy      |
| hard      |
| medium    |
| na        |
| very_easy |
+-----------+
My question is: what determines this order? Is there any logic to it? Is it something like "the order the files representing the records happen to be in within the filesystem" or something else that is to all intents and purposes random?
thanks, max
This is correct and by design: if you don't ask for sorting, the server doesn't bother with sorting (sorting can be an expensive operation), and it will return the rows in whatever order it sees fit. Without a requested order, the way the records are ordered can even differ from one query to the next (although that's not too likely).
The order is definitely not random - it's just whatever way the rows come out of the query, and as you see, even minor modifications can change this un-order significantly. This "undefined" ordering is implementation dependent, unpredictable and should not be relied upon.
If you want the elements to be ordered, use the ORDER BY clause (that's its purpose) - e.g.
SELECT name FROM difficulties ORDER BY name ASC;
That will always return the result sorted by name, in ascending order. Or, if you want them ordered by the primary key, last on top, use:
SELECT name FROM difficulties ORDER BY id DESC;
You can even sort by function - if you actually want random order, do this (caveat: horrible performance with largish tables):
SELECT name FROM difficulties ORDER BY RAND();
For more details see this tutorial and the documentation.
As Piskvor said, MySQL will order the query however it finds most convenient. To address the "why" part of your question, the different result orders are probably a side effect of different execution plans. If you have an index on difficulties, the second query would make use of it but the first would not.
Without the ORDER BY clause, the results come back in no guaranteed order. However, it seems logical that the easiest (and fastest) way for the db engine is to return the data as it's stored, which is why the first result set is ordered by the PK (no fragmentation, so the logical order matches the physical order). In the second case, I would assume that you have an index on the name field; for the query select name from difficulties that index is covering, so the db engine scans the index, which is why you see results ordered by name. In any case, you shouldn't rely on such "default" ordering.
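If you want to verify that theory, you could create such an index and inspect the plan; a sketch (the index name is illustrative):
-- A covering index on name lets MySQL answer the query from the
-- index alone, so rows come back in index (alphabetical) order:
CREATE INDEX idx_difficulties_name ON difficulties (name);
EXPLAIN SELECT name FROM difficulties;  -- Extra shows "Using index"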
select name from difficulties should return the values in alphabetical order, as it is a text field, and select * from difficulties will return in numeric order, I believe. Don't hold me to that, though.
The best thing to do is to use ORDER BY if you care about what order things come back in.

MySQL query results returned are semi-random / inconsistently ordered

I'm working with an ndb cluster setup that uses proxysql. There are 4 mysql servers, 4 data nodes, and 2 management nodes. The following happens when I access one of the mysql servers directly, so I think that I can safely rule out proxysql as the root cause, but beyond that I'm just lost.
Here's a table I set up to help illustrate my problem:
mysql> describe delain;
+----------+-------------+------+-----+---------+----------------+
| Field    | Type        | Null | Key | Default | Extra          |
+----------+-------------+------+-----+---------+----------------+
| album_id | tinyint(2)  | NO   | PRI | NULL    | auto_increment |
| album    | varchar(30) | YES  |     | NULL    |                |
+----------+-------------+------+-----+---------+----------------+
2 rows in set (0.00 sec)
It contains the following data; note that I specified an order by clause:
mysql> select * from delain order by album_id;
+----------+-------------------------+
| album_id | album                   |
+----------+-------------------------+
|        1 | Lucidity                |
|        2 | April Rain              |
|        3 | We Are the Others       |
|        4 | The Human Contradiction |
|        5 | Moonbathers             |
+----------+-------------------------+
5 rows in set (0.00 sec)
If I don't specify an order clause, the results returned are seemingly random, such as this:
mysql> select * from delain;
+----------+-------------------------+
| album_id | album                   |
+----------+-------------------------+
|        3 | We Are the Others       |
|        5 | Moonbathers             |
|        1 | Lucidity                |
|        2 | April Rain              |
|        4 | The Human Contradiction |
+----------+-------------------------+
5 rows in set (0.00 sec)
When I repeat the query (sans order clause) I get a different ordering pretty much every time. It doesn't seem to be truly random, but there sure as heck isn't any sort of discernible pattern to me.
Why is this happening? My experience with mysql has always been that the default ordering is essentially according to the primary key, but this is also the first time I've used an ndb cluster in particular; I don't know if there's a difference there, or if there's a setting inside a config file that got missed or what. Any help is greatly appreciated!
This is standard SQL behavior.
https://mariadb.com/kb/en/library/sql-99/order-by-clause/ says in part:
An ORDER BY clause may optionally appear after a query expression: it specifies the order rows should have when returned from that query (if you omit the clause, your DBMS will return the rows in some random order).
(emphasis mine)
It'd be more accurate to say it will return the rows in some arbitrary order, instead of random order. Random implies that the order will change from one execution to the next.
In the case of InnoDB, the order tends to be the index order in which the rows were accessed. The index it reads is not necessarily the primary key. So the order is unchanging and somewhat predictable if you know something about the internals. But it's not random.
In the case of MyISAM, the order tends to be the order the rows are stored in the table, which can vary depending on the order the rows were inserted, and also depending on where there was space in the file at the time of insertion, after row deletions.
In the case of NDB, I don't know as much about its internals, so I can't describe its rule for "default" order, but it's still true that without an explicit ORDER BY, the storage engine is allowed to return rows in whatever order it wants to.
For NDB, the order additionally depends on timing. A query like
SELECT * from table;
is implemented as a parallelised full table scan within the data nodes and their database threads, with one MySQL thread receiving the results. So with a filtered query like
SELECT * from table where filter_column = 2;
the filter gets evaluated in many threads in parallel. Each of those threads returns rows to the MySQL thread in an order that depends on the OS scheduler, networking and many other things, so there is no default ordering unless you use ORDER BY.
For NDB, then, the order is truly random and not just arbitrary. You'll see this in the NDB test suites run under MTR: the queries mostly use SELECT * from table ORDER BY some_field;

Would LIMIT 1 be useful when there is a unique column in the WHERE clause?

My whole question is in the title above.
Actually, I want to know how LIMIT works in MySQL. Suppose this table:
// colors
+----+-------+
| id | color |
+----+-------+
|  1 | blue  |
|  2 | red   |
|  3 | green |
|  4 | white |
|  5 | grey  |
|  6 | brown |
|  7 | black |
|  8 | pink  |
+----+-------+
As you know, the id column is unique (it is the PK). And this is my query:
SELECT color FROM colors WHERE id = 5;
Now I want to know: would the query above be more efficient if I added LIMIT 1 at the end?
No, because you are retrieving only one row anyway.
If you add LIMIT 1, your DBMS has to track how many rows it has returned and decide whether it needs to stop.
That is unnecessary work here.
MySQL (and every other RDBMS) is very efficient at locating records by primary key. Adding LIMIT 1 will not have a significant impact on speed.
However, a LIMIT 1 clause is handy if the above (or a similar) query is used as a correlated subquery in the select list. Such a subquery must return a single record, and LIMIT 1 tells MySQL explicitly that the query cannot return more than 1 row.
select ..., (SELECT color FROM colors WHERE id = outer_query_field limit 1)
from ...
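A more complete sketch of that pattern, with a hypothetical items table (its name and color_id column are assumptions for illustration):
-- LIMIT 1 guarantees the scalar subquery can never fail with
-- "Subquery returns more than 1 row":
SELECT i.name,
       (SELECT c.color FROM colors c
        WHERE c.id = i.color_id LIMIT 1) AS color
FROM items i;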

mysql returns wrong results with random duplicate values

I need to return the best 5 scores in each category from a table. So far I have tried the query below, following an example from this site: selecting top n records per group
query:
select
  subject_name,
  substring_index(substring_index(
      group_concat(exams_scores.admission_no order by exams_scores.score desc),
      ',', value), ',', -1) as names,
  substring_index(substring_index(
      group_concat(score order by score desc),
      ',', value), ',', -1) as orderedscore
from exams_scores, students, subjects, tinyint_asc
where tinyint_asc.value >= 1
  and tinyint_asc.value <= 5
  and exam_id = 2
  and exams_scores.admission_no = students.admission_no
  and students.form_id = 1
  and exams_scores.subject_code = subjects.subject_code
group by exams_scores.subject_code, value;
I get the top N as I need, but my problem is that the query returns duplicates at random, and I don't know where they are coming from.
As you can see, English and Mathematics have duplicates which should not be there:
+--------------+-------+--------------+
| subject_name | names | orderedscore |
+--------------+-------+--------------+
| English      | 1500  | 100          |
| English      | 1500  | 100          |
| English      | 2491  | 100          |
| English      | 1501  | 99           |
| English      | 1111  | 99           |
| Mathematics  | 1004  | 100          |
| Mathematics  | 1004  | 100          |
| Mathematics  | 2722  | 99           |
| Mathematics  | 2734  | 99           |
| Mathematics  | 2712  | 99           |
+--------------+-------+--------------+
I have checked the table and no duplicates exist there. To confirm:
select * from exams_scores
where exam_id = 2 and subject_code = 121 and admission_no = 1004;
Result:
+------+--------------+---------+--------------+-------+
| id   | admission_no | exam_id | subject_code | score |
+------+--------------+---------+--------------+-------+
| 4919 |         1004 |       2 |          121 |   100 |
+------+--------------+---------+--------------+-------+
1 row in set (0.00 sec)
The same check gives the same result for English.
If I run the query five times, I sometimes end up with another field having duplicate values.
Can anyone tell me why my query is behaving this way? I tried adding DISTINCT inside the group_concat:
group_concat(distinct exams_scores.admission_no)
but that didn't work.
You're grouping by exams_scores.subject_code, value. If you add them to your selected columns (...as orderedscore, exams_scores.subject_code, value from...), you should see that all rows are distinct with respect to these two columns you grouped by. Which is the correct semantics of GROUP BY.
Edit, to clarify:
First, the SQL server removes some rows according to your WHERE clause.
Afterwards, it groups the remaining rows according to your GROUP BY clause.
Finally, it selects the columns you specified, either by directly returning a column's value or by performing a GROUP_CONCAT on some of the columns and returning their accumulated value.
If you select columns not included in the GROUP BY clause, the returned results for these columns are arbitrary, since the SQL server reduces all rows that are equal with respect to the GROUP BY columns to one single row. For the remaining columns the results are pretty much undefined (hence the "randomness" you're experiencing), because what should the server choose as the value for such a column? It can only pick one of the reduced rows.
In fact, some SQL servers won't perform such a query at all and return an SQL error, since the result for those columns would be undefined, which is something you don't want in general. With these servers (I believe MSSQL is one of them), you can more or less only have columns in your SELECT clause which are part of your GROUP BY clause.
Edit 2: Which, finally, means that you have to refine your GROUP BY clause to obtain the grouping that you want.
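A stripped-down sketch of the same pitfall, with a hypothetical table t holding columns grp and val:
-- val is neither grouped nor aggregated, so each group returns an
-- arbitrary row's value (the "randomness" described above). Modern
-- MySQL rejects this outright when ONLY_FULL_GROUP_BY is enabled:
SELECT grp, val FROM t GROUP BY grp;
-- Deterministic alternatives: aggregate the column, or group by it too:
SELECT grp, MAX(val) FROM t GROUP BY grp;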

How can I optimize a query with multiple joins (I already have indexes)?

SELECT citing.article_id as citing, lac_a.year, r.id_when_cited, cited_issue.country, citing.num_citations
FROM isi_lac_authored_articles as lac_a
JOIN isi_articles citing ON (lac_a.article_id = citing.article_id)
JOIN isi_citation_references r ON (citing.article_id = r.article_id)
JOIN isi_articles cited ON (cited.id_when_cited = r.id_when_cited)
JOIN isi_issues cited_issue ON (cited.issue_id = cited_issue.issue_id);
I have indexes on all the fields being JOINed on.
Is there anything I can do? My tables are large (around 1 million records in some; the references table has 500 million records, the articles table 25 million).
This is what EXPLAIN has to say:
+----+-------------+-------------+--------+---------------------------------------------------------------------------+----------------------------------------+---------+-------------------------------+---------+-------------+
| id | select_type | table       | type   | possible_keys                                                             | key                                    | key_len | ref                           | rows    | Extra       |
+----+-------------+-------------+--------+---------------------------------------------------------------------------+----------------------------------------+---------+-------------------------------+---------+-------------+
|  1 | SIMPLE      | cited_issue | ALL    | NULL                                                                      | NULL                                   | NULL    | NULL                          | 1156856 |             |
|  1 | SIMPLE      | cited       | ref    | isi_articles_id_when_cited,isi_articles_issue_id                          | isi_articles_issue_id                  | 49      | func                          |      19 | Using where |
|  1 | SIMPLE      | r           | ref    | isi_citation_references_article_id,isi_citation_references_id_when_cited  | isi_citation_references_id_when_cited  | 17      | mimir_dev.cited.id_when_cited |       4 | Using where |
|  1 | SIMPLE      | lac_a       | eq_ref | PRIMARY                                                                   | PRIMARY                                | 16      | mimir_dev.r.article_id        |       1 |             |
|  1 | SIMPLE      | citing      | eq_ref | PRIMARY                                                                   | PRIMARY                                | 16      | mimir_dev.r.article_id        |       1 |             |
+----+-------------+-------------+--------+---------------------------------------------------------------------------+----------------------------------------+---------+-------------------------------+---------+-------------+
5 rows in set (0.07 sec)
If you really need all the returned data, I would suggest two things:
You probably know the data better than MySQL does, and you can try to take advantage of that when MySQL's assumptions are not correct. Currently, MySQL thinks it is easiest to full scan the whole isi_issues table at the beginning; if the result is really going to include all issues, that assumption is correct. But if there are many issues that should not be in the result, you may want to force a join order you consider better. It is you who knows which table applies the strongest restrictions and which is the smallest to full scan (you will have to full scan something anyway, since there is no WHERE clause). A sketch follows.
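MySQL's STRAIGHT_JOIN modifier forces the tables to be joined in the order they are listed; a sketch using the original query (whether it helps depends entirely on your data):
SELECT STRAIGHT_JOIN citing.article_id as citing, lac_a.year, r.id_when_cited,
       cited_issue.country, citing.num_citations
FROM isi_lac_authored_articles as lac_a
JOIN isi_articles citing ON (lac_a.article_id = citing.article_id)
JOIN isi_citation_references r ON (citing.article_id = r.article_id)
JOIN isi_articles cited ON (cited.id_when_cited = r.id_when_cited)
JOIN isi_issues cited_issue ON (cited.issue_id = cited_issue.issue_id);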
You can also benefit from covering indexes, that is, indexes that contain enough data by themselves so the row data never needs to be touched. For example, an index (article_id, num_citations) on isi_articles, (article_id, year) on isi_lac_authored_articles, and even (country) on isi_issues will significantly speed up this query, as long as the indexes fit in memory. On the other hand, they make your indexes larger and slightly slow down inserts into the table.
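The covering indexes suggested above could be created like this (index names are illustrative; remember they also add write overhead):
ALTER TABLE isi_articles ADD INDEX idx_art_cover (article_id, num_citations);
ALTER TABLE isi_lac_authored_articles ADD INDEX idx_lac_cover (article_id, year);
ALTER TABLE isi_issues ADD INDEX idx_issue_country (country);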
I think that's about the best you can do; at least it's not using nested/multiple queries. You should do a little benchmarking on the SQL, and you could at least limit your results as much as possible: 15-30 rows per page is a pretty reasonable return set (this depends on the app, but 15-30 is my tolerance range).
MySQL clients (phpMyAdmin, the console, GUI tools, whatever) report some sort of "execution time", which is the time the query took to process. Compare that with a benchmark of the query run through your server-side code, and then with the query run through the server-side code with your app's interface output included.
That way you can see where your bottleneck is, and that is where you optimize.
Unless the result of your query is input to some other query or system, it is useless to return that many (3M) rows. It would be smarter to return just an acceptable number of rows per query (say 1000), which is enough for visualization.
Looking at your SQL - the lack of a WHERE clause means it is pulling all rows from:
JOIN isi_issues cited_issue ON (cited.issue_id = cited_issue.issue_id)
You could look at partitioning the large isi_issues table; this would allow MySQL to work a bit quicker (smaller files are easier to handle).
Alternatively, you can loop the statement and use a LIMIT clause to fetch batches:
LIMIT 0, 100000
then
LIMIT 100000, 100000
This lets each statement run quicker, and you can deal with the data in batches. Note that the batches are only consistent and non-overlapping if you also add an ORDER BY on a unique column, as discussed above.