I was under the impression that using an ORDER BY in an SQL query would not affect which records were selected for the result set. I thought that ORDER BY would only affect the presentation of the result set.
Recently, however, I was getting unexpected results from a query until I used an ORDER BY clause. This suggests that either a) ORDER BY can affect which records are included in the result set, or b) I have some other bug which I need to work on.
Which is it?
Here's the query: SELECT node_id FROM users ORDER BY node_id LIMIT 100
(node_id is both a primary key and foreign key).
As you can see, the query includes a LIMIT clause. It seems that if I use the ORDER BY, the records are ordered before the top 100 are selected. I had expected it to select 100 records based on natural order, then order them according to node_id.
I've looked for info on ORDER BY but as yet, the only info I can find suggests that it affects presentation only... I am using MySQL.
ORDER BY reflects the order of all of the records before the LIMIT Clause. To get the result you want you will need this:
select u.node_id
from users u
join
(
SELECT node_id
FROM users
LIMIT 100
) us ON u.node_id = us.node_id
ORDER BY u.node_id
This way you will use the limit clause first and get the top 100 records and then you will sort the result of that. The join clause is faster than a double Select statement especially if you are working with many records.
You can use a nested query:
SELECT node_id FROM
(
SELECT node_id FROM users LIMIT 100
) u
ORDER BY node_id
Related
I'm selecting a set of account records from a large table (millions of rows) with integer id values. As basic of a query as one gets, in a sense. What I'm doing us building a large comma separated list, and passing that into the query as an "in" clause. Right now the result is completely unordered. What I'd like to do is get the results back in the order of the values in the "in" clause.
I assume instead I'll have to build a temporary table and do a join instead, which I'd like to avoid, but may not be able to.
Thoughts? The size of the query right now is capped at about 60k each, as we're trying to limit the output size, but it could be arbitrarily large, which might rule out an "in" query anyway from a practical standpoint, if not a physical one.
Thanks in advance.
Actually, this is better:
SELECT * FROM your_table
WHERE id IN (5,2,6,8,12,1)
ORDER BY FIELD(id,5,2,6,8,12,1);
heres the FIELD documentation:
http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_field
A bit of a trick....
SELECT * FROM your_table
WHERE id IN (5,2,6,8,12,1)
ORDER BY FIND_IN_SET(id,'5,2,6,8,12,1') DESC;
note that the list of ID's in the find_in_set is a string, so its quoted.
Also note that without DESC, they results are returned in REVERSE order to what the list specified.
If your query is 60K, that's a sign that you're doing it the wrong way.
There is no other way to order the result set than by using an ORDER BY clause. You could have a complicated CASE clause in your order by listing all the elements in your IN clause again, but then your query would probably be 120K.
I know you don't want to, but you should put the values in the IN clause in a table or a temporary table and join with it. You can also include a SortOrder column in the temporary table, and order by that. Databases like joins. Doing it this way will help your query to perform well.
This is what I get for mysql 8.0. It seems opposite to above answers.
sort in same order as list specified:
SELECT * FROM your_table
WHERE id IN (5,2,6,8,12,1)
ORDER BY FIND_IN_SET(id,'5,2,6,8,12,1');
sort in reverse order as list specified:
SELECT * FROM your_table
WHERE id IN (5,2,6,8,12,1)
ORDER BY FIND_IN_SET(id,'5,2,6,8,12,1') DESC;
You're first query surely uses an order by clause. So, you could just do a join, and use the same order by clause.
For example, if this was your first query
SELECT customer_id
FROM customer
WHERE customer_id BETWEEN 1 AND 100
ORDER
BY last_name
And this was your second query
SELECT inventory_id
FROM rental
WHERE customer_id in (...the ordered list...)
Combined would be
SELECT r.inventory_id
FROM rental r
INNER
JOIN customer c
ON r.customer_id = c.customer_id
WHERE c.customer_id BETWEEN 1 AND 100
ORDER
BY c.last_name
This is what worked for me
SELECT * FROM your_table
WHERE id IN ('5','2','6','8','12','1')
ORDER BY FIELD(id,'5','2','6','8','12','1');
I added the ids in quotes
I'm making a sample recent screen that will display a list, it displays the list, with id set as primary key.
I have done the correct query as expected but the table with big amount of data can cause slow performance issues.
This is the sample query below:
SELECT distinct H.id -- (Primary Key),
H.partnerid as PartnerId,
H.partnername AS partner, H.accountname AS accountName,
H.accountid as AccountNo,
FROM myschema.mytransactionstable H
INNER JOIN (
SELECT S.accountid, S.partnerid, S.accountname,
max(S.transdate) AS maxDate
from myschema.mytransactionstable S
group by S.accountid, S.partnerid, S.accountname
) ms ON H.accountid = ms.accountid
AND H.partnerid = ms.partnerid
AND H.accountname =ms.accountname
AND H.transdate = maxDate
WHERE H.accountid = ms.accountid
AND H.partnerid = ms.partnerid
AND H.accountname = ms.accountname
AND H.transdate = maxDate
GROUP BY H.partnerid,H.accountid, H.accountname
ORDER BY H.id DESC
LIMIT 5
In my case, there are values which are similar in the selected columns but differ only in their id's
Below is a link to an image without executing the query above. They are all the records that have not yet been filtered.
Sample result query click here
Since I only want to get the 5 most recent by their id but the other columns can contain similar values
accountname,accountid,partnerid.
I already got the correct query but,
I want to improve the performance of the query. Any suggestions for the improvement of query?
You can try using row_number()
select * from
(
select *,row_number() over(order by transdate desc) as rn
from myschema.mytransactionstable
)A where rn<=5
Don't repeat ON and WHERE clauses. Use ON to say how the tables (or subqueries) are "related"; use WHERE for filtering (that is, which rows to keep). Probably in your case, all the WHERE should be removed.
Please provide SHOW CREATE TABLE
This 'composite' index would probably help because of dealing with the subquery and the JOIN:
INDEX(partnerid, accountid, accountname, transdate)
That would also avoid a separate sort for the GROUP BY.
But then the ORDER BY is different, so it cannot avoid a sort.
This might avoid the sort without changing the result set ordering: ORDER BY partnerid, accountid, accountname, transdate DESC
Please provide EXPLAIN SELECT ... and EXPLAIN FORMAT=JSON SELECT ... if you have further questions.
If we cannot get an index to handle the WHERE, GROUP BY, and ORDER BY, the query will generate all the rows before seeing the LIMIT 5. If the index does work, then the outer query will stop after 5 -- potentially a big savings.
It is a very simple query. For every query, I get a different result. Similar things happen when I used TOP 1. I would like a random sub-sample and it works. But am I missing something? Why does it return a different value every time?
SELECT DISTINCT user_id FROM table1
where day_id>="2009-01-09" and day_id<"2011-02-16"
LIMIT 1;
There's no guarantee that you will get a random result with your query. It's quite likely you'll get the same result each time (although the actual result returned will be indeterminate). To guarantee that you get a random, unique user_id, you should SELECT a random value from the list of DISTINCT values:
SELECT user_id
FROM (SELECT DISTINCT user_id
FROM table1
WHERE day_id >= "2009-01-09" AND day_id < "2011-02-16"
) u
ORDER BY RAND()
LIMIT 1
SQL statements represent unordered sets, add order by clause such as
...
ORDER BY user_id
LIMIT 1
I have a table x :
id lang externalid
1 nl 10
2 nl 11
3 fr 10
From this table I want al the rows for a certain lang and externalid, if the externalid doesn't exist for this lang, I want the row with any other lang.
The subquery sorts the table correct, but when I add the group by, the sort of the subquery is lost. This works in older mysql versions but not in 5.7.
(
SELECT
*
FROM
x
ORDER BY FIELD(lang, "fr") DESC, id
)
as y
group by externalid
I want the query to return the records with id 2 & 3. So for each distinct external id, if possible the lang = 'fr', else any other lang.
How can i solve this problem?
You are talking of given externalid and land. No need to group by externalid hence; use a mere where clause instead.
Combined with ORDER BY and LIMIT you get the record you want (i.e. the desired language if such a record exists, else another one).
select *
from mytable
where externalid = 10
order by lang = 'fr' desc
limit 1;
UPDATE: Okay, according to your comment you want to get the "best" record per externalid. In standard SQL you'd use ROW_NUMBER for this. Other DBMS have further solutions, e.g. Oracle's KEEP FIRST or Postgre's DISTINCT ON. MySQL doesn't support any of these. One way would be to emulate ROW_NUMBER with variables. Another would be to use above query as a subquery per externalid to find the best records:
select *
from mytable
where id in
(
select
(
select m.id
from mytable m
where m.externalid = e.externalid
order by m.lang = 'fr' desc
limit 1
) as best_id
from (select distinct externalid from mytable) e
);
Your subquery generates a result set (a virtual table) that's passed to your outer query.
All SQL queries, without exception, generate their results in unpredictable order unless you specify the order completely in an ORDER BY clause.
Unpredictable is like random, except worse. Random implies you'll get a different order every time you run the query. Unpredictable means you'll get the same order every time, until you don't.
MySQL ordinarily ignores ORDER BY clauses in subqueries (there are a few exceptions, mostly related to subquery LIMIT clauses). Move your ORDER BY to the top level query.
Edit. You are also misusing MySQL's notorious nonstandard extension to GROUP BY.
SELECT *
FROM users
ORDER BY highscore DESC
LIMIT 5 OFFSET 5
and
SELECT *
FROM users
ORDER BY highscore DESC
LIMIT 5 OFFSET 10
returning the same result. And they are different than the records when I omit the LIMIT clause! I searched the community. There are similar questions, but they are of no help.
EDIT: Here are the table data-
Presumably, the issue is that you have ties for highscore. When you have ties, then MySQL orders the rows with the same value in an arbitrary and indeterminate way. Even two runs of the same query can result in different orderings.
Why? The reason is simple. There is no "natural" order for sorting keys with the same value. SQL tables represent unordered sets.
To make the sorting stable, include a unique id as the last key in the ORDER BY:
SELECT u.*
FROM users u
ORDER BY u.highscore DESC, u.userId
LIMIT 5 OFFSET 5;
Then, when you fetch the next 5 rows, they will be different.