MySQL internal order of operations in SELECT query - mysql

What is the internal order of operations in a MySQL SELECT query and a relational query?
For instance, a SELECT query to a single table:
SELECT `name`
FROM `users`
WHERE `publication_count`>0
ORDER BY `publication_count` DESC
I know that at first all table fields are fetched and only the name field is left at the end. Does this happen before or after the condition in WHERE is applied? When is ORDER BY applied?
A relational query using two tables:
SELECT `users`.`name`, `posts`.`text`
FROM `users`, `posts`
WHERE `posts`.`author_id`=`users`.`id`
ORDER BY `posts`.`date` DESC
Same question. What happens after what? (I know that at first the Cartesian product is generated)

Simplifying the rules, the logical order of processing for your example is as follows:
1. FROM -- all elements in list (including multiple tables)
2. WHERE -- discard rows not matching conditions
3. SELECT -- output rows are computed (not fetched)
4. ORDER BY -- sort output rows
Also, you shouldn't use the old-fashioned implicit join syntax with the join condition in WHERE. Use an explicit JOIN instead:
SELECT ...
FROM users
INNER JOIN posts ON users.id = posts.author_id
ORDER BY ...
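The four steps can be checked on a small data set. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained), and the sample rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, publication_count INTEGER)"
)
conn.executemany(
    "INSERT INTO users (name, publication_count) VALUES (?, ?)",
    [("ann", 3), ("bob", 0), ("cy", 7)],  # made-up sample rows
)

# Logically: FROM users -> WHERE drops bob -> SELECT computes name -> ORDER BY sorts.
rows = conn.execute(
    "SELECT name FROM users WHERE publication_count > 0 "
    "ORDER BY publication_count DESC"
).fetchall()
names = [row[0] for row in rows]
print(names)
```

The row for bob is discarded by WHERE even though publication_count never appears in the output. This is only the logical order; the optimizer is free to execute the query differently as long as the result is the same.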

How to maintain the order of the parameters on the return [duplicate]

I'm selecting a set of account records from a large table (millions of rows) with integer id values. It's about as basic a query as one gets. What I'm doing is building a large comma-separated list and passing that into the query as an "in" clause. Right now the result is completely unordered. What I'd like to do is get the results back in the order of the values in the "in" clause.
I assume instead I'll have to build a temporary table and do a join instead, which I'd like to avoid, but may not be able to.
Thoughts? The size of the query right now is capped at about 60k each, as we're trying to limit the output size, but it could be arbitrarily large, which might rule out an "in" query anyway from a practical standpoint, if not a physical one.
Thanks in advance.
Actually, this is better:
SELECT * FROM your_table
WHERE id IN (5,2,6,8,12,1)
ORDER BY FIELD(id,5,2,6,8,12,1);
Here's the FIELD documentation:
http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_field
A bit of a trick....
SELECT * FROM your_table
WHERE id IN (5,2,6,8,12,1)
ORDER BY FIND_IN_SET(id,'5,2,6,8,12,1') DESC;
Note that the list of IDs passed to FIND_IN_SET is a string, so it's quoted.
Also note that without DESC, the results are returned in REVERSE order to what the list specified.
If your query is 60K, that's a sign that you're doing it the wrong way.
There is no other way to order the result set than by using an ORDER BY clause. You could have a complicated CASE clause in your order by listing all the elements in your IN clause again, but then your query would probably be 120K.
I know you don't want to, but you should put the values in the IN clause in a table or a temporary table and join with it. You can also include a SortOrder column in the temporary table, and order by that. Databases like joins. Doing it this way will help your query to perform well.
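The temporary-table approach described above might look like the following sketch. It runs against SQLite through Python's sqlite3 module purely for illustration; the table name `wanted`, the column `label`, and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.execute("CREATE TABLE your_table (id INTEGER PRIMARY KEY, label TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [(i, f"row{i}") for i in range(1, 13)],  # made-up rows with ids 1..12
)

wanted_ids = [5, 2, 6, 8, 12, 1]  # the ids in the order they should come back

# Temporary table holding each wanted id together with its position in the list.
conn.execute("CREATE TEMP TABLE wanted (id INTEGER PRIMARY KEY, sort_order INTEGER)")
conn.executemany(
    "INSERT INTO wanted VALUES (?, ?)",
    [(id_, position) for position, id_ in enumerate(wanted_ids)],
)

# The join replaces the IN list, and sort_order restores the list order.
rows = conn.execute(
    "SELECT t.id FROM your_table t "
    "JOIN wanted w ON w.id = t.id "
    "ORDER BY w.sort_order"
).fetchall()
ordered_ids = [row[0] for row in rows]
print(ordered_ids)
```

The query text stays the same size no matter how many ids are loaded, which is the main attraction when the list could grow arbitrarily large.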
This is what I get on MySQL 8.0. It seems to be the opposite of the answers above.
Sort in the same order as the list specified:
SELECT * FROM your_table
WHERE id IN (5,2,6,8,12,1)
ORDER BY FIND_IN_SET(id,'5,2,6,8,12,1');
Sort in the reverse order of the list specified:
SELECT * FROM your_table
WHERE id IN (5,2,6,8,12,1)
ORDER BY FIND_IN_SET(id,'5,2,6,8,12,1') DESC;
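This matches how FIND_IN_SET is documented: it returns the 1-based position of the value in the comma-separated string, or 0 when the value is absent, so an ascending sort follows the list. The Python function below is only a model of that behaviour written for illustration, not MySQL code.

```python
def find_in_set(needle, csv_list):
    """Model of FIND_IN_SET: 1-based position of needle in a
    comma-separated string, or 0 when it is not present."""
    items = csv_list.split(",")
    return items.index(needle) + 1 if needle in items else 0


ids = [1, 2, 5, 6, 8, 12]   # rows as they might come out of the table
wanted = "5,2,6,8,12,1"     # the list given in the query

# ORDER BY FIND_IN_SET(...) sorts by position, smallest first.
ascending = sorted(ids, key=lambda i: find_in_set(str(i), wanted))
# ORDER BY FIND_IN_SET(...) DESC sorts by position, largest first.
descending = sorted(ids, key=lambda i: find_in_set(str(i), wanted), reverse=True)
print(ascending)
print(descending)
```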
Your first query surely uses an ORDER BY clause. So you could just do a join and use the same ORDER BY clause.
For example, if this was your first query
SELECT customer_id
FROM customer
WHERE customer_id BETWEEN 1 AND 100
ORDER BY last_name
And this was your second query
SELECT inventory_id
FROM rental
WHERE customer_id in (...the ordered list...)
Combined would be
SELECT r.inventory_id
FROM rental r
INNER JOIN customer c
ON r.customer_id = c.customer_id
WHERE c.customer_id BETWEEN 1 AND 100
ORDER BY c.last_name
This is what worked for me
SELECT * FROM your_table
WHERE id IN ('5','2','6','8','12','1')
ORDER BY FIELD(id,'5','2','6','8','12','1');
I added the ids in quotes

MySQL GROUP BY ignores the ORDER BY and always returns the 1st row

I have read through tons of similar questions and none is answering what is wrong with mine.
I want to select the entire row that includes the maximum value of one of the columns for each group.
SELECT * FROM (
SELECT t1.* FROM `t1` JOIN `t2` ON t2.id=t1.raceId ORDER BY t1.points DESC
) AS new GROUP BY new.athleteId ORDER BY new.points DESC
This works, giving me a single row for each athlete, but the row it shows is just the earliest row in the DB, not the row with the maximum points.
The sub query alone shows all the rows in the correct order, but when I try to group them, it still takes the earliest row and ignores the ordering.
I can retrieve the maximum points for each grouping, but the rest of the row info still comes from the earliest entry.
The GROUP BY clause is meant to be used with aggregate functions.
What is it that you are trying to achieve with the GROUP BY?
Maybe this is one way to achieve what you're after.
As a general rule of thumb, if you're using GROUP BY it's wise to define which aggregate functions to use. MySQL allows you to group by without aggregate functions defined, but I've found this to be very confusing without being very specific about what I want to aggregate on. Maybe it's because of my background in SQL Server and Oracle, which DO NOT allow you to use GROUP BY this way...
Essentially, get the max points for each athlete, then join back to your entire data set to limit by that athlete and points. You may need to do it by race if you want athlete by race as well; I'm unsure if you want max athlete points by race, but based on the GROUP BY/ORDER BY I'm guessing not.
SELECT t1.*, t2.*
FROM (SELECT t1.athleteId, MAX(t1.points) AS points
FROM `t1`
INNER JOIN `t2` ON t2.id=t1.raceId
GROUP BY t1.athleteId) new
INNER JOIN `t1` ON t1.athleteId = new.athleteId
AND t1.points = new.points
INNER JOIN `t2` ON t2.id=t1.raceId
ORDER BY new.points DESC
Another way, depending on the version of MySQL, would be to use analytic functions along with aggregate functions, but without a version number I'll not go into detail.
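The join-back idea can be tried on a few rows. The sketch below runs on SQLite through Python's sqlite3 module as a stand-in for MySQL, leaves out the join to t2 for brevity, and uses invented sample data; the alias `best` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.execute(
    "CREATE TABLE t1 (id INTEGER PRIMARY KEY, athleteId INTEGER, "
    "raceId INTEGER, points INTEGER)"
)
conn.executemany(
    "INSERT INTO t1 (athleteId, raceId, points) VALUES (?, ?, ?)",
    [(1, 10, 5), (1, 11, 9), (2, 10, 7), (2, 11, 4)],  # made-up results
)

# Step 1 (subquery): the maximum points per athlete.
# Step 2 (join): keep only the full rows that carry that maximum.
rows = conn.execute(
    """
    SELECT t1.athleteId, t1.raceId, t1.points
    FROM t1
    JOIN (SELECT athleteId, MAX(points) AS points
          FROM t1
          GROUP BY athleteId) best
      ON best.athleteId = t1.athleteId
     AND best.points = t1.points
    ORDER BY t1.points DESC
    """
).fetchall()
print(rows)
```

If an athlete has two rows tied on the maximum, both rows are returned, so a tie-breaker would be needed where exactly one row per athlete is required.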

Mysql: Why is WHERE IN much faster than JOIN in this case?

I have a query with a long list (> 2000 ids) in a WHERE IN clause in mysql (InnoDB):
SELECT id
FROM table
WHERE user_id IN ('list of >2000 ids')
I tried to optimize this by using an INNER JOIN instead of the WHERE IN, like this (both ids and the user_id use an index):
SELECT table.id
FROM table
INNER JOIN users ON table.user_id = users.id
WHERE users.type = 1
Surprisingly, however, the first query is much faster (by a factor of 5 to 6). Why is this the case? Could the second query outperform the first one when the number of ids in the WHERE IN clause becomes much larger?
This is not an answer to your question, but as an alternative to your first query you may be able to improve performance by replacing the IN clause with EXISTS, since EXISTS can perform better than IN:
SELECT id
FROM table t
WHERE EXISTS (SELECT 1 FROM USERS WHERE t.user_id = users.id)
This is an unfair comparison between the 2 queries.
In the 1st query you provide a list of constants as the search criteria, so MySQL has to open and search only one table and/or one index file.
In the 2nd query you instruct MySQL to obtain the list dynamically from another table and join that list back to the main table. It is also not clear whether indexes were used to perform the join or a full table scan was needed.
To have a fair comparison, time the query that you used to obtain the list in the 1st query along with the query itself. Or try
SELECT table.id FROM table WHERE user_id IN (SELECT users.id FROM users WHERE users.type = 1)
The above fetches the list of ids dynamically in a subquery.
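Before timing the two forms against each other, it is worth confirming that they return the same rows. The sketch below does that on SQLite via Python's sqlite3 module as a stand-in for MySQL. Because `table` is a reserved word, the main table is called `items` here; that name, the index name, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, type INTEGER)")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, user_id INTEGER)")
conn.execute("CREATE INDEX idx_items_user_id ON items (user_id)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, 1), (2, 2), (3, 1)])
conn.executemany(
    "INSERT INTO items VALUES (?, ?)", [(1, 1), (2, 2), (3, 3), (4, 1)]
)

# Form 1: join against the users table.
join_ids = sorted(
    row[0]
    for row in conn.execute(
        "SELECT items.id FROM items "
        "INNER JOIN users ON items.user_id = users.id "
        "WHERE users.type = 1"
    )
)

# Form 2: IN with a subquery that builds the id list dynamically.
in_ids = sorted(
    row[0]
    for row in conn.execute(
        "SELECT items.id FROM items "
        "WHERE user_id IN (SELECT users.id FROM users WHERE users.type = 1)"
    )
)
print(join_ids, in_ids)
```

Once both forms agree, timing them on the real data compares like with like, because both include the cost of finding the matching users.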

Remove Duplicate record from Mysql Table using Group By

I have a table structure and data below.
I need to remove duplicate records from the table. My confusion is that when I run the query
SELECT * FROM `table` GROUP BY CONCAT(`name`,department)
it gives me the correct list (12 records).
But when I use the same query as a subquery:
SELECT *
FROM `table` WHERE id IN (SELECT id FROM `table` GROUP BY CONCAT(`name`,department))
it returns all records, which is wrong.
So my question is: why is the GROUP BY in the subquery not working?
As Tim mentioned in his answer, getting the first record of each group from a GROUP BY clause is not a standard SQL feature. MySQL allowed it up to version 5.6.16, but the behavior changed from 5.6.21.
Just change the MySQL version in your SQL Fiddle and check; you will get what you want.
In the query
SELECT * FROM `table` GROUP BY CONCAT(`name`,department)
You are selecting the id column, which is a non-aggregate column. Many RDBMS would give you an error, but MySQL allows this for performance reasons. This means MySQL has to choose which record to retain in the result set. Based on the result set in your original problem, it appears that MySQL is retaining the id of the first duplicate record, in cases where a group has more than one member.
In the query
SELECT *
FROM `table`
WHERE id IN
(
SELECT id FROM `table` GROUP BY CONCAT(`name`,department)
)
you are also selecting a non-aggregate column in the subquery. It appears that MySQL actually decides which id value to be retained in the subquery based on the id value in the outer query. That is, for each id value in table, MySQL performs the subquery and then selectively chooses to retain a record in the group if two id values match.
You should avoid using a non-aggregate column in a query with GROUP BY, because it is a violation of the ANSI standard, and as you have seen here it can result in unexpected results. If you give us more information about what result set you want, we can give you a correct query which will avoid this problem.
I welcome anyone who has documentation to support these observations to either edit my answer or post a new one.
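One way to stay within the standard is to make the choice of row explicit with an aggregate such as MIN(id). The sketch below shows this on SQLite through Python's sqlite3 module as a stand-in for MySQL; the table name `staff` and its rows are invented, and keeping the lowest id per group is an assumption about which duplicate should survive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.execute(
    "CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT, department TEXT)"
)
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?)",
    [
        (1, "ann", "hr"),
        (2, "ann", "hr"),  # duplicate of id 1
        (3, "bob", "it"),
        (4, "ann", "it"),
        (5, "bob", "it"),  # duplicate of id 3
    ],
)

# MIN(id) is an aggregate, so each group yields exactly one well-defined id.
kept = conn.execute(
    """
    SELECT s.id, s.name, s.department
    FROM staff s
    JOIN (SELECT MIN(id) AS id
          FROM staff
          GROUP BY name, department) f
      ON f.id = s.id
    ORDER BY s.id
    """
).fetchall()
print(kept)
```

Grouping by the two columns directly also avoids the CONCAT, which can merge distinct pairs such as ('ab', 'c') and ('a', 'bc') into one group.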
You can JOIN the grouped ids with the table's ids to get the desired results.
Example:
SELECT t.* FROM so_q32175332 t
JOIN ( SELECT id FROM so_q32175332
GROUP BY CONCAT( name, department ) ) f
ON t.id = f.id
ORDER BY CONCAT( name, department );
Here the ORDER BY was added just to make the results directly comparable with those of SELECT * with GROUP BY.
Demo on SQL Fiddle: http://sqlfiddle.com/#!9/d715a/1

Can I maintain the output of INNER JOIN to be sorted based on the order of the left side table (the 1st table)?

In PostgreSQL - in the following query I perform an INNER JOIN on two tables -
the 1st table (patient_bvi_p) is SORTED. I extract the gene name (a simple string) from the "id4" column and then use this value to perform the INNER JOIN with the 2nd table (geneexpression17p).
My issue is that after performing the INNER JOIN the result of my query is all scrambled.
The rows are no longer sorted based on the left-hand table (patient_bvi_p), while I really need/want them to be.
Can someone please explain what behavior one should expect after performing an INNER JOIN? Shouldn't the output be sorted in the same way the left (first) table was sorted?
Is there a way to somehow maintain the original order? Or should I always assume that after an INNER JOIN the resulting output is unsorted (scrambled), and therefore perform an extra sorting step AFTER doing the INNER JOIN?
My motivation is basically to avoid an extra sorting step and to rely on the original order of my first table.
select
t1.* ,
bvi_d_exp,
bvi_r_exp,
bvi_exp.bvi_lr_rvd
into Patient_bvi_p_exp
from
(
select split_part(id4, '#', 3) genes, *
from patient_bvi_p
) t1
inner join (
select
genename,
bvi_d_exp,
bvi_r_exp,
bvi_lr_rvd
from geneexpression17p
) bvi_exp on lower(t1.genes) = lower(bvi_exp.genename)
The order of rows in a query output is undefined if there is no order by clause. Postgres will output in any way it sees fit. If you want the output to be ordered you must specify an order by. In other words, you should not rely on output order like you describe, it could change if it is not specified. That said, in your example:
select t1.* ,bvi_d_exp,bvi_r_exp,bvi_exp.bvi_lr_rvd
into Patient_bvi_p_exp
from (select split_part(id4, '#', 3)genes,* from patient_bvi_p)
t1 inner join (select genename,bvi_d_exp,bvi_r_exp,bvi_lr_rvd
from geneexpression17p) bvi_exp on lower(t1.genes)= lower(bvi_exp.genename);
I think you are saying that if you do this:
select * from Patient_bvi_p_exp;
You get random ordering. Yes, that is true. Again, don't rely on order. However, you could:
select t1.* ,bvi_d_exp,bvi_r_exp,bvi_exp.bvi_lr_rvd
into Patient_bvi_p_exp
from (select split_part(id4, '#', 3)genes,* from patient_bvi_p)
t1 inner join (select genename,bvi_d_exp,bvi_r_exp,bvi_lr_rvd
from geneexpression17p) bvi_exp on lower(t1.genes)= lower(bvi_exp.genename)
order by bvi_d;
And that will cause your table to be ordered by the bvi_d column (or whichever you want). So, a simple select on that table will probably return it in the correct order. Or, if you already ran your first query, you could:
create index whatever on Patient_bvi_p_exp(bvi_d);
cluster Patient_bvi_p_exp using whatever;
And this would physically reorder the table such that a simple select would return it in the order you desire.
I have to say again, you are safer doing:
select * from Patient_bvi_p_exp order by bvi_d;
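A small sketch of that safe form: carry a sort key on the left-hand table and order the joined result by it. It runs on SQLite through Python's sqlite3 module rather than PostgreSQL (an assumption made for self-containment), and the table names, the `pos` column, and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for PostgreSQL (assumption)
conn.execute("CREATE TABLE patient (pos INTEGER, gene TEXT)")
conn.execute("CREATE TABLE gene_expr (genename TEXT, level REAL)")
conn.executemany(
    "INSERT INTO patient VALUES (?, ?)",
    [(1, "TP53"), (2, "brca1"), (3, "EGFR")],  # pos records the intended order
)
conn.executemany(
    "INSERT INTO gene_expr VALUES (?, ?)",
    [("egfr", 0.5), ("tp53", 1.5), ("BRCA1", 2.5)],  # stored in a different order
)

# Case-insensitive join as in the question, with the order stated explicitly.
rows = conn.execute(
    """
    SELECT p.pos, p.gene, e.level
    FROM patient p
    INNER JOIN gene_expr e ON lower(p.gene) = lower(e.genename)
    ORDER BY p.pos
    """
).fetchall()
print(rows)
```

Without the ORDER BY, the join strategy the planner picks decides the row order, and that can change between runs or versions.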
the 1st table (patient_bvi_p) is SORTED
There is no "sorted" table in SQL. If you want a sorted result, use the ORDER BY clause.