Imagine I have a query like this:
SELECT
    (SELECT a FROM table_10 LIMIT 1) AS sb1,
    (SELECT a FROM table_11 WHERE a = sb1 LIMIT 1) AS sb2,
    (SELECT a FROM table_12 WHERE a = sb2 LIMIT 1) AS sb3
FROM my_table WHERE 1
As far as I have found out, the values of sb1, sb2, and sb3 are not kept in memory: when the second sub-query refers to sb1, it re-runs the first sub-query, and when the third sub-query refers to sb2, the second sub-query re-runs, so the first one ends up running many times over.
My reason for believing this is that when I hard-code the results instead of referring to sb1 and sb2, I see a huge difference in query time (like 30 seconds!).
My first question: Am I right?
My second question: How can I force MySQL to keep the values of sb1 and sb2 and not re-run those sub-queries each time?
My third question: If I'm not right, then what is causing this difference in time and performance?
How can I force MySQL to keep the values of sb1 and sb2 and not re-run those sub-queries each time?
Convert your correlated sub-queries to JOINs. Formally (ignoring ambiguities) it would be:
SELECT
    table_10.a AS sb1,
    table_11.a AS sb2,
    table_12.a AS sb3
FROM my_table
CROSS JOIN table_10
INNER JOIN table_11 ON table_11.a = table_10.a
INNER JOIN table_12 ON table_12.a = table_11.a
WHERE 1
LIMIT 1
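If you'd rather keep the sub-query style, a derived table also works: MySQL materializes a derived table only once, so each value is computed a single time. A minimal sketch chaining the question's tables (untested; it drops my_table since the sub-queries never reference it):

SELECT sb1,
       sb2,
       (SELECT a FROM table_12 WHERE a = sb2 LIMIT 1) AS sb3
FROM (
    -- this derived table is materialized once, so table_11 is hit once
    SELECT sb1,
           (SELECT a FROM table_11 WHERE a = sb1 LIMIT 1) AS sb2
    FROM (SELECT a AS sb1 FROM table_10 LIMIT 1) AS t1  -- table_10 hit once
) AS t2;

Each level is materialized once, so table_10 and table_11 are each read a single time no matter how many rows my_table has.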
PS: LIMIT without ORDER BY makes no sense, both in the original code and in the one provided.
PPS: Specify a table alias for EACH column name.
I have read through tons of similar questions and none of them answers what is wrong with mine.
I want to select the entire row that includes the maximum value of one of the columns for each group.
SELECT * FROM (
SELECT t1.* FROM `t1` JOIN `t2` ON t2.id=t1.raceId ORDER BY t1.points DESC
) AS new GROUP BY new.athleteId ORDER BY new.points DESC
This works, giving me a single row for each athlete, but the row it shows is just the earliest row in the DB, not the row with the maximum points.
The sub query alone shows all the rows in the correct order, but when I try to group them, it still takes the earliest row and ignores the ordering.
I can retrieve the maximum points for each grouping, but the rest of the row info still comes from the earliest entry.
The GROUP BY clause is meant to be used with aggregate functions.
What is it that you are trying to achieve with the GROUP BY?
Maybe one way to achieve what you're after...
As a general rule of thumb, when you use GROUP BY it's wise to define which aggregate functions to use. MySQL allows you to GROUP BY without aggregate functions defined, but I've found this very confusing without being specific about what I want to aggregate on. Maybe it's because of my background in SQL Server and Oracle, which do NOT allow you to use GROUP BY this way...
Essentially: get the max points for each athlete, then join back to your entire data set to limit it to that athlete and points. You may need to do it by race if you want athlete by race as well; I'm unsure whether you want max athlete points per race, but based on the GROUP BY/ORDER BY I'm guessing not.
SELECT t1.*, t2.*
FROM (SELECT t1.athleteId, MAX(t1.points) AS points
      FROM `t1`
      INNER JOIN `t2` ON t2.id = t1.raceId
      GROUP BY t1.athleteId) new
INNER JOIN `t1` ON t1.athleteId = new.athleteId
               AND t1.points = new.points
INNER JOIN `t2` ON t2.id = t1.raceId
ORDER BY new.points DESC
Another way, depending on your version of MySQL, would be to use analytic functions along with aggregate functions... but without a version number, I'll not go into detail.
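For what it's worth, on MySQL 8.0 and later the analytic-function version might look like the sketch below (untested; the athleteId/points/raceId names are taken from the question's queries):

SELECT *
FROM (
    -- rank each athlete's rows by points, highest first
    SELECT t1.*,
           ROW_NUMBER() OVER (PARTITION BY t1.athleteId
                              ORDER BY t1.points DESC) AS rn
    FROM `t1`
    INNER JOIN `t2` ON t2.id = t1.raceId
) ranked
WHERE rn = 1          -- keep only each athlete's top row
ORDER BY points DESC;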
I can't figure out why the 'order by' clause in the second query below causes it to take over a minute, while the first one returns results instantly. Is there a better way to do this 'order by'?
Fast:
select c.id, max(date(a.sent)) as sent,
if(c.id in (select id from bin where (num=1 or num=2)),1,0) as done
from test c, test2 a
where c.id=a.id
group by c.id
limit 1;
Slow:
select c.id, max(date(a.sent)) as sent,
if(c.id in (select id from bin where (num=1 or num=2)),1,0) as done
from test c, test2 a
where c.id=a.id
group by c.id
order by done, sent
limit 1;
It's because the "columns" in the order by clause are not real columns, but aliases for calculations elsewhere in the query. Thus, they aren't indexed, and the server has to order them on the fly. Using a join for the calculation of done, rather than a subquery, would likely speed this up a lot.
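A sketch of that join rewrite, using the question's tables (untested; the DISTINCT derived table stands in for the IN subquery):

select c.id,
       max(date(a.sent)) as sent,
       if(b.id is null, 0, 1) as done
from test c
join test2 a on c.id = a.id
-- one row per id that appears in bin with num 1 or 2
left join (select distinct id from bin where num in (1, 2)) b
       on b.id = c.id
group by c.id, b.id
order by done, sent
limit 1;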
If you were bringing back all records, the sorting should not take much time, even though they are computed / non-indexed fields. However, you are using "Limit 1". This changes the approach of the optimizer.
In the first case, you are ordering by an ID. Since you have "limit 1" and the ID probably has an index, the optimizer can go ID by ID, and when it gets one record that matches the WHERE clause, it can return.
However, in the second query, even though you only want 1 record, the optimizer does not know which one that will be unless it computes the entire set (as if you did not have "limit 1") and then returns only the first one.
Take off the "LIMIT 1" and compare the two queries. If the difference remains, it may be a different problem.
It is difficult to say what would work best with your volumes. Try this query:
select id, max(date(sent)) as sent, 0 As done
from test2
where not exists (select 1 from bin where bin.id=test2.id and num in (1,2))
group by id
union all
select id, max(date(sent)) as sent, 1 As done
from test2
where exists (select 1 from bin where bin.id=test2.id and num in (1,2))
group by id
order by done, sent
limit 1
SQL Fiddle is here if you want to tweak it.
I left out the test table, because you were not bringing back any field except ID, which is already on test2. If you need other fields from test, you will have to tweak it.
If I have a query like
SELECT (MAX(b.A)/a.A) AS c
FROM a
INNER JOIN b
ON a.b_id = b.id
GROUP BY a.id
does MySQL evaluate the value for "MAX(b.A)" for every row or only once?
I'm just interested in whether there is room for performance improvement or not.
Thanks!
UPDATE
OK, let's move on to a real-world example: I want to calculate the proportional value of a user's likes compared to the maximum user likes.
The query that reads only the max value of users.likes (which is indexed) takes 0.0003s:
SELECT MAX(likes)
FROM users
So I now know the value of max-user-likes; let's say it's 10000. Then I could query like this, which takes 0.0007s:
SELECT (users.likes/10000) AS weight
FROM posts
INNER JOIN users ON posts.author_id = users.id
So one would expect both queries together to take something like 0.0003s + 0.0007s, but it takes 0.3s:
SELECT (users.likes/(SELECT MAX(likes) FROM users)) AS weight
FROM posts
INNER JOIN users ON posts.author_id = users.id
So something seems still wrong with my database - any suggestions?
Without a GROUP BY clause, the result would only have one row, and you can't know which row the value of a.A comes from; MAX(b.A) would be evaluated only once.
When you have a GROUP BY clause, MAX(b.A) will be evaluated for every group.
In general, the expression within an aggregate function is evaluated and checked for NULL for each row. There could certainly be an optimization for MIN and MAX when walking through an index, but I doubt that.
BTW, you can easily check this by executing MAX(id) on a large table. You will see that the execution time is the same as for COUNT(id) (and might be much more than COUNT(*), depending on the engine).
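As for the updated example: a common rewrite is to compute the maximum once in a derived table and join it in, so it cannot be re-evaluated per row. A sketch using the question's table names:

SELECT users.likes / m.max_likes AS weight
FROM posts
INNER JOIN users ON posts.author_id = users.id
-- one-row derived table; MAX is computed a single time
CROSS JOIN (SELECT MAX(likes) AS max_likes FROM users) m;

You can also check whether the index shortcut applies: EXPLAIN SELECT MAX(likes) FROM users typically reports "Select tables optimized away" when MySQL can read the value straight from the index.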
I sort the rows on date. If I want to select every row that has a unique value in the last column, can I do this with SQL?
So I would like to select the first and second rows, not the third, but the fourth again, and so on.
What you want are not unique rows, but rather one row per group. This can be done by taking MIN(pk_artikel_id) with GROUP BY fk_artikel_bron. This method uses an IN subquery to get the first pk_artikel_id for each unique fk_artikel_bron and then uses that to fetch the remaining columns in the outer query.
SELECT * FROM tbl
WHERE pk_artikel_id IN
(SELECT MIN(pk_artikel_id) AS id FROM tbl GROUP BY fk_artikel_bron)
Although MySQL would permit you to add the rest of the columns to the SELECT list initially, avoiding the IN subquery, that isn't really portable to other RDBMSs. This method is a little more generic.
It can also be done with a JOIN against the subquery, which may or may not be faster. Hard to say without benchmarking it.
SELECT *
FROM tbl
JOIN (
SELECT
fk_artikel_bron,
MIN(pk_artikel_id) AS id
FROM tbl
GROUP BY fk_artikel_bron) mins ON tbl.pk_artikel_id = mins.id
This is similar to Michael's answer, but does it with a self-join instead of a subquery. Try it out to see how it performs:
SELECT * from tbl t1
LEFT JOIN tbl t2
ON t2.fk_artikel_bron = t1.fk_artikel_bron
AND t2.pk_artikel_id < t1.pk_artikel_id
WHERE t2.pk_artikel_id IS NULL
If you have the right indexes, this type of join often outperforms subqueries (since derived tables don't use indexes).
This non-standard, MySQL-only trick will select the first row encountered for each value of fk_artikel_bron.
select *
...
group by fk_artikel_bron
Like it or not, this query produces the output asked for.
Edited
I seem to be getting hammered here, so here's the disclaimer:
This only works for MySQL 5+.
Although the MySQL documentation says the row returned using this technique is not predictable (i.e. you could get any row as the "first" encountered), in every case I've seen you get the first row in the order selected. So, to get a row that is predictable in practice (though not guaranteed in future releases), select from an ordered result:
select * from (
select *
...
order by pk_artikel_id) x
group by fk_artikel_bron
This is a simplified version of a relatively complex problem that my colleagues and I can't quite get our heads around.
Consider two tables, table_a and table_b. In our CMS table_a holds metadata for all the data stored in the database, and table_b has some more specific information, so for simplicity's sake, a title and date column.
At the moment our query looks like:
SELECT *
FROM `table_a` LEFT OUTER JOIN `table_b` ON (table_a.id = table_b.id)
WHERE table_a.col = 'value'
ORDER BY table_b.date ASC
LIMIT 0,20
This degrades badly when table_a has a large number of rows. If the JOIN is changed to a RIGHT OUTER JOIN (which triggers MySQL to use the INDEX on table_b.date), the query is far quicker, but it doesn't produce the same results (because rows where table_b.date has no value are ignored).
This becomes an issue in our CMS because if the user sorts on the date column, any rows that don't have a date set yet disappear from the interface, creating a confusing UI experience and making it difficult to add dates for the rows that are missing them.
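For reference, the faster-but-incorrect variant is simply the same query with the join direction flipped:

SELECT *
FROM `table_a` RIGHT OUTER JOIN `table_b` ON (table_a.id = table_b.id)
WHERE table_a.col = 'value'
ORDER BY table_b.date ASC
LIMIT 0,20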
Is there a solution that will:
1. Use table_b.date's INDEX so that the query will scale better
2. Somehow retain those rows in table_b that don't have a date set, so that a user can enter the data
I'm going to second ArtoAle's comment: since the ORDER BY applies to a NULL value in the outer join for rows missing from table_b, those rows will be out of order anyway.
The simulated outer join is the ugly part, so let's look at that first. MySQL doesn't have EXCEPT, so you need to write the query in terms of EXISTS.
SELECT table_a.col1, table_a.col2, table_a.col3, ..., NULL AS table_b_col1, NULL AS ...
FROM
    table_a
WHERE
    NOT EXISTS (SELECT 1 FROM table_b WHERE table_b.id = table_a.id);
This should be UNION ALLed with the original query as an inner join. The UNION ALL is needed to preserve the original order.
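Put together, the combined statement might look like this sketch (column lists abbreviated with *, and the WHERE filter repeated in both branches):

SELECT table_a.*, table_b.date AS b_date
FROM table_a
INNER JOIN table_b ON table_a.id = table_b.id
WHERE table_a.col = 'value'

UNION ALL

-- rows of table_a with no table_b row at all
SELECT table_a.*, NULL AS b_date
FROM table_a
WHERE table_a.col = 'value'
  AND NOT EXISTS (SELECT 1 FROM table_b WHERE table_b.id = table_a.id)

ORDER BY b_date ASC
LIMIT 0, 20;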
This sort of query is probably going to be dog-slow no matter what you do, because there won't be an index that readily supports a "foreign key not present" sort of query. This basically boils down to an index scan on table_a.id with a lookup (or maybe a parallel scan) for the corresponding row in table_b.id.
So we ended up implementing a different solution; while the results were not as good as using an INDEX, it still provided a nice speed boost of around 25%.
We removed the JOIN and instead used an ORDER BY subquery:
SELECT *
FROM `table_a`
WHERE table_a.col = 'value'
ORDER BY (
SELECT date
FROM table_b
WHERE id = table_a.id
) ASC
LIMIT 0,20