I want to use the same subquery multiple times in a UNION. This subquery is time-consuming, and I suspect that repeating it several times will increase the total execution time.
For example
(SELECT * FROM (SELECT * FROM A INNER JOIN B ... AND SOME COMPLEX WHERE CONDITIONS) as T ORDER BY column1 DESC LIMIT 10)
UNION
(SELECT * FROM (SELECT * FROM A INNER JOIN B ... AND SOME COMPLEX WHERE CONDITIONS) as T ORDER BY column2 DESC LIMIT 10)
UNION
(SELECT * FROM (SELECT * FROM A INNER JOIN B ... AND SOME COMPLEX WHERE CONDITIONS) as T ORDER BY column3 DESC LIMIT 10)
Is the (SELECT * FROM A INNER JOIN B ... AND SOME COMPLEX WHERE CONDITIONS) subquery executed 3 times?
If MySQL is smart enough, the inner subquery will be executed only once and I don't need any optimization; if not, I have to optimize it some other way (such as a temporary table, which I would rather avoid).
Do I have to rewrite this query with different syntax? Any suggestions?
In practice I want to filter a huge set of records and show some of them in 3 sections, each section sorted by a different column.
Plan A:
A TEMPORARY TABLE cannot be referenced more than once. So, build a permanent table and DROP it when finished. (If you might have multiple connections doing the same thing, it will be a hassle to make sure you are not using the same table name.)
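A minimal sketch of Plan A (the elided join and conditions are placeholders from the question; pick a per-connection table name, e.g. with a session-id suffix, if several connections may run this at once):
CREATE TABLE t_work AS
    SELECT * FROM A INNER JOIN B ... AND SOME COMPLEX WHERE CONDITIONS;
(SELECT * FROM t_work ORDER BY column1 DESC LIMIT 10)
UNION
(SELECT * FROM t_work ORDER BY column2 DESC LIMIT 10)
UNION
(SELECT * FROM t_work ORDER BY column3 DESC LIMIT 10);
DROP TABLE t_work;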
Plan B:
With MySQL 8.0, you can do
WITH T AS ( SELECT ... )
SELECT ... FROM T ORDER BY col1
UNION ...
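Applied to the query in the question, the sketch would look something like this (MySQL 8.0+; note that whether the CTE is materialized once or re-evaluated is up to the optimizer):
WITH T AS (
    SELECT * FROM A INNER JOIN B ... AND SOME COMPLEX WHERE CONDITIONS
)
(SELECT * FROM T ORDER BY column1 DESC LIMIT 10)
UNION
(SELECT * FROM T ORDER BY column2 DESC LIMIT 10)
UNION
(SELECT * FROM T ORDER BY column3 DESC LIMIT 10);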
Plan C:
If it is possible to do this:
SELECT id FROM A
ORDER BY col1 LIMIT 10
You could use that as a 'derived' table inside
(SELECT * FROM A INNER JOIN B ... AND SOME COMPLEX WHERE CONDITIONS)
Something like
SELECT A.*, B.*
FROM ( SELECT id FROM A
       ORDER BY col1 LIMIT 10 ) AS x1
JOIN A USING(id)
JOIN B ... AND SOME COMPLEX WHERE CONDITIONS
Similarly for the other two SELECTs, then UNION them together.
Better yet, UNION together the 3 sets of ids, then JOIN to A and B once.
This may have the advantage of dealing with fewer rows.
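A sketch of that last variant, reusing the placeholder names from above:
SELECT A.*, B.*
FROM (
    (SELECT id FROM A ORDER BY col1 LIMIT 10)
    UNION
    (SELECT id FROM A ORDER BY col2 LIMIT 10)
    UNION
    (SELECT id FROM A ORDER BY col3 LIMIT 10)
) AS ids
JOIN A USING(id)
JOIN B ... AND SOME COMPLEX WHERE CONDITIONS
The derived table holds at most 30 ids, so the expensive join and conditions run against a small row set.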
Related
If I have the following two tables:
Table "a" with 2 columns: id (int) [Primary Index], column1 [Indexed]
Table "b" with 3 columns: id_table_a (int),condition1 (int),condition2 (int) [all columns as Primary Index]
I can run the following query to select rows from Table a where Table b condition1 is 1
SELECT a.id FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.id_table_a=a.id && condition1=1 LIMIT 1) ORDER BY a.column1 LIMIT 50
With a couple hundred million rows in both tables this query is very slow. If I do:
SELECT a.id FROM a INNER JOIN b ON a.id=b.id_table_a && b.condition1=1 ORDER BY a.column1 LIMIT 50
It is pretty much instant but if there are multiple matching rows in table b that match id_table_a then duplicates are returned. If I do a SELECT DISTINCT or GROUP BY a.id to remove duplicates the query becomes extremely slow.
Here is an SQLFiddle showing the example queries: http://sqlfiddle.com/#!9/35eb9e/10
Is there a way to make a join without duplicates fast in this case?
*Edited to show that INNER instead of LEFT join didn't make much of a difference
*Edited to show moving condition to join did not make much of a difference
*Edited to add LIMIT
*Edited to add ORDER BY
You can try with an inner join and DISTINCT:
SELECT distinct a.id
FROM a INNER JOIN b ON a.id=b.id_table_a AND b.condition1=1
But when using DISTINCT with SELECT *, be careful not to include a unique column such as id in the DISTINCT, which would return the wrong (non-deduplicated) result. In that case, list the columns explicitly:
SELECT distinct col1, col2, col3 ....
FROM a INNER JOIN b ON a.id=b.id_table_a AND b.condition1=1
You could also add a composite index that also includes condition1, e.g. KEY(id_table_a, condition1).
If you can, you could also run
ANALYZE TABLE table_name;
on both tables.
Another technique is to try reversing the lead table:
SELECT distinct a.id
FROM b INNER JOIN a ON a.id=b.id_table_a AND b.condition1=1
Use the most selective table to lead the query.
With this, the index usage seems different: http://sqlfiddle.com/#!9/35eb9e/15 (the last query adds a Using where).
# USING DISTINCT TO REMOVE DUPLICATES without col and order
EXPLAIN
SELECT DISTINCT a.id
FROM a
INNER JOIN b ON a.id=b.id_table_a AND b.condition1=1
;
It looks like I found the answer.
SELECT a.id FROM a
INNER JOIN b ON
    b.id_table_a=a.id &&
    b.condition1=1 &&
    b.condition2=(SELECT b.condition2 FROM b WHERE b.id_table_a=a.id && b.condition1=1 LIMIT 1)
ORDER BY a.column1
LIMIT 5;
I don't know if there is a flaw in this or not, please let me know if so. If anyone has a way to compress this somehow I will gladly accept your answer.
SELECT id FROM a INNER JOIN b ON a.id=b.id_table_a AND b.condition1=1
Move the condition into the ON clause of the join; that way the index on table b can be used for filtering. Also, use INNER JOIN over LEFT JOIN.
Then you should have fewer results that have to be grouped.
Wrap the fast version in a query that handles de-duping and limit:
SELECT DISTINCT * FROM (
    SELECT a.id, a.column1
    FROM a
    JOIN b ON a.id = b.id_table_a && b.condition1 = 1
) x
ORDER BY column1
LIMIT 50
We know the inner query is fast. The de-duping and ordering has to happen somewhere. This way it happens on the smallest rowset possible.
See SQLFiddle.
Option 2:
Try the following:
Create indexes as follows:
create index a_id_column1 on a(id, column1)
create index b_id_table_a_condition1 on b(id_table_a, condition1)
These are covering indexes - ones that contain all the columns you need for the query, which in turn means that index-only access to data can achieve the result.
Then try this:
SELECT * FROM (
    SELECT a.id, MIN(a.column1) AS column1
    FROM a
    JOIN b ON a.id = b.id_table_a
          AND b.condition1 = 1
    GROUP BY a.id
) x
ORDER BY column1
LIMIT 50
Use your fast query in a subselect and remove the duplicates in the outer select:
SELECT DISTINCT sub.id
FROM (
    SELECT a.id
    FROM a
    INNER JOIN b ON a.id=b.id_table_a && b.condition1=1
    WHERE b.id_table_a > :offset
    ORDER BY a.column1
    LIMIT 50
) sub
Because of the removal of duplicates you might get fewer than 50 rows. Just repeat the query until you have enough rows. Start with :offset = 0 and use the last id from the previous result as :offset in each following query.
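For example, if the last id returned by the first call was 1230 (a made-up value), the next call would be:
SELECT DISTINCT sub.id
FROM (
    SELECT a.id
    FROM a
    INNER JOIN b ON a.id=b.id_table_a && b.condition1=1
    WHERE b.id_table_a > 1230   -- last id from the previous result
    ORDER BY a.column1
    LIMIT 50
) sub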
If you know your statistics, you can also use two limits. The limit in the inner query should be high enough to return 50 distinct rows with a probability which is high enough for you.
SELECT DISTINCT sub.id
FROM (
    SELECT a.id
    FROM a
    INNER JOIN b ON a.id=b.id_table_a && b.condition1=1
    ORDER BY a.column1
    LIMIT 1000
) sub
LIMIT 50
For example: if you have an average of 10 duplicates per id, LIMIT 1000 in the inner query will return an average of 100 distinct rows. It's very unlikely that you would get fewer than 50 rows.
If the condition2 column is a boolean, you know that you can have a maximum of two duplicates. In this case LIMIT 100 in the inner query would be enough.
In a join operation between two subqueries (or a table and a subquery), is it preferable to specify the LIMIT clause in the inner query rather than on the outer query (since the order determines how many rows the DBMS has to iterate over to check the WHERE clause)? Like:
SELECT *
FROM (
    SELECT id
    FROM Table1
    WHERE score >= 100
    LIMIT 10
) AS A
NATURAL JOIN Table2
would be better than
SELECT *
FROM (
    SELECT id
    FROM Table1
    WHERE score >= 100
) AS A
NATURAL JOIN Table2
LIMIT 10
My thinking is that in the last query, the DBMS first has to iterate (via a full table or index scan) over ALL rows in Table1 where score >= 100 that can be mapped to Table2 on their common columns (which could be any number of rows), and only after that will it truncate to just 10 rows, whereas in the first query it will only scan until it has found 10 rows from Table1 that satisfy the WHERE clause and can be mapped to Table2, then stop...
The 2 partial statements are not equivalent. When using LIMIT, order matters: if you place the limit on Table1, you might never see the rows you would otherwise have seen with the limit placed on the whole dataset. Given that disclaimer, it seems like using the limit and then joining would be more efficient, but the rule of thumb is that you should always measure.
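If you do want the inner LIMIT to select a well-defined set of rows, give the inner query an ORDER BY, e.g. (a sketch, assuming you want the highest scores and score is indexed):
SELECT *
FROM (
    SELECT id
    FROM Table1
    WHERE score >= 100
    ORDER BY score DESC
    LIMIT 10
) AS A
NATURAL JOIN Table2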
Also consider that instead of joining the SELECT as a derived table, for which MySQL will have to build an internal temporary table, you could join the table itself, i.e.:
SELECT t0.col0, t1.col1
FROM
Table0 t0
JOIN Table1 t1 ON (t0.col0 = t1.col0 AND t1.score >= 100)
which might be even more efficient if you have good indexes and end up using them. But again, you should measure.
I have a series of tables that contain data of similar format. I.e. a UNION would work.
Conceptually you can think of it as 1 table partitioned into multiple tables.
I want to get the data from all of these tables sorted.
The problem I have is that there is too much data to display to the user all at once, so I need to display it in portions, i.e. pages.
And, as already said, the data must be displayed sorted.
So if I do something like:
SELECT * FROM TABLE_1
UNION
SELECT * FROM TABLE_2
UNION
....
SELECT * FROM TABLE_N
ORDER BY COL
LIMIT OFFSET, RECORDS;
I would constantly be redoing the whole UNION and ORDER BY just to get, e.g., the corresponding 50 records of the requested page on each request.
So how would I most efficiently handle this?
My first attempt would be UNION'ing just a small number of records from each table:
( SELECT * FROM table_1 ORDER BY col LIMIT #offset + #records )
UNION
...
( SELECT * FROM table_N ORDER BY col LIMIT #offset + #records )
ORDER BY col LIMIT #offset, #records
(Each inner query has to fetch the first #offset + #records rows rather than skip #offset of them, because the entire requested page might come from a single table.)
If the above proves insufficient, I would build a manual index table (based on David Starkey's clever suggestion).
CREATE TABLE index_table (
table_id INT,
item_id INT,
col DATETIME,
INDEX (col, table_id, item_id)
);
Then populate index_table with a method of your liking (cron job, triggers on tables table_n, ...). Your SELECT statement would then look like this:
SELECT *
FROM ( SELECT * FROM index_table ORDER BY col LIMIT #offset, #records ) AS idx
LEFT JOIN table_1 ON (idx.table_id = 1 AND idx.item_id = table_1.id)
...
LEFT JOIN table_n ON (idx.table_id = n AND idx.item_id = table_n.id)
However, I am not sure of how such a query would perform with so many LEFT JOIN's. It really depends on how many tables table_n there are.
Finally, I would consider merging all tables into one single table.
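A sketch of that single-table option (column names are placeholders; a discriminator column remembers which source table each row came from):
CREATE TABLE merged (
    source_table INT,
    item_id INT,
    col DATETIME,
    PRIMARY KEY (source_table, item_id),
    INDEX (col)
);
INSERT INTO merged (source_table, item_id, col)
SELECT 1, id, col FROM table_1
UNION ALL
SELECT 2, id, col FROM table_2;
-- ... and so on for the remaining tables
SELECT * FROM merged ORDER BY col LIMIT #offset, #records;
Paging then becomes a single index range scan on (col) instead of a UNION on every request.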
Assuming table1 and table2 both have a large number of rows (i.e. several hundred thousand), is the following an inefficient query?
Edit: Order by field added.
SELECT * FROM (
    SELECT title, updated FROM table1
    UNION
    SELECT title, updated FROM table2
) AS query
ORDER BY updated DESC
LIMIT 25
If you absolutely need distinct results, another possibility is to use union all and a group by clause instead:
SELECT title FROM (
    SELECT title FROM table1 GROUP BY title
    UNION ALL
    SELECT title FROM table2 GROUP BY title
) AS query
GROUP BY title
LIMIT 25;
Testing this without the limit clause on an indexed ID column from two tables with ~920K rows each in a test database (at $work) resulted in a bit over a second with the query above and about 17 seconds via a union.
This should be even faster, but then I see no ORDER BY, so which 25 records do you actually want?
SELECT * FROM (
    SELECT title FROM table1 LIMIT 25
    UNION
    SELECT title FROM table2 LIMIT 25
) AS query
LIMIT 25
UNION must make an extra pass to fetch the distinct records, so you should use UNION ALL.
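For example, if duplicates between the two tables are impossible (or acceptable), the query from the question keeps its shape and simply skips the dedup pass:
SELECT * FROM (
    SELECT title, updated FROM table1
    UNION ALL
    SELECT title, updated FROM table2
) AS query
ORDER BY updated DESC
LIMIT 25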
Yes, use ORDER BY and LIMIT in the inner queries:
SELECT * FROM (
    (SELECT title FROM table1 ORDER BY title ASC LIMIT C)
    UNION
    (SELECT title FROM table2 ORDER BY title ASC LIMIT C)
) AS query
ORDER BY title ASC
LIMIT 25
This way each branch only goes through C rows instead of N (hundreds of thousands). The ORDER BY is necessary and should be on an indexed column.
C is a heuristic constant that should be tuned according to the domain. If you only expect a few duplicates, C=50-100 is probably ok.
You can also find out this for yourself by using EXPLAIN.
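For example (a sketch with C = 100; compare the rows column of each branch against the un-limited version):
EXPLAIN
SELECT * FROM (
    (SELECT title FROM table1 ORDER BY title ASC LIMIT 100)
    UNION
    (SELECT title FROM table2 ORDER BY title ASC LIMIT 100)
) AS query
ORDER BY title ASC
LIMIT 25;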
I had 3 tables which are not identical to each other. For one of my requirements I had to copy all of these tables' records to another table.
That part is okay. My problem is that the records I inserted are now grouped in order, like:
first 100 records from table1
second 100 records from table2
third 100 records from table3
What I wanted to do is change/mix the record positions, so that if I select the first 100 records there are records from all three tables.
Selecting data with ORDER BY RAND() is not what I want; I just need to select the data and display it.
Is there any way I can solve this? Thanks.
A great post handling several cases, from simple, to gaps, to non-uniform with gaps.
http://jan.kneschke.de/projects/mysql/order-by-rand/
For the most general case, here's how you do it:
SELECT name
FROM random AS r1
JOIN (SELECT (RAND() * (SELECT MAX(id) FROM random)) AS id) AS r2
WHERE r1.id >= r2.id
ORDER BY r1.id ASC
LIMIT 1
This supposes that the distribution of ids is uniform, and that there can be gaps in the id list. See the article for more advanced examples.
If you don't want to query later on with RAND(), you could create the table by inserting from a UNION SELECT ordered by RAND() in the first place:
INSERT INTO merged (a, b)
SELECT a, b FROM (
    SELECT a, b, RAND() AS r FROM t1
    UNION ALL
    SELECT a, b, RAND() AS r FROM t2
) AS t ORDER BY r
However, also consider this post I just came across: INSERT INTO SELECT strange order using UNION, perhaps someone can comment.