Any way to make UNION ALL run faster? - mysql

I have a lot of tables with exactly the same structure (TableA, TableB, TableC, TableD, etc.) that I want to create views from.
Running select * from TableA takes 20ms and running select * from TableB takes 20ms, but running
(select * from TableA) union all (select * from TableB) takes over 20 minutes.
The tables have exactly the same columns. Are there any settings in my.cnf that I need to change, or is there a way to create a view that would run faster? All tables have between 1.5m and about 10m rows.
Results of EXPLAIN:
select_type   table       type  rows      Extra
PRIMARY       TableA      ALL   28808685
UNION         TableB      ALL   15316215
UNION RESULT  <union1,2>  ALL             Using temporary
Table structure:
10 varchar(20)'s, 5 unsigned INTs.

My guess is that select * from TableA does not take 20 ms. It takes 20 ms to start returning results.
Although I am going to answer your question, you should revisit your data structure. Having multiple tables with the same layout is usually a really bad idea. Instead, you should have a single table with all the rows.
But, you don't seem to have that.
Try running the union all without parentheses:
select * from TableA union all
select * from TableB;
MySQL has a habit of materializing subqueries. I'm not sure if it does this with union all subqueries, but given your description of the problem, that seems likely.

Related

Which of the two queries do I use best for performance?

I have two queries that do the same job.
query 1 :
SELECT * FROM table1 where id = 1
UNION ALL
SELECT * FROM table2 where id = 5
UNION ALL
SELECT * FROM table1 where id = 70
UNION ALL
SELECT * FROM table2 where id = 3
UNION ALL
SELECT * FROM table1 where id = 90
and query 2 :
SELECT * FROM table1 where id IN (1,70,90)
UNION ALL
SELECT * FROM table2 where id IN (5,3)
Which of these two queries is faster?
If your answer is the second query:
I've used query 1 in many different places in the project. Is the difference so large that I should replace it everywhere with the second query?
The second version is more concise, and should be faster, because it only requires actually executing two queries, as opposed to the first version, which does a separate query for each id value.
Assuming id is the primary key in both tables, MySQL might also be able to use the clustered index for faster lookup of matching records.
What are the typical counts? Total of 5 rows? 2 tables? I would predict the performance difference to be a factor of about rows/tables in favor of the 2nd (shorter) formulation. In experimenting, I got about 2x.
So, if you have 100 rows from 2 tables, the second formulation will be significantly faster; fast enough to be worth the effort.
Why?
For such simple queries, parsing and optimizing dominates the time.
For newer versions of MySQL, both queries will touch the same number of rows.
For MySQL 5.7.3 and later, no temp table will be needed for either UNION ALL.
Does it matter that the output rows are likely to be in a different order?
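As a rough check that the two formulations really return the same rows, here is a minimal sketch using Python's built-in sqlite3 module. SQLite only stands in for MySQL here, and the val column and the sample data are made up for illustration; it says nothing about the timing difference on a real MySQL server.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the val column and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, val TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, val TEXT);
""")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [(i, f"t1-{i}") for i in range(1, 101)])
conn.executemany("INSERT INTO table2 VALUES (?, ?)", [(i, f"t2-{i}") for i in range(1, 101)])

# Query 1: one SELECT per id value.
query1 = """
    SELECT * FROM table1 WHERE id = 1
    UNION ALL
    SELECT * FROM table2 WHERE id = 5
    UNION ALL
    SELECT * FROM table1 WHERE id = 70
    UNION ALL
    SELECT * FROM table2 WHERE id = 3
    UNION ALL
    SELECT * FROM table1 WHERE id = 90
"""
# Query 2: one SELECT per table, using IN.
query2 = """
    SELECT * FROM table1 WHERE id IN (1, 70, 90)
    UNION ALL
    SELECT * FROM table2 WHERE id IN (5, 3)
"""
rows1 = conn.execute(query1).fetchall()
rows2 = conn.execute(query2).fetchall()

# The row order is not guaranteed to match, so compare the sorted results.
same_rows = sorted(rows1) == sorted(rows2)
print(same_rows)
```

If the calling code depends on the order in which query 1 returns its rows, add an explicit ORDER BY before switching to the shorter form.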

SQL UNION ALL to eliminate duplicates

I found this sample interview question and answer posted on Toptal, reproduced here. But I don't really understand the code. How can a UNION ALL turn into a UNION (distinct) like that? Also, why is this code faster?
QUESTION
Write a SQL query using UNION ALL (not UNION) that uses the WHERE clause to eliminate duplicates. Why might you want to do this?
You can avoid duplicates using UNION ALL and still run much faster than UNION DISTINCT (which is actually the same as UNION) by running a query like this:
ANSWER
SELECT * FROM mytable WHERE a=X UNION ALL SELECT * FROM mytable WHERE b=Y AND a!=X
The key is the AND a!=X part. This gives you the benefits of the UNION (a.k.a., UNION DISTINCT) command, while avoiding much of its performance hit.
But in the example, the first query has a condition on column a, whereas the second query has a condition on column b. This probably came from a query that's hard to optimize:
SELECT * FROM mytable WHERE a=X OR b=Y
This query is hard to optimize with simple B-tree indexing. Does the engine search an index on column a? Or on column b? Either way, searching the other term requires a table-scan.
Hence the trick of using UNION to separate into two queries for one term each. Each subquery can use the best index for each search term. Then combine the results using UNION.
But the two subsets may overlap, because some rows where b=Y may also have a=X in which case such rows occur in both subsets. Therefore you have to do duplicate elimination, or else see some rows twice in the final result.
SELECT * FROM mytable WHERE a=X
UNION DISTINCT
SELECT * FROM mytable WHERE b=Y
UNION DISTINCT is expensive because typical implementations sort the rows to find duplicates, just as they do for SELECT DISTINCT ....
It also feels like even more "wasted" work if the two subsets of rows you are unioning have a lot of rows in common. That's a lot of rows to eliminate.
But there's no need to eliminate duplicates if you can guarantee that the two sets of rows are already distinct. That is, if you guarantee there is no overlap. If you can rely on that, then it would always be a no-op to eliminate duplicates, and therefore the query can skip that step, and therefore skip the costly sorting.
If you change the queries so that they are guaranteed to select non-overlapping subsets of rows, that's a win.
SELECT * FROM mytable WHERE a=X
UNION ALL
SELECT * FROM mytable WHERE b=Y AND a!=X
These two sets are guaranteed to have no overlap. If the first set has rows where a=X and the second set has rows where a!=X then there can be no row that is in both sets.
The second query therefore only catches some of the rows where b=Y, but any row where a=X AND b=Y is already included in the first set.
So the query achieves an optimized search for two OR terms, without producing duplicates, and requiring no UNION DISTINCT operation.
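The argument above can be checked on a toy table. The following is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL; the id column, the sample rows, and the values chosen for X and Y are assumptions made for illustration. It also shows one caveat worth knowing: if column a is nullable, a != X evaluates to NULL for rows where a IS NULL, so those rows drop out of the second branch unless the condition is made NULL-aware.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the id column and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO mytable (id, a, b) VALUES (?, ?, ?)",
    [
        (1, 1, 0),     # matches a=X only
        (2, 0, 2),     # matches b=Y only
        (3, 1, 2),     # matches both: the potential duplicate
        (4, 0, 0),     # matches neither
        (5, None, 2),  # matches b=Y, but a IS NULL
    ],
)
X, Y = 1, 2

def ids(sql, params):
    """Run a query and return the sorted id values (first column)."""
    return sorted(row[0] for row in conn.execute(sql, params))

or_ids = ids("SELECT * FROM mytable WHERE a = ? OR b = ?", (X, Y))
union_ids = ids(
    "SELECT * FROM mytable WHERE a = ? UNION SELECT * FROM mytable WHERE b = ?", (X, Y)
)
# Plain UNION ALL returns row 3 twice.
naive_ids = ids(
    "SELECT * FROM mytable WHERE a = ? UNION ALL SELECT * FROM mytable WHERE b = ?", (X, Y)
)
# The trick from the answer: no duplicates, but row 5 (a IS NULL) is lost.
trick_ids = ids(
    "SELECT * FROM mytable WHERE a = ? "
    "UNION ALL SELECT * FROM mytable WHERE b = ? AND a != ?",
    (X, Y, X),
)
# NULL-aware variant: keeps row 5 as well.
null_safe_ids = ids(
    "SELECT * FROM mytable WHERE a = ? "
    "UNION ALL SELECT * FROM mytable WHERE b = ? AND (a != ? OR a IS NULL)",
    (X, Y, X),
)
print(or_ids, union_ids, naive_ids, trick_ids, null_safe_ids)
```

If column a is declared NOT NULL, the plain a != X form is enough and the extra OR a IS NULL is unnecessary.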
The simplest way is like this, especially if you have many columns:
SELECT *
INTO table2
FROM table1
UNION
SELECT *
FROM table1
ORDER BY column1
I guess this is right (Oracle):
select distinct * from (
select * from test_a
union all
select * from test_b
);
The question is only correct if the table has a unique identifier, i.e. a primary key. Otherwise each select can return many identical rows.
To understand why it can be faster, let's look at how a database executes UNION ALL and UNION.
The first simply concatenates the results of two independent queries. These queries can be processed in parallel and their results sent to the client one after another.
The second is concatenation plus duplicate elimination. To deduplicate the records from the 2 queries, the db needs to hold all of them in memory or, if there is not enough memory, store them in a temporary table and then select the unique ones. This is where the performance degradation can come from. DBs are pretty smart and deduplication algorithms are well developed, but for large result sets it can still be a problem.
UNION ALL plus an additional WHERE condition can be faster if an index is used while filtering.
So, that is the performance magic.
I guess this will work:
select col1 From (
select row_number() over (partition by col1 order by col1) as b, col1
from (
select col1 From u1
union all
select col1 From u2 ) a
) x
where x.b =1
This will also do the same trick:
select * from (
select * from table1
union all
select * from table2
) a group by
columns
having count(*) >= 1
or
select * from table1
union all
select * from table2 b
where not exists (select 1 from table1 a where a.col1 = b.col1)

How to return only 10 rows via SQL query

I have a query that returns over 180k rows. I need to make a slight change so that it returns about 10 fewer rows.
How do I show only those 10 rows as a result?
I've tried EXCEPT but it seems to return a lot more than just the 10.
You can use LIMIT. This will show first n rows. Example:
SELECT * FROM Orders LIMIT 10
If you are trying to make pagination add OFFSET. It will return 10 rows starting from row 20. Example:
SELECT * FROM Orders LIMIT 10 OFFSET 20
MySQL didn't support EXCEPT until version 8.0.31 (to my knowledge).
Probably the most efficient route would be to incorporate the two WHERE clauses into one. I say efficient in the sense of "Do it that way if you're going to run this query in a regular report or production application."
For example:
-- Query 1
SELECT * FROM table WHERE `someDate`>'2016-01-01'
-- Query 2
SELECT * FROM table WHERE `someDate`>'2016-01-10'
-- Becomes
SELECT * FROM table WHERE `someDate` > '2016-01-01' AND `someDate` <= '2016-01-10'
It's possible you're implying that the queries are quite complicated, and you're after a quick (read: not necessarily efficient) way of getting the difference for a one-off investigation.
That being the case, you could abuse UNION and a sub-query:
(Untested, treat as pseudo-SQL...)
SELECT
*
FROM (
SELECT * FROM table WHERE `someDate`>'2016-01-01'
UNION ALL
SELECT * FROM table WHERE `someDate`>'2016-01-10'
) AS sub
GROUP BY
`primaryKey`
HAVING
COUNT(1) = 1;
It's ugly though. And not efficient.
Assuming that the only difference is only that one side (I'll call it the "right hand side") is missing records that the left includes, you could LEFT JOIN the two queries (as subs) and filter to right-side-is-null. But that'd be dependent on all those caveats being true.
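A minimal sketch of that LEFT JOIN idea, using Python's built-in sqlite3 module as a stand-in for MySQL. The orders table name and its five rows are made up for illustration, and the two date filters stand in for the two (possibly much more complicated) queries being compared.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the orders table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (primaryKey INTEGER PRIMARY KEY, someDate TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        (1, "2015-12-31"),
        (2, "2016-01-01"),
        (3, "2016-01-05"),
        (4, "2016-01-10"),
        (5, "2016-01-20"),
    ],
)

# Left side: the query that returns more rows.
# Right side: the changed query that returns fewer rows.
# Keep only the left-side rows that have no partner on the right side.
missing = conn.execute("""
    SELECT l.*
    FROM (SELECT * FROM orders WHERE someDate > '2016-01-01') AS l
    LEFT JOIN (SELECT * FROM orders WHERE someDate > '2016-01-10') AS r
           ON r.primaryKey = l.primaryKey
    WHERE r.primaryKey IS NULL
    ORDER BY l.primaryKey
""").fetchall()
print(missing)
```

This only works under the caveats already mentioned: both queries must expose a key to join on, and the right-hand result must be a subset of the left-hand one.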
Temporary tables can be your friend - especially given they're so easily created (and can be indexed):
CREATE TEMPORARY TABLE tmp_xyz AS SELECT ... FROM ... WHERE ...;

Writing faster select queries on large databases

I have two tables, one with about 1,000 rows and one with 700,000 rows; table1 and table2 respectively. I wrote a simple select query:
SELECT DISTINCT name1
FROM table1, table2
WHERE table1.name1 = table2.name2;
The query got me exactly what I want but it took 91 seconds! I tried this sub query out:
SELECT DISTINCT name1
FROM table1
WHERE table1.name1 IN(SELECT DISTINCT name2 FROM table2);
That query took a consistent 37 seconds. So there's some performance boost in the way you write select queries. I wrote a third query:
CREATE TEMPORARY TABLE IF NOT EXISTS t1qry
(SELECT DISTINCT table1.name1 FROM table1);
CREATE TEMPORARY TABLE IF NOT EXISTS t2qry
(SELECT DISTINCT table2.name2 FROM table2);
SELECT name2 FROM t2qry JOIN t1qry ON name1 = name2;
DROP TABLE t1qry, t2qry;
This last query took 0.4 seconds to run and produced identical results to the other two.
I knew that each 'select distinct' query took less than a second to run so I was trying to craft one that would find common distinct values between the table. My question is why does what I wrote work? How do I write a faster select query like this without creating temporary tables?
I've been using MySQL and MariaDB but I'll take any SQL related help here. I'm new to SQL and have been trying to learn as much as I can, so I'll take any pointers or info about this.
If your latter version (including creating the temporary tables) goes so fast, then you probably have indexes on nameX in both tables.
I would suggest using EXISTS:
SELECT DISTINCT name1
FROM table1
WHERE EXISTS (SELECT 1 FROM table2 WHERE table1.name1 = table2.name2);
For these queries, you do want indexes on table1(name1) and table2(name2).
And, if table1 has no duplicates, then leave out the DISTINCT.
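To see that the join, IN, and EXISTS formulations are interchangeable in terms of results, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL or MariaDB. The index names and the generated data are assumptions for illustration; the sketch checks result equivalence only and makes no claim about timings on a real server.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL/MariaDB; index names and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (name1 TEXT);
    CREATE TABLE table2 (name2 TEXT);
    CREATE INDEX idx_table1_name1 ON table1 (name1);
    CREATE INDEX idx_table2_name2 ON table2 (name2);
""")
# table1 holds name0..name49 (each twice); table2 holds name25..name224 (many times each).
conn.executemany("INSERT INTO table1 VALUES (?)", [(f"name{i % 50}",) for i in range(100)])
conn.executemany("INSERT INTO table2 VALUES (?)", [(f"name{i % 200 + 25}",) for i in range(5000)])

queries = {
    "join": """
        SELECT DISTINCT name1
        FROM table1 JOIN table2 ON table1.name1 = table2.name2
    """,
    "in": """
        SELECT DISTINCT name1
        FROM table1
        WHERE name1 IN (SELECT name2 FROM table2)
    """,
    "exists": """
        SELECT DISTINCT name1
        FROM table1
        WHERE EXISTS (SELECT 1 FROM table2 WHERE table1.name1 = table2.name2)
    """,
}
results = {label: sorted(row[0] for row in conn.execute(sql)) for label, sql in queries.items()}

all_equal = results["join"] == results["in"] == results["exists"]
print(all_equal, len(results["exists"]))
```

The EXISTS form lets the engine stop at the first matching row in table2 for each name, which is why an index on table2(name2) matters most.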

MySql view is slow comparatively to their base query

I have created a view in MySQL as
create view vtax
as
SELECT * FROM table1
union
SELECT * FROM table2;
table1 has 800000 records and table2 has 500000 records. When I run the independent queries, the results are returned within 0.078 secs, but when I run them through the view, it takes more than 10-15 secs.
select * from vtax where col1=value; -- takes more than 10-15 secs
select * from table1 where col1=value; -- takes 0.078 secs
select * from table2 where col1=value; -- takes 0.078 secs
I have created indexes on the tables separately.
Any help or ideas on what should be done?
UNION performs a distinct over your results (often a sort). Can you use UNION ALL? (i.e. are the rows distinct?)
You should compare apples with apples. Unions are often much slower than simple queries. Compare a plain union with the view. You will notice that the standard union query is slow as well. The optimizer probably has trouble choosing the optimal path. Check some other questions like: Why are UNION queries so slow in MySQL?
As stated in the comments a view isn't indexed in MySQL.
If you use the union in the query:
SELECT * FROM table1 WHERE col1 = 'value'
UNION
SELECT * FROM table2 WHERE col1 = 'value'
Then indexes (if there are any) can be used.
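The rewrite is safe because filtering the view and filtering each branch before the UNION give the same rows. Here is a minimal sketch of that equivalence using Python's built-in sqlite3 module as a stand-in for MySQL; the id column, the index names, and the generated data are assumptions for illustration. SQLite's optimizer behaves differently from MySQL's, so this checks results only, not the timings from the question.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the id column, index names, and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, col1 TEXT);
    CREATE TABLE table2 (id INTEGER, col1 TEXT);
    CREATE INDEX idx_table1_col1 ON table1 (col1);
    CREATE INDEX idx_table2_col1 ON table2 (col1);
    CREATE VIEW vtax AS
        SELECT * FROM table1
        UNION
        SELECT * FROM table2;
""")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [(i, f"v{i % 100}") for i in range(1, 1001)])
conn.executemany("INSERT INTO table2 VALUES (?, ?)", [(i, f"v{i % 100}") for i in range(1001, 2001)])

value = "v7"

# Filter applied on top of the view: the engine may build the whole union first.
via_view = conn.execute("SELECT * FROM vtax WHERE col1 = ?", (value,)).fetchall()

# Filter applied inside each branch: each SELECT can use its own index on col1.
pushed_down = conn.execute(
    """
    SELECT * FROM table1 WHERE col1 = ?
    UNION
    SELECT * FROM table2 WHERE col1 = ?
    """,
    (value, value),
).fetchall()

same_result = sorted(via_view) == sorted(pushed_down)
print(same_result, len(pushed_down))
```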