SQL UNION ALL to eliminate duplicates

SQL UNION ALL to eliminate duplicates - mysql

I found this sample interview question and answer posted on toptal reproduced here. But I don't really understand the code. How can a UNION ALL turn into a UNIION (distinct) like that? Also, why is this code faster?
QUESTION
Write a SQL query using UNION ALL (not UNION) that uses the WHERE clause to eliminate duplicates. Why might you want to do this?
Hide answer
You can avoid duplicates using UNION ALL and still run much faster than UNION DISTINCT (which is actually same as UNION) by running a query like this:
ANSWER
SELECT * FROM mytable WHERE a=X UNION ALL SELECT * FROM mytable WHERE b=Y AND a!=X
The key is the AND a!=X part. This gives you the benefits of the UNION (a.k.a., UNION DISTINCT) command, while avoiding much of its performance hit.

But in the example, the first query has a condition on column a, whereas the second query has a condition on column b. This probably came from a query that's hard to optimize:
SELECT * FROM mytable WHERE a=X OR b=Y
This query is hard to optimize with simple B-tree indexing. Does the engine search an index on column a? Or on column b? Either way, searching the other term requires a table-scan.
Hence the trick of using UNION to separate into two queries for one term each. Each subquery can use the best index for each search term. Then combine the results using UNION.
But the two subsets may overlap, because some rows where b=Y may also have a=X in which case such rows occur in both subsets. Therefore you have to do duplicate elimination, or else see some rows twice in the final result.
SELECT * FROM mytable WHERE a=X
UNION DISTINCT
SELECT * FROM mytable WHERE b=Y
UNION DISTINCT is expensive because typical implementations sort the rows to find duplicates. Just like if you use SELECT DISTINCT ....
We also have a perception that it's even more "wasted" work if the two subset of rows you are unioning have a lot of rows occurring in both subsets. It's a lot of rows to eliminate.
But there's no need to eliminate duplicates if you can guarantee that the two sets of rows are already distinct. That is, if you guarantee there is no overlap. If you can rely on that, then it would always be a no-op to eliminate duplicates, and therefore the query can skip that step, and therefore skip the costly sorting.
If you change the queries so that they are guaranteed to select non-overlapping subsets of rows, that's a win.
SELECT * FROM mytable WHERE a=X
UNION ALL
SELECT * FROM mytable WHERE b=Y AND a!=X
These two sets are guaranteed to have no overlap. If the first set has rows where a=X and the second set has rows where a!=X then there can be no row that is in both sets.
The second query therefore only catches some of the rows where b=Y, but any row where a=X AND b=Y is already included in the first set.
So the query achieves an optimized search for two OR terms, without producing duplicates, and requiring no UNION DISTINCT operation.

The most simple way is like this, especially if you have many columns:
SELECT *
INTO table2
FROM table1
UNION
SELECT *
FROM table1
ORDER BY column1

I guest this is right (Oracle):
select distinct * from (
select * from test_a
union all
select * from test_b
);

The question will be correct if the table has unique identifier - primary key. Otherwise every select can return many the same rows.
To understand why it can faster let's look at how database executes UNION ALL and UNION.
The first is simple joining results from two independent queries. These queries can be processed in parallel and taken to client one by one.
The second is joining + distinction. To distinct records from 2 queries db needs to have all them in memory or if memory is not enough db needs to store them to temporary table and next select unique ones. This is where performance degradation can be. DB's are pretty smart and distinction algorithms are developed good but for large result sets it could be a problem anyway.
UNION ALL + additional WHERE condition can be faster if an index will be used while filtering.
So, here the performance magic.

I guess it will work
select col1 From (
select row_number() over (partition by col1 order by col1) as b, col1
from (
select col1 From u1
union all
select col1 From u2 ) a
) x
where x.b =1

This will also do the same trick:
select * from (
select * from table1
union all
select * from table2
) a group by
columns
having count(*) >= 1
or
select * from table1
union all
select * from table2 b
where not exists (select 1 from table1 a where a.col1 = b.col1)

Related

mysql outfile how more records then the count [duplicate]

What is the difference between UNION and UNION ALL?

UNION removes duplicate records (where all columns in the results are the same), UNION ALL does not.
There is a performance hit when using UNION instead of UNION ALL, since the database server must do additional work to remove the duplicate rows, but usually you do not want the duplicates (especially when developing reports).
To identify duplicates, records must be comparable types as well as compatible types. This will depend on the SQL system. For example the system may truncate all long text fields to make short text fields for comparison (MS Jet), or may refuse to compare binary fields (ORACLE)
UNION Example:
SELECT 'foo' AS bar UNION SELECT 'foo' AS bar
Result:
+-----+
| bar |
+-----+
| foo |
+-----+
1 row in set (0.00 sec)
UNION ALL example:
SELECT 'foo' AS bar UNION ALL SELECT 'foo' AS bar
Result:
+-----+
| bar |
+-----+
| foo |
| foo |
+-----+
2 rows in set (0.00 sec)

Both UNION and UNION ALL concatenate the result of two different SQLs. They differ in the way they handle duplicates.
UNION performs a DISTINCT on the result set, eliminating any duplicate rows.
UNION ALL does not remove duplicates, and it therefore faster than UNION.
Note: While using this commands all selected columns need to be of the same data type.
Example: If we have two tables, 1) Employee and 2) Customer
Employee table data:
Customer table data:
UNION Example (It removes all duplicate records):
UNION ALL Example (It just concatenate records, not eliminate duplicates, so it is faster than UNION):

UNION removes duplicates, whereas UNION ALL does not.
In order to remove duplicates the result set must be sorted, and this may have an impact on the performance of the UNION, depending on the volume of data being sorted, and the settings of various RDBMS parameters ( For Oracle PGA_AGGREGATE_TARGET with WORKAREA_SIZE_POLICY=AUTO or SORT_AREA_SIZE and SOR_AREA_RETAINED_SIZE if WORKAREA_SIZE_POLICY=MANUAL ).
Basically, the sort is faster if it can be carried out in memory, but the same caveat about the volume of data applies.
Of course, if you need data returned without duplicates then you must use UNION, depending on the source of your data.
I would have commented on the first post to qualify the "is much less performant" comment, but have insufficient reputation (points) to do so.

In ORACLE: UNION does not support BLOB (or CLOB) column types, UNION ALL does.

The basic difference between UNION and UNION ALL is union operation eliminates the duplicated rows from the result set but union all returns all rows after joining.
from http://zengin.wordpress.com/2007/07/31/union-vs-union-all/

UNION
The UNION command is used to select related information from two tables, much like the JOIN command. However, when using the UNION command all selected columns need to be of the same data type. With UNION, only distinct values are selected.
UNION ALL
The UNION ALL command is equal to the UNION command, except that UNION ALL selects all values.
The difference between Union and Union all is that Union all will not eliminate duplicate rows, instead it just pulls all rows from all tables fitting your query specifics and combines them into a table.
A UNION statement effectively does a SELECT DISTINCT on the results set. If you know that all the records returned are unique from your union, use UNION ALL instead, it gives faster results.

You can avoid duplicates and still run much faster than UNION DISTINCT (which is actually same as UNION) by running query like this:
SELECT * FROM mytable WHERE a=X UNION ALL SELECT * FROM mytable WHERE b=Y AND a!=X
Notice the AND a!=X part. This is much faster then UNION.

Just to add my two cents to the discussion here: one could understand the UNION operator as a pure, SET-oriented UNION - e.g. set A={2,4,6,8}, set B={1,2,3,4}, A UNION B = {1,2,3,4,6,8}
When dealing with sets, you would not want numbers 2 and 4 appearing twice, as an element either is or is not in a set.
In the world of SQL, though, you might want to see all the elements from the two sets together in one "bag" {2,4,6,8,1,2,3,4}. And for this purpose T-SQL offers the operator UNION ALL.

UNION - results in distinct records while
UNION ALL - results in all the records including duplicates.
Both are blocking operators and hence I personally prefer using JOINS over Blocking Operators(UNION, INTERSECT, UNION ALL etc. ) anytime.
To illustrate why Union operation performs poorly in comparison to Union All checkout the following example.
CREATE TABLE #T1 (data VARCHAR(10))
INSERT INTO #T1
SELECT 'abc'
UNION ALL
SELECT 'bcd'
UNION ALL
SELECT 'cde'
UNION ALL
SELECT 'def'
UNION ALL
SELECT 'efg'
CREATE TABLE #T2 (data VARCHAR(10))
INSERT INTO #T2
SELECT 'abc'
UNION ALL
SELECT 'cde'
UNION ALL
SELECT 'efg'
Following are results of UNION ALL and UNION operations.
A UNION statement effectively does a SELECT DISTINCT on the results set. If you know that all the records returned are unique from your union, use UNION ALL instead, it gives faster results.
Using UNION results in Distinct Sort operations in the Execution Plan. Proof to prove this statement is shown below:

Not sure that it matters which database
UNION and UNION ALL should work on all SQL Servers.
You should avoid of unnecessary UNIONs they are huge performance leak. As a rule of thumb use UNION ALL if you are not sure which to use.

(From Microsoft SQL Server Book Online)
UNION [ALL]
Specifies that multiple result sets are to be combined and returned as a single result set.
ALL
Incorporates all rows into the results. This includes duplicates. If not specified, duplicate rows are removed.
UNION will take too long as a duplicate rows finding like DISTINCT is applied on the results.
SELECT * FROM Table1
UNION
SELECT * FROM Table2
is equivalent of:
SELECT DISTINCT * FROM (
SELECT * FROM Table1
UNION ALL
SELECT * FROM Table2) DT
A side effect of applying DISTINCT over results is a sorting operation on results.
UNION ALL results will be shown as arbitrary order on results But UNION results will be shown as ORDER BY 1, 2, 3, ..., n (n = column number of Tables) applied on results. You can see this side effect when you don't have any duplicate row.

I add an example,
UNION, it is merging with distinct --> slower, because it need comparing (In Oracle SQL developer, choose query, press F10 to see cost analysis).
UNION ALL, it is merging without distinct --> faster.
SELECT to_date(sysdate, 'yyyy-mm-dd') FROM dual
UNION
SELECT to_date(sysdate, 'yyyy-mm-dd') FROM dual;
and
SELECT to_date(sysdate, 'yyyy-mm-dd') FROM dual
UNION ALL
SELECT to_date(sysdate, 'yyyy-mm-dd') FROM dual;

UNION merges the contents of two structurally-compatible tables into a single combined table.
Difference:
The difference between UNION and UNION ALL is that UNION will omit duplicate records whereas UNION ALL will include duplicate records.
Union Result set is sorted in ascending order whereas UNION ALL Result set is not sorted
UNION performs a DISTINCT on its Result set so it will eliminate any duplicate rows. Whereas UNION ALL won't remove duplicates and therefore it is faster than UNION.*
Note: The performance of UNION ALL will typically be better than UNION, since UNION requires the server to do the additional work of removing any duplicates. So, in cases where it is certain that there will not be any duplicates, or where having duplicates is not a problem, use of UNION ALL would be recommended for performance reasons.

Suppose that you have two table Teacher & Student
Both have 4 Column with different Name like this
Teacher - ID(int), Name(varchar(50)), Address(varchar(50)), PositionID(varchar(50))
Student- ID(int), Name(varchar(50)), Email(varchar(50)), PositionID(int)
You can apply UNION or UNION ALL for those two table which have same number of columns. But they have different name or data type.
When you apply UNION operation on 2 tables, it neglects all duplicate entries(all columns value of row in a table is same of another table). Like this
SELECT * FROM Student
UNION
SELECT * FROM Teacher
the result will be
When you apply UNION ALL operation on 2 tables, it returns all entries with duplicate(if there is any difference between any column value of a row in 2 tables). Like this
SELECT * FROM Student
UNION ALL
SELECT * FROM Teacher
Output
Performance:
Obviously UNION ALL performance is better that UNION as they do additional task to remove the duplicate values. You can check that from Execution Estimated Time by press ctrl+L at MSSQL

UNION removes duplicate records in other hand UNION ALL does not. But one need to check the bulk of data that is going to be processed and the column and data type must be same.
since union internally uses "distinct" behavior to select the rows hence it is more costly in terms of time and performance.
like
select project_id from t_project
union
select project_id from t_project_contact
this gives me 2020 records
on other hand
select project_id from t_project
union all
select project_id from t_project_contact
gives me more than 17402 rows
on precedence perspective both has same precedence.

If there is no ORDER BY, a UNION ALL may bring rows back as it goes, whereas a UNION would make you wait until the very end of the query before giving you the whole result set at once. This can make a difference in a time-out situation - a UNION ALL keeps the connection alive, as it were.
So if you have a time-out issue, and there's no sorting, and duplicates aren't an issue, UNION ALL may be rather helpful.

One more thing i would like to add-
Union:- Result set is sorted in ascending order.
Union All:- Result set is not sorted. two Query output just gets appended.

Important! Difference between Oracle and Mysql: Let's say that t1 t2 don't have duplicate rows between them but they have duplicate rows individual. Example: t1 has sales from 2017 and t2 from 2018
SELECT T1.YEAR, T1.PRODUCT FROM T1
UNION ALL
SELECT T2.YEAR, T2.PRODUCT FROM T2
In ORACLE UNION ALL fetches all rows from both tables. The same will occur in MySQL.
However:
SELECT T1.YEAR, T1.PRODUCT FROM T1
UNION
SELECT T2.YEAR, T2.PRODUCT FROM T2
In ORACLE, UNION fetches all rows from both tables because there are no duplicate values between t1 and t2. On the other hand in MySQL the resultset will have fewer rows because there will be duplicate rows within table t1 and also within table t2!

UNION ALL also works on more data types as well. For example when trying to union spatial data types. For example:
select a.SHAPE from tableA a
union
select b.SHAPE from tableB b
will throw
The data type geometry cannot be used as an operand to the UNION, INTERSECT or EXCEPT operators because it is not comparable.
However union all will not.

MySQL UNION function but with duplicates [duplicate]

What is the difference between UNION and UNION ALL?

UNION removes duplicate records (where all columns in the results are the same), UNION ALL does not.
There is a performance hit when using UNION instead of UNION ALL, since the database server must do additional work to remove the duplicate rows, but usually you do not want the duplicates (especially when developing reports).
To identify duplicates, records must be comparable types as well as compatible types. This will depend on the SQL system. For example the system may truncate all long text fields to make short text fields for comparison (MS Jet), or may refuse to compare binary fields (ORACLE)
UNION Example:
SELECT 'foo' AS bar UNION SELECT 'foo' AS bar
Result:
+-----+
| bar |
+-----+
| foo |
+-----+
1 row in set (0.00 sec)
UNION ALL example:
SELECT 'foo' AS bar UNION ALL SELECT 'foo' AS bar
Result:
+-----+
| bar |
+-----+
| foo |
| foo |
+-----+
2 rows in set (0.00 sec)

Both UNION and UNION ALL concatenate the result of two different SQLs. They differ in the way they handle duplicates.
UNION performs a DISTINCT on the result set, eliminating any duplicate rows.
UNION ALL does not remove duplicates, and it therefore faster than UNION.
Note: While using this commands all selected columns need to be of the same data type.
Example: If we have two tables, 1) Employee and 2) Customer
Employee table data:
Customer table data:
UNION Example (It removes all duplicate records):
UNION ALL Example (It just concatenate records, not eliminate duplicates, so it is faster than UNION):

UNION removes duplicates, whereas UNION ALL does not.
In order to remove duplicates the result set must be sorted, and this may have an impact on the performance of the UNION, depending on the volume of data being sorted, and the settings of various RDBMS parameters ( For Oracle PGA_AGGREGATE_TARGET with WORKAREA_SIZE_POLICY=AUTO or SORT_AREA_SIZE and SOR_AREA_RETAINED_SIZE if WORKAREA_SIZE_POLICY=MANUAL ).
Basically, the sort is faster if it can be carried out in memory, but the same caveat about the volume of data applies.
Of course, if you need data returned without duplicates then you must use UNION, depending on the source of your data.
I would have commented on the first post to qualify the "is much less performant" comment, but have insufficient reputation (points) to do so.

In ORACLE: UNION does not support BLOB (or CLOB) column types, UNION ALL does.

The basic difference between UNION and UNION ALL is union operation eliminates the duplicated rows from the result set but union all returns all rows after joining.
from http://zengin.wordpress.com/2007/07/31/union-vs-union-all/

UNION
The UNION command is used to select related information from two tables, much like the JOIN command. However, when using the UNION command all selected columns need to be of the same data type. With UNION, only distinct values are selected.
UNION ALL
The UNION ALL command is equal to the UNION command, except that UNION ALL selects all values.
The difference between Union and Union all is that Union all will not eliminate duplicate rows, instead it just pulls all rows from all tables fitting your query specifics and combines them into a table.
A UNION statement effectively does a SELECT DISTINCT on the results set. If you know that all the records returned are unique from your union, use UNION ALL instead, it gives faster results.

You can avoid duplicates and still run much faster than UNION DISTINCT (which is actually same as UNION) by running query like this:
SELECT * FROM mytable WHERE a=X UNION ALL SELECT * FROM mytable WHERE b=Y AND a!=X
Notice the AND a!=X part. This is much faster then UNION.

Just to add my two cents to the discussion here: one could understand the UNION operator as a pure, SET-oriented UNION - e.g. set A={2,4,6,8}, set B={1,2,3,4}, A UNION B = {1,2,3,4,6,8}
When dealing with sets, you would not want numbers 2 and 4 appearing twice, as an element either is or is not in a set.
In the world of SQL, though, you might want to see all the elements from the two sets together in one "bag" {2,4,6,8,1,2,3,4}. And for this purpose T-SQL offers the operator UNION ALL.

UNION - results in distinct records while
UNION ALL - results in all the records including duplicates.
Both are blocking operators and hence I personally prefer using JOINS over Blocking Operators(UNION, INTERSECT, UNION ALL etc. ) anytime.
To illustrate why Union operation performs poorly in comparison to Union All checkout the following example.
CREATE TABLE #T1 (data VARCHAR(10))
INSERT INTO #T1
SELECT 'abc'
UNION ALL
SELECT 'bcd'
UNION ALL
SELECT 'cde'
UNION ALL
SELECT 'def'
UNION ALL
SELECT 'efg'
CREATE TABLE #T2 (data VARCHAR(10))
INSERT INTO #T2
SELECT 'abc'
UNION ALL
SELECT 'cde'
UNION ALL
SELECT 'efg'
Following are results of UNION ALL and UNION operations.
A UNION statement effectively does a SELECT DISTINCT on the results set. If you know that all the records returned are unique from your union, use UNION ALL instead, it gives faster results.
Using UNION results in Distinct Sort operations in the Execution Plan. Proof to prove this statement is shown below:

Not sure that it matters which database
UNION and UNION ALL should work on all SQL Servers.
You should avoid of unnecessary UNIONs they are huge performance leak. As a rule of thumb use UNION ALL if you are not sure which to use.

(From Microsoft SQL Server Book Online)
UNION [ALL]
Specifies that multiple result sets are to be combined and returned as a single result set.
ALL
Incorporates all rows into the results. This includes duplicates. If not specified, duplicate rows are removed.
UNION will take too long as a duplicate rows finding like DISTINCT is applied on the results.
SELECT * FROM Table1
UNION
SELECT * FROM Table2
is equivalent of:
SELECT DISTINCT * FROM (
SELECT * FROM Table1
UNION ALL
SELECT * FROM Table2) DT
A side effect of applying DISTINCT over results is a sorting operation on results.
UNION ALL results will be shown as arbitrary order on results But UNION results will be shown as ORDER BY 1, 2, 3, ..., n (n = column number of Tables) applied on results. You can see this side effect when you don't have any duplicate row.

I add an example,
UNION, it is merging with distinct --> slower, because it need comparing (In Oracle SQL developer, choose query, press F10 to see cost analysis).
UNION ALL, it is merging without distinct --> faster.
SELECT to_date(sysdate, 'yyyy-mm-dd') FROM dual
UNION
SELECT to_date(sysdate, 'yyyy-mm-dd') FROM dual;
and
SELECT to_date(sysdate, 'yyyy-mm-dd') FROM dual
UNION ALL
SELECT to_date(sysdate, 'yyyy-mm-dd') FROM dual;

UNION merges the contents of two structurally-compatible tables into a single combined table.
Difference:
The difference between UNION and UNION ALL is that UNION will omit duplicate records whereas UNION ALL will include duplicate records.
Union Result set is sorted in ascending order whereas UNION ALL Result set is not sorted
UNION performs a DISTINCT on its Result set so it will eliminate any duplicate rows. Whereas UNION ALL won't remove duplicates and therefore it is faster than UNION.*
Note: The performance of UNION ALL will typically be better than UNION, since UNION requires the server to do the additional work of removing any duplicates. So, in cases where it is certain that there will not be any duplicates, or where having duplicates is not a problem, use of UNION ALL would be recommended for performance reasons.

Suppose that you have two table Teacher & Student
Both have 4 Column with different Name like this
Teacher - ID(int), Name(varchar(50)), Address(varchar(50)), PositionID(varchar(50))
Student- ID(int), Name(varchar(50)), Email(varchar(50)), PositionID(int)
You can apply UNION or UNION ALL for those two table which have same number of columns. But they have different name or data type.
When you apply UNION operation on 2 tables, it neglects all duplicate entries(all columns value of row in a table is same of another table). Like this
SELECT * FROM Student
UNION
SELECT * FROM Teacher
the result will be
When you apply UNION ALL operation on 2 tables, it returns all entries with duplicate(if there is any difference between any column value of a row in 2 tables). Like this
SELECT * FROM Student
UNION ALL
SELECT * FROM Teacher
Output
Performance:
Obviously UNION ALL performance is better that UNION as they do additional task to remove the duplicate values. You can check that from Execution Estimated Time by press ctrl+L at MSSQL

UNION removes duplicate records in other hand UNION ALL does not. But one need to check the bulk of data that is going to be processed and the column and data type must be same.
since union internally uses "distinct" behavior to select the rows hence it is more costly in terms of time and performance.
like
select project_id from t_project
union
select project_id from t_project_contact
this gives me 2020 records
on other hand
select project_id from t_project
union all
select project_id from t_project_contact
gives me more than 17402 rows
on precedence perspective both has same precedence.

If there is no ORDER BY, a UNION ALL may bring rows back as it goes, whereas a UNION would make you wait until the very end of the query before giving you the whole result set at once. This can make a difference in a time-out situation - a UNION ALL keeps the connection alive, as it were.
So if you have a time-out issue, and there's no sorting, and duplicates aren't an issue, UNION ALL may be rather helpful.

One more thing i would like to add-
Union:- Result set is sorted in ascending order.
Union All:- Result set is not sorted. two Query output just gets appended.

Important! Difference between Oracle and Mysql: Let's say that t1 t2 don't have duplicate rows between them but they have duplicate rows individual. Example: t1 has sales from 2017 and t2 from 2018
SELECT T1.YEAR, T1.PRODUCT FROM T1
UNION ALL
SELECT T2.YEAR, T2.PRODUCT FROM T2
In ORACLE UNION ALL fetches all rows from both tables. The same will occur in MySQL.
However:
SELECT T1.YEAR, T1.PRODUCT FROM T1
UNION
SELECT T2.YEAR, T2.PRODUCT FROM T2
In ORACLE, UNION fetches all rows from both tables because there are no duplicate values between t1 and t2. On the other hand in MySQL the resultset will have fewer rows because there will be duplicate rows within table t1 and also within table t2!

UNION ALL also works on more data types as well. For example when trying to union spatial data types. For example:
select a.SHAPE from tableA a
union
select b.SHAPE from tableB b
will throw
The data type geometry cannot be used as an operand to the UNION, INTERSECT or EXCEPT operators because it is not comparable.
However union all will not.

Which of the two queries do I use best for performance?

i have two query To do a job
query 1 :
SELECT * FROM table1 where id = 1
UNION ALL
SELECT * FROM table2 where id = 5
UNION ALL
SELECT * FROM table1 where id = 70
UNION ALL
SELECT * FROM table2 where id = 3
UNION ALL
SELECT * FROM table1 where id = 90
and query 2 :
SELECT * FROM table1 where id IN (1,70,90)
UNION ALL
SELECT * FROM table2 where id IN (5,3)
Which of these two queries is faster ?
If your answer is the second query .
I've used Query 1 in many different places. in the project Is the difference so large that I would replace everywhere with the second query ?

The second version is more concise, and should be faster, because it only requires actually executing two queries, as opposed to the first version, which does a separate query for each id value.
Assuming id be the primary key in both tables, then MySQL might also be able to use the clustered index for faster lookup of matching records.

What are the typical counts? Total of 5 rows? 2 tables? I would predict the performance difference to be a factor of rows/tables in favoring the 2nd (shorter) formulation. In experimenting, I got about 2x.
So, if you have 100 rows from 2 tables, the second formulation will be significantly faster; enough faster to be worth the effort.
Why?
For such simple queries, parsing and optimizing dominates the time.
For newer versions of MySQL, both queries will touch the same number of rows.
For MySQL 5.7.3 and later, no temp table will be needed for either UNION ALL.
Does it matter that the output rows are likely to be in a different order?

Any way to make UNION ALL run faster?

I have a lot of exactly same tables. TableA,TableB,TableC,TableD etc. which I want to create views from.
Doing select * from TableA takes 20ms, doing select * from tableB takes 20ms, but doing
(select * from TableA) union all (select * from TableB) takes over 20 minutes.
Those tables have exactly same columns. Is there any settings in my.cnf that I need to change, or a way to create a view that would run faster? All tables have 1.5m to about 10m rows.
Results of explain
PRIMARY TableA ALL 28808685
UNION TableB ALL 15316215
UNION RESULT <union1,2> ALL Using temporary
Table structure:
10 varchar(20)'s, 5 unsigned INTs.

My guess is that select * from TableA does not take 20 ms. It takes 20 ms to start returning results.
Although I am going to answer your question, you should revisit your data structure. Having multiple tables with the same layout is usually a really bad idea. Instead, you should have a single table with all the rows.
But, you don't seem to have that.
Try running the union all without parentheses:
select * from TableA union all
select * from TableB;
MySQL has a habit of materializing subqueries. I'm not sure if it does this with union all subqueries, but given your description of the problem, that sees likely.

Multiple table counts in one query

I've done some searching around but I haven't found a clear answer and explanation to my question.
I have 5 tables called table1, table2, table3, table4 and table5 and I want to do COUNT(*) on each of the tables to get the number of rows.
Should I try to combine these into one query or use 5 separate queries? I have always been taught that the least number of queries the better so I am guessing I should try to combine them into one query.
One way of doing it is to use UNION but does anyone know what the most efficient way of doing this is and why?
Thanks for any help.

Assuming you just want a count(*) from each one, then
SELECT
( SELECT count(*) from table1 ) AS table1,
( SELECT count(*) from table2 ) AS table2,
( SELECT count(*) from table3 ) AS table3,
etc...
)
would give you those counts as a single row. The DB server would still be running n+1 queries (n tables, 1 parent query) to get those counts, but it'd be the same story if you were using a UNION anyways. The union would produce multiple rows with 1 value in each, v.s. the 1 row with multiple values of the subselect method.

Provided you have read access to this (so rather not on shared hosting where you don’t have your actual own database instance) you could read that info from the INFORMATION_SCHEMA TABLES table, that does have a TABLE_ROWS column.
(Be aware of what it says for InnoDB tables there – so if you don’t you MyISAM and need the precise counts, the other answer has the better solution.)

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008

SQL UNION ALL to eliminate duplicates - mysql

The most simple way is like this, especially if you have many columns: SELECT * INTO table2 FROM table1 UNION SELECT * FROM table1 ORDER BY column1

I guest this is right (Oracle): select distinct * from ( select * from test_a union all select * from test_b );

I guess it will work select col1 From ( select row_number() over (partition by col1 order by col1) as b, col1 from ( select col1 From u1 union all select col1 From u2 ) a ) x where x.b =1

This will also do the same trick: select * from ( select * from table1 union all select * from table2 ) a group by columns having count() >= 1 or select from table1 union all select * from table2 b where not exists (select 1 from table1 a where a.col1 = b.col1)

Related

mysql outfile how more records then the count [duplicate]

MySQL UNION function but with duplicates [duplicate]

Which of the two queries do I use best for performance?

Any way to make UNION ALL run faster?

Multiple table counts in one query

Categories

Resources

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008

SQL UNION ALL to eliminate duplicates - mysql

The most simple way is like this, especially if you have many columns: SELECT * INTO table2 FROM table1 UNION SELECT * FROM table1 ORDER BY column1

I guest this is right (Oracle): select distinct * from ( select * from test_a union all select * from test_b );

I guess it will work select col1 From ( select row_number() over (partition by col1 order by col1) as b, col1 from ( select col1 From u1 union all select col1 From u2 ) a ) x where x.b =1

This will also do the same trick: select * from ( select * from table1 union all select * from table2 ) a group by columns having count(*) >= 1 or select * from table1 union all select * from table2 b where not exists (select 1 from table1 a where a.col1 = b.col1)

Related

mysql outfile how more records then the count [duplicate]

MySQL UNION function but with duplicates [duplicate]

Which of the two queries do I use best for performance?

Any way to make UNION ALL run faster?

Multiple table counts in one query

Categories

Resources

This will also do the same trick: select * from ( select * from table1 union all select * from table2 ) a group by columns having count() >= 1 or select from table1 union all select * from table2 b where not exists (select 1 from table1 a where a.col1 = b.col1)