I am learning MySQL right now, and while I understand how to write UNIONs and JOINs, I don't see the advantage of a UNION over any type of JOIN. They both combine results from tables, but it seems you have to jump through more hoops to combine tables using UNION if their columns are not identical. Is there an advantage to using a UNION sometimes, or is it just another command we can use?
UNION adds rows from multiple tables/views.
A JOIN, by contrast, matches rows from different related tables within a single SQL statement.
UNION: combines the result sets of two or more SELECT statements that return the same number of columns with compatible data types.
JOIN: retrieves matching records from two or more tables.
Please visit this link, it will help you to clear your doubts.
I have a performance issue with a query that contains multiple UNION ALL statements. I need to append (row by row) data from different tables into the same columns. The query needs to be used to create a view in MySQL; here is an example:
CREATE OR REPLACE
    ALGORITHM = UNDEFINED
    DEFINER = usr
    SQL SECURITY DEFINER
VIEW my_view AS
    SELECT DISTINCT
        column 1,
        column 2,
        column 3
    FROM
        table 1
    WHERE
        condition 1
    UNION ALL
    SELECT DISTINCT
        column 1,
        column 2,
        column 3
    FROM
        table 2
    WHERE
        condition 2
    UNION ALL
    SELECT DISTINCT
        column 1,
        column 2,
        column 3
    FROM
        table 3
    WHERE
        condition 3
Chaining multiple UNION ALLs seems wasteful just to append (row by row) data for the same features (I have many more than the 3 columns in the example) coming from different tables. It demands a lot of resources from the DB and leads to a "lost connection error during the query" because of the time it takes to run.
Is there any way to optimize this kind of query?
Thanks in advance.
UNION ALL is the most performant way of concatenating result sets. (UNION is slower because it removes duplicates.)
Surely your timeout occurs when you use the view, not when you create it.
Your performance issue stems from one or more of the SELECT queries in your UNION ALL cascade being very slow. You, or your "data engineer" colleagues, may need to create appropriate indexes on your table 1, table 2, table 3 tables.
To figure this out, do these things.
Read Optimizing Queries With EXPLAIN.
Run SHOW CREATE TABLE whateverTableName;. Look at the output. It will show you the indexes.
Run the SELECT queries using that same table prefixed with EXPLAIN. It will show you the indexes it used to satisfy the query.
Ask another question here showing us the output from those two steps.
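As a rough illustration of the steps above, here is a minimal sketch that compares a query plan before and after adding an index. It uses Python's built-in sqlite3 module as a stand-in so it runs anywhere; SQLite's EXPLAIN QUERY PLAN is not the same as MySQL's EXPLAIN, and the names table_1, col_a, col_b and idx_col_a are made up for the example.

```python
import sqlite3

# SQLite stands in for MySQL here; table_1, col_a, col_b and idx_col_a
# are hypothetical names invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_1 (id INTEGER PRIMARY KEY, col_a INTEGER, col_b TEXT)"
)
conn.executemany(
    "INSERT INTO table_1 (col_a, col_b) VALUES (?, ?)",
    [(i % 100, f"row {i}") for i in range(1000)],
)

query = "SELECT col_b FROM table_1 WHERE col_a = 42"


def plan(sql):
    """Return the human-readable 'detail' column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]


# Without an index on col_a the planner has to scan the whole table.
plan_before = plan(query)

# After adding an index, the same query can be answered with an index search.
conn.execute("CREATE INDEX idx_col_a ON table_1 (col_a)")
plan_after = plan(query)

print(plan_before)
print(plan_after)
```

In MySQL you would instead prefix each SELECT of the UNION ALL with EXPLAIN and check which index, if any, shows up in the key column.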
Or, it's possible that the result set from your big query is vast. There's no magic that can process millions of rows faster than O(n).
I have two tables which both have a column called identity_type, which takes on one of 10 values.
Identity_type has the same values in both tables.
I want to be able to show the count of each identity_type for each table side by side, without trying to join the data. Is this possible?
I.e., what I'm trying to show is the output of:
SELECT * FROM table GROUP BY identity_type
for table_1 and table_2 side by side.
You can't join the tables side by side without using some kind of join (and even if you could, why would you want to avoid using join syntax?).
Also, this is unlikely to be valid:
SELECT * FROM table GROUP BY identity_type
All fields need to either be included in the "group by" or part of an aggregate function.
You could show them one below the other by using a UNION ALL. You would still be querying both tables. However, you would not be using a JOIN.
Short answer: No
Long answer: First, when you use an aggregate function with GROUP BY, you need to include the grouping column in the SELECT list, i.e. SELECT identity_type, COUNT(*) FROM table GROUP BY identity_type. Even if you joined the tables directly, there would be two different groupings, and the database engine cannot show two groupings at the same time. If you really need that kind of output, you can probably get it with some more advanced JOIN and PIVOT syntax, but I do not think you would want to do that.
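For what it's worth, here is a small sketch of both layouts discussed in these answers: the counts stacked one below the other with UNION ALL, and the counts side by side by joining two pre-aggregated subqueries. It runs on Python's built-in sqlite3 module rather than MySQL, and the sample rows are invented; only table_1, table_2 and identity_type come from the question.

```python
import sqlite3

# SQLite stands in for MySQL; the sample rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (identity_type TEXT);
    CREATE TABLE table_2 (identity_type TEXT);
    INSERT INTO table_1 VALUES ('a'), ('a'), ('b');
    INSERT INTO table_2 VALUES ('a'), ('b'), ('b'), ('b');
""")

# One below the other: each SELECT names the grouping column and an aggregate,
# and UNION ALL stacks the two grouped results.
stacked = conn.execute("""
    SELECT 'table_1' AS source, identity_type, COUNT(*) AS n
    FROM table_1 GROUP BY identity_type
    UNION ALL
    SELECT 'table_2', identity_type, COUNT(*)
    FROM table_2 GROUP BY identity_type
    ORDER BY source, identity_type
""").fetchall()

# Side by side: aggregate each table first, then join the two small results
# on identity_type (this assumes both tables contain the same set of values).
side_by_side = conn.execute("""
    SELECT t1.identity_type, t1.n AS table_1_count, t2.n AS table_2_count
    FROM (SELECT identity_type, COUNT(*) AS n
          FROM table_1 GROUP BY identity_type) AS t1
    JOIN (SELECT identity_type, COUNT(*) AS n
          FROM table_2 GROUP BY identity_type) AS t2
      ON t1.identity_type = t2.identity_type
    ORDER BY t1.identity_type
""").fetchall()

print(stacked)
print(side_by_side)
```

Note that the join here is between the two grouped results, not between the raw tables, so the counts are not multiplied by matching rows.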
I have a dozen tables with the same structure. All of their names match question_20%. Each table has an indexed column named loaded, which can take the values 0 and 1.
I want to count all of the records where loaded = 1. If I had only one table, I would run select count(*) from question_2015 where loaded = 1.
Is there a query I can run that finds the tables in INFORMATION_SCHEMA.TABLES, sums over all of these counts, and produces a single output?
You can do what you want with dynamic SQL.
However, you have a problem with your data structure. Having multiple parallel tables is usually a very bad idea. SQL supports very large tables, so having all the information in one table is a great convenience, from the perspective of querying (as you are now learning) and maintainability.
SQL offers indexes and partitioning schemes for addressing performance issues on large tables.
Sometimes, separate tables are necessary, to meet particular system requirements. If so, then a view should be available to combine all the tables:
create view v_tables as
select t1.*, 'table1' as which from table1 union all
select t2.*, 'table2' as which from table2 union all
. . .
If you had such a view, then your query would simply be:
select which, count(*)
from v_tables
where loaded = 1
group by which;
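A minimal runnable sketch of that view idea, using Python's built-in sqlite3 module instead of MySQL. question_2015 comes from the question; question_2016 and the sample rows are assumptions made up for the example.

```python
import sqlite3

# SQLite stands in for MySQL; question_2016 and the data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE question_2015 (id INTEGER PRIMARY KEY, loaded INTEGER);
    CREATE TABLE question_2016 (id INTEGER PRIMARY KEY, loaded INTEGER);
    INSERT INTO question_2015 (loaded) VALUES (1), (1), (0);
    INSERT INTO question_2016 (loaded) VALUES (1), (0), (0);

    -- One view that stacks all parallel tables and tags each row's origin.
    CREATE VIEW v_tables AS
    SELECT t1.*, 'question_2015' AS which FROM question_2015 AS t1
    UNION ALL
    SELECT t2.*, 'question_2016' AS which FROM question_2016 AS t2;
""")

# Count of loaded rows per underlying table.
per_table = conn.execute("""
    SELECT which, COUNT(*)
    FROM v_tables
    WHERE loaded = 1
    GROUP BY which
    ORDER BY which
""").fetchall()

# Single overall count across all tables.
total = conn.execute(
    "SELECT COUNT(*) FROM v_tables WHERE loaded = 1"
).fetchone()[0]

print(per_table)
print(total)
```

Dropping the GROUP BY gives the single total the question asks for; keeping it shows where the rows came from.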
Forgive me if this seems like common sense as I am still learning how to split my data between multiple tables.
Basically, I have two:
general with the fields userID,owner,server,name
count with the fields userID,posts,topics
I want to fetch the data from both and cannot decide how I should do it. With a UNION:
SELECT `userID`, `owner`, `server`, `name`
FROM `english`.`general`
WHERE `userID` = 54 LIMIT 1
UNION
SELECT `posts`, `topics`
FROM `english`.`count`
WHERE `userID` = 54 LIMIT 1
Or a JOIN:
SELECT `general`.`userID`, `general`.`owner`, `general`.`server`,
`general`.`name`, `count`.`posts`, `count`.`topics`
FROM `english`.`general`
JOIN `english`.`count` ON
`general`.`userID`=`count`.`userID` AND `general`.`userID`=54
LIMIT 1
Which do you think would be the more efficient way and why? Or perhaps both are too messy to begin with?
It's not about efficiency, but about how they work.
UNION just concatenates the results of 2 independent queries, so you get 2 result sets one after another.
JOIN appends each row from one result set to each matching row from the other result set, so the final result set has "long" rows (in terms of the number of columns).
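To make the difference concrete, here is a small sketch using the general and count tables from the question. It runs on Python's built-in sqlite3 module rather than MySQL, and the row for user 54 is invented sample data.

```python
import sqlite3

# SQLite stands in for MySQL; the sample row for user 54 is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE general (userID INTEGER PRIMARY KEY, owner TEXT, server TEXT, name TEXT);
    CREATE TABLE "count" (userID INTEGER PRIMARY KEY, posts INTEGER, topics INTEGER);
    INSERT INTO general VALUES (54, 'alice', 'eu1', 'Alice');
    INSERT INTO "count" VALUES (54, 120, 7);
""")

# A UNION of a 4-column SELECT and a 2-column SELECT is rejected,
# because both sides must return the same number of columns.
try:
    conn.execute("""
        SELECT userID, owner, server, name FROM general WHERE userID = 54
        UNION
        SELECT posts, topics FROM "count" WHERE userID = 54
    """)
    union_error = None
except sqlite3.OperationalError as exc:
    union_error = str(exc)

# With matching column lists, UNION ALL stacks rows: more rows, same width.
stacked = conn.execute("""
    SELECT userID, 'general' AS source FROM general WHERE userID = 54
    UNION ALL
    SELECT userID, 'count' FROM "count" WHERE userID = 54
""").fetchall()

# A JOIN widens rows instead: one row carrying columns from both tables.
joined = conn.execute("""
    SELECT g.userID, g.owner, g.server, g.name, c.posts, c.topics
    FROM general AS g
    JOIN "count" AS c ON g.userID = c.userID
    WHERE g.userID = 54
""").fetchall()

print(union_error)
print(stacked)
print(joined)
```

For fetching one user's profile together with their post and topic counts, the joined shape is the one you want.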
Just for completeness as I don't think it's mentioned elsewhere: often UNION ALL is what's intended when people use UNION.
UNION will remove duplicates (so it is relatively expensive, because that typically requires a sort or similar extra work). It removes duplicates in the final result (so it doesn't matter whether the duplicate comes from within a single query or from the same data in different SELECTs). UNION is a set operation.
UNION ALL just sticks the results together: no sorting, no duplicate removal. This is going to be quicker (or at least no worse) than UNION.
If you know the individual queries won't return duplicate results, use UNION ALL. (In fact, it is often best to default to UNION ALL and only reach for UNION if you need that behaviour; using SELECT DISTINCT with UNION is redundant.)
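A quick sketch of the behaviour described above, run on Python's built-in sqlite3 module as a stand-in for MySQL. The tables a and b and their values are made up for the example.

```python
import sqlite3

# SQLite stands in for MySQL; tables a and b are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (val INTEGER);
    CREATE TABLE b (val INTEGER);
    INSERT INTO a VALUES (1), (1), (2);
    INSERT INTO b VALUES (2), (3);
""")

# UNION ALL keeps every row from both SELECTs, duplicates included.
union_all_rows = conn.execute(
    "SELECT val FROM a UNION ALL SELECT val FROM b"
).fetchall()

# UNION removes duplicates from the final result, whether they come from
# inside one SELECT (the two 1s) or from both SELECTs (the two 2s).
union_rows = conn.execute(
    "SELECT val FROM a UNION SELECT val FROM b"
).fetchall()

# Adding DISTINCT to each SELECT changes nothing once UNION is used.
distinct_union_rows = conn.execute(
    "SELECT DISTINCT val FROM a UNION SELECT DISTINCT val FROM b"
).fetchall()

print(len(union_all_rows))
print(sorted(union_rows))
print(sorted(distinct_union_rows))
```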
You want to use a JOIN. Joining is used to create a single set which is a combination of related data. Your union example doesn't make sense (and probably won't run). UNION is for linking two result sets with identical columns to create a set that has the combined rows (it does not 'union' the columns).
If you want to fetch users together with each user's posts and topics, you need to write a query using JOIN like this:
SELECT general.*,count.posts,count.topics FROM general LEFT JOIN count ON general.userID=count.userID
I have a very big table (nearly 2,000,000 records) that got split into 2 smaller tables. One table contains only records from the last week and the other contains all the rest (which is a lot...).
Now I have some stored procedures / functions that used to query the big table before it was split.
I still need them to query the union of both tables; however, creating a view that uses a UNION between the two tables seems to take forever...
That's my view:
CREATE VIEW `united_tables_view` AS select * from table1 union select * from table2;
And then I'd like to switch every stored procedure from selecting from 'oldBigTable' to selecting from 'united_tables_view'...
I've tried adding indexes to make it faster, but nothing helps...
Any ideas?
PS
The view and union are my idea, but any other creative idea would be perfect!
Bring it on!
Thanks!
Unless there is a reason not to, you should merge the tables rather than constantly query both of them.
Here is a question on StackOverflow about doing that:
How can I merge two MySQL tables?
If you need to keep them separate, you can use syntax along the lines of:
SELECT table1.column1, table2.column2 FROM table1, table2 WHERE table1.column1 = table2.column1;
Here is an article about when to use SELECT, JOIN and UNION
https://web.archive.org/web/1/http://articles.techrepublic%2ecom%2ecom/5100-10878_11-1050307.html