Learning SQL: UNION or JOIN? - mysql

Forgive me if this seems like common sense as I am still learning how to split my data between multiple tables.
Basically, I have two:
general with the fields userID,owner,server,name
count with the fields userID,posts,topics
I wish to fetch the data from them and cannot decide how I should do it: in a UNION:
SELECT `userID`, `owner`, `server`, `name`
FROM `english`.`general`
WHERE `userID` = 54 LIMIT 1
UNION
SELECT `posts`, `topics`
FROM `english`.`count`
WHERE `userID` = 54 LIMIT 1
Or a JOIN:
SELECT `general`.`userID`, `general`.`owner`, `general`.`server`,
`general`.`name`, `count`.`posts`, `count`.`topics`
FROM `english`.`general`
JOIN `english`.`count` ON
`general`.`userID`=`count`.`userID` AND `general`.`userID`=54
LIMIT 1
Which do you think would be the more efficient way and why? Or perhaps both are too messy to begin with?

It's not about efficiency, but about how they work.
UNION just unions 2 different independent queries. So you get 2 result sets one after another.
JOIN appends each row from one result set to each row from another result set. So in total result set you have "long" rows (in terms of amount of columns)

Just for completeness as I don't think it's mentioned elsewhere: often UNION ALL is what's intended when people use UNION.
UNION will remove duplicates (so relatively expensive because it requires a sort). This remove duplicates in the final result (so it doesn't matter if there's a duplicate in a single query or the same data from individual SELECTs). UNION is a set operation.
UNION ALL just sticks the results together: no sorting, no duplicate removal. This is going to be quicker (or at least no worse) than UNION.
If you know the individual queries won't return duplicate results use UNION ALL. (In fact often best to assume UNION ALL and think about UNION if you need that behaviour; using SELECT DISTINCT with UNION is redundant).

You want to use a JOIN. Joining is used to creating a single set which is a combination of related data. Your union example doesn't make sense (and probably won't run). UNION is for linking two result sets with identical columns to create a set that has the combined rows (it does not 'union' the columns.)

If you want to fetch users and near user posts and topics. you need to write QUERY using JOIN like this:
SELECT general.*,count.posts,count.topics FROM general LEFT JOIN count ON general.userID=count.userID

Related

SELECT multiple columns from multiple tables and don't fill blank spaces

I have what I believe to be a pretty unique use case. I would like to be able to runs a single SELECT statement on a database where I get one column from four tables. I need to run where clauses on each different table where I have one main clause that will be across each of the tables and I am not able to JOIN because the data in each column will be a different length and I don't want to have duplicate items.
I have an example of the Select statement below. Also I understand if this is not possible.
SELECT s.service_id, u.id AS "user_id", h.mac_address, l.id AS "location_id" FROM services s
LEFT JOIN db.av_product ap ON s.product_id = ap.id
WHERE s.customer_code LIKE 'test_customer'
AND u.customer_code LIKE 'test_customer'
AND h.customer_code LIKE 'test_customer'
AND l.customer_code LIKE 'test_customer'
AND s.parent_id IS NULL
AND s.active=0
AND ap.sku NOT REGEXP 'fakeregex'
AND l.active = "1"
AND h.hardware_id NOT IN ('44','45')
AND (u.support_user != 1 OR u.support_user IS NULL);
TIA!
You will need to use joins for your tables to make a single query OR you can try multiple queries merged with UNION keyword.
If you want to make a single query, have a look about SELECT DISTINCT or GROUP BY for handling duplicates.
wut up?
do you know what UNION is?
The UNION operator is used to combine the result-set of two or more SELECT statements.
but every SELECT statement within UNION must have the same number of columns; so there we got a problem.
you can handle it with WHERE operator so I won't get in to it.
anyway, UNION!
shall we?
SELECT column_name(s) FROM table1
UNION
SELECT column_name(s) FROM table2;
anyway; your solution is UNION, maybe not like what I wrote.
you can try this link too.
https://www.w3schools.com/mysql/mysql_union.asp
have a clean code

How to avoid running an expensive sub-query twice in a union

I want to union two queries. Both queries use an inner join into a data set, that is very intensive to compute, but the dataset query is the same for both queries. For example:
SELECT veggie_id
FROM potatoes
INNER JOIN ( [...] ) massive_market
ON massive_market.potato_id=potatoes.potato_id
UNION
SELECT veggie_id
FROM carrots
INNER JOIN ( [...] ) massive_market
ON massive_market.carrot_id=carrots.carrot_id
Where [...] corresponds to a subquery that takes a second to compute, and returns rows of at least carrot_id and potato_id.
I want to avoid having the query for massive_market [...] twice in my overal query.
Whats the best way to do this?
If that subquery takes more than a second to run, I'd say it's down to an indexing issue as opposed to the query itself (of course, without seeing that query, that is somewhat conjecture, I'd recommend posting that query too). In my experience, 9/10 slow queries issues are down to improper indexing of the database.
Ensure veggie_id, potato_id and carrot_id are indexed
Also, if you're using any joins in the massive_market subquery, ensure the columns you're performing the joins on are indexed too.
Edit
If indexing has been done properly, the only other solution I can think of off the top of my head is:
CREATE TEMPORARY TABLE tmp_veggies (potato_id [datatype], carrot_id [datatype]);
INSERT IGNORE INTO tmp_veggies (potato_id, carrot_id) select potatoes.veggie_id, carrots.veggie_id from [...] massive_market
RIGHT OUTER JOIN potatoes on massive_market.potato_id = potatoes.potato_id
RIGHT OUTER JOIN carrots on massive_market.carrot_id = carrots.carrot_id;
SELECT carrot_id FROM tmp_veggies
UNION
SELECT potato_id FROM tmp_veggies;
This way, you've reversed the query so it's only running the massive subquery once and the UNION is happening on the temporary table (which'll be dropped automatically but not until the connection is closed, so you may want to drop the table manually). You can add any additional columns you need into the CREATE TEMPORARY TABLE and SELECT statement
The goal is to pull all repeated query-strings out of the list of query-strings requiring the repeated query-strings. So I kept potatoes and carrots within one unionizing subquery, and placed massive_market afterwards and outside this unification.
This seems obrvious, but my question originated from a much more complex query, and the work needed to pull this strategy off was a bit more involving in my case. For my simple example in my question above, this would resolve in something like:
SELECT veggie_id
FROM (
SELECT veggie_id, potato_id, NULL AS carrot_id FROM potatoes
UNION
SELECT veggie_id, NULL AS potato_id, carrot_id FROM carrots
) unionized
INNER JOIN ( [...] ) massive_market
ON massive_market.potato_id=unionized.potato_id
OR massive_market.carrot_id=unionized.carrot_id

Multiple table counts in one query

I've done some searching around but I haven't found a clear answer and explanation to my question.
I have 5 tables called table1, table2, table3, table4 and table5 and I want to do COUNT(*) on each of the tables to get the number of rows.
Should I try to combine these into one query or use 5 separate queries? I have always been taught that the least number of queries the better so I am guessing I should try to combine them into one query.
One way of doing it is to use UNION but does anyone know what the most efficient way of doing this is and why?
Thanks for any help.
Assuming you just want a count(*) from each one, then
SELECT
( SELECT count(*) from table1 ) AS table1,
( SELECT count(*) from table2 ) AS table2,
( SELECT count(*) from table3 ) AS table3,
etc...
)
would give you those counts as a single row. The DB server would still be running n+1 queries (n tables, 1 parent query) to get those counts, but it'd be the same story if you were using a UNION anyways. The union would produce multiple rows with 1 value in each, v.s. the 1 row with multiple values of the subselect method.
Provided you have read access to this (so rather not on shared hosting where you don’t have your actual own database instance) you could read that info from the INFORMATION_SCHEMA TABLES table, that does have a TABLE_ROWS column.
(Be aware of what it says for InnoDB tables there – so if you don’t you MyISAM and need the precise counts, the other answer has the better solution.)

Using LIMIT's during one-to-many relationship Queries

I have the two tables tabA and tabB, and there is a one-to-many relationship from tabA to tabB. I have the query:
SELECT * FROM `tabA` LEFT JOIN `tabB` ON `tabA`.`aID` = `tabB`.`aID`
and the rows that are returned is a large set with multiple duplicates from tabA for each tabB reference to tabA.
I am aware that I can use GROUP BY to limit the tabA rows to unique elements, unless I use custom field(s) using the GROUP_CONCAT function, combined with two REPLACE functions for escaping (which seriously impacts performance), I loose all bar one of the rows contained in tabB. An example query looks like:
SELECT `tabA`.*,
GROUP_CONCAT(REPLACE(REPLACE(`tabB`.`tabBCol1`, '/', '//'), ',', '/,')) AS `tabBCol`,
GROUP_CONCAT(REPLACE(REPLACE(`tabB`.`tabBCol2`, '/', '//'), ',', '/,')) AS `tabBCo2`
FROM `tabA`
LEFT JOIN `tabB` ON `tabA`.`aID` = `tabB`.`aID`
GROUP BY `tabA`.`aID`
That query will allow me to use the LIMIT syntax so I can (for example) only show 5 entries, starting after 5 (i.e. LIMIT 5,5). And when I apply that to the former query, then I won't get the next 5 queries, but a random set of data based on the numbers of associations.
So, apart from the second query, is there any way that I can fetch the rows, with there associations, but allow the use of the LIMIT syntax, and without the performance hit of excessive REPLACE functions?
ADDITIONAL
Although I can use multiple subqueries for each row, using the first query with GROUP BY syntax (which would allow me to apply any WHERE conditions for the associations), I am trying to find a way to avoid the N+1 Selects Problem (although in this example, my LIMIT syntax is LIMIT 5,5, I will be applying this to much larger LIMITs (upto 1000 rows at a time)).
Try two queries:
// get those 5 records
SELECT * FROM Cars WHERE some_conditon = blabla LIMIT 5;
// get all associated records from related table
SELECT * FROM Wheels WHERE car_id IN (1, 3, 5, 123, 16);
In the result there will not be any N problem as you will always have two queries. Even if you will have 1000 records in 1st query it will always be better to use this simple method, than joins/groups by/concats/etc.

Striping of results from MySQL UNION

I am trying to conceptualize how to set up UNION of 3 tables that will allow for ordering in a striping fashion.
Top 5 from the UNION of Tables A,B,C
with results ordered like so:
A
B
C
A
B
C
....
Is this sort of thing possibe with SQL and more specifically MySQL?
Personally, I would pull the three queries out separately, and then process through them in your favourite programming language. The queries should run faster like this, as they would not be so complex.
It should be possible using only SQL though, by adding a couple of columns to your output for each of the three queries, and then wrapping the whole lot in an outer select such as;
SELECT * FROM ( <PUT THE FULL UNION HERE> ) ORDER BY table_name, row_count
You'd need to alter each of the queries like this;
##rowCount=0;
SELECT 'A' AS table_name, (##rowCount+1) AS row_count, <remaining fields>
FROM table_A;
Now, I'm not totally sure of the syntax for the incrementing row counter, but I've seen it done elsewhere (probably on StackOverflow somewhere), so maybe someone else can help with that part? (Or you may find the answer by searching this site...)