If I have three tables each having 1000 rows with number from 1 to 1000, then how many comparisons does MySQL during the following JOINs queries:
SELECT *
FROM table_1 t1
JOIN table_2 t2 ON t1.id_1 = t2.id_2
JOIN table_3 t3 ON t2.id_2 = t3.id_3;
and
SELECT *
FROM table_1 t1
JOIN table_2 t2
JOIN table_3 t3
WHERE t1.id_1 = t2.id_2 AND t2.id_2 = t3.id_3;
Note that there are no indexes on any of the tables. Is there a difference between the two queries? I'm thinking that in second query, it is written with the intention such that MySQL would create a cartesian product of all three tables and then filter out the rows matching the condition(having to scan 1000 X 1000 X 1000 rows) but internally it is translated to first query in which it would scan 1000 X 1000 rows in the first JOIN followed by another 1000 X 1000 rows in the second JOIN. The result should be same(1000 rows with three columns have same number 1 to 1000).
Which one is it? 1000 X 1000 X 1000 rows OR 1000 X 1000 + 1000 X 1000 rows
Question arose after reading the following from "Query Optimization" chapter from MySQL by Paul DuBois(4th edition, page 306):
Those two queries are treated the same.
For clarity, ON should state how the tables are related; WHERE should filter.
But, for optimization, those are equivalent for JOIN.
FOr LEFT JOIN, it does matter where you put the conditions.
As an extra wrinkle to your question,... If the tables had different numbers of rows, the Optimizer would probably start with the smallest table.
See this to find out how to count the rows: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#handler_counts
Please provide SHOW CREATE TABLE so we can understand id_1, etc.
It would be foolish to run such code without indexes.
What is the expected output? Something like this?
+---+---+---+
| n | n | n |
+---+---+---+
| 0 | 0 | 0 |
| 1 | 1 | 1 |
| 2 | 2 | 2 |
| 3 | 3 | 3 |
| 4 | 4 | 4 |
| 5 | 5 | 5 |
...
Related
I am trying to fetch distinct count of all columns in a single query. Consider the below table.
COL1 | COL2 | COL3
A | 5 | C
B | 5 | C
C | 5 | C
C | 5 | C
D | 7 | C
Expected result
DC_COL1 | DC_COL2 | DC_COL3 #DC - Distinct count
4 | 2 | 1
Though the above result can not be achieved (AFAIK) in a single query (single full table scan) using valid group by functions, what are the optimisations that could be done here?
Firing individual queries for each column might result in full table scan for each column. Though the entire table might have come to the buffer pool during the distinct count query for the first column but it will still be a performance issue on large tables.
It can be done in a single table scan:
SELECT
COUNT(DISTINCT COL1) DC_COL1,
COUNT(DISTINCT COL2) DC_COL2,
COUNT(DISTINCT COL3) DC_COL3
FROM tablename
During a table join, when does MySQL use this function?
The single result column that replaces two common columns is defined
using the coalesce operation. That is, for two t1.a and t2.a the
resulting single join column a is defined as a = COALESCE(t1.a, t2.a),
where:
COALESCE(x, y) = (CASE WHEN x IS NOT NULL THEN x ELSE y END)
https://dev.mysql.com/doc/refman/8.0/en/join.html
I know what the function does, but I want to know when it is used during the join operation. This just makes no sense to me! Can someone show me an example?
That is in reference to redundant column elimination during natural join and join with using. Describing how the columns are excluded from display.
The order of operation is described above the section you referenced.
First, coalesced common columns of the two joined tables, in the order in which they occur in the first table
Second, columns unique to the first table, in order in which they occur in that table
Third, columns unique to the second table, in order in which they occur in that table
Example
t1
| a | b | c |
| 1 | 1 | 1 |
t2
| a | b | d |
| 1 | 1 | 1 |
The join with using
SELECT * FROM t1 JOIN t2 USING (b);
Would result in, t1.b being coalesced (due to USING), followed by the columns unique to the first table, followed by those in the second table.
| b | a | c | a | d |
| 1 | 1 | 1 | 1 | 1 |
Whereas a natural join
SELECT * FROM t1 NATURAL JOIN t2;
Would result in, the t1 columns (or rather common columns from both tables) being coalesced, followed by the unique columns of the first table, followed by those in the second table.
| a | b | c | d |
| 1 | 1 | 1 | 1 |
I'm trying to export multiple MySQL tables into a single CSV file, these tables have different number of columns and have nothing in common. An example is below:
table1:
ID| Name
1 | Ted
2 | Marry
null| null
table2:
Married| Age | Phone
Y | 35 | No
N | 25 | Yes
Y | 45 | No
The result that I want to get is:
ID| Name | Married | Age | Phone
1 | Ted | Y | 35 | No
2 | Marry | N | 25 | Yes
null| null | Y | 45 | No
Is it possible to do using MySQL commands? I have tried all types of join but it doesn't give me the result that I need.
Try this query:
SELECT * FROM
(SELECT #n1:=#n1+1 as 'index', Table1.* FROM Table1, (select #n1:=0)t)t1
natural join (SELECT #n2:=#n2+1 as 'index', Table2.* FROM Table2, (select #n2:=0)t1)t2 ;
You can see in this fiddle a working example.
In the example we generate an index column for Table1 and Table2 and natural join the 2 tables.
This way the join is done using the row position as returned by the SELECT of tables without any ORDER operator and usually this is not a good idea!
I'm not quite sure you understand what you are asking but sure you can join two tables without really caring on which rows will be matched:
SET #i1=0;
SET #i2=0;
SELECT * INTO OUTFILE 'xyz' FIELDS TERMINATED BY ','
FROM (SELECT #i1:=#i1+1 as i1, table1.* FROM table1) AS a
JOIN (SELECT #i2:=#i2+1 as i2, table2.* FROM table2) b ON (a.i1=b.i2);
I have a database in which I need to find some missing entries and fill them in.
I have a table called "menu", each restaurant has multiple dishes and each dish has 4 different language entries (actually 8 in the main database but for simplicity lets go with 4), I need to find out which dishes for a particular restaurant are missing any language entries.
select * from menu where restaurantid = 1
i get stuck there, something along the lines of where language 1 or 2 or 3 or 4 doesn't exist which is the complicated bit because I need to see the languages that exist in order to see the language that's missing because I can't display something that isn't there. I hope that makes sense?
In the example table below restaurant 2 dishid 2 is missing language 3, that's what i need to find.
+--------------+--------+----------+-----------+
| RestaurantID | DishID | DishName | Language |
+--------------+--------+----------+-----------+
| 1 | 1 | Soup | 1 |
| 1 | 1 | Soúp | 2 |
| 1 | 1 | Soupe | 3 |
| 1 | 1 | Soupa | 4 |
| 1 | 2 | Bread | 1 |
| 1 | 2 | Bréad | 2 |
| 1 | 2 | Breade | 3 |
| 1 | 1 | Breada | 4 |
| 2 | 1 | Dish1 | 1 |
| 2 | 1 | Dísh1 | 2 |
| 2 | 1 | Disha1 | 3 |
| 2 | 1 | Dishe1 | 4 |
| 2 | 2 | Dish2 | 1 |
| 2 | 2 | Dísh2 | 2 |
| 2 | 2 | Dishe2 | 4 |
+--------------+--------+----------+-----------+
An anti-join pattern is usually the most efficient, in terms of performance.
Your particular case is a little more tricky, in that you need to "generate" rows that are missing. If every (ResturantID,DishID) should have 4 rows, with Language values of 1,2,3 and 4, we can generate that set of all rows with a CROSS JOIN operation.
The next step is to apply an anti-join... a LEFT OUTER JOIN to the rows that exist in the menu table, so we get all the rows from the CROSS JOIN set, along with matching rows.
The "trick" is to use a predicate in the WHERE clause that filters out rows where we found a match, so we are left rows that didn't have a match.
(It seems a bit strange at first, but once you get your brain wrapped around the anti-join pattern, it becomes familiar.)
So a query of this form should return the specified result set.
SELECT d.RestaurantID
, d.DishID
, lang.id AS missing_language
FROM (SELECT 1 AS id UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
) lang
CROSS
JOIN (SELECT e.RestaurantID, e.DishID
FROM menu e
GROUP BY e.RestaurantID, e.DishID
) d
LEFT
JOIN menu m
ON m.RestaurantID = d.RestaurantID
AND m.DishID = d.DishID
AND m.Language = lang.id
WHERE m.RestaurantID IS NULL
ORDER BY 1,2,3
Let's unpack that bit.
First we get a set containing the numbers 1 thru 4.
Next we get a set containing the (RestaurantID, DishID) distinct tuples. (For each distinct Restaurant, a distinct list of DishID, as long as there is at least one row for any Language for that combination.)
We do a CROSS JOIN, matching every row from set one (lang) with every row from set (d), to generate a "complete" set of every (RestaurantID, DishID, Language) we want to have.
The next part is the anti-join... the left outer join to menu to find which of the rows from the "complete" set has a matching row in menu, and filtering out all the rows that had a match.
That may be a little confusing. If we think of that CROSS JOIN operation producing a temporary table that looks like the menu table, but containing all possible rows... we can think of it in terms of pseudocode:
create temporary table all_menu_rows (RestaurantID, MenuID, Language) ;
insert into all_menu_rows ... all possible rows, combinations ;
Then the anti-join pattern is a little easier to see:
SELECT r.RestaurantID
, r.DishID
, r.Language
FROM all_menu_rows r
LEFT
JOIN menu m
ON m.RestaurantID = r.RestaurantID
AND m.DishID = r.DishID
AND m.Language = r.Language
WHERE m.RestaurantID IS NULL
ORDER BY 1,2,3
(But we don't have to incur the extra overhead of creating and populating the temporary table, we can do that right in the query.)
Of course, this isn't the only approach. We could use a NOT EXISTS predicate instead of an anti-join, though this is not usually as efficient. The first part of the query is the same, to generate the "complete" set of rows we expect to have; what differs is how we identify whether or not there is a matching row in the menu table:
SELECT d.RestaurantID
, d.DishID
, lang.id AS missing_language
FROM (SELECT 1 AS id UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
) lang
CROSS
JOIN (SELECT e.RestaurantID, e.DishID
FROM menu e
GROUP BY e.RestaurantID, e.DishID
) d
WHERE NOT EXISTS ( SELECT 1
FROM menu m
WHERE m.RestaurantID = d.RestaurantID
AND m.DishID = d.DishID
AND m.Language = lang.id
)
ORDER BY 1,2,3
For each row in the "complete" set (generated by the CROSS JOIN operation), we're going to run a correlated subquery that checks whether a matching row is found. The NOT EXISTS predicate returns TRUE if no matching row is found. (This is a little easier to understand, but it usually doesn't perform as well as the anti-join pattern.)
You can use the following statement if each menu item should have a record on each language (8 in real life 4 in example). You can change the number 4 to 8 if you want to see all menu items per restaurant that doesn't have all 8 entries.
SELECT RestaurantID,DishID, COUNT( * )
FROM Menu
GROUP BY RestaurantID,DishID
HAVING COUNT( * ) <4
I have two tables, one with ranges of numbers, second with numbers. I need to select all ranges, which have at least one number with status in (2,0). I have tried number of different joins, some of them took forever to execute, one which I ended with is fast, but it select really small number of ranges.
SELECT SQL_CALC_FOUND_ROWS md_number_ranges.*
FROM md_number_list
JOIN md_number_ranges
ON md_number_list.range_id = md_number_ranges.id
WHERE md_number_list.phone_num_status NOT IN (2, 0)
AND md_number_ranges.reseller_id=1
GROUP BY range_id
LIMIT 10
OFFSET 0
What i need is something like "select all ranges, join numbers where number.range_id = range.id and where there is at least one number with phone_number_status not in (2, 0).
Any help would be really appreciated.
Example data structure:
md_number_ranges:
id | range_start | range_end | reseller_id
1 | 000001 | 000999 | 1
2 | 100001 | 100999 | 2
md_number_list:
id | range_id | number | phone_num_status
1 | 1 | 0000001 | 1
2 | 1 | 0000002 | 2
3 | 2 | 1000012 | 0
4 | 2 | 1000015 | 2
I want to be able select range 1, because it has one number with status 1, but not range 2, because it has two numbers, but with status which i do not want to select.
It's a bit hard to tell what you want, but perhaps this will do:
SELECT *
from md_number_ranges m
join (
SELECT md_number_ranges.id
, count(*) as FOUND_ROWS
FROM md_number_list
JOIN md_number_ranges
ON md_number_list.range_id = md_number_ranges.id
WHERE md_number_list.phone_num_status NOT IN (2, 0)
AND md_number_ranges.reseller_id=1
GROUP BY range_id
) x
on x.id=m.id
LIMIT 10
OFFSET 0
Is this what you're looking for?
SELECT DISTINCT r.*
FROM md_number_ranges r
JOIN md_number_list l ON r.id = l.range_id
WHERE l.phone_num_status NOT IN (0,2)
SQL Fiddle Demo