LEFT JOIN takes a very long time on 150 000 rows - MySQL

I am having some difficulty accomplishing a task.
Here is some data from the orders table:
+----+---------+
| id | bill_id |
+----+---------+
| 3 | 1 |
| 9 | 3 |
| 10 | 4 |
| 15 | 6 |
+----+---------+
And here is some data from the bills table:
+----+
| id |
+----+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
+----+
I want to list all the bills that have no order associated with them.
In order to achieve that, I thought that a LEFT JOIN was appropriate, so I wrote this query:
SELECT * FROM bills
LEFT JOIN orders
ON bills.id = orders.bill_id
WHERE orders.bill_id IS NULL;
I expected the following result:
+----------+-----------+----------------+
| bills.id | orders.id | orders.bill_id |
+----------+-----------+----------------+
| 2 | NULL | NULL |
| 5 | NULL | NULL |
+----------+-----------+----------------+
But the query never finishes: it ran for more than 5 minutes without returning a result, and I stopped it because that can't be an acceptable production time anyway.
My real dataset has more than 150 000 orders and 100 000 bills. Is the dataset too big?
Is my query wrong somewhere?
Thank you very much for your tips!
EDIT: side note, the tables have no foreign keys defined... *flies away*

Your query is fine. I would write it with table aliases:
SELECT b.*
FROM bills b LEFT JOIN
orders o
ON b.id = o.bill_id
WHERE o.bill_id IS NULL;
You probably don't need the NULL columns from orders.
You need an index on orders(bill_id):
create index idx_orders_billid on orders(bill_id);
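A minimal sketch to check the anti-join end to end, assuming SQLite (through Python's standard sqlite3 module) as a stand-in for MySQL and the sample rows from the question. It also runs a NOT EXISTS version, a swapped-in equivalent way to write the same anti-join that is not part of the answer, and collects the query plan so you can see whether the index is used.

```python
import sqlite3

def bills_without_orders(bill_ids, orders):
    """Return (LEFT JOIN result, NOT EXISTS result, plan steps) for the given rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE bills (id INTEGER PRIMARY KEY)")
    con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, bill_id INTEGER)")
    con.executemany("INSERT INTO bills VALUES (?)", [(b,) for b in bill_ids])
    con.executemany("INSERT INTO orders VALUES (?, ?)", orders)
    # The index from the answer: each bill becomes an index lookup
    # instead of a scan over the whole orders table.
    con.execute("CREATE INDEX idx_orders_billid ON orders(bill_id)")

    left_join_sql = """
        SELECT b.id FROM bills b
        LEFT JOIN orders o ON b.id = o.bill_id
        WHERE o.bill_id IS NULL
        ORDER BY b.id"""
    not_exists_sql = """
        SELECT b.id FROM bills b
        WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.bill_id = b.id)
        ORDER BY b.id"""

    left_join = [row[0] for row in con.execute(left_join_sql)]
    not_exists = [row[0] for row in con.execute(not_exists_sql)]
    # The last column of EXPLAIN QUERY PLAN is the human-readable step description.
    plan = [row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + left_join_sql)]
    con.close()
    return left_join, not_exists, plan

left_join, not_exists, plan = bills_without_orders(
    [1, 2, 3, 4, 5, 6], [(3, 1), (9, 3), (10, 4), (15, 6)]
)
print(left_join, not_exists)  # [2, 5] [2, 5]
print(plan)
```

In MySQL itself, prefixing the query with EXPLAIN gives the equivalent check on whether idx_orders_billid is picked up.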

From your WHERE clause, I assume you're looking for orders that have no bills.
If that's the case, you don't need to join to the bills table, as those bills would by definition not exist.
You will find this a much better-performing query:
SELECT * FROM orders
WHERE orders.bill_id IS NULL;
Edit:
Sorry, I missed your "I want to list all the bills that have no order associated with." when reading the question. As @gordon pointed out, an index would certainly help. However, if changing the schema is feasible, I would rather have a nullable bills.order_id column instead of orders.bill_id: you then won't need a LEFT JOIN, and an INNER JOIN would suffice to get the bills of orders, which would be a quicker query for your other assumed requirements.
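A sketch of the schema change suggested in the edit, assuming SQLite via Python's sqlite3 and the question's sample rows. The bills.order_id column and the index name are hypothetical; note that with this layout each bill refers to at most one order.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY);
    -- Hypothetical redesign: the bill points at its order.
    -- Columns are nullable by default; NULL means "no order".
    CREATE TABLE bills (
        id INTEGER PRIMARY KEY,
        order_id INTEGER REFERENCES orders(id)
    );
    CREATE INDEX idx_bills_order_id ON bills(order_id);
    INSERT INTO orders VALUES (3), (9), (10), (15);
    INSERT INTO bills VALUES (1, 3), (2, NULL), (3, 9), (4, 10), (5, NULL), (6, 15);
""")

# Bills without an order: a plain filter, no join at all.
bills_without_order = [r[0] for r in con.execute(
    "SELECT id FROM bills WHERE order_id IS NULL ORDER BY id")]

# Orders together with their bills: an INNER JOIN is enough.
order_bills = con.execute(
    "SELECT o.id, b.id FROM orders o JOIN bills b ON b.order_id = o.id ORDER BY o.id"
).fetchall()

print(bills_without_order)  # [2, 5]
print(order_bills)          # [(3, 1), (9, 3), (10, 4), (15, 6)]
```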

Related

MySQL: Group by all weeks of a year with 0 included

I have a question about some MySQL code.
I have a table listing employees with their date of arrival and the project ID. I want to count all the arrivals in the company and group them by week.
At the moment, I get this result:
Project ID | Week | Count
1 | 2019-S01 | 2
1 | 2019-S03 | 1
2 | 2019-S01 | 1
2 | 2019-S04 | 5
2 | 2019-S05 | 3
2 | 2019-S06 | 2
This is good, but I would like to have all the weeks returned, even if a week has 0 as its result:
Project ID | Week | Count
1 | 2019-S01 | 2
1 | 2019-S02 | 0
1 | 2019-S03 | 1
...
2 | 2019-S01 | 1
2 | 2019-S02 | 0
2 | 2019-S03 | 0
2 | 2019-S04 | 5
2 | 2019-S05 | 3
2 | 2019-S06 | 2
...
Here is my current code:
SELECT
AP.SECTION_ANALYTIQUE AS SECTION,
FS_GET_FORMAT_SEMAINE(AP.DATE_ARRIVEE_PROJET) AS SEMAINE,
Count(*) AS COMPTE
FROM
RT00_AFFECTATIONS_PREV AP
WHERE
(AP.DATE_ARRIVEE_PROJET <= CURDATE() AND Year(AP.DATE_ARRIVEE_PROJET) >= Year(CURDATE()))
GROUP BY
SECTION, SEMAINE
ORDER BY
SECTION
Does anybody have a solution?
I searched the internet but didn't find anything relevant :(
Thank you in advance! :)
The classic way to meet this requirement is to create a referential table to store all possible weeks.
create table all_weeks(week varchar(8) primary key);
insert into all_weeks values
('2019-S01'), ('2019-S02'), ('2019-S03'), ('2019-S04'), ('2019-S05'), ('2019-S06');
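Typing every week by hand gets tedious; a recursive common table expression can fill all_weeks instead. This is a sketch assuming SQLite via Python's sqlite3 (MySQL 8.0 also supports WITH RECURSIVE, but its string functions differ, so SQLite's printf is specific to this sketch). It assumes weeks run from S01 to S52 in the 'YYYY-Snn' form seen in the question; whether FS_GET_FORMAT_SEMAINE can return S53 or S00 is unknown, so adjust n_weeks if needed.

```python
import sqlite3

def fill_all_weeks(con, year, n_weeks=52):
    """Populate all_weeks with labels 'YYYY-S01' .. 'YYYY-Snn' (assumed format)."""
    con.execute("CREATE TABLE IF NOT EXISTS all_weeks (week TEXT PRIMARY KEY)")
    con.execute("""
        WITH RECURSIVE n(k) AS (
            SELECT 1
            UNION ALL
            SELECT k + 1 FROM n WHERE k < ?
        )
        INSERT INTO all_weeks (week)
        SELECT printf('%d-S%02d', ?, k) FROM n
    """, (n_weeks, year))

con = sqlite3.connect(":memory:")
fill_all_weeks(con, 2019)
weeks = [r[0] for r in con.execute("SELECT week FROM all_weeks ORDER BY week")]
print(len(weeks), weeks[0], weeks[-1])  # 52 2019-S01 2019-S52
```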
Once this is done, you can generate a Cartesian product of all possible sections and weeks with a CROSS JOIN, and LEFT JOIN that with the original table.
Given your code snippet, this should look like:
SELECT
s.section_analytique AS section,
w.week AS semaine,
COUNT(ap.section_analytique) AS compte
FROM
(SELECT DISTINCT section_analytique from rt00_affectations_prev) s
CROSS JOIN all_weeks w
LEFT JOIN rt00_affectations_prev ap
ON s.section_analytique = ap.section_analytique AND w.week = FS_GET_FORMAT_SEMAINE(ap.date_arrivee_projet)
GROUP BY s.section_analytique, w.week
ORDER BY s.section_analytique
PS: be careful not to put conditions on the original table in the WHERE clause: this would defeat the purpose of the LEFT JOIN. If you need to do some filtering, use the referential table instead (you might need to add a few columns to it, like the starting date of the week maybe).
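A runnable sketch of the calendar-table approach, assuming SQLite (via Python's sqlite3) instead of MySQL. Because FS_GET_FORMAT_SEMAINE is the asker's own function and its definition isn't shown, the sketch stores a precomputed week label in a hypothetical `semaine` column instead; the sample rows reproduce the counts from the question.

```python
import sqlite3

def weekly_counts(rows, weeks):
    """rows: (section, week label) pairs; weeks: every label that must appear."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE all_weeks (week TEXT PRIMARY KEY)")
    con.executemany("INSERT INTO all_weeks VALUES (?)", [(w,) for w in weeks])
    # Hypothetical simplification: `semaine` holds the precomputed week label,
    # standing in for FS_GET_FORMAT_SEMAINE(date_arrivee_projet).
    con.execute("CREATE TABLE rt00_affectations_prev (section_analytique INTEGER, semaine TEXT)")
    con.executemany("INSERT INTO rt00_affectations_prev VALUES (?, ?)", rows)
    result = con.execute("""
        SELECT s.section_analytique, w.week, COUNT(ap.section_analytique)
        FROM (SELECT DISTINCT section_analytique FROM rt00_affectations_prev) s
        CROSS JOIN all_weeks w
        LEFT JOIN rt00_affectations_prev ap
               ON ap.section_analytique = s.section_analytique
              AND ap.semaine = w.week
        GROUP BY s.section_analytique, w.week
        ORDER BY s.section_analytique, w.week
    """).fetchall()
    con.close()
    return result

weeks = [f"2019-S{n:02d}" for n in range(1, 7)]
# Same counts as the question's current output, one row per arrival.
rows = ([(1, "2019-S01")] * 2 + [(1, "2019-S03")]
        + [(2, "2019-S01")] + [(2, "2019-S04")] * 5
        + [(2, "2019-S05")] * 3 + [(2, "2019-S06")] * 2)
counts = weekly_counts(rows, weeks)
for line in counts:
    print(line)
```

Counting a column from the LEFT JOINed table (COUNT(ap.section_analytique)) rather than COUNT(*) is what yields 0 for empty weeks: COUNT(*) would count the NULL-extended row as 1.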

Joining on a table that references another table's column twice

I have this table Meeting in PostgreSQL that looks like:
id | subject | time |
1 | Eat stuff| 2017-08-23 00:00:00 |
2 | Do stuff | 2017-08-28 00:00:00 |
And another table SimilarMeeting that looks like:
meetingId | similarId | score |
1 | 2 | 0.8 |
where SimilarMeeting.meetingId and SimilarMeeting.similarId are foreign keys to the Meeting table. I'm trying to generate a join statement that'd give me a result that looks like:
meetingSubject | similarSubject | score
Eat stuff | Do stuff | 0.8
I'm kind of stumped here, since two inner joins don't seem to work: the first and second inner joins reference two different Meeting rows.
select "ma"."subject", "mb"."subject",
"eva"."SimilarMeeting"."similarityScore" from "eva"."SimilarMeeting"
join "eva"."Meeting" AS ma on "eva"."SimilarMeeting"."meetingId" = "ma"."id"
join "eva"."Meeting" AS mb on "eva"."SimilarMeeting"."similarId" = "ma"."id"
In the last line, in the join condition, you compare with ma instead of mb. It should be:
join "eva"."Meeting" AS mb on "eva"."SimilarMeeting"."similarId" = "mb"."id"
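A sketch of the self-join with two aliases on the question's sample rows, assuming SQLite via Python's sqlite3 in place of PostgreSQL. The "eva" schema qualifier is dropped, and the score column uses the name from the sample table (score) rather than "similarityScore" from the query.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Meeting (id INTEGER PRIMARY KEY, subject TEXT, time TEXT);
    CREATE TABLE SimilarMeeting (
        meetingId INTEGER REFERENCES Meeting(id),
        similarId INTEGER REFERENCES Meeting(id),
        score REAL
    );
    INSERT INTO Meeting VALUES (1, 'Eat stuff', '2017-08-23 00:00:00'),
                               (2, 'Do stuff',  '2017-08-28 00:00:00');
    INSERT INTO SimilarMeeting VALUES (1, 2, 0.8);
""")

rows = con.execute("""
    SELECT ma.subject AS meetingSubject, mb.subject AS similarSubject, sm.score
    FROM SimilarMeeting sm
    JOIN Meeting ma ON sm.meetingId = ma.id   -- first copy: the meeting itself
    JOIN Meeting mb ON sm.similarId = mb.id   -- second copy: the similar meeting
""").fetchall()
print(rows)  # [('Eat stuff', 'Do stuff', 0.8)]
```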

Select count with value from different tables

I want to count all entries in one table grouped by the user id.
This is the query I used, which works fine:
select uuid_mapping_id, count(*) from t_message group by uuid_mapping_id;
and these are the results:
+-----------------+----------+
| uuid_mapping_id | count(*) |
+-----------------+----------+
| 1 | 65 |
| 4 | 277 |
Now I would like to display the actual user name, instead of the ID.
To achieve this, I need data from two other tables.
The table t_uuid_mapping has two columns:
uid_mapping_id, which equals uuid_mapping_id in the other table,
and f_uuid, which is also unique but completely different.
f_uuid can also be found in another table, t_abook, which also contains the names in the column f_name.
The result I am looking for should be:
+-----------------+----------+
| f_name | count(*) |
+-----------------+----------+
| admin | 65 |
| user1 | 277 |
I am new to databases and understand that this could be achieved by using a JOIN in the query, but to be honest I haven't completely understood joins yet.
If I understand you correctly, you need to go through t_uuid_mapping to reach the names in t_abook:
SELECT ta.f_name, COUNT(*) AS count
FROM t_message tm
LEFT JOIN t_uuid_mapping um ON tm.uuid_mapping_id = um.uid_mapping_id
LEFT JOIN t_abook ta ON um.f_uuid = ta.f_uuid
GROUP BY ta.f_name;
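A sketch of the two-hop join (message to mapping by id, mapping to address book by f_uuid) on made-up data, assuming SQLite via Python's sqlite3. The table and column names come from the question; the extra id column on t_message and all the values are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t_message (id INTEGER PRIMARY KEY, uuid_mapping_id INTEGER);
    CREATE TABLE t_uuid_mapping (uid_mapping_id INTEGER PRIMARY KEY, f_uuid TEXT UNIQUE);
    CREATE TABLE t_abook (f_uuid TEXT PRIMARY KEY, f_name TEXT);
    -- Hypothetical sample rows; the f_uuid values are made up.
    INSERT INTO t_uuid_mapping VALUES (1, 'uuid-admin'), (4, 'uuid-user1');
    INSERT INTO t_abook VALUES ('uuid-admin', 'admin'), ('uuid-user1', 'user1');
    INSERT INTO t_message (uuid_mapping_id) VALUES (1), (1), (1), (4), (4);
""")

# Hop 1: message -> mapping (by mapping id). Hop 2: mapping -> address book (by f_uuid).
rows = con.execute("""
    SELECT ta.f_name, COUNT(*) AS count
    FROM t_message tm
    LEFT JOIN t_uuid_mapping um ON tm.uuid_mapping_id = um.uid_mapping_id
    LEFT JOIN t_abook ta ON um.f_uuid = ta.f_uuid
    GROUP BY ta.f_name
    ORDER BY ta.f_name
""").fetchall()
print(rows)  # [('admin', 3), ('user1', 2)]
```

With LEFT JOINs, messages whose mapping has no address-book entry are counted under a NULL name instead of disappearing; use plain JOINs if you would rather drop them.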

MySQL Many to Many Table Join Slow Performance

I have two tables with a joining column having a Many to Many relationship. There are a few hundred thousand records in each table. I'm seeing some very slow query performance and am having trouble singling out the issue.
Table_A:
+----+------------------+------------+---------------+
| ID | Name varchar(30) | Age int(3) | Status int(1) |
+----+------------------+------------+---------------+
| 1  | Tom              | 23         | 1             |
| 2  | Jerry            | 34         | 2             |
| 3  | Smith            | 21         | 1             |
| 4  | Ben              | 46         | 5             |
+----+------------------+------------+---------------+
Table_B:
+----+------------------+-------------+---------------+
| ID | Name varchar(30) | Sign int(3) | Status int(1) |
+----+------------------+-------------+---------------+
| 1  | Tom              | 12          | 1             |
| 2  | Smith            | 8           | 1             |
| 3  | Tom              | 3           | 0             |
| 4  | Tom              | 10          | 1             |
+----+------------------+-------------+---------------+
I need to get the Age of each Name in Table A who has at least one row in Table B with a match on Name and a Status (Table B) of 1.
I tried:
SELECT Age FROM Table_A
LEFT JOIN Table_B ON Table_A.Name=Table_B.Name
WHERE Table_B.Status=1;
That query takes so long that I haven't waited for it to return.
I then tried:
SELECT DISTINCT Age FROM Table_A
LEFT JOIN Table_B ON Table_A.Name=Table_B.Name AND Table_B.Status=1;
That returned very fast.
I tested further and tried:
SELECT DISTINCT Age FROM Table_A
LEFT JOIN Table_B ON Table_A.Name=Table_B.Name
WHERE Table_B.Status=1;
That again didn't return.
I'm confused as to what's going on here.
In the last query shouldn't the WHERE condition act the same as the previous query's JOIN ON condition (Status=1)?
Why does SELECT DISTINCT return results whereas without using DISTINCT the process takes forever?
For a many-to-many table, do not include an AUTO_INCREMENT. Do have the PRIMARY KEY include both other ids. Do have another index with the same two ids in the opposite order. Do use InnoDB.
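A minimal sketch of that junction-table layout, assuming SQLite via Python's sqlite3 as a stand-in. SQLite has no InnoDB; WITHOUT ROWID is used here to approximate InnoDB's clustering of rows by primary key. The names a_b, a_id and b_id are hypothetical, and the reverse index (b_id, a_id) is one reading of "another index".

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Hypothetical junction table between table a and table b.
    CREATE TABLE a_b (
        a_id INTEGER NOT NULL,
        b_id INTEGER NOT NULL,
        PRIMARY KEY (a_id, b_id)          -- both ids; no AUTO_INCREMENT column
    ) WITHOUT ROWID;                      -- rows stored in primary-key order
    CREATE INDEX idx_a_b_reverse ON a_b (b_id, a_id);  -- lookups from the b side
""")

con.execute("INSERT INTO a_b VALUES (1, 10)")
try:
    con.execute("INSERT INTO a_b VALUES (1, 10)")  # same pair again
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = con.execute("SELECT COUNT(*) FROM a_b").fetchone()[0]
print(duplicate_rejected, row_count)  # True 1
```

The composite key both prevents duplicate pairs and serves lookups from the a side, so a surrogate id would only add a third index to maintain.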
See More details, plus rationale.
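Without seeing an execution plan (in MySQL, the output of EXPLAIN), it's impossible to say for certain.
My guess would be that the server knows that your OUTER JOIN to table B is completely irrelevant when you use SELECT DISTINCT, so it just runs against table A and gets the Age values from there without even performing the JOIN. Do you see why the OUTER JOIN is irrelevant? A LEFT JOIN keeps every row of table A whether or not it matches, the condition on table B sits in the ON clause, and DISTINCT removes any duplicates that extra matches would add, so the result is exactly SELECT DISTINCT Age FROM Table_A.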
In the first query the server needs to perform the JOIN to get the right number of rows back.
When you add the additional logic to your WHERE clause in the last query you've effectively turned it into an INNER JOIN, so now the JOIN has to happen again and it takes a long time.
Make sure you have indexes on the Table_A.Name, Table_B.Name and Table_B.Status columns.
First, you don't need a LEFT JOIN, because you only care about matches:
SELECT a.Age
FROM Table_A a JOIN
Table_B b
ON a.Name = b.Name
WHERE b.Status = 1;
This query can take advantage of indexes on Table_B(Status, Name) and Table_A(Name, Age).
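A small check of the inner-join rewrite on the sample data, assuming SQLite via Python's sqlite3. It also shows why the join can return repeated ages (Tom has two qualifying rows in Table_B), alongside an EXISTS semi-join, a swapped-in alternative not from the answer, which returns each Table_A row at most once without needing DISTINCT.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Table_A (ID INTEGER PRIMARY KEY, Name VARCHAR(30), Age INT, Status INT);
    CREATE TABLE Table_B (ID INTEGER PRIMARY KEY, Name VARCHAR(30), Sign INT, Status INT);
    INSERT INTO Table_A VALUES (1, 'Tom', 23, 1), (2, 'Jerry', 34, 2),
                               (3, 'Smith', 21, 1), (4, 'Ben', 46, 5);
    INSERT INTO Table_B VALUES (1, 'Tom', 12, 1), (2, 'Smith', 8, 1),
                               (3, 'Tom', 3, 0), (4, 'Tom', 10, 1);
    -- Composite indexes suggested in the answer.
    CREATE INDEX idx_b_status_name ON Table_B (Status, Name);
    CREATE INDEX idx_a_name_age ON Table_A (Name, Age);
""")

join_ages = sorted(r[0] for r in con.execute("""
    SELECT a.Age
    FROM Table_A a JOIN Table_B b ON a.Name = b.Name
    WHERE b.Status = 1"""))

exists_ages = sorted(r[0] for r in con.execute("""
    SELECT a.Age
    FROM Table_A a
    WHERE EXISTS (SELECT 1 FROM Table_B b WHERE b.Name = a.Name AND b.Status = 1)"""))

print(join_ages)    # [21, 23, 23]  (Tom matches two rows)
print(exists_ages)  # [21, 23]
```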

MySQL Intermediate-Level Table Relationship

Each row in Table_1 needs to have a relationship with one or more rows that might come from any number of other tables in the database (Table_X). So I set up an intermediate table (Table_2) where each row contains an id from Table_1 and the id from Table_X. It also has its own auto-increment id, since none of the relationships will be exclusive and therefore neither of the other ids will be unique in the table.
My problem now is that when I retrieve a row from Table_1 and want to see the information from each related row in Table_X, I don't know how to get it. At first I thought I could add a column with the exact name of Table_X to each row in Table_2 and run a second SELECT statement using that information, but I keep seeing hints about things such as foreign keys and join statements that I think I need to get into. I'm just having trouble sorting it all out. Do I even need Table_2?
This probably isn't overly complicated, but I'm just getting into MySQL and this is the first real challenge I've encountered.
Edit to include requested information: If I understand correctly, I think I'm dealing with a many to many relationship. Table_3 has games; Table_1 has articles. An article can be about multiple games, and a game can also have multiple articles written about it. The only other possibly pertinent information I can see is that when a new article is made, every game that will be related to it is decided all at once. But the list of articles related to a given game can grow over time as more articles are written. That's probably not especially important, however.
If I understood correctly, you are talking about a one-to-many relationship in a database (for example, one person can have multiple phone numbers). You can store the data in two separate tables, persons and phones.
Persons:
|person_id|person_name |person_age |
| 1 | Bodan Kustan| 28 |
Phones:
|phone_id |person_id |phone_number|
| 1 | 1 | 31337 |
| 2 | 1 | 370 |
Then you can execute a query with a JOIN:
SELECT * FROM `persons`
LEFT JOIN `phones` ON `persons`.`person_id` = `phones`.`person_id`
WHERE `persons`.`person_id` = 1;
And it will return the person once per phone number:
|person_id|person_name |person_age |phone_id |person_id |phone_number|
| 1 | Bodan Kustan| 28 | 1 | 1 | 31337 |
| 1 | Bodan Kustan| 28 | 2 | 1 | 370 |
Another possibility is a many-to-many relationship (for example, any person can love pizza, and pizza is not unique to that person). Then you need a third table, person_food, to join the tables together:
Persons:
|person_id|person_name |person_age |
| 1 | Bodan Kustan| 28 |
Food:
|food_id |food_name |
| 1 | meat |
| 2 | pizza |
Person_Food
|person_id |food_id |
| 1 | 2 |
Then you can execute a query with JOINs:
SELECT * FROM `persons`
LEFT JOIN `person_food` ON `persons`.`person_id` = `person_food`.`person_id`
LEFT JOIN `food` ON `food`.`food_id` = `person_food`.`food_id`
WHERE `persons`.`person_id` = 1;
And it will return data from all tables:
|person_id|person_name |person_age |person_id |food_id |food_name |
| 1 | Bodan Kustan| 28 | 1 | 2 | pizza |
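A runnable version of the many-to-many example, assuming SQLite via Python's sqlite3. Beyond the answer's tables, it gives person_food a composite primary key (instead of a separate auto-increment id) plus a reverse index, and uses inner JOINs so only matching rows come back. The same junction-table shape would fit the articles/games case in the question, for example a hypothetical article_game table holding article_id and game_id.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE persons (person_id INTEGER PRIMARY KEY, person_name TEXT, person_age INTEGER);
    CREATE TABLE food (food_id INTEGER PRIMARY KEY, food_name TEXT);
    -- Junction table: one row per (person, food) pair.
    CREATE TABLE person_food (
        person_id INTEGER NOT NULL REFERENCES persons(person_id),
        food_id   INTEGER NOT NULL REFERENCES food(food_id),
        PRIMARY KEY (person_id, food_id)
    );
    CREATE INDEX idx_person_food_food ON person_food (food_id, person_id);
    INSERT INTO persons VALUES (1, 'Bodan Kustan', 28);
    INSERT INTO food VALUES (1, 'meat'), (2, 'pizza');
    INSERT INTO person_food VALUES (1, 2);
""")

# One direction: what does person 1 like?
foods = con.execute("""
    SELECT p.person_name, f.food_name
    FROM persons p
    JOIN person_food pf ON pf.person_id = p.person_id
    JOIN food f ON f.food_id = pf.food_id
    WHERE p.person_id = 1""").fetchall()

# Other direction, same junction table: who likes pizza?
fans = [r[0] for r in con.execute("""
    SELECT p.person_name
    FROM food f
    JOIN person_food pf ON pf.food_id = f.food_id
    JOIN persons p ON p.person_id = pf.person_id
    WHERE f.food_name = 'pizza'""")]

print(foods)  # [('Bodan Kustan', 'pizza')]
print(fans)   # ['Bodan Kustan']
```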
However, sometimes you need to join an arbitrary number (n) of tables; then you could use a separate table to hold information about each relation. My approach (I don't think it's the best) would be to store the table name next to the relation (for example, with mobile phones and home phones split into two separate tables):
Persons:
|person_id|person_name |person_age |
| 1 | Bodan Kustan| 28 |
Mobile_Phone:
|mobile_phone_id |mobile_phone_number |
| 1 | 31337 |
Home_Phone:
|home_phone_id |home_phone_number |
| 1 | 370 |
Person_Phone:
|person_id |related_id |related_column |related_table |
| 1 | 1 | mobile_phone_id | mobile_phone |
| 1 | 1 | home_phone_id | home_phone |
Then query the middle table to get all relations:
SELECT * FROM person_phone WHERE person_id = 1
Then build a dynamic query (pseudocode, not tested; it might not work):
append_to_final_sql = ""
foreach (distinct related_table, related_column in results)
    append_to_final_sql += " LEFT JOIN `{related_table}`
        ON `{related_table}`.`{related_column}` = `person_phone`.`related_id`
        AND `person_phone`.`related_table` = '{related_table}'"
final_sql = "SELECT * FROM `persons`
    LEFT JOIN `person_phone` ON `person_phone`.`person_id` = `persons`.`person_id`"
    + append_to_final_sql +
    " WHERE `persons`.`person_id` = 1"
So your final SQL would be:
SELECT * FROM `persons`
LEFT JOIN `person_phone` ON `person_phone`.`person_id` = `persons`.`person_id`
LEFT JOIN `mobile_phone` ON `mobile_phone`.`mobile_phone_id` = `person_phone`.`related_id` AND `person_phone`.`related_table` = 'mobile_phone'
LEFT JOIN `home_phone` ON `home_phone`.`home_phone_id` = `person_phone`.`related_id` AND `person_phone`.`related_table` = 'home_phone'
WHERE `persons`.`person_id` = 1;
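The dynamic-query idea can be made runnable; below is a sketch in Python with SQLite (sqlite3) standing in for MySQL. Table and column names cannot be passed as ? parameters, so they are spliced into the SQL text; the hypothetical ALLOWED_RELATIONS whitelist keeps that from becoming an injection hole. Storing phone numbers as text is also an assumption.

```python
import sqlite3

# Hypothetical whitelist: only these (table, column) pairs may be spliced into SQL.
ALLOWED_RELATIONS = {
    ("home_phone", "home_phone_id"),
    ("mobile_phone", "mobile_phone_id"),
}

def build_person_query(con, person_id):
    """Build the SELECT with one LEFT JOIN per table related to this person."""
    relations = con.execute(
        "SELECT DISTINCT related_table, related_column FROM person_phone "
        "WHERE person_id = ? ORDER BY related_table",
        (person_id,),
    ).fetchall()
    joins = []
    for table, column in relations:
        if (table, column) not in ALLOWED_RELATIONS:
            raise ValueError(f"unexpected relation: {table}.{column}")
        joins.append(
            f"LEFT JOIN {table} ON {table}.{column} = person_phone.related_id "
            f"AND person_phone.related_table = '{table}'"
        )
    return (
        "SELECT * FROM persons "
        "LEFT JOIN person_phone ON person_phone.person_id = persons.person_id "
        + " ".join(joins)
        + " WHERE persons.person_id = ?"
    )

def phone_numbers(con, person_id):
    """Run the dynamic query and collect every non-NULL phone number."""
    cur = con.execute(build_person_query(con, person_id), (person_id,))
    names = [d[0] for d in cur.description]
    numbers = []
    for row in cur:
        record = dict(zip(names, row))
        for key in ("home_phone_number", "mobile_phone_number"):
            if record.get(key) is not None:
                numbers.append(record[key])
    return sorted(numbers)

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE persons (person_id INTEGER PRIMARY KEY, person_name TEXT, person_age INTEGER);
    CREATE TABLE mobile_phone (mobile_phone_id INTEGER PRIMARY KEY, mobile_phone_number TEXT);
    CREATE TABLE home_phone (home_phone_id INTEGER PRIMARY KEY, home_phone_number TEXT);
    CREATE TABLE person_phone (person_id INTEGER, related_id INTEGER,
                               related_column TEXT, related_table TEXT);
    INSERT INTO persons VALUES (1, 'Bodan Kustan', 28);
    INSERT INTO mobile_phone VALUES (1, '31337');
    INSERT INTO home_phone VALUES (1, '370');
    INSERT INTO person_phone VALUES (1, 1, 'mobile_phone_id', 'mobile_phone'),
                                    (1, 1, 'home_phone_id', 'home_phone');
""")
print(phone_numbers(con, 1))  # ['31337', '370']
```

Each person_phone row matches at most one of the LEFT JOINed phone tables, so every result row carries exactly one number and NULLs for the other table's columns.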
You only need Table2 if entries in Table_X can be related to multiple rows in Table1; otherwise a simple Table1 key column in Table_X will suffice.
Look into joins - very powerful, flexible and fast.
select * from Table1 left join Table2 on Table1.id = Table2.table_1_id
left join Table_X on Table_X.id = Table2.table_x_id
Look at the output and you'll see that it returns all table_x rows with copies of the Table1 and Table2 fields.