MySQL Many to Many Table Join Slow Performance - mysql

I have two tables with a joining column having a Many to Many relationship. There are a few hundred thousand records in each table. I'm seeing some very slow query performance and am having trouble singling out the issue.
Table_A:
+---------------------------+-------------+---------------+
| ID | Name varchar (30) | Age int(3) | Status int(1) |
+----+----------------------+-------------+---------------+
| 1 | Tom | 23 | 1 |
| 2 | Jerry | 34 | 2 |
| 3 | Smith | 21 | 1 |
| 4 | Ben | 46 | 5 |
+---------------------------+-------------+---------------+
Table_B:
+---------------------------+-------------+---------------+
| ID | Name varchar (30) | Sign int(3) | Status int(1) |
+----+----------------------+-------------+---------------+
| 1 | Tom | 12 | 1 |
| 2 | Smith | 8 | 1 |
| 3 | Tom | 3 | 0 |
| 4 | Tom | 10 | 1 |
+---------------------------+-------------+---------------+
I need to get the Age of each Name in Table A who has at least one row in Table B with a match on Name and a Status (Table B) of 1.
I tried:
SELECT Age FROM Table_A
LEFT JOIN Table_B ON Table_A.Name=Table_B.Name
WHERE Table_B.Status=1;
That query takes so long I haven't waited for it to return.
I then tried:
SELECT DISTINCT Age FROM Table_A
LEFT JOIN Table_B ON Table_A.Name=Table_B.Name AND Table_B.Status=1;
That returned very fast.
I tested further and tried:
SELECT DISTINCT Age FROM Table_A
LEFT JOIN Table_B ON Table_A.Name=Table_B.Name
WHERE Table_B.Status=1;
That again didn't return.
I'm confused as to what's going on here.
In the last query shouldn't the WHERE condition act the same as the previous query's JOIN ON condition (Status=1)?
Why does SELECT DISTINCT return results whereas without using DISTINCT the process takes forever?

For a many-to-many table, do not include an AUTO_INCREMENT. Do have the PRIMARY KEY include both other ids. Do have another index. Do use InnoDB.
See More details, plus rationale.
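As a rough illustration of that advice, the sketch below builds a link table whose primary key is the pair of ids, plus a second index in the opposite column order. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so the InnoDB part is not shown, and the names a_b_link, a_id, b_id and idx_link_b_a are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- No surrogate auto-increment id: the pair of ids is the primary key.
    CREATE TABLE a_b_link (
        a_id INTEGER NOT NULL,
        b_id INTEGER NOT NULL,
        PRIMARY KEY (a_id, b_id)
    );
    -- Second index so that lookups starting from b_id are indexed as well.
    CREATE INDEX idx_link_b_a ON a_b_link (b_id, a_id);
""")

conn.executemany(
    "INSERT INTO a_b_link (a_id, b_id) VALUES (?, ?)",
    [(1, 10), (1, 11), (2, 10)],
)

# The composite primary key rejects a duplicate relationship.
try:
    conn.execute("INSERT INTO a_b_link (a_id, b_id) VALUES (1, 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Lookup from the "other" side of the relationship.
a_ids_for_b10 = [row[0] for row in conn.execute(
    "SELECT a_id FROM a_b_link WHERE b_id = 10 ORDER BY a_id")]
print(duplicate_rejected, a_ids_for_b10)
```

In MySQL the same idea would be a CREATE TABLE with PRIMARY KEY (a_id, b_id), an extra INDEX (b_id, a_id), and ENGINE=InnoDB.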

Without seeing an explain plan (or whatever the MySQL equivalent is) it's impossible to say for certain.
My guess would be that the server knows that your OUTER JOIN to table B is completely irrelevant when you use SELECT DISTINCT, so it just runs against table A and gets the Age values from there without even performing the JOIN. Do you see why the OUTER JOIN is irrelevant?
In the first query the server needs to perform the JOIN to get the right number of rows back.
When you add the additional logic to your WHERE clause in the last query you've effectively turned it into an INNER JOIN, so now the JOIN has to happen again and it takes a long time.
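The difference between the queries can be checked on the sample rows from the question. The sketch below loads them into an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for MySQL here: it shows which rows come back, not how fast, and the helper name ages is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table_A (ID INTEGER PRIMARY KEY, Name TEXT, Age INT, Status INT);
    CREATE TABLE Table_B (ID INTEGER PRIMARY KEY, Name TEXT, Sign INT, Status INT);
    INSERT INTO Table_A VALUES
        (1, 'Tom', 23, 1), (2, 'Jerry', 34, 2), (3, 'Smith', 21, 1), (4, 'Ben', 46, 5);
    INSERT INTO Table_B VALUES
        (1, 'Tom', 12, 1), (2, 'Smith', 8, 1), (3, 'Tom', 3, 0), (4, 'Tom', 10, 1);
""")

def ages(sql):
    """Run a query and return its single column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

# Filter in WHERE: unmatched Table_A rows have NULL Status and are dropped,
# so the LEFT JOIN behaves like an INNER JOIN.
left_where = ages("""
    SELECT Age FROM Table_A
    LEFT JOIN Table_B ON Table_A.Name = Table_B.Name
    WHERE Table_B.Status = 1""")
inner = ages("""
    SELECT Age FROM Table_A
    JOIN Table_B ON Table_A.Name = Table_B.Name
    WHERE Table_B.Status = 1""")

# Filter in ON: every Table_A row survives, matched or not, so DISTINCT Age
# is the same as DISTINCT Age from Table_A alone.
left_on = ages("""
    SELECT DISTINCT Age FROM Table_A
    LEFT JOIN Table_B ON Table_A.Name = Table_B.Name AND Table_B.Status = 1""")
table_a_only = ages("SELECT DISTINCT Age FROM Table_A")

print(left_where, inner, left_on, table_a_only)
```

Note that the fast query also returns the ages of Jerry and Ben, who have no matching row in Table_B, so it does not answer the original requirement.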

Make sure you have indexes set on the Table_A.Name, Table_B.Name and Table_B.Status columns

First, you don't need a LEFT JOIN, because you only care about matches:
SELECT a.Age
FROM Table_A a JOIN
Table_B b
ON a.Name = b.Name
WHERE b.Status = 1;
This query can take advantage of indexes on Table_B(Status, Name) and Table_A(Name, Age).
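To make the index suggestion concrete, here is a small sketch that creates both composite indexes and runs the inner join on the sample rows. It uses Python's sqlite3 module as a stand-in for MySQL, and the index names idx_b_status_name and idx_a_name_age are invented. It also shows an EXISTS variant, which is not part of the answer above, on the assumption that "at least one row in Table B" means each Table_A row should appear at most once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table_A (ID INTEGER PRIMARY KEY, Name TEXT, Age INT, Status INT);
    CREATE TABLE Table_B (ID INTEGER PRIMARY KEY, Name TEXT, Sign INT, Status INT);
    INSERT INTO Table_A VALUES
        (1, 'Tom', 23, 1), (2, 'Jerry', 34, 2), (3, 'Smith', 21, 1), (4, 'Ben', 46, 5);
    INSERT INTO Table_B VALUES
        (1, 'Tom', 12, 1), (2, 'Smith', 8, 1), (3, 'Tom', 3, 0), (4, 'Tom', 10, 1);

    -- Composite indexes matching the filter and join columns (names are hypothetical).
    CREATE INDEX idx_b_status_name ON Table_B (Status, Name);
    CREATE INDEX idx_a_name_age ON Table_A (Name, Age);
""")

# Plain inner join: one output row per matching pair, so Tom appears twice.
join_ages = sorted(row[0] for row in conn.execute("""
    SELECT a.Age
    FROM Table_A a
    JOIN Table_B b ON a.Name = b.Name
    WHERE b.Status = 1"""))

# EXISTS variant: one output row per Table_A row that has at least one match.
exists_ages = sorted(row[0] for row in conn.execute("""
    SELECT a.Age
    FROM Table_A a
    WHERE EXISTS (
        SELECT 1 FROM Table_B b
        WHERE b.Name = a.Name AND b.Status = 1)"""))

print(join_ages, exists_ages)
```

Whether MySQL actually picks these indexes on the real tables is something only EXPLAIN on the real data can confirm.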

Related

Select count with value from different tables

I want to count all entries in one table grouped by the user id.
This is the query I used which works fine.
select uuid_mapping_id, count(*) from t_message group by uuid_mapping_id;
and these are the results:
+-----------------+----------+
| uuid_mapping_id | count(*) |
+-----------------+----------+
| 1 | 65 |
| 4 | 277 |
Now I would like to display the actual user name, instead of the ID.
To achieve this I would need the help of two different tables.
The table t_uuid_mapping which has two columns:
uid_mapping_id, which equals uuid_mapping_id in the other table.
And f_uuid which is also unique but completely different.
f_uuid can also be found in another table t_abook which also contains the names in the column f_name.
The result I am looking for should be:
+-----------------+----------+
| f_name | count(*) |
+-----------------+----------+
| admin | 65 |
| user1 | 277 |
I am new to the database topic and understand that this could be achieved by using JOIN in the query, but to be honest I did not completely understand this yet.
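If I understand you correctly, the name is in t_abook, which is reached through t_uuid_mapping:
SELECT ta.f_name, COUNT(*) AS count
FROM t_message tm
LEFT JOIN t_uuid_mapping tum ON (tm.uuid_mapping_id = tum.uid_mapping_id)
LEFT JOIN t_abook ta ON (tum.f_uuid = ta.f_uuid)
GROUP BY ta.f_name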
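Going by the table descriptions in the question, the name sits two joins away from t_message: t_message to t_uuid_mapping on the mapping id, then t_uuid_mapping to t_abook on f_uuid. The sketch below tries this on a few made-up rows with Python's sqlite3 module standing in for MySQL. The column types, the uuid strings and the message counts are assumptions for the example, not the asker's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_message (id INTEGER PRIMARY KEY, uuid_mapping_id INT);
    CREATE TABLE t_uuid_mapping (uid_mapping_id INTEGER PRIMARY KEY, f_uuid TEXT UNIQUE);
    CREATE TABLE t_abook (f_uuid TEXT PRIMARY KEY, f_name TEXT);

    INSERT INTO t_uuid_mapping VALUES (1, 'uuid-a'), (4, 'uuid-b');
    INSERT INTO t_abook VALUES ('uuid-a', 'admin'), ('uuid-b', 'user1');
    -- Two messages for mapping id 1, three for mapping id 4.
    INSERT INTO t_message (uuid_mapping_id) VALUES (1), (1), (4), (4), (4);
""")

counts = conn.execute("""
    SELECT ab.f_name, COUNT(*) AS message_count
    FROM t_message m
    JOIN t_uuid_mapping um ON um.uid_mapping_id = m.uuid_mapping_id
    JOIN t_abook ab ON ab.f_uuid = um.f_uuid
    GROUP BY ab.f_name
    ORDER BY ab.f_name
""").fetchall()
print(counts)
```

An inner JOIN drops messages whose mapping id has no address book entry. A LEFT JOIN would keep them, grouped under a NULL name.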

Left Join takes very long time on 150 000 rows

I am having some difficulties to accomplish a task.
Here is some data from orders table:
+----+---------+
| id | bill_id |
+----+---------+
| 3 | 1 |
| 9 | 3 |
| 10 | 4 |
| 15 | 6 |
+----+---------+
And here is some data from a bills table:
+----+
| id |
+----+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
+----+
I want to list all the bills that have no order associated with.
To achieve that, I thought a LEFT JOIN was appropriate, so I wrote this query:
SELECT * FROM bills
LEFT JOIN orders
ON bills.id = orders.bill_id
WHERE orders.bill_id IS NULL;
I thought that I would have the following result:
+----------+-----------+----------------+
| bills.id | orders.id | orders.bill_id |
+----------+-----------+----------------+
| 2 | NULL | NULL |
| 5 | NULL | NULL |
+----------+-----------+----------------+
But the query never finishes: it ran for more than 5 minutes without a result, so I stopped it, because that is not an acceptable run time for production anyway.
My real dataset has more than 150 000 orders and 100 000 bills. Is the dataset too big?
Is my request wrong somewhere?
Thank you very much for your tips!
EDIT: side note, the tables have no foreign keys defined... *flies away*
Your query is fine. I would use table aliases in writing it:
SELECT b.*
FROM bills b LEFT JOIN
orders o
ON b.id = o.bill_id
WHERE o.bill_id IS NULL;
You don't need the NULL columns from orders, probably.
You need an index on orders(bill_id):
create index idx_orders_billid on orders(bill_id);
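The sample rows from the question are enough to check that the query returns bills 2 and 5 once the index exists. The sketch below uses Python's sqlite3 module as a stand-in for MySQL, so it confirms the result, not the speed-up. The NOT EXISTS form is an alternative spelling of the same anti-join and is not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bills (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, bill_id INT);
    INSERT INTO bills VALUES (1), (2), (3), (4), (5), (6);
    INSERT INTO orders VALUES (3, 1), (9, 3), (10, 4), (15, 6);

    -- Without this index every bill would need a scan of the orders table.
    CREATE INDEX idx_orders_billid ON orders (bill_id);
""")

# LEFT JOIN ... IS NULL keeps only the bills with no matching order.
left_join_ids = [row[0] for row in conn.execute("""
    SELECT b.id
    FROM bills b
    LEFT JOIN orders o ON b.id = o.bill_id
    WHERE o.bill_id IS NULL
    ORDER BY b.id""")]

# The same anti-join written with NOT EXISTS.
not_exists_ids = [row[0] for row in conn.execute("""
    SELECT b.id
    FROM bills b
    WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.bill_id = b.id)
    ORDER BY b.id""")]

print(left_join_ids, not_exists_ids)
```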
By your where statement, I assume you're looking for orders that have no bills.
If that's the case you don't need to do a join to the bills table as they would by definition not exist.
You will find
SELECT * FROM orders
WHERE orders.bill_id IS NULL;
A much better performing query.
Edit:
Sorry, I missed your "I want to list all the bills that have no order associated with." when reading the question. As @gordon pointed out, an index would certainly help. However, if changing the schema is feasible, I would rather have a nullable bill.order_id column instead of an order.bill_id column: you would not need a left join, and an inner join would suffice to get the bills that have orders, which would be a quicker query for your other assumed requirements.

Select two (or more) consecutive rows in a MySQL table

The question in short
What is an efficient, scalable way of selecting two (or more) rows from a table with consecutive IDs, especially if this table is joined with another table?
Related questions have been asked before on Stack Overflow, e.g.:
SQL check adjacent rows for sequence
How select where there are 2 consecutives rows with a specific value using MySQL?
The answers to these questions suggest a self-join. My working example described below uses that suggestion, but it performs very, very poorly on larger data sets. I've run out of ideas how to improve it, and I'd really appreciate your input.
The issue in detail
Let's assume I were developing a database that keeps track of ball possession during a football/soccer match (please understand that I can't disclose the purpose of my real application). I require an efficient, scalable way that allows me to query changes of ball possession from one player to another (i.e. passes). For example, I might be interested in a list of all passes from any defender to any forward.
Mock database structure
My mock database consists of two tables. The first table Players stores the players' names in the Name column and their position (GOA, DEF, MID, FOR for goalie, defender, midfield, forward) in the POS column.
The second table Possession keeps track of ball possession. Whenever ball possession changes, i.e. the ball is passed to a new player, a row is added to this table. The primary key ID also indicates the temporal order of possession changes: consecutive IDs indicate an immediate sequence of ball possessions.
CREATE TABLE Players(
ID INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
POS VARCHAR(3) NOT NULL,
Name VARCHAR(7) NOT NULL);
CREATE TABLE Possession(
ID INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
PlayerID INT NOT NULL);
Next, we create some indices:
CREATE INDEX POS ON Players(POS);
CREATE INDEX Name ON Players(Name);
CREATE INDEX PlayerID ON Possession(PlayerID);
Now, we populate the Players table with a few players, and also add test entries to the Possession table:
INSERT INTO Players (POS, Name) VALUES
('DEF', 'James'), ('DEF', 'John'), ('DEF', 'Michael'),
('DEF', 'David'), ('MID', 'Charles'), ('MID', 'Thomas'),
('MID', 'Paul'), ('FOR', 'Bob'), ('GOA', 'Kenneth');
INSERT INTO Possession (PlayerID) VALUES
(1), (8), (2), (5), (3), (8), (3), (9), (6), (4), (7), (9);
Let's quickly check our database by joining the Possession and the Players table:
SELECT Possession.ID, PlayerID, POS, Name
FROM
Possession
INNER JOIN Players ON Possession.PlayerID = Players.ID
ORDER BY Possession.ID;
This looks good:
+----+----------+-----+---------+
| ID | PlayerID | POS | Name |
+----+----------+-----+---------+
| 1 | 1 | DEF | James |
| 2 | 8 | FOR | Bob |
| 3 | 2 | DEF | John |
| 4 | 5 | MID | Charles |
| 5 | 3 | DEF | Michael |
| 6 | 8 | FOR | Bob |
| 7 | 3 | DEF | Michael |
| 8 | 9 | GOA | Kenneth |
| 9 | 6 | MID | Thomas |
| 10 | 4 | DEF | David |
| 11 | 7 | MID | Paul |
| 12 | 9 | GOA | Kenneth |
+----+----------+-----+---------+
The table can be read like this: First, the DEFender James passed to the FORward Bob. Then, Bob passed to the DEFender John, who in turn passed to the MIDfield Charles. After some more passes, the ball ends with the GOAlkeeper Kenneth.
Working solution
I need a query that lists all passes from any defender to any forward. As we can see in the previous table, there are two instances of that: right at the start, James sends the ball to Bob (row ID 1 to ID 2), and later on, Michael sends the ball to Bob (row ID 5 to ID 6).
In order to do this in SQL, I create a self-join for the Possession table, with the second instance being offset by one row. In order to be able to access the players' names and positions, I also join the two Possession table instances to the Players table. The following query does that:
SELECT
M1.ID AS "From",
M2.ID AS "To",
P1.Name AS "Sender",
P2.Name AS "Receiver"
FROM
Possession AS M1
INNER JOIN Possession as M2 ON M2.ID = M1.ID + 1
INNER JOIN Players as P1 ON M1.PlayerId = P1.ID AND P1.POS = "DEF" -- see execution plan
INNER JOIN Players as P2 ON M2.PlayerId = P2.ID AND P2.POS = "FOR"
We get the expected output:
+------+----+---------+----------+
| From | To | Sender | Receiver |
+------+----+---------+----------+
| 1 | 2 | James | Bob |
| 5 | 6 | Michael | Bob |
+------+----+---------+----------+
The problem
While this query is executed virtually instantly in the mock football database, there appears to be a problem in the execution plan with this query. Here is the output of EXPLAIN for it:
+------+-------------+-------+------+------------------+----------+---------+------------+------+-------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+------+------------------+----------+---------+------------+------+-------------------------------------------------+
| 1 | SIMPLE | P2 | ref | PRIMARY,POS | POS | 5 | const | 1 | Using index condition |
| 1 | SIMPLE | M2 | ref | PRIMARY,PlayerID | PlayerID | 4 | MOCK.P2.ID | 1 | Using index |
| 1 | SIMPLE | P1 | ALL | PRIMARY,POS | NULL | NULL | NULL | 9 | Using where; Using join buffer (flat, BNL join) |
| 1 | SIMPLE | M1 | ref | PlayerID | PlayerID | 4 | MOCK.P1.ID | 1 | Using where; Using index |
+------+-------------+-------+------+------------------+----------+---------+------------+------+-------------------------------------------------+
I have to admit that I'm not very good at interpreting query execution plans. But it seems to me that the third row indicates a bottleneck for the join marked in the query above: apparently, a full scan is done for the P1 alias table, no key seems to be used even though POS and the primary key are available, and the join buffer (flat, BNL join) part is also very suspicious. I don't know what any of that means, but I usually don't find this with normal joins.
Perhaps due to this bottleneck, the query does not finish within any acceptable time span for my real database. My real equivalent to the mock Players table has ~60,000 rows, and the Possession equivalent has ~1,160,000 rows. I monitored the execution of the query via SHOW PROCESSLIST. After more than 600 seconds, the process was still tagged as Sending data, at which point I killed the process.
The query plan on this larger data set is rather similar to the one for the small mock data set. The third join appears to be problematic with no key used, a full table scan being performed, and the join buffer part that I don't really understand:
+------+-------------+-------+------+---------------+----------+---------+------------------+-------+-------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+------+---------------+----------+---------+------------------+-------+-------------------------------------------------+
| 1 | SIMPLE | P2 | ref | POS | POS | 1 | const | 1748 | Using index condition |
| 1 | SIMPLE | M2 | ref | PlayerId | PlayerId | 2 | REAL.P2.PlayerId | 7 | |
| 1 | SIMPLE | P1 | ALL | POS | NULL | NULL | NULL | 61917 | Using where; Using join buffer (flat, BNL join) |
| 1 | SIMPLE | M1 | ref | PlayerId | PlayerId | 2 | REAL.P1.PlayerId | 7 | Using where |
+------+-------------+-------+------+---------------+----------+---------+-----------------------+-------+-------------------------------------------------+
I tried forcing an index for the aliased table P1 by using Players AS P1 FORCE INDEX (POS) instead of Players AS P1 in the query shown above. This change does affect the execution plan. If I force POS to be used as the key, the third line in the output of EXPLAIN is very similar to the first line. The only difference is the number of rows, which is still very high (30912). Even this modified query did not complete after 600 seconds.
I don't think that this is a configuration issue. I have made up to 18 GB of RAM available to the MySQL server, and the server uses this memory for other queries. For the present query, memory consumption does not exceed 2 GB of RAM.
Back to the question
Thanks for staying with this somewhat long-winded explanation up to this point!
Let's return to the initial question: What is an efficient, scalable way of selecting two (or more) rows from a table with consecutive IDs, especially if this table is joined with another table?
My current query certainly is doing something wrong, as it didn't finish even after ten minutes. Is there something that I can change in my current query to make it useful for my larger real data set? If not: is there an alternative, better solution that I could use?
I believe the issue is that you only have single field indexes on the players table. MySQL can only use a single index per joined table.
In case of the player table 2 fields are key from performance point of view:
playerid, since it is used in the join;
pos, since you filter on it.
You seem to have standalone indexes on both fields, but this forces MySQL to choose whether to use index for joining the 2 tables or to filter based on the where criteria.
I would create a multi-column index on playerid, pos fields (in this order), which can satisfy both the join and the where. This way MySQL can use a single index to satisfy both the join and the where.
I would also use explicit join instead of the comma separated list of tables with the join condition in where for better readability.
Here's a general plan:
SELECT
@n := @n + 1 AS N, -- Now the rows will be numbered 1,2,3,...
...
FROM ( SELECT @n := 0 ) AS init
JOIN tbl
ORDER BY ... -- based on your definition of 'consecutive'
Then you can use that query as a subquery somewhere else.
SELECT ...
FROM ( the above query ) AS x
GROUP BY ceiling(N/2) -- 1&2 will be grouped together; 3&4; etc
You can use IF((N % 2) = 1, ..., ...) to do different things with the first versus the second item in each pair.
You mentioned JOINing to another table. If possible, avoid doing the JOIN until this last SELECT.
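A related way to pair each row with its predecessor is the LAG window function, which swaps in a different technique from the user-variable numbering above and looks at every adjacent pair (1&2, 2&3, ...) instead of fixed groups of two. MySQL 8.0 and recent MariaDB releases support window functions. The sketch below runs the idea on the mock data from the question, with Python's sqlite3 module as a stand-in (it needs SQLite 3.25 or newer). The names PASSES_SQL and passes are made up, and nothing here has been measured on a table of the asker's size.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Players (ID INTEGER PRIMARY KEY, POS TEXT NOT NULL, Name TEXT NOT NULL);
    CREATE TABLE Possession (ID INTEGER PRIMARY KEY, PlayerID INT NOT NULL);
    INSERT INTO Players (POS, Name) VALUES
        ('DEF', 'James'), ('DEF', 'John'), ('DEF', 'Michael'),
        ('DEF', 'David'), ('MID', 'Charles'), ('MID', 'Thomas'),
        ('MID', 'Paul'), ('FOR', 'Bob'), ('GOA', 'Kenneth');
    INSERT INTO Possession (PlayerID) VALUES
        (1), (8), (2), (5), (3), (8), (3), (9), (6), (4), (7), (9);
""")

# LAG(...) OVER (ORDER BY po.ID) reads the value from the previous row in
# possession order, so each row carries its predecessor's id, name and position.
PASSES_SQL = """
    SELECT prev_id, id, sender, receiver
    FROM (
        SELECT po.ID AS id, pl.Name AS receiver, pl.POS AS pos,
               LAG(po.ID)   OVER (ORDER BY po.ID) AS prev_id,
               LAG(pl.Name) OVER (ORDER BY po.ID) AS sender,
               LAG(pl.POS)  OVER (ORDER BY po.ID) AS prev_pos
        FROM Possession po
        JOIN Players pl ON pl.ID = po.PlayerID
    ) AS numbered
    WHERE prev_pos = ? AND pos = ?
    ORDER BY id
"""

def passes(from_pos, to_pos):
    """Return (from_id, to_id, sender, receiver) for passes between two positions."""
    return conn.execute(PASSES_SQL, (from_pos, to_pos)).fetchall()

def_to_for = passes("DEF", "FOR")
print(def_to_for)
```

Because the pairing follows the ORDER BY rather than ID + 1, it also tolerates gaps in the ID sequence.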

MySQL Intermediate-Level Table Relationship

Each row in Table_1 needs to have a relationship with one or more rows that might come from any number of other tables in the database (Table_X). So I set up an intermediate table (Table_2) where each row contains an id from Table_1, and the id from Table_X. It also has its own auto increment id since none of the relationships will be exclusive and therefore both the other ids will not be unique in the table.
My problem now is that when I retrieve the row from Table_1 and would like to see the information from each related row from Table_X, I don't know how to get it. At first I thought I could create a column for the exact name of Table_X for each row in Table_2 and have a second SELECT statement using that information, but I've been seeing inklings about things such as foreign keys and join statements that I think I need to get into. I'm just having trouble sorting it all out. Do I even need Table_2?
This probably isn't overly complicated, but I'm just getting into MySQL and this is the first real challenge I've encountered.
Edit to include requested information: If I understand correctly, I think I'm dealing with a many to many relationship. Table_3 has games; Table_1 has articles. An article can be about multiple games, and a game can also have multiple articles written about it. The only other possibly pertinent information I can see is that when a new article is made, every game that will be related to it is decided all at once. But the list of articles related to a given game can grow over time as more articles are written. That's probably not especially important, however.
If I understood correctly, you are talking about a one-to-many relationship in the database (for example, one person can have multiple phone numbers). You can store the data in two separate tables, persons and phones.
Persons:
|person_id|person_name |person_age |
| 1 | Bodan Kustan| 28 |
Phones:
|phone_id |person_id |phone_number|
| 1 | 1 | 31337 |
| 2 | 1 | 370 |
Then you can execute query with Join:
SELECT * FROM `persons`
LEFT JOIN `phones` ON `persons`.`person_id` = `phones`.`person_id`
WHERE `persons`.`person_id` = 1;
And it will return the person together with their phone numbers:
|person_id|person_name |person_age |phone_id |person_id |phone_number|
| 1 | Bodan Kustan| 28 | 1 | 1 | 31337 |
| 1 | Bodan Kustan| 28 | 2 | 1 | 370 |
Another possibility is a many-to-many relationship (for example, any person can love pizza, and pizza is not unique to that person). Then you need a third table, person_food, to join the tables together.
Persons:
|person_id|person_name |person_age |
| 1 | Bodan Kustan| 28 |
Food:
|food_id |food_name |
| 1 | meat |
| 2 | pizza |
Person_Food
|person_id |food_id |
| 1 | 2 |
Then you can execute query with Join:
SELECT * FROM `persons`
LEFT JOIN `person_food` ON `persons`.`person_id` = `person_food`.`person_id`
LEFT JOIN `food` ON `food`.`food_id` = `person_food`.`food_id`
WHERE `persons`.`person_id` = 1;
And it will return data from all tables:
|person_id|person_name |person_age |person_id |food_id |food_name |
| 1 | Bodan Kustan| 28 | 1 | 2 | pizza |
However, sometimes you need to join an arbitrary number of tables; then you could use a separate table to hold information about the relation. My approach (I don't think it's the best) would be to store the table name next to the relation (for example, splitting mobile phones and home phones into two separate tables):
Persons:
|person_id|person_name |person_age |
| 1 | Bodan Kustan| 28 |
Mobile_Phone:
|mobile_phone_id |mobile_phone_number |
| 1 | 31337 |
Home_Phone:
|home_phone_id |home_phone_number |
| 1 | 370 |
Person_Phone:
|person_id |related_id |related_column |related_table |
| 1 | 1 | mobile_phone_id | mobile_phone |
| 1 | 1 | home_phone_id | home_phone |
Then query middle table to get all relations:
SELECT * FROM person_phone WHERE person_id = 1
Then build dynamic query (pseudo code, not tested -- might not work):
joins = ""
foreach (results as result)
    joins += " LEFT JOIN {result.related_table}
        ON {result.related_table}.{result.related_column} = `person_phone`.`related_id`
        AND `person_phone`.`related_table` = '{result.related_table}'"
final_sql = "SELECT * FROM `persons`
    LEFT JOIN `person_phone` ON `person_phone`.`person_id` = `persons`.`person_id`"
    + joins +
    " WHERE `persons`.`person_id` = 1"
So Your final SQL would be:
SELECT * FROM `persons`
LEFT JOIN `person_phone` ON `person_phone`.`person_id` = `persons`.`person_id`
LEFT JOIN `mobile_phone` ON `mobile_phone`.`mobile_phone_id` = `person_phone`.`related_id` AND `person_phone`.`related_table` = 'mobile_phone'
LEFT JOIN `home_phone` ON `home_phone`.`home_phone_id` = `person_phone`.`related_id` AND `person_phone`.`related_table` = 'home_phone'
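The dynamic-query idea can be sketched in runnable form as below, with Python's sqlite3 module standing in for MySQL. The function build_person_query and the KNOWN_RELATIONS whitelist are made up for the example. The whitelist is an added assumption: table and column names cannot be passed as bound parameters, so values read from person_phone are checked against known names before they are placed into the SQL text.

```python
import sqlite3

# Hypothetical whitelist: related table name -> its id column.
KNOWN_RELATIONS = {"mobile_phone": "mobile_phone_id", "home_phone": "home_phone_id"}

def build_person_query(relations):
    """Build one LEFT JOIN per distinct (related_table, related_column) pair."""
    joins = []
    for table, column in sorted(set(relations)):
        if KNOWN_RELATIONS.get(table) != column:
            raise ValueError(f"unexpected relation: {table}.{column}")
        joins.append(
            f"LEFT JOIN {table} ON {table}.{column} = person_phone.related_id "
            f"AND person_phone.related_table = '{table}'"
        )
    return (
        "SELECT * FROM persons "
        "LEFT JOIN person_phone ON person_phone.person_id = persons.person_id "
        + " ".join(joins)
        + " WHERE persons.person_id = ?"
    )

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE persons (person_id INTEGER PRIMARY KEY, person_name TEXT, person_age INT);
    CREATE TABLE mobile_phone (mobile_phone_id INTEGER PRIMARY KEY, mobile_phone_number INT);
    CREATE TABLE home_phone (home_phone_id INTEGER PRIMARY KEY, home_phone_number INT);
    CREATE TABLE person_phone (person_id INT, related_id INT,
                               related_column TEXT, related_table TEXT);
    INSERT INTO persons VALUES (1, 'Bodan Kustan', 28);
    INSERT INTO mobile_phone VALUES (1, 31337);
    INSERT INTO home_phone VALUES (1, 370);
    INSERT INTO person_phone VALUES
        (1, 1, 'mobile_phone_id', 'mobile_phone'),
        (1, 1, 'home_phone_id', 'home_phone');
""")

# Step 1: read which related tables this person uses.
relations = conn.execute(
    "SELECT related_table, related_column FROM person_phone WHERE person_id = ?",
    (1,),
).fetchall()

# Step 2: build the final query and run it with the person id as a bound parameter.
final_sql = build_person_query(relations)
rows = conn.execute(final_sql, (1,)).fetchall()
print(final_sql)
print(rows)
```

Each person_phone row produces one result row, with the columns of the non-matching phone table left NULL.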
You only need Table2 if entries in Table_x can be related to multiple rows in Table1 - otherwise a simple key for Table1 will suffice.
Look into joins - very powerful, flexible and fast.
select * from Table1 left join Table2 on Table1_id = Table2_table_1_id
left join Table_X on Tablex_id = Table2_table_x_id
Look at the output and you'll see that it returns all table_x rows with copies of the Table1 and Table2 fields.

SELECT from Union x 3 using filter of another table

Background
I have a web application which must remove entries from other tables, filtered through a selection of 'tielists' from table 1 -> item_table 1, table 2, table 3.... now basically my result set is going to be filthy big unless I use a filter statement from another table, using a user_id... so can someone please help me structure my statement as needed? TY!
Tables
cars_belonging_to_user
-----------------------------
ID | user_id | make | model
----------------------------
1 | 1 | Toyota | Camry
2 | 1 |Infinity| Q55
3 | 1 | DMC | DeLorean
4 | 2 | Acura | RSX
Okay, Now the three 'tielists'
name:tielist_one
----------------------------
id | id_of_car | id_x | id_y|
1 | 1 | 12 | 22 |
2 | 2 | 23 | 32 |
-----------------------------
name:tielist_two
-------------------------------
id | id_of_car | id_x | id_z|
1 | 3 | 32 | 22 |
-----------------------------
name: tielist_three
id | id_of_car | id_x | id_a|
1 | 4 | 45 | 2 |
------------------------------
Result Set and Code
echo name_of_tielist_table
// I can structure if statements to echo result sets based upon the name
// Future Methodology: if car_id is in tielist_one, delete id_x from x_table, delete id_y from y_table...
// My output should be a double select base:
--SELECT * tielists from WHERE car_id is 1... output name of tielist... then
--SELECT * from specific_tielist where car_id is 1.....delete x_table, delete y_table...
Considering the list will be massive, and the tielist equally long, I must filter the results where car_id(id) = $variable && user_id = $id....
Side Notes
Only one car id will appear once in any single tielist..
This select statement MUST be filtered with user_id = $variable... (and remember, i'm looking for which car id too)
I MUST HAVE THE NAME of the tielist it comes from able to be echo'd into a variable...
I will only be looking for one single id_of_car at any given time, because this select will be contained in a foreach loop.
I was thinking a union all items would do the trick to select the row, but how can I get the name of the tielist the row is in, and how can the filter be used from the user_id row
If you want performance, I would suggest left outer join instead of union all. This will allow the query to make efficient use of indexes for your purpose.
Based on what you say, a car is in exactly one of the lists. This is important for this method to work. Here is the SQL:
select cu.*,
coalesce(tl1.id_x, tl2.id_x, tl3.id_x) as id_x,
tl1.id_y, tl2.id_z, tl3.id_a,
(case when tl1.id is not null then 'One'
when tl2.id is not null then 'Two'
when tl3.id is not null then 'Three'
end) as TieList
from Cars_Belonging_To_User cu left outer join
TieList_One tl1
on cu.id = tl1.id_of_car left outer join
TieList_Two tl2
on cu.id = tl2.id_of_car left outer join
TieList_Three tl3
on cu.id = tl3.id_of_car;
You can then add a where clause to filter as you need.
If you have an index on id_of_car for each tielist table, then the performance should be quite good. If the where clause uses an index on the first table, then the joins and where should all be using indexes, and the query will be quite fast.
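For a quick check of the approach, the sketch below loads the sample tables from the question and looks up a single car for a single user, which is what the asker's foreach loop would do. It uses Python's sqlite3 module as a stand-in for MySQL. The function name find_tielist, the index names, and the join on the cars table's id column are assumptions based on the table layout shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cars_belonging_to_user (id INTEGER PRIMARY KEY, user_id INT, make TEXT, model TEXT);
    CREATE TABLE tielist_one   (id INTEGER PRIMARY KEY, id_of_car INT, id_x INT, id_y INT);
    CREATE TABLE tielist_two   (id INTEGER PRIMARY KEY, id_of_car INT, id_x INT, id_z INT);
    CREATE TABLE tielist_three (id INTEGER PRIMARY KEY, id_of_car INT, id_x INT, id_a INT);

    INSERT INTO cars_belonging_to_user VALUES
        (1, 1, 'Toyota', 'Camry'), (2, 1, 'Infinity', 'Q55'),
        (3, 1, 'DMC', 'DeLorean'), (4, 2, 'Acura', 'RSX');
    INSERT INTO tielist_one   VALUES (1, 1, 12, 22), (2, 2, 23, 32);
    INSERT INTO tielist_two   VALUES (1, 3, 32, 22);
    INSERT INTO tielist_three VALUES (1, 4, 45, 2);

    -- One index per tielist on the join column (index names are hypothetical).
    CREATE INDEX idx_tl1_car ON tielist_one (id_of_car);
    CREATE INDEX idx_tl2_car ON tielist_two (id_of_car);
    CREATE INDEX idx_tl3_car ON tielist_three (id_of_car);
""")

LOOKUP_SQL = """
    SELECT cu.id,
           CASE WHEN tl1.id IS NOT NULL THEN 'One'
                WHEN tl2.id IS NOT NULL THEN 'Two'
                WHEN tl3.id IS NOT NULL THEN 'Three'
           END AS tielist,
           COALESCE(tl1.id_x, tl2.id_x, tl3.id_x) AS id_x,
           tl1.id_y, tl2.id_z, tl3.id_a
    FROM cars_belonging_to_user cu
    LEFT JOIN tielist_one   tl1 ON cu.id = tl1.id_of_car
    LEFT JOIN tielist_two   tl2 ON cu.id = tl2.id_of_car
    LEFT JOIN tielist_three tl3 ON cu.id = tl3.id_of_car
    WHERE cu.user_id = ? AND cu.id = ?
"""

def find_tielist(user_id, car_id):
    """Return (car id, tielist name, id_x, id_y, id_z, id_a), or None if the car is not the user's."""
    return conn.execute(LOOKUP_SQL, (user_id, car_id)).fetchone()

print(find_tielist(1, 3))
print(find_tielist(2, 3))
```

The tielist column carries the list name the asker wants to echo, and the user_id filter means a car id belonging to another user returns nothing.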