Optimize SQL query (Facebook-like application) - mysql

My application is similar to Facebook, and I'm trying to optimize the query that get user records. The user records are that he as src ou dst. The src is in usermuralentry directly, the dst list are in usermuralentry_user.
So, a entry can have one src and many dst.
I have those tables:
mysql> desc usermuralentry ;
+-----------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+------------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| user_src_id | int(11) | NO | MUL | NULL | |
| private | tinyint(1) | NO | | NULL | |
| content | longtext | NO | | NULL | |
| date | datetime | NO | | NULL | |
| last_update | datetime | NO | | NULL | |
+-----------------+------------------+------+-----+---------+----------------+
10 rows in set (0.10 sec)
mysql> desc usermuralentry_user ;
+-------------------+---------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+---------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| usermuralentry_id | int(11) | NO | MUL | NULL | |
| userinfo_id | int(11) | NO | MUL | NULL | |
+-------------------+---------+------+-----+---------+----------------+
3 rows in set (0.00 sec)
And the following query to retrieve information from two users.
mysql> explain
SELECT *
FROM usermuralentry AS a
, usermuralentry_user AS b
WHERE a.user_src_id IN ( 1, 2 )
OR
(
a.id = b.usermuralentry_id
AND b.userinfo_id IN ( 1, 2 )
);
+----+-------------+-------+------+-------------------------------------------------------------------------------------------+------+---------+------+---------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+-------------------------------------------------------------------------------------------+------+---------+------+---------+------------------------------------------------+
| 1 | SIMPLE | b | ALL | usermuralentry_id,usermuralentry_user_bcd7114e,usermuralentry_user_6b192ca7 | NULL | NULL | NULL | 147188 | |
| 1 | SIMPLE | a | ALL | PRIMARY | NULL | NULL | NULL | 1371289 | Range checked for each record (index map: 0x1) |
+----+-------------+-------+------+-------------------------------------------------------------------------------------------+------+---------+------+---------+------------------------------------------------+
2 rows in set (0.00 sec)
but it is taking A LOT of time...
Some tips to optimize? Can the table schema be better in my application?

Use this query
SELECT *
FROM usermuralentry AS a
left join usermuralentry_user AS b
on b.usermuralentry_id = a.id
WHERE a.user_src_id IN(1, 2)
OR (a.id = b.usermuralentry_id
AND b.userinfo_id IN(1, 2));
And for some tips here are
You are using two tables in from clause which is a cartision product and will take a lot of time as well as undesired results. Always use joins in this situation.

I think your join isn't properly formed, and you need to change the query to use UNION. The OR condition in the where clause is killing performance as well:
SELECT *
FROM usermuralentry AS a
JOIN usermuralentry_user AS b ON a.id = b.usermuralentry_id /* use explicit JOIN! */
WHERE a.user_src_id IN (1 , 2)
UNION
SELECT *
FROM usermuralentry AS a
JOIN usermuralentry_user AS b ON a.id = b.usermuralentry_id
WHERE b.usermuralentry_id IN ( 1, 2 )
You also need an index: ALTER TABLE usermuralentry_user ADD INDEX (usermuralentry_id)

Related

Optimizing join on derived table - EXPLAIN different on local and server

I have the following ugly query, which runs okay but not great, on my local machine (1.4 secs, running v5.7). On the server I'm using, which is running an older version of MySQL (v5.5), the query just hangs. It seems to get caught on "Copying to tmp table":
SELECT
SQL_CALC_FOUND_ROWS
DISTINCT p.parcel_number,
p.street_number,
p.street_name,
p.site_address_city_state,
p.number_of_units,
p.number_of_stories,
p.bedrooms,
p.bathrooms,
p.lot_area_sqft,
p.cost_per_sq_ft,
p.year_built,
p.sales_date,
p.sales_price,
p.id
FROM (
SELECT APN, property_case_detail_id FROM property_inspection AS pi
GROUP BY APN, property_case_detail_id
HAVING
COUNT(IF(status='Resolved Date', 1, NULL)) = 0
) as open_cases
JOIN property AS p
ON p.parcel_number = open_cases.APN
LIMIT 0, 1000;
mysql> show processlist;
+-------+-------------+-----------+--------------+---------+------+----------------------+------------------------------------------------------------------------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+-------+-------------+-----------+--------------+---------+------+----------------------+------------------------------------------------------------------------------------------------------+
| 21120 | headsupcity | localhost | lead_housing | Query | 21 | Copying to tmp table | SELECT
SQL_CALC_FOUND_ROWS
DISTINCT p.parcel_number,
p.street_numbe |
| 21121 | headsupcity | localhost | lead_housing | Query | 0 | NULL | show processlist |
+-------+-------------+-----------+--------------+---------+------+----------------------+------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)
Explains are different on my local machine and on the server, and I'm assuming the only reason my query runs at all on my local machine, is because of the key that is automatically created on the derived table:
Explain (local):
+----+-------------+------------+------------+------+---------------+-------------+---------+------------------------------+---------+----------+---------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+------+---------------+-------------+---------+------------------------------+---------+----------+---------------------------------+
| 1 | PRIMARY | p | NULL | ALL | NULL | NULL | NULL | NULL | 40319 | 100.00 | Using temporary |
| 1 | PRIMARY | <derived2> | NULL | ref | <auto_key0> | <auto_key0> | 8 | lead_housing.p.parcel_number | 40 | 100.00 | NULL |
| 2 | DERIVED | pi | NULL | ALL | NULL | NULL | NULL | NULL | 1623978 | 100.00 | Using temporary; Using filesort |
+----+-------------+------------+------------+------+---------------+-------------+---------+------------------------------+---------+----------+---------------------------------+
Explain (server):
+----+-------------+------------+------+---------------+------+---------+------+---------+------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------+------+---------+------+---------+------------------------------------------+
| 1 | PRIMARY | p | ALL | NULL | NULL | NULL | NULL | 41369 | Using temporary |
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 122948 | Using where; Distinct; Using join buffer |
| 2 | DERIVED | pi | ALL | NULL | NULL | NULL | NULL | 1718586 | Using temporary; Using filesort |
+----+-------------+------------+------+---------------+------+---------+------+---------+------------------------------------------+
Schemas:
mysql> explain property_inspection;
+-------------------------+--------------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------------+--------------+------+-----+-------------------+-----------------------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| lblCaseNo | int(11) | NO | MUL | NULL | |
| APN | bigint(10) | NO | MUL | NULL | |
| date | varchar(50) | NO | | NULL | |
| status | varchar(500) | NO | | NULL | |
| property_case_detail_id | int(11) | YES | MUL | NULL | |
| case_type_id | int(11) | YES | MUL | NULL | |
| date_modified | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| update_status | tinyint(1) | YES | | 1 | |
| created_date | datetime | NO | | NULL | |
+-------------------------+--------------+------+-----+-------------------+-----------------------------+
10 rows in set (0.02 sec)
mysql> explain property; (not all columns, but you get the gist)
+----------------------------+--------------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+----------------------------+--------------+------+-----+-------------------+-----------------------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| parcel_number | bigint(10) | NO | | 0 | |
| date_modified | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| created_date | datetime | NO | | NULL | |
+----------------------------+--------------+------+-----+-------------------+-----------------------------+
Variables that might be relevant:
tmp_table_size: 16777216
innodb_buffer_pool_size: 8589934592
Any ideas on how to optimize this, and any idea why the explains are so different?
Since this is where the Optimizers are quite different, let's try to optimize
SELECT APN, property_case_detail_id FROM property_inspection AS pi
GROUP BY APN, property_case_detail_id
HAVING
COUNT(IF(status='Resolved Date', 1, NULL)) = 0
) as open_cases
Give this a try:
SELECT ...
FROM property AS p
WHERE NOT EXISTS ( SELECT 1 FROM property_inspection
WHERE status = 'Resolved Date'
AND p.parcel_number = APN )
ORDER BY ??? -- without this, the `LIMIT` is unpredictable
LIMIT 0, 1000;
or...
SELECT ...
FROM property AS p
LEFT JOIN property_inspection AS pi ON p.parcel_number = pi.APN
WHERE pi.status = 'Resolved Date'
AND pi.APN IS NULL
ORDER BY ??? -- without this, the `LIMIT` is unpredictable
LIMIT 0, 1000;
Index:
property_inspection: INDEX(status, parcel_number) -- in either order
MySQL 5.5 and 5.7 are quite different and the later has better optimizer so there is no surprise that explain plans are different.
You'd better provide SHOW CREATE TABLE property; and SHOW CREATE TABLE property_inspection; outputs as it will show indexes that are on your tables.
Your sub-query is the issue.
- Server tries to process 1.6M rows with no index and grouping everything.
- Having is quite expensive operation so you'd better avoid it, expecially in sub-queries.
- Grouping in this case is bad idea. You do not need the aggregation/counting. You need to check if the 'Resolved Date' status is just exists
Based on the information provided I'd recommend:
- Alter table property_inspection to reduce length of status column.
- Add index on the column. Use covering index (APN, property_case_detail_id, status) if possible (in this columns order).
- Change query to something like this:
SELECT
SQL_CALC_FOUND_ROWS
DISTINCT p.parcel_number,
...
p.id
FROM
property_inspection AS `pi1`
INNER JOIN property AS p ON (
p.parcel_number = `pi1`.APN
)
LEFT JOIN (
SELECT
`pi2`.property_case_detail_id
, `pi2`. APN
FROM
property_inspection AS `pi2`
WHERE
`status` = 'Resolved Date'
) AS exclude ON (
exclude.APN = `pi1`.APN
AND exclude.property_case_detail_id = `pi1`.property_case_detail_id
)
WHERE
exclude.APN IS NULL
LIMIT
0, 1000;

mysql INNER_JOIN subquery

I have inherited a MySQL database, that has a table as follows:
mysql> describe stock_groups;
+--------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| group | varchar(5) | YES | | NULL | |
| name | varchar(255) | YES | | NULL | |
| parent | varchar(5) | YES | | NULL | |
| order | int(11) | YES | | NULL | |
+--------+--------------+------+-----+---------+----------------+
5 rows in set (0.00 sec)
When I run mysql> select * from stock_groups wheregroup='D2';
I get:
mysql> select * from stock_groups where `group`='D2';
+----+-------+------+--------+-------+
| id | group | name | parent | order |
+----+-------+------+--------+-------+
| 79 | D2 | MENS | D | 51 |
+----+-------+------+--------+-------+
1 row in set (0.00 sec)
and also i have a table:
mysql> describe stock_groups_styles_map;
+-------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| group | varchar(5) | NO | | NULL | |
| style | varchar(25) | NO | | NULL | |
+-------+-------------+------+-----+---------+----------------+
3 rows in set (0.01 sec)
when I run:
mysql> select `group` from stock_groups_styles_map where style='N26';
+-------+
| group |
+-------+
| B1 |
| B11 |
| D2 |
| V2 |
+-------+
4 rows in set (0.00 sec)
how do i get the stock_groups.name ?
Join the tables, and select only the data you need. If you need unique rows, use the distinct keyword:
select -- If you need unique names, use "select distinct" instead of "select"
sg.name
from
stock_groups_styles_map as sgs
inner join stock_groups as sg on sgs.group = sg.group
where
sgs.style = 'N26'
You could also solve this using subqueries, but that would be rather inneficient in this case.
Something important
You should add the appropriate indexes to your tables. It will improve the performance of your database.
You can use inner join on group column
SELECT ss.group, sg.name from
stock_groups sg inner join stock_groups_styles_map ss on ss.group = sg.group
where ss.style='N26'
SELECT stock_groups.name FROM stock_groups_styles_map, stock_groups WHERE stock_groups_styles_map.style ='N26';
worked for me.

More efficient sql to find records not in join table?

I have:
_sms_users_
id
_join_smsuser_campaigns_
sms_user_id
I'd like to pull out the sms_users that don't have a record in the join_smsuser_campaigns table (via the join_smsuser_campaigns.sms_user_id = sms_users.id relationships)
I have the sql:
select * from sms_users where id not in (select sms_user_id from join_smsuser_campaigns);
edit:
here's the explain select results:
mysql> explain select u.* from sms_users u left join join_smsuser_campaigns c on u.id = c.sms_user_id where c.sms_user_id is null;
+----+-------------+-------+-------+---------------+-------------------------------------------------------------+---------+------+-------+--------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+-------------------------------------------------------------+---------+------+-------+--------------------------------------+
| 1 | SIMPLE | u | ALL | NULL | NULL | NULL | NULL | 42303 | |
| 1 | SIMPLE | c | index | NULL | index_join_smsuser_campaigns_on_campaign_id_and_sms_user_id | 8 | NULL | 30722 | Using where; Using index; Not exists |
+----+-------------+-------+-------+---------------+-------------------------------------------------------------+---------+------+-------+--------------------------------------+
mysql> describe sms_users;
+------------------------+------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------------+------------+------+-----+---------+-------+
| id | int(11) | NO | PRI | NULL | |
mysql> describe join_smsuser_campaigns;
+---------------------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------------+------------------+------+-----+---------+-------+
| sms_user_id | int(11) | NO | | NULL | |
Looks like the problm is that I don't have an index on sms_user_id on the join?
This takes about 5 minutes to run and I see that there is one record. Is there a more efficient way to do this via a join? My sql skills are pretty basic.
Queries can get slow when having many items in the IN clause. Use a left join instead
select u.*
from sms_users u
left join join_smsuser_campaigns c on u.id = c.sms_user_id
where c.sms_user_id is null
See this great join explanation

query doesn't use index when using IN subquery

You will notice the primary query is NOT using an index on school_id, when there is an index. If I remove the subquery and use a hardcoded list, it will use an index.
mysql> explain SELECT year, race, CONCAT(percent,'%') as percent
-> FROM school_data_race_ethnicity as school_data_race_ethnicity_outer
-> WHERE school_id IN(
-> SELECT field_school_id_value
-> FROM field_data_field_school_id
-> WHERE entity_id IN (SELECT entity_id
-> FROM field_data_field_district
-> WHERE field_district_nid =
-> (SELECT entity_id FROM field_data_field_district_id
-> WHERE `field_district_id_value` = 26130106 LIMIT 1))
-> ) ORDER BY year DESC, race;
+----+--------------------+----------------------------------+----------------+------------------------------+-----------+---------+------+-------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+----------------------------------+----------------+------------------------------+-----------+---------+------+-------+-----------------------------+
| 1 | PRIMARY | school_data_race_ethnicity_outer | ALL | NULL | NULL | NULL | NULL | 97116 | Using where; Using filesort |
| 2 | DEPENDENT SUBQUERY | field_data_field_school_id | ALL | NULL | NULL | NULL | NULL | 5325 | Using where |
| 3 | DEPENDENT SUBQUERY | field_data_field_district | index_subquery | entity_id,field_district_nid | entity_id | 4 | func | 1 | Using where |
| 4 | SUBQUERY | field_data_field_district_id | ALL | NULL | NULL | NULL | NULL | 685 | Using where |
+----+--------------------+----------------------------------+----------------+------------------------------+-----------+---------+------+-------+-----------------------------+
4 rows in set (0.00 sec)
mysql> describe school_data_race_ethnicity
-> ;
+-----------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| school_id | varchar(255) | NO | MUL | NULL | |
| year | int(11) | NO | MUL | NULL | |
| race | varchar(255) | NO | | NULL | |
| percent | decimal(5,2) | NO | | NULL | |
+-----------+--------------+------+-----+---------+----------------+
5 rows in set (0.00 sec)
mysql>
use INNER JOIN instead of subquery with IN clause:
explain SELECT year, race, CONCAT(percent,'%') as percent
FROM school_data_race_ethnicity a
INNER JOIN(
SELECT field_school_id_value
FROM field_data_field_school_id b
INNER JOIN (SELECT entity_id
FROM field_data_field_district
WHERE field_district_nid =
(SELECT entity_id FROM field_data_field_district_id
WHERE `field_district_id_value` = 26130106 LIMIT 1)) c
ON b. entity_id = c.entity_id
) d
ON a.school_id = d.field_school_id_value
ORDER BY year DESC, race;
The server probably is skipping indexed seek because it's not able to figure out how many items will actually be in the list. The thinking is that since it doesn't have a good estimate of the number of items in the subquery, it's going to choose the most conservative approach and go with a scan (probably).
I'd suggest rewriting the query using JOINs instead:
SELECT DISTINCT o.year, o.race, CONCAT(o.percent,'%') as percent
FROM school_data_race_ethnicity as o
INNER JOIN field_data_field_school_id s
ON s.field_school_id_value = o.school_id
INNER JOIN field_data_field_district d
ON d.entity_id = s.entity_id
WHERE field_district_nid = (
SELECT entity_id
FROM field_data_field_district_id
WHERE `field_district_id_value` = 26130106 LIMIT 1))
ORDER BY o.year DESC, o.race;

MYSQL outer join, doesn't return non-matching rows

I have these two tables in my db
describe external_review_sources;
+-------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+----------------+
| ersID | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(50) | NO | UNI | NULL | |
| logo | varchar(255) | NO | | NULL | |
+-------+--------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)
And
describe listing_external_review_source_rel;
+---------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------+--------------+------+-----+---------+----------------+
| lersrID | int(11) | NO | PRI | NULL | auto_increment |
| bid | int(10) | NO | | NULL | |
| url | varchar(255) | NO | | NULL | |
| ersID | int(11) | YES | | NULL | |
| active | int(10) | NO | | NULL | |
| order | int(10) | NO | | NULL | |
+---------+--------------+------+-----+---------+----------------+
6 rows in set (0.00 sec)
I query these tables this way:
SELECT *
FROM
listing_external_review_source_rel
RIGHT JOIN
external_review_sources USING(ersID)
where bid=902028 or bid IS NULL;
+-------+---------------+------+---------+--------+-------+--------+-------+
| ersID | name | logo | lersrID | bid | url | active | order |
+-------+---------------+------+---------+--------+-------+--------+-------+
| 1 | G1 | a | 17 | 902028 | url11 | 1 | 0 |
| 2 | D1 | b | 18 | 902028 | url22 | 0 | 0 |
+-------+---------------+------+---------+--------+-------+--------+-------+
2 rows in set (0.00 sec)
As you can see results are showing up for bid=902028, how ever for a bid such as 866696 that does NOT exist in listing_external_review_source_rel, the results are empty
SELECT *
FROM listing_external_review_source_rel
RIGHT JOIN external_review_sources USING(ersID)
where bid=866696 or bid IS NULL;
Empty set (0.00 sec)
I expect the results to be this:
+-------+---------------+------+---------+--------+-------+--------+-------+
| ersID | name | logo | lersrID | bid | url | active | order |
+-------+---------------+------+---------+--------+-------+--------+-------+
| 1 | G1 | NULL | NULL| NULL | NULL | NULL | NULL |
| 2 | D1 | NULL | NULL| NULL | NULL | NULL | NULL |
+-------+---------------+------+---------+--------+-------+--------+-------+
2 rows in set (0.00 sec)
That's what I have used the "or bid IS NULL" condition.
What am I doing wrong and what query would give me this result? I basically am interested in having nonmatching rows in my results as well.
Most people use LEFT JOIN so I'll rewrite it to be more standard:
SELECT *
FROM external_review_sources a LEFT JOIN
listing_external_review_source_rel b ON a.ersID=b.ersID AND bid=866696;
Remember that an outter join returns all rows where the ON condition matches, and NULL where they don't. In this case your match condition is more than just the ersID
Try this
SELECT * FROM
external_review_sources e
LEFT JOIN
listing_external_review_source_rel r ON e.ersID = r.ersID AND r.bid = 866696
WHERE
r.bid IS NULL;
When filtering on "outer tables", the filter needs to be a derived table or in the JOIN because you want to filter before the WHERE (logically). Also, a best practice is to use LEFT JOIN for clarity.
With a derived table
SELECT * FROM
external_review_sources e
LEFT JOIN
(
SELECT *
FROM listing_external_review_source_rel
WHERE bid = 866696
) r USING (ersID)
WHERE
r.bid IS NULL;
Try it this way, moving the test of the bid column out of the WHERE clause and into the JOIN:
SELECT *
FROM listing_external_review_source_rel ler
RIGHT JOIN external_review_sources ers
ON ler.ersID = ers.ersID
AND ler.bid=866696;