I have a table of parents, and a table of children
My parents table is as follows:
+----------------+-------------------------------+
| parent_id | parent_name |
+----------------+-------------------------------+
| 10792 | Katy |
| 7562 | Alex |
| 13330 | Drew |
| 9153 | Brian |
+----------------+-------------------------------+
My children's table is:
+----------+-------------------------------+-----------+-----+
| child_id | child_name | parent_id | age |
+----------+-------------------------------+-----------+-----+
| 1 | Matthew | 10792 | 4 |
| 2 | Donald | 9153 | 5 |
| 3 | Steven | 10792 | 9 |
| 4 | John | 7562 | 6 |
+----------+-------------------------------+-----------+-----+
When I use a sub-select such as:
SELECT parent_name, (SELECT SUM(age) FROM children WHERE parent_id = parents.parent_id) AS combined_age FROM parents;
My issue is that when I execute this query (parents are 13,000 records, children are 21,000 records) an index of parent_id in children doesn't get used, as shown in the explain plan. I get:
+----+--------------------+--------------------------+--------+---------------+------+---------+-------+-------+-------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+--------------------------+--------+---------------+------+---------+-------+-------+-------------------------------------------------+
| 1 | PRIMARY | parents | ALL | NULL | NULL | NULL | NULL | 13548 | NULL
| 2 | DEPENDENT SUBQUERY | children | ALL | PARENTS,AGE | NULL | NULL | NULL | 21654 | Range checked for each record (index map: 0x22) |
+----+--------------------+--------------------------+--------+---------------+------+---------+-------+-------+-------------------------------------------------+
This query is taking over 3 minutes to run, and I can't seem to get the subquery to use an index to query where the children belong to the parent. I tried USE INDEX and FORCE INDEX, as well as USE KEY AND FORCE KEY. Any ideas?
So It turns out this happens when the two ID fields are not the same field type INT(11) VS. VARCHAR(45). In the application, one of the table's ID fields was created strangely. Updating the field type solved the SQL optimizer.
Thanks!
The index should be used. The optimal index would be children(parent_id, age).
You can try re-writing the query as a join:
select p.parent_name, sum(c.age)
from parents p left join
children c
on p.parent_id = c.parent_id
group by p.parent_name;
Related
When running the following query:
SELECT productid
FROM product
WHERE productid=ROUND(RAND()*(SELECT MAX(productid) FROM product));
The result should be 0 or 1 results (0 due to data gaps, 1 if a record is found), however it results in multiple results a good number of times (very easy to reproduce, 90% of queries have more than 1 result).
Sample output:
+-----------+
| productid |
+-----------+
| 11701 |
| 20602 |
| 22029 |
| 24994 |
+-----------+
(Number of records in DB is about 30k).
Running a single SELECT RAND() always results in a single result.
Explain:
explain SELECT productid FROM product WHERE productid=ROUND(RAND()*(SELECT MAX(productid) FROM product));
+----+-------------+---------+------------+-------+---------------+--------------+---------+------+-------+----------+------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+-------+---------------+--------------+---------+------+-------+----------+------------------------------+
| 1 | PRIMARY | product | NULL | index | NULL | idx_prod_url | 2003 | NULL | 31197 | 10.00 | Using where; Using index |
| 2 | SUBQUERY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away |
+----+-------------+---------+------------+-------+---------------+--------------+---------+------+-------+----------+------------------------------+
Who can explain this behavior?
Follow up:
Following Martin's remark a rewrite of the query in:
SELECT productid FROM product
WHERE productid=(SELECT ROUND(RAND()*(SELECT MAX(productid) FROM product)));
Explain:
explain SELECT productid FROM product WHERE productid=(SELECT ROUND(RAND()*(SELECT MAX(productid) FROM product)));
+----+----------------------+---------+------------+-------+---------------+--------------+---------+------+-------+----------+------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+----------------------+---------+------------+-------+---------------+--------------+---------+------+-------+----------+------------------------------+
| 1 | PRIMARY | product | NULL | index | NULL | idx_prod_url | 2003 | NULL | 31197 | 100.00 | Using where; Using index |
| 2 | UNCACHEABLE SUBQUERY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
| 3 | SUBQUERY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away |
+----+----------------------+---------+------------+-------+---------------+--------------+---------+------+-------+----------+------------------------------+
However despite the changed plan, the behavior stays the same.
Follow up 2:
Using an INNER JOIN, the behavior disappears:
SELECT a.productid FROM product a
INNER JOIN (SELECT ROUND(RAND()*(SELECT MAX(productid))) as productid
FROM product) b ON a.productid=b.productid;
Explain:
explain SELECT a.productid FROM product a INNER JOIN (SELECT ROUND(RAND()*(SELECT MAX(productid))) as productid FROM product) b ON a.productid=b.productid;
+----+--------------------+------------+------------+--------+---------------+--------------+---------+-------+-------+----------+----------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+------------+------------+--------+---------------+--------------+---------+-------+-------+----------+----------------+
| 1 | PRIMARY | <derived2> | NULL | system | NULL | NULL | NULL | NULL | 1 | 100.00 | NULL |
| 1 | PRIMARY | a | NULL | const | PRIMARY | PRIMARY | 4 | const | 1 | 100.00 | Using index |
| 2 | DERIVED | product | NULL | index | NULL | idx_prod_url | 2003 | NULL | 31197 | 100.00 | Using index |
| 3 | DEPENDENT SUBQUERY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
+----+--------------------+------------+------------+--------+---------------+--------------+---------+-------+-------+----------+----------------+
Try this:
SELECT productid
FROM product
ORDER BY rand() LIMIT 1;
See MySQL Select Random Records for other random select options.
Explanation:
When the query in the original post executes, Maria says, "I need to look at data from the table, product, so let me start with the first row." Then, "Let me check the WHERE clause to see if I should use this row." At that point, Maria calculates a random number and multiplies it by the largest productid, rounding it to an integer. She checks to see if that integer equals the productid from the first row. If it does not, then she forgets about that row. If it does, then she selects the productid from that row and holds onto it. Either way, she then moves on to the second row. She calculates a (new) random number, multiplies it by the largest productid, and checks whether it equals the productid from the second row. If it does not, she moves on. If it does, she adds that productid to her list of selected values. She does the same thing on the third row, etc. The final output is then a list of zero to MAX(productid) productid values, each one independently selected with a probability of 1/MAX(productid).
When the query from Follow up 2 executes, the left side of the join is just a table. For the right side of the join, Maria says, "I need to look at data from the table, product. Since there is no WHERE clause, I will look at the whole table. Oh my, I have an aggregate function, MAX(productid), in the SELECT clause. That gives me a single value from the whole table." She calculates a single random number and multiplies it by that value, creating a table, b, consisting of a single column, productid, with one row. That table is then inner joined with the table, product, to find every row with a matching productid, which would be exactly one row, and the productid from that row is selected.
Note that if all you are looking for is the productid and you do not need any other data from the selected row, then the right side of the join is all you should need.
SELECT ROUND(RAND()*(SELECT MAX(productid))) as productid from product;
Note also that ROUND is probably not the right function for what you are trying to do. This might be what you really want:
SELECT FLOOR(1+RAND()*(SELECT MAX(productid))) as productid from product;
When I add a second level of abstraction to a join, ie joining on a table that was join in the first place, the query time is be multiplied by 1000x
mysql> SELECT xxx.content.id, columns.stories.title, columns.stories.date
-> FROM xxx.content
-> LEFT JOIN columns.stories on xxx.content.contentable_id = columns.stories.id
-> WHERE columns.stories.title IS NOT NULL
-> AND xxx.content.contentable_type = 'PublisherStory';
yields results in 0.01 seconds
mysql> SELECT xxx.content.id, columns.photos.id as pid, columns.stories.title, columns.stories.date
-> FROM xxx.content
-> LEFT JOIN columns.stories on xxx.content.contentable_id = columns.stories.id
-> LEFT JOIN columns.photos on columns.stories.id = columns.photos.story_id
-> WHERE columns.stories.title IS NOT NULL
-> AND xxx.content.contentable_type = 'PublisherStory';
yields results in 14 seconds
this is being performed on tables with records in the 10ks to low 100ks of records
Is this normal or what could be causing such a slowdown?
first query plan:
+----+-------------+---------+--------+---------------+---------+---------+--------------------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+--------+---------------+---------+---------+--------------------------------------+------+-------------+
| 1 | SIMPLE | content | ALL | NULL | NULL | NULL | NULL | 7099 | Using where |
| 1 | SIMPLE | stories | eq_ref | PRIMARY | PRIMARY | 8 | xxx.content.contentable_id | 1 | Using where |
+----+-------------+---------+--------+---------------+---------+---------+--------------------------------------+------+-------------+
Second Query
+----+-------------+---------+--------+---------------+---------+---------+--------------------------------------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+--------+---------------+---------+---------+--------------------------------------+-------+-------------+
| 1 | SIMPLE | content | ALL | NULL | NULL | NULL | NULL | 7099 | Using where |
| 1 | SIMPLE | stories | eq_ref | PRIMARY | PRIMARY | 8 | xxx.content.contentable_id | 1 | Using where |
| 1 | SIMPLE | photos | ALL | NULL | NULL | NULL | NULL | 21239 | |
+----+-------------+---------+--------+---------------+---------+---------+--------------------------------------+-------+-------------+
if joining columns.photos slows down so much, it could be because :
photos.story_id is not a foreign key from stories and there is no index on that column.
Not seing your table structure, I could no tell exactly but I suggest you to
verify that photos.story_id is a foreign key and if your mysql version does not
support foreign key ( pretty old) put an index on that column.
I would check indexes on contentable_type and stories.title also as they are
part of the where close.
I don't have a lot of experience yet with MySQL and with databases in general, though I'm going head on into the development of a large-scale web app anyway. The following is the search query for my app that allows users to search for other users. Now that the primary table for this query dev_Profile has about 14K rows, the query is considerably slow (about 5 secs when running a query that returns the largest set possible). I'm sure there are many optimization tweaks that could be made here, but would creating an index be the most fundamental first step to do here? I've been first trying to learn about indexes on my own, and how to make an index for a query with multiple joins, but I'm just not quite grasping it. I'm hoping that seeing things in the context of my actual query could be more educational.
Here's the basic query:
SELECT
dev_Profile.ID AS pid,
dev_Profile.Name AS username,
IF(TIMESTAMPDIFF(SECOND, st1.lastActivityTime, UTC_TIMESTAMP()) > 300 OR ISNULL(TIMESTAMPDIFF(SECOND, st1.lastActivityTime, UTC_TIMESTAMP())), 0, 1) AS online,
FLOOR(DATEDIFF(CURRENT_DATE, dev_Profile.DOB) / 365) AS age,
IF(dev_Profile.GenderID=1, 'M', 'F') AS sex,
IF(ISNULL(st2.Description), 0, st2.Description) AS relStatus,
st3.Name AS country,
IF(dev_Profile.RegionID > 0, st4.Name, 0) AS region,
IF(dev_Profile.CityID > 0, st5.Name, 0) AS city,
IF(ISNULL(st6.filename), 0, IF(st6.isApproved=1 AND st6.isDiscarded=0 AND st6.isModerated=1 AND st6.isRejected=0 AND isSizeAvatar=1, 1, 0)) AS hasPhoto,
IF(ISNULL(st6.filename), IF(dev_Profile.GenderID=1, 'http://www.mysite.com/lib/images/avatar-male-small.png', 'http://www.mysite.com/lib/images/avatar-female-small.png'), IF(st6.isApproved=1 AND st6.isDiscarded=0 AND st6.isModerated=1 AND st6.isRejected=0 AND isSizeAvatar=1, CONCAT('http://www.mysite.com/uploads/', st6.filename), IF(dev_Profile.GenderID=1, 'http://www.mysite.com/lib/images/avatar-male-small.png', 'http://www.mysite.com/lib/images/avatar-female-small.png'))) AS photo,
IF(ISNULL(dev_Profile.StatusMessage), IF(ISNULL(dev_Profile.AboutMe), IF(ISNULL(st7.AboutMyMatch), 0, st7.AboutMyMatch), dev_Profile.AboutMe), dev_Profile.StatusMessage) AS text
FROM
dev_Profile
LEFT JOIN dev_User AS st1 ON st1.ID = dev_Profile.UserID
LEFT JOIN dev_ProfileRelationshipStatus AS st2 ON st2.ID = dev_Profile.ProfileRelationshipStatusID
LEFT JOIN Country AS st3 ON st3.ID = dev_Profile.CountryID
LEFT JOIN Region AS st4 ON st4.ID = dev_Profile.RegionID
LEFT JOIN City AS st5 ON st5.ID = dev_Profile.CityID
LEFT JOIN dev_Photos AS st6 ON st6.ID = dev_Profile.PhotoAvatarID
LEFT JOIN dev_DesiredMatch AS st7 ON st7.ProfileID = dev_Profile.ID
WHERE
dev_Profile.ID != 11222 /* $_SESSION['ProfileID'] */
AND st1.EmailVerified = 'true'
AND st1.accountIsActive=1
ORDER BY st1.lastActivityTime DESC LIMIT 900;
The speed of this query (too slow, as you can see):
900 rows in set (5.20 sec)
The EXPLAIN for this query:
+----+-------------+-------------+--------+---------------+---------+---------+---------------------------------------------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+--------+---------------+---------+---------+---------------------------------------------+-------+----------------------------------------------+
| 1 | SIMPLE | dev_Profile | range | PRIMARY | PRIMARY | 4 | NULL | 13503 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | st2 | eq_ref | PRIMARY | PRIMARY | 1 | syk.dev_Profile.ProfileRelationshipStatusID | 1 | |
| 1 | SIMPLE | st3 | eq_ref | PRIMARY | PRIMARY | 4 | syk.dev_Profile.CountryID | 1 | |
| 1 | SIMPLE | st4 | eq_ref | PRIMARY | PRIMARY | 4 | syk.dev_Profile.RegionID | 1 | |
| 1 | SIMPLE | st5 | eq_ref | PRIMARY | PRIMARY | 4 | syk.dev_Profile.CityID | 1 | |
| 1 | SIMPLE | st1 | eq_ref | PRIMARY | PRIMARY | 4 | syk.dev_Profile.UserID | 1 | Using where |
| 1 | SIMPLE | st6 | eq_ref | PRIMARY | PRIMARY | 4 | syk.dev_Profile.PhotoAvatarID | 1 | |
| 1 | SIMPLE | st7 | ALL | NULL | NULL | NULL | NULL | 442 | |
+----+-------------+-------------+--------+---------------+---------+---------+---------------------------------------------+-------+----------------------------------------------+
It's also possible that the query can have more WHERE and HAVING clauses, if a user's search contains additional criteria. The additional clauses are (set with example values):
AND dev_Profile.GenderID = 1
AND dev_Profile.CountryID=127
AND dev_Profile.RegionID=36
AND dev_Profile.CityID=601
HAVING (age >= 18 AND age <= 50)
AND online=1
AND hasPhoto=1
This is the EXPLAIN for the query using all possible WHERE and HAVING clauses:
+----+-------------+-------------+--------+---------------+---------+---------+---------------------------------------------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+--------+---------------+---------+---------+---------------------------------------------+-------+----------------------------------------------+
| 1 | SIMPLE | dev_Profile | range | PRIMARY | PRIMARY | 4 | NULL | 13503 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | st2 | eq_ref | PRIMARY | PRIMARY | 1 | syk.dev_Profile.ProfileRelationshipStatusID | 1 | |
| 1 | SIMPLE | st3 | const | PRIMARY | PRIMARY | 4 | const | 1 | |
| 1 | SIMPLE | st4 | const | PRIMARY | PRIMARY | 4 | const | 1 | |
| 1 | SIMPLE | st5 | const | PRIMARY | PRIMARY | 4 | const | 1 | |
| 1 | SIMPLE | st1 | eq_ref | PRIMARY | PRIMARY | 4 | syk.dev_Profile.UserID | 1 | Using where |
| 1 | SIMPLE | st6 | eq_ref | PRIMARY | PRIMARY | 4 | syk.dev_Profile.PhotoAvatarID | 1 | |
| 1 | SIMPLE | st7 | ALL | NULL | NULL | NULL | NULL | 442 | |
+----+-------------+-------------+--------+---------------+---------+---------+---------------------------------------------+-------+----------------------------------------------+
I'm not even sure if this is TMI or not enough.
Is an index the right step to take here? If so, could someone get me going in the right direction?
The right step is whatever speeds up your query!
With your original query, I would say that you end up doing a table scan on the dev_Profile table as there are no indexable conditions on it. With your modified query, it depends on the number of different values allowed in the column - if there are may duplicates then the index may not get used as it has to fetch the table anyway in order to complete the rest of the query.
I have read your plan correctly then you are joining all of your other tables on an indexed non-nullable column already (except for st7, which doesn't seem to be using an index for some reason). It therefore looks as if you should not be using left joins. This would then allow the use of an index on (EmailVerified, accountIsActive, lastActivityTime) on table st1.
One should use indexes that are relevant for frequent queries. An index just slightly degrades write-performance while immensely speeds searches. As a rule of thumb, objects own IDs should be indexed as PRIMARY key and it's a good idea to have an index on column-groups that appear always together in a query. I figure you should index GenderID, CountryID, RegionID, CityID, age, online and hasPhoto. You should provide the schema of at least dev_Profile if you think that the right indexes are not used.
Notice that country/region/city IDs might represent redundant information. Your design may be suboptimal.
Notice2: you're doing awfully lot of application logic in SELECT. SQL is not designed for these lots of IF-in-IF-in-IF clauses, and because of the URLs the query is returning much larger a table than if would if you just requested the relevant field (i.e. filename, genderID, and so on). There might be times when those exact interpreted values have to be returned by the query, by in general you are better off (in the aspects of speed and readability) to code these processing steps into your application code.
This is mysql explain plan for one of the query I am looking into.
+----+-------------+--------+-------+---------------+---------+---------+------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+---------------+---------+---------+------+------+-------+
| 1 | SIMPLE | table2 | index | NULL | PRIMARY | 4 | NULL | 6 | |
| 1 | SIMPLE | table3 | ALL | NULL | NULL | NULL | NULL | 23 | |
| 1 | SIMPLE | table1 | ALL | NULL | NULL | NULL | NULL | 8 | |
| 1 | SIMPLE | table5 | index | NULL | PRIMARY | 4 | NULL | 1 | |
+----+-------------+--------+-------+---------------+---------+---------+------+------+-------+
4 rows in set (0 sec)
What is the significance of the order of statements in this output ?
Does it mean that table5 is read before all others ?
The tables are listed in the output in the order that MySQL would read them while processing the query. You can read more about the Explain plan output here.
Additionally, the output tells me:
The optimizer saw the query as having four (4) SELECT statements within it. Being a "simple" select type, those queries are not using UNION or subqueries.
Two of those statements could use indexes (based on the type column), which were primary keys (based on the key column). The other two could not use any indexes.
I've recently noticed that a query I have is running quite slowly, at almost 1 second per query.
The query looks like this
SELECT eventdate.id,
eventdate.eid,
eventdate.date,
eventdate.time,
eventdate.title,
eventdate.address,
eventdate.rank,
eventdate.city,
eventdate.state,
eventdate.name,
source.link,
type,
eventdate.img
FROM source
RIGHT OUTER JOIN
(
SELECT event.id,
event.date,
users.name,
users.rank,
users.eid,
event.address,
event.city,
event.state,
event.lat,
event.`long`,
GROUP_CONCAT(types.type SEPARATOR ' | ') AS type
FROM event FORCE INDEX (latlong_idx)
JOIN users ON event.uid = users.id
JOIN types ON users.tid=types.id
WHERE `long` BETWEEN -74.36829174058 AND -73.64365405942
AND lat BETWEEN 40.35195025942 AND 41.07658794058
AND event.date >= '2009-10-15'
GROUP BY event.id, event.date
ORDER BY event.date, users.rank DESC
LIMIT 0, 20
)eventdate
ON eventdate.uid = source.uid
AND eventdate.date = source.date;
and the explain is
+----+-------------+------------+--------+---------------+-------------+---------+------------------------------+-------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+---------------+-------------+---------+------------------------------+-------+---------------------------------+
| 1 | PRIMARY | | ALL | NULL | NULL | NULL | NULL | 20 | |
| 1 | PRIMARY | source | ref | iddate_idx | iddate_idx | 7 | eventdate.id,eventdate.date | 156 | |
| 2 | DERIVED | event | ALL | latlong_idx | NULL | NULL | NULL | 19500 | Using temporary; Using filesort |
| 2 | DERIVED | types | ref | eid_idx | eid_idx | 4 | active.event.id | 10674 | Using index |
| 2 | DERIVED | users | eq_ref | id_idx | id_idx | 4 | active.types.id | 1 | Using where |
+----+-------------+------------+--------+---------------+-------------+---------+------------------------------+-------+---------------------------------+
I've tried using 'force index' on latlong, but that doesn't seem to speed things up at all.
Is it the derived table that is causing the slow responses? If so, is there a way to improve the performance of this?
--------EDIT-------------
I've attempted to improve the formatting to make it more readable, as well
I run the same query changing only the 'WHERE statement as
WHERE users.id = (
SELECT users.id
FROM users
WHERE uidname = 'frankt1'
ORDER BY users.approved DESC , users.rank DESC
LIMIT 1 )
AND date & gt ; = '2009-10-15'
GROUP BY date
ORDER BY date)
That query runs in 0.006 seconds
the explain looks like
+----+-------------+------------+-------+---------------+---------------+---------+------------------------------+------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+---------------+---------+------------------------------+------+----------------+
| 1 | PRIMARY | | ALL | NULL | NULL | NULL | NULL | 42 | |
| 1 | PRIMARY | source | ref | iddate_idx | iddate_idx | 7 | eventdate.id,eventdate.date | 156 | |
| 2 | DERIVED | users | const | id_idx | id_idx | 4 | | 1 | |
| 2 | DERIVED | event | range | eiddate_idx | eiddate_idx | 7 | NULL | 24 | Using where |
| 2 | DERIVED | types | ref | eid_idx | eid_idx | 4 | active.event.bid | 3 | Using index |
| 3 | SUBQUERY | users | ALL | idname_idx | idname_idx | 767 | | 5 | Using filesort |
+----+-------------+------------+-------+---------------+---------------+---------+------------------------------+------+----------------+
The only way to clean up that mammoth SQL statement is to go back to the drawing board and carefully work though your database design and requirements. As soon as you start joining 6 tables and using an inner select you should expect incredible execution times.
As a start, ensure that all your id fields are indexed, but better to ensure that your design is valid. I don't know where to START looking at your SQL - even after I reformatted it for you.
Note that 'using indexes' means you need to issue the correct instructions when you CREATE or ALTER the tables you are using. See for instance MySql 5.0 create indexes