mysql explain slow where on left joined table - mysql

I'm experimenting with MySQL and thinking about how to solve a problem I expect to face later. I want to retrieve statuses that were posted by my friends (specific user ids) or posted inside a group I follow.
CREATE TABLE `status` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`status` text COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
KEY `IDX_F23501207E3C61F9` (`user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1567559 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
CREATE TABLE `group_status` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`group_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `IDX_F23501207E3C61F9` (`group_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1000001 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
I fed both tables with 1M rows.
The query I am running:
SELECT s.id, s.status, gs.group_id
FROM status s
LEFT JOIN group_status gs
ON s.id = gs.id
WHERE
s.user_id IN (55883,122024,442468,846269,903941,980896,192660,20608,525056,563457)
OR gs.group_id IN (78,79,79,80,80,83,84,85,86,87,88,89,89,91,92,92,94,98)
ORDER BY s.id DESC
LIMIT 15
The result:
Question one:
Shouldn't the Extra column show "Using index" instead of "Using where"?
Question two:
Why is the response time so slow? 2,3s
Edit after Tim's answer:
The filesort behaviour is normal when using UNION, I guess?
Why is there 'using where' in the second row of the explain output when the third row shows 'using where, using index'?
How many rows would the selects have to return before you think this gets slow?
The union select seems to be super fast, but each select currently returns only a few rows. I will try to select more rows in each select.

When you have an "OR" on different columns, MySQL may use none of your indexes.
Usually we can solve the problem by running two separate queries, each matching one of the criteria, and combining them with "UNION".
SELECT id, status, group_id FROM
(
SELECT s.id, s.status, gs.group_id
FROM status s
LEFT JOIN group_status gs
ON s.id = gs.id
WHERE
s.user_id IN (55883,122024,442468,846269,903941,980896,192660,20608,525056,563457)
UNION
SELECT s.id, s.status, gs.group_id
FROM status s
LEFT JOIN group_status gs
ON s.id = gs.id
WHERE
gs.group_id IN (78,79,79,80,80,83,84,85,86,87,88,89,89,91,92,92,94,98)
) t
ORDER BY id DESC
LIMIT 15
However, in your case, this may NOT help if either query returns a large number of records.
Your status column is defined as text, which may cause the filesort. You can change it to a long varchar to see if the filesort goes away. Or try this to avoid the worst-case scenario:
SELECT ss.id, group_id, ss.status
FROM (
SELECT id, group_id FROM
(
SELECT s.id, gs.group_id
FROM status s
LEFT JOIN group_status gs
ON s.id = gs.id
WHERE
s.user_id IN (55883,122024,442468,846269,903941,980896,192660,20608,525056,563457)
UNION
SELECT s.id, gs.group_id
FROM status s
LEFT JOIN group_status gs
ON s.id = gs.id
WHERE
gs.group_id IN (78,79,79,80,80,83,84,85,86,87,88,89,89,91,92,92,94,98)
) t
ORDER BY id DESC
LIMIT 15
) f
JOIN status ss
ON f.id =ss.id
ORDER BY ss.id DESC
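A rough way to check that the UNION rewrite returns the same rows as the original OR query is to run both on a small data set. The sketch below uses Python's sqlite3 module as a stand-in for MySQL, so it only compares result sets and says nothing about MySQL's execution plan or timing. The sample data and the shortened IN lists are made up for illustration.

```python
import sqlite3

# SQLite stands in for MySQL here; only the result sets are compared,
# not the execution plans or timings.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE status (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, status TEXT NOT NULL);
    CREATE TABLE group_status (id INTEGER PRIMARY KEY, group_id INTEGER NOT NULL);
""")
# Hypothetical sample data: 200 statuses, every third one belongs to a group.
conn.executemany(
    "INSERT INTO status (id, user_id, status) VALUES (?, ?, ?)",
    [(i, i % 17, f"status {i}") for i in range(1, 201)],
)
conn.executemany(
    "INSERT INTO group_status (id, group_id) VALUES (?, ?)",
    [(i, i % 11) for i in range(3, 201, 3)],
)

or_query = """
    SELECT s.id, s.status, gs.group_id
    FROM status s
    LEFT JOIN group_status gs ON s.id = gs.id
    WHERE s.user_id IN (2, 5) OR gs.group_id IN (4, 7)
    ORDER BY s.id DESC
    LIMIT 15
"""
union_query = """
    SELECT id, status, group_id FROM (
        SELECT s.id, s.status, gs.group_id
        FROM status s
        LEFT JOIN group_status gs ON s.id = gs.id
        WHERE s.user_id IN (2, 5)
        UNION
        SELECT s.id, s.status, gs.group_id
        FROM status s
        LEFT JOIN group_status gs ON s.id = gs.id
        WHERE gs.group_id IN (4, 7)
    ) t
    ORDER BY id DESC
    LIMIT 15
"""
or_rows = conn.execute(or_query).fetchall()
union_rows = conn.execute(union_query).fetchall()
print(or_rows == union_rows)
```

UNION (as opposed to UNION ALL) removes the duplicates that would otherwise appear for statuses matching both conditions, which is why the two result sets line up.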

Related

Why is this query really slow with 70k+ rows?

First of all, this is my table structure:
CREATE TABLE IF NOT EXISTS `site_forum_comments` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`forum_id` int(11) NOT NULL,
`user_id` int(11) NOT NULL,
`data` int(11) NOT NULL,
`comment` longtext NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;
Before importing my backup, it had like 10-15 rows and I made a ranking system based on number of comments and this query was working flawlessly:
SELECT u.id, u.username, COUNT(f.id) AS rank
FROM site_users AS u
LEFT JOIN site_forum_comments AS f ON (f.user_id = u.id)
GROUP BY u.id
ORDER BY rank DESC
LIMIT :l
But now, with more than 70k rows inserted, the script won't even load and just crashes the server.
What have I possibly done wrong? Is this problem about the query specifically or is it the table structure?
Thanks in advance, cheers!
This is your query:
SELECT u.id, u.username, COUNT(f.id) AS rank
FROM site_users u LEFT JOIN
site_forum_comments f
ON f.user_id = u.id
GROUP BY u.id
ORDER BY rank DESC
LIMIT :l
Because you are choosing the highest ranked user, you can probably use an inner join rather than an outer join. In any case, this version doesn't have a great many optimization opportunities. But, you need an index on site_forum_comments(user_id, id).
You might get better performance with the same index and a correlated subquery:
SELECT u.id, u.username,
(SELECT COUNT(*)
FROM site_forum_comments f
WHERE f.user_id = u.id
) as rank
FROM site_users u
ORDER BY rank DESC
LIMIT :l;
You are currently joining all users to their comments without an index on the user_id column, which is slow.
The following query will select the highest user first and only join that one user with the highest rank with the site_users table (using the index over site_users.id). So it should be faster.
SELECT site_users.id, site_users.username, a.rank
FROM (
SELECT user_id, COUNT(*) as rank
FROM site_forum_comments
GROUP BY user_id
ORDER BY rank DESC
LIMIT 1
) AS a
LEFT JOIN site_users ON a.user_id = site_users.id
Note that with this query you won't get a result if the rank is 0.
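The difference between the three approaches in this thread can be seen on a tiny data set. This is a minimal sketch using Python's sqlite3 module in place of MySQL, with made-up users and comments. The alias comment_count is used instead of rank (RANK is a reserved word in MySQL 8.0), and the derived-table variant is shown without LIMIT 1 so that all rows are visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE site_users (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
    CREATE TABLE site_forum_comments (
        id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, comment TEXT NOT NULL);
    -- The index suggested in the answers above.
    CREATE INDEX idx_comments_user ON site_forum_comments (user_id, id);
    INSERT INTO site_users (id, username) VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO site_forum_comments (user_id, comment) VALUES
        (1, 'a'), (1, 'b'), (1, 'c'), (2, 'd');
""")

# LEFT JOIN + GROUP BY: users without comments are kept with a count of 0.
join_rows = conn.execute("""
    SELECT u.id, u.username, COUNT(f.id) AS comment_count
    FROM site_users u
    LEFT JOIN site_forum_comments f ON f.user_id = u.id
    GROUP BY u.id
    ORDER BY comment_count DESC, u.id
    LIMIT 10
""").fetchall()

# Correlated subquery: same result, one index lookup per user.
subquery_rows = conn.execute("""
    SELECT u.id, u.username,
           (SELECT COUNT(*) FROM site_forum_comments f WHERE f.user_id = u.id) AS comment_count
    FROM site_users u
    ORDER BY comment_count DESC, u.id
    LIMIT 10
""").fetchall()

# Aggregate the comments first, then join: users with 0 comments disappear.
derived_rows = conn.execute("""
    SELECT u.id, u.username, a.comment_count
    FROM (
        SELECT user_id, COUNT(*) AS comment_count
        FROM site_forum_comments
        GROUP BY user_id
    ) a
    JOIN site_users u ON u.id = a.user_id
    ORDER BY a.comment_count DESC, u.id
    LIMIT 10
""").fetchall()

print(join_rows, subquery_rows, derived_rows, sep="\n")
```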

Summary Join query taking too long

I am trying to get the max record of each 'telephone_number' where process_status='0' and for that I am using the below query.
SELECT ID, CUSTID, telephone_number, TOTAL_USAGE, ACCOUNT_STATUS
FROM SPRINTABLE_DATA t
JOIN (SELECT MAX( id ) AS maxid FROM SPRINTABLE_DATA GROUP BY telephone_number)dt
ON t.id = dt.maxid WHERE process_status = '0'
AND RESET_FLAG = '0'
ORDER BY id DESC limit 0,700
The above query gives me the desired result, but it is too slow.
My table has about 20 million rows and this query is taking about 15-20 mins at times.
What can be done to improve this?
This is the structure:
CREATE TABLE `SPRINTABLE_DATA` (
`ID` bigint(11) NOT NULL AUTO_INCREMENT,
`CUSTID` int(11) DEFAULT NULL,
`telephone_number` varchar(20) DEFAULT NULL,
`TOTAL_USAGE` int(11) DEFAULT NULL,
`PROCESS_STATUS` tinyint(4) DEFAULT '0',
`RESET_FLAG` tinyint(4) DEFAULT '0',
`RESET_REASON` varchar(10) DEFAULT NULL,
`PLAN_ID` varchar(20) DEFAULT NULL,
`ACCOUNT_STATUS` varchar(30) DEFAULT NULL,
PRIMARY KEY (`ID`),
KEY `telephone_number` (`telephone_number`),
KEY `CALL_CUST` (`CALL_START_TIME`,`CUSTID`),
KEY `telephone_number1` (`telephone_number`,`PROCESS_STATUS`,`SOC_ADDED`),
KEY `CURRENT_USAGE` (`CURRENT_USAGE`),
KEY `TOTAL_USAGE` (`TOTAL_USAGE`)
) ENGINE=InnoDB AUTO_INCREMENT=36392272 DEFAULT CHARSET=latin1
It seems that you are looking for the 700 most recently called numbers. (If that is not correct, please edit your question.)
Your query follows a good practice for retrieving the latest log row for each item (telephone number in your case), as follows, in your subquery.
SELECT MAX( id ) AS id
FROM SPRINTABLE_DATA
GROUP BY telephone_number
To optimize the performance of this subquery, you need a compound index on two fields: (telephone_number, id), in that order. If you don't have that index, add it in. This is to allow a so-called loose index scan, an extraordinarily efficient way of satisfying a query.
Secondly, you're looking for (I presume) a small subset of your data. Presumably you have plenty more than 700 distinct telephone_number values. This means you're sorting a lot of data with ORDER BY only to discard it with LIMIT. So, let's do a deferred join, sorting a minimal number of columns, and then retrieving all the information you need.
Here's how to get the ID values of the 700 rows you need
SELECT q.ID /* get our 700 records */
FROM SPRINTABLE_DATA q
JOIN (
SELECT MAX( id ) AS id
FROM SPRINTABLE_DATA
GROUP BY telephone_number
) r ON q.id = r.id
WHERE q.process_status = '0'
AND q.RESET_FLAG = '0'
ORDER BY q.ID DESC
LIMIT 0,700
This pulls out 700 id numbers. You need to do some experimenting with indexes to find out what helps the most to optimize this. It's possible that an index on
process_status, RESET_FLAG, id
will help. It's also possible that changing the order of columns in the index will help, like this:
id, process_status, RESET_FLAG
Try them both.
Finally, we'll use this as a subquery to carry out the join (the so-called deferred join) to fetch the actual detail records. This technique gets rid of the need for sorting all that data.
SELECT t.ID, t.CUSTID, t.telephone_number, t.TOTAL_USAGE, t.ACCOUNT_STATUS
FROM SPRINTABLE_DATA t
JOIN (
SELECT q.ID /* get our 700 records */
FROM SPRINTABLE_DATA q
JOIN (
SELECT MAX( id ) AS id
FROM SPRINTABLE_DATA
GROUP BY telephone_number
) r ON q.id = r.id
WHERE q.process_status = '0'
AND q.RESET_FLAG = '0'
ORDER BY q.ID DESC
LIMIT 0,700
) s ON t.ID = s.ID
ORDER BY t.ID DESC
This will yield the same results, but will be faster.
Now, finally, if it's possible to select the latest calls from the 700 numbers that meet your criteria, you can simplify this query a lot. This will change your result set in a subtle way, though. In that case your call-selection subquery will look like this:
SELECT MAX( id ) AS id /* 700 matching numbers */
FROM SPRINTABLE_DATA
WHERE process_status = '0'
AND reset_flag = '0'
GROUP BY telephone_number
ORDER BY ID desc
LIMIT 0,700
With a compound covering index on
reset_flag, process_status, telephone_number, ID
this query will be quite fast. Your final query in this case would be
SELECT t.ID, t.CUSTID, t.telephone_number, t.TOTAL_USAGE, t.ACCOUNT_STATUS
FROM SPRINTABLE_DATA t
JOIN (
SELECT MAX( id ) AS id /* 700 matching numbers */
FROM SPRINTABLE_DATA
WHERE process_status = '0'
AND reset_flag = '0'
GROUP BY telephone_number
ORDER BY ID desc
LIMIT 0,700
) s ON t.ID = s.ID
ORDER BY t.ID DESC
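The subtle change in the result set mentioned above can be illustrated with a handful of rows. This is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL; the table is cut down to the relevant columns and the phone numbers are invented. Variant A picks the latest row per number and then filters it, variant B filters first and then picks the latest matching row per number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sprintable_data (
        id INTEGER PRIMARY KEY,
        telephone_number TEXT,
        process_status INTEGER DEFAULT 0,
        reset_flag INTEGER DEFAULT 0
    );
    CREATE INDEX idx_phone_id ON sprintable_data (telephone_number, id);
    INSERT INTO sprintable_data (id, telephone_number, process_status, reset_flag) VALUES
        (1, '555-0001', 0, 0),
        (2, '555-0002', 0, 0),
        (3, '555-0001', 1, 0),  -- latest row for 555-0001 is already processed
        (4, '555-0003', 0, 0),
        (5, '555-0002', 0, 0);
""")

# Variant A: take the latest row per number, then keep it only if it matches.
latest_then_filter = [r[0] for r in conn.execute("""
    SELECT q.id
    FROM sprintable_data q
    JOIN (
        SELECT MAX(id) AS max_id FROM sprintable_data GROUP BY telephone_number
    ) r ON q.id = r.max_id
    WHERE q.process_status = 0 AND q.reset_flag = 0
    ORDER BY q.id DESC
    LIMIT 700
""")]

# Variant B: filter first, then take the latest matching row per number.
filter_then_latest = [r[0] for r in conn.execute("""
    SELECT MAX(id) AS max_id
    FROM sprintable_data
    WHERE process_status = 0 AND reset_flag = 0
    GROUP BY telephone_number
    ORDER BY max_id DESC
    LIMIT 700
""")]

print(latest_then_filter)
print(filter_then_latest)
```

Number 555-0001 is absent from variant A because its newest row is already processed, while variant B falls back to its older unprocessed row. Which behaviour is wanted depends on the application.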
Made a slight modification to your query
SELECT ID, CUSTID, telephone_number, TOTAL_USAGE, ACCOUNT_STATUS
FROM SPRINTABLE_DATA t
JOIN (SELECT telephone_number,MAX( id ) AS maxid FROM SPRINTABLE_DATA GROUP BY telephone_number)dt
ON t.id = dt.maxid WHERE process_status = '0'
AND RESET_FLAG = '0'
ORDER BY id DESC limit 0,700
Add these indexes if they are not there already:
ALTER TABLE SPRINTABLE_DATA ADD KEY (telephone_number,id);
ALTER TABLE SPRINTABLE_DATA ADD KEY (process_status,reset_flag,id);
Another option, which may be faster, is a correlated subquery that keeps a row only if its id is the highest one for its telephone_number
SELECT ID, CUSTID, telephone_number, TOTAL_USAGE, ACCOUNT_STATUS
FROM SPRINTABLE_DATA t
WHERE t.process_status = '0'
AND t.RESET_FLAG = '0'
AND t.id = (SELECT MAX( tt.id ) FROM SPRINTABLE_DATA tt WHERE tt.telephone_number = t.telephone_number)
ORDER BY id DESC limit 0,700
This relies on the (telephone_number,id) and (process_status,reset_flag,id) indexes listed above.

GROUP BY with MAX date field - erratic results

I have a table containing form data. Each row contains a section_id and field_id. There are 50 distinct fields for each section. As users update an existing field, a new row is inserted with an updated date_modified. This keeps a rolling archive of changes.
The problem is that I'm getting erratic results when pulling the most recent set of fields to display on a page.
I've narrowed down the problem to a couple of fields, and have recreated a portion of the table in question on SQLFiddle.
Schema:
CREATE TABLE IF NOT EXISTS `cTable` (
`section_id` int(5) NOT NULL,
`field_id` int(5) DEFAULT NULL,
`content` text,
`user_id` int(11) NOT NULL,
`date_modified` datetime NOT NULL,
KEY `section_id` (`section_id`),
KEY `field_id` (`field_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
This query shows all previously edited rows for field_id 39. There are five rows returned:
SELECT cT.*
FROM cTable cT
WHERE
cT.section_id = 123 AND
cT.field_id=39;
Here's what I'm trying to do to pull the most recent row for field_id 39. No rows returned:
SELECT cT.*
FROM cTable cT
INNER JOIN (
SELECT field_id, MAX(date_modified) AS date_modified
FROM cTable GROUP BY field_id
) AS max USING (field_id, date_modified)
WHERE
cT.section_id = 123 AND
cT.field_id=39;
Record Count: 0;
If I try the same query on a different field_id, say 54, I get the correct result:
SELECT cT.*
FROM cTable cT
INNER JOIN (
SELECT field_id, MAX(date_modified) AS date_modified
FROM cTable GROUP BY field_id
) AS max USING (field_id, date_modified)
WHERE
cT.section_id = 123 AND
cT.field_id=54;
Record Count: 1;
Why would same query work on one field_id, but not the other?
In the subquery that computes the maxima you need GROUP BY section_id, field_id. Grouping by field_id alone ignores the section id, which is the column you are filtering on:
SELECT cT.*
FROM cTable cT
INNER JOIN (
SELECT section_id,field_id, MAX(date_modified) AS date_modified
FROM cTable GROUP BY section_id,field_id
) AS max
ON(max.field_id =cT.field_id
AND max.date_modified=cT.date_modified
AND max.section_id=cT.section_id
)
WHERE
cT.section_id = 123 AND
cT.field_id=39;
You are looking for the max(date_modified) per field_id. But you should look for the max(date_modified) per field_id where the section_id is 123. Otherwise you may find a date for which you find no match later.
SELECT cT.*
FROM cTable cT
INNER JOIN (
SELECT field_id, MAX(date_modified) AS date_modified
FROM cTable
WHERE section_id = 123
GROUP BY field_id
) AS max USING (field_id, date_modified)
WHERE
cT.section_id = 123 AND
cT.field_id=39;
Here is the SQL fiddle: http://www.sqlfiddle.com/#!2/0cefd8/19.
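The effect described in both answers can be reproduced with three rows. This is a minimal sketch using Python's sqlite3 module instead of MySQL; the rows and timestamps are invented, and section 456 is a hypothetical second section that happens to hold the newest edit of field 39.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cTable (
        section_id INTEGER NOT NULL,
        field_id INTEGER,
        content TEXT,
        date_modified TEXT NOT NULL
    );
    INSERT INTO cTable VALUES
        (123, 39, 'old',   '2014-01-01 10:00:00'),
        (123, 39, 'new',   '2014-01-02 10:00:00'),
        (456, 39, 'other', '2014-01-03 10:00:00');  -- newer edit in another section
""")

# Maximum per field_id only: the maximum belongs to section 456,
# so nothing in section 123 matches it.
by_field = conn.execute("""
    SELECT cT.section_id, cT.field_id, cT.content
    FROM cTable cT
    INNER JOIN (
        SELECT field_id, MAX(date_modified) AS date_modified
        FROM cTable
        GROUP BY field_id
    ) AS m ON m.field_id = cT.field_id AND m.date_modified = cT.date_modified
    WHERE cT.section_id = 123 AND cT.field_id = 39
""").fetchall()

# Maximum per (section_id, field_id): each section gets its own latest row.
by_section_and_field = conn.execute("""
    SELECT cT.section_id, cT.field_id, cT.content
    FROM cTable cT
    INNER JOIN (
        SELECT section_id, field_id, MAX(date_modified) AS date_modified
        FROM cTable
        GROUP BY section_id, field_id
    ) AS m ON m.section_id = cT.section_id
          AND m.field_id = cT.field_id
          AND m.date_modified = cT.date_modified
    WHERE cT.section_id = 123 AND cT.field_id = 39
""").fetchall()

print(by_field)
print(by_section_and_field)
```

This also explains why field_id 54 worked in the question: its most recent edit presumably happened to be in section 123.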

Optimising a working MYSQL statement

Background
I have a table of "users", a table of "content", and a table of "content_likes". When a user "likes" an item of content, a relation is added to "content_likes". Simple.
Now what I am trying to do is order content based on the number of likes it has received. This is relatively easy, however, I only want to retrieve 10 items at a time and then with a lazy load I am retrieving the next 10 items and so forth. If the select was ordered by time it would be easy to do the offset in the select statement, however, due to the ordering by number of "likes" I need another column I can offset by. So I've added a "rank" column to the result set, then on the next call of 10 items I can offset by this.
This query WORKS and does what I need to do. However, I am concerned about performance. Could anyone advise on optimising this query, or possibly suggest a better way of doing it?
DB SCHEMA
CREATE TABLE `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8
CREATE TABLE `content` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`owner_id` int(11) NOT NULL,
`added` int(11) NOT NULL,
`deleted` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8
CREATE TABLE `content_likes` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`content_id` int(11) NOT NULL,
`user_id` int(11) NOT NULL,
`added` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8
*columns omitted for simplicity
Breakdown of query
group content_id in content_likes relations table, and order by likes desc
add a column "rank" (or row number) to result set and order by this
join "content" table so that any content with a deleted flag can be omitted
only return results where "rank" (or row number) is greater than variable
limit result set to 10
THE MYSQL
SELECT
results.content_id, results.likes, results.rank
FROM
(
SELECT
t1.content_id, t1.likes, @rn:=@rn+1 AS rank
FROM
(
SELECT
cl.content_id,
COUNT(cl.content_id) AS likes
FROM
content_likes cl
GROUP BY
cl.content_id
ORDER BY
likes DESC,
added DESC
) t1, (SELECT @rn:=0) t2
ORDER BY
rank ASC
) results
LEFT JOIN
content c
ON
(c.id = results.content_id)
WHERE
c.deleted <> 1
AND
results.rank > :lastRank
LIMIT
10
MYSQL ALTERNATIVE
SELECT
*
FROM
(
SELECT
results.*, @rn:=@rn+1 AS rank
FROM
(
SELECT
c.id, cl.likes
FROM
content c
INNER JOIN
(SELECT content_id, COUNT(content_id) AS likes FROM content_likes GROUP BY content_id ORDER BY likes DESC, added DESC) cl
ON
c.id = cl.content_id
WHERE
c.deleted <> 1
AND
c.added > :timeago
LIMIT
100
) results, (SELECT @rn:=0) t2
) final
WHERE
final.rank > :lastRank
LIMIT
5
The "Alternative" MySQL query also works as I would like. Content is ordered by the number of likes, and I can offset by passing in the last row number. What I have attempted here is to limit the result sets so that performance isn't hindered too badly if and when the tables get large. In this example only content from within a timespan is considered, limited to 100 rows. Then I can offset by the row number (lazy load/pagination).
Any help or advice always appreciated. I am relatively a newbie to mysql so be kind :)
You can eliminate the subquery:
SELECT results.content_id, results.likes, results.rank
FROM (SELECT cl.content_id, COUNT(cl.content_id) AS likes, @rn:=@rn+1 AS rank
FROM content_likes cl cross join
(SELECT @rn:=0) t2
GROUP BY cl.content_id
ORDER BY likes DESC, added DESC
) results LEFT JOIN
content c
ON c.id = results.content_id
WHERE c.deleted <> 1 AND
results.rank > :lastRank
LIMIT 10;
However, I don't think that will have an appreciable effect on performance. What you should probably do is store the last number of likes and "added" value and use these to filter the data. The query needs to be fixed up a little, because added is not unambiguously defined in the order by clause:
SELECT results.content_id, results.likes, results.rank, results.added
FROM (SELECT cl.content_id, COUNT(cl.content_id) AS likes, MAX(cl.added) as added, @rn:=@rn+1 AS rank
FROM content_likes cl cross join
(SELECT @rn := :lastRank) t2
GROUP BY cl.content_id
HAVING COUNT(cl.content_id) < :likes or
(COUNT(cl.content_id) = :likes and MAX(cl.added) < :added)
ORDER BY likes DESC, added DESC
) results LEFT JOIN
content c
ON c.id = results.content_id
WHERE c.deleted <> 1 AND
results.rank > :lastRank
LIMIT 10;
This will at least reduce the number of rows that need to be sorted.
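The idea of paging by the last seen (likes, added) pair instead of a row number can be sketched as follows. This uses Python's sqlite3 module as a stand-in for MySQL and leaves out the rank variable and the join to content. The helper fetch_page, the alias last_added and the sample data are hypothetical names made up for this illustration, and the sketch assumes no two items share both the same like count and the same latest timestamp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE content_likes (
        id INTEGER PRIMARY KEY,
        content_id INTEGER NOT NULL,
        user_id INTEGER NOT NULL,
        added INTEGER NOT NULL
    );
""")
# Hypothetical data: content_id -> number of likes, each like with a distinct timestamp.
like_counts = {1: 2, 2: 3, 3: 2, 4: 1, 5: 3}
rows = []
ts = 1000
for content_id, n_likes in like_counts.items():
    for user_id in range(n_likes):
        ts += 1
        rows.append((content_id, user_id, ts))
conn.executemany(
    "INSERT INTO content_likes (content_id, user_id, added) VALUES (?, ?, ?)", rows
)

# The filter has to live in HAVING because it refers to aggregates.
PAGE_SQL = """
    SELECT content_id, COUNT(*) AS likes, MAX(added) AS last_added
    FROM content_likes
    GROUP BY content_id
    HAVING :likes IS NULL
        OR COUNT(*) < :likes
        OR (COUNT(*) = :likes AND MAX(added) < :added)
    ORDER BY likes DESC, last_added DESC
    LIMIT :page_size
"""

def fetch_page(last=None, page_size=2):
    """Return the page that follows `last`, the final row of the previous page."""
    params = {
        "likes": last[1] if last else None,
        "added": last[2] if last else None,
        "page_size": page_size,
    }
    return conn.execute(PAGE_SQL, params).fetchall()

page1 = fetch_page()
page2 = fetch_page(last=page1[-1])
page3 = fetch_page(last=page2[-1])
print(page1, page2, page3, sep="\n")
```

The client only has to remember the likes and last_added values of the last row it displayed, so no row numbers need to be recomputed between requests.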

MySQL Query with some type of join? Not sure

If anyone could recommend a good book for learning mySQL as well, that would be great :).
I have two tables, tags, codes_tags
CREATE TABLE `tags` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(40) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `name` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=190 DEFAULT CHARSET=utf8
CREATE TABLE `codes_tags` (
`code_id` int(11) unsigned NOT NULL,
`tag_id` int(11) unsigned NOT NULL,
KEY `sourcecode_id` (`code_id`),
KEY `tag_id` (`tag_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
What I am trying to do is select the name from 'tags', and how many of that tag_id there are in 'codes_tags', and order them by that count. If there are no records in codes_tags for that tag_id, 'count' should be equal to 0 or NULL (preferably 0).
This is the closest I have come so far:
SELECT tags.name, COUNT( codes_tags.tag_id ) AS count
FROM tags
LEFT JOIN codes_tags ON tags.id = codes_tags.tag_id
GROUP BY tag_id
ORDER BY count DESC
LIMIT 0 , 30
It seems to do what I am wanting, however it is only returning four rows when it should return 30.
What am I doing wrong here?
Thanks.
I've tested this out on MySQL with some dummy data and the query appears to return more than 4 rows for me. I ran your create table statements and then populated them with the following statements:
insert into tags (name) values ('java'), ('mysql'), ('php'), ('ruby'), ('.net'), ('python');
insert into codes_tags (code_id, tag_id) values (1,194), (2,194), (3,194), (1,191), (2,191), (3,191), (4,191), (5,191), (1,192), (1,195), (1,193);
When I run your query on that data, it returns 6 rows. In order to help further debug this, can you post the results of the following 2 queries:
select count(*) from tags;
select * from tags limit 10;
Also, in order to make sure you have proper data integrity, can you add the following foreign key and see if it succeeds?
alter table codes_tags add foreign key codes_tags_tag_id_key(tag_id) references tags(id);
I think that if you change COUNT(codes_tags.tag_id) to COUNT(*) in the SELECT, NULLs will also be included (if it's NULLs or 0 counts that you're missing; otherwise, the query looks fine).
EDIT: On second thought, I missed the LEFT JOIN. That would mean you want all of the tags even if they're not related to something in the codes_tags table. Is that what you want?
I would probably do something like the following:
SELECT tags.name, COUNT(*) AS count
FROM tags
INNER JOIN codes_tags ON tags.id = codes_tags.tag_id
GROUP BY tags.id
ORDER BY count(*) DESC
It can be inferred from the items not in the list which tags are not also included in codes_tags. However, if you wanted to explicitly do that as well:
SELECT tags.name, COUNT(*) AS count
FROM tags
INNER JOIN codes_tags ON tags.id = codes_tags.tag_id
GROUP BY tags.id
UNION
SELECT tags.name, 0
from tags
where tags.name not in
(SELECT tags.name
FROM tags
INNER JOIN codes_tags ON tags.id = codes_tags.tag_id)
ORDER BY count DESC
(I don't have access to a SQL box at the moment, so take the queries with a grain of salt; they're untested.)
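One plausible explanation for the four rows in the question, assuming only three of the tags are actually used, is the grouping column: GROUP BY tag_id groups on codes_tags.tag_id, which is NULL for every unused tag after the LEFT JOIN, so all unused tags collapse into a single group. The sketch below shows this with Python's sqlite3 module standing in for MySQL. The data is invented, and the alias tag_count replaces count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE codes_tags (code_id INTEGER NOT NULL, tag_id INTEGER NOT NULL);
    INSERT INTO tags (id, name) VALUES
        (1, 'java'), (2, 'mysql'), (3, 'php'), (4, 'ruby'), (5, '.net'), (6, 'python');
    -- Only java, mysql and php are used; the other three tags have no rows here.
    INSERT INTO codes_tags (code_id, tag_id) VALUES
        (1, 1), (2, 1), (3, 1), (1, 2), (2, 2), (1, 3);
""")

QUERY = """
    SELECT tags.name, COUNT(codes_tags.tag_id) AS tag_count
    FROM tags
    LEFT JOIN codes_tags ON tags.id = codes_tags.tag_id
    GROUP BY {group_col}
    ORDER BY tag_count DESC
"""
# Grouping on the joined column: all unused tags share the NULL group.
by_join_column = conn.execute(QUERY.format(group_col="codes_tags.tag_id")).fetchall()
# Grouping on the tags primary key: one row per tag, unused tags get 0.
by_tags_id = conn.execute(QUERY.format(group_col="tags.id")).fetchall()

print(len(by_join_column), len(by_tags_id))
```

With GROUP BY tags.id and COUNT(codes_tags.tag_id), the LEFT JOIN version already yields a 0 for unused tags, so the UNION is not needed for that purpose.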
Change the LEFT JOIN to LEFT OUTER JOIN