MySQL JOIN slow with SUM() of results

Does anyone know a more efficient way to execute this query?
SELECT SQL_CALC_FOUND_ROWS p.*, IFNULL(SUM(v.visits), 0) AS visits
FROM posts AS p
LEFT JOIN visits_day v ON v.post_id = p.post_id
GROUP BY p.post_id
ORDER BY p.post_id DESC LIMIT 20 OFFSET 0
The visits_day table has one row per day, per user, per post. As the table grows, this query becomes extremely slow.
I can't add a column with the total visit count, because I need to rank the posts by most visits per day, per week, etc.
Does anyone know a better solution?
Thanks
CREATE TABLE `visits_day` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`post_id` int(11) NOT NULL,
`user_id` int(11) NOT NULL,
`day` date NOT NULL,
`visits` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=52302 DEFAULT CHARSET=utf8
CREATE TABLE `posts` (
`post_id` int(11) NOT NULL AUTO_INCREMENT,
`link` varchar(300) NOT NULL,
`date` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
`title` varchar(500) NOT NULL,
`img` varchar(300) NOT NULL,
PRIMARY KEY (`post_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1027 DEFAULT CHARSET=utf8

With SQL_CALC_FOUND_ROWS, the query must evaluate everything, just not deliver all the rows. Getting rid of that should be beneficial.
To actually touch only 20 rows, we need to get through the WHERE, GROUP BY and ORDER BY with a single index. Otherwise, we might have to touch all the rows, sort them then deliver 20. The obvious index is (post_id); I suspect that is already indexed as PRIMARY KEY(post_id)? (It would help if you provide SHOW CREATE TABLE when asking questions.)
Another way to do the join, which still yields 0 for posts with no visits, is as follows. Note that it eliminates the need for GROUP BY.
SELECT p.*,
IFNULL( ( SELECT SUM(v.visits)
FROM visits_day AS v
WHERE v.post_id = p.post_id
),
0) AS visits
FROM posts AS p
ORDER BY p.post_id DESC
LIMIT 20 OFFSET 0
If you really need the total row count, run a separate SELECT COUNT(*) FROM posts.
ON v.post_id=p.post_id in your query and WHERE post_id = p.post_id beg for INDEX(post_id) on visits_day. That will speed up both variants considerably.
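For readers who want to try the subquery variant without a MySQL server, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. Assumptions: SQLite is not MySQL (its optimizer and EXPLAIN output differ), and the sample rows, the index name idx_visits_post, and the helpers latest_posts and query_plan are hypothetical. It checks two things: a post with no visits comes back as 0 rather than NULL, and the correlated subquery reaches visits_day through the post_id index.

```python
import sqlite3

def build_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for posts/visits_day with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE posts (
            post_id INTEGER PRIMARY KEY,
            title   TEXT NOT NULL
        );
        CREATE TABLE visits_day (
            id      INTEGER PRIMARY KEY,
            post_id INTEGER NOT NULL,
            user_id INTEGER NOT NULL,
            day     TEXT NOT NULL,
            visits  INTEGER NOT NULL
        );
        -- INDEX(post_id) on visits_day, as recommended (index name is hypothetical).
        CREATE INDEX idx_visits_post ON visits_day (post_id);
    """)
    conn.executemany("INSERT INTO posts (post_id, title) VALUES (?, ?)",
                     [(1, "first"), (2, "second"), (3, "third")])
    conn.executemany(
        "INSERT INTO visits_day (post_id, user_id, day, visits) VALUES (?, ?, ?, ?)",
        [(1, 10, "2024-01-01", 3), (1, 11, "2024-01-01", 2), (3, 10, "2024-01-02", 7)],
    )
    return conn

# Correlated subquery: no GROUP BY, and IFNULL turns "no visits" into 0.
LATEST_POSTS_SQL = """
    SELECT p.post_id,
           IFNULL((SELECT SUM(v.visits)
                   FROM visits_day AS v
                   WHERE v.post_id = p.post_id), 0) AS visits
    FROM posts AS p
    ORDER BY p.post_id DESC
    LIMIT ? OFFSET 0
"""

def latest_posts(conn: sqlite3.Connection, limit: int = 20) -> list:
    return conn.execute(LATEST_POSTS_SQL, (limit,)).fetchall()

def query_plan(conn: sqlite3.Connection) -> str:
    rows = conn.execute("EXPLAIN QUERY PLAN " + LATEST_POSTS_SQL, (20,)).fetchall()
    # The last column of each plan row is the human-readable step description.
    return "\n".join(str(row[-1]) for row in rows)

conn = build_demo_db()
print(latest_posts(conn))  # [(3, 7), (2, 0), (1, 5)]
print(query_plan(conn))    # should mention idx_visits_post for the subquery
```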


Optimize a query

How can I make this query respond faster? The average response time is approximately 0.2 s (8039 rows in my items table and 81 rows in my tracking table).
Query:
SELECT a.name, b.cnt
FROM `items` a
LEFT JOIN (
SELECT guid, COUNT(*) cnt
FROM tracking
WHERE date > UNIX_TIMESTAMP(NOW() - INTERVAL 1 day)
GROUP BY guid
) b ON a.`id` = b.guid
WHERE a.`type` = 'streaming' AND a.`state` = 1
ORDER BY b.cnt DESC LIMIT 15 OFFSET 75
Tracking table structure
CREATE TABLE `tracking` (
`id` bigint(11) NOT NULL AUTO_INCREMENT,
`guid` int(11) DEFAULT NULL,
`ip` int(11) NOT NULL,
`date` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `i1` (`ip`,`guid`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=4303 DEFAULT CHARSET=latin1;
Items table structure
CREATE TABLE `items` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`guid` int(11) DEFAULT NULL,
`type` varchar(255) DEFAULT NULL,
`name` varchar(255) DEFAULT NULL,
`embed` varchar(255) DEFAULT NULL,
`url` varchar(255) DEFAULT NULL,
`description` text,
`tags` varchar(255) DEFAULT NULL,
`date` int(11) DEFAULT NULL,
`vote_val_total` float DEFAULT '0',
`vote_total` float(11,0) DEFAULT '0',
`rate` float DEFAULT '0',
`icon` text CHARACTER SET ascii,
`state` int(11) DEFAULT '0',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=9258 DEFAULT CHARSET=latin1;
Your query, as written, doesn't make much sense. It produces all possible combinations of rows in your two tables and then groups them.
You may want this:
SELECT a.*, b.cnt
FROM `items` a
LEFT JOIN (
SELECT guid, COUNT(*) cnt
FROM tracking
WHERE `date` > UNIX_TIMESTAMP(NOW() - INTERVAL 1 day)
GROUP BY guid
) b ON a.guid = b.guid
ORDER BY b.cnt DESC
The high-volume data in this query come from the relatively large tracking table. So, you should add a compound index to it, using the columns (date, guid). This will allow your query to random-access the index by date and then scan it for guid values.
ALTER TABLE tracking ADD INDEX guid_summary (`date`, guid);
I suppose you'll see a nice performance improvement.
Pro tip: Don't use SELECT *. Instead, give a list of the columns you want in your result set. For example,
SELECT a.guid, a.name, a.description, b.cnt
Why is this important?
First, it makes your software more resilient against somebody adding columns to your tables in the future.
Second, it tells the MySQL server to sling around only the information you want. That can improve performance really dramatically, especially when your tables get big.
Since tracking has significantly fewer rows than items, I will propose the following.
SELECT i.name, c.cnt
FROM
(
SELECT guid, COUNT(*) cnt
FROM tracking
WHERE date > UNIX_TIMESTAMP(NOW() - INTERVAL 1 day )
GROUP BY guid
) AS c
JOIN items AS i ON i.id = c.guid
WHERE i.type = 'streaming'
AND i.state = 1
ORDER BY c.cnt DESC
LIMIT 15 OFFSET 75
It will fail to display any items for which cnt is 0. (Your version displays the items with NULL for the count.)
Composite indexes needed:
items: The PRIMARY KEY(id) is sufficient.
tracking: INDEX(date, guid) -- "covering"
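A minimal sketch of the inner-join version next to the original LEFT JOIN, again using Python's sqlite3 as a stand-in for MySQL. The fixed timestamp NOW, the cutoff CUTOFF (standing in for UNIX_TIMESTAMP(NOW() - INTERVAL 1 DAY)), and the sample rows are hypothetical. It shows the item with no recent hits disappearing from the inner join while the LEFT JOIN reports it with NULL, and it asks SQLite whether the derived table can be answered from the (date, guid) index alone.

```python
import sqlite3

NOW = 1_700_000_000          # hypothetical "current" Unix timestamp
CUTOFF = NOW - 86_400        # stands in for UNIX_TIMESTAMP(NOW() - INTERVAL 1 DAY)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, type TEXT, state INTEGER DEFAULT 0);
    CREATE TABLE tracking (id INTEGER PRIMARY KEY, guid INTEGER, ip INTEGER NOT NULL,
                           date INTEGER);  -- Unix timestamp, as in the question
    -- INDEX(date, guid): range on date, guid read straight from the index.
    CREATE INDEX guid_summary ON tracking (date, guid);
""")
conn.executemany("INSERT INTO items (id, name, type, state) VALUES (?, ?, ?, ?)", [
    (1, "a", "streaming", 1), (2, "b", "streaming", 1), (3, "c", "streaming", 0),
    (4, "d", "video", 1), (5, "e", "streaming", 1),   # item 5 has no recent hits
])
conn.executemany("INSERT INTO tracking (guid, ip, date) VALUES (?, ?, ?)", [
    (1, 0, NOW - 10), (1, 0, NOW - 20), (1, 0, NOW - 30),
    (2, 0, NOW - 100), (2, 0, NOW - 2 * 86_400),       # second hit is too old
    (3, 0, NOW - 5), (4, 0, NOW - 5),
])

RECENT = "SELECT guid, COUNT(*) AS cnt FROM tracking WHERE date > ? GROUP BY guid"

# Inner join starting from the small derived table (items with cnt 0 drop out).
INNER = f"""
    SELECT i.name, c.cnt
    FROM ({RECENT}) AS c
    JOIN items AS i ON i.id = c.guid
    WHERE i.type = 'streaming' AND i.state = 1
    ORDER BY c.cnt DESC
    LIMIT ? OFFSET ?
"""
# Left join starting from items, as in the question (cnt is NULL when no hits).
OUTER = f"""
    SELECT a.name, b.cnt
    FROM items AS a
    LEFT JOIN ({RECENT}) AS b ON a.id = b.guid
    WHERE a.type = 'streaming' AND a.state = 1
    ORDER BY b.cnt DESC
    LIMIT ? OFFSET ?
"""

inner_rows = conn.execute(INNER, (CUTOFF, 15, 0)).fetchall()
outer_rows = conn.execute(OUTER, (CUTOFF, 15, 0)).fetchall()
plan = "\n".join(str(r[-1]) for r in conn.execute("EXPLAIN QUERY PLAN " + RECENT, (CUTOFF,)))
print(inner_rows)  # [('a', 3), ('b', 1)]
print(outer_rows)  # [('a', 3), ('b', 1), ('e', None)]
print(plan)
```

In SQLite's plan output, COVERING INDEX means the table rows themselves are never read; MySQL reports the equivalent as "Using index" in the Extra column of EXPLAIN.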
Other issues:
If ip is an IP-address, it needs to be INT UNSIGNED. But that covers only IPv4, not IPv6.
It seems like date is not just a "date", but really a date+time. Please rename it to avoid confusion.
float(11,0) -- Don't use FLOAT for integers. Don't use (m,n) on FLOAT or DOUBLE. INT UNSIGNED makes more sense here.
OFFSET is naughty when it comes to performance: it must scan over the skipped records. But in your query there is no way to avoid collecting all the possible rows, sorting them, stepping over 75, and only then delivering 15 rows. (And in the inner-join version above, since tracking has only 81 rows, there are at most 81 distinct guid values, so after skipping 75 at most 6 rows remain, not a full 15.)
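The arithmetic behind the IP-address and FLOAT points can be checked with a short sketch using only Python's standard library (the helper names are hypothetical). It assumes MySQL's INT range of -2^31..2^31-1 (0..2^32-1 for INT UNSIGNED) and that MySQL's FLOAT is a 4-byte IEEE-754 single-precision value, which cannot represent every integer above 2^24.

```python
import ipaddress
import struct

INT_MAX = 2**31 - 1           # largest value a signed MySQL INT can hold
INT_UNSIGNED_MAX = 2**32 - 1  # largest value an INT UNSIGNED can hold

def ipv4_as_int(addr: str) -> int:
    """Numeric form of a dotted-quad IPv4 address (the same number INET_ATON() gives)."""
    return int(ipaddress.IPv4Address(addr))

def as_float32(n: int) -> float:
    """Round-trip a number through a 4-byte IEEE-754 single-precision float."""
    return struct.unpack("<f", struct.pack("<f", float(n)))[0]

ip = ipv4_as_int("192.168.1.1")
print(ip, ip > INT_MAX, ip <= INT_UNSIGNED_MAX)          # 3232235777 True True
print(len(ipaddress.IPv6Address("2001:db8::1").packed))  # 16 bytes: too wide for any INT type
print(as_float32(16_777_217))                            # 16777216.0: the +1 is lost
```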
What version are you using? There have been important changes to the Optimization of LEFT JOIN ( SELECT ... ). Please provide EXPLAIN SELECT for each query under discussion.

Optimizing simple MySQL query response time

I have two tables in MySQL. One contains users and the other contains transaction data.
I am trying to find out the top 50 users that made the most profit.
CREATE TABLE IF NOT EXISTS `t_logs` (
`tid` int(11) NOT NULL AUTO_INCREMENT,
`userid` int(11) NOT NULL,
`amount` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`tid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=1;
CREATE TABLE IF NOT EXISTS `t_users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`login` varchar(100) NOT NULL,
`name` varchar(100) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=1;
Query:
SELECT *, (SELECT SUM(amount) FROM `t_logs` WHERE userid=u.id LIMIT 0,1) AS profit
FROM `t_users` u
ORDER BY profit DESC
LIMIT 0,50
Now the problem: with 1000 rows in the t_users table and 3000 transactions in the t_logs table, this query takes 25 seconds on a VDS with Apache and 2 GB of RAM, or 9 seconds on my local computer using XAMPP (I have 16 GB of RAM).
My question: is there anything more I can do to optimize this? Should I change the table engine from MyISAM to something else? Is my query inefficient? Or is the only solution to add more RAM to the VDS?
If we try to add 10000 users and 10000 logs, the query takes 250 seconds on the VDS. What are my options if I expect to have more than 50000 users and more than 1 million logs?
SELECT u.* , p.profit
FROM `t_users` u
LEFT JOIN (
SELECT userid, SUM(amount) as profit FROM `t_logs` GROUP BY userid) AS p
ON p.userid=u.id
ORDER BY p.profit DESC
LIMIT 0,50
You should be using a LEFT JOIN with GROUP BY instead of a subquery, and you need an index on t_logs.userid:
SELECT u.*, SUM(l.amount) AS profit
FROM t_users u
LEFT JOIN t_logs l
ON l.userid = u.id
GROUP BY u.id
ORDER BY profit DESC
LIMIT 50
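A minimal sketch comparing the question's correlated subquery with both answers, using Python's sqlite3 as a stand-in for MySQL. It only checks that the three forms return the same ranking on a few hypothetical rows (including a user with no transactions, who gets NULL and sorts last); it says nothing about their relative speed, which depends on MySQL's optimizer and the index on t_logs.userid. The index name idx_logs_userid is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_users (id INTEGER PRIMARY KEY, login TEXT NOT NULL, name TEXT NOT NULL);
    CREATE TABLE t_logs (tid INTEGER PRIMARY KEY, userid INTEGER NOT NULL,
                         amount INTEGER NOT NULL DEFAULT 0);
    -- The index the answers rely on (name is hypothetical).
    CREATE INDEX idx_logs_userid ON t_logs (userid);
    INSERT INTO t_users VALUES (1, 'ann', 'Ann'), (2, 'bob', 'Bob'), (3, 'cy', 'Cy');
    INSERT INTO t_logs (userid, amount) VALUES (1, 50), (1, -20), (2, 100), (2, 5);
""")  # user 3 has no transactions

QUERIES = {
    # Question's shape: one correlated SUM per user (LIMIT 0,1 dropped; it had no effect).
    "correlated": """
        SELECT u.id, (SELECT SUM(amount) FROM t_logs WHERE userid = u.id) AS profit
        FROM t_users u ORDER BY profit DESC LIMIT 50""",
    # First answer: aggregate t_logs once, then LEFT JOIN the totals.
    "derived": """
        SELECT u.id, p.profit
        FROM t_users u
        LEFT JOIN (SELECT userid, SUM(amount) AS profit FROM t_logs GROUP BY userid) AS p
          ON p.userid = u.id
        ORDER BY p.profit DESC LIMIT 50""",
    # Second answer: LEFT JOIN the raw rows, then GROUP BY the user.
    "join_group": """
        SELECT u.id, SUM(l.amount) AS profit
        FROM t_users u LEFT JOIN t_logs l ON l.userid = u.id
        GROUP BY u.id ORDER BY profit DESC LIMIT 50""",
}

results = {name: conn.execute(sql).fetchall() for name, sql in QUERIES.items()}
for name, rows in results.items():
    print(name, rows)  # each prints [(2, 105), (1, 30), (3, None)]
```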

How to join two tables without messing up the query

I have this query, for example (it works the way I want it to):
SELECT `discusComments`.`memberID`, COUNT( `discusComments`.`memberID`) AS postcount
FROM `discusComments`
GROUP BY `discusComments`.`memberID` ORDER BY postcount DESC
Example Results:
memberid postcount
3 283
6 230
9 198
Now I want to join the memberID of the discusComments table with that of the discusTopics table, because what I really want is to get results only for a specific group, and the group ID exists only in the topics table, not in the comments table; hence the join.
SELECT `discusComments`.`memberID`, COUNT( `discusComments`.`memberID`) AS postcount
FROM `discusComments`
LEFT JOIN `discusTopics` ON `discusComments`.`memberID` = `discusTopics`.`memberID`
GROUP BY `discusComments`.`memberID` ORDER BY postcount DESC
Example Results:
memberid postcount
3 14789
6 8678
9 6987
How can I stop this huge increase happening in the postcount? I need to preserve it as before.
Once I have this sorted, I want to add a condition such as WHERE discusTopics.groupID = 6.
CREATE TABLE IF NOT EXISTS `discusComments` (
`id` bigint(255) NOT NULL auto_increment,
`topicID` bigint(255) NOT NULL,
`comment` text NOT NULL,
`timeStamp` bigint(12) NOT NULL,
`memberID` bigint(255) NOT NULL,
`thumbsUp` int(15) NOT NULL default '0',
`thumbsDown` int(15) NOT NULL default '0',
`status` int(1) NOT NULL default '1',
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=7190 ;
CREATE TABLE IF NOT EXISTS `discusTopics` (
`id` bigint(255) NOT NULL auto_increment,
`groupID` bigint(255) NOT NULL,
`memberID` bigint(255) NOT NULL,
`name` varchar(255) NOT NULL,
`views` bigint(255) NOT NULL default '0',
`lastUpdated` bigint(10) NOT NULL,
PRIMARY KEY (`id`),
KEY `groupID` (`groupID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=913 ;
SELECT `discusComments`.`memberID`, COUNT( `discusComments`.`memberID`) AS postcount
FROM `discusComments`
JOIN `discusTopics` ON `discusComments`.`topicID` = `discusTopics`.`id`
GROUP BY `discusComments`.`memberID` ORDER BY postcount DESC
Joining on the topic ID (discusComments.topicID = discusTopics.id) solved the memberID issue. Thanks @Andiry M
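Why the memberID join inflated the counts, and how the topic join plus the groupID filter from the question behave, can be seen in a small sketch run against SQLite from Python (standing in for MySQL). The rows are hypothetical and the schema is trimmed to the columns involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE discusTopics (id INTEGER PRIMARY KEY, groupID INTEGER NOT NULL,
                               memberID INTEGER NOT NULL, name TEXT NOT NULL);
    CREATE TABLE discusComments (id INTEGER PRIMARY KEY, topicID INTEGER NOT NULL,
                                 memberID INTEGER NOT NULL, comment TEXT NOT NULL);
    -- Member 3 started two topics in group 6; member 6 started one topic in group 7.
    INSERT INTO discusTopics VALUES (1, 6, 3, 't1'), (2, 6, 3, 't2'), (3, 7, 6, 't3');
    -- Member 3 wrote 3 comments, member 6 wrote 2.
    INSERT INTO discusComments (topicID, memberID, comment) VALUES
        (1, 3, 'a'), (1, 6, 'b'), (2, 3, 'c'), (3, 3, 'd'), (3, 6, 'e');
""")

# Joining on memberID pairs every comment with every topic its author started,
# so each comment is counted once per such topic.
BY_MEMBER = """
    SELECT c.memberID, COUNT(c.memberID) AS postcount
    FROM discusComments c
    LEFT JOIN discusTopics t ON c.memberID = t.memberID
    GROUP BY c.memberID ORDER BY postcount DESC"""

# Joining on the topic matches each comment to exactly one topic row,
# which also makes the groupID filter possible.
BY_TOPIC = """
    SELECT c.memberID, COUNT(c.memberID) AS postcount
    FROM discusComments c
    JOIN discusTopics t ON c.topicID = t.id
    WHERE t.groupID = ?
    GROUP BY c.memberID ORDER BY postcount DESC"""

inflated = dict(conn.execute(BY_MEMBER).fetchall())
group6 = dict(conn.execute(BY_TOPIC, (6,)).fetchall())
print(inflated)  # {3: 6, 6: 2}
print(group6)    # {3: 2, 6: 1}
```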
You need to use a plain JOIN, not a LEFT JOIN, and you can add AND discusTopics.memberID = 6 after ON discusComments.memberID = discusTopics.memberID.
You can use a subquery like this:
SELECT `discusComments`.`memberID`, COUNT( `discusComments`.`memberID`) AS postcount
FROM `discusComments` WHERE `discusComments`.`memberID` IN
(SELECT DISTINCT memberID FROM `discusTopics` WHERE groupID = 6)
GROUP BY `discusComments`.`memberID` ORDER BY postcount DESC
If I understand your question correctly, you do not need to use JOIN here at all. JOINs are needed when you have many-to-many relationships and, for each value in one table, you need to select all corresponding values in another table.
But here you have a many-to-one relationship, if I got it right. Then you can simply select from two tables like this:
SELECT a.*, b.id FROM a, b WHERE a.pid = b.id
This is a simple query and won't create a giant overhead as JOIN does.
PS: In the future, try to experiment with your queries, and try to avoid JOINs, especially in MySQL. They are slow and dangerous in their complexity. In 90% of the cases where you want to use JOIN, there is a simple and much faster solution.
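The comma syntax in that example is an implicit inner join. The sketch below (hypothetical tables a and b, run against SQLite from Python as a stand-in for MySQL) compares it with the explicit JOIN ... ON spelling; in this setup both return the same rows, and SQLite reports the same plan for both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE b (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE a (id INTEGER PRIMARY KEY, pid INTEGER, note TEXT);
    INSERT INTO b VALUES (1, 'x'), (2, 'y');
    INSERT INTO a VALUES (10, 1, 'p'), (11, 1, 'q'), (12, 2, 'r'), (13, 99, 's');
""")

COMMA = "SELECT a.*, b.id FROM a, b WHERE a.pid = b.id ORDER BY a.id"
EXPLICIT = "SELECT a.*, b.id FROM a JOIN b ON a.pid = b.id ORDER BY a.id"

def plan_steps(sql: str) -> list:
    # Keep only the step descriptions (last column of EXPLAIN QUERY PLAN rows).
    return [str(row[-1]) for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

print(conn.execute(COMMA).fetchall())     # row 13 has no match in b, so it is absent
print(conn.execute(EXPLICIT).fetchall())
print(plan_steps(COMMA) == plan_steps(EXPLICIT))
```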

Eliminating values from one table with another. Super slow

In the same database I have a table messages, whose columns id, title, and text I want. I want only the rows whose title has no entry in the table lastlogon, where the equivalent column is named username.
I have been using this SQL command in PHP, it generally took 2-3 seconds to pull up:
SELECT DISTINCT * FROM messages WHERE title NOT IN (SELECT username FROM lastlogon) LIMIT 1000
This was all fine until the lastlogon table came to contain about 80% of the values in messages. The messages table has about 8000 rows and lastlogon about 7000. Now the query takes one to two minutes, and MySQL's CPU usage shoots up.
I tried the following but had no luck reducing the time:
SELECT id,title,text FROM messages a LEFT OUTER JOIN lastlogon b ON (a.title = b.username) LIMIT 1000
Why is it suddenly taking so long for such a small number of rows? I have tried restarting MySQL and Apache multiple times. I am using Debian Linux.
Edit: Here are the structures
--
-- Table structure for table `lastlogon`
--
CREATE TABLE IF NOT EXISTS `lastlogon` (
`username` varchar(25) NOT NULL,
`lastlogon` date NOT NULL,
`datechecked` date NOT NULL,
PRIMARY KEY (`username`),
KEY `username` (`username`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
-- --------------------------------------------------------
--
-- Table structure for table `messages`
--
CREATE TABLE IF NOT EXISTS `messages` (
`id` smallint(9) unsigned NOT NULL AUTO_INCREMENT,
`title` varchar(255) NOT NULL,
`name` varchar(255) NOT NULL,
`email` varchar(50) NOT NULL,
`text` mediumtext,
`folder` tinyint(2) NOT NULL,
`read` smallint(5) unsigned NOT NULL,
`dateline` int(10) unsigned NOT NULL,
`ip` varchar(15) NOT NULL,
`attachment` varchar(255) NOT NULL,
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`username` varchar(300) NOT NULL,
`error` varchar(500) NOT NULL,
PRIMARY KEY (`id`),
KEY `title` (`title`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=9010 ;
Edit 2
Edited structure with new indexes.
After putting an index on both messages.title and lastlogon.username I came up with these results:
Showing rows 0 - 29 (623 total, Query took 74.4938 sec)
First, replace the key on title with a compound key on (title, id):
ALTER TABLE messages DROP INDEX title;
ALTER TABLE messages ADD INDEX title (title, id);
Now change the select to:
SELECT m.* FROM messages m
LEFT JOIN lastlogon l ON (l.username = m.title)
WHERE l.username IS NULL
-- GROUP BY m.id DESC -- faster replacement for distinct. I don't think you need this.
LIMIT 1000;
Or
SELECT m.* FROM messages m
WHERE m.title NOT IN (SELECT l.username FROM lastlogon l)
-- GROUP BY m.id DESC -- faster than distinct, I don't think you need it though.
LIMIT 1000;
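A minimal sketch of the anti-join forms above, using Python's sqlite3 as a stand-in for MySQL, with hypothetical rows and a hypothetical index name. The NOT EXISTS variant is not from the answer; it is added only as a third common way to express the same filter. The last query reproduces the LEFT OUTER JOIN attempt from the question: without WHERE l.username IS NULL it returns every message, matched or not. (NOT IN gives the same result here only because lastlogon.username is NOT NULL; if the subquery could return a NULL, NOT IN would filter out every row.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lastlogon (username TEXT NOT NULL PRIMARY KEY, lastlogon TEXT NOT NULL);
    CREATE TABLE messages (id INTEGER PRIMARY KEY, title TEXT NOT NULL, text TEXT);
    -- Compound (title, id) key as suggested; the index name is hypothetical.
    CREATE INDEX idx_title_id ON messages (title, id);
    INSERT INTO lastlogon VALUES ('alice', '2011-09-01'), ('bob', '2011-09-02');
    INSERT INTO messages (id, title, text) VALUES
        (1, 'alice', 'm1'), (2, 'carol', 'm2'), (3, 'bob', 'm3'),
        (4, 'dave', 'm4'), (5, 'carol', 'm5');
""")

ANTI_JOINS = {
    "left_join_is_null": """
        SELECT m.id FROM messages m
        LEFT JOIN lastlogon l ON l.username = m.title
        WHERE l.username IS NULL ORDER BY m.id LIMIT 1000""",
    "not_in": """
        SELECT m.id FROM messages m
        WHERE m.title NOT IN (SELECT l.username FROM lastlogon l)
        ORDER BY m.id LIMIT 1000""",
    # Not in the answer: NOT EXISTS is a third common way to write the same filter.
    "not_exists": """
        SELECT m.id FROM messages m
        WHERE NOT EXISTS (SELECT 1 FROM lastlogon l WHERE l.username = m.title)
        ORDER BY m.id LIMIT 1000""",
}
# The question's LEFT OUTER JOIN attempt, which has no IS NULL filter.
NO_FILTER = """
    SELECT m.id FROM messages m
    LEFT OUTER JOIN lastlogon l ON (m.title = l.username)
    ORDER BY m.id LIMIT 1000"""

for name, sql in ANTI_JOINS.items():
    print(name, [row[0] for row in conn.execute(sql)])           # [2, 4, 5] each time
print("no_filter", [row[0] for row in conn.execute(NO_FILTER)])  # [1, 2, 3, 4, 5]
```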
Another cause of the slowness is the SELECT m.* part.
By selecting all columns, you force MySQL to do extra work.
Only select the columns you need:
SELECT m.title, m.name, m.email, ......
This will speed up the query as well.
There's another trick you can use:
Replace the LIMIT 1000 with a cutoff date.
Step 1: Add an index on timestamp (or whatever column you want to use for the cutoff).
Step 2: Use the cutoff in the query:
SELECT m.* FROM messages m
LEFT JOIN lastlogon l ON (l.username = m.title)
WHERE (m.id >= (SELECT MIN(m2.id) FROM messages m2 WHERE m2.timestamp >= '2011-09-01'))
AND l.username IS NULL
-- GROUP BY m.id DESC -- faster replacement for distinct. I don't think you need this.
I suggest adding an index on messages.title. Then run the query again and test the performance.

Speeding up MySQL queries / MySQL views in Django

I use the following code to select popular news entries (by date) from the database:
popular = Entry.objects.filter(type='A', is_public=True).extra(select = {'dpub': 'date(dt_published)'}).order_by('-dpub', '-views', '-dt_written', 'headline')[0:5]
To compare the execution speeds of a normal query and this one I ran the following mysql queries:
SELECT *, date(dt_published) as dpub FROM `news_entry` order by dpub DESC LIMIT 500
# Showing rows 0 - 29 (500 total, Query took 0.1386 sec)
-
SELECT * , DATE( dt_published ) AS dpub FROM `news_entry` ORDER BY id DESC LIMIT 500
# Showing rows 0 - 29 (500 total, Query took 0.0021 sec) [id: 58079 - 57580]
As you can see, the normal query is much faster. Is there a way to speed up the first one?
Is it possible to use mysql views with django?
I realize I could just split the datetime field into two fields (date and time), but I'm curious.
Structure:
CREATE TABLE IF NOT EXISTS `news_entry` (
`id` int(11) NOT NULL DEFAULT '0',
`views` int(11) NOT NULL,
`user_views` int(11) NOT NULL,
`old_id` int(11) DEFAULT NULL,
`type` varchar(1) NOT NULL,
`headline` varchar(256) NOT NULL,
`subheadline` varchar(256) NOT NULL,
`slug` varchar(50) NOT NULL,
`category_id` int(11) DEFAULT NULL,
`is_public` tinyint(1) NOT NULL,
`is_featured` tinyint(1) NOT NULL,
`dt_written` datetime DEFAULT NULL,
`dt_modified` datetime DEFAULT NULL,
`dt_published` datetime DEFAULT NULL,
`author_id` int(11) DEFAULT NULL,
`author_alt` varchar(256) NOT NULL,
`email_alt` varchar(256) NOT NULL,
`tags` varchar(255) NOT NULL,
`content` longtext NOT NULL
) ENGINE=MyISAM DEFAULT;
SELECT *, date(dt_published) as dpub FROM `news_entry` order by dpub DESC LIMIT 500
This query orders on dpub, while this one:
SELECT * , DATE( dt_published ) AS dpub FROM `news_entry` ORDER BY id DESC LIMIT 500
orders on id.
Since id is most probably a PRIMARY KEY for your table, and each PRIMARY KEY has an implicit index backing it, ORDER BY does not need to sort.
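dpub is a computed field, and older MySQL versions do not support indexes on computed fields (MySQL 5.7 added indexes on generated columns, and MySQL 8.0.13 added functional indexes). However, because DATE() never decreases as its argument increases, rows sorted by dt_published are also sorted by dpub, so ORDER BY dt_published works as an ORDER BY dpub as well.
You need to change your query to this:
SELECT *, date(dt_published) as dpub FROM `news_entry` order by dt_published DESC LIMIT 500
and create an index on news_entry (dt_published).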
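The effect of the suggested index can be seen with a small sketch that asks SQLite (via Python's sqlite3, standing in for MySQL) for its plan for both ORDER BY forms. The index name idx_dt_published is hypothetical and the table is trimmed to a few columns. The expected plans are an index scan with no sort step for ORDER BY dt_published, and a full scan plus a temporary B-tree sort for ORDER BY dpub, since the expression date(dt_published) cannot use a plain index on the column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE news_entry (
        id           INTEGER PRIMARY KEY,
        views        INTEGER NOT NULL,
        headline     TEXT NOT NULL,
        dt_written   TEXT,
        dt_published TEXT
    );
    -- The index the answer asks for (name is hypothetical).
    CREATE INDEX idx_dt_published ON news_entry (dt_published);
""")

def plan(sql: str) -> str:
    # Join the step descriptions from EXPLAIN QUERY PLAN into one string.
    return "\n".join(str(row[-1]) for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

BY_COLUMN = ("SELECT *, date(dt_published) AS dpub FROM news_entry "
             "ORDER BY dt_published DESC LIMIT 500")
BY_EXPRESSION = ("SELECT *, date(dt_published) AS dpub FROM news_entry "
                 "ORDER BY dpub DESC LIMIT 500")

print(plan(BY_COLUMN))      # expected: a scan using idx_dt_published, no sort step
print(plan(BY_EXPRESSION))  # expected: a full scan plus "USE TEMP B-TREE FOR ORDER BY"
```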
Update:
Since DATE is a monotonic function, you may employ this trick:
SELECT *, DATE(dt_published) AS dpub
FROM news_entry
WHERE dt_published >=
(
SELECT md
FROM (
SELECT DATE(dt_published) AS md
FROM news_entry
ORDER BY
dt_published DESC
LIMIT 499, 1
) q
UNION ALL
SELECT DATE(MIN(dt_published))
FROM news_entry
LIMIT 1
)
ORDER BY
dpub DESC, views DESC, dt_written DESC, headline
LIMIT 500
This query does the following:
1. Selects the date of the 500th record in dt_published DESC order, or the date of the first record posted if there are fewer than 500 records in the table.
2. Fetches all records posted on or after that date. Since DATE(x) is always less than or equal to x, there can be more than 500 such records, but still far fewer than the whole table.
3. Orders and limits these records as appropriate.
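A minimal sketch of the same cutoff idea, checked against a full sort on synthetic data using Python's sqlite3 as a stand-in for MySQL. One deliberate change: the UNION ALL ... LIMIT 1 fallback is written as a COALESCE of two scalar subqueries, which picks the same cutoff (the date of the N-th newest entry, or the oldest date when there are fewer than N entries) without depending on the output order of UNION ALL. The helper top and the 40 sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE news_entry (id INTEGER PRIMARY KEY, views INTEGER NOT NULL,
                             headline TEXT NOT NULL, dt_written TEXT,
                             dt_published TEXT NOT NULL);
    CREATE INDEX idx_dt_published ON news_entry (dt_published);
""")
# Synthetic data: 40 entries, four per day on 2011-09-01 .. 2011-09-10.
rows = []
for i in range(40):
    stamp = f"2011-09-{1 + i // 4:02d} {10 + i % 4:02d}:00:00"
    rows.append((i, (i * 7) % 13, f"h{i:02d}", stamp, stamp))
conn.executemany("INSERT INTO news_entry VALUES (?, ?, ?, ?, ?)", rows)

ORDERING = "ORDER BY date(dt_published) DESC, views DESC, dt_written DESC, headline"

# Plain version: sort the whole table.
FULL = f"SELECT id FROM news_entry {ORDERING} LIMIT ?"

# Cutoff: date of the N-th newest entry (OFFSET N-1), or the oldest date if
# there are fewer than N entries. COALESCE stands in for UNION ALL ... LIMIT 1.
CUTOFF = """COALESCE(
    (SELECT date(dt_published) FROM news_entry
     ORDER BY dt_published DESC LIMIT 1 OFFSET ?),
    (SELECT date(MIN(dt_published)) FROM news_entry))"""
TRIMMED = f"SELECT id FROM news_entry WHERE dt_published >= {CUTOFF} {ORDERING} LIMIT ?"
CANDIDATES = f"SELECT COUNT(*) FROM news_entry WHERE dt_published >= {CUTOFF}"

def top(n: int, trimmed: bool) -> list:
    if trimmed:
        return [r[0] for r in conn.execute(TRIMMED, (n - 1, n))]
    return [r[0] for r in conn.execute(FULL, (n,))]

print(top(6, trimmed=False) == top(6, trimmed=True))  # same six rows, same order
print(conn.execute(CANDIDATES, (5,)).fetchone()[0])   # 8 of 40 rows survive the cutoff
```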
You may find this article interesting, since it covers a similar problem:
Things SQL needs: sargability of monotonic functions
You may need an index on dt_published. Could you post the query plans (EXPLAIN output) for the two queries?