Eliminating values from one table with another. Super slow - mysql

In the same database I have a table messages whose columns include id, title and text. I want only the records whose title has no matching entry in the table lastlogon, where the equivalent column is named username.
I have been using this SQL command in PHP, it generally took 2-3 seconds to pull up:
SELECT DISTINCT * FROM messages WHERE title NOT IN (SELECT username FROM lastlogon) LIMIT 1000
This was all good until the table lastlogon grew to hold about 80% of the values in table messages. Messages has about 8000 entries, lastlogon about 7000. Now the query takes between one and two minutes to run, and MySQL shoots up to very high CPU usage.
I tried the following but had no luck reducing the time:
SELECT id,title,text FROM messages a LEFT OUTER JOIN lastlogon b ON (a.title = b.username) LIMIT 1000
Why is it suddenly taking so long for such a small number of entries? I tried restarting MySQL and Apache multiple times. I am using Debian Linux.
Edit: Here are the structures
--
-- Table structure for table `lastlogon`
--
CREATE TABLE IF NOT EXISTS `lastlogon` (
`username` varchar(25) NOT NULL,
`lastlogon` date NOT NULL,
`datechecked` date NOT NULL,
PRIMARY KEY (`username`),
KEY `username` (`username`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
-- --------------------------------------------------------
--
-- Table structure for table `messages`
--
CREATE TABLE IF NOT EXISTS `messages` (
`id` smallint(9) unsigned NOT NULL AUTO_INCREMENT,
`title` varchar(255) NOT NULL,
`name` varchar(255) NOT NULL,
`email` varchar(50) NOT NULL,
`text` mediumtext,
`folder` tinyint(2) NOT NULL,
`read` smallint(5) unsigned NOT NULL,
`dateline` int(10) unsigned NOT NULL,
`ip` varchar(15) NOT NULL,
`attachment` varchar(255) NOT NULL,
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`username` varchar(300) NOT NULL,
`error` varchar(500) NOT NULL,
PRIMARY KEY (`id`),
KEY `title` (`title`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=9010 ;
Edit 2
Edited structure with new indexes.
After putting an index on both messages.title and lastlogon.username I came up with these results:
Showing rows 0 - 29 (623 total, Query took 74.4938 sec)

First: replace the key on title, with a compound key on title + id
ALTER TABLE messages DROP INDEX title;
ALTER TABLE messages ADD INDEX title (title, id);
Now change the select to:
SELECT m.* FROM messages m
LEFT JOIN lastlogon l ON (l.username = m.title)
WHERE l.username IS NULL
-- GROUP BY m.id DESC -- faster replacement for distinct. I don't think you need this.
LIMIT 1000;
Or
SELECT m.* FROM messages m
WHERE m.title NOT IN (SELECT l.username FROM lastlogon l)
-- GROUP BY m.id DESC -- faster than distinct, I don't think you need it though.
LIMIT 1000;
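The two anti-join forms above can be compared on a toy data set. This is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL (so it shows the result sets, not MySQL's timings); the sample rows and the index name idx_title_id are made up for illustration. It also shows that a LEFT JOIN without the IS NULL filter returns every message.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE lastlogon (username TEXT PRIMARY KEY);
    CREATE INDEX idx_title_id ON messages (title, id);
    INSERT INTO messages (id, title) VALUES
        (1, 'alice'), (2, 'bob'), (3, 'carol'), (4, 'dave');
    INSERT INTO lastlogon (username) VALUES ('alice'), ('carol');
""")

# Anti-join: keep only messages with no matching lastlogon row.
anti_join = conn.execute("""
    SELECT m.id, m.title FROM messages m
    LEFT JOIN lastlogon l ON l.username = m.title
    WHERE l.username IS NULL
    ORDER BY m.id
""").fetchall()

# Same result expressed with NOT IN.
not_in = conn.execute("""
    SELECT m.id, m.title FROM messages m
    WHERE m.title NOT IN (SELECT l.username FROM lastlogon l)
    ORDER BY m.id
""").fetchall()

# LEFT JOIN without the IS NULL filter: nothing is eliminated.
no_filter = conn.execute("""
    SELECT m.id FROM messages m
    LEFT JOIN lastlogon l ON l.username = m.title
""").fetchall()

print(anti_join)       # [(2, 'bob'), (4, 'dave')]
print(len(no_filter))  # 4
```

Because lastlogon.username is NOT NULL, NOT IN and the LEFT JOIN form return the same rows here; with nullable columns NOT IN behaves differently.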
Another problem with the slowness is the SELECT m.* part.
By selecting all column, you are forcing MySQL to do extra work.
Only select the columns you need:
SELECT m.title, m.name, m.email, ......
This will speed up the query as well.
There's another trick you can use:
Replace the limit 1000 with a cutoff date.
Step 1: Add an index on timestamp (or whatever field you want to use for the cutoff).
SELECT m.* FROM messages m
LEFT JOIN lastlogon l ON (l.username = m.title)
WHERE (m.id >= (SELECT MIN(m2.id) FROM messages m2 WHERE m2.timestamp >= '2011-09-01'))
AND l.username IS NULL
-- GROUP BY m.id DESC -- faster replacement for distinct. I don't think you need this.

I suggest adding an index on messages.title. Then run the query again and test the performance.

Related

MySql Join slow with SUM() of results

Does anyone know a more efficient way to execute this query?
SELECT SQL_CALC_FOUND_ROWS p.*, IFNULL(SUM(v.visits),0) AS visits
FROM posts AS p
LEFT JOIN visits_day v ON v.post_id=p.post_id
GROUP BY post_id
ORDER BY post_id DESC LIMIT 20 OFFSET 0
The visits_day table has one record per day, per user, per post. With the growth of the table this query is extremely slow.
I can't add a column with the total visit count because I need to list the posts by most visits per day, per week, etc.
Does anyone know a better solution to this?
Thanks
CREATE TABLE `visits_day` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`post_id` int(11) NOT NULL,
`user_id` int(11) NOT NULL,
`day` date NOT NULL,
`visits` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=52302 DEFAULT CHARSET=utf8
CREATE TABLE `posts` (
`post_id` int(11) NOT NULL AUTO_INCREMENT,
`link` varchar(300) NOT NULL,
`date` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
`title` varchar(500) NOT NULL,
`img` varchar(300) NOT NULL,
PRIMARY KEY (`post_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1027 DEFAULT CHARSET=utf8
With SQL_CALC_FOUND_ROWS, the query must evaluate everything, just not deliver all the rows. Getting rid of that should be beneficial.
To actually touch only 20 rows, we need to get through the WHERE, GROUP BY and ORDER BY with a single index. Otherwise, we might have to touch all the rows, sort them then deliver 20. The obvious index is (post_id); I suspect that is already indexed as PRIMARY KEY(post_id)? (It would help if you provide SHOW CREATE TABLE when asking questions.)
Another way to do the join, and get the desired result of zero, is as follows. Note that it eliminates the need for GROUP BY.
SELECT p.*,
IFNULL( ( SELECT SUM(v.visits)
FROM visits_day AS v
WHERE v.post_id = p.post_id
),
0) AS visits
FROM posts AS p
ORDER BY post_id DESC
LIMIT 20 OFFSET 0
If you really need the count, then consider SELECT COUNT(*) FROM posts.
ON v.post_id=p.post_id in your query and WHERE post_id = p.post_id beg for INDEX(post_id) on visits_day. That will speed up both variants considerably.
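As a rough check that the correlated-subquery form gives the same totals as the LEFT JOIN with GROUP BY, here is a small sketch. It assumes Python's sqlite3 module in place of MySQL, trimmed-down tables, invented sample rows, and a hypothetical index name idx_visits_post; it says nothing about MySQL's actual timings.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (post_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE visits_day (
        id INTEGER PRIMARY KEY,
        post_id INTEGER NOT NULL,
        user_id INTEGER NOT NULL,
        day TEXT NOT NULL,
        visits INTEGER NOT NULL
    );
    -- The index both variants rely on for the per-post lookup.
    CREATE INDEX idx_visits_post ON visits_day (post_id);
    INSERT INTO posts (post_id, title) VALUES (1, 'first'), (2, 'second'), (3, 'third');
    INSERT INTO visits_day (post_id, user_id, day, visits) VALUES
        (1, 10, '2020-01-01', 2),
        (1, 11, '2020-01-01', 3),
        (3, 10, '2020-01-02', 7);
""")

# Original shape: LEFT JOIN plus GROUP BY.
join_rows = conn.execute("""
    SELECT p.post_id, IFNULL(SUM(v.visits), 0) AS visits
    FROM posts AS p
    LEFT JOIN visits_day AS v ON v.post_id = p.post_id
    GROUP BY p.post_id
    ORDER BY p.post_id DESC
    LIMIT 20 OFFSET 0
""").fetchall()

# Alternative shape: correlated subquery, no GROUP BY needed.
subquery_rows = conn.execute("""
    SELECT p.post_id,
           IFNULL((SELECT SUM(v.visits)
                   FROM visits_day AS v
                   WHERE v.post_id = p.post_id), 0) AS visits
    FROM posts AS p
    ORDER BY p.post_id DESC
    LIMIT 20 OFFSET 0
""").fetchall()

# Separate total, replacing SQL_CALC_FOUND_ROWS.
total_posts = conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]

print(subquery_rows)  # [(3, 7), (2, 0), (1, 5)]
print(total_posts)    # 3
```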

Optimize a query

How can I make my response time faster? The average response time is approximately 0.2s (8039 records in my items table and 81 records in my tracking table).
Query
SELECT a.name, b.cnt FROM `items` a LEFT JOIN
(SELECT guid, COUNT(*) cnt FROM tracking WHERE
date > UNIX_TIMESTAMP(NOW() - INTERVAL 1 day ) GROUP BY guid) b ON
a.`id` = b.guid WHERE a.`type` = 'streaming' AND a.`state` = 1
ORDER BY b.cnt DESC LIMIT 15 OFFSET 75
Tracking table structure
CREATE TABLE `tracking` (
`id` bigint(11) NOT NULL AUTO_INCREMENT,
`guid` int(11) DEFAULT NULL,
`ip` int(11) NOT NULL,
`date` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `i1` (`ip`,`guid`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=4303 DEFAULT CHARSET=latin1;
Items table structure
CREATE TABLE `items` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`guid` int(11) DEFAULT NULL,
`type` varchar(255) DEFAULT NULL,
`name` varchar(255) DEFAULT NULL,
`embed` varchar(255) DEFAULT NULL,
`url` varchar(255) DEFAULT NULL,
`description` text,
`tags` varchar(255) DEFAULT NULL,
`date` int(11) DEFAULT NULL,
`vote_val_total` float DEFAULT '0',
`vote_total` float(11,0) DEFAULT '0',
`rate` float DEFAULT '0',
`icon` text CHARACTER SET ascii,
`state` int(11) DEFAULT '0',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=9258 DEFAULT CHARSET=latin1;
Your query, as written, doesn't make much sense. It produces all possible combinations of rows in your two tables and then groups them.
You may want this:
SELECT a.*, b.cnt
FROM `items` a
LEFT JOIN (
SELECT guid, COUNT(*) cnt
FROM tracking
WHERE `date` > UNIX_TIMESTAMP(NOW() - INTERVAL 1 day)
GROUP BY guid
) b ON a.guid = b.guid
ORDER BY b.cnt DESC
The high-volume data in this query come from the relatively large tracking table. So, you should add a compound index to it, using the columns (date, guid). This will allow your query to random-access the index by date and then scan it for guid values.
ALTER TABLE tracking ADD INDEX guid_summary (`date`, guid);
I suppose you'll see a nice performance improvement.
Pro tip: Don't use SELECT *. Instead, give a list of the columns you want in your result set. For example,
SELECT a.guid, a.name, a.description, b.cnt
Why is this important?
First, it makes your software more resilient against somebody adding columns to your tables in the future.
Second, it tells the MySQL server to sling around only the information you want. That can improve performance really dramatically, especially when your tables get big.
Since tracking has significantly fewer rows than items, I will propose the following.
SELECT i.name, c.cnt
FROM
(
SELECT guid, COUNT(*) cnt
FROM tracking
WHERE date > UNIX_TIMESTAMP(NOW() - INTERVAL 1 day )
GROUP BY guid
) AS c
JOIN items AS i ON i.id = c.guid
WHERE i.type = 'streaming'
AND i.state = 1
ORDER BY c.cnt DESC
LIMIT 15 OFFSET 75;
It will fail to display any items for which cnt is 0. (Your version displays the items with NULL for the count.)
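The difference between the two join directions can be seen on a few rows. This is a sketch only, assuming Python's sqlite3 module as a stand-in for MySQL, a fixed CUTOFF value in place of UNIX_TIMESTAMP(NOW() - INTERVAL 1 day), and invented sample data.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, type TEXT, state INTEGER);
    CREATE TABLE tracking (id INTEGER PRIMARY KEY, guid INTEGER, date INTEGER);
    CREATE INDEX guid_summary ON tracking (date, guid);
    INSERT INTO items (id, name, type, state) VALUES
        (1, 'one', 'streaming', 1),
        (2, 'two', 'streaming', 1),
        (3, 'three', 'download', 1);
    INSERT INTO tracking (guid, date) VALUES (1, 1000), (1, 1001), (3, 1002), (1, 10);
""")

CUTOFF = 500  # hypothetical stand-in for "one day ago" as a Unix timestamp

# LEFT JOIN from items: untracked items are kept, with NULL as the count.
left_rows = conn.execute("""
    SELECT i.name, c.cnt
    FROM items AS i
    LEFT JOIN (SELECT guid, COUNT(*) AS cnt
               FROM tracking WHERE date > ? GROUP BY guid) AS c
           ON i.id = c.guid
    WHERE i.type = 'streaming' AND i.state = 1
    ORDER BY c.cnt DESC, i.name
""", (CUTOFF,)).fetchall()

# INNER JOIN starting from the counts: untracked items disappear.
inner_rows = conn.execute("""
    SELECT i.name, c.cnt
    FROM (SELECT guid, COUNT(*) AS cnt
          FROM tracking WHERE date > ? GROUP BY guid) AS c
    JOIN items AS i ON i.id = c.guid
    WHERE i.type = 'streaming' AND i.state = 1
    ORDER BY c.cnt DESC
""", (CUTOFF,)).fetchall()

print(left_rows)   # [('one', 2), ('two', None)]
print(inner_rows)  # [('one', 2)]
```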
Composite indexes needed:
items: The PRIMARY KEY(id) is sufficient.
tracking: INDEX(date, guid) -- "covering"
Other issues:
If ip is an IP-address, it needs to be INT UNSIGNED. But that covers only IPv4, not IPv6.
It seems like date is not just a "date", but really a date+time. Please rename it to avoid confusion.
float(11,0) -- Don't use FLOAT for integers. Don't use (m,n) on FLOAT or DOUBLE. INT UNSIGNED makes more sense here.
OFFSET is naughty when it comes to performance -- it must scan over the skipped records. But, in your query, there is no way to avoid collecting all the possible rows, sorting them, stepping over 75, and only finally delivering 15 rows. (And, with no more than 81, it won't be a full 15.)
What version are you using? There have been important changes to the Optimization of LEFT JOIN ( SELECT ... ). Please provide EXPLAIN SELECT for each query under discussion.

Counting records in normalized table

Older questions seen
Counting one table of records for matching records of another table
MySQL Count matching records from multiple tables
Count records from two tables grouped by one field
Table(s) Schema
Table entries having data from 2005-01-25
CREATE TABLE `entries` (
`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`ctg` VARCHAR(15) NOT NULL,
`msg` VARCHAR(200) NOT NULL,
`nick` VARCHAR(30) NOT NULL,
`date` DATETIME NOT NULL,
PRIMARY KEY (`id`),
INDEX `msg` (`msg`),
INDEX `date` (`date`)
)
COLLATE='utf8_general_ci'
ENGINE=MyISAM;
Child table magnets with regular data from 2011-11-08 (there might be a few entries from before that)
CREATE TABLE `magnets` (
`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`eid` INT(10) UNSIGNED NOT NULL,
`tth` CHAR(39) NOT NULL,
`size` BIGINT(20) UNSIGNED NOT NULL DEFAULT '0',
`nick` VARCHAR(30) NOT NULL DEFAULT 'hjpotter92',
`date` DATETIME NOT NULL,
PRIMARY KEY (`id`),
UNIQUE INDEX `eid_tth` (`eid`, `tth`),
INDEX `entriedID` (`eid`),
INDEX `tth_size` (`tth`, `size`)
)
COLLATE='utf8_general_ci'
ENGINE=MyISAM;
Question
I want to get the total number of entries made by any particular nick (or user) in either of the tables.
One entry in entries is populated at the same time as magnets, and the subsequent entries of magnets can be from the same nick or a different one.
My Code
Try 1
SELECT `e`.id, COUNT(1), `e`.nick, `m`.nick
FROM `entries` `e`
INNER JOIN `magnets` `m`
ON `m`.`eid` = `e`.id
GROUP BY `e`.nick
Try 2
SELECT `e`.id, COUNT(1), `e`.nick
FROM `entries` `e`
GROUP BY `e`.nick
UNION ALL
SELECT `m`.eid, COUNT(1), `m`.nick
FROM `magnets` `m`
GROUP BY `m`.nick
The second try generates some relevant output, but it contains double entries for every nick that appears in both tables.
Also, I don't want to count twice those entries/magnets which were inserted in the first query, which is what the second half of the UNION is doing. It takes in all the values from both tables.
SQL Fiddle link
Here is the link to a SQL Fiddle along with randomly populated entries.
I really hope someone can guide me through this. If it's any help, I will be using PHP for final display of data. So, my last resort would be to nest loops in PHP for the counting (which I am currently doing).
Desired output
The output that should be generated on the fiddle should be:
************************************************
** Nick ||| Count **
************************************************
** Nick1 ||| 10 **
** Nick2 ||| 9 **
** Nick3 ||| 6 **
** Nick4 ||| 10 **
************************************************
There might be a more efficient way but this works if I understand correctly:
SELECT SUM(cnt), nick FROM
(SELECT count(*) cnt, e.nick FROM entries e
LEFT JOIN magnets m ON (e.id=m.eid AND e.nick=m.nick)
WHERE eid IS NULL GROUP BY e.nick
UNION ALL
SELECT count(*) cnt, nick FROM magnets m GROUP BY nick) u
GROUP BY nick
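To see how the query above avoids double counting, here is a small sketch run against invented rows. It assumes Python's sqlite3 module as a stand-in for MySQL and tables reduced to the columns the query touches; the nicks a to d are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entries (id INTEGER PRIMARY KEY, nick TEXT NOT NULL);
    CREATE TABLE magnets (id INTEGER PRIMARY KEY, eid INTEGER NOT NULL, nick TEXT NOT NULL);
    INSERT INTO entries (id, nick) VALUES (1, 'a'), (2, 'b'), (3, 'a'), (4, 'd');
    INSERT INTO magnets (eid, nick) VALUES
        (1, 'a'),   -- added together with entry 1, counted once
        (1, 'b'),   -- later magnet on entry 1 by another nick
        (2, 'b'),   -- added together with entry 2, counted once
        (3, 'c');   -- magnet on entry 3 by another nick
""")

counts = conn.execute("""
    SELECT u.nick, SUM(u.cnt) AS total FROM (
        -- entries whose author has no magnet on that same entry
        SELECT COUNT(*) AS cnt, e.nick AS nick
        FROM entries e
        LEFT JOIN magnets m ON (e.id = m.eid AND e.nick = m.nick)
        WHERE m.eid IS NULL
        GROUP BY e.nick
        UNION ALL
        -- every magnet, grouped by its author
        SELECT COUNT(*) AS cnt, m.nick AS nick
        FROM magnets m
        GROUP BY m.nick
    ) u
    GROUP BY u.nick
    ORDER BY u.nick
""").fetchall()

print(counts)  # [('a', 2), ('b', 2), ('c', 1), ('d', 1)]
```

Entry 1 and its first magnet share a nick, so that pair contributes one to nick a rather than two.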

Some help needed with a SQL query

I need some help with a MySQL query. I have two tables, one with offers and one with statuses. An offer can have one or more statuses. What I would like to do is get all the offers and their latest status. For each status there's a table field named 'added' which can be used for sorting.
I know this can be easily done with two queries, but I need to make it with only one because I also have to apply some filters later in the project.
Here's my setup:
CREATE TABLE `test`.`offers` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`client` TEXT NOT NULL ,
`products` TEXT NOT NULL ,
`contact` TEXT NOT NULL
) ENGINE = MYISAM ;
CREATE TABLE `statuses` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`offer_id` int(11) NOT NULL,
`options` text NOT NULL,
`deadline` date NOT NULL,
`added` datetime NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
This should work, but it is not very optimal, IMHO:
SELECT *
FROM offers
INNER JOIN statuses ON (statuses.offer_id = offers.id
AND statuses.id =
(SELECT allStatuses.id
FROM statuses allStatuses
WHERE allStatuses.offer_id = offers.id
ORDER BY allStatuses.added DESC LIMIT 1))
Try this:
SELECT
o.*
FROM offers o
INNER JOIN statuses s ON o.id = s.offer_id
ORDER BY s.added
LIMIT 1
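The two suggestions above behave differently, which a few sample rows make visible. This is a sketch under assumptions: Python's sqlite3 module stands in for MySQL, the tables are trimmed to the relevant columns, and the offers and statuses are invented. The correlated subquery returns the latest status for each offer, while a plain join with ORDER BY and LIMIT 1 returns a single row in total.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE offers (id INTEGER PRIMARY KEY, client TEXT NOT NULL);
    CREATE TABLE statuses (
        id INTEGER PRIMARY KEY,
        offer_id INTEGER NOT NULL,
        options TEXT NOT NULL,
        added TEXT NOT NULL
    );
    INSERT INTO offers (id, client) VALUES (1, 'acme'), (2, 'globex');
    INSERT INTO statuses (id, offer_id, options, added) VALUES
        (1, 1, 'draft', '2011-01-01 10:00:00'),
        (2, 1, 'sent',  '2011-01-05 09:00:00'),
        (3, 2, 'draft', '2011-01-03 12:00:00');
""")

# One row per offer: join only the newest status of each offer.
latest_per_offer = conn.execute("""
    SELECT o.id, s.options
    FROM offers o
    INNER JOIN statuses s
            ON s.offer_id = o.id
           AND s.id = (SELECT a.id
                       FROM statuses a
                       WHERE a.offer_id = o.id
                       ORDER BY a.added DESC
                       LIMIT 1)
    ORDER BY o.id
""").fetchall()

# Plain join with a global LIMIT 1: one row for the whole result.
single_row = conn.execute("""
    SELECT o.id
    FROM offers o
    INNER JOIN statuses s ON o.id = s.offer_id
    ORDER BY s.added
    LIMIT 1
""").fetchall()

print(latest_per_offer)  # [(1, 'sent'), (2, 'draft')]
print(single_row)        # [(1,)]
```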

MySQL query killing my server

Looking at this query there's got to be something bogging it down that I'm not noticing. I ran it for 7 minutes and it only updated 2 rows.
//set product count for makes
$tru->query->run(array(
'name' => 'get-make-list',
'sql' => 'SELECT id, name FROM vehicle_make',
'connection' => 'core'
));
while($tempMake = $tru->query->getArray('get-make-list')) {
$tru->query->run(array(
'name' => 'update-product-count',
'sql' => 'UPDATE vehicle_make SET product_count = (
SELECT COUNT(product_id) FROM taxonomy_master WHERE v_id IN (
SELECT id FROM vehicle_catalog WHERE make_id = '.$tempMake['id'].'
)
) WHERE id = '.$tempMake['id'],
'connection' => 'core'
));
}
I'm sure this query can be optimized to perform better, but I can't think of how to do it.
vehicle_make = 45 rows
taxonomy_master = 11,223 rows
vehicle_catalog = 5,108 rows
All tables have appropriate indexes
UPDATE: I should note that this is a 1-time script so overhead isn't a big deal as long as it runs.
CREATE TABLE IF NOT EXISTS `vehicle_make` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(32) NOT NULL,
`product_count` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=46 ;
CREATE TABLE IF NOT EXISTS `taxonomy_master` (
`product_id` int(10) NOT NULL,
`v_id` int(10) NOT NULL,
`vehicle_requirement` varchar(255) DEFAULT NULL,
`is_sellable` enum('True','False') DEFAULT 'True',
`programming_override` varchar(25) DEFAULT NULL,
PRIMARY KEY (`product_id`,`v_id`),
KEY `idx2` (`product_id`),
KEY `idx3` (`v_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE IF NOT EXISTS `vehicle_catalog` (
`v_id` int(10) NOT NULL,
`id` int(11) NOT NULL,
`v_make` varchar(255) NOT NULL,
`make_id` int(11) NOT NULL,
`v_model` varchar(255) NOT NULL,
`model_id` int(11) NOT NULL,
`v_year` varchar(255) NOT NULL,
PRIMARY KEY (`v_id`,`v_make`,`v_model`,`v_year`),
UNIQUE KEY `idx` (`v_make`,`v_model`,`v_year`),
UNIQUE KEY `idx2` (`v_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Update: The successful query to get what I needed is here....
SELECT
m.id,COUNT(t.product_id) AS CountOf
FROM taxonomy_master t
INNER JOIN vehicle_catalog v ON t.v_id=v.id
INNER JOIN vehicle_make m ON v.make_id=m.id
GROUP BY m.id;
Without the tables/columns, this is my best guess from reverse engineering the given queries:
UPDATE vehicle_make m
INNER JOIN (
SELECT v.make_id, COUNT(t.product_id) AS CountOf
FROM taxonomy_master t
INNER JOIN vehicle_catalog v ON t.v_id=v.id
GROUP BY v.make_id
) c ON c.make_id=m.id
SET m.product_count=c.CountOf
The given code loops over each make and then runs a query that counts the products for each one. My answer does them all in one query and should be a lot faster.
Have an index for each of these:
vehicle_make.id cover on name
vehicle_catalog.id cover make_id
taxonomy_master.v_id
EDIT
give this a try:
CREATE TEMPORARY TABLE CountsOf (
id int(11) NOT NULL
, CountOf int(11) NOT NULL DEFAULT 0
);
INSERT INTO CountsOf
(id, CountOf )
SELECT
m.id,COUNT(t.product_id) AS CountOf
FROM taxonomy_master t
INNER JOIN vehicle_catalog v ON t.v_id=v.id
INNER JOIN vehicle_make m ON v.make_id=m.id
GROUP BY m.id;
UPDATE vehicle_make,CountsOf
SET vehicle_make.product_count=CountsOf.CountOf
WHERE vehicle_make.id=CountsOf.id;
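The temporary-table approach can be tried end to end on a handful of rows. This is a minimal sketch, assuming Python's sqlite3 module in place of MySQL, tables trimmed to the columns involved, and invented sample data. Because the multi-table UPDATE syntax is MySQL-specific, the final step is written here as a correlated subquery, which is a swapped-in equivalent rather than the statement from the answer.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vehicle_make (
        id INTEGER PRIMARY KEY, name TEXT NOT NULL,
        product_count INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE vehicle_catalog (
        v_id INTEGER NOT NULL, id INTEGER NOT NULL, make_id INTEGER NOT NULL);
    CREATE TABLE taxonomy_master (
        product_id INTEGER NOT NULL, v_id INTEGER NOT NULL,
        PRIMARY KEY (product_id, v_id));
    INSERT INTO vehicle_make (id, name) VALUES (1, 'Ford'), (2, 'Audi'), (3, 'Kia');
    INSERT INTO vehicle_catalog (v_id, id, make_id) VALUES
        (100, 1, 1), (101, 2, 1), (102, 3, 2);
    INSERT INTO taxonomy_master (product_id, v_id) VALUES
        (500, 1), (501, 1), (502, 2), (503, 3);

    -- Step 1: compute every make's count in a single pass.
    CREATE TEMP TABLE CountsOf (id INTEGER NOT NULL, CountOf INTEGER NOT NULL DEFAULT 0);
    INSERT INTO CountsOf (id, CountOf)
    SELECT m.id, COUNT(t.product_id)
    FROM taxonomy_master t
    INNER JOIN vehicle_catalog v ON t.v_id = v.id
    INNER JOIN vehicle_make m ON v.make_id = m.id
    GROUP BY m.id;

    -- Step 2: copy the counts onto vehicle_make (makes without products keep 0).
    UPDATE vehicle_make
    SET product_count = (SELECT c.CountOf FROM CountsOf c WHERE c.id = vehicle_make.id)
    WHERE id IN (SELECT id FROM CountsOf);
""")

product_counts = conn.execute(
    "SELECT id, product_count FROM vehicle_make ORDER BY id"
).fetchall()

print(product_counts)  # [(1, 3), (2, 1), (3, 0)]
```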
Instead of using a nested query, you can separate this query into 2 or 3 queries, and in PHP insert the result of the inner query into the outer query. It's faster!
@haim-evgi Separating the queries will not increase the speed significantly, it will just shift the load from the DB server to the Web server and create overhead of moving data between the two servers.
I am not sure that, with the appropriate indexes, such a query would run for 7 minutes. Could you please show the structure of the tables involved in these queries?
Seems like you need the following indices:
INDEX BTREE('make_id') on vehicle_catalog
INDEX BTREE('v_id') on taxonomy_master