I have this query:
SELECT DISTINCT h.id,
h.host
FROM pozycje p
INNER JOIN hosty h ON p.host_id = h.id
INNER JOIN keywordy k ON k.id=p.key_id
AND k.bing=0
WHERE h.archive_data_checked IS NULL LIMIT 20
It's fast when some rows exist, but if no results exist it takes 2-3 seconds to execute. I would like it to run in less than 1 second. The EXPLAIN looks like this:
http://tinyurl.com/gogx42n
Table pozycje has 30 000 000 rows, hosty has 4 000 000 rows, and keywordy has 40 000 rows. The engine is InnoDB, on a server with 32GB RAM.
What indexes or improvements can I make to speed up the query when no results exist?
edit:
SHOW CREATE TABLE keywordy;
CREATE TABLE `keywordy` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`main_kw` varchar(255) CHARACTER SET utf8 NOT NULL,
`keyword` varchar(255) CHARACTER SET utf8 NOT NULL,
`lang` varchar(10) CHARACTER SET utf8 NOT NULL,
`searches` int(11) NOT NULL,
`cpc` float NOT NULL,
`competition` float NOT NULL,
`currency` varchar(10) CHARACTER SET utf8 NOT NULL,
`data` date DEFAULT NULL,
`adwords` int(11) NOT NULL,
`monitoring` tinyint(1) NOT NULL DEFAULT '0',
`bing` tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE KEY `keyword` (`keyword`,`lang`),
KEY `id_bing` (`id`,`bing`)
) ENGINE=InnoDB AUTO_INCREMENT=38362 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Can you please test this:
SELECT DISTINCT h.id,
       h.host
FROM hosty h
WHERE EXISTS ( SELECT 1
               FROM pozycje p
               WHERE p.host_id = h.id
                 AND EXISTS ( SELECT 1 FROM keywordy k WHERE k.id = p.key_id AND k.bing = 0 ) )
  AND h.archive_data_checked IS NULL
LIMIT 20
I would first offer the following question: which would return the smaller "set" if you ran
select count(*) from keywordy where bing = 0
vs
select count(*) from hosty where archive_data_checked IS NULL
I would then optimize the query around whichever set is smaller and use that as my primary criterion for indexing. If keywordy is more likely to be the smaller set, I would suggest the following indexes for your tables:
table       index
keywordy    (bing, id)   -- specifically NOT (id, bing): bing must come first for the WHERE / JOIN condition to use the index
pozycje     (key_id, host_id)
hosty       (archive_data_checked, id, host)
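As a rough sketch, those indexes could be added like this (the index names are illustrative, and the hosty index assumes its columns fit within InnoDB's key length limit):
ALTER TABLE keywordy ADD INDEX idx_bing_id (bing, id);
ALTER TABLE pozycje ADD INDEX idx_key_host (key_id, host_id);
-- assumes archive_data_checked and host are short enough for a composite key
ALTER TABLE hosty ADD INDEX idx_archive_id_host (archive_data_checked, id, host);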
SELECT DISTINCT
h.id,
h.host
FROM
Keywordy k
JOIN pozycje p
ON k.id = p.key_id
JOIN hosty h
on archive_data_checked IS NULL
AND p.host_id = h.id
WHERE
k.bing = 0
LIMIT
20
If the hosty table would yield the smaller set based on archive_data_checked IS NULL, I offer the following:
table       index
pozycje     (host_id, key_id)   -- the reverse of the other option
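A sketch of that reversed index (again, the name is illustrative):
ALTER TABLE pozycje ADD INDEX idx_host_key (host_id, key_id);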
SELECT DISTINCT
h.id,
h.host
FROM
hosty h
JOIN pozycje p
ON h.id = p.host_id
JOIN Keywordy k
on k.bing = 0
AND p.key_id = k.id
WHERE
h.archive_data_checked IS NULL
LIMIT
20
One FINAL option might be to add the keyword STRAIGHT_JOIN, such as:
SELECT STRAIGHT_JOIN DISTINCT ... rest of query
If it works for you, what timing improvement does it offer?
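For example, applied to the first variant above (a sketch only; STRAIGHT_JOIN forces MySQL to join the tables in the order they are listed, so keywordy would be read first):
SELECT STRAIGHT_JOIN DISTINCT
    h.id,
    h.host
FROM keywordy k
JOIN pozycje p ON k.id = p.key_id          -- keywordy is driven first because of STRAIGHT_JOIN
JOIN hosty h ON p.host_id = h.id
    AND h.archive_data_checked IS NULL
WHERE k.bing = 0
LIMIT 20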
These are my table structures:
DROP TABLE IF EXISTS `alljobs`;
CREATE TABLE IF NOT EXISTS `alljobs`
(
`ID` int(11) NOT NULL AUTO_INCREMENT,
`Department` varchar(50) NULL,
`SourceSite` varchar(50) NULL,
`Title` varchar(255) NULL,
`Description` text NULL,
`JobType` varchar(50) NULL,
`MainCategory` varchar(100) NULL,
`SubCategory` varchar(100) NULL,
`Url` varchar(255) NULL,
`TimeOfCreation` varchar(50) NULL,
`UnixTime` varchar(25) NULL,
PRIMARY KEY (`ID`),
KEY `Department` (`Department`),
KEY `SourceSite` (`SourceSite`),
KEY `Title` (`Title`),
KEY `MainCategory` (`MainCategory`),
KEY `SubCategory` (`SubCategory`),
KEY `UnixTime` (`UnixTime`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='To Hold All Job Information';
DROP TABLE IF EXISTS `joblocations`;
CREATE TABLE IF NOT EXISTS `joblocations`
(
`ID` int(11) NOT NULL AUTO_INCREMENT,
`jobId` varchar(25) NULL,
`Location` varchar(100) NULL,
PRIMARY KEY (`ID`),
KEY `jobId` (`jobId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='To Hold Job Locations';
This is the query I am trying (it is basically the query when the user doesn't apply any filter on the search page):
Select j.Department, j.SourceSite, j.Title, j.Url,
Group_Concat(l.Location Separator '||') As Locations
From alljobs as j
Left Outer Join joblocations as l On l.jobId = j.ID
Group By j.ID
Order By j.Title
Limit 25 Offset 0;
But it's taking 4-5 minutes to execute the query in phpMyAdmin, and in PHP it just times out.
There are only 30-40k rows in both tables.
But if the user applies any search filter and executes it, it takes less than a second (as per the phpMyAdmin results):
Select j.Department, j.SourceSite, j.Title, j.Url,
Group_Concat(l.Location Separator '||') As Locations
From alljobs as j
Left Outer Join joblocations as l On l.jobId = j.ID
Where Department In ('Healthcare', 'Food', 'Technology', 'Lifestyle')
And MainCategory Like '%Marketing%' And Location Like '%California%'
Group By j.ID
Order By j.Title
Limit 25 Offset 0 ;
So, I don't understand what I am doing wrong here. I just want the first 25 rows of data, with the locations from the second table (matched by job id) concatenated into a single column.
Any suggestion will be highly appreciated.
Best regards
I don't see a real reason why your query should be so slow. However, since you group by j.ID and then order by j.Title, followed by a limit, the query has to process all data in the two tables before it can output the result. You only need 25 of the 30-40k rows from the alljobs table.
So let's rethink this. First write down the query for one table:
SELECT
j.Department,
j.SourceSite,
j.Title,
j.Url,
<... Locations... >
FROM alljobs AS j
ORDER BY j.Title
LIMIT 25 OFFSET 0;
This should be very quick, since j.Title is indexed. The only problem left is, of course, that Locations doesn't work yet. We could write a subquery for this:
SELECT
j.Department,
j.SourceSite,
j.Title,
j.Url,
(SELECT
GROUP_CONCAT(l.Location SEPARATOR '||')
FROM joblocations AS l
WHERE l.jobId = j.ID) AS Locations
FROM alljobs AS j
ORDER BY j.Title
LIMIT 25 OFFSET 0;
The query now has to execute this subquery 25 times. That's quite a lot, but still a lot less than going through the whole table. What should make it reasonably quick is that l.jobId is indexed.
The reason your query with the search filters is a lot quicker is that you already filter out a lot of rows in advance with those search filters. A smaller data set is just quicker. However, the data set might not be small for some search filters, making the query take a lot longer.
If you want to use a query with a subquery, as shown above, it should look something like:
SELECT
j.Department,
j.SourceSite,
j.Title,
j.Url,
(SELECT
GROUP_CONCAT(l.Location SEPARATOR '||')
FROM joblocations AS l
WHERE l.jobId = j.ID) AS Locations
FROM alljobs AS j
WHERE j.Department IN ('Healthcare', 'Food', 'Technology', 'Lifestyle')
AND j.MainCategory LIKE '%Marketing%'
AND EXISTS (SELECT ID
FROM joblocations AS m
WHERE m.jobId = j.ID
AND m.Location LIKE '%California%')
ORDER BY j.Title
LIMIT 25 OFFSET 0;
I hope this works.
I'm trying to write a link exchange script and have run into a bit of trouble.
Each link can be visited by an IP address up to x times (the frequency column in the links table). Each visit costs a number of credits (the spending limit is given in the limit column of the links table).
I've got the following tables:
CREATE TABLE IF NOT EXISTS `contor` (
`key` varchar(25) NOT NULL,
`uniqueHandler` varchar(30) DEFAULT NULL,
`uniqueLink` varchar(30) DEFAULT NULL,
`uniqueUser` varchar(30) DEFAULT NULL,
`owner` varchar(50) NOT NULL,
`ip` varchar(15) DEFAULT NULL,
`credits` float NOT NULL,
`tstamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`key`),
KEY `uniqueLink` (`uniqueLink`),
KEY `uniqueHandler` (`uniqueHandler`),
KEY `uniqueUser` (`uniqueUser`),
KEY `owner` (`owner`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `links` (
`unique` varchar(30) NOT NULL DEFAULT '',
`url` varchar(1000) DEFAULT NULL,
`frequency` varchar(5) DEFAULT NULL,
`limit` float NOT NULL DEFAULT '0',
PRIMARY KEY (`unique`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
I've got the following query:
$link = mysql_query("
SELECT *
FROM `links`
WHERE (SELECT COUNT(`key`) FROM contor WHERE ip = '$ip' AND contor.uniqueLink = links.unique) <= `frequency`
AND (SELECT SUM(credits) AS cost FROM contor WHERE contor.uniqueLink = links.unique) <= `limit`");
There are 20 rows in the table links.
The problem is that whenever there are about 200k rows in the table contor the CPU load is huge.
After applying the solution provided by @Barmar:
Adding a composite index on (uniqueLink, ip) and dropping all other indexes except PRIMARY, EXPLAIN gives me this:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY l ALL NULL NULL NULL NULL 18
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 15
2 DERIVED pop_contor index NULL contor_IX1 141 NULL 206122
Try using a join rather than a correlated subquery.
SELECT l.*
FROM links AS l
LEFT JOIN (
    SELECT uniqueLink, SUM(ip = '$ip') AS ip_visits, SUM(credits) AS total_credits
    FROM contor
    GROUP BY uniqueLink
) AS c
ON c.uniqueLink = l.`unique` AND c.ip_visits <= l.frequency AND c.total_credits <= l.`limit`
If this doesn't help, try adding an index on contor.ip.
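That could be done with something like the following (the index name is illustrative):
ALTER TABLE contor ADD INDEX idx_ip (ip);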
The current query is of the form:
SELECT l.*
FROM `links` l
WHERE l.frequency >= ( SELECT COUNT(ck.key)
FROM contor ck
WHERE ck.uniqueLink = l.unique
AND ck.ip = '$ip'
)
AND l.limit >= ( SELECT SUM(sc.credits)
FROM contor sc
WHERE sc.uniqueLink = l.unique
)
Those correlated subqueries are going to eat your lunch. And your lunchbox too.
I'd suggest testing an inline view that performs both of the aggregations from contor in one pass, and then join the result from that to the links table.
Something like this:
SELECT l.*
FROM ( SELECT c.uniqueLink
, SUM(c.ip = '$ip' AND c.key IS NOT NULL) AS count_key
, SUM(c.credits) AS sum_credits
FROM `contor` c
GROUP
BY c.uniqueLink
) d
JOIN `links` l
ON l.unique = d.uniqueLink
AND l.frequency >= d.count_key
AND l.limit >= d.sum_credits
For optimal performance of the aggregation inline view query, provide a covering index that MySQL can use to optimize the GROUP BY (avoiding a Using filesort operation)
CREATE INDEX `contor_IX1` ON `contor` (`uniqueLink`, `credits`, `ip`) ;
Adding that index renders the uniqueLink index redundant, so also...
DROP INDEX `uniqueLink` ON `contor` ;
EDIT
Since we have a guarantee that the contor.key column is non-NULL (i.e. the NOT NULL constraint), the AND c.key IS NOT NULL part of the query above is unneeded and can be removed. (I also removed the key column from the covering index definition above.)
SELECT l.*
FROM ( SELECT c.uniqueLink
, SUM(c.ip = '$ip') AS count_key
, SUM(c.credits) AS sum_credits
FROM `contor` c
GROUP
BY c.uniqueLink
) d
JOIN `links` l
ON l.unique = d.uniqueLink
AND l.frequency >= d.count_key
AND l.limit >= d.sum_credits
I have the two following tables:
CREATE TABLE `modlogs` (
`mod` int(11) NOT NULL,
`ip` varchar(39) CHARACTER SET ascii NOT NULL,
`board` varchar(58) CHARACTER SET utf8 DEFAULT NULL,
`time` int(11) NOT NULL,
`text` text NOT NULL,
KEY `time` (`time`),
KEY `mod` (`mod`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8mb4
CREATE TABLE `mods` (
`id` smallint(6) unsigned NOT NULL AUTO_INCREMENT,
`username` varchar(30) NOT NULL,
`password` char(64) CHARACTER SET ascii NOT NULL COMMENT 'SHA256',
`salt` char(32) CHARACTER SET ascii NOT NULL,
`type` smallint(2) NOT NULL,
`boards` text CHARACTER SET utf8 NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id` (`id`,`username`)
) ENGINE=MyISAM AUTO_INCREMENT=933 DEFAULT CHARSET=utf8mb4
I want to join the most recent log entry with the mod's name; however, my query is very slow (it takes 5.23 seconds):
SELECT *
FROM mods LEFT JOIN
modlogs
ON modlogs.mod = mods.id
AND modlogs.time = (SELECT MAX(time)
FROM mods
WHERE mods.id = modlogs.mod
);
All other answers on SO also seem to use dependent subqueries. Is there a way I can do this in a way that will return results more quickly?
Here's another solution: putting the subquery into a derived table avoids the problem of a dependent subquery. It'll run the subquery just once.
SELECT *
FROM mods AS m
LEFT JOIN (
SELECT ml1.*
FROM modlogs AS ml1
JOIN (
SELECT `mod`, MAX(time) AS time
FROM modlogs
GROUP BY `mod`
) AS ml2 USING (`mod`, time)
) AS ml ON m.id = ml.`mod`;
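If it is still slow, a composite index over (mod, time) should let the inner GROUP BY / MAX(time) subquery be resolved from the index alone; a sketch (the index name is illustrative):
ALTER TABLE modlogs ADD INDEX idx_mod_time (`mod`, `time`);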
This is your query:
SELECT *
FROM mods LEFT JOIN
modlogs
ON modlogs.mod = (SELECT MAX(time)
FROM modlogs
WHERE mods.id = modlogs.mod
);
This query does not make sense. You are comparing something called mod to a max time. Sounds like it won't work to me, but then there are some very "clever" data models out there. I suspect you really want:
SELECT *
FROM mods LEFT JOIN
     modlogs
     ON mods.id = modlogs.mod AND
        modlogs.time = (SELECT MAX(ml2.time)
                        FROM modlogs ml2
                        WHERE ml2.mod = modlogs.mod
                       );
I wouldn't write the query this way, because join conditions in the on clause seem confusing to me. But, you did. You can get better performance with an index. I would suggest:
create index modlogs_mod_time on modlogs(mod, time);
I would write the query as:
SELECT *
FROM mods LEFT JOIN
modlogs
ON mods.id = modlogs.mod
WHERE NOT EXISTS (SELECT 1
FROM modlogs ml2
WHERE modlogs.mod = ml2.mod and
ml2.time > modlogs.time
);
I think you can also solve this one with an anti-join, though I'm skeptical of the performance on this one:
SELECT mods.*, modlogs.*
FROM mods
LEFT JOIN modlogs
    ON modlogs.mod = mods.id
LEFT JOIN modlogs ml2
    ON ml2.mod = modlogs.mod
    AND ml2.time > modlogs.time
WHERE ml2.mod IS NULL
Ensure you have an index on modlogs(mod), and consider an index on modlogs(mod, time) for better performance.
I have this query, for example (it's good, it works how I want it to):
SELECT `discusComments`.`memberID`, COUNT( `discusComments`.`memberID`) AS postcount
FROM `discusComments`
GROUP BY `discusComments`.`memberID` ORDER BY postcount DESC
Example Results:
memberid postcount
3 283
6 230
9 198
Now I want to join the memberID of the discusComments table with that of the discusTopics table (because what I really want to do is only get results from a specific GROUP, and the group id is only in the topic table and not in the comment one, hence the join).
SELECT `discusComments`.`memberID`, COUNT( `discusComments`.`memberID`) AS postcount
FROM `discusComments`
LEFT JOIN `discusTopics` ON `discusComments`.`memberID` = `discusTopics`.`memberID`
GROUP BY `discusComments`.`memberID` ORDER BY postcount DESC
Example Results:
memberid postcount
3 14789
6 8678
9 6987
How can I stop this huge increase in the postcount? I need to preserve it as before.
Once I have this sorted, I want to add some kind of line which says WHERE discusTopics.groupID = 6, for example.
CREATE TABLE IF NOT EXISTS `discusComments` (
`id` bigint(255) NOT NULL auto_increment,
`topicID` bigint(255) NOT NULL,
`comment` text NOT NULL,
`timeStamp` bigint(12) NOT NULL,
`memberID` bigint(255) NOT NULL,
`thumbsUp` int(15) NOT NULL default '0',
`thumbsDown` int(15) NOT NULL default '0',
`status` int(1) NOT NULL default '1',
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=7190 ;
CREATE TABLE IF NOT EXISTS `discusTopics` (
`id` bigint(255) NOT NULL auto_increment,
`groupID` bigint(255) NOT NULL,
`memberID` bigint(255) NOT NULL,
`name` varchar(255) NOT NULL,
`views` bigint(255) NOT NULL default '0',
`lastUpdated` bigint(10) NOT NULL,
PRIMARY KEY (`id`),
KEY `groupID` (`groupID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=913 ;
SELECT `discusComments`.`memberID`, COUNT( `discusComments`.`memberID`) AS postcount
FROM `discusComments`
JOIN `discusTopics` ON `discusComments`.`topicID` = `discusTopics`.`id`
GROUP BY `discusComments`.`memberID` ORDER BY postcount DESC
Joining on the topicID in both tables solved the memberID issue. Thanks @Andiry M
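To then restrict the count to a single group, as the question mentions wanting to do, a WHERE clause can be added; a sketch using the example groupID value from the question:
SELECT `discusComments`.`memberID`, COUNT(`discusComments`.`memberID`) AS postcount
FROM `discusComments`
JOIN `discusTopics` ON `discusComments`.`topicID` = `discusTopics`.`id`
WHERE `discusTopics`.`groupID` = 6
GROUP BY `discusComments`.`memberID`
ORDER BY postcount DESC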
You need to use just JOIN, not LEFT JOIN, and you can add AND discusTopics.memberID = 6 after ON discusComments.memberID = discusTopics.memberID.
You can use subqueries like this:
SELECT `discusComments`.`memberID`, COUNT(`discusComments`.`memberID`) AS postcount
FROM `discusComments`
WHERE `discusComments`.`memberID` IN
    (SELECT DISTINCT memberID FROM `discusTopics` WHERE groupID = 6)
GROUP BY `discusComments`.`memberID`
If I understand your question right, you do not need to use JOIN here at all. JOINs are needed when you have many-to-many relationships and you need, for each value in one table, to select all corresponding values in another table.
But here you have a many-to-one relationship, if I got it right. Then you can simply select from the two tables like this:
SELECT a.*, b.id FROM a, b WHERE a.pid = b.id
This is a simple request and won't create the giant overhead that a JOIN does.
PS: In the future, try to experiment with your queries and try to avoid JOINs, especially in MySQL. They are slow and dangerous in their complexity. For 90% of the cases where you want to use a JOIN there is a simpler and much faster solution.
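As a sketch, here is that comma-separated FROM style applied to the tables from this question, with the example group filter from above (column and table names are taken from the posted schemas):
-- comma-separated FROM with a WHERE equality; MySQL evaluates this like an inner join
SELECT c.memberID, COUNT(c.memberID) AS postcount
FROM discusComments c, discusTopics t
WHERE c.topicID = t.id
  AND t.groupID = 6
GROUP BY c.memberID
ORDER BY postcount DESC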
In the same database I have a table messages whose columns id, title, text I want. I want only the records whose title has no entry in the table lastlogon, where the equivalent of title is named username.
I have been using this SQL command in PHP; it generally took 2-3 seconds to pull up:
SELECT DISTINCT * FROM messages WHERE title NOT IN (SELECT username FROM lastlogon) LIMIT 1000
This was all good until the table lastlogon started to have about 80% of the values of table messages. messages has about 8000 entries, lastlogon about 7000. Now it takes one to two minutes to go through, and MySQL shoots up to very high CPU usage.
I tried the following but had no luck reducing the time:
SELECT id,title,text FROM messages a LEFT OUTER JOIN lastlogon b ON (a.title = b.username) LIMIT 1000
Why is it all of a sudden taking so long for such a low number of entries? I tried restarting MySQL and Apache multiple times. I am using Debian Linux.
Edit: Here are the structures
--
-- Table structure for table `lastlogon`
--
CREATE TABLE IF NOT EXISTS `lastlogon` (
`username` varchar(25) NOT NULL,
`lastlogon` date NOT NULL,
`datechecked` date NOT NULL,
PRIMARY KEY (`username`),
KEY `username` (`username`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
-- --------------------------------------------------------
--
-- Table structure for table `messages`
--
CREATE TABLE IF NOT EXISTS `messages` (
`id` smallint(9) unsigned NOT NULL AUTO_INCREMENT,
`title` varchar(255) NOT NULL,
`name` varchar(255) NOT NULL,
`email` varchar(50) NOT NULL,
`text` mediumtext,
`folder` tinyint(2) NOT NULL,
`read` smallint(5) unsigned NOT NULL,
`dateline` int(10) unsigned NOT NULL,
`ip` varchar(15) NOT NULL,
`attachment` varchar(255) NOT NULL,
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`username` varchar(300) NOT NULL,
`error` varchar(500) NOT NULL,
PRIMARY KEY (`id`),
KEY `title` (`title`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=9010 ;
Edit 2
Edited structure with new indexes.
After putting an index on both messages.title and lastlogon.username I came up with these results:
Showing rows 0 - 29 (623 total, Query took 74.4938 sec)
First, replace the key on title with a compound key on title + id:
ALTER TABLE messages DROP INDEX title;
ALTER TABLE messages ADD INDEX title (title, id);
Now change the select to:
SELECT m.* FROM messages m
LEFT JOIN lastlogon l ON (l.username = m.title)
WHERE l.username IS NULL
-- GROUP BY m.id DESC -- faster replacement for distinct. I don't think you need this.
LIMIT 1000;
Or
SELECT m.* FROM messages m
WHERE m.title NOT IN (SELECT l.username FROM lastlogon l)
-- GROUP BY m.id DESC -- faster than distinct, I don't think you need it though.
LIMIT 1000;
Another problem with the slowness is the SELECT m.* part.
By selecting all columns, you are forcing MySQL to do extra work.
Only select the columns you need:
SELECT m.title, m.name, m.email, ......
This will speed up the query as well.
There's another trick you can use:
Replace the limit 1000 with a cutoff date.
Step 1: Add an index on timestamp (or whatever field you want to use for the cutoff).
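A sketch of such an index on the timestamp column (the index name is illustrative):
ALTER TABLE messages ADD INDEX idx_timestamp (`timestamp`);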
SELECT m.* FROM messages m
LEFT JOIN lastlogon l ON (l.username = m.title)
WHERE (m.id > (SELECT MIN(m2.ID) FROM messages m2 WHERE m2.timestamp >= '2011-09-01'))
AND l.username IS NULL
-- GROUP BY m.id DESC -- faster replacement for distinct. I don't think you need this.
I suggest you add an index on messages.title. Then run the query again and test the performance.