Group Concat Or Something Else? MySQL Query Taking So Long

These are my table structures:
DROP TABLE IF EXISTS `alljobs`;
CREATE TABLE IF NOT EXISTS `alljobs`
(
`ID` int(11) NOT NULL AUTO_INCREMENT,
`Department` varchar(50) NULL,
`SourceSite` varchar(50) NULL,
`Title` varchar(255) NULL,
`Description` text NULL,
`JobType` varchar(50) NULL,
`MainCategory` varchar(100) NULL,
`SubCategory` varchar(100) NULL,
`Url` varchar(255) NULL,
`TimeOfCreation` varchar(50) NULL,
`UnixTime` varchar(25) NULL,
PRIMARY KEY (`ID`),
KEY `Department` (`Department`),
KEY `SourceSite` (`SourceSite`),
KEY `Title` (`Title`),
KEY `MainCategory` (`MainCategory`),
KEY `SubCategory` (`SubCategory`),
KEY `UnixTime` (`UnixTime`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='To Hold All Job Information';
DROP TABLE IF EXISTS `joblocations`;
CREATE TABLE IF NOT EXISTS `joblocations`
(
`ID` int(11) NOT NULL AUTO_INCREMENT,
`jobId` varchar(25) NULL,
`Location` varchar(100) NULL,
PRIMARY KEY (`ID`),
KEY `jobId` (`jobId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='To Hold Job Locations';
This is the query I am trying (it is basically the query when the user doesn't apply any filter on the search page):
Select j.Department, j.SourceSite, j.Title, j.Url,
Group_Concat(l.Location Separator '||') As Locations
From alljobs as j
Left Outer Join joblocations as l On l.jobId = j.ID
Group By j.ID
Order By j.Title
Limit 25 Offset 0;
But it takes 4-5 minutes to execute the query in phpMyAdmin, and in PHP it just times out.
There are only 30-40k rows in each table.
But if the user applies any search filter, the query executes in less than a second (according to phpMyAdmin):
Select j.Department, j.SourceSite, j.Title, j.Url,
Group_Concat(l.Location Separator '||') As Locations
From alljobs as j
Left Outer Join joblocations as l On l.jobId = j.ID
Where Department In ('Healthcare', 'Food', 'Technology', 'Lifestyle')
And MainCategory Like '%Marketing%' And Location Like '%California%'
Group By j.ID
Order By j.Title
Limit 25 Offset 0 ;
So I don't understand what I am doing wrong here. I just want the first 25 rows of data, with the locations from the second table (matched by job ID) concatenated into a single column.
Any suggestion will be highly appreciated.
Best regards

I don't see a real reason why your query should be so slow. However, since you group by j.ID and then order by j.Title, followed by a limit, the query has to process all data in the two tables before it can output the result. You only need 25 of the 30-40k rows from the alljobs table.
So let's rethink this. First write down the query for one table:
SELECT
j.Department,
j.SourceSite,
j.Title,
j.Url,
<... Locations... >
FROM alljobs AS j
ORDER BY j.Title
LIMIT 25 OFFSET 0;
This should be very quick, since j.Title is indexed. The only problem left is, of course, that Locations doesn't work. We could write a subquery for this:
SELECT
j.Department,
j.SourceSite,
j.Title,
j.Url,
(SELECT
GROUP_CONCAT(l.Location SEPARATOR '||')
FROM joblocations AS l
WHERE l.jobId = j.ID) AS Locations
FROM alljobs AS j
ORDER BY j.Title
LIMIT 25 OFFSET 0;
The query now has to execute this subquery 25 times. That's quite a lot, but still far less work than going through the whole table. What should make it reasonably quick is that l.jobId is indexed.
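One caveat, going by the table definitions posted above (an observation from the DDL, not from any EXPLAIN output): joblocations.jobId is declared varchar(25) while alljobs.ID is int(11). When MySQL compares a string column to a number it converts the values numerically, which can prevent the index on jobId from being used for the join at all. If the jobId values are in fact numeric, aligning the types is worth trying:
-- Assumption: every existing jobId value is numeric; verify first, e.g.:
-- SELECT jobId FROM joblocations WHERE jobId REGEXP '[^0-9]' LIMIT 10;
ALTER TABLE joblocations MODIFY jobId int(11) NULL;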
The reason your query with the search filters is a lot quicker is that the filters discard most rows in advance; a smaller data set is simply faster. However, the data set might not be small for some search filters, and then the query will take a lot longer again.
If you want to use a query with a subquery, as shown above, it should look something like:
SELECT
j.Department,
j.SourceSite,
j.Title,
j.Url,
(SELECT
GROUP_CONCAT(l.Location SEPARATOR '||')
FROM joblocations AS l
WHERE l.jobId = j.ID) AS Locations
FROM alljobs AS j
WHERE j.Department IN ('Healthcare', 'Food', 'Technology', 'Lifestyle')
AND j.MainCategory LIKE '%Marketing%'
AND EXISTS (SELECT ID
FROM joblocations AS m
WHERE m.jobId = j.ID
AND m.Location LIKE '%California%')
ORDER BY j.Title
LIMIT 25 OFFSET 0;
I hope this works.

Related

Mysql query with multiple selects results in high CPU load

I'm trying to do a link exchange script and run into a bit of trouble.
Each link can be visited by an IP address a number of x times (frequency in links table). Each visit costs a number of credits (spend limit given in limit in links table)
I've got the following tables:
CREATE TABLE IF NOT EXISTS `contor` (
`key` varchar(25) NOT NULL,
`uniqueHandler` varchar(30) DEFAULT NULL,
`uniqueLink` varchar(30) DEFAULT NULL,
`uniqueUser` varchar(30) DEFAULT NULL,
`owner` varchar(50) NOT NULL,
`ip` varchar(15) DEFAULT NULL,
`credits` float NOT NULL,
`tstamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`key`),
KEY `uniqueLink` (`uniqueLink`),
KEY `uniqueHandler` (`uniqueHandler`),
KEY `uniqueUser` (`uniqueUser`),
KEY `owner` (`owner`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `links` (
`unique` varchar(30) NOT NULL DEFAULT '',
`url` varchar(1000) DEFAULT NULL,
`frequency` varchar(5) DEFAULT NULL,
`limit` float NOT NULL DEFAULT '0',
PRIMARY KEY (`unique`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
I've got the following query:
$link = mysql_query("
SELECT *
FROM `links`
WHERE (SELECT COUNT(`key`) FROM contor WHERE ip = '$ip' AND contor.uniqueLink = links.unique) <= `frequency`
AND (SELECT SUM(credits) FROM contor WHERE contor.uniqueLink = links.unique) <= `limit`");
There are 20 rows in the table links.
The problem is that whenever there are about 200k rows in the table contor the CPU load is huge.
After applying the solution provided by @Barmar (adding a composite index on (uniqueLink, ip) and dropping all other indexes except PRIMARY), EXPLAIN gives me this:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY l ALL NULL NULL NULL NULL 18
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 15
2 DERIVED pop_contor index NULL contor_IX1 141 NULL 206122
Try using a join rather than a correlated subquery.
SELECT l.*
FROM links AS l
LEFT JOIN (
SELECT uniqueLink, SUM(ip = '$ip') AS ip_visits, SUM(credits) AS total_credits
FROM contor
GROUP BY uniqueLink
) AS c
ON c.uniqueLink = l.`unique`
WHERE c.ip_visits <= l.frequency AND c.total_credits <= l.`limit`
If this doesn't help, try adding an index on contor.ip.
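In DDL form (the index name is just illustrative):
CREATE INDEX contor_ip ON contor (ip);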
The current query is of the form:
SELECT l.*
FROM `links` l
WHERE l.frequency >= ( SELECT COUNT(ck.key)
FROM contor ck
WHERE ck.uniqueLink = l.unique
AND ck.ip = '$ip'
)
AND l.limit >= ( SELECT SUM(sc.credits)
FROM contor sc
WHERE sc.uniqueLink = l.unique
)
Those correlated subqueries are going to eat your lunch. And your lunchbox too.
I'd suggest testing an inline view that performs both of the aggregations from contor in one pass, and then join the result from that to the links table.
Something like this:
SELECT l.*
FROM ( SELECT c.uniqueLink
, SUM(c.ip = '$ip' AND c.key IS NOT NULL) AS count_key
, SUM(c.credits) AS sum_credits
FROM `contor` c
GROUP
BY c.uniqueLink
) d
JOIN `links` l
ON l.unique = d.uniqueLink
AND l.frequency >= d.count_key
AND l.limit >= d.sum_credits
For optimal performance of the aggregation inline view query, provide a covering index that MySQL can use to optimize the GROUP BY (avoiding a Using filesort operation)
CREATE INDEX `contor_IX1` ON `contor` (`uniqueLink`, `credits`, `ip`) ;
Adding that index renders the uniqueLink index redundant, so also...
DROP INDEX `uniqueLink` ON `contor` ;
EDIT
Since we have a guarantee that the contor.key column is non-NULL (i.e. the NOT NULL constraint), the AND c.key IS NOT NULL predicate in the query above is unneeded and can be removed. (I also removed the key column from the covering index definition above.)
SELECT l.*
FROM ( SELECT c.uniqueLink
, SUM(c.ip = '$ip') AS count_key
, SUM(c.credits) AS sum_credits
FROM `contor` c
GROUP
BY c.uniqueLink
) d
JOIN `links` l
ON l.unique = d.uniqueLink
AND l.frequency >= d.count_key
AND l.limit >= d.sum_credits

Optimize a query

How can I make my response time faster? The average response time is approximately 0.2 s (8039 records in my items table and 81 records in my tracking table).
Query
SELECT a.name, b.cnt FROM `items` a LEFT JOIN
(SELECT guid, COUNT(*) cnt FROM tracking WHERE
date > UNIX_TIMESTAMP(NOW() - INTERVAL 1 day ) GROUP BY guid) b ON
a.`id` = b.guid WHERE a.`type` = 'streaming' AND a.`state` = 1
ORDER BY b.cnt DESC LIMIT 15 OFFSET 75
Tracking table structure
CREATE TABLE `tracking` (
`id` bigint(11) NOT NULL AUTO_INCREMENT,
`guid` int(11) DEFAULT NULL,
`ip` int(11) NOT NULL,
`date` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `i1` (`ip`,`guid`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=4303 DEFAULT CHARSET=latin1;
Items table structure
CREATE TABLE `items` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`guid` int(11) DEFAULT NULL,
`type` varchar(255) DEFAULT NULL,
`name` varchar(255) DEFAULT NULL,
`embed` varchar(255) DEFAULT NULL,
`url` varchar(255) DEFAULT NULL,
`description` text,
`tags` varchar(255) DEFAULT NULL,
`date` int(11) DEFAULT NULL,
`vote_val_total` float DEFAULT '0',
`vote_total` float(11,0) DEFAULT '0',
`rate` float DEFAULT '0',
`icon` text CHARACTER SET ascii,
`state` int(11) DEFAULT '0',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=9258 DEFAULT CHARSET=latin1;
Your query, as written, joins the per-guid counts from tracking onto items.id, even though items has a guid column of its own; that join key looks wrong.
You may want this:
SELECT a.*, b.cnt
FROM `items` a
LEFT JOIN (
SELECT guid, COUNT(*) cnt
FROM tracking
WHERE `date` > UNIX_TIMESTAMP(NOW() - INTERVAL 1 day)
GROUP BY guid
) b ON a.guid = b.guid
ORDER BY b.cnt DESC
The high-volume data in this query come from the relatively large tracking table. So, you should add a compound index to it, using the columns (date, guid). This will allow your query to random-access the index by date and then scan it for guid values.
ALTER TABLE tracking ADD INDEX guid_summary (`date`, guid);
I suppose you'll see a nice performance improvement.
Pro tip: Don't use SELECT *. Instead, give a list of the columns you want in your result set. For example,
SELECT a.guid, a.name, a.description, b.cnt
Why is this important?
First, it makes your software more resilient against somebody adding columns to your tables in the future.
Second, it tells the MySQL server to sling around only the information you want. That can improve performance really dramatically, especially when your tables get big.
Since tracking has significantly fewer rows than items, I will propose the following.
SELECT i.name, c.cnt
FROM
(
SELECT guid, COUNT(*) cnt
FROM tracking
WHERE date > UNIX_TIMESTAMP(NOW() - INTERVAL 1 day )
GROUP BY guid
) AS c
JOIN items AS i ON i.id = c.guid
WHERE i.type = 'streaming'
AND i.state = 1
ORDER BY c.cnt DESC
LIMIT 15 OFFSET 75
It will fail to display any items for which cnt is 0. (Your version displays the items with NULL for the count.)
Composite indexes needed:
items: The PRIMARY KEY(id) is sufficient.
tracking: INDEX(date, guid) -- "covering"
Other issues:
If ip is an IP-address, it needs to be INT UNSIGNED. But that covers only IPv4, not IPv6.
It seems like date is not just a "date", but really a date+time. Please rename it to avoid confusion.
float(11,0) -- Don't use FLOAT for integers. Don't use (m,n) on FLOAT or DOUBLE. INT UNSIGNED makes more sense here.
OFFSET is naughty when it comes to performance -- it must scan over the skipped records. And, in your query, there is no way to avoid collecting all the possible rows, sorting them, stepping over 75, and only finally delivering 15 rows. (And, with no more than 81 tracking rows, it won't even be a full 15.) See the keyset-pagination sketch after this list.
What version are you using? There have been important changes to the optimization of LEFT JOIN ( SELECT ... ). Please provide EXPLAIN SELECT for each query under discussion.
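As a sketch of the usual alternative to OFFSET -- keyset ("seek") pagination -- here is the general shape, assuming an indexed, deterministic ORDER BY column. It does not drop straight into this query, because cnt is computed fresh on every execution:
-- Hypothetical example: page through items by id, remembering the last id seen.
SET @last_seen_id = 500;  -- the highest id the client saw on the previous page
SELECT id, name
FROM items
WHERE id > @last_seen_id  -- seek straight past the previous page via the PRIMARY KEY
ORDER BY id
LIMIT 15;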

Optimize mysql with inner joins and where

I have this query:
SELECT DISTINCT h.id,
h.host
FROM pozycje p
INNER JOIN hosty h ON p.host_id = h.id
INNER JOIN keywordy k ON k.id=p.key_id
AND k.bing=0
WHERE h.archive_data_checked IS NULL LIMIT 20
It's fast when matching rows exist, but when there are no results it takes 2-3 seconds to execute. I would like it to take less than 1 second. EXPLAIN looks like:
http://tinyurl.com/gogx42n
Table pozycje has 30 000 000 rows, hosty has 4 000 000 rows and keywordy has 40 000 rows. Engine InnoDB, server with 32GB RAM
What indexes or other improvements can I make to speed up the query when no results exist?
Edit:
SHOW CREATE TABLE keywordy;
CREATE TABLE `keywordy` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`main_kw` varchar(255) CHARACTER SET utf8 NOT NULL,
`keyword` varchar(255) CHARACTER SET utf8 NOT NULL,
`lang` varchar(10) CHARACTER SET utf8 NOT NULL,
`searches` int(11) NOT NULL,
`cpc` float NOT NULL,
`competition` float NOT NULL,
`currency` varchar(10) CHARACTER SET utf8 NOT NULL,
`data` date DEFAULT NULL,
`adwords` int(11) NOT NULL,
`monitoring` tinyint(1) NOT NULL DEFAULT '0',
`bing` tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE KEY `keyword` (`keyword`,`lang`),
KEY `id_bing` (`id`,`bing`)
) ENGINE=InnoDB AUTO_INCREMENT=38362 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Can you please test this:
SELECT DISTINCT h.id,
h.host
FROM hosty h
WHERE
EXISTS ( SELECT 1
FROM pozycje p
JOIN keywordy k ON k.id = p.key_id AND k.bing = 0
WHERE p.host_id = h.id )
AND h.archive_data_checked IS NULL LIMIT 20
I would first offer the following question: which would have the smaller "set" if you did a query on
select count(*) from KeyWordy where bing = 0
vs
select count(*) from hosty where archive_data_checked IS NULL
I would then try to optimize the query knowing the smaller set, and work with that as my primary criterion for indexing. If keywordy is more likely to be the smaller set, I would suggest the following indexes for your tables:
table index
keywordy (bing, id) -- specifically NOT (id, bing), as bing FIRST is optimized for the WHERE or JOIN clause
pozycje (key_id, host_id )
hosty (archive_data_checked, id, host)
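Expressed as DDL (index names are illustrative):
CREATE INDEX keywordy_bing_id ON keywordy (bing, id);
CREATE INDEX pozycje_key_host ON pozycje (key_id, host_id);
CREATE INDEX hosty_adc_id_host ON hosty (archive_data_checked, id, host);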
SELECT DISTINCT
h.id,
h.host
FROM
Keywordy k
JOIN pozycje p
ON k.id = p.key_id
JOIN hosty h
on archive_data_checked IS NULL
AND p.host_id = h.id
WHERE
k.bing = 0
LIMIT
20
If the hosty table would be the smaller set based on archive_data_checked IS NULL, I offer the following:
table index
pozycje (host_id, key_id) -- reversed from the other option
SELECT DISTINCT
h.id,
h.host
FROM
hosty h
JOIN pozycje p
ON h.id = p.host_id
JOIN Keywordy k
on k.bing = 0
AND p.key_id = k.id
WHERE
h.archive_data_checked IS NULL
LIMIT
20
One FINAL option might be to add the keyword STRAIGHT_JOIN, such as:
select STRAIGHT_JOIN DISTINCT ... rest of query
If it works for you, let us know what timing improvement it offers.

MySQL joining most recent record from another query is slow

I have the two following tables:
CREATE TABLE `modlogs` (
`mod` int(11) NOT NULL,
`ip` varchar(39) CHARACTER SET ascii NOT NULL,
`board` varchar(58) CHARACTER SET utf8 DEFAULT NULL,
`time` int(11) NOT NULL,
`text` text NOT NULL,
KEY `time` (`time`),
KEY `mod` (`mod`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8mb4
CREATE TABLE `mods` (
`id` smallint(6) unsigned NOT NULL AUTO_INCREMENT,
`username` varchar(30) NOT NULL,
`password` char(64) CHARACTER SET ascii NOT NULL COMMENT 'SHA256',
`salt` char(32) CHARACTER SET ascii NOT NULL,
`type` smallint(2) NOT NULL,
`boards` text CHARACTER SET utf8 NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id` (`id`,`username`)
) ENGINE=MyISAM AUTO_INCREMENT=933 DEFAULT CHARSET=utf8mb4
I want to join the most recent log entry with the mod's name, however my query is very slow (takes 5.23 seconds):
SELECT *
FROM mods LEFT JOIN
modlogs
ON modlogs.mod = mods.id
AND modlogs.time = (SELECT MAX(ml2.time)
FROM modlogs ml2
WHERE ml2.mod = modlogs.mod
);
All other answers on SO also seem to use dependent subqueries. Is there a way I can do this in a way that will return results more quickly?
Here's another solution, putting the subquery into a derived table avoids the problem of a dependent subquery. It'll run the subquery just once.
SELECT *
FROM mods AS m
LEFT JOIN (
SELECT ml1.*
FROM modlogs AS ml1
JOIN (
SELECT `mod`, MAX(time) AS time
FROM modlogs
GROUP BY `mod`
) AS ml2 USING (`mod`, time)
) AS ml ON m.id = ml.`mod`;
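The GROUP BY in the inner derived table can be satisfied from a compound index instead of a filesort -- the same index suggested in the answer below (index name illustrative):
CREATE INDEX modlogs_mod_time ON modlogs (`mod`, `time`);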
This is your query:
SELECT *
FROM mods LEFT JOIN
modlogs
ON modlogs.mod = (SELECT MAX(time)
FROM modlogs
WHERE mods.id = modlogs.mod
);
This query does not make sense. You are comparing something called mod to a max time. Sounds like it won't work to me, but then there are some very "clever" data models out there. I suspect you really want:
SELECT *
FROM mods LEFT JOIN
modlogs
ON mods.id = modlogs.mod and
modlogs.time = (SELECT MAX(ml2.time)
FROM modlogs ml2
WHERE ml2.mod = modlogs.mod
);
I wouldn't write the query this way, because join conditions in the on clause seem confusing to me. But, you did. You can get better performance with an index. I would suggest:
create index modlogs_mod_time on modlogs(`mod`, time);
I would write the query as:
SELECT *
FROM mods LEFT JOIN
modlogs
ON mods.id = modlogs.mod
WHERE NOT EXISTS (SELECT 1
FROM modlogs ml2
WHERE modlogs.mod = ml2.mod and
ml2.time > modlogs.time
);
I think you can also solve this one with an anti-join, though I'm skeptical of the performance on this one:
SELECT mods.*, modlogs.*
FROM mods
LEFT JOIN modlogs
ON modlogs.mod = mods.id
LEFT JOIN modlogs ml2
ON ml2.mod = modlogs.mod
AND ml2.time > modlogs.time
WHERE ml2.mod IS NULL
Ensure you have an index on modlogs(`mod`), and consider the compound index modlogs(`mod`, time) for better performance.

Eliminating values from one table with another. Super slow

In the same database I have a table messages whose columns id, title, and text I want. I want only the records whose title has no entry in the table lastlogon, where the title equivalent is named username.
I have been using this SQL command in PHP; it generally took 2-3 seconds to return:
SELECT DISTINCT * FROM messages WHERE title NOT IN (SELECT username FROM lastlogon) LIMIT 1000
This was all good until the table lastlogon came to hold about 80% of the values in messages. messages has about 8000 entries, lastlogon about 7000. Now it takes one to two minutes to run, and MySQL shoots up to very high CPU usage.
I tried the following but had no luck reducing the time:
SELECT id,title,text FROM messages a LEFT OUTER JOIN lastlogon b ON (a.title = b.username) LIMIT 1000
Why, all of a sudden, is it taking so long for such a low number of entries? I have tried restarting MySQL and Apache multiple times. I am using Debian Linux.
Edit: Here are the structures
--
-- Table structure for table `lastlogon`
--
CREATE TABLE IF NOT EXISTS `lastlogon` (
`username` varchar(25) NOT NULL,
`lastlogon` date NOT NULL,
`datechecked` date NOT NULL,
PRIMARY KEY (`username`),
KEY `username` (`username`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
-- --------------------------------------------------------
--
-- Table structure for table `messages`
--
CREATE TABLE IF NOT EXISTS `messages` (
`id` smallint(9) unsigned NOT NULL AUTO_INCREMENT,
`title` varchar(255) NOT NULL,
`name` varchar(255) NOT NULL,
`email` varchar(50) NOT NULL,
`text` mediumtext,
`folder` tinyint(2) NOT NULL,
`read` smallint(5) unsigned NOT NULL,
`dateline` int(10) unsigned NOT NULL,
`ip` varchar(15) NOT NULL,
`attachment` varchar(255) NOT NULL,
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`username` varchar(300) NOT NULL,
`error` varchar(500) NOT NULL,
PRIMARY KEY (`id`),
KEY `title` (`title`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=9010 ;
Edit 2
Edited structure with new indexes.
After putting an index on both messages.title and lastlogon.username I came up with these results:
Showing rows 0 - 29 (623 total, Query took 74.4938 sec)
First: replace the key on title with a compound key on title + id:
ALTER TABLE messages DROP INDEX title;
ALTER TABLE messages ADD INDEX title (title, id);
Now change the select to:
SELECT m.* FROM messages m
LEFT JOIN lastlogon l ON (l.username = m.title)
WHERE l.username IS NULL
-- GROUP BY m.id DESC -- faster replacement for distinct. I don't think you need this.
LIMIT 1000;
Or
SELECT m.* FROM messages m
WHERE m.title NOT IN (SELECT l.username FROM lastlogon l)
-- GROUP BY m.id DESC -- faster than distinct, I don't think you need it though.
LIMIT 1000;
Another problem with the slowness is the SELECT m.* part.
By selecting all columns, you are forcing MySQL to do extra work.
Only select the columns you need:
SELECT m.title, m.name, m.email, ......
This will speed up the query as well.
There's another trick you can use:
Replace the limit 1000 with a cutoff date.
Step 1: Add an index on timestamp (or whatever field you want to use for the cutoff).
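For example (index name illustrative):
ALTER TABLE messages ADD INDEX msg_timestamp (`timestamp`);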
SELECT m.* FROM messages m
LEFT JOIN lastlogon l ON (l.username = m.title)
WHERE (m.id > (SELECT MIN(M2.ID) FROM messages m2 WHERE m2.timestamp >= '2011-09-01'))
AND l.username IS NULL
-- GROUP BY m.id DESC -- faster replacement for distinct. I don't think you need this.
I suggest you add an index on messages.title. Then run the query again and test the performance.