Optimize MySQL UPDATE query that contains WHERE and ORDER BY? - mysql

How can I optimize this query? If I run it without the ORDER BY clause, it executes in <100ms. With the ORDER BY clause it takes many seconds, and crushes the server when more than one system is trying to make this query at once.
UPDATE companies
SET
crawling = 1
WHERE
crawling = 0
AND url_host IS NOT NULL
ORDER BY
last_crawled ASC
LIMIT 1;
If I run this query as a SELECT, it's also fast ( <100ms ).
SELECT id
FROM companies
WHERE
crawling = 0
AND url_host IS NOT NULL
ORDER BY
last_crawled ASC
LIMIT 1;
Here is my table schema:
CREATE TABLE `companies` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`url` varchar(255) DEFAULT NULL,
`url_scheme` varchar(10) DEFAULT NULL,
`url_host` varchar(255) DEFAULT NULL,
`name` varchar(255) DEFAULT NULL,
`crawl` tinyint(1) unsigned NOT NULL DEFAULT '1',
`crawling` tinyint(1) unsigned NOT NULL DEFAULT '0',
`last_crawled` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `name` (`name`),
KEY `url_host` (`url_host`),
KEY `crawl` (`crawl`),
KEY `crawling` (`crawling`),
KEY `last_crawled` (`last_crawled`),
KEY `url_scheme` (`url_scheme`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
UPDATE ONE
This query gives me the following error: You can't specify target table 'companies' for update in FROM clause
UPDATE companies
SET crawling = 1
WHERE id = (
SELECT id
FROM companies
WHERE
crawling = 0
AND url_host IS NOT NULL
ORDER BY
last_crawled ASC
LIMIT 1
);
This query gives me the following error: This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'
UPDATE companies
SET crawling = 1
WHERE id in (
SELECT id
FROM companies
WHERE
crawling = 0
AND url_host IS NOT NULL
ORDER BY
last_crawled ASC
LIMIT 1
);

try not to use ORDER-BY and LIMIT for such small number of updates.
UPDATE companies t1
join
(
SELECT c.id,#RowNum:=#RowNum+1 AS RowID
FROM companies c, (SELECT #RowNum := 0)r
WHERE c.crawling = 0 AND c.url_host IS NOT NULL
ORDER BY c.last_crawled ASC
)t2
ON t2.RowID=1 AND t1.id=t2.id
SET t1.crawling = 1
EDIT:1
make sure you have the index on (last_crawled ASC , id ASC)
UPDATE companies t1
join
(
Select ID,RowID
From
(
SELECT c.id,#RowNum:=#RowNum+1 AS RowID
FROM companies c, (SELECT #RowNum := 0)r
WHERE c.crawling = 0 AND c.url_host IS NOT NULL
ORDER BY c.last_crawled ASC
)t2
WHERE ROWID=1
)t3
ON t1.id=t3.id
SET t1.crawling = 1

Related

How to get NULL at the top of the sort?

Two tables are defined:
CREATE TABLE `users` (
`user_id` mediumint(6) unsigned NOT NULL AUTO_INCREMENT,
`score` tinyint(1) unsigned DEFAULT NULL,
PRIMARY KEY (`user_id`)
);
CREATE TABLE `online` (
`user_id` mediumint(6) unsigned NOT NULL AUTO_INCREMENT,
`url` varchar(255) NOT NULL,
PRIMARY KEY (`user_id`)
);
How to combine the tables so that the result would be sorted by the score field from the largest to the smallest but at the top there were records with the value NULL?
This query does not sort the second sample:
(SELECT * FROM `online` JOIN `users` USING(`user_id`) WHERE `score` IS NULL)
UNION
(SELECT * FROM `online` JOIN `users` USING(`user_id`) WHERE `score` IS NOT NULL ORDER BY `score` DESC)
Use two keys in the sort:
SELECT *
FROM `online` o JOIN
`users`
USING (user_id)
ORDER BY (`score` IS NULL) DESC, Score DESC;
MySQL treats booleans as numbers in a numeric context, with "1" for true and "0" for false. So, DESC puts the true values first.
Incidentally, your version would look like it works if you used UNION ALL rather than UNION. However, it is not guaranteed that the results are in any particular order unless you explicitly have an ORDER BY.
The UNION incurs overhead for removing duplicates and in doing so rearranges the data.
Try:
select * from online join users using (user_id) order by ifnull(score, 10) desc;
You can use order by Nulls Last in the end of your sql to show nulls on the first.
You can try below -
select * from
(
SELECT *,1 as ord FROM `online` JOIN `users` USING(`user_id`) WHERE `score` IS NULL
UNION
SELECT *,2 FROM `online` JOIN `users` USING(`user_id`) WHERE `score` IS NOT NULL
)A ORDER BY ord asc,`score` DESC

Mysql query with multiple selects results in high CPU load

I'm trying to do a link exchange script and run into a bit of trouble.
Each link can be visited by an IP address a number of x times (frequency in links table). Each visit costs a number of credits (spend limit given in limit in links table)
I've got the following tables:
CREATE TABLE IF NOT EXISTS `contor` (
`key` varchar(25) NOT NULL,
`uniqueHandler` varchar(30) DEFAULT NULL,
`uniqueLink` varchar(30) DEFAULT NULL,
`uniqueUser` varchar(30) DEFAULT NULL,
`owner` varchar(50) NOT NULL,
`ip` varchar(15) DEFAULT NULL,
`credits` float NOT NULL,
`tstamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`key`),
KEY `uniqueLink` (`uniqueLink`),
KEY `uniqueHandler` (`uniqueHandler`),
KEY `uniqueUser` (`uniqueUser`),
KEY `owner` (`owner`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `links` (
`unique` varchar(30) NOT NULL DEFAULT '',
`url` varchar(1000) DEFAULT NULL,
`frequency` varchar(5) DEFAULT NULL,
`limit` float NOT NULL DEFAULT '0',
PRIMARY KEY (`unique`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
I've got the following query:
$link = MYSQL_QUERY("
SELECT *
FROM `links`
WHERE (SELECT count(key) FROM contor WHERE ip = '$ip' AND contor.uniqueLink = links.unique) <= `frequency`
AND (SELECT sum(credits) as cost FROM contor WHERE contor.uniqueLink = links.unique) <= `limit`")
There are 20 rows in the table links.
The problem is that whenever there are about 200k rows in the table contor the CPU load is huge.
After applying the solution provided by #Barmar:
Added composite index on (uniqueLink, ip) and droping all other indexes except PRIMARY, EXPLAIN gives me this:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY l ALL NULL NULL NULL NULL 18
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 15
2 DERIVED pop_contor index NULL contor_IX1 141 NULL 206122
Try using a join rather than a correlated subquery.
SELECT l.*
FROM links AS l
LEFT JOIN (
SELECT uniqueLink, SUM(ip = '$ip') AS ip_visits, SUM(credits) AS total_credits
FROM contor
GROUP BY uniqueLink
) AS c
ON c.uniqueLink = l.unique AND ip_visits <= frequency AND total_credits <= limit
If this doesn't help, try adding an index on contor.ip.
The current query is of the form:
SELECT l.*
FROM `links` l
WHERE l.frequency >= ( SELECT COUNT(ck.key)
FROM contor ck
WHERE ck.uniqueLink = l.unique
AND ck.ip = '$ip'
)
AND l.limit >= ( SELECT SUM(sc.credits)
FROM contor sc
WHERE sc.uniqueLink = l.unique
)
Those correlated subqueries are going to each your lunch. And your lunchbox too.
I'd suggest testing an inline view that performs both of the aggregations from contor in one pass, and then join the result from that to the links table.
Something like this:
SELECT l.*
FROM ( SELECT c.uniqueLink
, SUM(c.ip = '$ip' AND c.key IS NOT NULL) AS count_key
, SUM(c.credits) AS sum_credits
FROM `contor` c
GROUP
BY c.uniqueLink
) d
JOIN `links` l
ON l.unique = d.uniqueLink
AND l.frequency >= d.count_key
AND l.limit >= d.sum_credits
For optimal performance of the aggregation inline view query, provide a covering index that MySQL can use to optimize the GROUP BY (avoiding a Using filesort operation)
CREATE INDEX `contor_IX1` ON `contor` (`uniqueLink`, `credits`, `ip`) ;
Adding that index renders the uniqueLink index redundant, so also...
DROP INDEX `uniqueLink` ON `contor` ;
EDIT
Since we have a guarantee that contor.key column is non-NULL (i.e. the NOT NULL constraint), this part of the query above is unneeded AND c.key IS NOT NULL, and can be removed. (I also removed the key column from the covering index definition above.)
SELECT l.*
FROM ( SELECT c.uniqueLink
, SUM(c.ip = '$ip') AS count_key
, SUM(c.credits) AS sum_credits
FROM `contor` c
GROUP
BY c.uniqueLink
) d
JOIN `links` l
ON l.unique = d.uniqueLink
AND l.frequency >= d.count_key
AND l.limit >= d.sum_credits

Sorting result of mysql join by avg of third table?

I have three tables.
One table contains submissions which has about 75,000 rows
One table contains submission ratings and only has < 10 rows
One table contains submission => competition mappings and for my test data also has about 75,000 rows.
What I want to do is
Get the top 50 submissions in a round of a competition.
Top is classified as highest average rating, followed by highest amount of votes
Here is the query I am using which works, but the problem is that it takes over 45 seconds to complete! I profiled the query (results at bottom) and the bottlenecks are copying the data to a tmp table and then sorting it so how can I speed this up?
SELECT `submission_submissions`.*
FROM `submission_submissions`
JOIN `competition_submissions`
ON `competition_submissions`.`submission_id` = `submission_submissions`.`id`
LEFT JOIN `submission_ratings`
ON `submission_submissions`.`id` = `submission_ratings`.`submission_id`
WHERE `top_round` = 1
AND `competition_id` = '2'
AND `submission_submissions`.`date_deleted` IS NULL
GROUP BY submission_submissions.id
ORDER BY AVG(submission_ratings.`stars`) DESC,
COUNT(submission_ratings.`id`) DESC
LIMIT 50
submission_submissions
CREATE TABLE `submission_submissions` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`account_id` int(11) NOT NULL,
`title` varchar(255) NOT NULL,
`description` varchar(255) DEFAULT NULL,
`genre` int(11) NOT NULL,
`goals` text,
`submission` text NOT NULL,
`date_created` datetime DEFAULT NULL,
`date_modified` datetime DEFAULT NULL,
`date_deleted` datetime DEFAULT NULL,
`cover_image` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `genre` (`genre`),
KEY `account_id` (`account_id`),
KEY `date_created` (`date_created`)
) ENGINE=InnoDB AUTO_INCREMENT=115037 DEFAULT CHARSET=latin1;
submission_ratings
CREATE TABLE `submission_ratings` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`account_id` int(11) NOT NULL,
`submission_id` int(11) NOT NULL,
`stars` tinyint(1) NOT NULL,
`date_created` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `submission_id` (`submission_id`),
KEY `account_id` (`account_id`),
KEY `stars` (`stars`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=latin1;
competition_submissions
CREATE TABLE `competition_submissions` (
`competition_id` int(11) NOT NULL,
`submission_id` int(11) NOT NULL,
`top_round` int(11) DEFAULT '1',
PRIMARY KEY (`submission_id`),
KEY `competition_id` (`competition_id`),
KEY `top_round` (`top_round`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
SHOW PROFILE Result (ordered by duration)
state duration (summed) in sec percentage
Copying to tmp table 33.15621 68.46924
Sorting result 11.83148 24.43260
removing tmp table 3.06054 6.32017
Sending data 0.37560 0.77563
... insignificant amounts removed ...
Total 48.42497 100.00000
EXPLAIN
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE competition_submissions index_merge PRIMARY,competition_id,top_round competition_id,top_round 4,5 18596 Using intersect(competition_id,top_round); Using where; Using index; Using temporary; Using filesort
1 SIMPLE submission_submissions eq_ref PRIMARY PRIMARY 4 inkstakes.competition_submissions.submission_id 1 Using where
1 SIMPLE submission_ratings ALL submission_id 5 Using where; Using join buffer (flat, BNL join)
Assuming that in reality you won't be interested in unrated submissions, and that a given submission only has a single competition_submissions entry for a given match and top_round, I suggest:
SELECT s.*
FROM (SELECT `submission_id`,
AVG(`stars`) AvgStars,
COUNT(`id`) CountId
FROM `submission_ratings`
GROUP BY `submission_id`
ORDER BY AVG(`stars`) DESC, COUNT(`id`) DESC
LIMIT 50) r
JOIN `submission_submissions` s
ON r.`submission_id` = s.`id` AND
s.`date_deleted` IS NULL
JOIN `competition_submissions` c
ON c.`submission_id` = s.`id` AND
c.`top_round` = 1 AND
c.`competition_id` = '2'
ORDER BY r.AvgStars DESC,
r.CountId DESC
(If there is more than one competition_submissions entry per submission for a given match and top_round, then you can add the GROUP BY clause back in to the main query.)
If you do want to see unrated submissions, you can union the results of this query to a LEFT JOIN ... WHERE NULL query.
There is a simple trick that works on MySql and helps to avoid copying/sorting huge temp tables in queries like this (with LIMIT X).
Just avoid SELECT *, this copies all columns to the temporary table, then this huge table is sorted, and in the end, the query takes only 50 records from this huge table ( 50 / 70000 = 0,07 % ).
Select only columns that are really necessary to perform sort and limit, and then join missing columns only for selected 50 records by id.
select ss.*
from submission_submissions ss
join (
SELECT `submission_submissions`.id,
AVG(submission_ratings.`stars`) stars,
COUNT(submission_ratings.`id`) cnt
FROM `submission_submissions`
JOIN `competition_submissions`
ON `competition_submissions`.`submission_id` = `submission_submissions`.`id`
LEFT JOIN `submission_ratings`
ON `submission_submissions`.`id` = `submission_ratings`.`submission_id`
WHERE `top_round` = 1
AND `competition_id` = '2'
AND `submission_submissions`.`date_deleted` IS NULL
GROUP BY submission_submissions.id
ORDER BY AVG(submission_ratings.`stars`) DESC,
COUNT(submission_ratings.`id`) DESC
LIMIT 50
) xx
ON ss.id = xx.id
ORDER BY xx.stars DESC,
xx.cnt DESC;

mysql query with multiple groups?

I have a query that gets product IDs based on keywords.
SELECT indx_search.pid
FROM indx_search
LEFT JOIN word_index_mem ON (word_index_mem.word = indx_search.word)
WHERE indx_search.word = "phone"
GROUP BY indx_search.pid
ORDER BY indx_search.pid ASC
LIMIT 0,20
This works well but now I'm trying to go a step further and implement "price range" into this query.
CREATE TABLE `price_range` (
`pid` int(11) NOT NULL,
`range_id` tinyint(3) NOT NULL,
PRIMARY KEY (`pid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
This table simply contains product IDs and a range_id. The price range values are stored here:
CREATE TABLE `price_range_values` (
`ID` tinyint(3) NOT NULL AUTO_INCREMENT,
`rangeFrom` float(10,2) NOT NULL,
`rangeTo` float(10,2) NOT NULL,
PRIMARY KEY (`ID`)
) ENGINE=MyISAM AUTO_INCREMENT=32 DEFAULT CHARSET=latin1
I want to GROUP price_range.range_id with COUNT() of how many products match the certain price range within the current query. I still want to receive my 20 results of product IDs.
So something along the lines of:
SELECT indx_search.pid, price_range.range_id as PriceRangeID, COUNT(price_range.range_id) as PriceGroupTotal
FROM indx_search
LEFT JOIN windex_mem ON ( windex_mem.word = indx_search.word )
LEFT JOIN price_range ON ( price_range.pid = indx_search.pid )
WHERE indx_search.word = "memory"
GROUP BY indx_search.pid, PriceRangeID
ORDER BY indx_search.pid ASC
LIMIT 0 , 20
Is this possible to accomplish without busting an additional query?
Try to use a subquery with a LIMIT clause, e.g. -
SELECT
i_s.pid, p_r.range_id as PriceRangeID, COUNT(p_r.range_id) as PriceGroupTotal
FROM
(SELECT * FROM indx_search WHERE i_s.word = 'memory' ORDER BY i_s.pid ASC LIMIT 0 , 20) i_s
LEFT JOIN
windex_mem w_m ON w_m.word = i_s.word
LEFT JOIN
price_range p_r ON p_r.pid = i_s.pid
GROUP BY
i_s.pid, PriceRangeID

How to delete the records from a table after a limit

I have a table like this:
CREATE TABLE vhist ( id int(10)
unsigned NOT NULL auto_increment,
userId varchar(45) NOT NULL,
mktCode int(10) unsigned NOT NULL,
insertDate datetime NOT NULL,
default NULL, PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
A user can have more than one record.
I need an SQL statement which will keep the most recent 50 records and delete any thing after that limit.
I need that in a single sql statement.
I tried this but failed
delete from vhist v where v.id not in
(select v.id from vhist v where
v.userId=12 order by insertDate desc
limit 50)
but this failed on MYSQL saying IN cannot be used with a limit.
Any help?
You need a subquery, like this:
DELETE FROM vhist WHERE id NOT IN (
SELECT id FROM (
SELECT id FROM vhist WHERE userId = 12 ORDER BY insertDate DESC LIMIT 50
) as foo
);