How to optimize MySQL GROUP BY?

table:
CREATE TABLE IF NOT EXISTS `l_not_200_page` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`server` tinyint(3) unsigned NOT NULL,
`domain` tinyint(3) unsigned NOT NULL,
`page` varchar(128) NOT NULL,
`query_string` varchar(384) NOT NULL,
`status` smallint(5) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `idx_time_domain_status_page` (`time`,`domain`,`status`,`page`),
KEY `page` (`page`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
explain:
EXPLAIN SELECT *
FROM `l_not_200_page`
WHERE `time` BETWEEN TIMESTAMP('2014-03-25') AND TIMESTAMP('2014-03-25 23:59:59')
AND `domain` = 1
AND `status` = 404
GROUP BY `page`
id  select_type  table           type   possible_keys                key                          key_len  ref   rows  Extra
1   SIMPLE       l_not_200_page  range  idx_time_domain_status_page  idx_time_domain_status_page  7        NULL  1     Using where; Using temporary; Using filesort
It's very slow. How can I optimize it?
sql:
SELECT `page`, COUNT(*) AS cnt
FROM l_not_200_page
WHERE `time` BETWEEN TIMESTAMP('2014-03-26 12:00:00') AND TIMESTAMP('2014-03-26 12:30:00')
AND `domain` = 1
AND `status` = 499
GROUP BY `page`
ORDER BY cnt DESC
LIMIT 100
The daily amount of data is about 9 million (900w) rows.

Change the index to:
create index `idx_domain_status_time_page` on l_not_200_page(`domain`, `status`, `time`, `page`)
When MySQL uses an index for a WHERE clause, the best index has all the columns compared by equality first, followed by one column compared by an inequality such as BETWEEN. With time as the first key part, the index cannot be used for direct lookups on domain and status (well, it uses an index range scan over the whole time window instead of a direct lookup).
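As a sketch (assuming the new index is in place), here is how the slow query's predicates line up against it:
-- idx_domain_status_time_page (domain, status, time, page)
SELECT `page`, COUNT(*) AS cnt
FROM l_not_200_page
WHERE `domain` = 1     -- equality on 1st key part: direct seek
AND `status` = 499     -- equality on 2nd key part: direct seek
AND `time` BETWEEN TIMESTAMP('2014-03-26 12:00:00')
               AND TIMESTAMP('2014-03-26 12:30:00')  -- range on 3rd key part
GROUP BY `page`        -- page (4th key part) makes the index covering for this
ORDER BY cnt DESC      -- query, so no table rows need to be read; the GROUP BY
LIMIT 100;             -- still needs a temp table because of the range on time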
For further optimization, you can get rid of the group by by choosing one row per page:
SELECT lp.*
FROM l_not_200_page lp
WHERE lp.`time` BETWEEN TIMESTAMP('2014-03-25') AND TIMESTAMP('2014-03-25 23:59:59') AND
      lp.domain = 1 AND lp.status = 404 AND
      NOT EXISTS (select 1
                  from l_not_200_page lp2
                  where lp2.page = lp.page and
                        lp2.domain = 1 and lp2.status = 404 and
                        lp2.`time` BETWEEN TIMESTAMP('2014-03-25') AND TIMESTAMP('2014-03-25 23:59:59') and
                        lp2.id > lp.id
                 )
For this, an additional index on (page, domain, status, time) would help.
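A sketch of that index (the name is illustrative):
create index `idx_page_domain_status_time` on l_not_200_page(`page`, `domain`, `status`, `time`)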


SQL Views for single row (with where clause)

I'm working on a system that administrates courses, with multiple classes, multiple lessons, and multiple consumers on them. As the system grew, more data was required, so after some performance issues I decided to go with SQL views. We're using MySQL.
So I've replaced the old calls to the DB (for example, for a single lesson):
select * from `courses_classes_lessons` where `courses_classes_lessons`.`deleted_at` is null limit 1;
select count(consumer_id) as consumers_count from `courses_classes_lessons_consumers` where `lesson_id` = '448' limit 1;
select `max_consumers` from `courses_classes` where `id` = '65' limit 1;
select `id` from `courses_classes_lessons` left join `courses_classes_lessons_consumers` on `courses_classes_lessons_consumers`.`lesson_id` = `courses_classes_lessons`.`id` where `id` = '448' group by `courses_classes_lessons`.`id` having count(courses_classes_lessons_consumers.consumer_id) < '4' limit 1;
select courses_classes.max_consumers - LEAST(count(courses_classes_lessons_consumers.consumer_id), courses_classes.max_consumers) as available_spaces from `courses_classes_lessons` left join `courses_classes_lessons_consumers` on `courses_classes_lessons_consumers`.`lesson_id` = `courses_classes_lessons`.`id` left join `courses_classes` on `courses_classes_lessons`.`class_id` = `courses_classes`.`id` where `courses_classes_lessons`.`id` = '448' group by `courses_classes`.`id` limit 1;
The above took around 4-5ms
with the SQL view as follows:
CREATE OR REPLACE VIEW `courses_classes_lessons_view` AS
SELECT
courses_classes_lessons.id AS lesson_id,
(SELECT
max_consumers
FROM
courses_classes
WHERE
id = courses_classes_lessons.class_id
LIMIT 1) AS class_max_consumers,
(SELECT
count(consumer_id)
FROM
courses_classes_lessons_consumers
WHERE
lesson_id = courses_classes_lessons.id) AS consumers_count,
(SELECT
CASE WHEN consumers_count >= class_max_consumers THEN
TRUE
ELSE
FALSE
END AS is_full) AS is_full,
(CASE WHEN courses_classes_lessons.completed_at > NOW() THEN
'completed'
WHEN courses_classes_lessons.cancelled_at > NOW() THEN
'cancelled'
WHEN courses_classes_lessons.starts_at > NOW() THEN
'upcoming'
ELSE
'incomplete'
END) AS status,
(SELECT
class_max_consumers - LEAST(consumers_count, class_max_consumers)) AS available_spaces
FROM
courses_classes_lessons
The problem I'm having is that it doesn't matter whether I'm loading the whole view or a single row from it - it always takes about 6-9s to load! But when I tried the same query directly with a WHERE clause, it took about 500μs. I'm new to SQL views and confused - why are there no indexes/primary keys that I could use to load a single row quickly? Am I doing something wrong?
EXPLAIN RESULT
id  select_type         table                              partitions  type    possible_keys                                               key                                                 key_len  ref                                   rows    filtered  Extra
1   PRIMARY             courses_classes_lessons            NULL        ALL     NULL                                                        NULL                                                NULL     NULL                                  478832  100.00    NULL
3   DEPENDENT SUBQUERY  courses_classes_lessons_consumers  NULL        ref     PRIMARY,courses_classes_lessons_consumers_lesson_id_index  courses_classes_lessons_consumers_lesson_id_index   4        api.courses_classes_lessons.id        3       100.00    Using index
2   DEPENDENT SUBQUERY  courses_classes                    NULL        eq_ref  PRIMARY,courses_classes_id_parent_id_index                  PRIMARY                                             4        api.courses_classes_lessons.class_id  1       100.00    NULL
TABLE STRUCTURE
Lessons
CREATE TABLE `courses_classes_lessons` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`franchisee_id` int(10) unsigned NOT NULL,
`class_id` int(10) unsigned NOT NULL,
`instructor_id` int(10) unsigned NOT NULL,
`instructor_rate` int(10) unsigned NOT NULL DEFAULT '0',
`instructor_total` int(10) unsigned NOT NULL DEFAULT '0',
`instructor_paid` tinyint(1) NOT NULL DEFAULT '0',
`starts_at` timestamp NULL DEFAULT NULL,
`ends_at` timestamp NULL DEFAULT NULL,
`completed_at` timestamp NULL DEFAULT NULL,
`cancelled_at` timestamp NULL DEFAULT NULL,
`cancelled_reason` varchar(255) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
`cancelled_reason_extra` varchar(255) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
`deleted_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `courses_classes_lessons_franchisee_id_foreign` (`franchisee_id`),
KEY `courses_classes_lessons_class_id_foreign` (`class_id`),
KEY `courses_classes_lessons_instructor_id_foreign` (`instructor_id`),
KEY `courses_classes_lessons_starts_at_ends_at_index` (`starts_at`,`ends_at`),
KEY `courses_classes_lessons_completed_at_index` (`completed_at`),
KEY `courses_classes_lessons_cancelled_at_index` (`cancelled_at`),
KEY `courses_classes_lessons_class_id_deleted_at_index` (`class_id`,`deleted_at`),
KEY `courses_classes_lessons_deleted_at_index` (`deleted_at`),
KEY `class_ownership_index` (`class_id`,`starts_at`,`cancelled_at`,`deleted_at`),
CONSTRAINT `courses_classes_lessons_class_id_foreign` FOREIGN KEY (`class_id`) REFERENCES `courses_classes` (`id`) ON DELETE CASCADE,
CONSTRAINT `courses_classes_lessons_franchisee_id_foreign` FOREIGN KEY (`franchisee_id`) REFERENCES `franchisees` (`id`) ON DELETE CASCADE,
CONSTRAINT `courses_classes_lessons_instructor_id_foreign` FOREIGN KEY (`instructor_id`) REFERENCES `instructors` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=487853 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Lessons consumers
CREATE TABLE `courses_classes_lessons_consumers` (
`lesson_id` int(10) unsigned NOT NULL,
`consumer_id` int(10) unsigned NOT NULL,
`present` tinyint(1) DEFAULT NULL,
`plan_id` int(10) unsigned DEFAULT NULL,
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`lesson_id`,`consumer_id`),
KEY `courses_classes_lessons_consumers_consumer_id_foreign` (`consumer_id`),
KEY `courses_classes_lessons_consumers_plan_id_foreign` (`plan_id`),
KEY `courses_classes_lessons_consumers_lesson_id_index` (`lesson_id`),
KEY `courses_classes_lessons_consumers_present_index` (`present`),
CONSTRAINT `courses_classes_lessons_consumers_consumer_id_foreign` FOREIGN KEY (`consumer_id`) REFERENCES `customers_consumers` (`id`) ON DELETE CASCADE,
CONSTRAINT `courses_classes_lessons_consumers_lesson_id_foreign` FOREIGN KEY (`lesson_id`) REFERENCES `courses_classes_lessons` (`id`) ON DELETE CASCADE,
CONSTRAINT `courses_classes_lessons_consumers_plan_id_foreign` FOREIGN KEY (`plan_id`) REFERENCES `customers_plans` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
From the classes table it only uses max_consumers int(10) unsigned NOT NULL DEFAULT '0'.
UPDATE 1
I've changed SQL View to the following one:
CREATE OR REPLACE VIEW `courses_classes_lessons_view` AS
SELECT
courses_classes_lessons.id AS lesson_id,
courses_classes.max_consumers AS class_max_consumers,
lessons_consumers.consumers_count AS consumers_count,
(
SELECT
CASE WHEN consumers_count >= class_max_consumers THEN
TRUE
ELSE
FALSE
END AS is_full) AS is_full,
(
CASE WHEN courses_classes_lessons.completed_at > NOW() THEN
'completed'
WHEN courses_classes_lessons.cancelled_at > NOW() THEN
'cancelled'
WHEN courses_classes_lessons.starts_at > NOW() THEN
'upcoming'
ELSE
'incomplete'
END) AS status,
(
SELECT
class_max_consumers - LEAST(consumers_count, class_max_consumers)) AS available_spaces
FROM
courses_classes_lessons
JOIN courses_classes ON courses_classes.id = courses_classes_lessons.class_id
JOIN (
SELECT
lesson_id,
count(*) AS consumers_count
FROM
courses_classes_lessons_consumers
GROUP BY
courses_classes_lessons_consumers.lesson_id) AS lessons_consumers ON lessons_consumers.lesson_id = courses_classes_lessons.id;
and even though the SELECT query by itself seems to be way slower than the previous one, as a view it performs way better. It's still not as fast as I wish it to be, but it's a step forward.
Overall, the time drops from 6-7s to around 800ms; the aim here is in the area of 500μs-1ms. Any advice on how I can improve my SQL view further?
UPDATE 2
Ok, I've found the bottleneck! Again, it's kinda similar to the last one (the SELECT query works fast for a single row, but the SQL view tries to access the whole table at once every time).
My new lesson SQL VIEW:
CREATE OR REPLACE VIEW `courses_classes_lessons_view` AS
SELECT
courses_classes_lessons.id AS lesson_id,
courses_classes.max_consumers AS class_max_consumers,
IFNULL(lessons_consumers.consumers_count,0) AS consumers_count,
(
SELECT
CASE WHEN consumers_count >= class_max_consumers THEN
TRUE
ELSE
FALSE
END AS is_full) AS is_full,
(
CASE WHEN courses_classes_lessons.completed_at > NOW() THEN
'completed'
WHEN courses_classes_lessons.cancelled_at > NOW() THEN
'cancelled'
WHEN courses_classes_lessons.starts_at > NOW() THEN
'upcoming'
ELSE
'incomplete'
END) AS status,
(
SELECT
IFNULL(class_max_consumers, 0) - LEAST(IFNULL(consumers_count,0), class_max_consumers)) AS available_spaces
FROM
courses_classes_lessons
JOIN courses_classes ON courses_classes.id = courses_classes_lessons.class_id
LEFT JOIN courses_classes_lessons_consumers_view AS lessons_consumers ON lessons_consumers.lesson_id = courses_classes_lessons.id;
Another SQL View - this time for consumers:
CREATE OR REPLACE VIEW `courses_classes_lessons_consumers_view` AS
SELECT
lesson_id,
IFNULL(count(
consumer_id),0) AS consumers_count
FROM
courses_classes_lessons_consumers
GROUP BY
courses_classes_lessons_consumers.lesson_id;
And it looks like this one is the troublemaker! The consumers table is above, and here is the EXPLAIN for the above SELECT query:
id  select_type  table                              partitions  type   possible_keys                                                                                                                                                                            key                                                 key_len  ref   rows     filtered  Extra
1   SIMPLE       courses_classes_lessons_consumers  NULL        index  PRIMARY,courses_classes_lessons_consumers_consumer_id_foreign,courses_classes_lessons_consumers_plan_id_foreign,courses_classes_lessons_consumers_lesson_id_index,courses_classes_lessons_consumers_present_index  courses_classes_lessons_consumers_lesson_id_index  4        NULL  1330649  100.00    Using index
Any idea how to speed up this count?
Consider writing a stored procedure; it may be able to get the 448 put into place so the query can be better optimized.
If you know there will be only one row (such as when doing COUNT(*)), skip the LIMIT 1.
Unless consumer_id might be NULL, use COUNT(*) instead of COUNT(consumer_id); see the sketch after this list.
A LIMIT without an ORDER BY leaves you getting a random row.
If courses_classes_lessons_consumers is a many-to-many mapping table, I will probably have some index advice after I see SHOW CREATE TABLE.
Which of the 5 SELECTs is the slowest?
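As a sketch, the COUNT(*) suggestion applied to the consumers view (safe here because consumer_id is part of the primary key and therefore NOT NULL; COUNT with GROUP BY never returns NULL, so the IFNULL wrapper can go too):
CREATE OR REPLACE VIEW `courses_classes_lessons_consumers_view` AS
SELECT
lesson_id,
COUNT(*) AS consumers_count
FROM
courses_classes_lessons_consumers
GROUP BY
courses_classes_lessons_consumers.lesson_id;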
After many attempts, it looks like the stored procedure is the best approach, and I won't be spending more time on the SQL views.
Here's the procedure I wrote:
CREATE PROCEDURE `LessonData`(
IN lessonId INT(10)
)
BEGIN
SELECT
courses_classes_lessons.id AS lesson_id,
courses_classes.max_consumers AS class_max_consumers,
IFNULL((SELECT
count(consumer_id) as consumers_count
FROM
courses_classes_lessons_consumers
WHERE
lesson_id = courses_classes_lessons.id
GROUP BY
courses_classes_lessons_consumers.lesson_id), 0) AS consumers_count,
(
SELECT
CASE WHEN consumers_count >= class_max_consumers THEN
TRUE
ELSE
FALSE
END) AS is_full,
(
CASE WHEN courses_classes_lessons.completed_at > NOW() THEN
'completed'
WHEN courses_classes_lessons.cancelled_at > NOW() THEN
'cancelled'
WHEN courses_classes_lessons.starts_at > NOW() THEN
'upcoming'
ELSE
'incomplete'
END) AS status,
(
SELECT
class_max_consumers - LEAST(consumers_count, class_max_consumers)) AS available_spaces
FROM
courses_classes_lessons
JOIN courses_classes ON courses_classes.id = courses_classes_lessons.class_id
WHERE courses_classes_lessons.id = lessonId;
END
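It is called like this (448 being the lesson id used in the earlier queries):
CALL LessonData(448);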
And the execution time for it is around 500μs-1ms.
Thank you all for your help!

select taking 8 seconds - ideas to improve it?

I have this select to get chat messages (like a facebook inbox).
It shows the most recent messages, grouped by the user who sent them.
SELECT c.id, c.from, c.to, c.sent, c.message, c.recd FROM chat c
WHERE c.id IN(
SELECT MAX(id) FROM chat
WHERE (`to` = 1 and `del_to_status` = '0') or (`from` = 1 and `del_from_status` = '0')
GROUP BY CASE WHEN 1 = `to` THEN `from` ELSE `to` END
)
ORDER BY id DESC
limit 60
The problem is that it is taking about 8 seconds.
`chat` (
`id` int(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`from` int(11) UNSIGNED NOT NULL,
`to` int(11) UNSIGNED NOT NULL,
`message` text NOT NULL,
`sent` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`recd` tinyint(1) NOT NULL DEFAULT '0',
`del_from_status` tinyint(1) NOT NULL DEFAULT '0',
`del_to_status` tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `from` (`from`),
KEY `to` (`to`),
FOREIGN KEY (`from`) REFERENCES cadastro (`id`),
FOREIGN KEY (`to`) REFERENCES cadastro (`id`)
)
Any ideas on indexing or rewriting this select to get better speed?
I am assuming chat.id is indexed. If not, of course you should add an index.
If it is indexed: MySQL is often very slow with subselects.
One thing you can do is convert your subselect to a temporary table and join with it.
It will look something like
CREATE TEMPORARY TABLE IF NOT EXISTS max_chat_ids
( INDEX(id) )
ENGINE=MEMORY
AS ( SELECT MAX(id) AS id FROM chat
     WHERE (`to` = 1 AND `del_to_status` = '0') OR (`from` = 1 AND `del_from_status` = '0')
     GROUP BY CASE WHEN 1 = `to` THEN `from` ELSE `to` END );
Then you just need to join with the temp table:
SELECT c.id, c.from, c.to, c.sent, c.message, c.recd FROM chat c
join max_chat_ids d on c.id=d.id
ORDER BY c.id DESC
limit 60
Temp tables only live for the duration of the session, so if you test this in phpMyAdmin, remember to execute both queries together with ';' between them.
If you try this, share your result.
I'll assume the column id is already indexed since it probably is the primary key of the table. If it's not the case, add the index:
create index ix1_chat on chat (id);
Then, if the selectivity of the subquery is good, an index will help. The selectivity is the percentage of rows the SELECT reads compared to the total number of rows. Is it 50%, 5%, 0.5%? If it's 5% or less, the following index will help:
create index ix2_chat on chat (`to`, del_to_status, `from`, del_from_status);
As a side note, please don't use reserved words for column names: I'm talking about the from column. It just makes life difficult for everyone.
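For illustration, on MySQL 8.0+ such columns can be renamed like this (the new names are just a suggestion; with the foreign keys in place you may need to drop and re-add the constraints first):
ALTER TABLE chat RENAME COLUMN `from` TO from_id;
ALTER TABLE chat RENAME COLUMN `to` TO to_id;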

Mysql query with multiple selects results in high CPU load

I'm trying to do a link exchange script and ran into a bit of trouble.
Each link can be visited by an IP address up to x times (frequency in the links table). Each visit costs a number of credits (the spend limit is given in limit in the links table).
I've got the following tables:
CREATE TABLE IF NOT EXISTS `contor` (
`key` varchar(25) NOT NULL,
`uniqueHandler` varchar(30) DEFAULT NULL,
`uniqueLink` varchar(30) DEFAULT NULL,
`uniqueUser` varchar(30) DEFAULT NULL,
`owner` varchar(50) NOT NULL,
`ip` varchar(15) DEFAULT NULL,
`credits` float NOT NULL,
`tstamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`key`),
KEY `uniqueLink` (`uniqueLink`),
KEY `uniqueHandler` (`uniqueHandler`),
KEY `uniqueUser` (`uniqueUser`),
KEY `owner` (`owner`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `links` (
`unique` varchar(30) NOT NULL DEFAULT '',
`url` varchar(1000) DEFAULT NULL,
`frequency` varchar(5) DEFAULT NULL,
`limit` float NOT NULL DEFAULT '0',
PRIMARY KEY (`unique`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
I've got the following query:
$link = MYSQL_QUERY("
SELECT *
FROM `links`
WHERE (SELECT count(`key`) FROM contor WHERE ip = '$ip' AND contor.uniqueLink = links.unique) <= `frequency`
AND (SELECT sum(credits) as cost FROM contor WHERE contor.uniqueLink = links.unique) <= `limit`")
There are 20 rows in the table links.
The problem is that whenever there are about 200k rows in the table contor the CPU load is huge.
After applying the solution provided by @Barmar:
Added a composite index on (uniqueLink, ip) and dropped all other indexes except PRIMARY; EXPLAIN gives me this:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY l ALL NULL NULL NULL NULL 18
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 15
2 DERIVED pop_contor index NULL contor_IX1 141 NULL 206122
Try using a join rather than a correlated subquery.
SELECT l.*
FROM links AS l
LEFT JOIN (
SELECT uniqueLink, SUM(ip = '$ip') AS ip_visits, SUM(credits) AS total_credits
FROM contor
GROUP BY uniqueLink
) AS c
ON c.uniqueLink = l.`unique` AND ip_visits <= frequency AND total_credits <= l.`limit`
If this doesn't help, try adding an index on contor.ip.
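For example (the index name is illustrative):
CREATE INDEX contor_ip ON contor (ip);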
The current query is of the form:
SELECT l.*
FROM `links` l
WHERE l.frequency >= ( SELECT COUNT(ck.key)
FROM contor ck
WHERE ck.uniqueLink = l.unique
AND ck.ip = '$ip'
)
AND l.limit >= ( SELECT SUM(sc.credits)
FROM contor sc
WHERE sc.uniqueLink = l.unique
)
Those correlated subqueries are going to eat your lunch. And your lunchbox too.
I'd suggest testing an inline view that performs both of the aggregations from contor in one pass, and then join the result from that to the links table.
Something like this:
SELECT l.*
FROM ( SELECT c.uniqueLink
, SUM(c.ip = '$ip' AND c.key IS NOT NULL) AS count_key
, SUM(c.credits) AS sum_credits
FROM `contor` c
GROUP
BY c.uniqueLink
) d
JOIN `links` l
ON l.unique = d.uniqueLink
AND l.frequency >= d.count_key
AND l.limit >= d.sum_credits
For optimal performance of the aggregation inline view query, provide a covering index that MySQL can use to optimize the GROUP BY (avoiding a Using filesort operation)
CREATE INDEX `contor_IX1` ON `contor` (`uniqueLink`, `credits`, `ip`) ;
Adding that index renders the uniqueLink index redundant, so also...
DROP INDEX `uniqueLink` ON `contor` ;
EDIT
Since we have a guarantee that the contor.key column is non-NULL (i.e. the NOT NULL constraint), the AND c.key IS NOT NULL part of the query above is unneeded and can be removed. (I also removed the key column from the covering index definition above.)
SELECT l.*
FROM ( SELECT c.uniqueLink
, SUM(c.ip = '$ip') AS count_key
, SUM(c.credits) AS sum_credits
FROM `contor` c
GROUP
BY c.uniqueLink
) d
JOIN `links` l
ON l.unique = d.uniqueLink
AND l.frequency >= d.count_key
AND l.limit >= d.sum_credits

Optimize MySQL UPDATE query that contains WHERE and ORDER BY?

How can I optimize this query? If I run it without the ORDER BY clause, it executes in <100ms. With the ORDER BY clause it takes many seconds, and crushes the server when more than one system is trying to make this query at once.
UPDATE companies
SET
crawling = 1
WHERE
crawling = 0
AND url_host IS NOT NULL
ORDER BY
last_crawled ASC
LIMIT 1;
If I run this query as a SELECT, it's also fast ( <100ms ).
SELECT id
FROM companies
WHERE
crawling = 0
AND url_host IS NOT NULL
ORDER BY
last_crawled ASC
LIMIT 1;
Here is my table schema:
CREATE TABLE `companies` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`url` varchar(255) DEFAULT NULL,
`url_scheme` varchar(10) DEFAULT NULL,
`url_host` varchar(255) DEFAULT NULL,
`name` varchar(255) DEFAULT NULL,
`crawl` tinyint(1) unsigned NOT NULL DEFAULT '1',
`crawling` tinyint(1) unsigned NOT NULL DEFAULT '0',
`last_crawled` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `name` (`name`),
KEY `url_host` (`url_host`),
KEY `crawl` (`crawl`),
KEY `crawling` (`crawling`),
KEY `last_crawled` (`last_crawled`),
KEY `url_scheme` (`url_scheme`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
UPDATE ONE
This query gives me the following error: You can't specify target table 'companies' for update in FROM clause
UPDATE companies
SET crawling = 1
WHERE id = (
SELECT id
FROM companies
WHERE
crawling = 0
AND url_host IS NOT NULL
ORDER BY
last_crawled ASC
LIMIT 1
);
This query gives me the following error: This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'
UPDATE companies
SET crawling = 1
WHERE id in (
SELECT id
FROM companies
WHERE
crawling = 0
AND url_host IS NOT NULL
ORDER BY
last_crawled ASC
LIMIT 1
);
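A sketch of a common workaround for the first error: wrapping the subquery in an extra derived table forces MySQL to materialize it before the UPDATE runs, so the target table is no longer read directly in the FROM clause:
UPDATE companies
SET crawling = 1
WHERE id = (
    SELECT id FROM (
        SELECT id
        FROM companies
        WHERE crawling = 0
        AND url_host IS NOT NULL
        ORDER BY last_crawled ASC
        LIMIT 1
    ) AS t
);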
Try not to use ORDER BY and LIMIT for such a small number of updates.
UPDATE companies t1
join
(
SELECT c.id, @RowNum := @RowNum + 1 AS RowID
FROM companies c, (SELECT @RowNum := 0) r
WHERE c.crawling = 0 AND c.url_host IS NOT NULL
ORDER BY c.last_crawled ASC
)t2
ON t2.RowID = 1 AND t1.id = t2.id
SET t1.crawling = 1
EDIT 1:
Make sure you have an index on (last_crawled ASC, id ASC); a sketch of it follows the query below.
UPDATE companies t1
join
(
SELECT id, RowID
FROM
(
SELECT c.id, @RowNum := @RowNum + 1 AS RowID
FROM companies c, (SELECT @RowNum := 0) r
WHERE c.crawling = 0 AND c.url_host IS NOT NULL
ORDER BY c.last_crawled ASC
)t2
WHERE RowID = 1
)t3
ON t1.id = t3.id
SET t1.crawling = 1
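A sketch of that index (the name is illustrative):
ALTER TABLE companies ADD INDEX idx_last_crawled_id (last_crawled, id);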

Sorting result of mysql join by avg of third table?

I have three tables.
One table contains submissions which has about 75,000 rows
One table contains submission ratings and only has < 10 rows
One table contains submission => competition mappings and for my test data also has about 75,000 rows.
What I want to do is
Get the top 50 submissions in a round of a competition.
Top is classified as highest average rating, followed by highest number of votes.
Here is the query I am using, which works, but the problem is that it takes over 45 seconds to complete! I profiled the query (results at bottom) and the bottlenecks are copying the data to a tmp table and then sorting it. So how can I speed this up?
SELECT `submission_submissions`.*
FROM `submission_submissions`
JOIN `competition_submissions`
ON `competition_submissions`.`submission_id` = `submission_submissions`.`id`
LEFT JOIN `submission_ratings`
ON `submission_submissions`.`id` = `submission_ratings`.`submission_id`
WHERE `top_round` = 1
AND `competition_id` = '2'
AND `submission_submissions`.`date_deleted` IS NULL
GROUP BY submission_submissions.id
ORDER BY AVG(submission_ratings.`stars`) DESC,
COUNT(submission_ratings.`id`) DESC
LIMIT 50
submission_submissions
CREATE TABLE `submission_submissions` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`account_id` int(11) NOT NULL,
`title` varchar(255) NOT NULL,
`description` varchar(255) DEFAULT NULL,
`genre` int(11) NOT NULL,
`goals` text,
`submission` text NOT NULL,
`date_created` datetime DEFAULT NULL,
`date_modified` datetime DEFAULT NULL,
`date_deleted` datetime DEFAULT NULL,
`cover_image` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `genre` (`genre`),
KEY `account_id` (`account_id`),
KEY `date_created` (`date_created`)
) ENGINE=InnoDB AUTO_INCREMENT=115037 DEFAULT CHARSET=latin1;
submission_ratings
CREATE TABLE `submission_ratings` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`account_id` int(11) NOT NULL,
`submission_id` int(11) NOT NULL,
`stars` tinyint(1) NOT NULL,
`date_created` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `submission_id` (`submission_id`),
KEY `account_id` (`account_id`),
KEY `stars` (`stars`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=latin1;
competition_submissions
CREATE TABLE `competition_submissions` (
`competition_id` int(11) NOT NULL,
`submission_id` int(11) NOT NULL,
`top_round` int(11) DEFAULT '1',
PRIMARY KEY (`submission_id`),
KEY `competition_id` (`competition_id`),
KEY `top_round` (`top_round`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
SHOW PROFILE Result (ordered by duration)
state                  duration (summed) in sec  percentage
Copying to tmp table   33.15621                  68.46924
Sorting result         11.83148                  24.43260
removing tmp table     3.06054                   6.32017
Sending data           0.37560                   0.77563
... insignificant amounts removed ...
Total                  48.42497                  100.00000
EXPLAIN
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE competition_submissions index_merge PRIMARY,competition_id,top_round competition_id,top_round 4,5 18596 Using intersect(competition_id,top_round); Using where; Using index; Using temporary; Using filesort
1 SIMPLE submission_submissions eq_ref PRIMARY PRIMARY 4 inkstakes.competition_submissions.submission_id 1 Using where
1 SIMPLE submission_ratings ALL submission_id 5 Using where; Using join buffer (flat, BNL join)
Assuming that in reality you won't be interested in unrated submissions, and that a given submission only has a single competition_submissions entry for a given match and top_round, I suggest:
SELECT s.*
FROM (SELECT `submission_id`,
AVG(`stars`) AvgStars,
COUNT(`id`) CountId
FROM `submission_ratings`
GROUP BY `submission_id`
ORDER BY AVG(`stars`) DESC, COUNT(`id`) DESC
LIMIT 50) r
JOIN `submission_submissions` s
ON r.`submission_id` = s.`id` AND
s.`date_deleted` IS NULL
JOIN `competition_submissions` c
ON c.`submission_id` = s.`id` AND
c.`top_round` = 1 AND
c.`competition_id` = '2'
ORDER BY r.AvgStars DESC,
r.CountId DESC
(If there is more than one competition_submissions entry per submission for a given match and top_round, then you can add the GROUP BY clause back in to the main query.)
If you do want to see unrated submissions, you can union the results of this query to a LEFT JOIN ... WHERE NULL query.
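A sketch of that union's second half, the LEFT JOIN ... WHERE NULL anti-join picking up unrated submissions (assuming the same filters as the main query):
SELECT s.*
FROM `submission_submissions` s
JOIN `competition_submissions` c
  ON c.`submission_id` = s.`id` AND
     c.`top_round` = 1 AND
     c.`competition_id` = '2'
LEFT JOIN `submission_ratings` r
  ON r.`submission_id` = s.`id`
WHERE s.`date_deleted` IS NULL
  AND r.`id` IS NULL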
There is a simple trick that works on MySQL and helps to avoid copying/sorting huge temp tables in queries like this (with LIMIT X).
Just avoid SELECT *. It copies all columns to the temporary table, then this huge table is sorted, and in the end the query takes only 50 records from it (50 / 70,000 = 0.07%).
Select only the columns that are really necessary to perform the sort and limit, and then join the missing columns back for the selected 50 records by id.
select ss.*
from submission_submissions ss
join (
SELECT `submission_submissions`.id,
AVG(submission_ratings.`stars`) stars,
COUNT(submission_ratings.`id`) cnt
FROM `submission_submissions`
JOIN `competition_submissions`
ON `competition_submissions`.`submission_id` = `submission_submissions`.`id`
LEFT JOIN `submission_ratings`
ON `submission_submissions`.`id` = `submission_ratings`.`submission_id`
WHERE `top_round` = 1
AND `competition_id` = '2'
AND `submission_submissions`.`date_deleted` IS NULL
GROUP BY submission_submissions.id
ORDER BY AVG(submission_ratings.`stars`) DESC,
COUNT(submission_ratings.`id`) DESC
LIMIT 50
) xx
ON ss.id = xx.id
ORDER BY xx.stars DESC,
xx.cnt DESC;