MySQL Optimization Problem - mysql

I'm having problems with a query optimization. The following query takes more than 30 seconds to get the expected result.
SELECT tbl_history.buffet_q_rating, tbl_history.cod_stock, tbl_history.bqqq_change_month, stocks.ticker, countries.country, stocks.company
FROM tbl_history
INNER JOIN stocks ON tbl_history.cod_stock = stocks.cod_stock
INNER JOIN exchange ON stocks.cod_exchange = exchange.cod_exchange
INNER JOIN countries ON exchange.cod_country = countries.cod_country
WHERE exchange.cod_country =125
AND DATE = '2011-07-25'
AND bqqq_change_month IS NOT NULL
AND buffet_q_rating IS NOT NULL
ORDER BY bqqq_change_month DESC
LIMIT 10
The tables are:
CREATE TABLE IF NOT EXISTS `tbl_history` (
`cod_stock` int(11) NOT NULL DEFAULT '0',
`date` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`price` decimal(11,3) DEFAULT NULL,
`buffet_q_rating` decimal(11,4) DEFAULT NULL,
`bqqq_change_day` decimal(11,2) DEFAULT NULL,
`bqqq_change_month` decimal(11,2) DEFAULT NULL,
(...)
PRIMARY KEY (`cod_stock`,`date`),
KEY `cod_stock` (`cod_stock`),
KEY `buf_rating` (`buffet_q_rating`),
KEY `data` (`date`),
KEY `bqqq_change_month` (`bqqq_change_month`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE IF NOT EXISTS `stocks` (
`cod_stock` int(11) NOT NULL AUTO_INCREMENT,
`cod_exchange` int(11) DEFAULT NULL,
PRIMARY KEY (`cod_stock`),
KEY `exchangestocks` (`cod_exchange`),
KEY `codstock` (`cod_stock`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=0 ;
CREATE TABLE IF NOT EXISTS `exchange` (
`cod_exchange` int(11) NOT NULL AUTO_INCREMENT,
`exchange` varchar(100) DEFAULT NULL,
`cod_country` int(11) DEFAULT NULL,
PRIMARY KEY (`cod_exchange`),
KEY `countriesexchange` (`cod_country`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=0 ;
CREATE TABLE IF NOT EXISTS `countries` (
`cod_country` int(11) NOT NULL AUTO_INCREMENT,
`country` varchar(100) DEFAULT NULL,
`initial_amount` double DEFAULT NULL,
PRIMARY KEY (`cod_country`),
KEY `codcountry` (`cod_country`),
KEY `country` (`country`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=0 ;
The first table have more than 20 million rows, the second have 40k and the others have just a few rows (maybe 100).
Them problem seems to be the "order by" but I have no idea how to optimize it.
I already tried some things searching on google/stackoverflow but I was unable to get good results
Can someone give me some advice?
EDIT:
Forgot the EXPLAIN result:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE countries const PRIMARY,codcountry PRIMARY 4 const 1 Using temporary; Using filesort
1 SIMPLE exchange ref PRIMARY,countriesexchange countriesexchange 5 const 15 Using where
1 SIMPLE stocks ref PRIMARY,exchangestocks,codstock exchangestocks 5 databaseName.exchange.cod_exchange 661 Using where
1 SIMPLE tbl_history eq_ref PRIMARY,cod_stock,buf_rating,data,bqqq_change_mont... PRIMARY 12 v.stocks.cod_stock,const 1 Using where
UPDATE
this is the new EXPLAIN I got:
id select_type table type possible_keys key key_len ref rows Extra |
1 SIMPLE tbl_history range monthstats monthstats 14 NULL 80053 Using where; Using index |
1 SIMPLE countries ref country country 4 const 1 Using index |
1 SIMPLE exchange ref PRIMARY,cod_country,countryexchange countryexchange 5 cons‌​t 5 Using where; Using index |
1 SIMPLE stocks ref info4stats info4stats 9 databaseName.exchange.cod_exchange,d‌​atabaseName.stock_... 1 Using where; Using index |

I would try to preemptively start with the Country records for 125 and work in reverse. By using a Straight_join will force the order of your query as entered...
I would also have an index on your Tbl_History table by the COD_Stock and DATE( date ). So the query will properly and efficiently match the join condition on the pre-qualified date portion of the date/time field.
SELECT STRAIGHT_JOIN
th.buffet_q_rating,
th.cod_stock,
th.bqqq_change_month,
stocks.ticker,
c.country,
s.company
FROM
Exchange e
join Countries c
on e.Cod_Country = c.Cod_Country
join Stocks s
on e.cod_exchange = s.cod_exchange
join tbl_history th
on s.cod_stock = th.cod_stock
AND th.`Date` = '2011-07-25'
AND th.bqqq_change_month IS NOT NULL
AND th.buffet_q_rating IS NOT NULL
WHERE
e.Cod_Country = 125
ORDER BY
th.bqqq_change_month DESC
LIMIT 10

If you want to limit the result, why do you do it after you join all the table?
Try to reduce the size of those big tables first (LIMIT or WHERE them) before joining them with other tables.
But you have to be sure that your original query and your modified query means the same.
Update (Sample) :
select
tbl_user.user_id,
tbl_group.group_name
from
tbl_grp_user
inner join
(
select
tbl_user.user_id,
tbl_user.user_name
from
tbl_user
limit
5
) as tbl_user
on
tbl_user.user_id = tbl_grp_user.user_id
inner join
(
select
group_id,
group_name
from
tbl_group
where
tbl_group.group_id > 5
) as tbl_group
on
tbl_group.group_id = tbl_grp_user.group_id
Hopefully, query above will give you a hint

Related

Optimizing MySQL CREATE TABLE Query

I have two tables I am trying to join in a third query and it seems to be taking far too long.
Here is the syntax I am using
CREATE TABLE active_users
(PRIMARY KEY ix_all (platform_id, login_year, login_month, person_id))
SELECT platform_id
, YEAR(my_timestamp) AS login_year
, MONTH(my_timestamp) AS login_month
, person_id
, COUNT(*) AS logins
FROM
my_login_table
GROUP BY 1,2,3,4;
CREATE TABLE active_alerts
(PRIMARY KEY ix_all (platform_id, alert_year, alert_month, person_id))
SELECT platform_id
, YEAR(alert_datetime) AS alert_year
, MONTH(alert_datetime) AS alert_month
, person_id
, COUNT(*) AS alerts
FROM
my_alert_table
GROUP BY 1,2,3,4;
CREATE TABLE all_data
(PRIMARY KEY ix_all (platform_id, theYear, theMonth, person_id))
SELECT a.platform_id
, a.login_year AS theyear
, a.login_month AS themonth
, a.person_id
, IFNULL(a.logins,0) AS logins
, IFNULL(b.alerts,0) AS job_alerts
FROM
active_users a
LEFT OUTER JOIN
active_alerts b
ON a.platform_id = b.platform_id
AND a.login_year = b.alert_year
AND a.login_month = b.alert_month
AND a.person_id = b.person_id;
The first table (logins) returns about half a million rows and takes less than 1 minute, the second table (alerts) returns about 200k rows and takes less than 1 minute.
If I run just the SELECT part of the third statement it runs in a few seconds, however as soon as I run it with the CREATE TABLE syntax it takes more than 30 minutes.
I have tried different types of indexes than a primary key, such as UNIQUE or INDEX as well as no key at all, but that doesn't seem to make much difference.
Is there something I can do to speed up the creation / insertion of this table?
EDIT:
Here is the output of the show create table statements
CREATE TABLE `active_users` (
`platform_id` int(11) NOT NULL,
`login_year` int(4) DEFAULT NULL,
`login_month` int(2) DEFAULT NULL,
`person_id` varchar(40) NOT NULL,
`logins` bigint(21) NOT NULL DEFAULT '0',
KEY `ix_all` (`platform_id`,`login_year`,`login_month`,`person_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
CREATE TABLE `alerts` (
`platform_id` int(11) NOT NULL,
`alert_year` int(4) DEFAULT NULL,
`alert_month` int(2) DEFAULT NULL,
`person_id` char(36) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
`alerts` bigint(21) NOT NULL DEFAULT '0',
KEY `ix_all` (`platform_id`,`alert_year`,`alert_month`,`person_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
and the output of the EXPLAIN
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE a (null) ALL (null) (null) (null) (null) 503504 100 (null)
1 SIMPLE b (null) ALL ix_all (null) (null) (null) 220187 100 Using where; Using join buffer (Block Nested Loop)
It's a bit of a hack but I figured out how to get it to run much faster.
I added a primary key to the third table on platform, year, month, person
I inserted the intersect data using an inner join, then insert ignore the left table plus a zero for alerts in a separate statement.

MySql group by optimization - avoid tmp table and/or filesort

I have a slow query, without the group by is fast (0.1-0.3 seconds), but with the (required) group by the duration is around 10-15s.
The query joins two tables, events (near 50 million rows) and events_locations (5 million rows).
Query:
SELECT `e`.`id` AS `event_id`,`e`.`time_stamp` AS `time_stamp`,`el`.`latitude` AS `latitude`,`el`.`longitude` AS `longitude`,
`el`.`time_span` AS `extra`,`e`.`entity_id` AS `asset_name`, `el`.`other_id` AS `geozone_id`,
`el`.`group_alias` AS `group_alias`,`e`.`event_type_id` AS `event_type_id`,
`e`.`entity_type_id`AS `entity_type_id`, el.some_id
FROM events e
INNER JOIN events_locations el ON el.event_id = e.id
WHERE 1=1
AND el.other_id = '1'
AND time_stamp >= '2018-01-01'
AND time_stamp <= '2019-06-02'
GROUP BY `e`.`event_type_id` , `el`.`some_id` , `el`.`group_alias`;
Table events:
CREATE TABLE `events` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`event_type_id` int(11) NOT NULL,
`entity_type_id` int(11) NOT NULL,
`entity_id` varchar(64) NOT NULL,
`alias` varchar(64) NOT NULL,
`time_stamp` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `entity_id` (`entity_id`),
KEY `event_type_idx` (`event_type_id`),
KEY `idx_events_time_stamp` (`time_stamp`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Table events_locations
CREATE TABLE `events_locations` (
`event_id` bigint(20) NOT NULL,
`latitude` double NOT NULL,
`longitude` double NOT NULL,
`some_id` bigint(20) DEFAULT NULL,
`other_id` bigint(20) DEFAULT NULL,
`time_span` bigint(20) DEFAULT NULL,
`group_alias` varchar(64) NOT NULL,
KEY `some_id_idx` (`some_id`),
KEY `idx_events_group_alias` (`group_alias`),
KEY `idx_event_id` (`event_id`),
CONSTRAINT `fk_event_id` FOREIGN KEY (`event_id`) REFERENCES `events` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The explain:
+----+-------------+-------+--------+---------------------------------+---------+---------+-------------------------------------------+----------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------------------------+---------+---------+-------------------------------------------+----------+------------------------------------------------+
| 1 | SIMPLE | ea | ALL | 'idx_event_id' | NULL | NULL | NULL | 5152834 | 'Using where; Using temporary; Using filesort' |
| 1 | SIMPLE | e | eq_ref | 'PRIMARY,idx_events_time_stamp' | PRIMARY | '8' | 'name.ea.event_id' | 1 | |
+----+-------------+----------------+---------------------------------+---------+---------+-------------------------------------------+----------+------------------------------------------------+
2 rows in set (0.08 sec)
From the doc:
Temporary tables can be created under conditions such as these:
If there is an ORDER BY clause and a different GROUP BY clause, or if the ORDER BY or GROUP BY contains columns from tables other than the first table in the join queue, a temporary table is created.
DISTINCT combined with ORDER BY may require a temporary table.
If you use the SQL_SMALL_RESULT option, MySQL uses an in-memory temporary table, unless the query also contains elements (described later) that require on-disk storage.
I already tried:
Create an index by 'el.some_id , el.group_alias'
Decrease the varchar size to 20
Increase the size of sort_buffer_size and read_rnd_buffer_size;
Any suggestions for performance tuning would be much appreciated!
In your case events table has time_span as indexing property. So before joining both tables first select required records from events table for specific date range with required details. Then join the event_location by using table relation properties.
Check your MySql Explain keyword to check how does your approach your table records. It will tell you how much rows are scanned for before selecting required records.
Number of rows that are scanned also involve in query execution time. Use my below logic to reduce the number of rows that are scanned.
SELECT
`e`.`id` AS `event_id`,
`e`.`time_stamp` AS `time_stamp`,
`el`.`latitude` AS `latitude`,
`el`.`longitude` AS `longitude`,
`el`.`time_span` AS `extra`,
`e`.`entity_id` AS `asset_name`,
`el`.`other_id` AS `geozone_id`,
`el`.`group_alias` AS `group_alias`,
`e`.`event_type_id` AS `event_type_id`,
`e`.`entity_type_id` AS `entity_type_id`,
`el`.`some_id` as `some_id`
FROM
(select
`id` AS `event_id`,
`time_stamp` AS `time_stamp`,
`entity_id` AS `asset_name`,
`event_type_id` AS `event_type_id`,
`entity_type_id` AS `entity_type_id`
from
`events`
WHERE
time_stamp >= '2018-01-01'
AND time_stamp <= '2019-06-02'
) AS `e`
JOIN `events_locations` `el` ON `e`.`event_id` = `el`.`event_id`
WHERE
`el`.`other_id` = '1'
GROUP BY
`e`.`event_type_id` ,
`el`.`some_id` ,
`el`.`group_alias`;
The relationship between these tables is 1:1, so, I asked me why is a group by required and I found some duplicated rows, 200 in 50000 rows. So, somehow, my system is inserting duplicates and someone put that group by (years ago) instead of seek of the bug.
So, I will mark this as solved, more or less...

GROUP BY + ORDER BY make my query very slow

I am trying to figure out what I should do to my query and/ or to my tables structure to improve a query to get the best sellers which is run in over 1 sec.
Here is the query I'm talking about:
SELECT pr.id_prod, MAX(pr.stock) AS stock, MAX(pr.dt_add) AS dt_add, SUM(od.quantity) AS quantity
FROM orders AS o
INNER JOIN orders_details AS od ON od.id_order = o.id_order
INNER JOIN products_references AS pr ON pr.id_prod_ref = od.id_prod_ref
INNER JOIN products AS p ON p.id_prod = pr.id_prod
WHERE o.id_order_status > 11
AND pr.active = 1
GROUP BY p.id_prod
ORDER BY quantity
LIMIT 10
If I use GROUP BY p.id_prod instead of GROUP BY pr.id_prod and remove the ORDER BY, the query is run in 0.07sec.
is that EXPLAIN table OKAY?
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE o range PRIMARY,id_order_status id_order_status 1 75940 Using where; Using index; Using temporary; Using filesort
1 SIMPLE od ref id_order,id_prod_ref id_order 4 dbname.o.id_order 1
1 SIMPLE pr eq_ref PRIMARY,id_prod PRIMARY 4 dbname.od.id_prod_ref 1 Using where
1 SIMPLE p eq_ref PRIMARY,name_url,id_brand,name PRIMARY 4 dbname.pr.id_prod 1 Using index
And this is the EXPLAIN without the ORDER BY
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE p index PRIMARY,name_url,id_brand,name PRIMARY 4 1 Using index
1 SIMPLE pr ref PRIMARY,id_prod id_prod 4 dbname.p.id_prod 2 Using where
1 SIMPLE od ref id_order,id_prod_ref id_prod_ref 4 dbname.pr.id_prod_ref 67
1 SIMPLE o eq_ref PRIMARY,id_order_status PRIMARY 4 dbname.od.id_order 1 Using where
And here is the table structures
CREATE TABLE `orders` (
`id_order` int(10) unsigned NOT NULL AUTO_INCREMENT,
`id_dir` int(10) unsigned DEFAULT NULL,
`id_status` tinyint(3) unsigned NOT NULL DEFAULT '11',
PRIMARY KEY (`id_order`),
KEY `id_dir` (`id_dir`),
KEY `id_status` (`id_status`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `orders_details` (
`id_order_det` int(10) unsigned NOT NULL AUTO_INCREMENT,
`id_order` int(10) unsigned NOT NULL,
`id_prod_ref` int(10) unsigned NOT NULL,
`quantity` smallint(5) unsigned NOT NULL DEFAULT '1',
PRIMARY KEY (`id_order_det`),
UNIQUE KEY `id_order` (`id_order`,`id_prod_ref`) USING BTREE,
KEY `id_prod_ref` (`id_prod_ref`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `products` (
`id_prod` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(60) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id_prod`),
FULLTEXT KEY `name` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `products_references` (
`id_prod_ref` int(10) unsigned NOT NULL AUTO_INCREMENT,
`id_prod` int(10) unsigned NOT NULL,
`stock` smallint(6) NOT NULL DEFAULT '0',
`dt_add` datetime DEFAULT NULL,
`active` tinyint(1) NOT NULL DEFAULT 0,
PRIMARY KEY (`id_prod_ref`),
KEY `id_prod` (`id_prod`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
I also tried to give you the tables relations (ON UPDATE, ON DELETE CASCADE, ...) but didn't manage to export it. But I don't think it's crucial for now!
Try using the alias name in order by and not the value from the table
and use the group by for the value in select (is the same for join because is inner join on equal value and the value form the pr is not retrived for select result )
SELECT p.id_prod, p.name, SUM(od.quantity) AS quantity
FROM orders AS o
INNER JOIN orders_details AS od ON od.id_order = o.id_order
INNER JOIN products_references AS pr ON pr.id_prod_ref = od.id_prod_ref
INNER JOIN products AS p ON p.id_prod = pr.id_prod
WHERE pr.active = 1
GROUP BY p.id_prod
ORDER BY quantity
LIMIT 10
do not forget to use appropriate indexes on join columns
(Rewritten after OP added more info.)
SELECT pr.id_prod,
MAX(pr.stock) AS max_stock,
MAX(pr.dt_add) AS max_dt_add
SUM(od.quantity) AS sum_quantity
FROM orders AS o
INNER JOIN orders_details AS od
ON od.id_order = o.id_order
INNER JOIN products_references AS pr
ON pr.id_prod_ref = od.id_prod_ref
WHERE o.id_order_status > 11
AND pr.active = 1
GROUP BY pr.id_prod
ORDER BY sum_quantity
LIMIT 10
Note that p was removed as being irrelevant.
Beware of SUM() when using JOIN with GROUP BY -- you might get an incorrect, inflated, value.
Improvement on one table:
CREATE TABLE `orders_details` (
`id_order` int(10) unsigned NOT NULL,
`id_prod_ref` int(10) unsigned NOT NULL,
`quantity` smallint(5) unsigned NOT NULL DEFAULT '1',
PRIMARY KEY (`id_order`,`id_prod_ref`),
INDEX (id_prod_ref, id_order)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Here's why: od sounds like a many:many mapping table. See here for tips on improving performance in it.
GROUP BY usually involves a sort. ORDER BY, when it is not identical to the GROUP BY definitely requires another sort.
Removing the ORDER BY allows the query to return any 10 rows without the sort. (This may explain the timing difference.)
Note the alias sum_quantity to avoid ambiguity between the column quantity and your alias quantity.
Explaining EXPLAIN
1 SIMPLE o range id_order_status 1 75940 Using where; Using index; Using temporary; Using filesort
1 SIMPLE od ref id_order 4 o.id_order 1
1 SIMPLE pr eq_ref PRIMARY 4 od.id_prod_ref 1 Using where
1 SIMPLE p eq_ref PRIMARY 4 pr.id_prod 1 Using index
The tables will be accessed in the order given (o,od,pr,p).
o won't use the data ("Using index") but will scan the id_order_status index which includes (id_status, id_order). Note: The PRIMARY KEY columns are implicitedly added to any secondary key.
It estimates 76K will need to be scanned (for > 11).
Somewhere in the processing, there will a temp table and a sort of it. This may or may not involve disk I/O.
The reach into od might find 1 row, might find 0 or more than 1 ("ref").
The reaching into pr and p are known to get at most 1 row.
pr does a small amount of filtering (active=1), but not until the third line of EXPLAIN. And no index is useful for this filtering. This could be improved, but only slightly, by a composite index (active, id_prod_ref). With only 5-10% being filtered out, this won't help much.
After all the JOINing and filtering, there will be two temp tables and sorts, one for GROUP BY, one for ORDER BY.
Only after that, will 10 rows be peeled off from the 70K (or so) rows collected up to this point.
Without the ORDER BY, the EXPLAIN shows that a different order seems to be better. And the tmp & sort went away.
1 SIMPLE p index PRIMARY 4 1 Using index
1 SIMPLE pr ref id_prod 4 p.id_prod 2 Using where
1 SIMPLE od ref id_prod_ref 4 pr.id_prod_ref 67
1 SIMPLE o eq_ref PRIMARY 4 dbne.od.id_order 1 Using where
There seem to be only 1 row in p, correct? So, in a way, it does not matter when this table is accessed. When you have multiple "products" all this analysis may change!
"key=PRIMARY", "Using index" is sort of a misnomer. It is really using the data, but being able to efficiently access it because the PRIMARY KEY is "clustered" with the data.
There is only one pr row?? Perhaps the optimizer realized that GROUP BY was not needed?
When it got to od, it estimated that "67" rows would be needed per p+pr combo.
You removed the ORDER BY, so there is no need to sort, and any 10 rows can be delivered.

Sorting result of mysql join by avg of third table?

I have three tables.
One table contains submissions which has about 75,000 rows
One table contains submission ratings and only has < 10 rows
One table contains submission => competition mappings and for my test data also has about 75,000 rows.
What I want to do is
Get the top 50 submissions in a round of a competition.
Top is classified as highest average rating, followed by highest amount of votes
Here is the query I am using which works, but the problem is that it takes over 45 seconds to complete! I profiled the query (results at bottom) and the bottlenecks are copying the data to a tmp table and then sorting it so how can I speed this up?
SELECT `submission_submissions`.*
FROM `submission_submissions`
JOIN `competition_submissions`
ON `competition_submissions`.`submission_id` = `submission_submissions`.`id`
LEFT JOIN `submission_ratings`
ON `submission_submissions`.`id` = `submission_ratings`.`submission_id`
WHERE `top_round` = 1
AND `competition_id` = '2'
AND `submission_submissions`.`date_deleted` IS NULL
GROUP BY submission_submissions.id
ORDER BY AVG(submission_ratings.`stars`) DESC,
COUNT(submission_ratings.`id`) DESC
LIMIT 50
submission_submissions
CREATE TABLE `submission_submissions` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`account_id` int(11) NOT NULL,
`title` varchar(255) NOT NULL,
`description` varchar(255) DEFAULT NULL,
`genre` int(11) NOT NULL,
`goals` text,
`submission` text NOT NULL,
`date_created` datetime DEFAULT NULL,
`date_modified` datetime DEFAULT NULL,
`date_deleted` datetime DEFAULT NULL,
`cover_image` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `genre` (`genre`),
KEY `account_id` (`account_id`),
KEY `date_created` (`date_created`)
) ENGINE=InnoDB AUTO_INCREMENT=115037 DEFAULT CHARSET=latin1;
submission_ratings
CREATE TABLE `submission_ratings` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`account_id` int(11) NOT NULL,
`submission_id` int(11) NOT NULL,
`stars` tinyint(1) NOT NULL,
`date_created` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `submission_id` (`submission_id`),
KEY `account_id` (`account_id`),
KEY `stars` (`stars`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=latin1;
competition_submissions
CREATE TABLE `competition_submissions` (
`competition_id` int(11) NOT NULL,
`submission_id` int(11) NOT NULL,
`top_round` int(11) DEFAULT '1',
PRIMARY KEY (`submission_id`),
KEY `competition_id` (`competition_id`),
KEY `top_round` (`top_round`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
SHOW PROFILE Result (ordered by duration)
state duration (summed) in sec percentage
Copying to tmp table 33.15621 68.46924
Sorting result 11.83148 24.43260
removing tmp table 3.06054 6.32017
Sending data 0.37560 0.77563
... insignificant amounts removed ...
Total 48.42497 100.00000
EXPLAIN
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE competition_submissions index_merge PRIMARY,competition_id,top_round competition_id,top_round 4,5 18596 Using intersect(competition_id,top_round); Using where; Using index; Using temporary; Using filesort
1 SIMPLE submission_submissions eq_ref PRIMARY PRIMARY 4 inkstakes.competition_submissions.submission_id 1 Using where
1 SIMPLE submission_ratings ALL submission_id 5 Using where; Using join buffer (flat, BNL join)
Assuming that in reality you won't be interested in unrated submissions, and that a given submission only has a single competition_submissions entry for a given match and top_round, I suggest:
SELECT s.*
FROM (SELECT `submission_id`,
AVG(`stars`) AvgStars,
COUNT(`id`) CountId
FROM `submission_ratings`
GROUP BY `submission_id`
ORDER BY AVG(`stars`) DESC, COUNT(`id`) DESC
LIMIT 50) r
JOIN `submission_submissions` s
ON r.`submission_id` = s.`id` AND
s.`date_deleted` IS NULL
JOIN `competition_submissions` c
ON c.`submission_id` = s.`id` AND
c.`top_round` = 1 AND
c.`competition_id` = '2'
ORDER BY r.AvgStars DESC,
r.CountId DESC
(If there is more than one competition_submissions entry per submission for a given match and top_round, then you can add the GROUP BY clause back in to the main query.)
If you do want to see unrated submissions, you can union the results of this query to a LEFT JOIN ... WHERE NULL query.
There is a simple trick that works on MySql and helps to avoid copying/sorting huge temp tables in queries like this (with LIMIT X).
Just avoid SELECT *, this copies all columns to the temporary table, then this huge table is sorted, and in the end, the query takes only 50 records from this huge table ( 50 / 70000 = 0,07 % ).
Select only columns that are really necessary to perform sort and limit, and then join missing columns only for selected 50 records by id.
select ss.*
from submission_submissions ss
join (
SELECT `submission_submissions`.id,
AVG(submission_ratings.`stars`) stars,
COUNT(submission_ratings.`id`) cnt
FROM `submission_submissions`
JOIN `competition_submissions`
ON `competition_submissions`.`submission_id` = `submission_submissions`.`id`
LEFT JOIN `submission_ratings`
ON `submission_submissions`.`id` = `submission_ratings`.`submission_id`
WHERE `top_round` = 1
AND `competition_id` = '2'
AND `submission_submissions`.`date_deleted` IS NULL
GROUP BY submission_submissions.id
ORDER BY AVG(submission_ratings.`stars`) DESC,
COUNT(submission_ratings.`id`) DESC
LIMIT 50
) xx
ON ss.id = xx.id
ORDER BY xx.stars DESC,
xx.cnt DESC;

Where am I going wrong in using a Join in the mysql query - Explain result posted too

I have this query which takes about 3.5 seconds just to fetch 2 records. However there are over 100k rows in testimonials, 13k in users, 850 in courses, 2 in exams.
SELECT t.*, u.name, f.feedback
FROM testmonials t
INNER JOIN user u ON u.id = t.userid
INNER JOIN courses co ON co.id = t.courseid
LEFT JOIN exam ex ON ex.id = t.exam_id
WHERE t.status = 4
AND t.verfication_required = 'Y'
AND t.verfication_completed = 'N'
ORDER BY t.submissiondate DESC
.Explain result: .
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE co ALL PRIMARY NULL NULL NULL 850 Using temporary; Using filesort
1 SIMPLE t ref CID,nuk_tran_user CID 4 kms.co.id 8 Using where
1 SIMPLE u eq_ref PRIMARY PRIMARY 4 kms.t.userid 1 Using where
1 SIMPLE ex eq_ref PRIMARY PRIMARY 3 kms.t.eval_id 1
If I remove the courses table join then the query returns the result pretty quick. I can't figure out why this query has to select all the courses rows i.e. 850?
Any ideas what I am doing wrong?
Edit:
I have an index on courseid, userid in testimonials table and these are primary keys of their respective tables.
EDIT 2
I have just removed the courseid index from the testimonials table (just to test) and interestingly the query returned result in 0.22 seconds!!!?? Everything else the same as above just removed only this index.
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE t ALL nuk_tran_user NULL NULL NULL 130696 Using where; Using filesort
1 SIMPLE u eq_ref PRIMARY PRIMARY 4 kms.t.userid 1 Using where
1 SIMPLE co eq_ref PRIMARY PRIMARY 4 kms.t.courseid 1
1 SIMPLE ex eq_ref PRIMARY PRIMARY 3 kms.t.exam_id 1
EDIT 3
EDIT 3
CREATE TABLE IF NOT EXISTS `courses` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
`description` text NOT NULL,
`duration` varchar(100) NOT NULL DEFAULT '',
`objectives` text NOT NULL,
`updated_at` datetime DEFAULT NULL,
`updated_by` int(10) unsigned DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=851 ;
Testimonials
CREATE TABLE IF NOT EXISTS `testimonials` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`feedback` text NOT NULL,
`userid` int(10) unsigned NOT NULL DEFAULT '0',
`courseid` int(10) unsigned NOT NULL DEFAULT '0',
`eventid` int(10) unsigned NOT NULL DEFAULT '0',
`emr_date` datetime DEFAULT NULL,
`exam_required` enum('Y','N') NOT NULL DEFAULT 'N',
`exam_id` smallint(5) unsigned NOT NULL DEFAULT '0',
`emr_completed` enum('Y','N') NOT NULL DEFAULT 'N',
PRIMARY KEY (`id`),
KEY `event` (`eventid`),
KEY `nuk_tran_user` (`userid`),
KEY `emr_date` (`emr_date`),
KEY `courseid` (`courseid`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=134691 ;
.. this is the latest Explain query result now ...
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE t ALL nuk_tran_user,courseid NULL NULL NULL 130696 Using where; Using filesort
1 SIMPLE u eq_ref PRIMARY PRIMARY 4 kms.t.userid 1 Using where
1 SIMPLE co eq_ref PRIMARY PRIMARY 4 kms.t.courseid 1
1 SIMPLE ex eq_ref PRIMARY PRIMARY 3 kms.t.exam_id 1
Doing an ORDER BY that does not have a corresponding index that can be utilized is known to cause delay issues. Even though this does not specifically answer your issue of the courses table.
Your original query looks MOSTLY ok, but you reference "f.feedback" and there is no "f" alias in the query. You also refer to "verification_required" and "verification_completed" but don't see those in the table structures but DO find "exam_required" and "emr_completed".
I would however change one thing. In the testimonials table, instead of individual column indexes, I would add one more with multiple columns to both take advantage of your multiple criteria query AND the order by
create table ...
KEY StatVerifySubmit ( status, verification_required, verification_completed, submissionDate )
but appears your query is referring to columns not listed in your table structure listing, but instead might be
KEY StatVerifySubmit ( status, exam_required, emr_completed, emr_Date)
Could you give a try to the following query instead of the original:
SELECT t.*, u.name, f.feedback
FROM testmonials t
INNER JOIN user u ON u.id = t.userid
LEFT JOIN exam ex ON ex.id = t.exam_id
WHERE t.status = 4
AND t.verfication_required = 'Y'
AND t.verfication_completed = 'N'
AND t.courseid in ( SELECT co.id FROM courses co)
ORDER BY t.submissiondate DESC
Do you need to select columns from the courses table?