Why is my INNER JOIN so slow? Server times out - mysql

My relatively simple query takes so long to execute that the server times out. Incidentally, if I run the subquery on its own, it executes very quickly.
Essentially I'm trying to get the first date for each game_id, then get the corresponding score and duration...
My query:
SELECT
sq.*,
up.score AS score,
up.duration AS duration
FROM (
SELECT
up.uid AS uid,
up.lesson_id AS lesson_id,
up.level AS level,
up.game_id AS game_id,
MIN(up.date) AS first_date
FROM cdu_user_progress up
WHERE (up.score >= '0')
GROUP BY up.uid, up.lesson_id, up.level, up.game_id
) sq
INNER JOIN cdu_user_progress up ON up.uid = sq.uid AND up.lesson_id = sq.lesson_id AND up.level = sq.level AND up.game_id = sq.game_id AND up.date = sq.first_date
GROUP BY sq.uid, sq.lesson_id, sq.level, sq.game_id
cdu_user_progress is:
------------------------------------------------------------------------------------
|id |uid |lesson_id |game_id |level |score |duration |date |
------------------------------------------------------------------------------------
Explain:
Field      Type     Null  Key  Default  Extra
----------------------------------------------------------
id         int(11)  NO    PRI  NULL     auto_increment
uid        int(11)  NO         NULL
lesson_id  int(11)  NO         NULL
game_id    int(11)  NO         NULL
level      int(11)  NO         NULL
score      int(11)  NO         NULL
duration   int(11)  NO         NULL
date       int(11)  NO         NULL

First, I don't think you need the outer group by, unless you have duplicates by date. If so, perhaps you can use id instead of or in addition to date:
SELECT sq.*, up.score AS score, up.duration AS duration
FROM (SELECT up.uid, up.lesson_id, up.level, up.game_id,
MIN(up.date) AS first_date
FROM cdu_user_progress up
WHERE up.score >= 0
GROUP BY up.uid, up.lesson_id, up.level, up.game_id
) sq INNER JOIN
cdu_user_progress up
ON up.uid = sq.uid AND up.lesson_id = sq.lesson_id AND
up.level = sq.level AND up.game_id = sq.game_id AND
up.date = sq.first_date;
The best indexes for this query are cdu_user_progress(score) and cdu_user_progress(uid, lesson_id, level, game_id, date). If the first query runs fast, the second index should be a big help.
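A minimal sketch of the composite index and the join on a tiny made-up data set. It uses Python's built-in sqlite3 as a stand-in for MySQL, so it only illustrates the shape of the schema and query, not MySQL's optimizer behaviour; the index name and the sample rows are assumptions.

```python
import sqlite3

# SQLite stands in for MySQL here; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cdu_user_progress (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    uid INTEGER NOT NULL, lesson_id INTEGER NOT NULL,
    game_id INTEGER NOT NULL, level INTEGER NOT NULL,
    score INTEGER NOT NULL, duration INTEGER NOT NULL,
    date INTEGER NOT NULL
);
-- Composite index from the answer: grouping columns first, date last.
-- The index name is hypothetical.
CREATE INDEX idx_progress_first_date
    ON cdu_user_progress (uid, lesson_id, level, game_id, date);
""")

# Made-up sample rows: uid, lesson_id, game_id, level, score, duration, date
rows = [
    (1, 10, 100, 1, 50, 30, 1000),
    (1, 10, 100, 1, 80, 25, 2000),  # later attempt at the same game
    (1, 10, 101, 1, 70, 40, 1500),
    (2, 10, 100, 1, 60, 35, 1200),
]
conn.executemany(
    "INSERT INTO cdu_user_progress "
    "(uid, lesson_id, game_id, level, score, duration, date) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)",
    rows,
)

# First date per (uid, lesson_id, level, game_id), joined back for score/duration.
first_attempts = conn.execute("""
SELECT sq.uid, sq.lesson_id, sq.level, sq.game_id, sq.first_date,
       up.score, up.duration
FROM (SELECT uid, lesson_id, level, game_id, MIN(date) AS first_date
      FROM cdu_user_progress
      WHERE score >= 0
      GROUP BY uid, lesson_id, level, game_id) sq
INNER JOIN cdu_user_progress up
   ON up.uid = sq.uid AND up.lesson_id = sq.lesson_id
  AND up.level = sq.level AND up.game_id = sq.game_id
  AND up.date = sq.first_date
ORDER BY sq.uid, sq.game_id
""").fetchall()

for row in first_attempts:
    print(row)
```

On real data in MySQL, running EXPLAIN on the query before and after adding the index is the way to check whether the join actually uses it.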

Related

SQL query to select all rows with max column value

CREATE TABLE `user_activity` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`user_id` int(11) DEFAULT NULL,
`type` enum('request','response') DEFAULT NULL,
`data` longtext NOT NULL,
`created_at` datetime DEFAULT NULL,
`source` varchar(255) DEFAULT NULL,
`task_name` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`)
);
I have this data:-
Now I need to select all rows for user_id=527 where created_at value is the maximum. So I need the last 3 rows in this image.
I wrote this query:-
SELECT *
FROM user_activity
WHERE user_id = 527
AND source = 'E1'
AND task_name IN ( 'GetReportTask', 'StopMonitoringUserTask' )
AND created_at = (SELECT Max(created_at)
FROM user_activity
WHERE user_id = 527
AND source = 'E1'
AND task_name IN ( 'GetReportTask',
'StopMonitoringUserTask' ));
This is very inefficient because I am running the exact same query again as an inner query except that it disregards created_at. What's the right way to do this?
I would use a correlated subquery:
SELECT ua.*
FROM user_activity ua
WHERE ua.user_id = 527 AND ua.source = 'E1' AND
ua.task_name IN ('GetReportTask', 'StopMonitoringUserTask' ) AND
ua.created_at = (SELECT MAX(ua2.created_at)
FROM user_activity ua2
WHERE ua2.user_id = ua.user_id AND
ua2.source = ua.source AND
ua2.task_name IN ( 'GetReportTask', 'StopMonitoringUserTask' )
);
Although this might seem inefficient, you can create an index on user_activity(user_id, source, task_name, created_at). With this index, the query should have decent performance.
Order by created_at desc and limit your query to return 1 row.
SELECT *
FROM user_activity
WHERE user_id = 527
AND source = 'E1'
AND task_name IN ( 'GetReportTask', 'StopMonitoringUserTask' )
ORDER BY created_at DESC
LIMIT 1;
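One thing to keep in mind with LIMIT 1: the question asks for all rows sharing the maximum created_at, and LIMIT 1 returns only one of them when there are ties. A small sketch of the difference, using Python's sqlite3 as a stand-in for MySQL and made-up rows (the timestamps are assumptions, not the asker's data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE user_activity (
    id INTEGER PRIMARY KEY,
    user_id INTEGER, source TEXT, task_name TEXT, created_at TEXT)
""")
# Made-up rows: three of them share the latest created_at.
conn.executemany(
    "INSERT INTO user_activity (user_id, source, task_name, created_at) "
    "VALUES (?, ?, ?, ?)",
    [
        (527, "E1", "GetReportTask", "2019-01-01 10:00:00"),
        (527, "E1", "GetReportTask", "2019-01-02 10:00:00"),
        (527, "E1", "StopMonitoringUserTask", "2019-01-02 10:00:00"),
        (527, "E1", "GetReportTask", "2019-01-02 10:00:00"),
    ],
)

# Shared filter, written once so both queries use the same conditions.
FILTER = """user_id = 527 AND source = 'E1'
    AND task_name IN ('GetReportTask', 'StopMonitoringUserTask')"""

# ORDER BY ... LIMIT 1 keeps a single row even when several tie for the maximum.
limit_rows = conn.execute(
    f"SELECT id FROM user_activity WHERE {FILTER} "
    "ORDER BY created_at DESC LIMIT 1"
).fetchall()

# Comparing against MAX(created_at) keeps every row at the maximum.
max_rows = conn.execute(
    f"SELECT id FROM user_activity WHERE {FILTER} "
    f"AND created_at = (SELECT MAX(created_at) FROM user_activity WHERE {FILTER}) "
    "ORDER BY id"
).fetchall()

print(len(limit_rows), len(max_rows))
```

So LIMIT 1 fits only if a single latest row is enough; for "all rows at the maximum" the MAX comparison is still needed.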
I used EverSQL and applied my own changes to come up with this single-select query that uses self-join:-
SELECT ua1.*
FROM user_activity AS ua1
LEFT JOIN user_activity AS ua2
ON ua2.user_id = ua1.user_id
AND ua2.source = ua1.source
AND ua2.task_name IN ( 'GetReportTask', 'StopMonitoringUserTask' )
AND ua1.created_at < ua2.created_at
WHERE ua1.user_id = 527
AND ua1.source = 'E1'
AND ua1.task_name IN ( 'GetReportTask', 'StopMonitoringUserTask' )
AND ua2.created_at IS NULL;
However, I noticed that the response times of both queries were similar. I tried to use Explain to identify any performance differences; and from what I understood from its output, there are no noticeable differences because proper indexing is in place. So for readability and maintainability, I'll just use the nested query.

Enum type and MySQL

I have a simple task and I got stuck on it.
I have a table login_history:
`login_history_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`login_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`login_action` enum('login','logout') NOT NULL,
`user_id` int(11) unsigned NOT NULL, (this one is foreign key)
TASK: Write a query which will find a user who had most logouts on Wednesdays in September 2012.
As you can see, I have login_action, which is an enum type, and I need to find which user had the most logouts on a specific day. This is what I have done so far; I just need a little push in the right direction, someone to tell me where I am going wrong.
SELECT fullname FROM user WHERE user_id = (
SELECT user_id FROM login_history WHERE (user_id,login_action) = (
SELECT user_id, COUNT(login_action) FROM login_history WHERE login_action = 'logout' AND login_time = (
SELECT login_time FROM login_history WHERE YEAR(login_time) = 2012 AND MONTH(login_time) = 9 AND DAYOFWEEK(login_time) = 3)));
Try this:
select u.fullname
from (select count(*) n, user_id
      from login_history
      where login_time >= '2012-09-01' and login_time < '2012-10-01'
        and dayofweek(login_time) = 4 -- DAYOFWEEK() counts from 1 = Sunday, so Wednesday is 4
        and login_action = 'logout'
      group by user_id order by n desc limit 1) a
join user u on a.user_id = u.user_id
For good performance, make sure you have an index on the login_time column.
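A note on the day number: MySQL's DAYOFWEEK() returns 1 for Sunday through 7 for Saturday, so Wednesday is 4 and a value of 3 selects Tuesdays. (WEEKDAY() instead counts from 0 = Monday, which makes Wednesday 2.) A quick sketch that mimics the numbering in Python to check which September 2012 dates each value picks; the helper names are hypothetical:

```python
from datetime import date, timedelta


def mysql_dayofweek(d: date) -> int:
    """Mimic MySQL DAYOFWEEK(): 1 = Sunday, 2 = Monday, ..., 7 = Saturday."""
    return d.isoweekday() % 7 + 1


def days_in_september_2012():
    """Yield every date from 2012-09-01 up to, but not including, 2012-10-01."""
    day = date(2012, 9, 1)
    while day < date(2012, 10, 1):
        yield day
        day += timedelta(days=1)


# Days of the month selected by DAYOFWEEK(...) = 4 and DAYOFWEEK(...) = 3.
wednesdays = [d.day for d in days_in_september_2012() if mysql_dayofweek(d) == 4]
tuesdays = [d.day for d in days_in_september_2012() if mysql_dayofweek(d) == 3]

print(wednesdays)  # [5, 12, 19, 26]
print(tuesdays)    # [4, 11, 18, 25]
```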

SQL query joining a few tables (MySQL)

I need a "little" help with an SQL query (MySQL).
I have the following tables:
COURIERS table:
+------------+
| COURIER_ID |
+------------+
DELIVERIES table:
+-------------+------------+------------+
| DELIVERY_ID | COURIER_ID | START_DATE |
+-------------+------------+------------+
ORDERS table:
+----------+-------------+-------------+
| ORDER_ID | DELIVERY_ID | FINISH_DATE |
+----------+-------------+-------------+
COORDINATES table:
+-------------+-----+-----+------+
| DELIVERY_ID | LAT | LNG | DATE |
+-------------+-----+-----+------+
In the real database I have more columns in each table, but for this example the above columns are enough.
What do I need?
An SQL query that returns all couriers [COURIER_ID], their last
delivery [DELIVERY_ID] (based on last START_DATE), the
delivery's last coordinate [LAT and LNG] (based on last DATE) and the remaining orders count (total of orders of the last delivery that have no FINISH_DATE).
A courier can have no deliveries, in this case I want DELIVERY_ID =
NULL, LAT = NULL and LNG = NULL in the result.
A delivery can have no coordinates, in this case I want LAT = NULL
and LNG = NULL in the result.
What was I able to do?
SELECT c.`COURIER_ID`,
d.`DELIVERY_ID`,
r.`LAT`,
r.`LNG`,
(SELECT COUNT(DISTINCT `ORDER_ID`)
FROM `ORDERS`
WHERE `DELIVERY_ID` = d.`DELIVERY_ID`
AND `FINISH_DATE` IS NULL) AS REMAINING_ORDERS
FROM `COURIERS` AS c
LEFT JOIN `DELIVERIES` AS d USING (`COURIER_ID`)
LEFT JOIN `COORDINATES` AS r ON r.`DELIVERY_ID` = d.`DELIVERY_ID`
WHERE (CASE WHEN
(SELECT MAX(`START_DATE`)
FROM `DELIVERIES`
WHERE `COURIER_ID` = c.`COURIER_ID`) IS NULL THEN d.`START_DATE` IS NULL ELSE d.`START_DATE` =
(SELECT MAX(`START_DATE`)
FROM `DELIVERIES`
WHERE `COURIER_ID` = c.`COURIER_ID`) END)
AND (CASE WHEN
(SELECT MAX(`DATE`)
FROM `COORDINATES`
WHERE `DELIVERY_ID` = d.`DELIVERY_ID`) IS NULL THEN r.`DATE` IS NULL ELSE r.`DATE` =
(SELECT MAX(`DATE`)
FROM `COORDINATES`
WHERE `DELIVERY_ID` = d.`DELIVERY_ID`) END)
GROUP BY c.`COURIER_ID`
ORDER BY d.`START_DATE` DESC
The problem is that this query is very slow (5 to 20 seconds) when I have over 5k COORDINATES rows, and it sometimes does not return all couriers.
Thank you so much for any solution.
Try this:
-- Note: picking the "first" row per group with ORDER BY in a derived table
-- plus GROUP BY relies on MySQL's non-standard GROUP BY handling (it fails
-- when ONLY_FULL_GROUP_BY is enabled), and the row returned is not guaranteed.
SELECT C.COURIER_ID, A.DELIVERY_ID, A.START_DATE,
B.LAT, B.LNG, B.DATE, O.NoOfOrders
FROM COURIERS C
LEFT JOIN ( SELECT *
FROM (SELECT *
FROM DELIVERIES D
ORDER BY D.COURIER_ID, D.START_DATE DESC
) A
GROUP BY COURIER_ID
) AS A ON C.COURIER_ID = A.COURIER_ID
LEFT JOIN ( SELECT *
FROM (SELECT *
FROM COORDINATES CO
ORDER BY CO.DELIVERY_ID, CO.DATE DESC
) B
GROUP BY B.DELIVERY_ID
) AS B ON A.DELIVERY_ID = B.DELIVERY_ID
LEFT JOIN ( SELECT O.DELIVERY_ID, COUNT(1) NoOfOrders
FROM ORDERS O WHERE FINISH_DATE IS NULL
GROUP BY O.DELIVERY_ID
) AS O ON A.DELIVERY_ID = O.DELIVERY_ID;
I haven't been able to test this query since I don't have a MySQL database set up right now, much less with this schema and sample data. But I think this will work for you:
select
c.courier_id
, d.delivery_id
, co.lat
, co.lng
, oc.cnt as remaining_orders
from
couriers c
left join (
select
d.delivery_id
, d.courier_id
from
deliveries d
inner join (
select
d.courier_id
, max(d.start_date) as start_date
from
deliveries d
group by
d.courier_id
) dmax on dmax.courier_id = d.courier_id and dmax.start_date = d.start_date
) d on d.courier_id = c.courier_id
left join (
select
c.delivery_id
, c.lat
, c.lng
from
coordinates c
inner join (
select
c.delivery_id
, max(c.date) as date
from
coordinates c
group by
c.delivery_id
) cmax on cmax.delivery_id = c.delivery_id and cmax.date = c.date
) co on co.delivery_id = d.delivery_id
left join (
select
o.delivery_id
, count(o.order_id) as cnt
from
orders o
where
o.finish_date is null
group by
o.delivery_id
) oc on oc.delivery_id = d.delivery_id
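Since the query above was posted untested, here is a rough way to try the same join-back-to-the-maximum approach on made-up rows, using Python's sqlite3 as a stand-in for MySQL. In this sketch the latest START_DATE is computed per courier (grouped by courier_id), and COALESCE is added so that couriers without open orders show 0; the sample data and the alias x are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE couriers (courier_id INTEGER PRIMARY KEY);
CREATE TABLE deliveries (delivery_id INTEGER PRIMARY KEY,
                         courier_id INTEGER, start_date TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                     delivery_id INTEGER, finish_date TEXT);
CREATE TABLE coordinates (delivery_id INTEGER, lat REAL, lng REAL, date TEXT);

-- Made-up data: courier 3 has no deliveries, delivery 20 has no coordinates.
INSERT INTO couriers VALUES (1), (2), (3);
INSERT INTO deliveries VALUES (10, 1, '2020-01-01'), (11, 1, '2020-01-05'),
                              (20, 2, '2020-01-03');
INSERT INTO coordinates VALUES (10, 9.0, 9.0, '2020-01-01 08:00:00'),
                               (11, 1.0, 2.0, '2020-01-05 09:00:00'),
                               (11, 1.5, 2.5, '2020-01-05 10:00:00');
INSERT INTO orders VALUES (100, 11, NULL), (101, 11, '2020-01-05'),
                          (102, 11, NULL), (103, 20, '2020-01-03');
""")

result = conn.execute("""
SELECT c.courier_id, d.delivery_id, co.lat, co.lng,
       COALESCE(oc.cnt, 0) AS remaining_orders
FROM couriers c
LEFT JOIN (
    -- latest delivery per courier
    SELECT d.delivery_id, d.courier_id
    FROM deliveries d
    INNER JOIN (SELECT courier_id, MAX(start_date) AS start_date
                FROM deliveries GROUP BY courier_id) dmax
       ON dmax.courier_id = d.courier_id AND dmax.start_date = d.start_date
) d ON d.courier_id = c.courier_id
LEFT JOIN (
    -- latest coordinate per delivery
    SELECT x.delivery_id, x.lat, x.lng
    FROM coordinates x
    INNER JOIN (SELECT delivery_id, MAX(date) AS date
                FROM coordinates GROUP BY delivery_id) cmax
       ON cmax.delivery_id = x.delivery_id AND cmax.date = x.date
) co ON co.delivery_id = d.delivery_id
LEFT JOIN (
    -- unfinished orders per delivery
    SELECT delivery_id, COUNT(order_id) AS cnt
    FROM orders WHERE finish_date IS NULL GROUP BY delivery_id
) oc ON oc.delivery_id = d.delivery_id
ORDER BY c.courier_id
""").fetchall()

for row in result:
    print(row)
```

If two deliveries of one courier share the same START_DATE (or two coordinates share the same DATE), this approach returns one row per tie, so a tie-breaker such as the id may be needed on real data.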

Unknown Column in 'IN/ALL/ANY' subquery

I've got 2 tables: members and member_logs.
Members can belong to groups, which are in the members table. Given a date range and a group I'm trying to figure out how to get the 10 days with the highest number of successful logins. What I have so far is a massive nest of subquery terror.
SELECT count(member_id) AS `num_users`,
DATE_FORMAT(`login_date`,'%Y-%m-%d') AS `reg_date`
FROM member_logs
WHERE `login_success` = 1
and `reg_date` IN
(SELECT DISTINCT DATE_FORMAT(`login_date`,'%Y-%m-%d') AS `reg_date`
FROM member_logs
WHERE `login_success` = 1
and (DATE_FORMAT(`login_date`,'%Y-%m-%d') BETWEEN '2012-02-25' and '2014-03-04'))
and `member_id` IN
(SELECT `member_id`
FROM members
WHERE `group_id` = 'XXXXXXX'
and `deleted` = 0)
ORDER BY `num_users` desc
LIMIT 0, 10
As far as I understand what is happening is that the WHERE clause is evaluating before the subqueries generate, and that I also should be using joins. If anyone can help me out or point me in the right direction that would be incredible.
EDIT: Limit was wrong, fixed it
The first subquery is totally unnecessary because you can filter by dates directly in the current table member_logs. I also prefer a JOIN for the second subquery. Then what you are missing is grouping by date (day).
A query like the following one (not tested) will do the job you want:
SELECT COUNT(ml.member_id) AS `num_users`,
DATE_FORMAT(`login_date`,'%Y-%m-%d') AS `reg_date`
FROM member_logs ml
INNER JOIN members m ON ml.member_id = m.member_id
WHERE `login_success` = 1
AND DATE_FORMAT(`login_date`,'%Y-%m-%d') BETWEEN '2012-02-25' AND '2014-03-04'
AND `group_id` = 'XXXXXXX'
AND `deleted` = 0
GROUP BY `reg_date`
ORDER BY `num_users` desc
LIMIT 10
SELECT count(member_id) AS `num_users`,
DATE_FORMAT(`login_date`,'%Y-%m-%d') AS `reg_date`
FROM member_logs
WHERE `login_success` = 1
and `login_date` IN
(SELECT `login_date`
FROM member_logs
WHERE `login_success` = 1
and (DATE_FORMAT(`login_date`,'%Y-%m-%d') BETWEEN '2012-02-25' and '2014-03-04'))
and `member_id` IN
(SELECT `member_id`
FROM members
WHERE `group_id` = 'XXXXXXX'
and `deleted` = 0)
Group by `login_date`
ORDER BY `num_users` desc
LIMIT 0, 10
Here is a slightly more index-friendly version of the previous answers.
To keep the query index friendly, avoid per-row calculations in the search conditions. This query removes the per-row string formatting of the date from the WHERE clause, so it should be faster when the date range eliminates many rows:
SELECT COUNT(*) num_users, DATE(login_date) reg_date
FROM member_logs JOIN members ON member_logs.member_id = members.member_id
WHERE login_success = 1 AND group_id = 'XXX' AND deleted = 0
AND login_date >= '2012-02-25'
AND login_date < DATE_ADD('2014-03-04', INTERVAL 1 DAY)
GROUP BY DATE(login_date)
ORDER BY num_users DESC
LIMIT 10
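The half-open upper bound (strictly less than the day after the end date) matters when login_date carries a time of day: a plain BETWEEN on the raw column stops at midnight of the last day. A small sketch of the difference, using Python's sqlite3 as a stand-in for MySQL with made-up rows; SQLite compares these values as strings and uses date(..., '+1 day') where MySQL would use DATE_ADD, but the boundary behaviour for this data is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE member_logs "
    "(member_id INTEGER, login_success INTEGER, login_date TEXT)"
)
# Made-up rows placed around both ends of the date range.
conn.executemany(
    "INSERT INTO member_logs VALUES (?, 1, ?)",
    [
        (1, "2012-02-24 23:59:59"),  # just before the range
        (2, "2012-02-25 00:00:00"),  # first instant of the range
        (3, "2014-03-04 15:30:00"),  # afternoon of the last day
        (4, "2014-03-05 00:00:00"),  # just after the range
    ],
)

# BETWEEN on the raw column: the upper bound is midnight of 2014-03-04.
between_ids = [r[0] for r in conn.execute(
    "SELECT member_id FROM member_logs "
    "WHERE login_date BETWEEN '2012-02-25' AND '2014-03-04' "
    "ORDER BY member_id"
)]

# Half-open range: everything before the start of the following day.
half_open_ids = [r[0] for r in conn.execute(
    "SELECT member_id FROM member_logs "
    "WHERE login_date >= '2012-02-25' "
    "AND login_date < date('2014-03-04', '+1 day') "
    "ORDER BY member_id"
)]

print(between_ids)    # [2]
print(half_open_ids)  # [2, 3]
```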

optimize mysql query: medal standings

I have 2 tables:
olympic_medalists with columns gold_country, silver_country, bronze_country
flags with country column
I want to list the olympic medal table accordingly. I have this query, it works, but it seems to kill mysql. Hope someone can help me with an optimized query.
SELECT DISTINCT country AS sc,
IFNULL(
(SELECT COUNT(silver_country)
FROM olympic_medalists
WHERE silver_country = sc AND silver_country != ''
GROUP BY silver_country),0) AS silver_medals,
IFNULL(
(SELECT COUNT(gold_country)
FROM olympic_medalists
WHERE gold_country = sc AND gold_country != ''
GROUP BY gold_country),0) AS gold_medals,
IFNULL(
(SELECT COUNT(bronze_country)
FROM olympic_medalists
WHERE bronze_country = sc AND bronze_country != ''
GROUP BY bronze_country),0) AS bronze_medals
FROM olympic_medalists, flags
GROUP BY country, gold_medals, silver_country, bronze_medals HAVING (
silver_medals >= 1 || gold_medals >= 1 || bronze_medals >= 1)
ORDER BY gold_medals DESC, silver_medals DESC, bronze_medals DESC,
SUM(gold_medals+silver_medals+bronze_medals)
result will be like:
country | g | s | b | tot
---------------------------------
country1 | 9 | 5 | 2 | 16
country2 | 5 | 5 | 5 | 15
and so on
Thanks!
olympic medalists:
`id` int(8) NOT NULL auto_increment,
`gold_country` varchar(64) collate utf8_unicode_ci default NULL,
`silver_country` varchar(64) collate utf8_unicode_ci default NULL,
`bronze_country` varchar(64) collate utf8_unicode_ci default NULL, PRIMARY KEY (`id`)
flags
`id` int(11) NOT NULL auto_increment,
`country` varchar(128) default NULL,
PRIMARY KEY (`id`)
This will be much more efficient than your current solution of executing three different SELECT subqueries for each row in a cross-joined relation (and you wonder why it stalls out!):
SELECT a.country,
COALESCE(b.cnt,0) AS g,
COALESCE(c.cnt,0) AS s,
COALESCE(d.cnt,0) AS b,
COALESCE(b.cnt,0) +
COALESCE(c.cnt,0) +
COALESCE(d.cnt,0) AS tot
FROM flags a
LEFT JOIN (
SELECT gold_country, COUNT(*) AS cnt
FROM olympic_medalists
GROUP BY gold_country
) b ON a.country = b.gold_country
LEFT JOIN (
SELECT silver_country, COUNT(*) AS cnt
FROM olympic_medalists
GROUP BY silver_country
) c ON a.country = c.silver_country
LEFT JOIN (
SELECT bronze_country, COUNT(*) AS cnt
FROM olympic_medalists
GROUP BY bronze_country
) d ON a.country = d.bronze_country
ORDER BY g DESC, s DESC, b DESC
What would be even faster is, instead of storing the textual country name in each of the gold, silver, and bronze columns, storing the integer country id. Comparisons on integers are generally faster than comparisons on strings.
Moreover, once you replace each country name in the olympic_medalists table with the corresponding ids, you'll want to create an index on each column (gold, silver, and bronze).
Updating the textual names to their corresponding ids is a simple task that could be done with a single UPDATE statement in conjunction with some ALTER TABLE commands.
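A rough sketch of what that migration could look like, run against Python's sqlite3 as a stand-in for MySQL. The *_country_id column names, the index names, and the sample rows are all assumptions, and in MySQL the ALTER TABLE syntax and the final cleanup of the old text columns would differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flags (id INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE olympic_medalists (
    id INTEGER PRIMARY KEY,
    gold_country TEXT, silver_country TEXT, bronze_country TEXT);

-- Made-up sample data.
INSERT INTO flags (id, country) VALUES (1, 'country1'), (2, 'country2');
INSERT INTO olympic_medalists (gold_country, silver_country, bronze_country)
VALUES ('country1', 'country2', 'country1'),
       ('country2', 'country1', NULL);

-- Hypothetical id columns added next to the existing text columns.
ALTER TABLE olympic_medalists ADD COLUMN gold_country_id INTEGER;
ALTER TABLE olympic_medalists ADD COLUMN silver_country_id INTEGER;
ALTER TABLE olympic_medalists ADD COLUMN bronze_country_id INTEGER;

-- One UPDATE fills all three id columns by looking each name up in flags.
UPDATE olympic_medalists SET
    gold_country_id = (SELECT f.id FROM flags f
                       WHERE f.country = olympic_medalists.gold_country),
    silver_country_id = (SELECT f.id FROM flags f
                         WHERE f.country = olympic_medalists.silver_country),
    bronze_country_id = (SELECT f.id FROM flags f
                         WHERE f.country = olympic_medalists.bronze_country);

-- One index per id column, as the answer recommends.
CREATE INDEX idx_medalists_gold ON olympic_medalists (gold_country_id);
CREATE INDEX idx_medalists_silver ON olympic_medalists (silver_country_id);
CREATE INDEX idx_medalists_bronze ON olympic_medalists (bronze_country_id);
""")

id_rows = conn.execute(
    "SELECT gold_country_id, silver_country_id, bronze_country_id "
    "FROM olympic_medalists ORDER BY id"
).fetchall()

for row in id_rows:
    print(row)
```

Names that have no match in flags end up as NULL ids, so it is worth checking for those before dropping the text columns.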
Try this:
SELECT F.COUNTRY,IFNULL(B.G,0) AS G,IFNULL(B.S,0) AS S,
IFNULL(B.B,0) AS B,IFNULL(B.G+B.S+B.B,0) AS TOTAL
FROM FLAGS F LEFT OUTER JOIN
(SELECT A.COUNTRY,
SUM(CASE WHEN MEDAL ='G' THEN 1 ELSE 0 END) AS G,
SUM(CASE WHEN MEDAL ='S' THEN 1 ELSE 0 END) AS S,
SUM(CASE WHEN MEDAL ='B' THEN 1 ELSE 0 END) AS B
FROM
(SELECT GOLD_COUNTRY AS COUNTRY,'G' AS MEDAL
FROM OLYMPIC_MEDALISTS WHERE GOLD_COUNTRY IS NOT NULL
UNION ALL
SELECT SILVER_COUNTRY AS COUNTRY,'S' AS MEDAL
FROM OLYMPIC_MEDALISTS WHERE SILVER_COUNTRY IS NOT NULL
UNION ALL
SELECT BRONZE_COUNTRY AS COUNTRY,'B' AS MEDAL
FROM OLYMPIC_MEDALISTS WHERE BRONZE_COUNTRY IS NOT NULL)A
GROUP BY A.COUNTRY)B
ON F.COUNTRY=B.COUNTRY
ORDER BY IFNULL(B.G,0) DESC,IFNULL(B.S,0) DESC,
IFNULL(B.B,0) DESC,IFNULL(B.G+B.S+B.B,0) DESC,F.COUNTRY