Join query taking too long - MySQL

I am trying to get the latest (maximum ID) record for each telephone_number where process_status = '0', and for that I am using the query below:
SELECT ID, CUSTID, telephone_number, TOTAL_USAGE, ACCOUNT_STATUS
FROM SPRINTABLE_DATA t
JOIN (SELECT MAX( id ) AS maxid FROM SPRINTABLE_DATA GROUP BY telephone_number)dt
ON t.id = dt.maxid WHERE process_status = '0'
AND RESET_FLAG = '0'
ORDER BY id DESC limit 0,700
The query above gives me the desired result, but the problem is that it is too slow.
My table has about 20 million rows, and this query sometimes takes 15-20 minutes.
What can be done to improve this?
This is the table structure:
CREATE TABLE `SPRINTABLE_DATA` (
`ID` bigint(11) NOT NULL AUTO_INCREMENT,
`CUSTID` int(11) DEFAULT NULL,
`telephone_number` varchar(20) DEFAULT NULL,
`TOTAL_USAGE` int(11) DEFAULT NULL,
`PROCESS_STATUS` tinyint(4) DEFAULT '0',
`RESET_FLAG` tinyint(4) DEFAULT '0',
`RESET_REASON` varchar(10) DEFAULT NULL,
`PLAN_ID` varchar(20) DEFAULT NULL,
`ACCOUNT_STATUS` varchar(30) DEFAULT NULL,
PRIMARY KEY (`ID`),
KEY `telephone_number` (`telephone_number`),
KEY `CALL_CUST` (`CALL_START_TIME`,`CUSTID`),
KEY `telephone_number1` (`telephone_number`,`PROCESS_STATUS`,`SOC_ADDED`),
KEY `CURRENT_USAGE` (`CURRENT_USAGE`),
KEY `TOTAL_USAGE` (`TOTAL_USAGE`)
) ENGINE=InnoDB AUTO_INCREMENT=36392272 DEFAULT CHARSET=latin1

It seems that you are looking for the 700 most recently called numbers. (If that is not correct, please edit your question.)
Your subquery follows a good practice for retrieving the latest log row for each item (each telephone number, in your case):
SELECT MAX( id ) AS id
FROM SPRINTABLE_DATA
GROUP BY telephone_number
To optimize the performance of this subquery, you need a compound index on two columns, (telephone_number, id), in that order. If you don't have that index, add it. It allows a so-called loose index scan, an extraordinarily efficient way of satisfying this kind of query.
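As a rough illustration of the subquery and the suggested (telephone_number, id) index, here is a sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The table is cut down to the two columns the subquery touches, and the rows and the index name idx_phone_id are made up. SQLite does not report a "loose index scan" the way MySQL's EXPLAIN does, so this only shows the index definition and the per-number result; check the real plan with EXPLAIN on MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SPRINTABLE_DATA (
        ID INTEGER PRIMARY KEY,
        telephone_number TEXT
    );
    -- Compound index: telephone_number first, then ID (hypothetical name).
    CREATE INDEX idx_phone_id ON SPRINTABLE_DATA (telephone_number, ID);
""")

# Hypothetical sample rows: (ID, telephone_number)
rows = [(1, "555-0100"), (2, "555-0101"), (3, "555-0100"),
        (4, "555-0102"), (5, "555-0101")]
conn.executemany(
    "INSERT INTO SPRINTABLE_DATA (ID, telephone_number) VALUES (?, ?)", rows)

# Groupwise max: the highest ID (latest row) for each telephone number.
latest = conn.execute("""
    SELECT telephone_number, MAX(ID) AS id
    FROM SPRINTABLE_DATA
    GROUP BY telephone_number
    ORDER BY telephone_number
""").fetchall()
print(latest)  # [('555-0100', 3), ('555-0101', 5), ('555-0102', 4)]
```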
Secondly, you're looking for (I presume) a small subset of your data. Presumably you have plenty more than 700 distinct telephone_number values. This means you're sorting a lot of data with ORDER BY only to discard it with LIMIT. So, let's do a deferred join, sorting a minimal number of columns, and then retrieving all the information you need.
Here's how to get the ID values of the 700 rows you need:
SELECT q.ID /* get our 700 records */
FROM SPRINTABLE_DATA q
JOIN (
SELECT MAX( id ) AS id
FROM SPRINTABLE_DATA
GROUP BY telephone_number
) r ON q.id = r.id
WHERE q.process_status = '0'
AND q.RESET_FLAG = '0'
ORDER BY q.ID DESC
LIMIT 0,700
This pulls out 700 ID values. You will need to experiment with indexes to find out what helps most. It's possible that an index on
process_status, RESET_FLAG, id
will help. It's also possible that changing the order of columns in the index will help, like this:
id, process_status, RESET_FLAG
Try them both.
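Finally, we'll use this as a subquery to carry out the join (the so-called deferred join) that fetches the actual detail records. This technique avoids sorting all those wide rows: only the ID column is sorted before LIMIT discards most of the candidates, and the outer sort handles just 700 rows.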
SELECT t.ID, t.CUSTID, t.telephone_number, t.TOTAL_USAGE, t.ACCOUNT_STATUS
FROM SPRINTABLE_DATA t
JOIN (
SELECT q.ID /* get our 700 records */
FROM SPRINTABLE_DATA q
JOIN (
SELECT MAX( id ) AS id
FROM SPRINTABLE_DATA
GROUP BY telephone_number
) r ON q.id = r.id
WHERE q.process_status = '0'
AND q.RESET_FLAG = '0'
ORDER BY q.ID DESC
LIMIT 0,700
) s ON t.ID = s.ID
ORDER BY t.ID DESC
This will yield the same results, but will be faster.
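To check that the deferred join returns the same rows as the original query, the sketch below runs both against a small made-up table, again with sqlite3 standing in for MySQL. LIMIT 700 is shrunk to LIMIT 2 (an assumption for the toy data) so the limit actually cuts rows off; it demonstrates equivalence, not speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE SPRINTABLE_DATA (
        ID INTEGER PRIMARY KEY, CUSTID INTEGER, telephone_number TEXT,
        TOTAL_USAGE INTEGER, ACCOUNT_STATUS TEXT,
        PROCESS_STATUS INTEGER, RESET_FLAG INTEGER)
""")
conn.executemany("INSERT INTO SPRINTABLE_DATA VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (1, 10, "A", 5, "active", 0, 0),
    (2, 11, "B", 7, "active", 0, 0),
    (3, 10, "A", 9, "active", 0, 0),
    (4, 12, "C", 1, "suspended", 1, 0),  # latest row for C is already processed
    (5, 11, "B", 3, "active", 0, 0),
    (6, 13, "D", 2, "active", 0, 0),
])

LATEST = "SELECT MAX(ID) AS id FROM SPRINTABLE_DATA GROUP BY telephone_number"

# Original form: join, filter, then sort and limit the full rows.
original = conn.execute(f"""
    SELECT t.ID, t.CUSTID, t.telephone_number, t.TOTAL_USAGE, t.ACCOUNT_STATUS
    FROM SPRINTABLE_DATA t JOIN ({LATEST}) dt ON t.ID = dt.id
    WHERE t.PROCESS_STATUS = 0 AND t.RESET_FLAG = 0
    ORDER BY t.ID DESC LIMIT 2
""").fetchall()

# Deferred join: sort and limit only the IDs, then fetch the detail columns.
deferred = conn.execute(f"""
    SELECT t.ID, t.CUSTID, t.telephone_number, t.TOTAL_USAGE, t.ACCOUNT_STATUS
    FROM SPRINTABLE_DATA t
    JOIN (SELECT q.ID
          FROM SPRINTABLE_DATA q JOIN ({LATEST}) r ON q.ID = r.id
          WHERE q.PROCESS_STATUS = 0 AND q.RESET_FLAG = 0
          ORDER BY q.ID DESC LIMIT 2) s ON t.ID = s.ID
    ORDER BY t.ID DESC
""").fetchall()

print(original == deferred)          # True
print([row[0] for row in deferred])  # [6, 5]
```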
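Finally, if it's acceptable to select the latest qualifying call for each of 700 numbers that meet your criteria, you can simplify this query a lot. This changes your result set in a subtle way, though: the query above takes each number's latest row and then drops it if it doesn't match the status filters, while the simplified version applies the filters first and then takes the latest matching row for each number. In that case your call-selection subquery will look like this: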
SELECT MAX( id ) AS id /* 700 matching numbers */
FROM SPRINTABLE_DATA
WHERE process_status = '0'
AND reset_flag = '0'
GROUP BY telephone_number
ORDER BY ID desc
LIMIT 0,700
With a compound covering index on
reset_flag, process_status, telephone_number, ID
this query will be quite fast. Your final query in this case would be:
SELECT t.ID, t.CUSTID, t.telephone_number, t.TOTAL_USAGE, t.ACCOUNT_STATUS
FROM SPRINTABLE_DATA t
JOIN (
SELECT MAX( id ) AS id /* 700 matching numbers */
FROM SPRINTABLE_DATA
WHERE process_status = '0'
AND reset_flag = '0'
GROUP BY telephone_number
ORDER BY ID desc
LIMIT 0,700
) s ON t.ID = s.ID
ORDER BY t.ID DESC
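The subtle difference in the result set can be seen with a toy example (sqlite3 as a stand-in for MySQL, hypothetical data): number "A" has an older unprocessed row and a newer processed one. Taking the maximum first drops "A" entirely, while filtering first returns A's older unprocessed row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE SPRINTABLE_DATA (
    ID INTEGER PRIMARY KEY, telephone_number TEXT,
    process_status INTEGER, reset_flag INTEGER)""")
conn.executemany("INSERT INTO SPRINTABLE_DATA VALUES (?, ?, ?, ?)", [
    (1, "A", 0, 0),
    (2, "B", 0, 0),
    (3, "A", 1, 0),  # A's newest row has already been processed
])

# Original semantics: latest row per number first, then apply the filters.
max_then_filter = [r[0] for r in conn.execute("""
    SELECT q.ID FROM SPRINTABLE_DATA q
    JOIN (SELECT MAX(ID) AS id FROM SPRINTABLE_DATA
          GROUP BY telephone_number) r ON q.ID = r.id
    WHERE q.process_status = 0 AND q.reset_flag = 0
    ORDER BY q.ID DESC LIMIT 700""")]

# Simplified query: apply the filters first, then take the latest match per number.
filter_then_max = [r[0] for r in conn.execute("""
    SELECT MAX(ID) AS latest_id FROM SPRINTABLE_DATA
    WHERE process_status = 0 AND reset_flag = 0
    GROUP BY telephone_number
    ORDER BY latest_id DESC LIMIT 700""")]

print(max_then_filter)  # [2]
print(filter_then_max)  # [2, 1]
```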

I made a slight modification to your query:
SELECT ID, CUSTID, telephone_number, TOTAL_USAGE, ACCOUNT_STATUS
FROM SPRINTABLE_DATA t
JOIN (SELECT telephone_number,MAX( id ) AS maxid FROM SPRINTABLE_DATA GROUP BY telephone_number)dt
ON t.id = dt.maxid WHERE process_status = '0'
AND RESET_FLAG = '0'
ORDER BY id DESC limit 0,700
Add these indexes if they are not there already:
ALTER TABLE SPRINTABLE_DATA ADD KEY (telephone_number,id);
ALTER TABLE SPRINTABLE_DATA ADD KEY (process_status,reset_flag,id);
Another option, which may be faster, is a correlated subquery that keeps a row only if it is the newest one for its telephone_number:
SELECT ID, CUSTID, telephone_number, TOTAL_USAGE, ACCOUNT_STATUS
FROM SPRINTABLE_DATA t
WHERE t.process_status = '0'
AND t.RESET_FLAG = '0'
AND t.id = (SELECT MAX(tt.id) FROM SPRINTABLE_DATA tt WHERE tt.telephone_number = t.telephone_number)
ORDER BY id DESC limit 0,700
For this, the (telephone_number, id) index above serves the correlated lookup, and the (process_status, reset_flag, id) index may help the outer filter and sort.
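A tempting shortcut is EXISTS (SELECT MAX(id) ... WHERE t.id = tt.id ...), but an aggregate like MAX() with no GROUP BY always returns exactly one row (NULL when nothing matches), so that EXISTS is true for every row and filters nothing. The sketch below (sqlite3 standing in for MySQL, made-up rows, hypothetical helper ids()) shows this, and checks that a correlated per-number MAX filter agrees with the join-based query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE SPRINTABLE_DATA (
    ID INTEGER PRIMARY KEY, telephone_number TEXT,
    process_status INTEGER, reset_flag INTEGER)""")
conn.executemany("INSERT INTO SPRINTABLE_DATA VALUES (?, ?, ?, ?)", [
    (1, "A", 0, 0), (2, "B", 0, 0), (3, "A", 1, 0), (4, "C", 0, 1)])

def ids(sql):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql)]

# Aggregate without GROUP BY: one row even when the WHERE matches nothing.
empty_max = conn.execute(
    "SELECT MAX(ID) FROM SPRINTABLE_DATA WHERE 1 = 0").fetchall()
print(empty_max)  # [(None,)]

# EXISTS over that aggregate is therefore always true: every row comes back.
exists_max = ids("""
    SELECT ID FROM SPRINTABLE_DATA t WHERE EXISTS (
        SELECT MAX(ID) FROM SPRINTABLE_DATA tt
        WHERE t.ID = tt.ID AND tt.process_status = 0 AND tt.reset_flag = 0)
    ORDER BY ID DESC""")

# Correlated filter: keep a row only if it is the newest for its number.
correlated = ids("""
    SELECT ID FROM SPRINTABLE_DATA t
    WHERE t.process_status = 0 AND t.reset_flag = 0
      AND t.ID = (SELECT MAX(tt.ID) FROM SPRINTABLE_DATA tt
                  WHERE tt.telephone_number = t.telephone_number)
    ORDER BY ID DESC""")

# Join-based reference result.
joined = ids("""
    SELECT t.ID FROM SPRINTABLE_DATA t
    JOIN (SELECT MAX(ID) AS maxid FROM SPRINTABLE_DATA
          GROUP BY telephone_number) dt ON t.ID = dt.maxid
    WHERE t.process_status = 0 AND t.reset_flag = 0
    ORDER BY t.ID DESC""")

print(exists_max)            # [4, 3, 2, 1]
print(correlated == joined)  # True
```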

Related

MySQL different order by result between inner join query and exists query

I have two tables in the database:
User table
has columns (name, name_ar, ...)
User Profile table
has columns (user_id, office_id, address, mobile, ...)
The relationship between the two tables is one-to-one.
Now, I'm trying to filter users by their office and order them by name_ar.
I tried two different queries to do this and expected the same result from both, but the rows come back in a different order.
SELECT
`id`, `name_ar`
FROM
`users`
WHERE EXISTS
(
SELECT
*
FROM
`user_profiles`
WHERE
`users`.`id` = `user_profiles`.`user_id` AND `office_id` = 1
) AND(
`group` = "doctor" AND `state` = "active"
) AND `users`.`deleted_at` IS NULL
ORDER BY
`name_ar` IS NULL, `name_ar` ASC
SELECT
`u`.`id`,
`name_ar`
FROM
`users` u
INNER JOIN `user_profiles` up ON
`u`.`id` = `up`.`user_id`
WHERE
`group` = "doctor" AND `state` = "active" AND `up`.`office_id` = 1
ORDER BY
`name_ar` IS NULL, `name_ar` ASC
The two results do not have the same order from the point where NULL values appear in the name_ar column (from exactly the fifth row on, the order differs between the two results). Can anyone explain why this happens? Is it because of the NULL values or some other reason?
The 1st condition of the ORDER BY clause:
`name_ar` IS NULL
sends all nulls to the end of the results.
The 2nd:
`name_ar` ASC
sorts the non-null names alphabetically, but the NULL names at the end all compare as equal, so there is no defined order among them; each query plan is free to return them in a different order.
What you can do is add another final condition, like:
`id` ASC
so that all the NULLs (and duplicate names, if any) are sorted by id:
ORDER BY `name_ar` IS NULL, `name_ar`, `id`
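A small sketch of the tie-breaker, using sqlite3 in place of MySQL and made-up rows. It cannot show the nondeterminism itself (which depends on the execution plan), only that with id as the last ORDER BY term every row's position is fixed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name_ar TEXT)")
# Hypothetical data: two NULL names and a duplicated name ("b").
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "b"), (2, None), (3, "a"), (4, None), (5, "b")])

# Without the final `id` term, rows 1/5 and 2/4 are ties whose relative
# order the engine may choose freely. Adding `id` fixes every position.
order = [row[0] for row in conn.execute(
    "SELECT id FROM users ORDER BY name_ar IS NULL, name_ar, id")]
print(order)  # [3, 1, 5, 2, 4]
```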

MySQL 5.7.25 group by sub-query and order by "nonaggregated column" error

In short, I'm trying to order a dataset by date and then group by another column, thus selecting the latest row for each value.
Query:
SELECT name, datetime
FROM (
SELECT *
FROM `requests`
ORDER BY datetime
) a
GROUP BY a.name;
Error:
#1055 - Expression #2 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'a.datetime' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
Example table:
CREATE TABLE `requests` (
`id` int(8) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(10) DEFAULT NULL,
`datetime` datetime DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=latin1
The goal is to prevent this error from happening without having to change the default sql-mode.
After reading more about group by and only_full_group_by, I currently do not understand why the sub-query is affecting the outer query.
The query is written in accordance with https://stackoverflow.com/a/16307932/3852461
You should not use GROUP BY without an aggregate function such as SUM() or MIN().
Use DISTINCT if you want a distinct result:
SELECT distinct name, datetime
FROM (
SELECT *
FROM `requests`
ORDER BY datetime
) a
But if you need a single row per name, then you should use an aggregate function for datetime, e.g.:
SELECT name, max(datetime)
FROM (
SELECT *
FROM `requests`
ORDER BY datetime
) a
group by name
If we want to return the latest datetime for each distinct value of name, the normative pattern would be:
SELECT t.name
, MAX(t.datetime) AS latest_datetime
FROM requests t
GROUP
BY t.name
ORDER
BY ...
If the (name,datetime) tuple is guaranteed to be unique, we can retrieve the row with the latest time by joining the result of the query above back to the table:
SELECT r.id
, r.name
, r.datetime
FROM ( SELECT t.name
, MAX(t.datetime) AS latest_datetime
FROM requests t
GROUP
BY t.name
) s
JOIN requests r
ON r.name <=> s.name
AND r.datetime <=> s.latest_datetime
ORDER
BY ...
If the (name,datetime) tuple is not unique, then the query above could potentially return multiple rows with the same values of name and datetime. There are approaches to handling that; given the definition of the requests table, the simplest would be to wrap the id column in an aggregate and add a GROUP BY clause to the outer query:
SELECT MIN(r.id) AS id
, r.name
, r.datetime
FROM ( SELECT t.name
, MAX(t.datetime) AS latest_datetime
FROM requests t
GROUP
BY t.name
) s
JOIN requests r
ON r.name <=> s.name
AND r.datetime <=> s.latest_datetime
GROUP
BY r.name
, r.datetime
ORDER
BY ...
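The join-back pattern with the MIN(id) de-duplication can be tried out in sqlite3 (standing in for MySQL). One assumption to note: SQLite has no <=> operator, so its null-safe equivalent IS is used instead; the sample rows are invented, including a tie and a NULL name, and the elided ORDER BY is filled with MIN(r.id) purely for a deterministic demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (id INTEGER PRIMARY KEY, name TEXT, datetime TEXT)")
conn.executemany("INSERT INTO requests VALUES (?, ?, ?)", [
    (1, "x", "2019-01-01 10:00:00"),
    (2, "x", "2019-01-02 09:00:00"),
    (3, "y", "2019-01-01 08:00:00"),
    (4, "x", "2019-01-02 09:00:00"),   # ties with id 2 for the latest "x"
    (5, None, "2019-01-03 00:00:00"),  # NULL name, matched via IS
])

latest = conn.execute("""
    SELECT MIN(r.id) AS id, r.name, r.datetime
    FROM (SELECT t.name, MAX(t.datetime) AS latest_datetime
          FROM requests t
          GROUP BY t.name) s
    JOIN requests r
      ON r.name IS s.name                -- MySQL: r.name <=> s.name
     AND r.datetime IS s.latest_datetime -- MySQL: r.datetime <=> s.latest_datetime
    GROUP BY r.name, r.datetime
    ORDER BY MIN(r.id)
""").fetchall()
print(latest)
# [(2, 'x', '2019-01-02 09:00:00'), (3, 'y', '2019-01-01 08:00:00'),
#  (5, None, '2019-01-03 00:00:00')]
```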
https://www.db-fiddle.com/f/b2EAh6UiVyEdNVbEKbUEcQ/0
SELECT r.name, r.datetime
FROM `requests` r
LEFT JOIN `requests` r2
ON r.name = r2.name
AND r.datetime < r2.datetime
WHERE r2.name IS NULL;
Or just a regular GROUP BY:
SELECT r.name, MAX(r.datetime)
FROM `requests` r
GROUP BY r.name;
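For comparison, a sketch (sqlite3 in place of MySQL, invented rows) running the LEFT JOIN anti-join and the plain GROUP BY side by side. They agree on which (name, datetime) pairs are latest, but when two rows tie for the latest datetime the anti-join returns both, whereas GROUP BY returns a single row per name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (id INTEGER PRIMARY KEY, name TEXT, datetime TEXT)")
conn.executemany("INSERT INTO requests VALUES (?, ?, ?)", [
    (1, "x", "2019-01-01 10:00:00"),
    (2, "x", "2019-01-02 09:00:00"),
    (3, "y", "2019-01-01 08:00:00"),
    (4, "y", "2019-01-01 08:00:00"),  # exact tie for the latest "y"
])

# Anti-join: keep r only if no row r2 with the same name is strictly later.
anti_join = sorted(conn.execute("""
    SELECT r.name, r.datetime
    FROM requests r
    LEFT JOIN requests r2
      ON r.name = r2.name AND r.datetime < r2.datetime
    WHERE r2.name IS NULL""").fetchall())

group_by = sorted(conn.execute(
    "SELECT name, MAX(datetime) FROM requests GROUP BY name").fetchall())

print(group_by)   # [('x', '2019-01-02 09:00:00'), ('y', '2019-01-01 08:00:00')]
print(anti_join)  # same pairs, but the 'y' pair appears twice because of the tie
```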

Why is this query really slow with 70k+ rows?

First of all, this is my table structure:
CREATE TABLE IF NOT EXISTS `site_forum_comments` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`forum_id` int(11) NOT NULL,
`user_id` int(11) NOT NULL,
`data` int(11) NOT NULL,
`comment` longtext NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;
Before I imported my backup, the table had only 10-15 rows. I built a ranking system based on the number of comments, and this query was working flawlessly:
SELECT u.id, u.username, COUNT(f.id) AS rank
FROM site_users AS u
LEFT JOIN site_forum_comments AS f ON (f.user_id = u.id)
GROUP BY u.id
ORDER BY rank DESC
LIMIT :l
But now, with more than 70k rows inserted, the script won't even load and just crashes the server.
What have I possibly done wrong? Is this problem about the query specifically or is it the table structure?
Thanks in advance, cheers!
This is your query:
SELECT u.id, u.username, COUNT(f.id) AS rank
FROM site_users u LEFT JOIN
site_forum_comments f
ON f.user_id = u.id
GROUP BY u.id
ORDER BY rank DESC
LIMIT :l
Because you are choosing the highest-ranked users, you can probably use an inner join rather than an outer join. In any case, this version doesn't have a great many optimization opportunities. But you do need an index on site_forum_comments(user_id, id).
You might get better performance with the same index and a correlated subquery:
SELECT u.id, u.username,
(SELECT COUNT(*)
FROM site_forum_comments f
WHERE f.user_id = u.id
) as rank
FROM site_users u
ORDER BY rank DESC
LIMIT :l;
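The two forms can be compared on a toy dataset; the sketch below uses sqlite3 as a stand-in for MySQL, invented users and comments, a hypothetical index name, a fixed value in place of :l, and an extra u.id tie-breaker in ORDER BY (not in the original) so the outputs are deterministic. It only checks that both forms rank users identically; which one is faster has to be measured on MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE site_users (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE site_forum_comments (
        id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, comment TEXT);
    -- The recommended index: each per-user count becomes an index range scan.
    CREATE INDEX idx_comments_user ON site_forum_comments (user_id, id);
""")
conn.executemany("INSERT INTO site_users VALUES (?, ?)",
                 [(1, "alice"), (2, "bob"), (3, "carol")])
conn.executemany("INSERT INTO site_forum_comments (user_id, comment) VALUES (?, ?)",
                 [(1, "hi"), (1, "again"), (2, "a"), (2, "b"), (2, "c")])

limit = 3  # stands in for the :l parameter

joined = conn.execute("""
    SELECT u.id, u.username, COUNT(f.id) AS rank
    FROM site_users u LEFT JOIN site_forum_comments f ON f.user_id = u.id
    GROUP BY u.id, u.username
    ORDER BY rank DESC, u.id
    LIMIT ?""", (limit,)).fetchall()

correlated = conn.execute("""
    SELECT u.id, u.username,
           (SELECT COUNT(*) FROM site_forum_comments f
            WHERE f.user_id = u.id) AS rank
    FROM site_users u
    ORDER BY rank DESC, u.id
    LIMIT ?""", (limit,)).fetchall()

print(joined == correlated)  # True
print(correlated)            # [(2, 'bob', 3), (1, 'alice', 2), (3, 'carol', 0)]
```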
You are currently joining all users to their comments without an index on the user_id column; that's slow.
The following query selects the highest-ranked user first and joins only that one user to the site_users table (using the index on site_users.id), so it should be faster.
SELECT site_users.id, site_users.username, a.rank
FROM (
SELECT user_id, COUNT(*) as rank
FROM site_forum_comments
GROUP BY user_id
ORDER BY rank DESC
LIMIT 1
) AS a
LEFT JOIN site_users ON a.user_id = site_users.id
Note that with this query, users with a rank of 0 (no comments) are never returned.
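A quick sketch of that caveat, using sqlite3 as a stand-in for MySQL and invented data: the ranking is computed from site_forum_comments alone, so a user with no comments can never appear, and with no comments at all the query returns nothing. The index name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE site_users (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE site_forum_comments (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL);
    CREATE INDEX idx_comments_user ON site_forum_comments (user_id);
""")
conn.executemany("INSERT INTO site_users VALUES (?, ?)", [(1, "alice"), (2, "bob")])

# Aggregate the comments first, keep the top user, then look up the name.
TOP_USER = """
    SELECT site_users.id, site_users.username, a.rank
    FROM (SELECT user_id, COUNT(*) AS rank
          FROM site_forum_comments
          GROUP BY user_id
          ORDER BY rank DESC
          LIMIT 1) AS a
    LEFT JOIN site_users ON a.user_id = site_users.id"""

before = conn.execute(TOP_USER).fetchall()  # no comments yet: []

conn.executemany("INSERT INTO site_forum_comments (user_id) VALUES (?)",
                 [(1,), (2,), (2,)])
after = conn.execute(TOP_USER).fetchall()   # [(2, 'bob', 2)]
print(before, after)
```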

mysql explain slow where on left joined table

I'm playing with MySQL and thinking about how to solve one thing in the future. I want to retrieve statuses that are posted by my friends (specific user IDs) or are posted inside the groups I follow.
CREATE TABLE `status` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`status` text COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
KEY `IDX_F23501207E3C61F9` (`user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1567559 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
CREATE TABLE `group_status` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`group_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `IDX_F23501207E3C61F9` (`group_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1000001 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
I filled both tables with 1M rows.
The query I am running:
SELECT s.id, s.status, gs.group_id
FROM status s
LEFT JOIN group_status gs
ON s.id = gs.id
WHERE
s.user_id IN (55883,122024,442468,846269,903941,980896,192660,20608,525056,563457)
OR gs.group_id IN (78,79,79,80,80,83,84,85,86,87,88,89,89,91,92,92,94,98)
ORDER BY s.id DESC
LIMIT 15
The result:
Question one:
Shouldn't the Extra column say "Using index" instead of "Using where"?
Question two:
Why is the response time so slow? 2,3s
Edit after Tim's answer:
I guess the filesort behaviour is normal when using UNION, no?
Why is there 'Using where' in the second row of the EXPLAIN output, when the third row shows 'Using where; Using index'?
At how many returned rows per SELECT would you expect this to get slow?
The UNION query seems to be super fast, but currently each SELECT returns only a few rows. I will try selecting more rows in each SELECT.
Where you have an "OR" on different columns, MySQL may use none of your indexes.
Usually we can solve the problem with a UNION of two separate queries, each matching one of the criteria.
SELECT id, status, group_id FROM
(
SELECT s.id, s.status, gs.group_id
FROM status s
LEFT JOIN group_status gs
ON s.id = gs.id
WHERE
s.user_id IN (55883,122024,442468,846269,903941,980896,192660,20608,525056,563457)
UNION
SELECT s.id, s.status, gs.group_id
FROM status s
LEFT JOIN group_status gs
ON s.id = gs.id
WHERE
gs.group_id IN (78,79,79,80,80,83,84,85,86,87,88,89,89,91,92,92,94,98)
) t
ORDER BY id DESC
LIMIT 15
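A sketch (sqlite3 standing in for MySQL, hypothetical data and index names) confirming that the UNION rewrite returns the same rows as the OR query. Because UNION removes duplicates, a status that matches both criteria, like id 6 here, still appears only once; UNION ALL would return it twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE status (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, status TEXT NOT NULL);
    CREATE INDEX idx_status_user ON status (user_id);
    CREATE TABLE group_status (id INTEGER PRIMARY KEY, group_id INTEGER NOT NULL);
    CREATE INDEX idx_group_status_group ON group_status (group_id);
""")
# Hypothetical data: user 1 is a "friend", group 80 is "followed".
conn.executemany("INSERT INTO status VALUES (?, ?, ?)",
                 [(i, u, f"post {i}") for i, u in
                  [(1, 1), (2, 2), (3, 3), (4, 1), (5, 4), (6, 1)]])
conn.executemany("INSERT INTO group_status VALUES (?, ?)",
                 [(2, 80), (3, 99), (5, 80), (6, 80)])  # status 6 matches both criteria

with_or = conn.execute("""
    SELECT s.id, s.status, gs.group_id
    FROM status s LEFT JOIN group_status gs ON s.id = gs.id
    WHERE s.user_id IN (1) OR gs.group_id IN (80)
    ORDER BY s.id DESC LIMIT 15""").fetchall()

with_union = conn.execute("""
    SELECT id, status, group_id FROM (
        SELECT s.id, s.status, gs.group_id
        FROM status s LEFT JOIN group_status gs ON s.id = gs.id
        WHERE s.user_id IN (1)
        UNION  -- not UNION ALL: a row matching both branches is kept once
        SELECT s.id, s.status, gs.group_id
        FROM status s LEFT JOIN group_status gs ON s.id = gs.id
        WHERE gs.group_id IN (80)
    ) t
    ORDER BY id DESC LIMIT 15""").fetchall()

print(with_or == with_union)           # True
print([row[0] for row in with_union])  # [6, 5, 4, 2, 1]
```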
However, in your case, this may NOT help if either query returns a large number of records.
Your status column is defined as TEXT, which may cause the filesort. You can change it to a long VARCHAR to see if the filesort goes away. Or try this to avoid the worst-case scenario; it sorts only the narrow id and group_id columns and fetches the status text afterwards:
SELECT ss.id, group_id, ss.status
FROM (
SELECT id, group_id FROM
(
SELECT s.id, gs.group_id
FROM status s
LEFT JOIN group_status gs
ON s.id = gs.id
WHERE
s.user_id IN (55883,122024,442468,846269,903941,980896,192660,20608,525056,563457)
UNION
SELECT s.id, gs.group_id
FROM status s
LEFT JOIN group_status gs
ON s.id = gs.id
WHERE
gs.group_id IN (78,79,79,80,80,83,84,85,86,87,88,89,89,91,92,92,94,98)
) t
ORDER BY id DESC
LIMIT 15
) f
JOIN status ss
ON f.id =ss.id
ORDER BY ss.id DESC
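The same idea as a sketch, with sqlite3 standing in for MySQL: the UNION branches carry only id and group_id, the ORDER BY ... LIMIT runs on those narrow rows, and the long status text is joined back afterwards. The data, the padded status text, and the smaller LIMIT 5 are made up for illustration; the final ORDER BY uses DESC so the page stays newest-first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE status (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, status TEXT NOT NULL);
    CREATE TABLE group_status (id INTEGER PRIMARY KEY, group_id INTEGER NOT NULL);
""")
# Hypothetical data: odd ids posted by user 1 (a friend), even ids by user 2;
# statuses 2, 6, 10, 14, 18 belong to followed group 80. Long text mimics a wide column.
conn.executemany("INSERT INTO status VALUES (?, ?, ?)",
                 [(i, 1 if i % 2 else 2, f"status {i} " + "x" * 1000) for i in range(1, 21)])
conn.executemany("INSERT INTO group_status VALUES (?, ?)",
                 [(i, 80) for i in range(2, 21, 4)])

NARROW = "SELECT s.id, gs.group_id FROM status s LEFT JOIN group_status gs ON s.id = gs.id"

# Sort and limit only (id, group_id), then join back for the wide text column.
deferred = conn.execute(f"""
    SELECT ss.id, f.group_id, ss.status
    FROM (SELECT id, group_id FROM (
              {NARROW} WHERE s.user_id IN (1)
              UNION
              {NARROW} WHERE gs.group_id IN (80)) t
          ORDER BY id DESC LIMIT 5) f
    JOIN status ss ON f.id = ss.id
    ORDER BY ss.id DESC""").fetchall()

# Reference: the original OR query, sorting full rows.
direct = conn.execute("""
    SELECT s.id, gs.group_id, s.status
    FROM status s LEFT JOIN group_status gs ON s.id = gs.id
    WHERE s.user_id IN (1) OR gs.group_id IN (80)
    ORDER BY s.id DESC LIMIT 5""").fetchall()

print([row[0] for row in deferred])  # [19, 18, 17, 15, 14]
print(deferred == direct)            # True
```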

Optimising a working MYSQL statement

Background
I have a table of "users", a table of "content", and a table of "content_likes". When a user "likes" an item of content, a relation is added to "content_likes". Simple.
Now what I am trying to do is order content based on the number of likes it has received. This is relatively easy, however, I only want to retrieve 10 items at a time and then with a lazy load I am retrieving the next 10 items and so forth. If the select was ordered by time it would be easy to do the offset in the select statement, however, due to the ordering by number of "likes" I need another column I can offset by. So I've added a "rank" column to the result set, then on the next call of 10 items I can offset by this.
This query WORKS and does what I need it to do. However, I am concerned about performance. Could anyone advise on optimising this query, or suggest a better way of doing it?
DB SCHEMA
CREATE TABLE `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8
CREATE TABLE `content` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`owner_id` int(11) NOT NULL,
`added` int(11) NOT NULL,
`deleted` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8
CREATE TABLE `content_likes` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`content_id` int(11) NOT NULL,
`user_id` int(11) NOT NULL,
`added` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8
*columns omitted for simplicity
Breakdown of query
group content_id in content_likes relations table, and order by likes desc
add a column "rank" (or row number) to result set and order by this
join "content" table so that any content with a deleted flag can be omitted
only return results where "rank" (or row number) is greater than variable
limit result set to 10
THE MYSQL
SELECT
results.content_id, results.likes, results.rank
FROM
(
SELECT
t1.content_id, t1.likes, @rn:=@rn+1 AS rank
FROM
(
SELECT
cl.content_id,
COUNT(cl.content_id) AS likes
FROM
content_likes cl
GROUP BY
cl.content_id
ORDER BY
likes DESC,
added DESC
) t1, (SELECT @rn:=0) t2
ORDER BY
rank ASC
) results
LEFT JOIN
content c
ON
(c.id = results.content_id)
WHERE
c.deleted <> 1
AND
results.rank > :lastRank
LIMIT
10
MYSQL ALTERNATIVE
SELECT
*
FROM
(
SELECT
results.*, @rn:=@rn+1 AS rank
FROM
(
SELECT
c.id, cl.likes
FROM
content c
INNER JOIN
(SELECT content_id, COUNT(content_id) AS likes FROM content_likes GROUP BY content_id ORDER BY likes DESC, added DESC) cl
ON
c.id = cl.content_id
WHERE
c.deleted <> 1
AND
c.added > :timeago
LIMIT
100
) results, (SELECT @rn:=0) t2
) final
WHERE
final.rank > :lastRank
LIMIT
5
The "Alternative" MySQL query also works as I would like. Content is ordered by the number of likes from users, and I can offset by passing in the last row number. What I have attempted to do here is limit the result sets so that if and when the tables get large, performance isn't hindered too badly. In this example only content from within a time span is considered, and at most 100 rows are returned. Then I can offset by the row number (lazy load/pagination).
Any help or advice always appreciated. I am relatively a newbie to mysql so be kind :)
You can eliminate the subquery:
SELECT results.content_id, results.likes, results.rank
FROM (SELECT cl.content_id, COUNT(cl.content_id) AS likes, @rn:=@rn+1 AS rank
FROM content_likes cl cross join
(SELECT @rn:=0) t2
GROUP BY cl.content_id
ORDER BY likes DESC, added DESC
) results LEFT JOIN
content c
ON c.id = results.content_id
WHERE c.deleted <> 1 AND
results.rank > :lastRank
LIMIT 10;
However, I don't think that will have an appreciable effect on performance. What you should probably do is store the last number of likes and the last "added" value and use these to filter the data. The query needs a little fixing up, because added is not unambiguously defined in the ORDER BY clause:
SELECT results.content_id, results.likes, results.rank, results.added
FROM (SELECT cl.content_id, COUNT(cl.content_id) AS likes, MAX(cl.added) as added, @rn:=@rn+1 AS rank
FROM content_likes cl cross join
(SELECT @rn := :lastRank) t2
GROUP BY cl.content_id
HAVING COUNT(cl.content_id) < :likes OR
(COUNT(cl.content_id) = :likes AND MAX(cl.added) < :added)
ORDER BY likes DESC, added DESC
) results LEFT JOIN
content c
ON c.id = results.content_id
WHERE c.deleted <> 1 AND
results.rank > :lastRank
LIMIT 10;
This will at least reduce the number of rows that need to be sorted.
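The "store the last likes and added values" idea is a form of keyset (seek) pagination. Below is a minimal sketch of it with sqlite3 standing in for MySQL. It is not the answer's exact query: it drops the @rn variable entirely, filters deleted content with an inner join before grouping, and adds content_id as a final tie-breaker (an assumption) so that two items with the same likes and timestamp cannot be skipped or repeated between pages. PAGE, fetch_page, the index name and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE content (id INTEGER PRIMARY KEY, deleted INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE content_likes (id INTEGER PRIMARY KEY, content_id INTEGER NOT NULL,
                                user_id INTEGER NOT NULL, added INTEGER NOT NULL);
    CREATE INDEX idx_likes_content ON content_likes (content_id, added);
""")
conn.executemany("INSERT INTO content VALUES (?, ?)",
                 [(1, 0), (2, 0), (3, 1), (4, 0), (5, 0)])  # content 3 is deleted
# Hypothetical likes: (content_id, user_id, added timestamp)
conn.executemany("INSERT INTO content_likes (content_id, user_id, added) VALUES (?, ?, ?)", [
    (1, 10, 100), (1, 11, 105), (1, 12, 110),  # 3 likes, last at 110
    (2, 10, 120), (2, 11, 130),                # 2 likes, last at 130
    (3, 10, 140), (3, 11, 141), (3, 12, 142),  # deleted content, never shown
    (4, 10, 115), (4, 11, 125),                # 2 likes, last at 125
    (5, 12, 150),                              # 1 like
])

# Rows strictly "after" the previous page's last (likes, last_added, content_id) key.
PAGE = """
    SELECT cl.content_id, COUNT(*) AS likes, MAX(cl.added) AS last_added
    FROM content_likes cl
    JOIN content c ON c.id = cl.content_id AND c.deleted <> 1
    GROUP BY cl.content_id
    HAVING COUNT(*) < :likes
        OR (COUNT(*) = :likes AND MAX(cl.added) < :added)
        OR (COUNT(*) = :likes AND MAX(cl.added) = :added AND cl.content_id < :id)
    ORDER BY likes DESC, last_added DESC, cl.content_id DESC
    LIMIT :size"""

def fetch_page(after=None, size=2):
    """Return the next page; `after` is the previous page's last row, None for the first page."""
    content_id, likes, added = after if after else (0, 2**62, 0)  # 2**62 exceeds any real count
    return conn.execute(PAGE, {"likes": likes, "added": added,
                               "id": content_id, "size": size}).fetchall()

pages, last = [], None
while True:
    page = fetch_page(last)
    if not page:
        break
    pages.append(page)
    last = page[-1]
print(pages)  # [[(1, 3, 110), (2, 2, 130)], [(4, 2, 125), (5, 1, 150)]]
```

Each call needs only the last row of the previous page, so the client no longer has to track a rank counter.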