Need an efficient SQL query - MySQL

I have around 6 million rows in the table, and I am querying it with the query below.
SELECT * FROM FD_CPC_HISTORICAL_DATA WHERE id IN (SELECT MAX(id) FROM FD_CPC_HISTORICAL_DATA WHERE fb_ads_account_id=1462257067274960 AND created_at BETWEEN '2019-12-13 00:00:00' AND '2019-12-13 23:59:59' GROUP BY source_text) \G
I have created indexes on fb_ads_account_id, created_at, and source_text; id is the primary key.
My question is: why does this query take around 9 seconds to return the result even though I have created indexes?
Is there any other way to make this query more efficient?
The MySQL EXPLAIN output was attached as a screenshot.

This is your query:
SELECT hd.*
FROM FD_CPC_HISTORICAL_DATA hd
WHERE hd.id IN (SELECT MAX(hd2.id)
                FROM FD_CPC_HISTORICAL_DATA hd2
                WHERE hd2.fb_ads_account_id = 1462257067274960 AND
                      hd2.created_at >= '2019-12-13' AND
                      hd2.created_at < '2019-12-14'
                GROUP BY source_text
               );
I would recommend writing this as:
SELECT hd.*
FROM FD_CPC_HISTORICAL_DATA hd
WHERE hd.fb_ads_account_id = 1462257067274960 AND
      hd.id = (SELECT MAX(hd2.id)
               FROM FD_CPC_HISTORICAL_DATA hd2
               WHERE hd2.fb_ads_account_id = hd.fb_ads_account_id AND
                     hd2.source_text = hd.source_text AND
                     hd2.created_at >= '2019-12-13' AND
                     hd2.created_at < '2019-12-14'
              );
For this query, you want an index on FD_CPC_HISTORICAL_DATA(fb_ads_account_id, source_text, created_at).
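A minimal sketch of the corresponding DDL, assuming the index name is yours to choose:

ALTER TABLE FD_CPC_HISTORICAL_DATA
    ADD INDEX idx_acct_src_created (fb_ads_account_id, source_text, created_at);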

This query can probably be performed without a subquery against the same table, i.e.:
SELECT * FROM FD_CPC_HISTORICAL_DATA
WHERE fb_ads_account_id=1462257067274960
AND created_at BETWEEN '2019-12-13 00:00:00' AND '2019-12-13 23:59:59'
ORDER BY id DESC LIMIT 1
if you want the max id. Or something similar; I am not sure you need the GROUP BY to get the desired result.
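If one row per source_text group really is required and you are on MySQL 8.0 or later, a window function avoids the self-referencing subquery entirely. A sketch, assuming id order decides which row is "latest" per group:

SELECT *
FROM (
    SELECT hd.*,
           ROW_NUMBER() OVER (PARTITION BY source_text ORDER BY id DESC) AS rn
    FROM FD_CPC_HISTORICAL_DATA hd
    WHERE fb_ads_account_id = 1462257067274960
      AND created_at >= '2019-12-13'
      AND created_at < '2019-12-14'
) ranked
WHERE rn = 1;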

I think the index is exactly what you need. The part of the EXPLAIN that confuses me is the (guesstimated?) number of rows from the subquery being so different from the one in the primary query.
To be honest, I'm not very familiar with MySQL, but in MSSQL I would try first dumping the results of the subquery into a temporary table, putting a unique clustered index on it, and then selecting everything from the original table joined to that temporary table on the id column. (Don't use IN; use JOIN, as there can't be any duplicates in the temporary table.)
This might also show where all the time is being spent.
My guess is that this is mostly a statistics issue, but I don't really know how to force an update of the statistics on the index in MySQL.
(There is some talk about FLUSH TABLE in https://dzone.com/articles/updating-innodb-table-statistics-manually, but it seems to come with some downsides too; use with care.)
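For what it's worth, ANALYZE TABLE is the usual way to refresh InnoDB index statistics, and the temporary-table experiment above might translate to MySQL as in the sketch below (untested; table and column names are taken from the question):

ANALYZE TABLE FD_CPC_HISTORICAL_DATA;

CREATE TEMPORARY TABLE tmp_max_ids (PRIMARY KEY (id))
SELECT MAX(id) AS id
FROM FD_CPC_HISTORICAL_DATA
WHERE fb_ads_account_id = 1462257067274960
  AND created_at >= '2019-12-13'
  AND created_at < '2019-12-14'
GROUP BY source_text;

SELECT hd.*
FROM FD_CPC_HISTORICAL_DATA hd
JOIN tmp_max_ids t ON t.id = hd.id;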

SELECT f.*
FROM
( SELECT source_text, MAX(created_at) AS mx
    FROM FD_CPC_HISTORICAL_DATA
    WHERE fb_ads_account_id = 1462257067274960
      AND created_at >= '2019-12-13'
      AND created_at < '2019-12-13' + INTERVAL 1 DAY
    GROUP BY source_text
) AS x
JOIN FD_CPC_HISTORICAL_DATA AS f
    ON f.fb_ads_account_id = 1462257067274960
   AND f.source_text = x.source_text
   AND f.created_at = x.mx
Then you need this composite index:
INDEX(fb_ads_account_id, source_text, created_at) -- in this order
If this does not quite work because of duplicate entries with the same created_at, then a tweak may be possible.
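One possible tweak, assuming ties on created_at should be broken by keeping the row with the highest id (a sketch along the lines of the original query):

SELECT f.*
FROM
( SELECT MAX(id) AS id
    FROM FD_CPC_HISTORICAL_DATA
    WHERE fb_ads_account_id = 1462257067274960
      AND created_at >= '2019-12-13'
      AND created_at < '2019-12-13' + INTERVAL 1 DAY
    GROUP BY source_text
) AS x
JOIN FD_CPC_HISTORICAL_DATA AS f ON f.id = x.id;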

Will adding an index to a column improve the select query (without where) performance in SQL?

I have a MySQL table that contains 20,000,000 rows, with columns like user_id, registered_timestamp, etc. I wrote the query below to get a day-wise count of registered users, but it takes a long time to execute. Will adding an index to the registered_timestamp column improve the execution time?
select date(registered_timestamp), count(userid) from table group by 1
Consider using this query to get a list of dates and the number of registrations on each date.
SELECT date(registered_timestamp) date, COUNT(*)
FROM table
GROUP BY date(registered_timestamp)
Then an index on table(registered_timestamp) will help a little because it's a covering index.
If you adapt your query to return dates from a limited range, for example.
SELECT date(registered_timestamp) date, COUNT(*)
FROM table
WHERE registered_timestamp >= CURDATE() - INTERVAL 8 DAY
AND registered_timestamp < CURDATE()
GROUP BY date(registered_timestamp)
the index will help. (This query returns results for the week ending yesterday.) However, the index will not help this query.
SELECT date(registered_timestamp) date, COUNT(*)
FROM table
WHERE DATE(registered_timestamp) >= CURDATE() - INTERVAL 8 DAY /* slow! */
GROUP BY date(registered_timestamp)
because the function applied to the column makes the query non-sargable.
You can probably address this performance issue with a MySQL generated column. This command:
ALTER TABLE `table`
ADD registered_date DATE
GENERATED ALWAYS AS (DATE(registered_timestamp))
STORED;
Then you can add an index on the generated column:
CREATE INDEX regdate ON `table` ( registered_date );
Then you can use that generated (derived) column in your query, and get a lot of help from that index.
SELECT registered_date, COUNT(*)
FROM table
GROUP BY registered_date;
But beware, creating the generated column and its index will take a while.
select date(registered_timestamp), count(userid) from table group by 1
Would benefit from INDEX(registered_timestamp, userid), but only because such an index is "covering". The query will still need to read every row of the index and do a filesort.
If userid is the PRIMARY KEY, then this would give you the same answers without bothering to check each userid for being NOT NULL.
select date(registered_timestamp), count(*) from table group by 1
And INDEX(registered_timestamp) would be equivalent to the above suggestion. (This is because InnoDB implicitly tacks on the PK.)
If this query is common, then you could build and maintain a "summary table" that collects the count of each day's registrations every night. The query would then be a much faster fetch from that smaller table.
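A minimal sketch of such a summary table; the table and column names here are illustrative, not from the original question:

CREATE TABLE registrations_by_day (
    registered_date DATE PRIMARY KEY,
    cnt INT UNSIGNED NOT NULL
);

-- run nightly, collecting yesterday's registrations:
INSERT INTO registrations_by_day (registered_date, cnt)
SELECT DATE(registered_timestamp), COUNT(*)
FROM `table`
WHERE registered_timestamp >= CURDATE() - INTERVAL 1 DAY
  AND registered_timestamp < CURDATE()
GROUP BY DATE(registered_timestamp)
ON DUPLICATE KEY UPDATE cnt = VALUES(cnt);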

MySQL query very slow and not using proper index (GROUP BY, IN operator)

The query below takes 5+ seconds to execute (the table contains 1M+ records).
The outer query does not use a proper index; it always fetches data using a FULL table scan. Can someone help me optimize it?
Query
SELECT x
FROM UserCardXref x
WHERE x.userCardXrefId IN (
    SELECT MAX(y.userCardXrefId)
    FROM UserCardXref y
    WHERE y.usrId IN (1001, 1002)
    GROUP BY y.usrId
    HAVING COUNT(*) > 0
)
(The query EXPLAIN, query statistics, and execution plan were provided as screenshots.)
I would rewrite the query as:
select x.*
from UserCardXref x
join (
    select max(userCardXrefId) as userCardXrefId, usrId
    from UserCardXref
    where usrId in (1001, 1002)
    group by usrId
) y on x.userCardXrefId = y.userCardXrefId
The index you will need:
alter table UserCardXref add index userCardXrefId_idx(userCardXrefId)
usrId is already indexed as per the explain plan, so there is no need to add that.
Also, you have HAVING COUNT(*) > 0; since you are already using the MAX() function, a group can never have 0 rows, so I have removed it.
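A further thought, not from the original answer: if usrId is currently indexed on its own, a composite index would let the grouped subquery be resolved entirely from the index. A sketch (the index name is illustrative):

alter table UserCardXref add index usrId_xref_idx (usrId, userCardXrefId);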

Select / order by desc / limit query speed

I see a big difference in speed between these two queries; the first one runs in 0.3 seconds, and the second in 76 seconds.
The first query selects only the key, whereas the second selects an additional field, an int(11). I can substitute any other field for the second one with the same result. Why is selecting only the key so much faster?
Can anyone possibly explain the huge difference in speed? I'm stumped by this.
Q1:
SELECT ID
FROM TRMSMain.tblcalldata
WHERE (
CallStarted BETWEEN '2014/06/13' AND '2014/06/13 23:59:59')
ORDER BY ID DESC LIMIT 0 , 50
Q2:
SELECT ID, Chanid
FROM TRMSMain.tblcalldata
WHERE (
CallStarted BETWEEN '2014/06/13' AND '2014/06/13 23:59:59')
ORDER BY ID DESC LIMIT 0 , 50
Regards
I suppose you have an index on your CallStarted column and ID is the primary key. Your first query can do a so-called range scan on that index and retrieve the row identities quickly, because secondary indexes in InnoDB also include the primary key.
So it ends up doing just a little bit of work.
Your second query has to fetch data from the main table; in particular, it has to fetch the Chanid column. It then has to sort the whole mess, grab 50 values from the end, and discard the rest of the sort.
Do a deferred join to pick up the extra columns. That is, just sort the ID numbers, then grab the rest of the data you need from the table. That way you only have to grab 50 rows' worth of data.
Like so:
SELECT a.ID, a.Chanid, a.WhatEver, a.WhatElse
FROM TRMSMain.tblcalldata a
JOIN (
SELECT ID
FROM TRMSMain.tblcalldata
WHERE CallStarted BETWEEN '2014/06/13' AND '2014/06/13 23:59:59'
ORDER BY ID DESC
LIMIT 0, 50
) b ON a.ID = b.ID
ORDER BY a.ID DESC
You know the inner query is fast; you've proven that. The JOIN merely exploits that quickness (based on good use of indexes) to get the detail data for the rows it needs.
Pro tip: Avoid BETWEEN for date/time ranges, because as you know it handles the end of the range poorly. This will perform just as well and avoid the 59:59 nonsense.
WHERE CallStarted >= '2014/06/13'
AND CallStarted < '2014/06/13' + INTERVAL 1 DAY
It grabs records starting at midnight on June 13, and gets them all up to but not including (<) midnight on the next day.

How to improve SQL query performance in MySQL

I have a MySQL table which stores backup log entries from a large number of devices (currently about 750). I need a query that gets the details of the last entry for each device. I am currently using a nested query to achieve this, which worked fine initially, but the table now has thousands of rows and the query takes a long time to run. I would like to improve its performance, and I would like to know whether this is possible through the use of joins instead of a nested select statement, or through some other improvement I could make.
The current query is:
SELECT id, pcname, pcuser, checkTime as lastCheckTime,
TIMESTAMPDIFF(SECOND,checkTime,now()) as lastCheckAge,
lastBackup, diff, threshold, backupStatus,
TIMESTAMPDIFF(SECOND,lastBackup,now()) as ageNow
FROM checkresult
WHERE (checkTime, pcname) IN
(
SELECT max(checkTime), pcname
FROM checkresult
GROUP BY pcname
)
ORDER BY id desc;
id is the primary key for the table, and is the only indexed column.
The table uses InnoDB
Try using an explicit join instead:
SELECT id, checkresult.pcname, pcuser, checkTime as lastCheckTime,
TIMESTAMPDIFF(SECOND,checkTime,now()) as lastCheckAge,
lastBackup, diff, threshold, backupStatus,
TIMESTAMPDIFF(SECOND,lastBackup,now()) as ageNow
FROM checkresult join
(SELECT pcname, max(checkTime) as maxct
FROM checkresult
GROUP BY pcname
) pm
on checkresult.pcname = pm.pcname and checkresult.checkTime = pm.maxct
ORDER BY id desc;
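One more thought, not from the original answer: since id is the only indexed column, the GROUP BY subquery has to scan the whole table. A composite index covering the grouping and join columns should help considerably; a sketch (the index name is illustrative):

ALTER TABLE checkresult ADD INDEX pcname_checktime_idx (pcname, checkTime);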

SQL query: Delete all records from the table except latest N?

Is it possible to build a single MySQL query (without variables) to remove all records from the table except the latest N (sorted by id desc)?
Something like this, only it doesn't work :)
delete from table order by id ASC limit ((select count(*) from table ) - N)
Thanks.
You cannot delete the records that way; the main issue is that you cannot use a subquery to specify the value of a LIMIT clause.
This works (tested in MySQL 5.0.67):
DELETE FROM `table`
WHERE id NOT IN (
SELECT id
FROM (
SELECT id
FROM `table`
ORDER BY id DESC
LIMIT 42 -- keep this many records
) foo
);
The intermediate subquery is required. Without it we'd run into two errors:
SQL Error (1093): You can't specify target table 'table' for update in FROM clause - MySQL doesn't allow you to refer to the table you are deleting from within a direct subquery.
SQL Error (1235): This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery' - You can't use the LIMIT clause within a direct subquery of a NOT IN operator.
Fortunately, using an intermediate subquery allows us to bypass both of these limitations.
Nicole has pointed out this query can be optimised significantly for certain use cases (such as this one). I recommend reading that answer as well to see if it fits yours.
I know I'm resurrecting quite an old question, but I recently ran into this issue and needed something that scales well to large numbers. There wasn't any existing performance data, and since this question has had quite a bit of attention, I thought I'd post what I found.
The solutions that actually worked were Alex Barrett's double sub-query/NOT IN method (similar to Bill Karwin's) and Quassnoi's LEFT JOIN method.
Unfortunately both of the above methods create very large intermediate temporary tables and performance degrades quickly as the number of records not being deleted gets large.
What I settled on utilizes Alex Barrett's double sub-query (thanks!) but uses <= instead of NOT IN:
DELETE FROM `test_sandbox`
WHERE id <= (
SELECT id
FROM (
SELECT id
FROM `test_sandbox`
ORDER BY id DESC
LIMIT 1 OFFSET 42 -- keep this many records
) foo
);
It uses OFFSET to get the id of the Nth record and deletes that record and all previous records.
Since ordering is already an assumption of this problem (ORDER BY id DESC), <= is a perfect fit.
It is much faster, since the temporary table generated by the subquery contains just one record instead of N records.
Test case
I tested the three working methods and the new method above in two test cases.
Both test cases use 10000 existing rows, while the first test keeps 9000 (deletes the oldest 1000) and the second test keeps 50 (deletes the oldest 9950).
+-----------+------------------------+----------------------+
| | 10000 TOTAL, KEEP 9000 | 10000 TOTAL, KEEP 50 |
+-----------+------------------------+----------------------+
| NOT IN | 3.2542 seconds | 0.1629 seconds |
| NOT IN v2 | 4.5863 seconds | 0.1650 seconds |
| <=,OFFSET | 0.0204 seconds | 0.1076 seconds |
+-----------+------------------------+----------------------+
What's interesting is that the <= method sees better performance across the board, but actually gets better the more you keep, instead of worse.
Unfortunately for all the answers given by other folks, you can't DELETE and SELECT from a given table in the same query.
DELETE FROM mytable WHERE id NOT IN (SELECT MAX(id) FROM mytable);
ERROR 1093 (HY000): You can't specify target table 'mytable' for update in FROM clause
Nor can MySQL support LIMIT in a subquery. These are limitations of MySQL.
DELETE FROM mytable WHERE id NOT IN
(SELECT id FROM mytable ORDER BY id DESC LIMIT 1);
ERROR 1235 (42000): This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'
The best answer I can come up with is to do this in two stages:
SELECT id FROM mytable ORDER BY id DESC LIMIT n;
Collect the id's and make them into a comma-separated string:
DELETE FROM mytable WHERE id NOT IN ( ...comma-separated string... );
(Normally, interpolating a comma-separated list into an SQL statement introduces some risk of SQL injection, but in this case the values are not coming from an untrusted source; they are known to be integer values from the database itself.)
Note: though this doesn't get the job done in a single query, sometimes a simpler, get-it-done solution is the most effective.
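If you'd rather keep the two stages inside MySQL, they can be glued together with GROUP_CONCAT and a prepared statement. A hedged sketch, assuming the table is non-empty (and note that group_concat_max_len caps very long id lists):

SELECT GROUP_CONCAT(id) INTO @keep_ids
FROM (SELECT id FROM mytable ORDER BY id DESC LIMIT 10) t;
SET @sql = CONCAT('DELETE FROM mytable WHERE id NOT IN (', @keep_ids, ')');
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;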
DELETE i1.*
FROM items i1
LEFT JOIN
(
    SELECT id
    FROM items ii
    ORDER BY id DESC
    LIMIT 20
) i2
ON i1.id = i2.id
WHERE i2.id IS NULL
If your id is incremental then use something like:
delete from `table` where id < (select max(id) from `table`) - N
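Be aware that MySQL normally rejects a subquery against the DELETE's target table with error 1093, as noted above; wrapping it in a derived table works around that. A sketch, with N standing in for your keep count:

DELETE FROM `table`
WHERE id < (SELECT maxid - N FROM (SELECT MAX(id) AS maxid FROM `table`) t);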
To delete all the records except the last N you may use the query reported below.
It's a single query but with many statements, so it's actually not a single query the way it was intended in the original question.
You also need a variable and a built-in (in the query) prepared statement due to a bug in MySQL.
Hope it may be useful anyway...
nnn is the number of rows to keep and theTable is the table you're working on.
I'm assuming you have an auto-incrementing column named id.
SELECT @ROWS_TO_DELETE := COUNT(*) - nnn FROM `theTable`;
SELECT @ROWS_TO_DELETE := IF(@ROWS_TO_DELETE < 0, 0, @ROWS_TO_DELETE);
PREPARE STMT FROM "DELETE FROM `theTable` ORDER BY `id` ASC LIMIT ?";
EXECUTE STMT USING @ROWS_TO_DELETE;
The good thing about this approach is performance: I've tested the query on a local DB with about 13,000 records, keeping the last 1,000. It runs in 0.08 seconds.
The script from the accepted answer...
DELETE FROM `table`
WHERE id NOT IN (
SELECT id
FROM (
SELECT id
FROM `table`
ORDER BY id DESC
LIMIT 42 -- keep this many records
) foo
);
Takes 0.55 seconds, about 7 times more.
Test environment: MySQL 5.5.25 on a late-2011 i7 MacBook Pro with an SSD.
DELETE FROM table WHERE ID NOT IN
(SELECT MAX(ID) ID FROM table)
Try the query below:
DELETE FROM tablename WHERE id < (SELECT * FROM (SELECT MAX(id) - 10 FROM tablename) AS a)
The inner subquery computes MAX(id) - 10, and the outer query deletes every record with a smaller id, keeping the top 10 (assuming the ids have no gaps).
What about:
SELECT * FROM table del
LEFT JOIN table keep
ON del.id < keep.id
GROUP BY del.* HAVING count(*) > N;
It returns rows that have more than N newer rows ahead of them. Could this be useful?
Using id for this task is not an option in many cases, for example a table of twitter statuses. Here is a variant using a specified timestamp field.
delete from table
where access_time >=
(
select access_time from
(
select access_time from table
order by access_time limit 150000,1
) foo
)
Just wanted to throw this into the mix for anyone using Microsoft SQL Server instead of MySQL. The keyword LIMIT isn't supported by MSSQL, so you'll need to use an alternative. This code worked in SQL Server 2008 and is based on this SO post: https://stackoverflow.com/a/1104447/993856
-- Keep the last 10 most recent passwords for this user.
DECLARE @UserID int; SET @UserID = 1004
DECLARE @ThresholdID int -- Position of 10th password.

SELECT @ThresholdID = UserPasswordHistoryID FROM
(
    SELECT ROW_NUMBER()
           OVER (ORDER BY UserPasswordHistoryID DESC) AS RowNum, UserPasswordHistoryID
    FROM UserPasswordHistory
    WHERE UserID = @UserID
) sub
WHERE (RowNum = 10) -- Keep this many records.

DELETE UserPasswordHistory
WHERE (UserID = @UserID)
  AND (UserPasswordHistoryID < @ThresholdID)
Admittedly, this is not elegant. If you're able to optimize this for Microsoft SQL, please share your solution. Thanks!
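If a single-statement version is wanted, a CTE might tighten this up; an untested sketch, assuming SQL Server 2005 or later and the same @UserID variable as above:

WITH keep AS (
    SELECT TOP (10) UserPasswordHistoryID
    FROM UserPasswordHistory
    WHERE UserID = @UserID
    ORDER BY UserPasswordHistoryID DESC
)
DELETE FROM UserPasswordHistory
WHERE UserID = @UserID
  AND UserPasswordHistoryID NOT IN (SELECT UserPasswordHistoryID FROM keep);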
If you need to delete the records based on some other column as well, then here is a solution:
DELETE
FROM articles
WHERE id IN
(SELECT id
FROM
(SELECT id
FROM articles
WHERE user_id = :userId
ORDER BY created_at DESC LIMIT 500, 10000000) abc)
AND user_id = :userId
This should work as well, using MySQL's multi-table DELETE syntax; note the anti-join (LEFT JOIN ... IS NULL), so that the newest N rows are the ones kept:
DELETE t
FROM `table` t
LEFT JOIN (
    SELECT id
    FROM `table`
    ORDER BY id DESC
    LIMIT N -- keep this many records
) AS Temp ON t.id = Temp.id
WHERE Temp.id IS NULL
DELETE FROM table WHERE id NOT IN (
    SELECT id FROM (
        SELECT id FROM table ORDER BY id DESC LIMIT 0, 10
    ) keep
)
Stumbled across this and thought I'd update.
This is a modification of something that was posted before. I would have commented, but unfortunately I don't have 50 reputation...
LOCK TABLES TestTable WRITE, TestTable AS TestTableRead READ;

DELETE FROM TestTable
WHERE ID <= (
    SELECT ID
    FROM TestTable AS TestTableRead -- (the 'AS' declaration is required for some reason)
    ORDER BY ID DESC LIMIT 1 OFFSET 42 -- keep this many records
);

UNLOCK TABLES;
The use of WHERE and OFFSET circumvents the intermediate sub-query.
You also cannot read from and write to the same table in the same query, as you may modify entries while they are being used; the locks circumvent this, and they also make this safe for parallel access to the database by other processes.
For performance and further explanation, see the linked answer.
Tested with mysql Ver 15.1 Distrib 10.5.18-MariaDB
For further details on locks, see here
Why not:
DELETE FROM table ORDER BY id DESC LIMIT 1, 123456789
Just delete all but the first row (order is DESC!), using a very large number as the second LIMIT argument. See here
Answering this after a long time... I came across the same situation and, instead of using the answers mentioned above, came up with the below:
DELETE FROM table_name ORDER BY ID LIMIT 10
This will delete the first 10 (oldest) records and keep the latest ones.