One of my SQL queries is very slow. I need to COUNT on a table with a total of close to 300,000 records, but it takes 8 seconds for the query to return results.
SELECT oc_subject.*,
(SELECT COUNT(sid) FROM oc_details
WHERE DATE(oc_details.created) > DATE(NOW() - INTERVAL 1 DAY)
AND oc_details.sid = oc_subject.id) as totalDetails
FROM oc_subject
WHERE oc_subject.status='1'
ORDER BY created DESC LIMIT " . (int)$start . ", " . (int)$limit;
With the COUNT subquery: total 50 rows, query time 8.5837 seconds.
SELECT oc_subject.*
FROM oc_subject
WHERE oc_subject.status='1'
ORDER BY created DESC LIMIT 0, 50
Without the COUNT: total 50 rows, query time 0.0457 seconds.
Lots of improvements possible:
Firstly, let's talk about the outer query (the main SELECT) on the oc_subject table. This query can benefit from the ORDER BY optimization by using the composite index (status, created). So, define the following index (if not defined already):
ALTER TABLE oc_subject ADD INDEX (status, created);
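Once the index exists, you can verify that the outer query picks it up with EXPLAIN (a quick sanity check; the exact plan output varies by MySQL version):
EXPLAIN SELECT oc_subject.*
FROM oc_subject
WHERE oc_subject.status = '1'
ORDER BY created DESC LIMIT 0, 50;
-- the key column should show the (status, created) index,
-- and Extra should not contain "Using filesort"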
Secondly, your subquery to get the count is not sargable, because it applies the DATE() function to a column inside the WHERE clause. Because of this, it cannot use indexes properly.
Also, DATE(oc_details.created) > DATE(NOW() - INTERVAL 1 DAY) simply means that you want the details created on the current date (today). This can be written simply as oc_details.created >= CURRENT_DATE. The trick here is that even if the created column is of DATETIME type, MySQL will implicitly typecast the CURRENT_DATE value to CURRENT_DATE 00:00:00.
So change the inner subquery to as follows:
SELECT COUNT(sid)
FROM oc_details
WHERE oc_details.created >= CURRENT_DATE
AND oc_details.sid = oc_subject.id
Thirdly, all the improvements to the inner subquery will only be useful when you have a proper index defined on the oc_details table. So, define the following composite (and covering) index on oc_details: (sid, created). Note that the order of columns is important here because created is a range condition, so it should appear at the end. Define the following index (if not defined already):
ALTER TABLE oc_details ADD INDEX (sid, created);
Fourthly, in multi-table queries it is advisable to use aliasing, for code clarity (enhanced readability) and to avoid ambiguity.
So, once you have defined all the indexes (as discussed above), you can use the following query:
SELECT s.*,
(SELECT COUNT(d.sid)
FROM oc_details AS d
WHERE d.created >= CURRENT_DATE
AND d.sid = s.id) as totalDetails
FROM oc_subject AS s
WHERE s.status='1'
ORDER BY s.created DESC LIMIT " . (int)$start . ", " . (int)$limit;
Instead of several subqueries (one for each row in your oc_subject table), you could try a join on a single subquery grouped by sid and DATE():
SELECT oc_subject.*, b.count_sid
FROM oc_subject
INNER JOIN (
SELECT sid, DATE(oc_details.created) date_created, COUNT(sid) count_sid
FROM oc_details
GROUP BY sid, DATE(oc_details.created)
) b on b.sid = oc_subject.id
AND b.date_created > DATE(NOW() - INTERVAL 1 DAY)
WHERE oc_subject.status='1'
ORDER BY created DESC LIMIT " . (int)$start . ", " . (int)$limit;
Anyway, be careful using PHP variables for LIMIT; be sure to use sanitized values to avoid SQL injection.
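For reference, MySQL's server-side prepared statements accept placeholders for the LIMIT values, so the pagination can be parameterized at the SQL level too (a minimal sketch; an application would normally do this through its driver):
PREPARE stmt FROM
  'SELECT oc_subject.* FROM oc_subject
   WHERE oc_subject.status = ''1''
   ORDER BY created DESC LIMIT ?, ?';
SET @start = 0, @lim = 50;  -- @start and @lim are illustrative names
EXECUTE stmt USING @start, @lim;
DEALLOCATE PREPARE stmt;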
I have the following MySQL query that I think should be faster. The database table has 1 million records and the query takes 3.5 seconds:
set @numberofdayssinceexpiration = 1;
set @today = DATE(now());
set @start_position = (@pagenumber-1)* @pagesize;
SELECT *
FROM (SELECT ad.id,
title,
description,
startson,
expireson,
ad.appuserid UserId,
user.email UserName,
ExpiredCount.totalcount
FROM advertisement ad
LEFT JOIN (SELECT servicetypeid,
Count(*) AS TotalCount
FROM advertisement
WHERE Datediff(@today,expireson) = @numberofdayssinceexpiration
AND sendreminderafterexpiration = 1
GROUP BY servicetypeid) AS ExpiredCount
ON ExpiredCount.servicetypeid = ad.servicetypeid
LEFT JOIN aspnetusers user
ON user.id = ad.appuserid
WHERE Datediff(@today,expireson) = @numberofdayssinceexpiration
AND sendreminderafterexpiration = 1
ORDER BY ad.id) AS expiredAds
LIMIT 20 offset 1;
Here's the execution plan and here are the indexes defined on the table: [screenshots not included]
I wonder what I am doing wrong.
Thanks for any help
First, I would like to point out some problems. Then I will get into your Question.
LIMIT 20 OFFSET 1 gives you 20 rows starting with the second row.
The lack of an ORDER BY in the outer query may lead to an unpredictable ordering. In particular, the LIMIT and OFFSET can pick whatever rows they want. New versions will actually throw away the ORDER BY in the subquery.
DATEDIFF, being a function, makes that part of the WHERE not 'sargable'. That is, it can't use an INDEX. The usual way (which is sargable) to compare dates is (assuming expireson is of datatype DATE):
WHERE expireson >= CURDATE() - INTERVAL 1 DAY
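If expireson is actually a DATETIME rather than a DATE, a half-open range matches DATEDIFF(@today, expireson) = 1 exactly while staying sargable (a sketch under that assumption):
WHERE expireson >= CURDATE() - INTERVAL 1 DAY  -- start of yesterday
  AND expireson <  CURDATE()                   -- up to, not including, today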
Please qualify each column name. With that, I may be able to advise on optimal indexes.
Please provide SHOW CREATE TABLE so that we can see what column(s) are in each index.
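For example:
SHOW CREATE TABLE advertisement;
SHOW CREATE TABLE aspnetusers;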
This question is about using a column alias (created in the query) as a WHERE clause criterion.
There are 2 tables named transactions and transactionmovements.
The transactions table holds information unique to each transaction, like date, counterparty, etc.
The transactionmovements table holds the articles used in a transaction, such as product, quantity, price, etc. It also has a transaction column which references transactions.id and shows which transaction the movement belongs to.
In the query, I created a totalPrice value as the sum of quantity*price of each movement that belongs to a transaction.
Everything works perfectly except the last condition of the WHERE clause. If I delete the "AND totalPrice > 10" part, it gives me everything, including the totalPrice and totalQuantity of each transaction. But if I place "AND totalPrice > 10" at the end, it returns the following error:
#1054 - Unknown column 'totalPrice' in 'where clause'
SELECT
`transactions`.id,
`transactions`.type,
`transactions`.date,
`transactions`.VAT,
`transactions`.currency,
`companies`.name AS counterparty,
COALESCE(sum(`transactionmovements`.price*`transactionmovements`.quantity)
  +(`transactions`.shippingQuantity*`transactions`.shippingPrice)) as totalPrice,
COALESCE(sum(`transactionmovements`.quantity)) as totalQuantity
FROM
`transactions`
LEFT JOIN `companies` ON `transactions`.counterparty = `companies`.id
LEFT JOIN `transactionmovements` ON `transactions`.id=`transactionmovements`.transaction
WHERE ( `transactions`.type = 'p' OR `transactions`.type = 'r' OR `transactions`.type = 's' OR `transactions`.type = 't')
AND
(`transactions`.date BETWEEN IFNULL('','1900-01-01') AND IFNULL('2020-02-14',NOW()))
AND
totalPrice > 10
GROUP BY `transactions`.id
ORDER BY id desc
LIMIT 10
I tried using the whole math operation in the WHERE clause, but no gains. I tried to use HAVING with WHERE but couldn't manage it.
The last solution I have is running the query without filtering by totalPrice, storing the results into a PHP array, and then filtering the array; but there I can't use LIMIT, so the array will be very big.
SQL queries are evaluated in an order where the SELECT clause comes after WHERE, so you can't use an alias in the WHERE clause: it is unknown at that point. Therefore, you should replace every alias used outside of SELECT with its definition.
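Alternatively, since MySQL evaluates HAVING after GROUP BY, aliases of aggregated expressions are visible there. A sketch of the question's query with the filter moved into HAVING:
SELECT `transactions`.id, `transactions`.type, `transactions`.date,
       `transactions`.VAT, `transactions`.currency,
       `companies`.name AS counterparty,
       COALESCE(SUM(`transactionmovements`.price * `transactionmovements`.quantity)
                + (`transactions`.shippingQuantity * `transactions`.shippingPrice)) AS totalPrice,
       COALESCE(SUM(`transactionmovements`.quantity)) AS totalQuantity
FROM `transactions`
LEFT JOIN `companies` ON `transactions`.counterparty = `companies`.id
LEFT JOIN `transactionmovements` ON `transactions`.id = `transactionmovements`.transaction
WHERE `transactions`.type IN ('p', 'r', 's', 't')
  AND (`transactions`.date BETWEEN IFNULL('','1900-01-01') AND IFNULL('2020-02-14',NOW()))
GROUP BY `transactions`.id
HAVING totalPrice > 10   -- the alias is legal here, unlike in WHERE
ORDER BY id DESC
LIMIT 10;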
As Thomas Jeriko suggested in the comments, creating a view was exactly what I needed.
First, I created a virtual table with CREATE VIEW:
CREATE VIEW transactionreportview
AS SELECT
transactions.id id,
transactions.type type,
transactions.date date,
transactions.VAT VAT,
transactions.currency currency,
transactions.counterparty counterparty,
(SUM(transactionmovements.price*transactionmovements.quantity)+transactions.shippingQuantity*transactions.shippingPrice) totalPrice,
SUM(transactionmovements.quantity) totalQuantity
FROM transactions transactions, transactionmovements transactionmovements
WHERE transactions.id = transactionmovements.transaction
GROUP BY transactions.id;
Since the view acts like a simple MySQL table, I ran a new query against it:
SELECT * FROM transactionreportview WHERE ...
Then, after finishing my work with the view, I dropped it:
DROP VIEW transactionreportview
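With the view in place, totalPrice is an ordinary column, so the filter from the original question becomes straightforward (a sketch):
SELECT * FROM transactionreportview
WHERE totalPrice > 10
ORDER BY id DESC
LIMIT 10;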
This is my query, with its performance (from the slow_query_log):
SELECT j.`offer_id`, o.`offer_name`, j.`success_rate`
FROM
(
SELECT
t.`offer_id`,
(
SUM(CASE WHEN `offer_id` = t.`offer_id` AND `sales_status` = 'SUCCESS' THEN 1 ELSE 0 END) / COUNT(*)
) AS `success_rate`
FROM `tblSales` AS t
WHERE DATE(t.`sales_time`) = CURDATE()
GROUP BY t.`offer_id`
ORDER BY `success_rate` DESC
) AS j
LEFT JOIN `tblOffers` AS o
ON j.`offer_id` = o.`offer_id`
LIMIT 5;
# Time: 180113 18:51:19
# User@Host: root[root] @ localhost [127.0.0.1]  Id: 71
# Query_time: 10.472599  Lock_time: 0.001000  Rows_sent: 0  Rows_examined: 1156134
Here, tblOffers has all the offers listed, and tblSales contains all the sales. What I am trying to find is the top selling offers, based on the success rate (i.e., those sales which are SUCCESS).
The query works fine and provides the output I need, but it is a bit slow.
offer_id and sales_status are already indexed in tblSales. So do you have any suggestion for improving the inner query (where it calculates the success rate) so that performance improves? I have been playing with the math for more than two hours but couldn't find a better way.
By the way, tblSales has lots of data; it contains sales which are SUCCESSFUL, FAILED, PENDING, etc.
Thank you
EDIT
As requested, I am including the table design also (only relevant fields are included):
tblSales
`sales_id` bigint UNSIGNED NOT NULL AUTO_INCREMENT,
`offer_id` bigint UNSIGNED NOT NULL DEFAULT '0',
`sales_time` DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00',
`sales_status` ENUM('WAITING', 'SUCCESS', 'FAILED', 'CANCELLED') NOT NULL DEFAULT 'WAITING',
PRIMARY KEY (`sales_id`),
KEY (`offer_id`),
KEY (`sales_status`)
There are some other fields also in this table, that holds some other info. Amount, user_id, etc. which are not relevant for my question.
Numerous 'problems', none of which involve "math".
JOINs make things difficult. LEFT JOIN says "I don't care whether the row exists in the 'right' table." (I suspect you don't need LEFT?) But it also says "there may be multiple rows in the right table." Based on the column names, I will guess that there is only one offer_name for each offer_id. If this is correct, then here is my first recommendation. (This will convince the Optimizer that there is no issue with the JOIN.) Change from
SELECT ..., o.offer_name, ...
LEFT JOIN `tblOffers` AS o ON j.`offer_id` = o.`offer_id`
...
to
SELECT ...,
( SELECT offer_name FROM tbloffers WHERE offer_id = j.offer_id
) AS offer_name, ...
It also gets rid of a bug wherein you are assuming that the inner ORDER BY will be preserved for the LIMIT. This used to be the case, but in newer versions of MariaDB / MySQL, it is not. The ORDER BY in a "derived table" (your subquery) is now ignored.
2 down, a few more to go.
"Don't hide an indexed column in a function." I am referring to DATE(t.sales_time) = CURDATE(). Assuming you have no sales_time values for the 'future', then that test can be changed to t.sales_time >= CURDATE(). If you really need to restrict to just today, then do this:
AND sales_time >= CURDATE()
AND sales_time < CURDATE() + INTERVAL 1 DAY
The ORDER BY and the LIMIT should usually be put together. In your case, you may as well add the LIMIT to the "derived table", thereby leading to only 5 rows for the outer query to work with. But... There is still the question of getting them sorted correctly. So change from
SELECT ...
FROM ( SELECT ...
ORDER BY ... )
LIMIT ...
to
SELECT ...
FROM ( SELECT ...
ORDER BY ...
LIMIT 5 ) -- trim sooner
ORDER BY ... -- deal with the loss of ordering from derived table
Rolling it all together, I have
SELECT j.`offer_id`,
( SELECT offer_name
FROM tbloffers
WHERE offer_id = j.offer_id
) AS offer_name,
j.`success_rate`
FROM
( SELECT t.`offer_id`,
AVG(t.sales_status = 'SUCCESS') AS `success_rate`
FROM `tblSales` AS t
WHERE t.sales_time >= CURDATE()
GROUP BY t.`offer_id`
ORDER BY `success_rate` DESC
LIMIT 5
) AS j
ORDER BY `success_rate` DESC;
(I took the liberty of shortening the SUM(...) in two ways.)
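The shortening works because MySQL treats boolean expressions as 0/1, so the rate can be written either way; a small illustration:
SELECT offer_id,
       SUM(CASE WHEN sales_status = 'SUCCESS' THEN 1 ELSE 0 END) / COUNT(*) AS rate_verbose,
       AVG(sales_status = 'SUCCESS') AS rate_short  -- same value: the boolean is 0 or 1
FROM tblSales
GROUP BY offer_id;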
Now for the indexes...
tblSales needs at least (sales_time), but let's go for a "covering" (with sales_time specifically first):
INDEX(sales_time, sales_status, offer_id)
If tbloffers has PRIMARY KEY(offer_id), then no further index is worth adding. Else, add this covering index (in this order):
INDEX(offer_id, offer_name)
(Apologies to other Answerers; I stole some of your ideas.)
Approach this with a simple JOIN and GROUP BY:
SELECT s.offer_id, o.offer_name,
AVG(s.sales_status = 'SUCCESS') as success_rate
FROM tblSales s JOIN
tblOffers o
ON o.offer_id = s.offer_id
WHERE s.sales_time >= CURDATE() AND
s.sales_time < CURDATE() + INTERVAL 1 DAY
GROUP BY s.offer_id, o.offer_name
ORDER BY success_rate DESC;
Notes:
The use of date arithmetic allows the query to make use of an index on tblSales(sales_time) -- or better yet tblSales(sales_time, offer_id, sales_status); see the sketch after these notes.
The arithmetic for success_rate has been simplified -- although this has minimal impact on performance.
I added offer_name to the GROUP BY. If you are learning SQL, you should always have all the unaggregated keys in the GROUP BY clause.
A LEFT JOIN is only needed if you have offers in tblSales which are not in tblOffers. I am guessing you have proper foreign key relationships defined, and this is not the case.
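A sketch of the covering index from the first note (the index name is illustrative):
ALTER TABLE tblSales ADD INDEX idx_sales_time_cover (sales_time, offer_id, sales_status);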
Based on the limited information you have provided (I mean the table schema), you could try the following:
SELECT `o`.`offer_id`, `o`.`offer_name`, SUM(CASE WHEN `t`.`sales_status` = 'SUCCESS' THEN 1 ELSE 0 END) AS `success_rate`
FROM `tblOffers` `o`
INNER JOIN `tblSales` `t`
ON `o`.`offer_id` = `t`.`offer_id`
WHERE DATE(`t`.`sales_time`) = CURDATE()
GROUP BY `o`.`offer_id`
ORDER BY `success_rate` DESC
LIMIT 0,5;
You can find a sample of this query in this SQL Fiddle example
Without knowing your schema, the lowest-hanging fruit I see is this part:
WHERE DATE(t.`sales_time`) = CURDATE()
Try changing that to a sargable range (assuming sales_time is a DATETIME):
WHERE t.sales_time >= CURDATE()                   -- midnight at the start of today
  AND t.sales_time < CURDATE() + INTERVAL 1 DAY   -- before midnight tomorrow
I've run into some performance issues with my database structure, or better to say, with my query.
I have the following table:
http://sqlfiddle.com/#!9/348cb
And the following query, which fetches certain data and then checks whether there are other records matching my conditions; it's all in the query below.
It works as expected; the only reason I'm asking is to find out whether there is a way to increase its performance, or another way to get the same results.
As you can see, there are subquery SELECTs which check whether any other records contain the current row's data.
SELECT (
SELECT COUNT(*) FROM log AS LIKES
WHERE L.target_account=LIKES.target_account
AND LIKES.type='like'
) as liked,
(
SELECT COUNT(*) FROM log AS FOLLOW_BACK
WHERE L.target_account=FOLLOW_BACK.target_account
AND FOLLOW_BACK.type='follow_back'
) as follow_back,
(
SELECT COUNT(*) FROM log AS COMMENTS
WHERE L.target_account=COMMENTS.target_account
AND COMMENTS.type='comments'
) as comments,
L.*
FROM `log` as L
WHERE `L`.`information` = '".$target_name."'
AND `L`.`account_id` = '".$id."'
AND `L`.`date_ts` BETWEEN CURDATE() - INTERVAL ".$limit." DAY AND CURDATE()
This query takes too much time to fetch the data.
Thanks in advance.
You may be able to rewrite the query, depending on the relationship between target_account and account_id.
In the meantime, you want indexes. The two you want are instagram_log(target_account, type) and instagram_log(account_id, information, date_ts):
create index idx_instagram_log_1 on instagram_log(target_account, type);
create index idx_instagram_log_2 on instagram_log(account_id, information, date_ts);
SELECT SUM(LIKES) LIKES, SUM(FOLLOW_BACK) FOLLOW_BACK, SUM(COMMENTS) COMMENTS FROM
(
SELECT
CASE WHEN L.type='like' THEN 1 ELSE 0 END LIKES,
CASE WHEN L.type='follow_back' THEN 1 ELSE 0 END FOLLOW_BACK,
CASE WHEN L.type='comments' THEN 1 ELSE 0 END COMMENTS
FROM `log` as L
WHERE `L`.`information` = '".$target_name."'
AND `L`.`account_id` = '".$id."'
AND `L`.`date_ts` BETWEEN CURDATE() - INTERVAL ".$limit." DAY AND CURDATE()
)Z
Try the above query.
I have created an application to track progress in League of Legends for me and my friends. For this purpose, I collect information about the current rank several times a day into my MySQL database. To fetch the results and show them in a graph, I use the following queries:
SELECT
lol_summoner.name as name, grid.series + ? as timestamp,
AVG(NULLIF(lol.points, 0)) as points
FROM
series_tmp grid
JOIN
lol ON lol.timestamp >= grid.series AND lol.timestamp < grid.series + ?
JOIN
lol_summoner ON lol.summoner = lol_summoner.id
GROUP BY
lol_summoner.name, grid.series
ORDER BY
name, timestamp ASC
SELECT
lol_summoner.name as name, grid.series + ? as timestamp,
AVG(NULLIF(lol.points, 0)) as points
FROM
series_tmp grid
JOIN
lol ON lol.timestamp >= grid.series AND lol.timestamp < grid.series + ?
JOIN
lol_summoner ON lol.summoner = lol_summoner.id
WHERE
lol_summoner.name IN (". str_repeat('?, ', count($names) - 1) ."?)
GROUP BY
lol_summoner.name, grid.series
ORDER BY
name, timestamp ASC
The first query is used when I want to retrieve all players saved in the database. The grid table is a temporary table which contains generated timestamps in a specific interval, to retrieve information in chunks of this interval. The two placeholders in this query represent the interval. The second query is used if I want to retrieve information for specific players only.
The grid table is produced by the following stored procedure, which is called with three parameters (n_first - the first timestamp, n_last - the last timestamp, n_increment - the increment between two timestamps):
BEGIN
-- Create tmp table
DROP TEMPORARY TABLE IF EXISTS series_tmp;
CREATE TEMPORARY TABLE series_tmp (
series bigint
) engine = memory;
WHILE n_first <= n_last DO
-- Insert in tmp table
INSERT INTO series_tmp (series) VALUES (n_first);
-- Increment value by n_increment
SET n_first = n_first + n_increment;
END WHILE;
END
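For reference, assuming the procedure is named series_fill (a hypothetical name; the question does not give one), generating one day of hourly buckets from Unix timestamps might look like:
CALL series_fill(1514764800, 1514851200, 3600);  -- 2018-01-01 00:00 to 2018-01-02 00:00 UTC, 1-hour steps
SELECT * FROM series_tmp;                        -- 25 rows: 1514764800, 1514768400, ...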
The query works and finishes in reasonable time (~10 seconds) but I am thankful for any help to improve the query by either rewriting it or adding additional indexes to the database.
Edit:
After reviewing @Rick James's answer, I modified the queries as follows:
SELECT lol_summoner.name as name, (lol.timestamp div :range) * :range + :half_range as timestamp, AVG(NULLIF(lol.points, 0)) as points
FROM lol
JOIN lol_summoner ON lol.summoner = lol_summoner.id
GROUP by lol_summoner.name, lol.timestamp div :range
ORDER by name, timestamp ASC
SELECT lol_summoner.name as name, (lol.timestamp div :range) * :range + :half_range as timestamp, AVG(NULLIF(lol.points, 0)) as points
FROM lol
JOIN lol_summoner ON lol.summoner = lol_summoner.id
WHERE lol_summoner.name IN (<NAMES>)
GROUP by lol_summoner.name, lol.timestamp div :range
ORDER by name, timestamp ASC
This improves the query execution time by a really good margin (it finishes well under one second).
Problem 1 and Solution
You need a series of integers between two values? And they differ by 1? Or by some larger value?
First, create a permanent table of the numbers from 0 to some large enough value:
CREATE TABLE Num10 ( n INT );
INSERT INTO Num10 VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
CREATE TABLE Nums ( n INT, PRIMARY KEY(n))
SELECT a.n*1000 + b.n*100 + c.n*10 + d.n AS n
FROM Num10 AS a
JOIN Num10 AS b -- note "cross join"
JOIN Num10 AS c
JOIN Num10 AS d;
Now Nums has 0..9999. (Make it bigger if you might need more.)
To get a sequence of consecutive numbers from 123 through 234:
SELECT 123 + n FROM Nums WHERE n < 234-123+1;
To get a sequence of consecutive numbers from 12345 through 23456, in steps of 15:
SELECT 12345 + 15*n FROM Nums WHERE n < (23456-12345+1)/15;
JOIN to a SELECT like one of those instead of to series_tmp.
Barring other issues, that should significantly speed things up.
Problem 2
You are GROUPing BY series, but ORDERing by timestamp. They are related, so you might get the 'right' answer. But think about it.
Problem 3
You seem to be building "buckets" (called "series"?) from "timestamps". Is this correct? If so, let's work backwards -- Turn a "timestamp" into a "bucket" number:
bucket_number = (timestamp - start) / bucket_size
By doing that throughout, you can avoid 'Problem 1' and eliminate my solution to it. That is, reformulate the entire queries in terms of buckets.
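Concretely, the question's edit above already shows this pattern; with a start of 0 it reduces to something like the following sketch (3600 is an illustrative bucket size):
SELECT s.name,
       (l.timestamp DIV 3600) * 3600 AS bucket_start,  -- 3600 = 1-hour buckets
       AVG(NULLIF(l.points, 0)) AS points
FROM lol AS l
JOIN lol_summoner AS s ON l.summoner = s.id
GROUP BY s.name, l.timestamp DIV 3600
ORDER BY s.name, bucket_start;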