How would the following three queries compare in terms of performance? I'm trying to get all records with year=2017:
Using EXTRACT:
SELECT count(*), completed_by_id FROM table
WHERE EXTRACT(YEAR FROM completed_on)=2017
GROUP BY completed_by_id
# Took 11.8s
Using YEAR:
SELECT count(*), completed_by_id FROM table
WHERE YEAR(completed_on)=2017
GROUP BY completed_by_id
# Took 5.15s
Using LIKE 'YEAR%':
SELECT count(*), completed_by_id FROM table
WHERE completed_on LIKE '2017%'
GROUP BY completed_by_id
# Took 6.61s
Note: In my own testing I found YEAR() to be the fastest, LIKE to be the second fastest, and EXTRACT() to be the slowest.
There are about 5M rows in the table and completed_on is a DATETIME field that has been indexed.
You haven't described your table or indexes so all advice about query performance is guesswork.
If your completed_on column is a DATETIME, DATE, or TIMESTAMP type and it is indexed, this query will radically outperform all the ones you have shown, and maintain its performance as your table grows.
SELECT count(*), completed_by_id
FROM table
WHERE completed_on >= '2017-01-01'
AND completed_on < '2017-01-01' + INTERVAL 1 YEAR
GROUP BY completed_by_id
Why? It can do a range scan on the index rather than a nonsargable function call on each row's value.
Notice the use of >= at the beginning of the date range and < at the end. We want to include all rows from the first moment of New Year's Day 2017, up to but not including the first moment of New Year's Day 2018. BETWEEN can't do this, because it uses <= rather than < at the end of its range.
If an index is in place, both BETWEEN and the syntax I have shown use a range scan, and perform about the same.
For best results speeding up this query use a compound index on (completed_on, completed_by_id).
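As a rough sketch of how to check this yourself: the statements below assume the table is called tasks (a hypothetical name, since the question only calls it table) and use a made-up index name. They create the compound index and then compare the two plans with EXPLAIN. On an indexed column you would typically expect the range form to show an access type of range and the YEAR() form to show a full table or full index scan, but the optimizer's choice depends on your data and MySQL version.

```sql
-- Hypothetical table name `tasks` and index name; column names are from the question.
CREATE INDEX idx_completed_on_by ON tasks (completed_on, completed_by_id);

-- Sargable: the bare column is compared against constants.
EXPLAIN
SELECT COUNT(*), completed_by_id
FROM tasks
WHERE completed_on >= '2017-01-01'
  AND completed_on <  '2017-01-01' + INTERVAL 1 YEAR
GROUP BY completed_by_id;

-- Non-sargable: YEAR() has to be computed for every row before comparing.
EXPLAIN
SELECT COUNT(*), completed_by_id
FROM tasks
WHERE YEAR(completed_on) = 2017
GROUP BY completed_by_id;
```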
If you are storing completed_on as DATE or DATETIME you can use:
SELECT count(*) as cnt, LEFT(completed_on, 4) AS year
FROM table
GROUP BY year
HAVING year=2017
Related
I have written a query to select all rows where the value of column 'gvA' is 0 in the previous row and non-zero in the current row. My issue is that this query takes too long to execute.
My table has 40000 rows and the query takes about 60-65 seconds, which is too much. How can I improve the query's performance? Here is my query:
SELECT device_no,datetime
FROM (
SELECT
gvA,
(SELECT e2.gvA
FROM tyn_records e2
WHERE e2.tyn_id < e1.tyn_id
ORDER BY tyn_id DESC LIMIT 1) as previous_value,
datetime,
device_no
FROM tyn_records e1
WHERE gvA > 0 AND DATE(datetime) = CURDATE() - INTERVAL 2 DAY
) selected
WHERE selected.previous_value = 0
Following are my tables
Devices:
tyn_records:
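I would do two things:
First, rephrase the query: remove the DATE() function from the left side of the filtering condition, and replace the correlated subquery with LAG() (a window function, available in MySQL 8.0 and later). The gva > 0 and date filters have to sit in the outer query. Window functions are evaluated after WHERE, so if those filters were inside the subquery, LAG() would only ever see rows with gva > 0 and previous_value could never be 0.
select
device_no,
datetime
from (
select
gva,
lag(gva) over(order by tyn_id) as previous_value,
datetime,
device_no
from tyn_records
) x
where gva > 0
and previous_value = 0
and datetime >= curdate() - interval 2 day
and datetime < curdate() - interval 1 day
Second, with the function removed from the left side of the predicate, the date condition becomes sargable, so you can create an index that serves queries filtering tyn_records by datetime:
create index ix1 on tyn_records (datetime, gva);
Because LAG() here still has to read every row in tyn_id order, the main gain for this particular query likely comes from replacing the per-row correlated subquery with a single pass over the table.
As a side note, the way you compute previous_value may not be deterministic, and could produce different results each time you run the query. This may happen if the column tyn_id is not unique.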
I have seen several questions on SO and improved my SQL query based on them.
But it sometimes takes 12 seconds and sometimes 3 seconds to execute, so the best case is about 3 seconds. The query looks like this:
SELECT ANALYSIS.DEPARTMENT_ID
,SCORE.ID
,SCORE.KPI_ SCORE.R_SCORE
,SCORE.FACTOR_SCORE
,SCORE.FACTOR_SCORE
,SCORE.FACTOR_SCORE
,SCORE.CREATED_DATE
,SCORE.UPDATED_DATE
FROM SCORE_INDICATOR SCORE
,AG_SENTIMENT ANALYSIS
WHERE SCORE.TAG_ID = ANALYSIS.ID
AND ANALYSIS.ORGANIZATION_ID = 1
AND ANALYSIS.DEPARTMENT_ID IN (1,2,3,4,5)
AND DATE (ANALYSIS.REVIEW_DATE) BETWEEN DATE ('2016-05-02') AND DATE ('2017-05-02')
ORDER BY ANALYSIS.DEPARTMENT_ID
Now, one table, SCORE_INDICATOR, has 19345116 rows and the other has 19057025 rows in total. I added an index on ORGANIZATION_ID, one on DEPARTMENT_ID, and another on the combination of ORGANIZATION_ID and DEPARTMENT_ID. Is there any other way to improve it, or is this the best I can achieve with this amount of data?
Here is a checklist:
1) Make sure the logs table (ANALYSIS) uses the MyISAM engine (it's fast for OLAP queries).
2) Make sure that you've indexed the ANALYSIS.REVIEW_DATE field.
3) Make sure that ANALYSIS.REVIEW_DATE is of type DATE (not CHAR or VARCHAR).
4) Change the query (rearrange the query plan):
SELECT
ANALYSIS.DEPARTMENT_ID
,SCORE.ID
,SCORE.KPI_ SCORE.R_SCORE
,SCORE.FACTOR_SCORE
,SCORE.FACTOR_SCORE
,SCORE.FACTOR_SCORE
,SCORE.CREATED_DATE
,SCORE.UPDATED_DATE
FROM SCORE_INDICATOR SCORE
,AG_SENTIMENT ANALYSIS
WHERE
SCORE.TAG_ID = ANALYSIS.ID
AND
ANALYSIS.REVIEW_DATE >= '2016-05-02' AND ANALYSIS.REVIEW_DATE < '2017-05-03'
AND
ANALYSIS.ORGANIZATION_ID = 1
AND
ANALYSIS.DEPARTMENT_ID IN (1,2,3,4,5)
ORDER BY ANALYSIS.DEPARTMENT_ID;
I have changed the order and switched to JOIN syntax. The Score table seems to be the child of the Analysis table, and all your criteria qualify Analysis records. Calling DATE() on a column prevents the optimizer from using an index on that column. So, to cover every possible date/time component, I have changed from BETWEEN to >= the first date and less than one day beyond the end date. In your example, DATE(REVIEW_DATE) <= '2017-05-02' is the same as REVIEW_DATE < '2017-05-03', which includes everything on 2017-05-02 up to 23:59:59 and lets an index on the date be used.
Now for the index. A compound index on the fields used for filtering, joining, and ordering might help:
AG_SENTIMENT table... index ON (Organization_ID, Department_ID, Review_Date, ID)
SELECT
ANALYSIS.DEPARTMENT_ID,
SCORE.ID,
SCORE.KPI_ SCORE.R_SCORE,
SCORE.FACTOR_SCORE,
SCORE.FACTOR_SCORE,
SCORE.FACTOR_SCORE,
SCORE.CREATED_DATE,
SCORE.UPDATED_DATE
FROM
AG_SENTIMENT ANALYSIS
JOIN SCORE_INDICATOR SCORE
ON ANALYSIS.ID = SCORE.TAG_ID
where
ANALYSIS.ORGANIZATION_ID = 1
AND ANALYSIS.DEPARTMENT_ID IN (1,2,3,4,5)
AND ANALYSIS.REVIEW_DATE >= '2016-05-02'
AND ANALYSIS.REVIEW_DATE < '2017-05-03'
ORDER BY
ANALYSIS.DEPARTMENT_ID
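The index suggestion above could be written roughly as follows. The index names are made up, and the second index on SCORE_INDICATOR (TAG_ID) is an added assumption to support the join. It is not part of the original suggestion and may already exist in your schema.

```sql
-- Index names are hypothetical.
-- Equality column first, then the IN-list column, then the range column.
CREATE INDEX idx_org_dept_review
    ON AG_SENTIMENT (ORGANIZATION_ID, DEPARTMENT_ID, REVIEW_DATE, ID);

-- Assumption: lets each qualifying AG_SENTIMENT row find its
-- SCORE_INDICATOR rows by TAG_ID without scanning the table.
CREATE INDEX idx_score_tag
    ON SCORE_INDICATOR (TAG_ID);
```

Running EXPLAIN on the query before and after adding the indexes is the simplest way to see whether the optimizer actually picks them up.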
Hey guys I have a quick question regarding sql performance. I have a really really large table and it takes forever to run the query below, note that there is a column with timestamp
select name,emails,
count(*) as cnt
from table
where date(timestamp) between '2016-01-20' and '2016-02-3'
and name is not null
group by 1,2;
So my friend suggested to use this query below:
select name,emails,
count(*) as cnt
from table
where timestamp between date_sub(curdate(), interval 14 day)
and date_add(curdate(), interval 1 day)
and name is not null
group by 1,2;
And this takes much less time to run. Why? What's the difference between those two time functions?
And is there another way to run this even faster, like an index? Can someone explain to me how MySQL runs this? Thanks a lot!
Just add an index on the timestamp field and use the query below:
select name,emails,
count(*) as cnt
from table
where `timestamp` between '2016-01-20 00:00:00' and '2016-02-03 23:59:59'
and name is not null
group by 1,2;
Why? What's the difference between those two time function
In the first query you apply the date() function to your own column, so MySQL cannot use an index on it and does a full table scan. The second, suggested query removes date(timestamp), so MySQL can look up values in the index instead of scanning the table, which is why it is fast.
In the same way, MySQL will use the index for my query above.
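A possible variant, sketched under the assumption that the table is called events (a hypothetical name, as the question only calls it table) and with a made-up index name: a half-open range avoids having to spell out 23:59:59, and it also keeps rows with fractional seconds on the last day if the column stores them.

```sql
-- Hypothetical table name `events` and index name.
CREATE INDEX idx_events_timestamp ON events (`timestamp`);

SELECT name, emails, COUNT(*) AS cnt
FROM events
WHERE `timestamp` >= '2016-01-20'
  AND `timestamp` <  '2016-02-04'   -- first moment after the last wanted day
  AND name IS NOT NULL
GROUP BY name, emails;
```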
This is my table structure (about 1 million records):
I need to select a few indices at certain dates, but only Year and Month are relevant:
SELECT `index_name`,`results` FROM `mst_ind` WHERE
((`index_name`='MSCI EAFE Mid NR USD' AND MONTH(`date`) = 3 AND YEAR(`date`) = 2003) OR
(`index_name`='MSCI Morocco PR USD' AND MONTH(`date`) = 3 AND YEAR(`date`) = 2003))
AND `time_period`='M1'
It works fine, but the performance is horrible. I run the query through profiler, but it could not suggest any possible keys.
The primary key contains index_id, date and time_period.
How can I optimize/improve this query?
Thanks!
Update: the explain report:
You are probably preventing the use of an index by applying functions such as MONTH and YEAR to fields that would otherwise be indexed.
You could write the WHERE clause so that it doesn't use the MONTH and YEAR functions, such as:
date >= '2003-03-01' and date < '2003-04-01'
Edit: just realized you probably don't have any usable indexes on this table. Consider adding indexes on the index_name, date and time_period fields.
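Putting both suggestions together, a sketch might look like the following. The index name and the column order in the index are assumptions, not something the question confirms. Because both OR branches in the original query use the same month and year, they can be collapsed into a single IN list with one date range.

```sql
-- Hypothetical index name; equality columns first, range column last.
CREATE INDEX idx_name_period_date
    ON mst_ind (index_name, time_period, `date`);

SELECT index_name, results
FROM mst_ind
WHERE index_name IN ('MSCI EAFE Mid NR USD', 'MSCI Morocco PR USD')
  AND time_period = 'M1'
  AND `date` >= '2003-03-01'
  AND `date` <  '2003-04-01';
```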
I have a MySQL table like this one:
day int(11)
hour int(11)
amount int(11)
Day is an integer with a value that spans from 0 to 365; assume hour is a timestamp and amount is just a simple integer. What I want to do is select the value of the amount field for a certain group of days (for example from 0 to 10), but I only need the last value of amount available for each day, which practically is the row where the hour field has its max value (inside that day). This doesn't sound too hard, but the solution I came up with is completely inefficient.
Here it is:
SELECT q.day, q.amount
FROM amt_table q
WHERE q.day >= 0 AND q.day <= 4 AND q.hour = (
SELECT MAX(p.hour) FROM amt_table p WHERE p.day = q.day
) GROUP BY day
It takes 5 seconds to execute that query on an 11k-row table, and that is just for a span of 5 days; I may need to select a span of an entire month or year, so this is not a valid solution.
Anybody who can help me find another solution or optimize this one is really appreciated
EDIT
No indexes are set, but (day, hour, amount) could be a PRIMARY KEY if needed
Use:
SELECT a.day,
a.amount
FROM AMT_TABLE a
JOIN (SELECT t.day,
MAX(t.hour) AS max_hour
FROM AMT_TABLE t
GROUP BY t.day) b ON b.day = a.day
AND b.max_hour = a.hour
WHERE a.day BETWEEN 0 AND 4
I think you're using GROUP BY day just to get a single amount value per day, but it's not reliable because in MySQL, the values of columns not in the GROUP BY are arbitrary -- the value could change. MySQL versions before 8.0 don't support analytic (window) functions (ROW_NUMBER, etc.), which is what you'd typically use for cases like these.
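On MySQL 8.0 and later, which added window functions, the same result could be sketched with ROW_NUMBER(). The index name and the ranked alias are made up for illustration, and the index on (day, hour) is an assumption based on the question saying no indexes exist yet.

```sql
-- Assumes MySQL 8.0+; index name is hypothetical.
CREATE INDEX idx_day_hour ON amt_table (day, hour);

SELECT day, amount
FROM (
    SELECT day,
           amount,
           -- Number the rows within each day, latest hour first.
           ROW_NUMBER() OVER (PARTITION BY day ORDER BY hour DESC) AS rn
    FROM amt_table
    WHERE day BETWEEN 0 AND 4
) ranked
WHERE rn = 1;
```

Unlike the join on MAX(hour), this returns exactly one row per day even if two rows share the same maximum hour, although which of the tied rows is chosen is then arbitrary.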
Look at indexes on the primary keys first, then add indexes on the columns used to join tables together. Composite indexes (more than one column to an index) are an option too.
I think the problem is the subquery in the WHERE clause. MySQL will first evaluate "SELECT MAX(p.hour) FROM amt_table p WHERE p.day = q.day" across the whole table and only afterwards select the days, which is not efficient :-)