select *
from `attendance_marks`
where exists (select *
from `attendables`
where `attendance_marks`.`attendable_id` = `attendables`.`id`
and `attendable_type` = 'student'
and `attendable_id` = 258672
and `attendables`.`deleted_at` is null
)
and (`marked_at` between '2022-09-01 00:00:00' and '2022-09-30 23:59:59')
This query is taking too long, approximately 7-10 seconds.
I am trying to optimize it, but I am stuck here.
Attendance_marks indexes
Attendables Indexes
Please help me optimize it a little bit.
For reference
number of rows in attendables = 80966
number of rows in attendance_marks = 1853696
Explain select
I think that using a JOIN instead of a subquery will be more performant. Unfortunately, I don't have the data to benchmark it.
select *
from attendance_marks
inner join attendables on attendables.id = attendance_marks.attendable_id
where attendable_type = 'student'
and attendable_id = 258672
and attendables.deleted_at is null
and (marked_at between '2022-09-01 00:00:00' and '2022-09-30 23:59:59')
I'm not sure whether your business requirements allow changing the PK and adding indexes. In case they do:
Add an index on attendable_id.
I assume that attendables.id is the PK. If not, add an index on it, or preferably make it the PK.
If attendable_type has many distinct values, consider adding an index there too.
If possible, don't keep granularity down to the seconds field in marked_at; round to the nearest minute instead. In our case, we can round 2022-09-30 23:59:59 up to 2022-10-01 00:00:00.
select b.*
from `attendance_marks` AS am
JOIN `attendables` AS b ON am.`attendable_id` = b.`id`
WHERE b.`attendable_type` = 'student'
and b.`attendable_id` = 258672
and b.`deleted_at` is null
AND am.`marked_at` >= '2022-09-01'
AND am.`marked_at` < '2022-09-01' + INTERVAL 1 MONTH
and have these
am: INDEX(marked_at, attendable_id)
am: INDEX(attendable_id, marked_at)
b: INDEX(attendable_type, attendable_id, attendables)
Note that the datetime range works for any granularity.
(Be sure to check that I got the aliases for the correct tables.)
This formulation, with these indexes should allow the Optimizer to pick which table is more efficient to start with.
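To illustrate why the half-open range (>= start, < next month) is safer than BETWEEN ... '23:59:59', here is a minimal sketch. It uses Python's built-in sqlite3 with an in-memory database as a stand-in for MySQL, and the trimmed-down table and sample rows are hypothetical. SQLite has no INTERVAL syntax, so the upper bound is written as a literal.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; this is a hypothetical, trimmed-down
# attendance_marks table with only the columns the date range touches.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attendance_marks ("
    " id INTEGER PRIMARY KEY, attendable_id INTEGER, marked_at TEXT)"
)
conn.executemany(
    "INSERT INTO attendance_marks (attendable_id, marked_at) VALUES (?, ?)",
    [
        (258672, "2022-09-01 00:00:00"),      # first instant of September
        (258672, "2022-09-30 23:59:59"),      # last whole second of September
        (258672, "2022-09-30 23:59:59.500"),  # fractional second, still September
        (258672, "2022-10-01 00:00:00"),      # first instant of October
    ],
)

# Closed range ending at 23:59:59: misses the fractional-second row.
closed = conn.execute(
    "SELECT COUNT(*) FROM attendance_marks "
    "WHERE marked_at BETWEEN '2022-09-01 00:00:00' AND '2022-09-30 23:59:59'"
).fetchone()[0]

# Half-open range: everything in September, nothing from October.
half_open = conn.execute(
    "SELECT COUNT(*) FROM attendance_marks "
    "WHERE marked_at >= '2022-09-01 00:00:00' AND marked_at < '2022-10-01 00:00:00'"
).fetchone()[0]

print(closed, half_open)
```

The half-open form counts all three September rows regardless of the column's precision, while the closed form silently drops the row with fractional seconds.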
Related
I have this SQL query running on a PHP website. This is an old site, and the query was built by a previous developer a few years ago. Now that the site data has grown to around 230 MB, the query has become pretty slow, taking around 15-20 seconds to execute. Is there any way I can make it run faster?
SELECT DISTINCT
NULL AS bannerID,
C1.url AS url,
LOWER(C1.Organization) AS company_name,
CONCAT(
'https://mywebsite.co.uk/logos/',
C1.userlogo
) AS logo_url
FROM
Company AS C1
INNER JOIN Vacancy AS V1 ON LOWER(V1.company_name) = LOWER(C1.Organization)
WHERE
V1.LiveDate <= CURDATE()
AND url = ''
AND V1.ClosingDate >= CURDATE()
AND C1.flag_show_logo = 1
As commented, your query is non-sargable due to the use of the LOWER function on the join columns.
Additionally, I suspect you can remove the DISTINCT by using EXISTS instead of joining your tables:
select null as bannerID,
c.url as url,
Lower(c.Organization) as company_name,
Concat('https://mywebsite.co.uk/logos/', c.userlogo) as logo_url
from Company c
where c.flag_show_logo = 1
and c.url = ''
and exists (
select * from Vacancy v
where v.LiveDate <= CURDATE()
and v.ClosingDate >= CURDATE()
and v.company_name = c.Organization
)
Avoid the sargability problem by changing to
ON V1.company_name = C1.Organization
and declaring those two columns to be the same collation, namely a collation ending with "_ci".
And have these composite indexes:
C1: INDEX(flag_show_logo, url, Organization, userlogo)
V1: INDEX(company_name, LiveDate, ClosingDate)
(These indexes should help Stu's answer, too.)
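The effect of wrapping an indexed column in LOWER() can be sketched as follows. This is only an illustration under stated assumptions: Python's sqlite3 stands in for MySQL, SQLite's COLLATE NOCASE plays the role of a "_ci" collation, and the index name idx_vacancy_company and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# COLLATE NOCASE is SQLite's rough analogue of a MySQL "_ci" collation.
conn.execute(
    "CREATE TABLE Vacancy ("
    " id INTEGER PRIMARY KEY,"
    " company_name TEXT COLLATE NOCASE,"
    " LiveDate TEXT, ClosingDate TEXT)"
)
conn.execute("CREATE INDEX idx_vacancy_company ON Vacancy (company_name)")
conn.executemany(
    "INSERT INTO Vacancy (company_name, LiveDate, ClosingDate) VALUES (?, ?, ?)",
    [
        ("Acme Ltd", "2024-01-01", "2024-12-31"),
        ("ACME LTD", "2024-02-01", "2024-12-31"),
        ("Other Co", "2024-01-01", "2024-12-31"),
    ],
)

def plan(sql):
    """Return the query plan as one string (detail is the last column)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

# Column hidden inside a function: the index on company_name cannot be used.
wrapped_plan = plan("SELECT * FROM Vacancy WHERE LOWER(company_name) = 'acme ltd'")

# Bare column with a case-insensitive collation: the index is usable.
bare_plan = plan("SELECT * FROM Vacancy WHERE company_name = 'acme ltd'")

matches = conn.execute(
    "SELECT COUNT(*) FROM Vacancy WHERE company_name = 'acme ltd'"
).fetchone()[0]

print(wrapped_plan)
print(bare_plan)
print(matches)
```

The bare comparison still matches both spellings of the company name because the collation handles case, so the LOWER() calls add nothing except a full scan.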
We have to check 7 million rows to compute campaign statistics. The query takes around 30 seconds to run, and indexes do not improve it.
I tried adding indexes on the WHERE fields, on the WHERE fields plus the GROUP BY column, and on the WHERE fields plus the SUM columns. None of them changed the speed at all.
The server is MySQL, version 5.5.31.
SELECT
NOW(), `banner_campagne`.name, `banner_view`.banner_uid, SUM(`banner_view`.fetched) AS fetched,
SUM(`banner_view`.loaded) AS loaded,
SUM(`banner_view`.seen) AS seen
FROM `banner_view` INNER JOIN
`banner_campagne`
ON `banner_campagne`.uid = `banner_view`.banner_uid AND
`banner_campagne`.deleted = 0 AND
`banner_campagne`.weergeven = 1
WHERE
`banner_view`.campagne_uid = 6 AND `banner_view`.datetime >= '2019-07-31 00:00:00' AND `banner_view`.datetime < '2019-08-30 00:00:00'
GROUP BY
`banner_view`.banner_uid
I expect the query to run in around 5 seconds.
The indexes that you want for this query are probably:
banner_view(campagne_uid, datetime)
banner_campagne(uid, weergeven, deleted)
Note that the order of the columns in the index does matter.
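As a rough sketch of the first index, the snippet below builds a composite index with the equality column (campagne_uid) first and the range column (datetime) second, then asks the engine for its plan. Python's sqlite3 is used as a stand-in for MySQL, and the index name and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE banner_view ("
    " id INTEGER PRIMARY KEY, campagne_uid INTEGER, banner_uid INTEGER,"
    " datetime TEXT, fetched INTEGER, loaded INTEGER, seen INTEGER)"
)
# Equality column first, range column second.
conn.execute(
    "CREATE INDEX idx_view_campagne_datetime ON banner_view (campagne_uid, datetime)"
)
conn.executemany(
    "INSERT INTO banner_view"
    " (campagne_uid, banner_uid, datetime, fetched, loaded, seen)"
    " VALUES (?, ?, ?, ?, ?, ?)",
    [
        (6, 1, "2019-08-01 10:00:00", 1, 1, 0),
        (6, 1, "2019-08-02 10:00:00", 1, 0, 0),
        (6, 2, "2019-08-15 10:00:00", 1, 1, 1),
        (6, 2, "2019-09-15 10:00:00", 1, 1, 1),  # outside the date range
        (7, 1, "2019-08-01 10:00:00", 1, 1, 1),  # different campaign
    ],
)

query = (
    "SELECT banner_uid, SUM(fetched), SUM(loaded), SUM(seen) "
    "FROM banner_view "
    "WHERE campagne_uid = 6 "
    "AND datetime >= '2019-07-31 00:00:00' AND datetime < '2019-08-30 00:00:00' "
    "GROUP BY banner_uid ORDER BY banner_uid"
)
totals = conn.execute(query).fetchall()
plan_text = " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query))

print(totals)
print(plan_text)
```

On MySQL the equivalent check would be running EXPLAIN on the real query and confirming that the composite index appears in the key column.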
I have seen several questions on SO and improved my SQL query based on them,
but it still sometimes takes 12 seconds and sometimes 3 seconds to execute, so the best case is about 3 seconds. The query looks like this:
SELECT ANALYSIS.DEPARTMENT_ID
,SCORE.ID
,SCORE.KPI_ SCORE.R_SCORE
,SCORE.FACTOR_SCORE
,SCORE.FACTOR_SCORE
,SCORE.FACTOR_SCORE
,SCORE.CREATED_DATE
,SCORE.UPDATED_DATE
FROM SCORE_INDICATOR SCORE
,AG_SENTIMENT ANALYSIS
WHERE SCORE.TAG_ID = ANALYSIS.ID
AND ANALYSIS.ORGANIZATION_ID = 1
AND ANALYSIS.DEPARTMENT_ID IN (1,2,3,4,5)
AND DATE (ANALYSIS.REVIEW_DATE) BETWEEN DATE ('2016-05-02') AND DATE ('2017-05-02')
ORDER BY ANALYSIS.DEPARTMENT_ID
The SCORE_INDICATOR table has 19345116 rows and the latter table (AG_SENTIMENT) has 19057025 rows in total. I added an index on ORGANIZATION_ID, one on DEPARTMENT_ID, and another on the combination of ORGANIZATION_ID and DEPARTMENT_ID. Is there any other way to improve it, or is this the maximum I can achieve with this amount of data?
Here is a checklist:
1) Make sure the logs table (ANALYSIS) uses the MyISAM engine (it's fast for OLAP queries).
2) Make sure that you've indexed the ANALYSIS.REVIEW_DATE field.
3) Make sure that ANALYSIS.REVIEW_DATE is of type DATE (not CHAR or VARCHAR).
4) Change the query (rearrange the query plan):
SELECT
ANALYSIS.DEPARTMENT_ID
,SCORE.ID
,SCORE.KPI_ SCORE.R_SCORE
,SCORE.FACTOR_SCORE
,SCORE.FACTOR_SCORE
,SCORE.FACTOR_SCORE
,SCORE.CREATED_DATE
,SCORE.UPDATED_DATE
FROM SCORE_INDICATOR SCORE
,AG_SENTIMENT ANALYSIS
WHERE
SCORE.TAG_ID = ANALYSIS.ID
AND
ANALYSIS.REVIEW_DATE >= '2016-05-02' AND ANALYSIS.REVIEW_DATE < '2017-05-03'
AND
ANALYSIS.ORGANIZATION_ID = 1
AND
ANALYSIS.DEPARTMENT_ID IN (1,2,3,4,5)
ORDER BY ANALYSIS.DEPARTMENT_ID;
I have changed the order and switched to JOIN syntax. The Score table seems to be the child of the Analysis table, and all your criteria are based on qualifying Analysis records. Now, the indexing. Calling DATE() on a column does not help the optimizer, because it hides the column from any index. So, to cover all possible date/time components, I have changed from BETWEEN to >= the first date and LESS THAN one day beyond the end. In your example, DATE(REVIEW_DATE) <= '2017-05-02' is the same as REVIEW_DATE LESS THAN '2017-05-03', which includes 2017-05-02 up to 23:59:59 and lets an index on the date be applied.
Now for the index. A compound index based on the fields used for the filter, the join, and the ORDER BY might help:
AG_SENTIMENT table... index ON(Organization_ID, Department_ID, Review_Date, ID)
SELECT
ANALYSIS.DEPARTMENT_ID,
SCORE.ID,
SCORE.KPI_ SCORE.R_SCORE,
SCORE.FACTOR_SCORE,
SCORE.FACTOR_SCORE,
SCORE.FACTOR_SCORE,
SCORE.CREATED_DATE,
SCORE.UPDATED_DATE
FROM
AG_SENTIMENT ANALYSIS
JOIN SCORE_INDICATOR SCORE
ON ANALYSIS.ID = SCORE.TAG_ID
where
ANALYSIS.ORGANIZATION_ID = 1
AND ANALYSIS.DEPARTMENT_ID IN (1,2,3,4,5)
AND ANALYSIS.REVIEW_DATE >= '2016-05-02'
AND ANALYSIS.REVIEW_DATE < '2017-05-03'
ORDER BY
ANALYSIS.DEPARTMENT_ID
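A quick way to convince yourself that the rewritten date condition selects the same rows as the DATE(...) BETWEEN version is to run both on boundary values. The sketch below does that with Python's sqlite3 as a stand-in for MySQL; the lowercase table name and the four sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ag_sentiment (id INTEGER PRIMARY KEY, review_date TEXT)"
)
conn.executemany(
    "INSERT INTO ag_sentiment (id, review_date) VALUES (?, ?)",
    [
        (1, "2016-05-01 23:59:59"),  # just before the range
        (2, "2016-05-02 00:00:00"),  # first instant inside the range
        (3, "2017-05-02 23:59:59"),  # last second inside the range
        (4, "2017-05-03 00:00:00"),  # just after the range
    ],
)

# Original form: the column is wrapped in DATE(), which hides it from indexes.
wrapped = [r[0] for r in conn.execute(
    "SELECT id FROM ag_sentiment "
    "WHERE DATE(review_date) BETWEEN '2016-05-02' AND '2017-05-02' "
    "ORDER BY id"
)]

# Rewritten form: bare column, >= first day and < the day after the last day.
half_open = [r[0] for r in conn.execute(
    "SELECT id FROM ag_sentiment "
    "WHERE review_date >= '2016-05-02' AND review_date < '2017-05-03' "
    "ORDER BY id"
)]

print(wrapped, half_open)
```

Both forms return the same ids, so the rewrite changes only how the condition can be evaluated, not what it selects.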
I have a table user_notifications with 1,100,000 records, and the query below takes more than 3 minutes to complete. What can I do to improve the fetch time?
SELECT `user_notifications`.`user_id`
FROM `user_notifications`
WHERE `user_notifications`.`notification_template_id` = 175
AND (DATE(sent_at) >= DATE_SUB(CURDATE(), INTERVAL 4 day))
AND `user_notifications`.`user_id` IN (
1203, 1282, 1499, 2244, 2575, 2697, 2828, 2900, 3085, 3989,
5264, 5314, 5368, 5452, 5603, 6133, 6498..
)
The IN list sometimes contains up to 1k user IDs.
For optimisation, I have indexed the user_id and notification_template_id columns in the user_notifications table.
Big IN() lists are inherently slow. Create a temporary table with an index and put the values from the IN() list into that temporary table instead; then you'll get the power of an indexed join instead of a giant IN() list.
You seem to be querying for a small date range. How about having an index based on SENT_AT column? Do you know what index the current query is using?
(1) Don't hide columns in functions if you might need to use an index:
AND (DATE(sent_at) >= DATE_SUB(CURDATE(), INTERVAL 4 day))
-->
AND sent_at >= CURDATE() - INTERVAL 4 day
(2) Use a "composite" index for
WHERE `notification_template_id` = 175
AND sent_at >= ...
AND `user_id` IN (...)
The first column should be the one with '='. It is unclear what to put next, so I suggest adding both of these indexes:
INDEX(notification_template_id, user_id, sent_at)
INDEX(notification_template_id, sent_at)
The Optimizer will probably pick between them correctly.
Composite indexes are not the same as indexes on the individual columns.
(3) Yes, you could try putting the IN list in a tmp table, but the cost of doing such might outweigh the benefit. I don't think of 1K values in IN() as being "too many".
(4) My cookbook on building indexes.
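For comparison, here is a minimal sketch of the temporary-table variant next to the plain IN() list, with the sargable sent_at condition and the first suggested composite index. It uses Python's sqlite3 as a stand-in for MySQL; the wanted_users table, the index name, the sample rows, and the fixed cutoff (in place of CURDATE() - INTERVAL 4 DAY) are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_notifications ("
    " id INTEGER PRIMARY KEY, user_id INTEGER,"
    " notification_template_id INTEGER, sent_at TEXT)"
)
conn.execute(
    "CREATE INDEX idx_tpl_user_sent "
    "ON user_notifications (notification_template_id, user_id, sent_at)"
)
conn.executemany(
    "INSERT INTO user_notifications (user_id, notification_template_id, sent_at)"
    " VALUES (?, ?, ?)",
    [
        (1203, 175, "2024-01-10 08:00:00"),
        (1282, 175, "2024-01-01 08:00:00"),  # too old
        (1499, 174, "2024-01-10 09:00:00"),  # different template
        (9999, 175, "2024-01-10 10:00:00"),  # user not in the list
        (2244, 175, "2024-01-09 12:00:00"),
    ],
)

wanted = [1203, 1282, 1499, 2244]
cutoff = "2024-01-07 00:00:00"  # fixed stand-in for CURDATE() - INTERVAL 4 DAY

# Variant A: load the ids into an indexed temporary table and join against it.
conn.execute("CREATE TEMP TABLE wanted_users (user_id INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO wanted_users (user_id) VALUES (?)", [(u,) for u in wanted]
)
joined = [r[0] for r in conn.execute(
    "SELECT n.user_id FROM user_notifications AS n "
    "JOIN wanted_users AS w ON w.user_id = n.user_id "
    "WHERE n.notification_template_id = 175 AND n.sent_at >= ? "
    "ORDER BY n.user_id",
    (cutoff,),
)]

# Variant B: the plain IN() list with bound parameters.
placeholders = ", ".join("?" for _ in wanted)
in_list = [r[0] for r in conn.execute(
    "SELECT user_id FROM user_notifications "
    "WHERE notification_template_id = 175 AND sent_at >= ? "
    f"AND user_id IN ({placeholders}) ORDER BY user_id",
    (cutoff, *wanted),
)]

print(joined, in_list)
```

Both variants return the same users; which one is faster on the real table is something only a measurement on the actual data can settle.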
I have a MySQL table like this one:
day int(11)
hour int(11)
amount int(11)
Day is an integer with a value that spans from 0 to 365; assume hour is a timestamp and amount is just a simple integer. What I want to do is select the value of the amount field for a certain group of days (for example, from 0 to 10), but I only need the last value of amount available for each day, which practically is the row where the hour field has its max value (inside that day). This doesn't sound too hard, but the solution I came up with is completely inefficient.
Here it is:
SELECT q.day, q.amount
FROM amt_table q
WHERE q.day >= 0 AND q.day <= 4 AND q.hour = (
SELECT MAX(p.hour) FROM amt_table p WHERE p.day = q.day
) GROUP BY day
It takes 5 seconds to execute that query on an 11k-row table, and that is for a span of just 5 days; I may need to select a span of an entire month or year, so this is not a valid solution.
Anybody who can help me find another solution or optimize this one would be really appreciated.
EDIT
No indexes are set, but (day, hour, amount) could be a PRIMARY KEY if needed
Use:
SELECT a.day,
a.amount
FROM AMT_TABLE a
JOIN (SELECT t.day,
MAX(t.hour) AS max_hour
FROM AMT_TABLE t
GROUP BY t.day) b ON b.day = a.day
AND b.max_hour = a.hour
WHERE a.day BETWEEN 0 AND 4
I think you're using the GROUP BY day just to get a single amount value per day, but it's not reliable because in MySQL, columns not in the GROUP BY are arbitrary -- the value could change. Sadly, MySQL versions before 8.0 don't support analytic (window) functions such as ROW_NUMBER, which is what you'd typically use for cases like these.
Look at indexes on the primary keys first, then add indexes on the columns used to join tables together. Composite indexes (more than one column to an index) are an option too.
I think the problem is the subquery in the WHERE clause. It is correlated on q.day, so MySQL will most likely evaluate "SELECT MAX(p.hour) FROM amt_table p WHERE p.day = q.day" again for every candidate row, and with no index each evaluation scans the whole table. Not very efficient :-)
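The derived-table approach from the answers can be sketched like this. It is a small illustration, not a benchmark: Python's sqlite3 stands in for MySQL, the PRIMARY KEY (day, hour) is an assumption (the question only says such a key could be added), the sample rows are made up, and pushing the day filter into the derived table is a variation on the posted query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: (day, hour) is unique, so it can serve as the primary key and
# doubles as the index that makes MAX(hour) per day cheap to find.
conn.execute(
    "CREATE TABLE amt_table ("
    " day INTEGER, hour INTEGER, amount INTEGER,"
    " PRIMARY KEY (day, hour))"
)
conn.executemany(
    "INSERT INTO amt_table (day, hour, amount) VALUES (?, ?, ?)",
    [
        (0, 100, 5),
        (0, 200, 7),   # latest row for day 0
        (1, 150, 3),   # latest row for day 1
        (1, 50, 9),
        (2, 10, 4),    # only row for day 2
        (7, 10, 99),   # outside the requested span
    ],
)

# Compute the latest hour per day once, then join back to fetch the amount.
last_amounts = conn.execute(
    "SELECT a.day, a.amount "
    "FROM amt_table AS a "
    "JOIN (SELECT day, MAX(hour) AS max_hour "
    "      FROM amt_table "
    "      WHERE day BETWEEN 0 AND 4 "
    "      GROUP BY day) AS b "
    "  ON b.day = a.day AND b.max_hour = a.hour "
    "ORDER BY a.day"
).fetchall()

print(last_amounts)
```

Because the aggregate runs once over the filtered days instead of once per outer row, the work grows with the number of rows in the span rather than with rows times rows.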