We have to check 7 million rows to compute campaign statistics. The query takes around 30 seconds to run, and it doesn't improve with indexes: I tried adding indexes on the WHERE fields, the WHERE fields + GROUP BY fields, and the WHERE fields + SUM fields, and none of them changed the speed at all.
The server type is MySQL and the server version is 5.5.31.
SELECT
    NOW(),
    `banner_campagne`.name,
    `banner_view`.banner_uid,
    SUM(`banner_view`.fetched) AS fetched,
    SUM(`banner_view`.loaded) AS loaded,
    SUM(`banner_view`.seen) AS seen
FROM `banner_view`
INNER JOIN `banner_campagne`
    ON `banner_campagne`.uid = `banner_view`.banner_uid
    AND `banner_campagne`.deleted = 0
    AND `banner_campagne`.weergeven = 1
WHERE
    `banner_view`.campagne_uid = 6
    AND `banner_view`.datetime >= '2019-07-31 00:00:00'
    AND `banner_view`.datetime < '2019-08-30 00:00:00'
GROUP BY
    `banner_view`.banner_uid
I would expect the query to run in around 5 seconds.
The indexes that you want for this query are probably:
banner_view(campagne_uid, datetime)
banner_campagne(uid, weergeven, deleted)
Note that the order of the columns in the index does matter.
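As DDL, that would be something like the following (the index names are illustrative):
ALTER TABLE banner_view ADD INDEX idx_campagne_datetime (campagne_uid, `datetime`);
ALTER TABLE banner_campagne ADD INDEX idx_uid_weergeven_deleted (uid, weergeven, deleted);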
select *
from `attendance_marks`
where exists (select *
from `attendables`
where `attendance_marks`.`attendable_id` = `attendables`.`id`
and `attendable_type` = 'student'
and `attendable_id` = 258672
and `attendables`.`deleted_at` is null
)
and (`marked_at` between '2022-09-01 00:00:00' and '2022-09-30 23:59:59')
This query is taking too much time, approximately 7-10 seconds. I am trying to optimize it but am stuck here.
(Screenshots of the existing indexes on attendance_marks and attendables were attached.)
Please help me optimize it a little bit.
For reference:
number of rows in attendables = 80966
number of rows in attendance_marks = 1853696
(A screenshot of the EXPLAIN SELECT output was attached.)
I think that if we use a JOIN instead of the subquery, it will be more performant. Unfortunately, I don't have the exact data to be able to benchmark the performance.
select *
from attendance_marks
inner join attendables on attendables.id = attendance_marks.attendable_id
where attendable_type = 'student'
and attendable_id = 258672
and attendables.deleted_at is null
and (marked_at between '2022-09-01 00:00:00' and '2022-09-30 23:59:59')
I'm not sure if your business requirements allow changing the PK and adding indexes. In case they do (see the DDL sketch after this list):
Add an index to attendable_id.
I assume that attendables.id is the PK. In case it is not, add an index to it, or preferably make it the PK.
In case attendable_type has a lot of different values, consider adding an index there too.
If possible, don't keep second-level granularity in marked_at; instead, round to the nearest minute. In our case, we can round 2022-09-30 23:59:59 up to 2022-10-01 00:00:00.
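A sketch of those suggestions as DDL (index names are illustrative; skip any that already exist):
ALTER TABLE attendance_marks ADD INDEX idx_attendable_id (attendable_id);
-- only if attendables.id is not already the PK:
ALTER TABLE attendables ADD PRIMARY KEY (id);
-- only if attendable_type is selective, on whichever table actually holds that column:
ALTER TABLE attendables ADD INDEX idx_attendable_type (attendable_type);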
select b.*
from `attendance_marks` AS am
JOIN `attendables` AS b ON am.`attendable_id` = b.`id`
WHERE b.`attendable_type` = 'student'
and b.`attendable_id` = 258672
and b.`deleted_at` is null
AND am.`marked_at` >= '2022-09-01'
AND am.`marked_at` < '2022-09-01' + INTERVAL 1 MONTH
and have these indexes:
am: INDEX(marked_at, attendable_id)
am: INDEX(attendable_id, marked_at)
b: INDEX(attendable_type, attendable_id, deleted_at)
Note that the datetime range works for any granularity.
(Be sure to check that I got the aliases for the correct tables.)
This formulation, with these indexes, should allow the Optimizer to pick whichever table is more efficient to start with.
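As DDL (index names are illustrative; as noted, double-check which table each column actually belongs to):
ALTER TABLE attendance_marks
    ADD INDEX idx_marked_attendable (marked_at, attendable_id),
    ADD INDEX idx_attendable_marked (attendable_id, marked_at);
ALTER TABLE attendables
    ADD INDEX idx_type_id_deleted (attendable_type, attendable_id, deleted_at);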
I have a table which currently has about 80 million rows, created as follows:
create table records
(
id int auto_increment primary key,
created int not null,
status int default '0' not null
)
collate = utf8_unicode_ci;
create index created_and_status_idx
on records (created, status);
The created column contains unix timestamps and status can be an integer between -10 and 10. The records are evenly distributed regarding the created date, and around half of them are of status 0 or -10.
I have a cron that selects records that are between 32 and 8 days old, processes them and then deletes them, for certain statuses. The query is as follows:
SELECT
records.id
FROM records
WHERE
(records.status = 0 OR records.status = -10)
AND records.created BETWEEN UNIX_TIMESTAMP() - 32 * 86400 AND UNIX_TIMESTAMP() - 8 * 86400
LIMIT 500
The query was fast when the records were at the beginning of the creation interval, but now that the cleanup has reached the records at the end of the interval, it takes about 10 seconds to run. Explaining the query says it uses the index, but it examines about 40 million rows.
My question is whether there is anything I can do to improve the performance of the query, and if so, how exactly.
Thank you.
I think union all is your best approach:
(SELECT r.id
FROM records r
WHERE r.status = 0 AND
r.created BETWEEN UNIX_TIMESTAMP() - 32 * 86400 AND UNIX_TIMESTAMP() - 8 * 86400
LIMIT 500
) UNION ALL
(SELECT r.id
FROM records r
WHERE r.status = -10 AND
r.created BETWEEN UNIX_TIMESTAMP() - 32 * 86400 AND UNIX_TIMESTAMP() - 8 * 86400
LIMIT 500
)
LIMIT 500;
This can use an index on records(status, created, id).
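As DDL (the index name is illustrative):
CREATE INDEX idx_status_created_id ON records (status, created, id);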
Note: use union if records.id could have duplicates.
You are also using LIMIT with no ORDER BY. That is generally discouraged.
Your index is in the wrong order. You should put the IN column (status) first (you phrased it as an OR), and put the 'range' column (created) last:
INDEX(status, created)
(Don't give me any guff about "cardinality"; we are not looking at individual columns.)
Are there really only 3 columns in the table? Do you need id? If not, get rid of it and change to
PRIMARY KEY(status, created)
Other techniques for walking through large tables efficiently.
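For example, a common pattern is to walk the PRIMARY KEY in chunks so each batch resumes where the previous one stopped instead of rescanning the range (a sketch; the @last_id bookkeeping is illustrative and lives in the application):
-- start with @last_id = 0, then repeat:
SELECT id
FROM records
WHERE id > @last_id
  AND status IN (0, -10)
  AND created BETWEEN UNIX_TIMESTAMP() - 32 * 86400 AND UNIX_TIMESTAMP() - 8 * 86400
ORDER BY id
LIMIT 500;
-- process and delete those rows, set @last_id to the last id returned,
-- and repeat until the SELECT comes back empty.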
Some background first. We have a MySQL database with a "live currency" table. We use an API to pull the latest currency values for different currencies, every 5 seconds. The table currently has over 8 million rows.
Structure of the table is as follows:
id (INT 11 PK)
currency (VARCHAR 8)
value (DECIMAL)
timestamp (TIMESTAMP)
Now we are trying to use this table to plot the data on a graph. We are going to have various different graphs, e.g: Live, Hourly, Daily, Weekly, Monthly.
I'm having a bit of trouble with the query. Using the Weekly graph as an example, I want to output data from the last 7 days, in 15 minute intervals. So here is how I have attempted it:
SELECT *
FROM currency_data
WHERE ((currency = 'GBP')) AND (timestamp > '2017-09-20 12:29:09')
GROUP BY UNIX_TIMESTAMP(timestamp) DIV (15 * 60)
ORDER BY id DESC
This outputs the data I want, but the query is extremely slow. I have a feeling the GROUP BY clause is the cause.
Also BTW I have switched off the sql mode 'ONLY_FULL_GROUP_BY' as it was forcing me to group by id as well, which was returning incorrect results.
Does anyone know of a better way of doing this query which will reduce the time taken to run the query?
You may want to create summary tables for each of the graphs you want to do.
If your data really is coming every 5 seconds, you can attempt something like:
SELECT *
FROM currency_data cd
WHERE currency = 'GBP' AND
timestamp > '2017-09-20 12:29:09' AND
UNIX_TIMESTAMP(timestamp) MOD (15 * 60) BETWEEN 0 AND 4
ORDER BY id DESC;
For both this query and your original query, you want an index on currency_data(currency, timestamp, id).
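As for the summary-table suggestion, a minimal sketch (the table name, DECIMAL precision, and choice of aggregate are illustrative; pick the aggregate your graph needs):
CREATE TABLE currency_data_15m (
    currency VARCHAR(8) NOT NULL,
    slot DATETIME NOT NULL,        -- start of the 15-minute window
    value DECIMAL(18,6) NOT NULL,  -- e.g. the average over the window
    PRIMARY KEY (currency, slot)
);

INSERT INTO currency_data_15m (currency, slot, value)
SELECT currency,
       FROM_UNIXTIME(UNIX_TIMESTAMP(timestamp) DIV 900 * 900) AS slot,
       AVG(value)
FROM currency_data
GROUP BY currency, slot;
The weekly graph then reads at most 7 × 96 = 672 rows per currency instead of scanning millions.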
I have a table user_notifications that has 1,100,000 records, and I have to run the query below, but it takes more than 3 minutes to complete. What can I do to improve the fetch time?
SELECT `user_notifications`.`user_id`
FROM `user_notifications`
WHERE `user_notifications`.`notification_template_id` = 175
AND (DATE(sent_at) >= DATE_SUB(CURDATE(), INTERVAL 4 day))
AND `user_notifications`.`user_id` IN (
1203, 1282, 1499, 2244, 2575, 2697, 2828, 2900, 3085, 3989,
5264, 5314, 5368, 5452, 5603, 6133, 6498..
)
The user ids in the IN block sometimes number up to 1k.
For optimisation I have indexed the user_id and notification_template_id columns in the user_notifications table.
Big IN() lists are inherently slow. Create a temporary table with an index and put the values from the IN() list into that temporary table instead; then you'll get the power of an indexed join instead of a giant IN() list.
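A sketch of that approach (the temporary table name is illustrative; the date filter is kept as in the original):
CREATE TEMPORARY TABLE tmp_user_ids (
    user_id INT NOT NULL,
    PRIMARY KEY (user_id)
);

INSERT INTO tmp_user_ids (user_id) VALUES (1203), (1282), (1499) /* ... */;

SELECT un.user_id
FROM user_notifications un
JOIN tmp_user_ids t ON t.user_id = un.user_id
WHERE un.notification_template_id = 175
  AND DATE(un.sent_at) >= DATE_SUB(CURDATE(), INTERVAL 4 DAY);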
You seem to be querying for a small date range. How about having an index based on the sent_at column? Do you know which index the current query is using?
(1) Don't hide columns in functions if you might need to use an index:
AND (DATE(sent_at) >= DATE_SUB(CURDATE(), INTERVAL 4 day))
-->
AND sent_at >= CURDATE() - INTERVAL 4 day
(2) Use a "composite" index for
WHERE `notification_template_id` = 175
AND sent_at >= ...
AND `user_id` IN (...)
The first column should be the one with '='. It is unclear what to put next, so I suggest adding both of these indexes:
INDEX(notification_template_id, user_id, sent_at)
INDEX(notification_template_id, sent_at)
The Optimizer will probably pick between them correctly.
Composite indexes are not the same as indexes on the individual columns.
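As DDL for the two indexes suggested in (2) (the index names are illustrative):
ALTER TABLE user_notifications
    ADD INDEX idx_tpl_user_sent (notification_template_id, user_id, sent_at),
    ADD INDEX idx_tpl_sent (notification_template_id, sent_at);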
(3) Yes, you could try putting the IN list in a tmp table, but the cost of doing such might outweigh the benefit. I don't think of 1K values in IN() as being "too many".
(4) My cookbook on building indexes.
I have a MySQL table like this one:
day int(11)
hour int(11)
amount int(11)
Day is an integer with a value that spans from 0 to 365; assume hour is a timestamp and amount is just a simple integer. What I want to do is select the value of the amount field for a certain group of days (for example from 0 to 10), but I only need the last value of amount available for each day, which in practice is where the hour field has its maximum value (inside that day). This doesn't sound too hard, but the solution I came up with is completely inefficient.
Here it is:
SELECT q.day, q.amount
FROM amt_table q
WHERE q.day >= 0 AND q.day <= 4 AND q.hour = (
SELECT MAX(p.hour) FROM amt_table p WHERE p.day = q.day
) GROUP BY day
It takes 5 seconds to execute that query on an 11k-row table, and it only covers a span of 5 days; I may need to select a span of an entire month or year, so this is not a valid solution.
Anybody who can help me find another solution or optimize this one is really appreciated.
EDIT
No indexes are set, but (day, hour, amount) could be a PRIMARY KEY if needed
Use:
SELECT a.day,
a.amount
FROM AMT_TABLE a
JOIN (SELECT t.day,
MAX(t.hour) AS max_hour
FROM AMT_TABLE t
GROUP BY t.day) b ON b.day = a.day
AND b.max_hour = a.hour
WHERE a.day BETWEEN 0 AND 4
I think you're using the GROUP BY a.day just to get a single amount value per day, but it's not reliable because in MySQL, columns not in the GROUP BY are arbitrary -- the value could change. Sadly, MySQL doesn't yet support analytics (ROW_NUMBER, etc) which is what you'd typically use for cases like these.
Look at indexes on the primary keys first, then add indexes on the columns used to join tables together. Composite indexes (more than one column to an index) are an option too.
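Given the edit above that (day, hour, amount) could serve as a primary key, a composite index like this (the name is illustrative) would cover both the MAX(hour) derived table and the join back to fetch amount:
CREATE INDEX idx_day_hour_amount ON AMT_TABLE (day, hour, amount);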
I think the problem is the subquery in the WHERE clause. MySQL will first calculate "SELECT MAX(p.hour) FROM amt_table p WHERE p.day = q.day" for the whole table and only afterwards select the days. Not quite efficient :-)