I have a question about optimizing the following query and table:
SELECT playitemid,MAX(playdatetime)
FROM buma
WHERE licenseid = 1 AND playdatetime > Date_sub(Curdate(), INTERVAL 1 month)
GROUP BY playitemid
For a table with 11 million records this can sometimes take over 30 seconds.
Here is the create statement for the table.
CREATE TABLE `buma` (
`bumaid` int(11) NOT NULL AUTO_INCREMENT,
`playitemid` int(11) NOT NULL,
`playdatetime` datetime DEFAULT NULL,
`stopdatetime` datetime DEFAULT NULL,
`licenseid` int(11) NOT NULL,
`editionid` int(11) DEFAULT NULL,
PRIMARY KEY (`bumaid`),
KEY `ind_buma` (`playdatetime`,`licenseid`,`playitemid`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=68644363 DEFAULT CHARSET=latin1;
Is there any way to define a better key or index to speed up the query?
Kind regards,
Bjørn
Try indexing playdatetime, licenseid, and playitemid each on their own, or create a composite index on only (licenseid, playitemid).
MySQL 5.1 is very old (10 years old).
Upgrade to 5.7 for a good performance increase (or use a MySQL fork such as MariaDB 10.1).
You can also compute Date_sub(Curdate(), INTERVAL 1 month) outside MySQL (with PHP's strtotime or similar) and pass the result in as a literal date. The query text then stays identical for the whole day, so the MySQL query cache can answer instantly if you run the query more than once per day. Be careful not to give the query cache more than 128MB, as a larger cache can decrease performance too.
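The index and literal-date suggestions above can be sketched as follows. This is only an illustration, not a measured fix: the index name idx_license_item_play, the column order, and the date '2015-01-01' are assumptions. The order puts the equality column first, then the GROUP BY column, then the aggregated column; the alternative order (licenseid, playdatetime, playitemid) may do better when the one-month range is very selective, so compare both with EXPLAIN.

```sql
-- Hypothetical index name; the column order is an assumption to verify with EXPLAIN.
ALTER TABLE buma
  ADD INDEX idx_license_item_play (licenseid, playitemid, playdatetime);

-- The cutoff date is computed by the application and passed in as a literal
-- ('2015-01-01' is a placeholder value, not taken from the question).
SELECT playitemid, MAX(playdatetime)
FROM buma
WHERE licenseid = 1
  AND playdatetime > '2015-01-01'
GROUP BY playitemid;
```

All three columns the query touches are in the index, so InnoDB can answer it from the index alone without reading the table rows.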
Related
I have a query which is getting slower and slower because there are more and more records in my table. So I'm trying to speed things up.
Database size:
Records: 1,200,000
Data 22,9 MiB
Index 46,8 MiB
Total 69,7 MiB
The query counts the records that match two conditions: a date (the current date) and a status number. See the query below:
SELECT
COUNT(id) AS total
FROM
order_process
WHERE
DATE(datetime) = CURDATE() AND
status = '7';
At the moment, this query takes 800ms. I need to run it multiple times with different dates, all in the same script, so script execution currently exceeds 3 seconds. How can I speed this up?
What have I already done:
Created indexes (Index on status and datetime both don't speed up the query).
Tested InnoDB engine (which is slower, mostly reading on this table)
For completeness, here is the current table setup.
CREATE TABLE IF NOT EXISTS `order_process` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`order_id` int(11) NOT NULL,
`status` int(11) NOT NULL,
`datetime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`remark` text NOT NULL,
PRIMARY KEY (`id`),
KEY `orderid` (`order_id`),
KEY `datetime` (`datetime`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
When you apply the DATE() function to a timestamp/datetime column, MySQL can't use an index on that column, even if one exists.
So you need to write the condition as a range on the bare column:
where
datetime >= concat(CURDATE(),' 00:00:00')
and datetime <= concat(CURDATE(),' 23:59:59')
and status = '7'
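A variant of the same idea, sketched with a half-open range and a composite index. The index name idx_status_datetime is hypothetical, and whether the index helps depends on how selective status is, so check the plan with EXPLAIN.

```sql
-- Hypothetical index name; equality column first, range column second.
ALTER TABLE order_process
  ADD INDEX idx_status_datetime (status, `datetime`);

-- Half-open range: from midnight today up to, but not including, midnight tomorrow.
SELECT COUNT(*) AS total
FROM order_process
WHERE status = 7
  AND `datetime` >= CURDATE()
  AND `datetime` <  CURDATE() + INTERVAL 1 DAY;
```

For the other dates the script needs, CURDATE() can be replaced by a literal such as '2015-01-01' (a placeholder); the column itself stays unwrapped so the index remains usable.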
I'm stuck for 2 days with a simple query that I'm not able to optimise...
My table contains about 60,000,000 rows :
CREATE TABLE IF NOT EXISTS `pages_objects_likes` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`page_id` bigint(20) unsigned NOT NULL,
`object_id` bigint(20) unsigned NOT NULL,
`user_id` bigint(20) unsigned NOT NULL,
`created_time` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`id`),
KEY `pages_objects_likes_page_id_created_time_index` (`page_id`,`created_time`)
) ENGINE=InnoDB ;
My query is :
SELECT c.user_id, c.page_id,
COUNT(*) AS total
FROM pages_objects_likes c
WHERE page_id IN (116001391818501,37823307325,45502366281,30166294332,7133374462,223343965320,9654313055,123096413226,231809706871226,246637838754023,120063638018636)
AND created_time >= DATE_SUB('2014-06-30', INTERVAL 1 YEAR)
AND created_time < DATE_SUB('2014-06-30', INTERVAL 6 MONTH)
GROUP BY c.user_id, c.page_id
But when I EXPLAIN it, I get this :
Using index condition; Using temporary; Using filesort
I would like to optimise indexes or query because it is taking ages to execute (more than 5 minutes).
My server has SSDs, 32 GB of RAM and a 4-core i5 dedicated to MySQL, so it is not a hardware issue :)
Thank you for your help !
Ok I found the solution !
Update the index like this :
KEY `pages_objects_likes_page_id_created_time_index` (`page_id`,`user_id`,`created_time`)
And update the query by reversing the column order in the GROUP BY clause :
GROUP BY c.page_id, c.user_id
The index is now used everywhere ;)
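Applied to the existing table, the change described above might look like the sketch below. The ALTER TABLE form is an assumption about how the index was replaced (the answer only shows the new KEY definition), and on a table of about 60,000,000 rows it may take a while to run.

```sql
-- Swap the two-column index for the three-column version from the answer.
ALTER TABLE pages_objects_likes
  DROP INDEX pages_objects_likes_page_id_created_time_index,
  ADD INDEX pages_objects_likes_page_id_created_time_index (page_id, user_id, created_time);

-- Same query with the GROUP BY columns in index order
-- (the IN list is shortened here for readability).
SELECT c.user_id, c.page_id, COUNT(*) AS total
FROM pages_objects_likes c
WHERE c.page_id IN (116001391818501, 37823307325)
  AND c.created_time >= DATE_SUB('2014-06-30', INTERVAL 1 YEAR)
  AND c.created_time <  DATE_SUB('2014-06-30', INTERVAL 6 MONTH)
GROUP BY c.page_id, c.user_id;
```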
I would guess your problem is with the DATE_SUB functions. In general, using functions in WHERE is a bad idea because:
They can be evaluated for each and every record in your tables.
DB engines usually skip indexes when the indexed fields are passed to functions.
My suggestion is to pass the DATE_SUB output from client side, or to calculate it once before the start of the query.
I also might be tempted to put an index on "created_time", "page_id", "user_id" and see how it goes.
I'm running a table that has built up to 600 million rows and is rapidly growing, which has been slowing down queries that need to run as quickly as possible. Current schema is:
CREATE TABLE `user_history` (
`userId` int(11) NOT NULL,
`asin` varchar(10) COLLATE utf8_unicode_ci NOT NULL,
`dateSent` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
KEY `userId` (`userId`,`asin`,`dateSent`),
KEY `dateSent` (`dateSent`,`asin`),
KEY `asin` (`asin`,`dateSent`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Everything I've read about partitioning suggested that this was a prime candidate for partitioning by date range. We only tend to use the last 14 days data, but the client doesn't want to delete old data. The new schema looks like:
CREATE TABLE `user_history_partitioned` (
`userId` int(11) NOT NULL,
`asin` varchar(10) COLLATE utf8_unicode_ci NOT NULL,
`dateSent` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`dateSent`,`asin`,`userId`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
PARTITION BY RANGE ( UNIX_TIMESTAMP(dateSent) ) (
PARTITION Apr2013 VALUES LESS THAN (UNIX_TIMESTAMP('2013-05-01')),
etc...
PARTITION Mar2014 VALUES LESS THAN (UNIX_TIMESTAMP('2014-04-01')),
PARTITION Apr2014 VALUES LESS THAN (UNIX_TIMESTAMP('2014-05-01')),
PARTITION May2014 VALUES LESS THAN (UNIX_TIMESTAMP('2014-06-01')),
PARTITION Future VALUES LESS THAN MAXVALUE);
The idea behind the Future partition is that a REORGANIZE PARTITION run on a populated partition was taking a long time to complete. So Future will always be empty and can be reorganized into new partitions instantly. Other queries using this table have been reordered to use the primary key only, to reduce the number of indexes on the table.
The time-critical query is along the lines of:
SELECT SQL_NO_CACHE *
FROM books B
WHERE (non-relevant stuff deleted)
AND NOT EXISTS
(
SELECT 1 FROM user_history H
WHERE
H.userId=$userId
AND H.asin=B.ASIN
AND dateSent > DATE_SUB(NOW(), INTERVAL 14 DAY)
)
AND (non-relevant stuff deleted)
LIMIT 1
So we avoid duplicates that have already been selected for the same user in the last 14 days. This currently returns in < 0.1 secs, which is okay but slower than it used to be on the current schema.
For the new schema, the inner SELECT has been reordered to:
SELECT 1 FROM user_history_partitioned H
WHERE dateSent > DATE_SUB(NOW(), INTERVAL 14 DAY)
AND H.asin=B.ASIN
AND H.userId=$userId
And it's taking 5 minutes per query, and I can't see why. The idea is that the current partition and its indexes should be in memory (or maybe the previous month's too, at some times of the month), and the primary index covers the WHERE clause. But from the time it's taking, it could be performing a full table scan on asin or userId, which is difficult to identify from EXPLAIN because it's an inner query.
What am I missing? Do I need another combined index for (asin, userId)? If so, why?
Thanks,
PS: Tried wrapping the DATE_SUB(...) as UNIX_TIMESTAMP(DATE_SUB(...)) just in case it was a type conversion issue, but made no difference.
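One way to inspect the inner query on its own is to run it standalone with literal values in place of the correlated ones. This is a sketch: 12345 and 'B000000000' are placeholders for $userId and B.ASIN, and it assumes a server where EXPLAIN PARTITIONS is available (MySQL 5.1 through 5.7; in 8.0 plain EXPLAIN reports partitions).

```sql
-- 12345 and 'B000000000' are placeholder values standing in for $userId and B.ASIN.
EXPLAIN PARTITIONS
SELECT 1
FROM user_history_partitioned H
WHERE H.dateSent > DATE_SUB(NOW(), INTERVAL 14 DAY)
  AND H.asin = 'B000000000'
  AND H.userId = 12345;
```

The partitions column shows whether pruning limits the read to the most recent partitions, and key_len and rows hint at how much of the primary key is used for the lookup. It is worth checking because in a B-tree index a range condition on the leading column (dateSent here) generally stops the columns after it (asin, userId) from narrowing the seek.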
I'm going to try to explain this as best I can, and I will quickly provide more information if needed.
I'm storing data for each hour in military time. I only need to store a day's worth of data. My table structure is below:
CREATE TABLE `onlinechart` (
`id` int(255) NOT NULL AUTO_INCREMENT,
`user` varchar(100) DEFAULT NULL,
`daytime` varchar(10) DEFAULT NULL,
`maxcount` smallint(20) DEFAULT NULL,
`lastupdate` varchar(100) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=innodb AUTO_INCREMENT=2 DEFAULT CHARSET=latin1
The "user" column is unique to each user, so I will have a list of rows for each user.
The "daytime" column stores the day and hour together. For the current day and hour it would be "2116": the day is 21 and the hour is 16.
The "maxcount" column holds the data for each hour. I'm tracking just one total number each hour.
The "lastupdate" column is just a timestamp I'm using to delete data that is more than 24 hours old.
I have the PHP script for the tracking running fine. It keeps a total of 24 rows of data for each user and deletes anything older than 24 hours. My problem is: how would I write a query that starts from the current day/hour, pulls the maxcount values for the past 24 hours, and displays them in order?
Thanks
You will run into an issue of handling this near the end of the year. It's advisable you switch to using the native timestamp type of MySQL (described here: http://dev.mysql.com/doc/refman/5.0/en/datetime.html). Then you can grab max count by doing something such as:
SELECT * FROM onlinechart WHERE daytime >= ? ORDER BY maxcount
The question mark should be replaced by the current timestamp minus 86400 (the number of seconds in a day).
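A minimal sketch of what the switch to a native date type could look like. The table name onlinechart_v2, the column hourstart, the unique key, and the user value 'alice' are all hypothetical, and the query orders by time rather than by maxcount, since the question asks for the hours in order.

```sql
-- Hypothetical replacement schema: hourstart holds the start of each tracked hour.
CREATE TABLE onlinechart_v2 (
  id INT NOT NULL AUTO_INCREMENT,
  `user` VARCHAR(100) NOT NULL,
  hourstart DATETIME NOT NULL,
  maxcount SMALLINT DEFAULT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY user_hour (`user`, hourstart)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

-- Past 24 hours for one user ('alice' is a placeholder), oldest hour first.
SELECT hourstart, maxcount
FROM onlinechart_v2
WHERE `user` = 'alice'
  AND hourstart >= NOW() - INTERVAL 24 HOUR
ORDER BY hourstart;
```

With a real DATETIME value the comparison and the ordering keep working across day, month, and year boundaries, which a string such as "2116" cannot do.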
I have a table that stores a pupil_id, a category and an effective date (amongst other things). The dates can be past, present or future. I need a query that will extract a pupil's current status from the table.
The following query works:
SELECT *
FROM pupil_status
WHERE (status_pupil_id, status_date) IN (
SELECT status_pupil_id, MAX(status_date)
FROM pupil_status
WHERE status_date < NOW() -- to ensure we ignore the "future status"
GROUP BY status_pupil_id );
In MySQL, the table is defined as follows:
CREATE TABLE IF NOT EXISTS `pupil_status` (
`status_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`status_pupil_id` int(10) unsigned NOT NULL, -- a foreign key
`status_category_id` int(10) unsigned NOT NULL, -- a foreign key
`status_date` datetime NOT NULL, -- effective date/time of status change
`status_modify` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`status_staff_id` int(10) unsigned NOT NULL, -- a foreign key
`status_notes` text NOT NULL, -- notes detailing the reason for status change
PRIMARY KEY (`status_id`),
KEY `status_pupil_id` (`status_pupil_id`,`status_category_id`),
KEY `status_pupil_id_2` (`status_pupil_id`,`status_date`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1409 ;
However, with 950 pupils and just over 1400 statuses in the table, the query takes 0.185 seconds to process. Perhaps acceptable now, but when the table swells, I'm worried about scalability. It is likely that the production system will have over 10000 pupils, each with 15-20 statuses.
Is there a better way to write this query? Are there better indexes that I should have to assist the query? Please let me know.
Here are some things you could try:
1 Use an INNER JOIN instead of the WHERE ... IN
SELECT ps.*
FROM pupil_status ps
INNER JOIN
(SELECT status_pupil_id, MAX(status_date) AS status_date
FROM pupil_status
WHERE status_date < NOW()
GROUP BY status_pupil_id) x
ON ps.status_pupil_id = x.status_pupil_id
AND ps.status_date = x.status_date
2 Store the value of NOW() in a variable. I am not sure whether the DB engine optimizes the call to NOW() into a single call, but if it doesn't, this might help a bit.
These are only suggestions; you will need to compare the query plans to see whether there is any appreciable improvement.
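The variable idea can be sketched as below. @now is an arbitrary user-defined variable name. The MySQL manual states that NOW() returns a constant time within a single statement, so the gain for one query may be small; the variable mainly helps when the same cutoff is reused across several statements.

```sql
-- @now is a user-defined session variable; the name is arbitrary.
SET @now = NOW();

-- Latest non-future status per pupil, joined back to the full rows.
SELECT ps.*
FROM pupil_status ps
INNER JOIN (
  SELECT status_pupil_id, MAX(status_date) AS status_date
  FROM pupil_status
  WHERE status_date < @now
  GROUP BY status_pupil_id
) x
  ON ps.status_pupil_id = x.status_pupil_id
 AND ps.status_date = x.status_date;
```

The existing index status_pupil_id_2 (status_pupil_id, status_date) matches both the grouped subquery and the join condition, so no new index is assumed here.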
Based on your usage of indexes as per the Query plan, robob's suggestion above could also come in handy
Find out how long the query takes when you load the system with 10000 pupils, each with 15-20 statuses.
Only refactor if it takes too long.