MySQL filesort on GROUP BY YEAR & Month - mysql

I have a large table that stores debug information for my web app. The issue is that the table is now 500,000 rows and one of the queries is slow because the index isn't being used.
SQL:
EXPLAIN SELECT count(*) AS `count`, month(event_date) AS `month`, year(event_date) AS `year` FROM events WHERE 1 = 1 GROUP BY year(event_date) DESC, month(event_date) DESC LIMIT 6;
Result:
select_type  table   type   possible_keys  key         key_len  ref   rows    Extra
SIMPLE       events  index  NULL           event_date  8        NULL  139358  Using index; Using temporary; Using filesort
And here is the table structure.
CREATE TABLE IF NOT EXISTS `events` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT 'Event Primary Key',
`event_number` int(11) NOT NULL,
`user_id` int(11) unsigned NOT NULL COMMENT 'User ID',
`server_id` int(11) unsigned DEFAULT NULL COMMENT 'The ID of the remote log client',
`remote_id` int(11) unsigned DEFAULT NULL COMMENT 'The Event Primary Key from the remote client',
`event_date` datetime NOT NULL COMMENT 'Event Datetime in local timezone',
`event_date_utc` datetime NOT NULL COMMENT 'Event Datetime in UTC timezone',
`event_type` varchar(255) NOT NULL COMMENT 'The type of event',
`event_source` varchar(255) NOT NULL COMMENT 'Text description of the source of the event',
`event_severity` varchar(255) NOT NULL COMMENT 'Notice, Warning etc',
`event_file` text NOT NULL COMMENT 'The full file location of the source of the event',
`event_file_line` int(11) NOT NULL COMMENT 'The line in the file that triggered the event',
`event_ip_address` varchar(255) NOT NULL COMMENT 'IP Address of the user that triggered the event',
`event_summary` varchar(255) NOT NULL COMMENT 'A summary of the description',
`event_description` text NOT NULL COMMENT 'Full description of the event',
`event_trace` text NOT NULL COMMENT 'Full PHP trace',
`event_synced` int(1) unsigned DEFAULT '0',
PRIMARY KEY (`id`),
KEY `event_type` (`event_type`),
KEY `event_source` (`event_source`),
KEY `user_id` (`user_id`),
KEY `server_id` (`server_id`),
KEY `event_date` (`event_date`)
)
If anyone has any ideas on getting the same results without a file sort that would be awesome!

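In MySQL 5.7 and earlier, GROUP BY implies ORDER BY on the grouped columns (MySQL 8.0 removed this implicit sort).
So try adding ORDER BY NULL: this usually eliminates a filesort when you do not need the groups in sorted order.
See "ORDER BY Optimization" in the MySQL docs.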
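As a rough sketch of that suggestion applied to the query from the question (an illustration, not a tested fix): ORDER BY NULL tells MySQL that the group order does not matter. The LIMIT 6 is left out here on purpose, because without a sort the six groups returned would be arbitrary rather than the latest six months. Grouping on YEAR()/MONTH() expressions may still need a temporary table, so compare the EXPLAIN output before and after.

```sql
-- Same aggregation as in the question, but with the implicit sort suppressed.
-- Assumption: the caller does not need the groups in any particular order.
EXPLAIN
SELECT COUNT(*)          AS `count`,
       MONTH(event_date) AS `month`,
       YEAR(event_date)  AS `year`
FROM events
GROUP BY YEAR(event_date), MONTH(event_date)
ORDER BY NULL;
```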

Your key problem is that you are specifying no effective WHERE clause; WHERE 1 = 1 filters nothing. You are asking MySQL for the YEAR and MONTH without limiting the number of rows, so it has to evaluate MONTH(...) and YEAR(...) for every row before it can process the GROUP BY.
The fact that it still isn't using the index after my earlier suggestion indicates there is more to your query than you are letting on; if that's the case, please let me know and I can help you more easily. Otherwise, I would recommend the query below (though I'm having to guess at your purpose, as you haven't stated what you are trying to achieve).
If you are after the last 6 calendar months specifically, then the following would also help significantly.
SELECT
COUNT(id) AS `count`,
MONTH(event_date) AS `month`,
YEAR(event_date) AS `year`
FROM events
-- Get the first day of this month, and subtract 6 months
WHERE event_date > SUBDATE(DATE_FORMAT(NOW(), '%Y-%m-01'), INTERVAL 6 MONTH)
GROUP BY `year` DESC, `month` DESC;
If you have additional WHERE criteria, though, that would change the advice given, so if that's the case please update the question.
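One portability note on the query above: the GROUP BY ... DESC form is a MySQL extension that was removed in MySQL 8.0. A version-neutral sketch of the same idea groups on the expressions and sorts with an explicit ORDER BY. The use of >= (so that the first day of the oldest month is included) is an assumption about the intended range.

```sql
-- Last six calendar months before the current one, plus the current month so far.
-- The range predicate leaves event_date bare, so the event_date index can
-- be used to restrict the rows before they are grouped.
SELECT COUNT(*)          AS `count`,
       MONTH(event_date) AS `month`,
       YEAR(event_date)  AS `year`
FROM events
WHERE event_date >= DATE_SUB(DATE_FORMAT(NOW(), '%Y-%m-01'), INTERVAL 6 MONTH)
GROUP BY YEAR(event_date), MONTH(event_date)
ORDER BY `year` DESC, `month` DESC;
```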

In addition to what the others have posted:
If you run an EXPLAIN SELECT... and MySQL reports that it uses no index for that query (or not the one you want), you could solve this by querying the data with SELECT... FORCE INDEX.... For more details about the syntax of this, look here: http://dev.mysql.com/doc/refman/5.6/en/index-hints.html
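A minimal sketch of the hint syntax, using the events table from the question. The date literal is a made-up filter for illustration only. Note that the EXPLAIN in the question already reports key = event_date, so a hint is unlikely to change that particular plan; it is mainly useful when the optimizer picks a different index or none at all.

```sql
-- FORCE INDEX goes directly after the table name (and alias, if any).
-- Hypothetical filter value; adjust it to the range you actually need.
EXPLAIN
SELECT COUNT(*)          AS `count`,
       MONTH(event_date) AS `month`,
       YEAR(event_date)  AS `year`
FROM events FORCE INDEX (event_date)
WHERE event_date >= '2013-01-01'
GROUP BY YEAR(event_date), MONTH(event_date);
```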

You are using COUNT(*) with no effective filter, which means every row is counted, so the whole table (or the whole index) is scanned.
Because there are no parameters to filter the data, everything is read before it is grouped; try restricting the query either by date or by some other parameter.

Related

No data returned after SQL select * on mysql table

My goal is to extract data from a table in MySQL. When I run SELECT * FROM the table I get no data; if I add a WHERE clause, it returns data. I have more than 100k tag names, which I can't list in the filter condition. Can anyone explain why I'm seeing this behaviour and how to extract the data without specifying TagName?
The select statement after adding a where clause is as below:
SELECT *
FROM ahpvdb.DailyPVInst DailyPVInst_0
WHERE
(DailyPVInst_0.TagName IN ( 'UMA-PDIN33'))
AND (DailyPVInst_0.StartTime = {ts '2020-01-01 03:00:00' })
AND (DailyPVInst_0.EndTime = {ts '2025-12-31 03:00:00' })
My table DDL is as below:
CREATE TABLE `DailyPVInst` (
`RowNum` int(11) NOT NULL COMMENT 'Unique number in result set',
`TagName` char(56) NOT NULL COMMENT 'Altacs tag name',
`StartTime` datetime NOT NULL COMMENT 'Requested history start time in GMT(UTC)',
`EndTime` datetime NOT NULL COMMENT 'Requested history end time in GMT(UTC)',
`RecFrq` int(11) NOT NULL COMMENT 'Altacs tag recording frequency',
`Value` double NOT NULL DEFAULT '0' COMMENT 'Tag sample value',
`TimeStamp` datetime NOT NULL DEFAULT '0000-00-00 00:00:00' COMMENT 'Tag sample time',
`QF1` int(11) NOT NULL DEFAULT '0' COMMENT 'Tag sample quality flag 1',
`QF2` int(11) NOT NULL DEFAULT '0' COMMENT 'Tag sample quality flag 2',
PRIMARY KEY (`TagName`,`StartTime`,`EndTime`,`RecFrq`,`RowNum`)
) ENGINE=ALTACSAH DEFAULT CHARSET=utf8 COMMENT='Select * from [Table_Name] where Tagname="ABC" and starttime="2015-01-24 15:20:00" and endtime="2015-01-24 15:30:00" and RecFrq=2;' `VAROPT`='5';
PS: I tried using subquery as well but no luck. Subquery used is as below:
SELECT *
FROM ahpvdb.DailyPVInst DailyPVInst_0
WHERE
(DailyPVInst_0.TagName IN ( select distinct TagName from ahpvdb.DailyPVInst))
AND (DailyPVInst_0.StartTime = {ts '2020-01-01 03:00:00' })
AND (DailyPVInst_0.EndTime = {ts '2025-12-31 03:00:00' })
From the comments above, you said ALTACSAH is a custom storage engine, I assume implemented at your organization. MySQL allows developers to create their own storage engine as a C++ class (see https://dev.mysql.com/doc/internals/en/custom-engine.html).
Hypothetically, a storage engine that does not implement table-scan operations might return nothing when a given search is not indexed. This could be by design, or it could be the result of a code bug.
No one on Stack Overflow who doesn't work at your organization can know anything about the implementation of your custom storage engine. We have never used it, and we don't have access to its code or documentation.
If you use a bespoke storage engine, I'm afraid debugging it is up to you.

Improve query speed suggestions

For self-education I am developing an invoicing system for an electricity company. I have multiple time series tables with different intervals. One table represents consumption and two others represent prices; a third price table still has to be incorporated. I am now running calculation queries, but they are slow. I would like to improve the query speed, especially since these are only the first calculations and the queries will only become more complicated. Please note that this is the first database I have created and these are my first exercises, so a simplified explanation is preferred. Thanks for any help provided.
I have indexed DATE, PERIOD_FROM and PERIOD_UNTIL in each table. This sped up the query from 60 seconds to 5 seconds.
The structure of the tables is the following:
CREATE TABLE `apxprice` (
`APX_id` int(11) NOT NULL AUTO_INCREMENT,
`DATE` date DEFAULT NULL,
`PERIOD_FROM` time DEFAULT NULL,
`PERIOD_UNTIL` time DEFAULT NULL,
`PRICE` decimal(10,2) DEFAULT NULL,
PRIMARY KEY (`APX_id`)
) ENGINE=MyISAM AUTO_INCREMENT=28728 DEFAULT CHARSET=latin1
CREATE TABLE `imbalanceprice` (
`imbalanceprice_id` int(11) NOT NULL AUTO_INCREMENT,
`DATE` date DEFAULT NULL,
`PTU` tinyint(3) DEFAULT NULL,
`PERIOD_FROM` time DEFAULT NULL,
`PERIOD_UNTIL` time DEFAULT NULL,
`UPWARD_INCIDENT_RESERVE` tinyint(1) DEFAULT NULL,
`DOWNWARD_INCIDENT_RESERVE` tinyint(1) DEFAULT NULL,
`UPWARD_DISPATCH` decimal(10,2) DEFAULT NULL,
`DOWNWARD_DISPATCH` decimal(10,2) DEFAULT NULL,
`INCENTIVE_COMPONENT` decimal(10,2) DEFAULT NULL,
`TAKE_FROM_SYSTEM` decimal(10,2) DEFAULT NULL,
`FEED_INTO_SYSTEM` decimal(10,2) DEFAULT NULL,
`REGULATION_STATE` tinyint(1) DEFAULT NULL,
`HOUR` int(2) DEFAULT NULL,
PRIMARY KEY (`imbalanceprice_id`),
KEY `DATE` (`DATE`,`PERIOD_FROM`,`PERIOD_UNTIL`)
) ENGINE=MyISAM AUTO_INCREMENT=117427 DEFAULT CHARSET=latin1
CREATE TABLE `powerload` (
`powerload_id` int(11) NOT NULL AUTO_INCREMENT,
`EAN` varchar(18) DEFAULT NULL,
`DATE` date DEFAULT NULL,
`PERIOD_FROM` time DEFAULT NULL,
`PERIOD_UNTIL` time DEFAULT NULL,
`POWERLOAD` int(11) DEFAULT NULL,
PRIMARY KEY (`powerload_id`)
) ENGINE=MyISAM AUTO_INCREMENT=61039 DEFAULT CHARSET=latin1
Now when running this query:
SELECT i.DATE, i.PERIOD_FROM, i.TAKE_FROM_SYSTEM, i.FEED_INTO_SYSTEM,
a.PRICE, p.POWERLOAD, sum(a.PRICE * p.POWERLOAD)
FROM imbalanceprice i, apxprice a, powerload p
WHERE i.DATE = a.DATE
and i.DATE = p.DATE
AND i.PERIOD_FROM >= a.PERIOD_FROM
and i.PERIOD_FROM = p.PERIOD_FROM
AND i.PERIOD_FROM < a.PERIOD_UNTIL
AND i.DATE >= '2018-01-01'
AND i.DATE <= '2018-01-31'
group by i.DATE
I have run the query with EXPLAIN and get the following result:
table  select_type  partitions  possible_keys  key   key_len  ref                                         rows   filtered  Extra
a      SIMPLE       NULL        NULL           NULL  NULL     NULL                                        28727  100       Using where; Using temporary; Using filesort
p      SIMPLE       NULL        NULL           NULL  NULL     NULL                                        61038  10        Using where; Using join buffer (Block Nested Loop)
i      SIMPLE       NULL        DATE           DATE  8        timeseries.a.DATE,timeseries.p.PERIOD_FROM  1      100       NULL
Ideally I would run a more complicated query for a whole year, grouped by month for example, with all price tables incorporated. However, this would be too slow. I have indexed DATE, PERIOD_FROM and PERIOD_UNTIL in each table. The calculation result must not change: in this case, the quarter-hourly consumption of two meters multiplied by hourly prices.
"Categorically speaking," the first thing you should look at is indexes.
Your clauses such as WHERE i.DATE = a.DATE ... are categorically known as INNER JOINs, and the SQL engine needs to have the ability to locate the matching rows "instantly." (That is to say, without looking through the entire table!)
FYI: Just like any index in real-life – here I would be talking about "library card catalogs" if we still had such a thing – indexes will assist both "equal to" and "less/greater than" queries. The index takes the computer directly to a particular point in the data, whether that's a "hit" or a "near miss."
Finally, the EXPLAIN verb is very useful: put that word in front of your query, and the SQL engine should "explain to you" exactly how it intends to carry out your query. (The SQL engine looks at the structure of the database to make that decision.) Although the EXPLAIN output is ... (heh) ... "not exactly standardized," it will help you to see if the computer thinks that it needs to do something very time-wasting in order to deliver your answer.
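To make the index advice concrete, here is a sketch under some assumptions. The EXPLAIN output in the question shows no usable key on apxprice (a) or powerload (p), so the two ALTER statements add composite indexes; the index names are invented for this example. The query is the same join written with explicit JOIN ... ON clauses, with the select list trimmed to the grouped column and the sum so that it stays valid under ONLY_FULL_GROUP_BY.

```sql
-- Hypothetical index names; the column order follows the join conditions.
ALTER TABLE apxprice  ADD INDEX idx_apx_date_period  (`DATE`, PERIOD_FROM, PERIOD_UNTIL);
ALTER TABLE powerload ADD INDEX idx_load_date_period (`DATE`, PERIOD_FROM);

-- Same join conditions as the original query, written as explicit joins.
EXPLAIN
SELECT i.`DATE`,
       SUM(a.PRICE * p.POWERLOAD) AS total_cost
FROM imbalanceprice AS i
JOIN powerload AS p
  ON p.`DATE` = i.`DATE`
 AND p.PERIOD_FROM = i.PERIOD_FROM
JOIN apxprice AS a
  ON a.`DATE` = i.`DATE`
 AND i.PERIOD_FROM >= a.PERIOD_FROM
 AND i.PERIOD_FROM <  a.PERIOD_UNTIL
WHERE i.`DATE` >= '2018-01-01'
  AND i.`DATE` <= '2018-01-31'
GROUP BY i.`DATE`;
```

If the new indexes are picked up, the key column for a and p should no longer be NULL and the rows estimates should drop well below the full table sizes.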

Can someone tell me why this query is very slow?

Can someone explain why this query, with an IN clause of over 5000 values, is so slow?
Table structure:
CREATE TABLE IF NOT EXISTS `wp_transactions_log` (
`sync_sequence` bigint(20) unsigned NOT NULL COMMENT 'the sequence number of the sync process/operation that this transaction belong to ',
`objectid` varchar(100) NOT NULL COMMENT 'the entity/record id',
`wp_id` bigint(20) unsigned NOT NULL,
`table_name` varchar(100) NOT NULL COMMENT 'the target wordpress table name this transaction occured/fail for some reason',
`logical_table_name` varchar(100) NOT NULL,
`operation` varchar(20) NOT NULL COMMENT 'inser/update/delete',
`status` varchar(20) NOT NULL COMMENT 'status of the transaction: success,fail',
`fail_count` int(10) unsigned NOT NULL COMMENT 'how many this transaction failed',
`fail_description` text NOT NULL COMMENT 'a description of the failure',
`createdon` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`sync_sequence`,`objectid`,`table_name`,`operation`,`wp_id`),
KEY `objectid` (`objectid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
This table contains 5k records.
The query:
SELECT wp_id,objectId FROM wp_transactions_log WHERE `operation` = "insert" AND `wp_id` != 0 AND `status` != "ignore" AND `table_name` ='itg_wpclass_dates' AND objectId IN (... 5k record)
Even this query is just as slow:
SELECT wp_id,objectId FROM wp_transactions_log WHERE objectId IN (5k record)
Note: every value in the IN list matches an objectid that exists in the table.
By slow I mean it takes more than 15 seconds.
objectid is not indexed on its own. Only the composite primary key is indexed, and objectid is not its leading column. Add an index on objectid and then try again.
ALTER TABLE wp_transactions_log ADD INDEX (objectid);
If you have a huge amount of data, though, adding an index can hold a metadata lock on the table for a long time; use the INPLACE algorithm to do it with minimal lock contention.
Also, put EXPLAIN in front of your SELECT statement and share the output with us. It is a good way to identify the issue in your table.
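A sketch of both suggestions together. The index name idx_objectid and the three IN values are placeholders. If SHOW CREATE TABLE already lists a key on objectid (the DDL quoted in the question does), skip the ALTER and go straight to the EXPLAIN.

```sql
-- Online index build: MySQL raises an error if it cannot honour
-- ALGORITHM = INPLACE or LOCK = NONE, rather than silently copying the table.
ALTER TABLE wp_transactions_log
    ADD INDEX idx_objectid (objectid),
    ALGORITHM = INPLACE,
    LOCK = NONE;

-- Check which key the optimizer picks for the IN lookup.
-- Placeholder values; substitute a few real objectid values.
EXPLAIN
SELECT wp_id, objectid
FROM wp_transactions_log
WHERE objectid IN ('id-1', 'id-2', 'id-3');
```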
The query itself is fast: it takes 200 ms to execute. It is the time spent processing the query and retrieving the data that is long. I think there's no way to reduce this time.

MySQL - Select unique column with max timestamp

I'm trying to prepare a query and I'm having a hard time with it. I need some MySQL gurus to help please...
Take the following table as an example...
CREATE TABLE order_revision (
id int(11) NOT NULL,
parent_order_id int(11) NOT NULL,
user_id int(11) DEFAULT NULL,
sub_total decimal(19,4) NOT NULL DEFAULT '0.0000',
tax_total decimal(19,4) NOT NULL DEFAULT '0.0000',
status smallint(6) NOT NULL DEFAULT '1',
created_at int(11) NOT NULL,
updated_at int(11) DEFAULT NULL
)
I need a query to select all unique 'parent_order_id' with the max 'updated_at' value. This query should return all rows that have unique 'parent_order_id's based on the max timestamp of the 'updated_at' column.
In other words, each row returned should have a unique 'parent_order_id' and carry the maximum timestamp of the 'updated_at' column.
Basically this query would find the latest "order revision" for each "parent order"
You mean:
SELECT parent_order_id,max(updated_at) FROM order_revision GROUP BY parent_order_id
Note that the GROUP BY clause is required to get one row per parent_order_id: without it, MySQL either rejects the query (when ONLY_FULL_GROUP_BY is enabled) or collapses the result to a single row.
For anyone interested, this query turned out to be the one I was looking for...
SELECT main.*
FROM order_revision AS main
WHERE main.id = (
SELECT sub.id
FROM order_revision AS sub
WHERE main.parent_order_id = sub.parent_order_id
ORDER BY sub.updated_at DESC
LIMIT 1
);
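For comparison, the same "latest revision per parent order" result can usually be obtained by joining against a derived table of per-parent maxima, which avoids running a correlated subquery for every row. This is an alternative sketch, not the query the asker settled on, and it behaves differently in two edge cases: if two revisions of one parent share the same maximum updated_at, both are returned, and parents whose revisions all have a NULL updated_at are dropped.

```sql
-- Step 1 (derived table): the newest updated_at for each parent order.
-- Step 2 (join): fetch the full row(s) that carry that timestamp.
SELECT r.*
FROM order_revision AS r
JOIN (
    SELECT parent_order_id,
           MAX(updated_at) AS max_updated_at
    FROM order_revision
    GROUP BY parent_order_id
) AS latest
  ON latest.parent_order_id = r.parent_order_id
 AND latest.max_updated_at  = r.updated_at;
```

An index on (parent_order_id, updated_at) would likely help both this form and the correlated one; the table definition in the question shows no indexes at all.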

Optimise query to avoid using temporary tables

I've been stuck for 2 days on a simple query that I'm not able to optimise...
My table contains about 60,000,000 rows:
CREATE TABLE IF NOT EXISTS `pages_objects_likes` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`page_id` bigint(20) unsigned NOT NULL,
`object_id` bigint(20) unsigned NOT NULL,
`user_id` bigint(20) unsigned NOT NULL,
`created_time` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`id`),
KEY `pages_objects_likes_page_id_created_time_index` (`page_id`,`created_time`)
) ENGINE=InnoDB ;
My query is :
SELECT c.user_id, c.page_id,
COUNT(*) AS total
FROM pages_objects_likes c
WHERE page_id IN (116001391818501,37823307325,45502366281,30166294332,7133374462,223343965320,9654313055,123096413226,231809706871226,246637838754023,120063638018636)
AND created_time >= DATE_SUB('2014-06-30', INTERVAL 1 YEAR)
AND created_time < DATE_SUB('2014-06-30', INTERVAL 6 MONTH)
GROUP BY c.user_id, c.page_id
But when I EXPLAIN it, I get this :
Using index condition; Using temporary; Using filesort
I would like to optimise indexes or query because it is taking ages to execute (more than 5 minutes).
My server has SSDs, 32 GB of RAM and a 4-core i5 dedicated to MySQL, so it is not a hardware issue :)
Thank you for your help !
OK, I found the solution!
Update the index like this:
KEY `pages_objects_likes_page_id_created_time_index` (`page_id`,`user_id`,`created_time`)
And update the query by reversing the column order in the GROUP BY clause, so that it matches the index:
GROUP BY c.page_id, c.user_id
The index is now used everywhere ;)
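For anyone applying the same change to an existing table, the new key definition can be put in place with a single ALTER TABLE that drops and recreates the index. This is a sketch assuming the index name from the DDL above is kept; on a table of roughly 60,000,000 rows the rebuild will take a while.

```sql
-- Replace the (page_id, created_time) index with (page_id, user_id, created_time)
-- so that the GROUP BY c.page_id, c.user_id can follow the index order.
ALTER TABLE pages_objects_likes
    DROP INDEX pages_objects_likes_page_id_created_time_index,
    ADD INDEX pages_objects_likes_page_id_created_time_index
        (page_id, user_id, created_time);
```

Running EXPLAIN on the query again afterwards shows whether "Using temporary; Using filesort" has gone.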
I would guess your problem is with the DATE_SUB functions. In general, applying functions to columns in a WHERE clause is a bad idea because:
They are evaluated for each and every record in your tables.
DB engines usually skip indexes when the indexed fields are passed to functions.
My suggestion is to pass the DATE_SUB output in from the client side, or to calculate it once before the start of the query.
I also might be tempted to put an index on (created_time, page_id, user_id) and see how it goes.
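A sketch of the "calculate it once before the query" idea using user variables (the names @range_start and @range_end are made up, and the page_id list is shortened for readability). In this particular query DATE_SUB is applied to literals rather than to a column, so MySQL can normally evaluate it once on its own; treat this as a way to rule the functions out rather than a guaranteed speed-up.

```sql
-- Compute both boundaries once, up front.
SET @range_start = DATE_SUB('2014-06-30', INTERVAL 1 YEAR);   -- 2013-06-30
SET @range_end   = DATE_SUB('2014-06-30', INTERVAL 6 MONTH);  -- 2013-12-30

-- created_time is compared directly, with no function wrapped around it.
SELECT c.page_id, c.user_id, COUNT(*) AS total
FROM pages_objects_likes AS c
WHERE c.page_id IN (116001391818501, 37823307325)  -- list shortened for the example
  AND c.created_time >= @range_start
  AND c.created_time <  @range_end
GROUP BY c.page_id, c.user_id;
```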