I'm using MySQL 5.7.10.
I'm checking a new query for an audit report.
I'll execute it in a simple background Unix process that invokes mysql from the console.
To check the query, I use a worksheet in HeidiSQL.
The table is:
CREATE TABLE `services` (
`assigned_id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`service_id` VARCHAR(10) NOT NULL,
`name` VARCHAR(50) NOT NULL,
...
`audit_insert` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
...
INDEX `idx_audit_insert` (`audit_insert`),
...
);
The simple worksheet is:
SET @numberOfMonths := 6;
SET @today := CURRENT_TIMESTAMP();
SET @todaySubstractnumberOfMonths := TIMESTAMP( DATE_SUB(@today, INTERVAL @numberOfMonths MONTH) );
EXPLAIN SELECT service_id FROM services WHERE audit_insert BETWEEN @todaySubstractnumberOfMonths AND @today;
The explain output for that query is:
id,select_type,table,partitions,type,possible_keys,key,key_len,ref,rows,filtered,Extra
1,SIMPLE,services,[all partitions],ALL,idx_audit_insert,,,,47319735,21.05,Using where
So, index 'idx_audit_insert' is not used.
If I change the query to:
EXPLAIN SELECT service_id FROM services WHERE audit_insert BETWEEN '2020-01-01 00:00:00' AND '2020-03-10 23:59:59';
The output is:
id,select_type,table,partitions,type,possible_keys,key,key_len,ref,rows,filtered,Extra
1,SIMPLE,tdom_transitos,[all partitions],range,idx_audit_insert,idx_audit_insert,4,,4257192,100.00,Using index condition
Now, the index is used and the rows value is dramatically reduced.
So, my questions are:
How can I force the variables to be TIMESTAMP? Is there anything wrong in my worksheet?
or maybe
How can I use the index (trying to avoid hints like USE INDEX, FORCE INDEX...)?
Thanks a lot.
(EDIT: I copied the same question to DBA Stack Exchange. Maybe it is more appropriate for that forum.)
Well, maybe it's not the answer I thought I'd find, but it works perfectly.
I split the audit_insert field into an additional column, audit_insert_datetype, of DATE type. This field has a new index too.
I changed the query to use this field, and I forced the session variables to DATE type (with CURRENT_DATE and DATE()).
The result: the new index is used and the execution time is dramatically reduced.
Maybe it's bad style, but it works as I need.
All that date arithmetic can be done in SQL. If you do that, it will use the index.
"Constant" expressions (such as CURDATE() + INTERVAL 4 MONTH) are evaluated to a DATETIME or TIMESTAMP datatype before starting the query.
Related
I thought columns in a VIEW simply inherited data types used in the underlying TABLE, but that doesn't seem to be true.
I have a MySQL TABLE like:
"myTable"
FIELD TYPE NULL KEY DEFAULT EXTRA
id int(11) NO PRI NULL auto_increment
dt datetime NO MUL NULL
foo smallint(5) unsigned NO NULL
I can query the table like:
SELECT dt, SUM(foo) FROM myTable WHERE dt>DATE_SUB(Now(), INTERVAL 3 DAY) GROUP BY dt
Now I need the ability to query the same data but using alternate names for some columns (such as "foo").
[I'll skip the long explanation of why!]
I figured a simple solution was a VIEW:
CREATE VIEW myView AS ( SELECT id, dt, foo AS foobar FROM myTable ORDER BY dt )
This creates a view with columns like:
"myView"
FIELD TYPE NULL KEY DEFAULT EXTRA
id int(11) NO 0
dt datetime NO NULL
foobar smallint(5) unsigned NO NULL
The problem arises when I query the view: (almost identical to the previous query)
SELECT dt, SUM(foobar) AS foo FROM myView WHERE dt>DATE_SUB(Now(), INTERVAL 3 DAY) GROUP BY dt
The query runs without producing an error but the response is zero records.
I discovered that if I CAST the WHERE clause like this, then it works properly (although it's painfully slow.)
. . . WHERE CAST(dt AS DATETIME) > DATE_SUB(Now(), INTERVAL 3 DAY)
CASTing all columns would be tedious, plus it's slowing down query execution quite a bit. (There are 5 million records and growing.)
Why is SQL forcing me to re-CAST the fields? What can I do about it?
Thanks!
Turns out this is a regression introduced with MariaDB 10.2: having an ORDER BY in the view definition does not play well with a GROUP BY on the same column in queries using that view.
I created the following bug report for this:
https://jira.mariadb.org/browse/MDEV-23826
You should remove the ORDER BY clause from the view's creation statement.
With it, every call of the view scans and sorts the whole record set before returning the result, even when you don't want the result sorted. This makes the ORDER BY clause redundant, and it can have an adverse impact on performance.
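A sketch of the view definition with the ORDER BY removed, using the column names from myTable above; any ordering moves into the queries that select from the view:

```sql
CREATE VIEW myView AS
SELECT id, dt, foo AS foobar
FROM myTable;

-- Order at query time instead:
SELECT dt, SUM(foobar) AS foo
FROM myView
WHERE dt > DATE_SUB(NOW(), INTERVAL 3 DAY)
GROUP BY dt
ORDER BY dt;
```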
After lots of research and several similar questions asked here, I have reached some conclusions, but as always it is, there are more questions.
This concerns the explicit_defaults_for_timestamp
Assuming the explicit_defaults_for_timestamp is turned off, this will work:
Schema:
CREATE TABLE IF NOT EXISTS `updated_tables` (
`table_name` VARCHAR(50) NOT NULL,
`updated_at` TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6) ON UPDATE CURRENT_TIMESTAMP(6),
PRIMARY KEY (`table_name`),
UNIQUE INDEX `table_name_UNIQUE` (`table_name` ASC))
ENGINE = InnoDB;
And the query:
INSERT INTO `updated_tables` (`table_name`,`updated_at`) VALUES ('products',NULL) ON DUPLICATE KEY UPDATE `table_name`=VALUES(`table_name`), `updated_at`=VALUES(`updated_at`);
First time the query is sent, the table is populated with 'products' and with the current time stamp.
If I repeat the query, then the field updated_at is updated. By definition, when I send a NULL value (even though NULL is not allowed), MySQL updates the column to the current timestamp.
All is fine, and works as expected.
Let's assume I turn on explicit_defaults_for_timestamp.
If I use the above query, it complains that NULL is not allowed, which complies with the rules.
Question is, how can I have the same functionality with the explicit_defaults_for_timestamp turned on?
There is the solution of introducing an additional column (VARCHAR) which holds, for example, the timestamp in milliseconds; when I update it, MySQL updates updated_at accordingly.
But that looks like overkill; I might as well update updated_at manually. I would like to move that responsibility to the MySQL level, not do it programmatically.
In short: how can I perform updates keyed on table_name and have updated_at set properly? The trick here is that I have many updates (it is a cache table) but never actually change the table_name value at all.
Is it possible? Or I must turn off explicit_defaults_for_timestamp?
Is it bad decision to turn it off? Looking at this AWS RDS post seems it is ok, but I am not sure.
Side question:
If I decide to perform updates on my own, what would be the way to construct it?
Currently the MySQL CURRENT_TIMESTAMP(6) has this construct:
2018-07-10 11:32:43.490100
How could I create same construct with Javascript? First thing coming to my mind is to get current Date, and append to it last 6 digits of current timestamp.
You can create a trigger that always sets the value of updated_at to CURRENT_TIMESTAMP - the cleanest approach, but it may slow down your updates. Programmatically setting the column value would be faster than firing a trigger.
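One possible shape for such a trigger, assuming the `updated_tables` schema above (the trigger name is made up for illustration; the INSERT path is already covered by the column's DEFAULT):

```sql
CREATE TRIGGER touch_updated_at
BEFORE UPDATE ON updated_tables
FOR EACH ROW
SET NEW.updated_at = CURRENT_TIMESTAMP(6);
```

Because the trigger always modifies NEW.updated_at, the row is written even when the other column values are unchanged, which is exactly the "touch" behavior the NULL trick provided.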
If you are executing your queries from Node.js then you can use new Date().getTime() to get a Unix timestamp in milliseconds and then construct your query like this
UPDATE tbl SET col_1 = val_1, col_2 = val_2, updated_at = FROM_UNIXTIME(js_milliseconds / 1000)
WHERE id = desired_id
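If you go the manual route from Node.js, note that a JS Date only carries millisecond precision, so the last three digits of a DATETIME(6)-style string can only be zero-padded. A sketch of a formatter (the function name is made up):

```javascript
// Format a JS Date as a MySQL DATETIME(6)-style string,
// e.g. "2018-07-10 11:32:43.490000".
// JS Dates have millisecond precision, so microseconds are zero-padded.
function toMysqlTimestamp6(date) {
  const pad = (n, width = 2) => String(n).padStart(width, "0");
  return (
    date.getFullYear() +
    "-" + pad(date.getMonth() + 1) +
    "-" + pad(date.getDate()) +
    " " + pad(date.getHours()) +
    ":" + pad(date.getMinutes()) +
    ":" + pad(date.getSeconds()) +
    "." + pad(date.getMilliseconds(), 3) + "000"
  );
}
```

This matches the `2018-07-10 11:32:43.490100`-style construct from the question except that the final three digits are always zeros.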
I have a very huge table (425+ million rows).
CREATE TABLE `DummyTab` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`Name` varchar(48) NOT NULL,
`BeginDate` datetime DEFAULT NULL,
`EndDate` datetime NOT NULL,
......
......
KEY `BeginDate_index` (`BeginDate`),
KEY `id` (`id`)
) ENGINE=MyISAM
Selects are done based on "BeginDate" and other criteria on this table
select * from DummyTab where Name like "%dummyname%" and BeginDate>= 20141101
Now in this case only the date part is provided for a DATETIME column (although it'll be interpreted as 2014-11-01 00:00:00).
The question is: does the optimizer make proper use of the DATETIME index even when just a date is provided? Or should the index be on a DATE field, rather than a DATETIME, to be used more effectively?
Yes, BeginDate_index can still be used when the query specifies a DATE-only filter (and applying additional criteria on Name won't disqualify the index either).
If you look at this SqlFiddle of random data, and expand the Execution plan at the bottom, you'll see something like:
ID SELECT_TYPE TABLE TYPE POSSIBLE_KEYS KEY KEY_LEN REF ROWS FILTERED EXTRA
1 SIMPLE DummyTab range BeginDate_index BeginDate_index 6 17190 100 Using index condition; Using where
(Specifically KEY is BeginDate_index).
Note however that use of the index is not guaranteed; e.g. if you execute the same query against a wider range of date criteria, a different plan may be used (if you run the same fiddle for > 20140101, BeginDate_index is no longer used, since it does not offer sufficient selectivity).
Edit, Re: Comment on Exactness
Since BeginDate is a DATETIME, the literal 20141101 will also be converted to a DATETIME (once). From the docs:
If one of the arguments is a TIMESTAMP or DATETIME column and the other argument is a constant, the constant is converted to a timestamp before the comparison is performed.
So again, yes, as per your last paragraph, the literal in the filter BeginDate >= 20141101 will be converted to the exact datetime 20141101000000 (2014-11-01 00:00:00) and any eligible indexes will be considered (but again, never guaranteed).
A common reason indexes cannot be used is that the filter predicates are not sargable, for example when a function is applied to a column in the filter: the engine would need to evaluate the function on every remaining row in the query. Some examples here.
So altering your example a bit, the below queries do the same thing, but the second one is much slower. This query is sargable:
SELECT * FROM DummyTab
WHERE BeginDate < 20140101; -- Good
Whereas this is NOT:
SELECT * FROM DummyTab
WHERE YEAR(BeginDate) < 2014; -- Bad
Updated SqlFiddle here - again, look at the Execution Plans at the bottom to see the difference.
I am currently working on a Mailbox for a website, holding a large number of messages within a database, where there is an option to filter the mails according to the date. I am in a confusion as of which method to use and how to.
Method 1:
To use a TIMESTAMP column and select records based on the DATE part only. This seems better, considering that TIMESTAMP is the datatype meant for this. But when filtering, wouldn't the splitting (into date and time) and the comparisons be more expensive? If this is better, how should the comparison be performed? (Input: yyyy-mm-dd)
Method 2:
To use one column each for TIME and DATE, then compare the date field to the filter parameter (of the format yyyy-mm-dd). This seems expensive when inserting a new record (mail), but that happens only one row at a time, whereas filtering requires comparing a large number of records. So it seems more straightforward.
Also, in method two I am having a problem setting the default values to CURRENT_DATE and CURRENT_TIME!
This is the Table creation code:
CREATE TABLE mailbox (
    Mid INT NOT NULL AUTO_INCREMENT,
    FromId INT NOT NULL,
    ToId INT NOT NULL,
    Subject VARCHAR(256) DEFAULT 'No Subject',
    Message VARCHAR(2048) DEFAULT 'Empty Mail',
    SDate DATE DEFAULT CURRENT_DATE,
    STime TIME DEFAULT CURRENT_TIME,
    PRIMARY KEY (Mid)
);
Please help...
I would use method 1 and do the filtering with
WHERE
your_timestamp >= search_date
AND
your_timestamp < search_date + INTERVAL 1 DAY
assuming your search_date is of type DATE.
MySQL can use an index in this case.
See this fiddle.
Have a look at the execution plan to verify the use of the index.
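As a concrete sketch of method 1, assuming a single TIMESTAMP column on the mailbox table (here called SentAt; the name is illustrative):

```sql
-- The half-open range keeps the predicate sargable,
-- so an index on SentAt can be used.
SELECT Mid, FromId, ToId, Subject
FROM mailbox
WHERE SentAt >= '2020-03-10'
  AND SentAt < '2020-03-10' + INTERVAL 1 DAY;
```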
I suggest first that you keep the records in the table sorted by date. That way you don't need to compare every value; you can use binary search to find the two boundaries (begin and end) of the records with the desired date.
I would also use the timestamp. If you store it as a timestamp and not as text, it is a number, and numeric comparisons are very fast.
EDIT: Thank you everyone for your comments. I have tried most of your suggestions, but they did not help. I need to add that I am running this query through Matlab using Connector/J 5.1.26 (sorry for not mentioning this earlier). I now think this is the source of the increase in execution time, since when I run the query "directly" it takes 0.2 seconds. However, I have never come across such a huge performance hit using Connector/J before. Given this new information, do you have any suggestions?
I have the following table in mySQL (CREATE code taken from HeidiSQL):
CREATE TABLE `data` (
`PRIMARY` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`ID` VARCHAR(5) NULL DEFAULT NULL,
`DATE` DATE NULL DEFAULT NULL,
`PRICE` DECIMAL(14,4) NULL DEFAULT NULL,
`QUANT` INT(10) NULL DEFAULT NULL,
`TIME` TIME NULL DEFAULT NULL,
INDEX `DATE` (`DATE`),
INDEX `ID` (`ID`),
INDEX `PRICE` (`PRICE`),
INDEX `QUANT` (`QUANT`),
INDEX `TIME` (`TIME`),
PRIMARY KEY (`PRIMARY`)
)
It is populated with approximately 360,000 rows of data.
The following query takes over 10 seconds to execute:
Select ID, DATE, PRICE, QUANT, TIME FROM database.data WHERE DATE
>= "2007-01-01" AND DATE <= "2010-12-31" ORDER BY ID, DATE, TIME ASC;
I have other tables with millions of rows in which a similar query would take a fraction of a second. I can't figure out what might be causing this one to be so slow. Any ideas/tips?
EXPLAIN:
id = 1
select_type = SIMPLE
table = data
type = ALL
possible_keys = DATE
key = (NULL)
key_len = (NULL)
ref = (NULL)
rows = 361161
Extra = Using where; Using filesort
You are asking for a wide range of data. The time is probably being spent sorting the results.
Is a query on a smaller date range faster? For instance,
WHERE DATE >= '2007-01-01' AND DATE < '2007-02-01'
One possibility is that the optimizer may be using the index on id for the sort and doing a full table scan to filter out the date range. Using indexes for sorts is often suboptimal. You might try the query as:
select t.*
from (Select ID, DATE, PRICE, QUANT, TIME
FROM database.data
WHERE DATE >= "2007-01-01" AND DATE <= "2010-12-31"
) t
ORDER BY ID, DATE, TIME ASC;
I think this will force the optimizer to use the date index for the selection and then sort using file sort -- but there is the cost of a derived table. If you do not have a large result set, this might significantly improve performance.
I assume you already tried to OPTIMIZE TABLE and got no results.
You can either try to use a covering index (at the expense of more disk space, and a slight slowing down on UPDATEs) by replacing the existing date index with
CREATE INDEX data_date_ndx ON data (DATE, TIME, PRICE, QUANT, ID);
and/or you can try and create an empty table data2 with the same schema. Then just SELECT all the contents of data table into data2 and run the same query against the new table. It could be that the data table needed to be compacted more than OPTIMIZE could - maybe at the filesystem level.
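A sketch of that copy-into-a-fresh-table experiment (table name data2 as described above):

```sql
-- Create an empty table with the same schema, copy everything over,
-- then run the original query against data2 and compare timings.
CREATE TABLE data2 LIKE data;
INSERT INTO data2 SELECT * FROM data;
```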
Also, check out the output of EXPLAIN SELECT... for that query.
I'm not familiar with MySQL, only MSSQL, so maybe:
What about providing an index that fully covers all the fields in your SELECT query?
Yes, it duplicates data, but then we can move on to the next point of the discussion.