I am currently working on a mailbox for a website that holds a large number of messages in a database, with an option to filter the mails by date. I am unsure which method to use and how to implement it.
Method 1:
To use a TIMESTAMP column and select the records based on the DATE part only. This seems better, considering that TIMESTAMP is the datatype meant for this. But when filtering, wouldn't splitting the value (into date and time) and comparing be more expensive? If this is the better method, how should the comparison be performed? (Input: yyyy-mm-dd)
Method 2:
To use one column each for TIME and DATE, then compare the date field value to the filter parameter (in the format yyyy-mm-dd). This seems expensive when inserting a new record (mail), but that happens only one at a time, whereas the filtering requires comparing a large number of records. So this seems more straightforward.
Also, in method 2, I am having a problem setting the default values to CURRENT_DATE and CURRENT_TIME!
This is the Table creation code:
CREATE TABLE mailbox (
Mid INT NOT NULL AUTO_INCREMENT,
FromId INT NOT NULL,
ToId INT NOT NULL,
Subject VARCHAR(256) DEFAULT 'No Subject',
Message VARCHAR(2048) DEFAULT 'Empty Mail',
SDate DATE DEFAULT CURRENT_DATE,
STime TIME DEFAULT CURRENT_TIME,
PRIMARY KEY (Mid)
);
Please help...
I would use method 1 and do the filtering with
WHERE
your_timestamp >= search_date
AND
your_timestamp < search_date + INTERVAL 1 DAY
assuming your search_date is of type DATE.
MySQL can use an index in this case.
See this fiddle.
Have a look at the execution plan to verify the use of the index.
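A minimal sketch of method 1, assuming the question's mailbox table with SDate and STime merged into one column. The column name Sent, the index name idx_sent and the literal date are made up for illustration. In the MySQL versions of that era, CURRENT_TIMESTAMP is accepted as a default only for TIMESTAMP columns (and for DATETIME from 5.6.5), which is why the DATE and TIME defaults in the question are rejected.

```sql
-- Assumed variant of the question's table: one TIMESTAMP column
-- replaces the separate SDate / STime columns.
CREATE TABLE mailbox (
  Mid     INT NOT NULL AUTO_INCREMENT,
  FromId  INT NOT NULL,
  ToId    INT NOT NULL,
  Subject VARCHAR(256)  DEFAULT 'No Subject',
  Message VARCHAR(2048) DEFAULT 'Empty Mail',
  Sent    TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,  -- hypothetical column name
  PRIMARY KEY (Mid),
  KEY idx_sent (Sent)                                    -- hypothetical index name
);

-- All mails sent on one day: a half-open range on the bare column,
-- so no function has to be applied to Sent for each row.
EXPLAIN
SELECT Mid, FromId, Subject
FROM mailbox
WHERE Sent >= '2014-03-09'
  AND Sent <  '2014-03-09' + INTERVAL 1 DAY;
```

Whether the optimizer actually picks idx_sent still depends on how selective the date range is, so it is worth checking the EXPLAIN output on real data.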
I suggest first that you keep the records sorted by date, which in practice means putting an index on the date column. That way the database does not need to compare every value; it can use a binary-search-style lookup to find the two boundaries (begin and end) of the records with the desired date.
I would also use the timestamp. If you store it as a timestamp and not as text, it is stored as a number, and comparing numbers is very fast.
I'm using MySQL 5.7.10.
I'm checking a new query for an audit report.
I'll execute it in a simple background Unix process, which invokes mysql from the console.
To check the query, I use a worksheet in HeidiSQL.
The table is:
CREATE TABLE `services` (
`assigned_id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`service_id` VARCHAR(10) NOT NULL,
`name` VARCHAR(50) NOT NULL,
...
`audit_insert` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
...
INDEX `idx_audit_insert` (`audit_insert`),
...
);
The simple worksheet is:
SET @numberOfMonths:=6;
SET @today:=CURRENT_TIMESTAMP();
SET @todaySubstractnumberOfMonths=TIMESTAMP( date_sub(@today, interval @numberOfMonths MONTH) );
EXPLAIN SELECT service_id from services where audit_insert between @todaySubstractnumberOfMonths and @today;
The explain output for that query is:
id,select_type,table,partitions,type,possible_keys,key,key_len,ref,rows,filtered,Extra
1,SIMPLE,services,[all partitions],ALL,idx_audit_insert,,,,47319735,21.05,Using where
So, index 'idx_audit_insert' is not used.
If I change the query to:
EXPLAIN SELECT service_id from services where audit_insert between '2020-01-01 00:00:00' and '2020-03-10 23:59:59';
The output is:
id,select_type,table,partitions,type,possible_keys,key,key_len,ref,rows,filtered,Extra
1,SIMPLE,tdom_transitos,[all partitions],range,idx_audit_insert,idx_audit_insert,4,,4257192,100.00,Using index condition
Now, the index is used and the rows value is dramatically reduced.
So, my questions are:
How can I force the variables to be of type timestamp? Is there anything wrong in my worksheet?
or maybe
How can I use the index (trying to avoid hints like USE INDEX, FORCE INDEX...)?
Thanks a lot.
(EDIT: I have copied the same question to dbastackexchange. Maybe it is more appropriate for that forum.)
Well, maybe it's not the answer I thought I'd find, but it works perfectly.
I have copied the date part of the audit_insert field into another field, audit_insert_datetype, of DATE type. This field has a new index too.
I have changed the query to use this field, and I have tried to force the @... variables to be of date type (with current_date and date).
The results: the new index is used and the execution time is dramatically reduced.
Maybe it's bad style, but it works as I need.
All that date arithmetic can be done in SQL. If you do that, it will use the index.
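As a sketch of what that could look like for the services table from the question, with the six-month window written inline instead of going through user variables (the window length is taken from the worksheet; nothing else is assumed):

```sql
-- Same report as the worksheet, with the date arithmetic done in the query.
-- NOW() and NOW() - INTERVAL 6 MONTH are constant for the whole statement.
EXPLAIN
SELECT service_id
FROM services
WHERE audit_insert BETWEEN NOW() - INTERVAL 6 MONTH AND NOW();
```

If the six-month range covers a large share of the table, the optimizer may still prefer a full scan, so compare the EXPLAIN output with the literal-date version before relying on it.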
"Constant" expressions (such as CURDATE() + INTERVAL 4 MONTH) are evaluated to a DATETIME or TIMESTAMP datatype before starting the query.
For date and time indexing purposes, of the following, which is the best/best practice/fastest?
keep a type date and another for time and have index on type date column
keep a single datetime column and simply put an index on type datetime column
have two, a datetime column and a date column, but put a single index on date
keep a type date and another for time and have index on both , first date and then time
any other approach?
I want to query a table for detecting changes, so I need both date and time.
UPDATE: I thought an index on a datetime would take much more space than one on a date, so it would affect the system's performance. Is that true?
Assuming you are trying to save the date and the time that is on the same day as the date:
A datetime column can be used by date queries as well as time based queries. I can't see a reason why you would want another field.
I'd suggest using a unix timestamp in an INT field; it is the easiest to add, subtract and compare. You can convert it to different formats for display.
I have a very large table (425+ million rows).
CREATE TABLE `DummyTab` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`Name` varchar(48) NOT NULL,
`BeginDate` datetime DEFAULT NULL,
`EndDate` datetime NOT NULL,
......
......
KEY `BeginDate_index` (`BeginDate`),
KEY `id` (`id`)
) ENGINE=MyISAM
Selects are done based on "BeginDate" and other criteria on this table
select * from DummyTab where Name like "%dummyname%" and BeginDate>= 20141101
Now in this case only the date field is being provided out of datetime (although it'll be used as 2014-11-01 00:00:00).
The question is: does the optimizer make proper use of the DATETIME index even when only a date is provided, as in this case? Or should the index be on a DATE field, rather than a DATETIME, to be used more effectively?
Yes, BeginDate_index can still be used when the query is specified with a DATE-only filter (also applying additional criteria on Name won't disqualify the index either).
If you look at this SqlFiddle of random data, and expand the Execution plan at the bottom, you'll see something like:
ID SELECT_TYPE TABLE TYPE POSSIBLE_KEYS KEY KEY_LEN REF ROWS FILTERED EXTRA
1 SIMPLE DummyTab range BeginDate_index BeginDate_index 6 17190 100 Using index condition; Using where
(Specifically KEY is BeginDate_index).
Note, however, that use of the index is not guaranteed. If you execute the same query against a wider range of dates, a different plan may be used (e.g. if you run the same fiddle for > 20140101, BeginDate_index is no longer used, since it does not offer sufficient selectivity).
Edit, Re: Comment on Exactness
Since BeginDate is a datetime, the literal 20141101 will also be converted to a datetime (once). From the docs:
If one of the arguments is a TIMESTAMP or DATETIME column and the other argument is a constant, the constant is converted to a timestamp before the comparison is performed.
So again, yes, as per your last paragraph, the literal in the filter BeginDate >= 20141101 will be converted to the exact date time 20141101000000 (2014-11-01 00:00:00) and any eligible indexes will be considered (but again, never guaranteed).
A common reason indexes cannot be used is that the filter predicate is not sargable, typically because a function is applied to a column in the filter, so the engine has to evaluate the function on every remaining row in the query. Some examples here.
So, altering your example a bit, the queries below return the same rows, but the second one is much slower. This query is sargable:
SELECT * FROM DummyTab
WHERE BeginDate < 20140101; -- Good
Whereas this is NOT:
SELECT * FROM DummyTab
WHERE YEAR(BeginDate) < 2014; -- Bad
Updated SqlFiddle here - again, look at the Execution Plans at the bottom to see the difference.
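The same rewrite applies whenever a date function wraps the column. A small sketch against the DummyTab table from the question (the year 2014 is just an example value): the function call is replaced by a half-open range on the bare column, which returns the same rows.

```sql
-- Not sargable: YEAR() has to be evaluated for every row.
EXPLAIN
SELECT * FROM DummyTab
WHERE YEAR(BeginDate) = 2014;

-- Sargable equivalent: a half-open range on the indexed column itself.
EXPLAIN
SELECT * FROM DummyTab
WHERE BeginDate >= '2014-01-01'
  AND BeginDate <  '2015-01-01';
```

Only the second form lets the optimizer consider BeginDate_index; as noted above, it may still choose a scan if the range is not selective enough.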
How smart is it to use a date as the primary key instead of an ID?
Which gives better search performance in a MySQL database:
to use a timestamp: 1394319600
or to format the date and use it as: 09032014
09032014 = 1394319600 = 9 March 2014
You likely should not be using a datetime type of field as a primary key to begin with. I would suggest using an auto-incrementing integer field to guarantee uniqueness.
As for the datetime/timestamp field itself, it is almost always better to use a native datetime or timestamp data type for these columns rather than unix timestamps stored as numbers or formatted date strings.
Why? Because when people put timestamp data into their database, they typically get to the point of wanting to run queries against that data. If you store your data in a non-native datetime format, you will typically need to convert it to a native format before you can use it in the date/time functions that this sort of query relies on. This usually means you lose the ability to leverage any index on the field for the query.
For example, say you wanted to run a query to see all records for the current day. With a unix timestamp field that query might look like:
SELECT * FROM your_table
WHERE FROM_UNIXTIME(timestamp_field)
BETWEEN CONCAT(CURRENT_DATE(), ' 00:00:00') AND CONCAT(CURRENT_DATE(), ' 23:59:59')
whereas with a datetime/timestamp field it would look like:
SELECT * FROM your_table
WHERE timestamp_field
BETWEEN CONCAT(CURRENT_DATE(), ' 00:00:00') AND CONCAT(CURRENT_DATE(), ' 23:59:59')
Here the need to wrap the column in FROM_UNIXTIME() on the left-hand side of the WHERE condition in the first query prevents use of an index: the index holds the values of timestamp_field, not the results of FROM_UNIXTIME(timestamp_field). This means a full table scan is needed to execute that query. On a large table, this could be very problematic.
I have a table called members. I am looking on advice how to improve it.
id : This is user id (unique) (auto increment) (indexed)
status : Can contain 'activated', 'suspended', 'verify', 'delete'
admin : This just contains either 0 or 1 (if person is admin or not)
suspended_note : If a members account is suspended i can add a note so when they try and login they will see the note.
failed_login_count : basically 1 digit from 0 to 4, counts failed logins
last_visited : unix timestamp of when they last visited site; (updated on logout) (i do this via php with time() )
username : can contain from 3 to 15 characters (unique and indexed)
first_name : can contain letters only and from 3 to 40 chars in length
last_name : can contain letters only and from 2 to 50 chars in length
email : can contain an email address (i use php email filter to check if valid)
password : can contain from 6 to 10 chars in length and is hashed and contains fixed length of 40 chars in database once hashed
date_time : unix timestamp (i do this via php with time() ). When user logs in
ip : members ip on registration/logins
activationkey : i use md5 and a salt to create a unique activation key; length is always 32 chars
gender : either blank or male/female and nothing else.
websiteurl : members can add their site url
msn : can contain msn email address (use regular expression to match this)
aim : aim nickname (use regular expression to match this)
yim : yim nickname (use regular expression to match this)
twitter : twitter username (use regular expression to match this)
suspended_note; first_name; last_name; date_time; ip; gender; websiteurl; msn; aim; yim; twitter can be null because on registration only username, email and password is required so those fields will be null until filled in (they are basically optional and not required) apart from ip which is taken on signup/login.
Could anyone tell me, based on the information I have given, how I can improve this table and make it more efficient? I suspect it can be improved, as I tend to use varchar for most things and am looking to get the best performance out of it.
I tend to do quite a few selects and store the user data in sessions to avoid having to query the database every time. Username is unique and indexed like id, as most of my selects compare against username, with LIMIT 1 on my queries.
UPDATE:
I wanted to ask: if I changed to enum, how would I do a select and compare query in PHP for an enum? I did look online but cannot find any example queries with enum being used. Also, if I changed date_time to timestamp, do I still use time() in PHP to insert the unix timestamp into the date_time column?
The reason I ask is that I was reading a tutorial online that says when the row is queried, selected, updated etc., MySQL automatically updates the timestamp for that row. Is this true? I would rather insert the timestamp using PHP time() in the timestamp field. I use PHP time() already for date_time, but the column is currently varchar, not timestamp.
Plus, the server time is in the US and in php.ini I set it to UK time, but I guess MySQL would store it in the server's time, which again is no good as I want the values in UK time.
Some tips:
Your status should be an int connected to a lookup, or an enum.
ditto for gender
You could use a char instead of varchar. There is a lot of discussion available on that, but while varchar does help you cut down on the size, that is hardly a big issue most of the time. char can be quicker. This is a tricky point, though.
Save your date_time as a timestamp. There is a datatype for that.
ditto for last_visited
Your ip field looks a bit long to me.
An int(5) can hold far more than you need. If your failed count is at most 4, you don't need that big of a number! A tinyint can hold up to 127 signed, or 255 unsigned.
A note from the comments:
You could probably normalize some fields: fields that update often, like failed_login_count, ip, last_visited could be in another table. This way your members table itself doesn't change as often and can be in cache
I agree with this :)
Edit: some updates after your new questions.
example how would I do a select and compare query for example in php for enum?
You can just compare it to the value as if it were a string. The only difference is that with an insert or update, you can only use one of the given values. Just use
SELECT * FROM members WHERE status = 'activated'
changed date_time for example to timestamp do I still use time() in php to insert the unix timestamp into date_time column database?
You can use NOW() in MySQL (this is just off the top of my head and could have a minor mistake):
INSERT INTO members (date_time) VALUES (NOW());
reason I ask is I was reading one tutorial online that says when the row is queried, selected, updated etc MySQL automatically updates the timestamp for that row; is this true as I rather insert the timestamp using php time() in timestamp field. I use php time() already for date_time but use currently use varchar not timestamp.
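You can use the PHP time (wrap it in FROM_UNIXTIME() when inserting into a TIMESTAMP column). A TIMESTAMP column is never changed by a SELECT. It is updated automatically on UPDATE only if it is declared with ON UPDATE CURRENT_TIMESTAMP; in older MySQL versions the first TIMESTAMP column of a table gets that behaviour implicitly unless you give it an explicit DEFAULT. See the manual (http://dev.mysql.com/doc/refman/5.0/en/timestamp.html). You would use something like this in the definition: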
CREATE TABLE t (
ts1 TIMESTAMP DEFAULT 0,
ts2 TIMESTAMP DEFAULT CURRENT_TIMESTAMP
ON UPDATE CURRENT_TIMESTAMP)
First of all you should use mysql's built in field types:
status is ENUM('activated', 'suspended', 'verify', 'delete');
gender is ENUM('male','female','unknown')
last_visited is TIMESTAMP
suspended_note is TEXT
failed login count is TINYINT(1) rather than INT(5), because you wouldn't have 10000 failed logins, right?
date_time is DATETIME or TIMESTAMP
add an index on username and password (combined) so that logins are faster
index, unique email since you'll query by it to retrieve pwds and it should be unique
Also, you might want to normalize this table and move suspended_note, website, IP, aim etc. to a separate table called profile. This way logins, session updates and password retrievals are queries run against a much smaller table, and the rest of the data is selected only on pages where you need it, such as the profile/member pages.
However, this tends to vary a lot depending on how your app is designed, but generally it's better practice to normalize.
You could probably normalize even more and have a user_stats table too: fields that update often, like failed_login_count, ip, last_visited could be in another table. This way your members table itself doesn't change as often and can be in cache. (comment by Konerak)
VARCHAR is good, but when you know the size is fixed, as with the activation key, which is always 32 characters, use CHAR(32).
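Pulling those suggestions together, a possible definition might look like the sketch below. It only covers the columns this answer mentions; the defaults, the VARCHAR lengths and the choice of TIMESTAMP over DATETIME are assumptions made for illustration, not something stated in the question.

```sql
-- Hedged sketch of the members table using native column types.
CREATE TABLE members (
  id                 INT UNSIGNED NOT NULL AUTO_INCREMENT,
  status             ENUM('activated', 'suspended', 'verify', 'delete')
                       NOT NULL DEFAULT 'verify',        -- default is an assumption
  gender             ENUM('male', 'female', 'unknown')
                       NOT NULL DEFAULT 'unknown',
  failed_login_count TINYINT UNSIGNED NOT NULL DEFAULT 0,
  username           VARCHAR(15) NOT NULL,
  email              VARCHAR(191) NOT NULL,  -- assumed length; 191 keeps a utf8mb4
                                             -- index within older InnoDB's 767-byte limit
  password           CHAR(40) NOT NULL,      -- fixed-length hash, per the question
  activationkey      CHAR(32) NOT NULL,      -- always 32 characters
  suspended_note     TEXT NULL,
  last_visited       TIMESTAMP NULL DEFAULT NULL,
  date_time          TIMESTAMP NULL DEFAULT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY uq_username (username),
  UNIQUE KEY uq_email (email)
);
```

The optional profile fields (websiteurl, msn, aim, yim, twitter) are left out here on the assumption that they move to the separate profile table described above.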
Well first the basics..
IP should be stored as an unsigned INT; you would use INET_ATON to store the IP and INET_NTOA to retrieve it.
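A short sketch of the round trip, assuming the members table has an ip column declared as INT UNSIGNED (the address and the id value are made up). These two functions handle IPv4 only; MySQL 5.6.3 and later also provide INET6_ATON and INET6_NTOA for IPv6.

```sql
-- Dotted-quad string to number and back.
SELECT INET_ATON('192.168.0.1');   -- 3232235521
SELECT INET_NTOA(3232235521);      -- '192.168.0.1'

-- Hypothetical usage, assuming members.ip is INT UNSIGNED:
UPDATE members SET ip = INET_ATON('192.168.0.1') WHERE id = 1;
SELECT username, INET_NTOA(ip) AS ip FROM members WHERE id = 1;
```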
Status could be an enum or a tinyint code.
For last visited you could insert a unix timestamp using the MySQL function UNIX_TIMESTAMP() (store this in an INT UNSIGNED column; a TIMESTAMP column takes a datetime value such as NOW() instead). To turn the stored number back into a date, use the FROM_UNIXTIME() function.
Most answers have touched on the basics of using enums. However, using 1 for male and 2 for female may speed up your application, as a numeric field may be faster than an alphanumeric field if you do a lot of queries by that field. You should test to find out.
Secondly, we would need to know how you use the table. How does your app query the table? Where are your indexes? Are you using MyISAM? InnoDB? Most of my recommendations would depend on how your app hits the table. The table is also wide, so I would look into normalizing it, as some others have pointed out.
admin can be of type bit
Activation key can be smaller