MySQL timestamp and AUTO_INCREMENT as primary key

I am thinking about the best way to index my data. Is it a good idea to use the timestamp as my primary key? I am saving it anyway, and I thought about saving some columns. The timestamp should be an integer not a datetime column, because of performance. Moreover I don't want to be restricted on the amount of data in a short time (between two seconds). Therefore, I thought about an additional AUTO_INCREMENT column. Now I have a unique key (timestamp and AI) and I can easily get the last inserted id by using the function LAST_INSERT_ID(). Is it possible to reset the AI counter every second, that is, whenever there is a new timestamp? Or is it possible to detect whether there is a row with the same timestamp and increase the AI value (I still want to be able to use LAST_INSERT_ID)?
Please share some thoughts.

The timestamp should be an integer not a datetime column, because of performance.
I think you are of the belief that datetime is stored as a string. It is stored as numbers quite efficiently and with a wider range and more accuracy than an integer.
Using an integer may decrease performance because the database may not be able to correctly index it for use as a timestamp. It will complicate queries because you will not be able to use the full suite of date and time functions without first converting the integer to a datetime.
Use the appropriate date/time type, index it, and let the database optimize it.
Moreover I don't want to be restricted on the amount of data in a short time (between two seconds). Therefore, I thought about an [additional] AUTO_INCREMENT column.
This would seem to defeat the point of "saving some columns". Now your primary key is two integers. Worse, it's a compound key, which requires every reference to store both values, increasing storage requirements and complicating joins.
All the extra work necessary to determine the next primary key could be done in an insert trigger, but now you've added complexity and extra work to every insert.
Is it a good idea to use the timestamp as my primary key?
A primary key should be A) unique and B) immutable. A timestamp is not unique, and you might need to change it.
Your primary key is unlikely to be a performance or storage bottleneck. Unless you have a good reason, stick with a simple, auto-incrementing big integer. A big integer because 2 billion is smaller than you think.
MySQL encapsulates this in SERIAL, which is an alias for BIGINT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE.
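To put "2 billion is smaller than you think" into numbers, here is a rough back-of-the-envelope sketch in Python. The insert rate of 1,000 rows per second is an assumed figure for illustration, not something from the question, and the helper name is hypothetical.

```python
# Largest values a signed INT and an unsigned BIGINT can hold.
INT_SIGNED_MAX = 2**31 - 1
BIGINT_UNSIGNED_MAX = 2**64 - 1
SECONDS_PER_DAY = 86_400


def days_until_exhausted(max_id: int, inserts_per_second: int) -> float:
    """Days until an auto-incrementing id reaches max_id at a steady insert rate."""
    return max_id / inserts_per_second / SECONDS_PER_DAY


# Assumed sustained rate, for illustration only.
ASSUMED_RATE = 1_000

int_days = days_until_exhausted(INT_SIGNED_MAX, ASSUMED_RATE)
bigint_days = days_until_exhausted(BIGINT_UNSIGNED_MAX, ASSUMED_RATE)

print(f"signed INT lasts about {int_days:.1f} days")
print(f"unsigned BIGINT lasts about {bigint_days / 365:.0f} years")
```

At that assumed rate a signed INT is used up in under a month, while an unsigned BIGINT is effectively inexhaustible.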

TIMESTAMP and DATETIME are risky as a PRIMARY KEY since the PK must be Unique.
Otherwise, it is fine to use them for the PK or an index. But here are some caveats:
When using composite indexes (multi-column), put the things tested with = first; put the datetime last.
Smaller is slightly better when picking a PK. TIMESTAMP takes 4 bytes and DATETIME takes 5 bytes (when not including fractional seconds, in MySQL 5.6.4 and later); INT is 4 bytes; BIGINT is 8.
The time taken for comparing one PK value to another is insignificant. That includes character PKs. For example, country_code CHAR(2) CHARACTER SET ascii is only 2 bytes -- better than 'normalizing' it and replacing it with a 4-byte cc_id INT.
So, no, don't bother using INT instead of TIMESTAMP.
In my experience, 2/3 of tables have a "natural" PK and don't need an auto_increment PK.
One of the worst places to use an auto_inc is on a many-to-many mapping table. It is likely to slow down most operations by a factor of 2.
You hinted at PRIMARY KEY(timestamp, ai):
You need to add INDEX(ai) to keep AUTO_INCREMENT happy.
It provides locality of reference for temporally 'near' rows. But so does ai, by itself.
No, there is no practical way to reset the ai each second. (MyISAM has such, but do not use that engine.) Instead be sure to declare ai big enough to last 'forever' before overflowing.
But I can't think of a use case where there isn't a better way.

Related

Index vs Auto Increment ID set as PK

In MySQL, what is the difference between Index and an ID set to AUTO_INCREMENT and as Primary Key?
Will this also increase the speed of searching the database? Or is the AUTO_INCREMENT ID just for the benefit of the user, and the computer doesn't consider it while searching the database? Reading about INDEX on w3schools.com I came across this line:
Indexes are used to retrieve data from the database very fast. The
users cannot see the indexes, they are just used to speed up
searches/queries.
In MySQL, the primary key creates an index on the key . . . but the original data pages are the leafs of the index. This may be a little convoluted, but the effect is that the data is actually sorted on the data pages.
A regular index is implemented as a b-tree (note: the "b" stands for "balanced" rather than "binary", contrary to what many people believe). The leaves are stored separately from the original data.
auto_increment is a property of one column of a table, where the value is set to a new value on each insert and the new value is larger than the previous value. The increment is usually 1, but that is not guaranteed. auto_increment does not directly relate to indexing, but is almost always associated with the primary key of the table.
So, in both cases, you have an index. The primary key index is slightly smaller because storage is combined with the data pages themselves. On the other hand, the data needs to be in order on the disk, which can complicate inserts and updates. On the other hand, auto-increment guarantees that all new rows go at the end of the data. On the other hand, I've run out of hands.
When you index a column, the database builds a B-tree (or possibly another data structure) to speed up the search process.
ID or primary key by default is indexed.
Auto_Increment means you want MySQL to automatically set a value to the column whenever a new row gets inserted. The value will be set incrementally.
AUTO_INCREMENT is an integer sequence generator and nothing else. It has no inherent relation to indexes of any kind, and only exists to generate unique sequential numbers. It is frequently used to generate integer surrogate keys which are frequently used as primary keys.
You don't have to use either AUTO_INCREMENT or integer IDs as an index so long as you have one or more fields that you can use to uniquely identify a row.
In fact, in terms of scalability, sequence generators like AUTO_INCREMENT are counter-productive as you can only ever have a single instance of a sequence generator, limiting the number of 'master/write' servers and/or bottlenecking insert performance to that of the node running the generator.

TIME Columns as primary key

For a long time, and for several reasons, I have understood that DATETIME columns should not form part of the primary key of a table. Among these reasons, I think it is a bad idea given the high precision of this field. For example, 2014-06-26 15:35:12 won't match 2014-06-26 15:35:13.
Questions like Use timestamp(or datetime) as part of primary key (or part of clustered index) seem to support this "phobia".
However I am facing now a very concrete problem: I want to map into a MySQL table some values of a function like
f:(TimeInDay,TimeInDay) -> Integer
Where the arguments represent a time interval (with second precision) within the same day.
Unique (TimeInDay,TimeInDay) pairs results in a concrete output value. So I came to this table structure:
CREATE TABLE sessions_schedule
(
    tIni TIME NOT NULL,
    tEnd TIME NOT NULL,
    X TINYINT,
    CONSTRAINT pk PRIMARY KEY (tIni, tEnd)
);
Where TIMEs compose the primary key.
In the MySQL online manual I found:
MySQL recognizes TIME values in several formats,... Some of these
formats can include a trailing fractional seconds part in up to
microseconds (6 digits) precision. Although this fractional part is
recognized, it is discarded from values stored into TIME columns.
So, it seems to me, that in this case the inclusion of TIME fields in the primary key is justified. Am I right?
For a long time, and for several reasons, I have understood
that DATETIME columns should not form part of the primary key of a
table.
That's not true for the relational model, it's not true of SQL in general, and it's not true of MySQL in particular.
Among these reasons, I think it is a bad idea given the high
precision of this field. For example, 2014-06-26 15:35:12 won't match
2014-06-26 15:35:13.
Your example isn't a good one. Think about using integers instead. Would you expect the integer 3 to match the integer 4? Of course not. So why would you think '2014-06-26 15:35:12' would match '2014-06-26 15:35:13'? They're different values. Different values aren't supposed to match.
So, it seems to me, that in this case the inclusion of TIME fields in
the primary key is justified. Am I right?
Quite likely. You just have to make sure that you
don't store any values more precise than a second, and
tIni is before tEnd.
(MySQL can store trailing microseconds.)
On other platforms, you'd probably use CHECK constraints to enforce those requirements, but MySQL versions before 8.0.16 parse CHECK constraints without enforcing them. On those versions you'll need to write triggers, or revoke permissions on the tables and require changes to go through a stored procedure.
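The two requirements listed above can also be checked on the application side before a row is written. The following is a minimal Python sketch of that idea; the function name is hypothetical, and such a check complements the database-side enforcement described above rather than replacing it.

```python
from datetime import time


def is_valid_interval(t_ini: time, t_end: time) -> bool:
    """Check the two rules for a (tIni, tEnd) pair.

    1. Neither value is more precise than a second.
    2. tIni is strictly before tEnd.
    """
    whole_seconds = t_ini.microsecond == 0 and t_end.microsecond == 0
    return whole_seconds and t_ini < t_end


# A well-formed interval within one day.
print(is_valid_interval(time(9, 0, 0), time(10, 30, 0)))
# Rejected: the start carries a fractional-seconds part.
print(is_valid_interval(time(9, 0, 0, 500), time(10, 30, 0)))
# Rejected: the interval ends before it starts.
print(is_valid_interval(time(11, 0, 0), time(10, 30, 0)))
```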

MySQL - Speed of string comparison for primary key

I have a MySQL table where I would like my primary key to be a string. This string may potentially be a bit longer (hundreds of characters).
A very common query would be an INSERT ... ON DUPLICATE KEY UPDATE, which means MySQL would have to check whether the primary key already exists in the table a lot. If this is done with a naive strcmp I imagine this might take quite a while the longer the strings are. Would it thus be better to hash the string manually (either to a shorter string or some other data type) and use that as my primary key or can I just use the long string directly? Does MySQL hash primary key strings internally?
First off, when you have an index on a varchar field, MySQL doesn't do a strcmp on all entries to find the correct one; instead it uses a B-tree, which needs only a handful of comparisons to navigate to the proper entry.
Note: I include some info to improve performance if needs be below, but please do not do that until you hit an actual problem. Varchar indexes are quick, they have been optimized by a lot of very smart people, and in the large majority of cases it will be way more than you need.
With that said, if you have a lot of entries and/or very long keys it can be nice performance wise to use an index of hashes on top of it.
CREATE TABLE users
(
    -- varchar needs an explicit length in MySQL; 255 is an assumed example value
    username varchar(255) not null,
    username_hashed varchar(32) not null,
    primary key (username),
    index (username_hashed)
);
When you insert, you can set username_hashed = md5(username), for example. You then search with something like select otherfields from users where username_hashed = md5('some_name') and username = 'some_name', where 'some_name' stands for the value you are looking up.
Note that MySQL 5.5 appears to support hash indexes natively, at least for some storage engines such as MEMORY, which would save you from doing this by hand.
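As a rough illustration of why the hashed column helps, the Python sketch below shows that an MD5 hex digest is always 32 characters, however long the original string is. The sample usernames are made up for the example.

```python
import hashlib


def md5_hex(value: str) -> str:
    """Return the MD5 hex digest of a string, as MySQL's md5() would for the same bytes."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


# Hypothetical keys: one short, one several hundred characters long.
short_name = "alice"
long_name = "user-" + "x" * 400

short_hash = md5_hex(short_name)
long_hash = md5_hex(long_name)

# Both digests have the same fixed width, so the index on the hash
# column compares 32 characters no matter how long the key is.
print(len(short_name), len(short_hash))
print(len(long_name), len(long_hash))
```

Because different strings can in principle share a digest, the lookup still compares the full username as well, which is why the query above filters on both columns.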
Does the primary key need to be a string? Can't it just be a unique index, with an integer primary auto increment?
Searching will always be faster with integers, and it might take a bit of code rearrangement in your app, but you'll always be better off searching numbered primary keys vs. strings.
Look at these two posts that show the difference in memory for int and varchar:
What is the size of column of int(11) in mysql in bytes?
Memory usage of storing strings as varchar in MySQL

Theoretical situation about MySQL

I searched Google for a question I have been asking myself since this morning, but couldn't find any information or article about it.
I was wondering whether, in the following situation, performance could be improved (even by a small percentage):
Context: I have two columns: ID and AddedAt (AddedAt is the Unix timestamp of when the row was created).
Theoretically, if you insert a new row, ID will be incremented by 1 and AddedAt will be the current time.
Now, let's say it is impossible in the current situation to have two simultaneous inserts. Would it be better to use AddedAt as the PK and remove the ID column? AddedAt would be the one and only column, serving as both PK and Unix timestamp. So in the end, I would have one column instead of two.
The only downside I see is maybe the size of the key that will be created on AddedAt, since a Unix timestamp nowadays is 10 digits.
Would it be better in this situation? What's your opinion?
EDIT: What about using timestamp + ms ?
Timestamps are in seconds. While you might not have simultaneous inserts, as the world tends to speed up you might get multiple inserts in a second. Build your system to function soundly: don't use timestamps as primary keys.
Also, with statement-based replication, timestamps sometimes aren't consistent across databases. Row-based replication alleviates this, but it's still another reason for concern when using them.
From a good-convention standpoint, primary keys should have some clear meaning to people other than yourself if they are anything other than a plain old auto-incrementing id field. Generally, people expect numbers or char values for keys, not things like blobs, timestamps, datetimes, etc. This is especially true if the key is later used as a foreign key in another table; using a timestamp as a foreign key can be confusing to later developers. Sure, if you have a varchar GUID field you know is unique, use it as the key. Just remember that when it is used as a foreign key, you're also going to eat up quite a bit of memory if you have a huge string.
Assuming you can guarantee that two events won't occur within the same 1-second interval, then sure, you could use the timestamp field as a PK.
That being said, why are you worried about key sizes? A timestamp may be 10 digits, but its internal storage requirement is only 4 bytes. By comparison, an int is also 4 bytes, so you wouldn't be losing anything, unless you're using bigints, in which case it's 8 bytes.
Also, note that timestamp fields are subject to the year 2038 problem. They're essentially unix timestamps that auto-format into a human readable date for you. If your app is going to be around for more than 26 years, then you should stick with an int/bigint, which has a wraparound range of "however fast you insert rows", not a fixed date/time.
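Both points, the 4-byte storage and the 2038 limit, follow from a Unix timestamp being a signed 32-bit count of seconds since 1970-01-01 UTC. The Python sketch below illustrates this; the sample timestamp is an arbitrary 10-digit value chosen for the example.

```python
import struct
from datetime import datetime, timedelta, timezone

# An arbitrary 10-digit Unix timestamp (assumed example value).
example_ts = 1_400_000_000

# Packed as a signed 32-bit integer it occupies 4 bytes,
# even though it prints as 10 decimal digits.
packed = struct.pack("<i", example_ts)
print(len(str(example_ts)), "digits ->", len(packed), "bytes")

# The largest signed 32-bit value marks the last representable second.
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
last_second = epoch + timedelta(seconds=2**31 - 1)
print(last_second.isoformat())  # 2038-01-19T03:14:07+00:00
```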
The primary key is not only a technical thing, it is the business representation of something that makes each object represented by a row unique.
A timestamp is a unique field of your object because you cannot (in your case) insert two objects at the same time, but it is NOT the primary definition of a business object (if you had a business object called "timestamp" then yes, the time when it was inserted should be the primary key)
An ID stands for "my client has a physical id that represents him": in the past, we would give numbers to clients on papers, bills...
Never forget that computer science is not the objective per se but the means to achieve your goals.
I would leave the ID column as the primary key, as there may be scenarios in which the unix timestamp will give you a value you're not expecting. One is that inserting rows in very quick succession returns the same timestamp; another is the server admin deciding to monkey with the server's time settings.
Joins will probably be much more obvious too, as people typically expect the primary key to be some sort of unique id, not a timestamp.
Yes, of course, but the performance gain will be minimal and will only show when adding new records.
Moreover you will be forced to use timestamp for foreign_keys in all related objects.
It is worth considering only if you expect many inserts per second and a lot of records (to save storage on id column and its index), but as you said timestamp will be unique, so it's max 1 record per second :-)

MySQL large index integers for few rows performance

A developer of mine was making an application and came up with the following schema
purchase_order int(25)
sales_number int(12)
fulfillment_number int(12)
purchase_order is the index in this table. (There are other fields but not relevant to this issue). purchase_order is a concatenation of sales_number + fulfillment.
Instead I proposed an auto-incrementing id field.
Current format could be essentially 12-15 characters long and randomly generated (Though always unique as sales_number + fulfillment_number would always be unique).
My question here is:
If I have 3 rows, each with a random but unique ID (i.e. 983903004, 238839309, 288430274), vs. three rows with the IDs 1, 2, 3, is there a performance hit?
As an aside, my other argument (for those interested) was that the schema makes little sense on the grounds of data redundancy (one can easily do a SELECT CONCAT(sales_number, fulfillment_number)... rather than storing the two columns together in a third).
The problem as I see it is not bigint vs. int (an auto-increment column can be a bigint as well; there is nothing wrong with that) but the random value for the primary key. If you use the InnoDB engine, the primary key is also the clustered key, which defines the physical order of the data. Inserting random values can potentially cause more page splits and, as a result, greater fragmentation, which in turn slows down not only insert/update queries but also selects.
Your argument about concatenating makes sense, but executing CONCAT also has its cost (unfortunately, MySQL before 5.7 doesn't support stored generated columns, so in some cases it's OK to store the result of the concatenation in a separate column).
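The point about clustered keys can be illustrated with a toy model in Python. This is only a sketch of the ordering effect, not of InnoDB's actual page handling: it keeps keys in sorted order and counts how many inserts have to land somewhere other than the end. The key ranges and the seed are arbitrary choices for the example.

```python
import bisect
import random


def count_mid_inserts(keys):
    """Insert keys into a sorted list and count inserts that do not append at the end."""
    ordered = []
    mid_inserts = 0
    for key in keys:
        position = bisect.bisect_left(ordered, key)
        if position != len(ordered):
            # The new key belongs between existing keys, the situation
            # that can force a page split in a clustered index.
            mid_inserts += 1
        ordered.insert(position, key)
    return mid_inserts


sequential_keys = list(range(1, 1001))
rng = random.Random(0)  # fixed seed so the run is repeatable
random_keys = rng.sample(range(1, 10**9), 1000)

sequential_mid = count_mid_inserts(sequential_keys)
random_mid = count_mid_inserts(random_keys)

print("sequential keys inserted out of order:", sequential_mid)
print("random keys inserted out of order:", random_mid)
```

Sequential keys always append, while nearly every random key has to be placed between existing ones.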
AFAIK integers are stored and compared as integers so the comparisons should take the same length of time.
Concatenating two ints (32bit) into one bigint (64bit) may have a performance hit that is hardware dependent.
Having incremental ids will put records that were created around the same time near each other on disk. This might make some queries faster, provided the id is the primary key on InnoDB or the queries use the index these ids belong to.
Incremental records can sometimes be inserted a little bit quicker. Test to see.
You'll need to make sure that the random id is unique, so you'll need an extra lookup.
I don't know if these points are material for your application.