I searched Google for an answer to a question I've been asking myself since this morning, but couldn't find any information or article about it.
I was wondering whether, in the following situation, I could improve performance (even by a small percentage):
Context: I have two columns: ID and AddedAt (AddedAt is the Unix timestamp of when the row was created).
Theoretically, if you insert a new row, ID will be incremented by 1 and AddedAt will be the current time.
Now, let's say it is impossible in the current situation to have two simultaneous inserts. Would it be better to use AddedAt as the PK and remove the ID column? AddedAt would then be a single column serving as both the PK and the Unix timestamp. In the end, I would have one column instead of two.
The only downside I see is the size of the key that will be created on AddedAt, since a Unix timestamp is nowadays 10 digits.
Would it be better in this situation? What's your opinion?
EDIT: What about using the timestamp plus milliseconds?
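For reference, here is a minimal sketch of the two designs being compared; the table names events_a and events_b are invented for illustration:
-- Variant A: the current design, with a surrogate id plus the timestamp
CREATE TABLE events_a (
  ID      INT UNSIGNED NOT NULL AUTO_INCREMENT,
  AddedAt INT UNSIGNED NOT NULL,   -- Unix timestamp (seconds) of insertion
  PRIMARY KEY (ID)
);
-- Variant B: the proposal, where the timestamp itself is the primary key
CREATE TABLE events_b (
  AddedAt INT UNSIGNED NOT NULL,   -- doubles as creation time and key
  PRIMARY KEY (AddedAt)
);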
Timestamps are in seconds. While you might not have simultaneous inserts now, as the world tends to speed up you might get multiple inserts in a second. Build your system to function soundly: don't use timestamps as primary keys.
Also, with statement-based replication, timestamps sometimes aren't consistent across databases. Row-based replication alleviates this, but it's still another reason for concern when using them.
From a convention standpoint, primary keys should have some clear meaning to people other than yourself if they are anything other than a plain old auto-incrementing id field. Generally, people expect numbers or char values for keys, not things like blobs, timestamps, datetimes, etc. This is especially true if the key is later used as a foreign key in another table; using a timestamp as a foreign key can be confusing to later developers. Sure, if you have a varchar GUID field you know is unique, use it as the key. Just remember that when it's used as a foreign key, you're also going to eat up quite a bit of memory if you have a huge string.
Assuming you can guarantee that two events won't occur within the same 1-second interval, then sure, you could use the timestamp field as a PK.
That being said, why are you worried about key sizes? A timestamp may be 10 digits, but its internal storage requirement is only 4 bytes. By comparison, an int is also 4 bytes, so you wouldn't be losing anything, unless you're using bigints, in which case it's 8 bytes.
Also, note that timestamp fields are subject to the year 2038 problem. They're essentially Unix timestamps that auto-format into a human-readable date for you. If your app is going to be around for more than 26 years, then you should stick with an int/bigint, whose wraparound point depends on how fast you insert rows, not on a fixed date and time.
The primary key is not only a technical thing; it is the business representation of something that makes each object represented by a row unique.
A timestamp is a unique field of your object because you cannot (in your case) insert two objects at the same time, but it is NOT the primary definition of a business object (if you had a business object called "timestamp", then yes, the time at which it was inserted should be the primary key).
An ID stands for "my client has a physical id that represents him": in the past, we would give numbers to clients on paper, on bills, and so on.
Never forget that computer science is not the objective per se but the means to achieve your goals.
I would leave the ID column as the primary key, as there may be scenarios in which the Unix timestamp gives you a value you're not expecting. One is inserting very quickly in succession and getting the same timestamp; another is the server admin deciding to tinker with the server's time settings.
Doing joins will probably also be much more obvious, as people typically expect the primary key to be some sort of unique id, not a timestamp.
Yes, of course, but the performance gain will be minimal, and only while adding new records.
Moreover, you will be forced to use the timestamp as the foreign key in all related objects (sketched below).
It is worth considering only if you expect many inserts per second and a lot of records (to save the storage of the id column and its index), but as you said the timestamp will be unique, so that's at most 1 record per second :-)
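To make the foreign-key point concrete, here is a hedged sketch of a hypothetical related table if AddedAt became the primary key; it reuses the events_b sketch from earlier, and event_details and its columns are invented:
CREATE TABLE event_details (
  event_added_at INT UNSIGNED NOT NULL,  -- must carry the parent's timestamp key
  note           VARCHAR(255),
  KEY idx_event (event_added_at),
  FOREIGN KEY (event_added_at) REFERENCES events_b (AddedAt)
);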
For a long time, and for several reasons, I have understood that DATETIME columns should not form part of the primary key of a table. Among these reasons, I think it is a bad idea given the high precision of this field. For example, 2014-06-26 15:35:12 won't match 2014-06-26 15:35:13.
Questions like Use timestamp(or datetime) as part of primary key (or part of clustered index) seem to support this "phobia".
However, I am now facing a very concrete problem: I want to map into a MySQL table some values of a function like
f:(TimeInDay,TimeInDay) -> Integer
where the arguments represent a time interval (with second precision) within the same day.
Unique (TimeInDay, TimeInDay) pairs result in a concrete output value. So I came to this table structure:
CREATE TABLE sessions_schedule
(
  tIni TIME NOT NULL,
  tEnd TIME NOT NULL,
  X    TINYINT,
  CONSTRAINT pk PRIMARY KEY (tIni, tEnd)
);
where the TIME columns compose the primary key.
In the MySQL online manual I found:
MySQL recognizes TIME values in several formats,... Some of these
formats can include a trailing fractional seconds part in up to
microseconds (6 digits) precision. Although this fractional part is
recognized, it is discarded from values stored into TIME columns.
So it seems to me that, in this case, the inclusion of TIME fields in the primary key is justified. Am I right?
For a long time, and for several reasons, I have understood that
DATETIME columns should not form part of the primary key of a table.
That's not true for the relational model, it's not true of SQL in general, and it's not true of MySQL in particular.
Among these reasons, I think it is a bad idea given the high precision
of this field. For example, 2014-06-26 15:35:12 won't match
2014-06-26 15:35:13.
Your example isn't a good one. Think about using integers instead. Would you expect the integer 3 to match the integer 4? Of course not. So why would you think '2014-06-26 15:35:12' would match '2014-06-26 15:35:13'? They're different values. Different values aren't supposed to match.
So it seems to me that, in this case, the inclusion of TIME fields in
the primary key is justified. Am I right?
Quite likely. You just have to make sure that you
don't store any values more precise than a second, and
tIni is before tEnd.
(MySQL can store trailing microseconds.)
On other platforms, you'd probably use CHECK constraints to enforce those requirements, but MySQL doesn't enforce CHECK constraints. You'll need to write triggers, or revoke permissions on the tables, and require changes to go through a stored procedure.
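As a minimal sketch, assuming the sessions_schedule table above and a MySQL version that ignores CHECK constraints, a BEFORE INSERT trigger for the ordering requirement could look like this (a matching BEFORE UPDATE trigger would be needed too; the sub-second requirement is already handled by the TIME columns discarding fractional seconds):
DELIMITER //
CREATE TRIGGER sessions_schedule_bi
BEFORE INSERT ON sessions_schedule
FOR EACH ROW
BEGIN
  -- reject intervals whose start is not strictly before their end
  IF NEW.tIni >= NEW.tEnd THEN
    SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'tIni must be before tEnd';
  END IF;
END//
DELIMITER ;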
I am using MySQL in phpMyAdmin. I have a table which contains a primary key. This primary key is 'userid' and it is also an auto-increment field. The application also has the functionality of deleting a particular user by 'userid'. So after deleting a user, when I create a new user again, the 'userid' gets the value of the next integer. I want the table to take the deletion into account and assign primary key values that have already been deleted.
Example:
The 'userid' values in the table are 1, 2, 3, 4, 5, 6, 7, ...
I deleted the userid with value 3.
So now when I create the next user record, the table should use the userid value 3, as it is no longer in use. How can I do that in phpMyAdmin?
I want to do this to keep the number of userid values to a minimum. The count may go up to a 5-digit userid value, so if a 2-digit value is available because it was deleted earlier, reusing that 2-digit value will save memory in the database.
It is entirely possible to assign an ID that is no longer used by explicitly providing it in the next insert you make. AUTO_INCREMENT only assigns an id if you do not supply one yourself.
Be certain, though, that the ID is really not in use; otherwise the insertion will fail.
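For example, a minimal sketch assuming a users table with an AUTO_INCREMENT userid column and a name column (both hypothetical):
-- supplying userid explicitly reuses the freed value
INSERT INTO users (userid, name) VALUES (3, 'new user');
-- omitting userid (or passing NULL) falls back to normal auto-increment behaviour
INSERT INTO users (name) VALUES ('another user');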
That being said, I would discourage doing this. I am not 100% certain, but I believe that when you declare an integer column in MySQL, it occupies a fixed amount of space regardless of how many digits the stored value has, though I am open to clarification on this point. In any case, the minor benefit of potentially using a little less space is not worth risking insert failures by tinkering with your IDs.
In my experience, such little things have a tendency to haunt you later on, and I do not see the real benefit.
I suggest looking for other ways to improve memory usage if necessary.
My database will be storing a large number of data points, so I am using an unsigned BIGINT as the primary key.
Would it ever make sense to use a DATETIME object as the primary key?
Thanks,
Yes, of course it makes sense for a date/time to be a key or part of a key if you need to uniquely identify discrete points or periods of time. I can't say whether that applies to your scenario, but as a general rule there's no fundamental reason why keys can't be based on time; almost any data warehouse does it.
No, because it can't be guaranteed to be unique. Stick with BIGINT. You can put a nice index on the DATETIME column for querying and it will be good enough.
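A hedged sketch of that arrangement (table and column names are illustrative only):
CREATE TABLE data_points (
  id          BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  recorded_at DATETIME NOT NULL,
  reading     DOUBLE,
  PRIMARY KEY (id),
  KEY idx_recorded_at (recorded_at)   -- supports range queries on the timestamp
);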
It wouldn't make sense, as you would be limited to one record per second without any actual reason for that.
It makes sense if your data comes from a single time-ordered set, say a record of financial transactions. If you have multiple data points which naturally occurred at different instants but have the same timestamp due to rounding, change the low-order bits to discriminate them.
This is more problematic in MySQL than in other databases, because timestamps are stored with only 1-second precision. (Edit: as of 5.6.4, MySQL has microsecond precision on its temporal types.)
If you happen to have multiple observations per second, this will fail. For this reason it's probably better not to do it unless you can guarantee that there will never be more than one point per second.
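If you do take the timestamp-as-key route on MySQL 5.6.4 or later, a sketch might look like the following (names invented); it still depends on the guarantee that no two rows ever share the same microsecond:
CREATE TABLE observations (
  observed_at DATETIME(6) NOT NULL,   -- microsecond precision (5.6.4+)
  payload     VARCHAR(100),
  PRIMARY KEY (observed_at)
);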
Database type: MySQL
Columns:
Date, time, price1, qty1, price2, qty2
Time will be in milliseconds.
Number of records: approximately 5.5 million per month.
I can't choose date as the primary key as it is not unique; I could choose date and time combined, but that is also not a good idea.
I will be running queries like
select price and qty between 'this date and time' and 'that date and time', and the result might be in the millions.
What would be the best choice in terms of primary key, index and surrogate key, and what is the best way to implement this? How should I optimize the database?
Not sure why you say choosing both date and time would be a bad idea (are you against composite keys?)
A bigger problem for you is that time does not store milliseconds. See this bug for more data on that: http://bugs.mysql.com/bug.php?id=8523
Also, there seems to be something missing from the key that identifies the Stock such as Ticker. Since the ticker can change over time, it might be a good idea to introduce a surrogate for it such as StockID. You would do this in a table called Stock or similar.
Then for your Trade table, I would suggest using StockID, Date and Time (but store the time in something other than the TIME datatype so you can store milliseconds. Ask another question if you need help with that).
The order of the keys in the PK is important for both storage and retrieval. For retrieval, you want to put the most selective keys for your query first. So if you tend to access all the data for a stock at once (or for a set of stocks), put StockID first so the index can be used to find them quickly. If you tend to access all data for a given interval, put Date and then Time first.
For storage, it's better to be appending, so having Date and Time first is a good idea here too.
In case you want to access mostly in date ranges, but sometimes by Stock, put a secondary index on StockID.
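One hedged sketch of that layout, with invented names, storing the time of day as milliseconds since midnight in an integer column (one way of keeping milliseconds without the TIME datatype):
CREATE TABLE Stock (
  StockID INT UNSIGNED NOT NULL AUTO_INCREMENT,
  Ticker  VARCHAR(12) NOT NULL,
  PRIMARY KEY (StockID)
);
CREATE TABLE Trade (
  StockID   INT UNSIGNED NOT NULL,
  TradeDate DATE NOT NULL,
  TradeMs   INT UNSIGNED NOT NULL,   -- milliseconds since midnight
  price1    DECIMAL(12,4),
  qty1      INT,
  price2    DECIMAL(12,4),
  qty2      INT,
  PRIMARY KEY (StockID, TradeDate, TradeMs),   -- stock-first access pattern
  KEY idx_date_time (TradeDate, TradeMs),      -- date/time range scans
  FOREIGN KEY (StockID) REFERENCES Stock (StockID)
);
-- example interval query: 09:30:00.000 to 10:00:00.000 on one day
SELECT price1, qty1, price2, qty2
FROM Trade
WHERE TradeDate = '2012-03-01'
  AND TradeMs BETWEEN 34200000 AND 36000000;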
As you don't have a natural key (nothing unique within each row), you'd need to add a surrogate key (for the sake of argument, 'transactionid'). You can still have your index based on date and time (which really, really should be a single column) for efficient period scanning.
I am looking for a better primary key than the AutoNumber data type, namely because it's limited to a long integer, when I really just need the field to hold a number or text string that will never, ever repeat, no matter HOW many records are added to or deleted from the table. The problem is that I am not sure how to implement something like turning the current date and time into a hexadecimal string and using that as a unique field for a primary key.
Am I just being too paranoid about running out of space?
-- EDITED 03-16-2010 # 1237 hours --
I had a person who, at the time, I thought was a wonderful reference for Access-related questions tell me that Replication IDs are just a counter for the number of times an item was replicated, hence I never explored it further. After the number of replies, I have upvoted and accepted an answer. I guess I was just having a stupid newbie Access developer moment. Seriously though, thank you again to everyone who replied!
GUID.
They are pretty unique
http://en.wikipedia.org/wiki/Globally_Unique_Identifier
You did not mention your programming language.
In C#, it would be something like
string myKey = Guid.NewGuid().ToString();
Why do you think you'll run out of space? Perhaps you do not realize how big a 64-bit integer is, exactly. It allows for around 10 billion billion records. If you created 100 records per second, it would take over five billion years to run out of integers.
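As a rough sanity check on that arithmetic: 2^64 is about 1.8 x 10^19 values, and 100 inserts per second is roughly 3.2 x 10^9 inserts per year, so it would take on the order of 1.8 x 10^19 / 3.2 x 10^9, close to six billion years, to exhaust the range.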
Why are you limited to a long integer? When you specify an AutoNumber field, you can tell it to use a Replication ID instead of Long Integer and it will be a unique 128-bit value called a GUID.
Although you can use the current date and time as a primary key, here's why not to:
The current date and time are not as unique as you might think. If you create records very rapidly, you could end up with two inserted within the same clock tick, causing both to end up with the same time. Or your computer's clock could be set backwards. Or DST could end and, if you're storing local times, you'll end up with duplicate times.
Per John's answer, you're unlikely to run out of long integers. But if you'd prefer a unique string, the easiest solution is a UUID. It doesn't take inputs, but the odds of ever generating two identical UUIDs are negligible.
For example, in Python:
import uuid
new_key = str(uuid.uuid4())  # a random UUID rendered as a string
There are UUID functions available in most languages: http://en.wikipedia.org/wiki/Uuid
A very simple solution is to use an AutoNumber with the option "Random" instead of "Increment". I read somewhere that, since the numbers are not contiguous, this has the added bonus of improving concurrency when adding new records from several clients simultaneously.