Am I missing a reason not to use the new DateTime2 datatype?
For example, might it cause problems when migrating to another database system or integrating it with another technology?
One of the definitive articles on the subject is this one from Tibor Karaszi
In favour:
Better precision
Potentially less storage
But probably best of all judging by frequency of questions here:
Better support for ANSI date formats (yyyy-mm-dd is not safe otherwise)
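To make that last point concrete, here is a minimal T-SQL sketch (my own illustration, not from the article): under a dmy language setting, a yyyy-mm-dd literal is silently re-interpreted for datetime, while datetime2 always reads it as ISO.

```sql
SET LANGUAGE british;                     -- forces dmy date order

DECLARE @old datetime  = '2009-02-03';    -- read as year-day-month: 2 March 2009
DECLARE @new datetime2 = '2009-02-03';    -- always read as ISO: 3 February 2009

SELECT @old AS unsafe_datetime, @new AS safe_datetime2;
```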
If you're lucky enough to be using nothing but SQL Server 2008 and can guarantee that you will be for a long time to come, then I see no reason why you shouldn't use it if you need to.
I think the replies to this question will explain it better than I can.
However, reasons for not using it would be pretty much as you describe, i.e. it's not recognised in earlier versions of SQL Server, so moving data between the two would require some conversion.
Similarly, datetime2 has a higher precision and if you write code that depends on that level of precision, then you are locked-in to always using that datatype.
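As a quick illustration of that precision difference (a hedged sketch, not from the original answer): datetime rounds to roughly 1/300 of a second, while datetime2(7) keeps 100 ns resolution, so comparisons between the two types can behave differently.

```sql
-- Casting the same instant to both types shows the rounding difference.
SELECT CAST(SYSDATETIME() AS datetime) AS as_datetime,   -- e.g. ...12:00:00.003
       SYSDATETIME()                   AS as_datetime2;  -- e.g. ...12:00:00.0034567
```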
Related: http://golang.org/pkg/time/
I am building an ISO- and RFC-compliant core for my new Go system. I am using MySQL and am currently figuring out the most optimal setup for the most important base tables.
I am trying to figure out how to store the date/time in the database. I want a good balance between the space the stored time will occupy, the querying capabilities, and compatibility with UTC and easy timezone conversion that doesn't cause annoying conflicts when inserting and retrieving data into/from Go/MySQL.
I know this sounds a bit weird in the context of the title of my question. But I see a lot of wrappers, ORMs and such still storing UNIX timestamps (microseconds?). I think it would be good to just always store UTC nano timestamps and accept losing the date/time querying functionality. I don't want to get into problems when running the system with tons of different countries/languages/timezones/currencies/translations/etc. (internationalization and localization). I already encountered these problems with some systems at work, and it drove me nuts to the point where eventually tons of fixes had to be applied throughout the whole codebase just to get some of the conversions back in order. I don't want this to happen in my system. If it means I always have to do some extra coding to keep all stored times in correct UTC+0, I will take that for granted. Based on ISO 8601, timezone offsets and daylight-saving rules, I will determine the output of the date/time.
The story above is opinion based. But my actual question is simply which is more efficient: storing Go's timestamp as an INT, versus MySQL's TIMESTAMP or DATETIME?
1.) What is most optimal considering storage?
2.) What is most optimal considering timezone conventions?
3.) What is most optimal considering speed and MySQL querying?
The answer to all these questions is simply to store the timestamp in UTC with t.UTC().UnixNano(). Keep in mind that time is an int64, so it will always be 8 bytes in the database regardless of precision.
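For a rough comparison of the two approaches (the table and column names below are purely illustrative, not from the question): BIGINT always costs 8 bytes, while DATETIME(6) also ends up at 8 bytes (5 plus 3 for microseconds) but keeps native date arithmetic.

```sql
-- Option 1: store t.UTC().UnixNano() from Go as a plain 8-byte integer.
CREATE TABLE event_unix (
    id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    created_ns BIGINT NOT NULL            -- nanoseconds since the Unix epoch, UTC
);

-- Option 2: store a native temporal type (kept in UTC by convention).
CREATE TABLE event_native (
    id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    created_at DATETIME(6) NOT NULL       -- microsecond precision
);

-- Range queries work on both, but only the native column can use MySQL's
-- date functions directly:
SELECT * FROM event_native WHERE created_at >= '2024-01-01' AND created_at < '2024-02-01';
SELECT * FROM event_unix   WHERE created_ns >= 1704067200000000000  -- 2024-01-01 00:00:00 UTC
                             AND created_ns <  1706745600000000000; -- 2024-02-01 00:00:00 UTC
```

TIMESTAMP would be slightly smaller (4 bytes plus fractional seconds), but it is limited to the 1970-2038 range.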
What are the pros and cons of having CreatedDate and ModifiedDate columns? When should we have them, and when shouldn't we?
UPDATE
What is this comment in an update SP auto-generated with RepositoryFactory? Does it have anything to do with the above columns not being present?
--The [dbo].[TableName] table doesn't have a timestamp column. Optimistic concurrency logic cannot be generated
If you don't need historical information about your data adding these columns will fill space unnecessarily and cause fewer records to fit on a page.
If you do or might need historical information then this might not be enough for your needs anyway. You might want to consider using a different system such as ValidFrom and ValidTo, and never modify or delete the data in any row, just mark it as no longer valid and create a new row.
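A rough T-SQL sketch of that ValidFrom/ValidTo approach (the table and column names are made up for illustration): rows are never updated in place or deleted; a change closes the current row and inserts a new version.

```sql
CREATE TABLE CustomerHistory (
    CustomerId INT          NOT NULL,
    Name       VARCHAR(100) NOT NULL,
    ValidFrom  DATETIME2    NOT NULL,
    ValidTo    DATETIME2    NULL,        -- NULL = current version
    CONSTRAINT PK_CustomerHistory PRIMARY KEY (CustomerId, ValidFrom)
);

-- "Updating" a customer: close the current row, then add the new version.
UPDATE CustomerHistory
   SET ValidTo = SYSUTCDATETIME()
 WHERE CustomerId = 42 AND ValidTo IS NULL;

INSERT INTO CustomerHistory (CustomerId, Name, ValidFrom, ValidTo)
VALUES (42, 'New Name', SYSUTCDATETIME(), NULL);
```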
See Wikipedia for more information on different schemes for keeping historic information about your data. The method you proposed is similar to Type 3 on that page and suffers from the same drawback that only information about the last change is recorded. I suggest you read some of the other methods too.
All I can say is that this data (or full-blown audit tables) has helped me find out what or who caused a major data problem. All it takes is one use to convince you that it is worth spending the extra time to keep these fields up to date.
I don't usually do it for tables that are only populated through a single automated process and no one else has write permissions to the table. And usually it isn't needed for lookup tables which users generally can't update either.
There are pretty much no cons to having them, so if there is any chance you will need them, add them.
People may mention performance or storage concerns, but:
in reality they will have little to no effect on SELECT performance with modern hardware and properly specified SELECT clauses
there can be a minor impact on write performance, but this will likely only be a concern in OLTP-type systems, and that is exactly the case where you usually want these kinds of columns
if you are at the point where adding columns like this is a dealbreaker in terms of performance, then you are likely looking at moving away from SQL databases as a storage platform
With CreatedDate, I almost always set it up with a default value of GetDate(), so I never have to think about it. When building out my schema, I will add both of these columns unless it is a lookup table with no GUI for administering it, because I know it is unlikely the data will be kept up to date if modified manually.
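A minimal sketch of that convention (the table and constraint names are mine, purely for illustration): CreatedDate defaults to the insert time, and ModifiedDate is set by every update statement (or by a trigger, if you prefer).

```sql
CREATE TABLE dbo.Orders (
    OrderId      INT IDENTITY(1,1) PRIMARY KEY,
    Amount       DECIMAL(10,2) NOT NULL,
    CreatedDate  DATETIME NOT NULL CONSTRAINT DF_Orders_Created DEFAULT (GETDATE()),
    ModifiedDate DATETIME NULL
);

-- Every UPDATE also sets ModifiedDate:
UPDATE dbo.Orders
   SET Amount = 99.50,
       ModifiedDate = GETDATE()
 WHERE OrderId = 1;
```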
Some DBMSs provide other means to capture this information automatically, for example Oracle Flashback or Microsoft Change Tracking / Change Data Capture. Those methods also capture more detail than just the latest modification date.
That column type, timestamp, is misleading. It has nothing to do with time; it is rowversion. It is widely used for optimistic concurrency, example here.
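A hedged sketch of how rowversion is typically used for optimistic concurrency (the table, column and variable names below are illustrative, not from the linked example):

```sql
CREATE TABLE dbo.Product (
    ProductId INT IDENTITY(1,1) PRIMARY KEY,
    Name      NVARCHAR(100) NOT NULL,
    RowVer    ROWVERSION                  -- updated automatically on every write
);

-- The client reads a row and remembers RowVer, then updates only if nobody
-- has changed the row in the meantime.
DECLARE @ProductId      INT       = 1;
DECLARE @OriginalRowVer BINARY(8) = 0x00000000000007D1;  -- value read earlier

UPDATE dbo.Product
   SET Name = 'New name'
 WHERE ProductId = @ProductId
   AND RowVer    = @OriginalRowVer;       -- 0 rows affected => concurrent modification
```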
Any good articles out there comparing Oracle vs SQL Server vs MySql in terms of performance?
I'd like to know things like:
INSERT performance
SELECT performance
Scalability under heavy load
Based on some real examples in order to gain a better understanding about the different RDBMS.
The question is really too broad to be answered, because it all depends on what you want to do: there is no general "X is better than Y" benchmark without qualifying "at doing Z" or otherwise giving it some kind of context.
The short answer is: it really doesn't matter. Any of those will be fast enough for your needs. I can say that with 99% certainty. Even MySQL can scale to billions of rows.
That being said, they do vary. As just one example, I wrote a post about a very narrow piece of functionality: join and aggregation performance. See Oracle vs MySQL vs SQL Server: Aggregation vs Joins.
Yes, such benchmarks do exist, but they cannot be published, as Oracle's licensing prohibits publishing such things.
At least, that is the case to the best of my knowledge. I've seen a few published which do not name Oracle specifically, but instead say something like "a leading RDBMS" when they are clearly talking about Oracle, but I don't know whether that gets around it.
On the other hand, Oracle now owns MySQL, so perhaps they won't care so much, or perhaps they will. Who knows.
I'm having to start building the architecture for a database project, but I really don't know the differences between the engines.
Can anyone explain the pros and cons of each of these three engines? We'll have to choose one of them, and the only thing I actually know about them is this:
Mysql & Postgres:
Are free but not as good as Oracle
MySQL has security problems (is this true?)
Oracle:
Best database engine in the world
Expensive
Can someone clear up other differences between them? This is a medium/large project (we're thinking of around 100 to 200 tables) with a low budget; what would you choose? And with a higher budget?
A few years ago I had to write a translation engine; you feed it one set of SQL and it translates to the dialect of the currently connected engine. My engine works on Postgres (AKA PostgreSQL), Ingres, DB2, Informix, Sybase, and Oracle - oh, and ANTS. Frankly, Oracle is my least favorite (more on that below)... Unfortunately for you, MySQL and SQL Server are not on the list (at the time neither was considered a serious RDBMS - but times do change).
Without regard to the quality or performance of the engine - and ease of making and restoring backups - here are the primary areas of difference:
datatypes
limits
invalids
reserved words
null semantics (see below)
quotation semantics (single quote ', double quote ", or either)
statement completion semantics
function semantics
date handling (including constant keywords like 'now' and input / output function formats)
whether inline comments are permitted
maximum attribute lengths
maximum number of attributes
connection semantics / security paradigm.
Without boring you with all the conversion data, here's a sample for one datatype, lvarchar:
oracle = varchar(%x)
sybase = text
db2 = "long varchar"
informix = lvarchar
postgres = varchar(%x)
ants = varchar(%x)
ingres = varchar(%x,%y)
The biggest deal of all, in my view, is null handling; Oracle SILENTLY converts blank input strings to null values. ...Somewhere, a LONG time ago, I read a writeup someone had done about "The Seventeen Meanings of Null" or some such and the real point is that nulls are very valuable and the distinction between a null string and an empty string is useful and non-trivial! I think Oracle made a huge mistake on this one; none of the others have this behavior (that I've ever seen).
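To illustrate the behaviour described above (my own example, Oracle syntax): inserting an empty string into Oracle silently stores NULL, whereas Postgres and the others keep the empty string distinct from NULL.

```sql
CREATE TABLE t (name VARCHAR2(10));

INSERT INTO t (name) VALUES ('');            -- Oracle silently stores NULL here

SELECT COUNT(*) FROM t WHERE name IS NULL;   -- 1 on Oracle
SELECT COUNT(*) FROM t WHERE name = '';      -- 0 on Oracle: '' is NULL, and = never matches NULL
```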
My second least favorite was ANTS because unlike all the others, they ENFORCED the silly rules for perfect syntax that absolutely no one else does and while they may be the only DB company to provide perfect adherence to the standard, they are also a royal pain in the butt to write code for.
Far and away my favorite is Postgres; it's very fast in _real_world_ situations, has great support, and is open source / free.
The differences between the various SQL implementations are big, at least under the hood. This board won't suffice to count them all.
If you have to ask, you also have to ask yourself whether you are in a position to reach a valid and well-founded decision on the matter.
A comparison of MySQL and Postgres can be found here.
Note that Oracle also offers an Express (XE) edition, reduced in features but free to use.
Also, if you have little knowledge to start with, you will have to teach yourself either way; I would just choose one and start learning by using it.
See the comparison tables on Wikipedia: http://en.wikipedia.org/wiki/Comparison_of_object-relational_database_management_systems and http://en.wikipedia.org/wiki/Comparison_of_relational_database_management_systems
Oracle may or may not be the best. It's expensive, but that doesn't mean best.
Have you looked at DB2? Sybase? Teradata? MS SQL?
I think that for low budget scenarios Oracle is out of the question.
100-200 tables is not big, and generally the number of tables in a schema is not a measure of scale. Dataset size and throughput are.
You can have a look at http://www.mysqlperformanceblog.com/ (and their superb book) to see how MySQL can handle huge deployments.
Generally nowadays most RDBMSes can do almost anything that you'd need in a very serious application.
Trying to make a MySQL-based application support MS SQL, I ran into the following issue:
I keep MySQL's auto_increment as unsigned integer fields (of various sizes) in order to make use of the full range, as I know there will never be negative values. MS SQL does not support the unsigned attribute on all integer types, so I have to choose between ditching half the value range or creating some workaround.
One very naive approach would be to put some code in the database abstraction layer or in a stored procedure that converts between negative values on the db side and values from the larger portion of the unsigned range. This would mess up sorting, of course, and it would also not work with the auto-id feature (or would it, in some way?).
I can't think of a good workaround right now, is there any? Or am I just being fanatic and should simply forget about half the range?
Edit:
@Mike Woodhouse: Yeah, I guess you're right. There's still a voice in my head saying that maybe I could reduce the field's size if I optimize its utilization. But if there's no easy way to do this, it's probably not worth worrying about.
When is the problem likely to become a real issue?
Given current growth rates, how soon do you expect signed integer overflow to happen in the MS SQL version?
Be pessimistic.
How long do you expect the application to live?
Do you still think the factor of 2 difference is something you should worry about?
(I have no idea what the answers are, but I think we should be sure that we really have a problem before searching any harder for a solution)
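For a rough sense of scale (illustrative numbers only, not from the question): a signed INT tops out at 2,147,483,647, so at a sustained one insert per second it takes about 68 years to overflow, and even at 100 inserts per second around the clock it still takes roughly 8 months. Plug in your own growth rate before deciding this is a problem.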
I would recommend using the BIGINT data type, as this goes up to 9,223,372,036,854,775,807.
SQL Server does not distinguish between signed and unsigned values; all of its integer types are signed.
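A hedged sketch of the mapping (table names are illustrative): MySQL's INT UNSIGNED range of 0 to 4,294,967,295 fits comfortably inside SQL Server's signed BIGINT.

```sql
-- MySQL:
CREATE TABLE item (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
);

-- SQL Server equivalent, trading 4 extra bytes per row for the full range:
CREATE TABLE item (
    id BIGINT IDENTITY(1,1) NOT NULL PRIMARY KEY
);
```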
I would say this: "How do we normally deal with differences between components?"
Encapsulate what varies.
You need to create an abstraction layer within your data access layer to get it to the point where it doesn't care whether the database is MySQL or MS SQL.