The MySQL server I'm using is 5.5.41. I also want to note I did not design this database.
The problem I'm running into is that when using MySQL's TRUNCATE function, I seem to be getting an off-by-one error in the result; it's not accurate. See the attached screenshot for what I mean.
If changing the table structure is not an option, is there a way to work around this bug and return the correct number?
Floating point numbers are not exact. The actual stored value of 70.85 is probably something like 70.84999999, but it's being displayed rounded to two decimal places. TRUNCATE takes the actual stored value and simply discards all decimal places beyond what you requested; it truncates rather than rounding to the nearest value, so the result becomes 70.84.
If you don't want to lose accuracy like this, use the DECIMAL datatype instead of FLOAT. You could also use ROUND(reserve_amount, 2) instead of TRUNCATE(reserve_amount, 2).
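As a sketch of what's happening, here is the same behaviour reproduced in Python (the underlying binary float arithmetic is the same as MySQL's FLOAT); the truncate() helper below is hypothetical and only mimics what TRUNCATE() does:

    import math
    from decimal import Decimal

    stored = 70.85                    # what a FLOAT column actually holds
    print(Decimal(stored))            # 70.8499999999999943... (the exact binary value)

    def truncate(value, places):
        """Discard all digits past `places` without rounding."""
        factor = 10 ** places
        return math.trunc(value * factor) / factor

    print(truncate(stored, 2))        # 70.84  (digits are discarded, not rounded)
    print(round(stored, 2))           # 70.85  (rounding to the nearest value, like ROUND())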
I am trying to figure out how to store 1/3, or any fraction that results in an infinitely repeating decimal value, in MySQL. I cannot just use 3.333333 because it obviously does not total 100. I have been reading about the FLOAT datatype, but I'm not sure if it will work. Any help would be appreciated.
Thank you
You could potentially represent all rational numbers (including integers) as a "numerator" and a "denominator". So your table would have numerator and denominator columns, and your app would have logic to store numbers in that form.
You would still be unable to store irrational numbers precisely with this technique (e.g. if you want to store Pi, you'd need a fractional approximation anyway).
See here for what rational numbers are, so you can understand the limitations of this technique.
http://en.wikipedia.org/wiki/Rational_number
Store the numerator and the denominator in separate columns. It's really that simple. The real problem comes later when you want to add up all the fractions. Some languages have built-in facilities to do so, but I don't think MySQL does.
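As a sketch of that last point, the exact summation can be done in application code once the numerator/denominator pairs are fetched; the rows below are hypothetical, and Python's fractions module stands in for whatever your language provides:

    from fractions import Fraction

    # Hypothetical rows fetched from numerator/denominator columns,
    # e.g. SELECT numerator, denominator FROM shares
    rows = [(1, 3), (1, 3), (1, 3)]

    total = sum(Fraction(n, d) for n, d in rows)
    print(total)          # 1 (exactly, with no 0.999999 drift)
    print(float(total))   # 1.0, converting only at the very end for display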
I am specifically asking about handling a large number of money values. Each value is precise only up to 2 decimal places, but the values will be passed around by a database and one or more web frameworks, and there will be arithmetic operations.
Should I insist on decimal datatypes for numbers that need only 2 decimal places of precision? Or are modern floating point implementations robust and standardized enough to make that unnecessary?
Absolutely, hell no, and the robustness question is orthogonal, in that order. :-)
Floating point numbers, especially binary ones, are never the right choice for fixed-point quantities, least of all those that require exact fractions, like money values. First, they can't represent all values of cents (or whatever the fractional component is) accurately, just as fixed-length decimal numbers can't represent 1/3 exactly. Second, adding or subtracting very small and very large floating point numbers doesn't always produce the result you expect, because of the difference in "significance" (magnitude).
Decimal numbers are the way to go for currency calculations. If you absolutely must use binary numbers, use scaled fixed-point binary numbers - for example, compute everything in 1/100ths of your currency unit, and use binary integers to do it.
Lastly, this has nothing to do with "robustness" or "standardization" - it's got everything to do with picking a datatype that matches your data.
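As a quick sketch (in Python, purely for illustration) of both failure modes described above, along with the scaled fixed-point alternative:

    from decimal import Decimal

    # 1) 0.10 has no exact binary representation, so repeated addition drifts.
    print(sum([0.10] * 3))                    # 0.30000000000000004
    print(sum([Decimal("0.10")] * 3))         # 0.30

    # 2) Adding a tiny amount to a very large float loses the cents entirely.
    print(1e16 + 0.01 == 1e16)                # True: the cent vanishes
    print(Decimal("10000000000000000") + Decimal("0.01"))   # 10000000000000000.01

    # Scaled fixed-point alternative: keep everything in integer cents.
    price_cents = 1099                        # $10.99 stored as an integer
    print(price_cents * 3)                    # 3297 cents, i.e. $32.97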
No, they are not precise enough. See the floating point guide for details.
EDIT 2009-Nov-04
OK, so it's been a little while since I first posted this question. It seems to me that many of the initial responders failed to really get what I was saying--a common response was some variation on "What you're saying doesn't make any sense"--and so I've made some handy diagrams to really illustrate my point.
When we speak of numbers, we are generally referring to points on what grade school children learn is called the Number Line:
Now, when we learn arithmetic, our minds learn to perform a very interesting transformation of this concept. Evaluating the expression 1 + 0.5, for example, if we simply applied our "number line thinking", would require us to somehow make sense of this:
It's difficult to really illustrate that, because it's difficult to think about that: "adding" two points. This is where a lot of responders struggled with the idea of adding dates (or simply dismissed it as absurd), because they were thinking of dates as points.
However, the expression 1 + 0.5 does make sense to us, because when we think of it, we're really imagining this:
That is, the number (or point) 1, plus the vector 0.5, resulting in point 1.5.
Alternately, we may be imagining this:
That is, the vector 1, plus the vector 0.5, resulting in the vector 1.5.
In other words, when dealing with numbers, we treat points and vectors interchangeably. But what about dates? Dates are, after all, basically numbers. If you don't believe me, compare this line to the number line above:
Notice the correspondence between the timeline and the number line? This was my point: if we perform the transformation above with numbers, we ought to be able to do it with dates as well. So, applying "timeline thinking", the expression 0001-Jan-02 00:00:00 + 0001-Jan-01 12:00:00 doesn't make a lot of sense, as plenty of responders pointed out:
But, if we do the same conceptual transformation in our head that we perform every time we add or subtract numbers, we can easily "rethink" the above as this:
So clearly, the difference between a DateTime and a TimeSpan is the same difference that exists between a point and a vector. What I think caused a lot of people to respond negatively to my suggestion is that it just feels so unnatural to think of dates as magnitudes in this way. But I don't buy the argument that there's no obvious reference point to use as zero. There is an obvious reference point, and I'll give you a hint where it is: about 2010 years ago.
Don't get me wrong: I'm not questioning the usefulness of drawing a conceptual divide between the notion of a DateTime and a TimeSpan. Really, my question all along should have been (as ChrisW indirectly suggested), why do we treat numbers and vectors interchangeably when dealing with regular numeric types? (Or: why do we have just one int type, instead of int and intspan?) There's a big difference, and yet we don't ever really think about it until sometime in junior high or high school, when we begin geometry. And then it's treated as this new mathematical concept, when in reality it's something we've been utilizing ever since we learned to add numbers by counting with our fingers.
In the end, the best answer came from Strilanc, who pointed out that the use of DateTime and TimeSpan is really an implementation of an affine space, which has the convenient property of not needing a reference point to treat as the origin. So thanks, Strilanc. I'm giving the accepted answer to ChrisW, however, for being the first one to bring up the concept of vectors and points, which really got to the crux of the matter.
ORIGINAL QUESTION (for posterity)
I am certainly no programming jack of all trades, but I know both PHP and .NET have a TimeSpan class in addition to a DateTime class (or structure in .NET), and I am guessing this is the case in a variety of other languages and frameworks as well (though I am writing this primarily with reference to the .NET structures). This might seem a strange question, but isn't TimeSpan redundant?
In case you think the answer is obvious ("A DateTime is an absolute point in time, while a TimeSpan is a range of time -- simple as that!"), consider this: an integer can be conceptualized as either an absolute value (the point on the number line) or a distance between values--and we don't need two separate data types for these different conceptualizations. I can still write 5 + 6 without any ambiguity as to what I mean.
As long as there is a consistent zero-point reference, it seems to me there should be no reason why one would need a TimeSpan object to perform arithmetic operations on DateTime objects, or to get the distance between them.
What am I missing? Why can't the unique methods and properties of the TimeSpan structure simply be folded into DateTime?
(Disclaimer: It isn't like I'm passionate about this or anything; I'm fine using DateTime and TimeSpan objects as they're intended all the time. I'm just asking a question.)
EDIT: Okay, over-simplified example to illustrate my point:
Consider the equation 10 - 5 = 5. One could read this as "Start at 10 (value), move 5 to the left (span), and you end up at 5 (value)."
Suppose, just to make things easy, we let January 1 1900 be point zero and we define TimeSpan objects in terms of days only.
Then 10 - 5 = 5 could be understood, in DateTime terms, as January 11 1900 - January 6 1900 = January 6 1900. This is fine, because January 11 is just "10" by our definition and January 6 is "5". The fact that we are viewing the 10 as a value, the first 5 as a span, and the last 5 as a value again is merely for our own conceptual benefit. My point is just this: that the only difference is in how you think of the number, not in what it actually is. This is why we don't have separate structures for, say, integer values and integer spans -- a plain old integer covers all our bases.
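(As a sketch, here is the same example in Python's datetime module, a rough analogue of the .NET types; the specific dates are just the ones from the paragraph above.)

    from datetime import date, timedelta

    zero = date(1900, 1, 1)                   # our agreed "point zero"
    ten = date(1900, 1, 11)                   # "10" by the definition above
    five = date(1900, 1, 6)                   # "5"

    print((ten - five).days)                  # 5: a date minus a date gives a span
    print(ten - timedelta(days=5) == five)    # True: 10 - 5 = 5, read as date - span = date
    print(zero + timedelta(days=10) == ten)   # True: a span from zero recovers the date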
Am I making any sense?
consider this: an integer can be conceptualized as either an absolute value (the point on the number line) or a distance between values
By your logic, it isn't TimeSpan that's unnecessary: rather, it's DateTime that's unnecessary and could be replaced by TimeSpan (duration since zero).
Plus there's the fact that integers have an obvious zero, whereas dates don't have an obvious zero; and having an obvious zero is necessary if you want to replace "place on the number line" with "distance/span from the zero/origin".
Edit:
A point (location on a plane) isn't the same as a vector.
They seem similar ...
A vector (distance from origin) can represent a point
A point (relative to the origin) can represent a vector
... however the value of the vector that's required to represent a given point will change if the origin changes.
It always makes sense to add two (relative) vectors; but, it makes no sense to add two points, except by converting those points to vectors and then adding the vectors.
The sum of two vectors is unaffected by a change in the origin, but the sum of two points would be affected by a change in the origin if you summed them by converting them to vectors and adding the vectors (because changing the origin would affect the values of those vectors).
[Replace 'point' with DateTime and 'vector' with TimeSpan in the argument above.]
I think there is a genuine difference between absolute and relative values. I don't know why that difference isn't more apparent in arithmetic, i.e. why 'numbers' are used seemingly interchangeably to represent both absolute and relative values.
(Speaking as a mathematician) It's because arithmetic operations on a "date" aren't closed or well defined, which necessitates an additional structure.
For example, January 1, 2000 - December 1, 1999 = ... ? We know there are 31 days between them, but if the result were interpreted as a date, the answer would be Epoch (i.e., zero) + 31 days. That is not a valid "date" anymore.
Similarly, not all arithmetic operations on integers are well defined (1 / 2 has no answer in the integers; integer math returns zero here, but 0 * 2 = 0, not 1 as you would expect). This necessitates an additional structure that we call fractions.
Just because you can define an operation doesn't mean you should. For example, one of the reasons division by zero is undefined is because defining it would require sacrificing some very useful properties of arithmetic (eg. associativity, etc).
The distinction between a timespan and a date comes down to addition. It makes sense to add two timespans, but it doesn't make sense to add two dates unless you have an arbitrary reference date. By not allowing addition of dates, you abstract away that arbitrary reference date. I don't know what date '0' is in .Net, and I've never needed to know. Isn't that nice?
Adding two dates is almost always a bug (seriously, try to think of where this makes sense outside of numerology). By introducing timespans (creating an Affine Space) you eliminate a whole class of bugs.
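For what it's worth, Python's standard library draws the same line; this small sketch (illustrative only, not the .NET types themselves) shows the class of bug the split catches:

    from datetime import date, timedelta

    a = date(2000, 1, 1)
    b = date(1999, 12, 1)

    print(a - b)                    # 31 days: subtracting two dates yields a timedelta (a span)
    print(a + timedelta(days=31))   # 2000-02-01: date + span is a date again
    # print(a + b)                  # TypeError: adding two dates is rejected outright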
One reason is that splitting the types prevents a class of bugs where you think you have a relative time but really have an absolute time, and vice versa. For example, addition of two absolute times can be flagged as a compiler error if the two types are separate.
Also, IntelliSense (and discovery for newbies) works better when the number of members is smaller-- by splitting methods between the two types, working with each gets easier.
Asked the other way round: what would the benefit of weakening the type system in that regard be?
It's all a question of cost vs. benefit, and the DateTime/TimeSpan distinction has the great benefit of reducing bugs due to illogical date/time calculations by forbidding such operations. The distinction exists for very much the same reason that a strict type-checking system exists in the first place: to make semantic errors in the code produce compile-time messages that notify programmers of errors in their code.
Conversely, the cost of having the separate types: zilch.
Now consider dropping the distinction. What would we gain?
To answer your question directly: "isn't TimeSpan redundant?" Absolutely not: it reduces bugs. It definitely has for me.
Think about it conceptually. If I tell you that I'm having a party 7 days from now, is "7 days" in that sentence a date? Could I just say my party is on "7 days"? Of course not, because "7 days" isn't a date. One of the key ideas of object oriented programming is to represent concepts like this in the system as types. It's true that we could represent everything as an integer (and in fact, many people have and do), but in object oriented programming we have the notion of types of items, with their behaviors and properties, and in that sense it makes sense to have an object that expresses this.
I think you could make the opposite argument that DateTime is redundant, and we should only have TimeSpan :)
Seriously, all dates really are just time spans. They are all relative to some starting point. Technically, there is no "year zero" in the Christian calendar (since you can't really have a "zeroth year of our lord"), but if we assign 12:00 A.M. January 1, 0001 B.C. as the "zero point", then every date that comes after (or before) can be thought of as relative to that date. So, 12:00 A.M. on September 19, 2009 would have a TimeSpan of 734033 days.
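(As an illustration, Python's date type exposes exactly this view; the sketch below reduces a date to a plain day count from an arbitrary zero point and back. The count it prints depends on which day you call zero, so it won't necessarily match the 734033 figure above.)

    from datetime import date, timedelta

    origin = date(1, 1, 1)                         # an arbitrary but fixed zero point
    party = date(2009, 9, 19)

    days = (party - origin).days                   # the date reduced to a plain integer span
    print(days)
    print(origin + timedelta(days=days) == party)  # True: the span alone recovers the date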
So, mathematically, DateTime and TimeSpan are redundant. But when we write code, we are attempting to communicate much more than just abstract mathematical constructs. Any given DateTime instance may in fact just be a time span relative to some arbitrary zero point, but to most people reading your code, it will imply a particular point on the calendar. Similarly, a TimeSpan implies the gap between two points on the calendar.
In this case, Microsoft has chosen to be clear rather than parsimonious. I can't say I disagree with the decision.
There are a lot of complications in dates, for example:
leap years
leap seconds
the 1582 change to the Gregorian calendar
the fact that there is no such thing as 0 years
differences in the lengths of months
Treating Dates and TimeSpans as different things means that these kinds of issues are much less likely to confuse you in practice.
It's syntactic sugar, nothing more, nothing less.
In a previous project, I noticed that the price field was being stored as an int rather than as a float, by multiplying the actual value by 100; the reason was to avoid running into floating point problems.
Is this a good practice that I should follow or is it unnecessary and only makes the data less transparent?
Interesting question.
I wouldn't actually choose float in the MySQL environment. I've had too many precision problems with that datatype in the past.
To me, the choice would be between int and decimal(18,4).
I've seen real-world examples of integers used to represent floating point values. The internals of JD Edwards data tables all do this: quantities are typically divided by 10000. While I'm sure it's faster and smaller in-table, it just means we're always having to CAST the ints to a decimal value if we want to do anything with them, especially division.
From a programming perspective, I'd always prefer to work with decimal for price (or money in RDBMSs that support it).
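To make the trade-off concrete, here is a rough sketch (in Python, with made-up values, not tied to any particular schema): integer cents avoid float error but push a conversion into every calculation, and division is where they bite.

    from decimal import Decimal

    price_cents = 1999                       # "multiply by 100" storage: $19.99 as an int
    unit_price = Decimal(price_cents) / 100  # the cast you end up doing everywhere
    print(unit_price)                        # 19.99

    # Division is where raw ints bite: integer division silently drops the remainder.
    print(price_cents // 3)                  # 666, i.e. $6.66, and a cent has gone missing
    print((unit_price / 3).quantize(Decimal("0.01")))   # 6.66, with the rounding made explicit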
Floating point errors could cause you problems if you are multiplying large numbers. In general, financial calculations should not be done with floating point numbers if it can be avoided at all.
I think Decimal is good for this use.
While it would save you float-related issues, having prices saved as integers might lead to a problem where you end up charging 100 times the price to a customer. It could also confuse other programmers.
I have seen both solutions used successfully on medium-size e-commerce websites, but my preference goes to using floats.