I have a Couchbase database and I would like to store prices without losing precision; double is really not good enough for my application. However, there seems to be no support for currency data types in Couchbase.
Is there a preferred solution to this problem for this database engine?
I was thinking about storing each price twice, once as a string and once as a double, so that I can still query prices with inequality (range) comparisons. It's better than nothing, but not really a nice solution.
This is really a problem with JSON, but since Couchbase uses pure JSON, it applies :)
One solution that I've seen is to store it as an integer.
For example, to store a price of $129.99, you would store the number 12999. This could be a bit annoying, but depending on the language or framework you're using, it may be fairly easy to customize your (de)serializer to handle this automatically. Alternatively, you could create a calculated property in your class (assuming you're using OOP), or use AOP.
But either way, your precision is preserved. Your string solution would also work, with similar caveats.
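A minimal sketch of the integer approach on the application side, assuming a currency with exactly two minor-unit digits (cents) and using Python's standard decimal module; the helper names and the price_cents field are hypothetical. Going through Decimal instead of float keeps 129.99 exact, and the stored value is a plain JSON integer, so ordinary range comparisons on it still work.

```python
from decimal import Decimal, ROUND_HALF_EVEN
import json

# Assumption: the currency has exactly two minor-unit digits (e.g. USD cents).
MINOR_UNITS = 100

def to_minor_units(price: Decimal) -> int:
    """Convert an exact Decimal price to an integer number of cents."""
    scaled = price * MINOR_UNITS
    cents = scaled.quantize(Decimal("1"), rounding=ROUND_HALF_EVEN)
    if cents != scaled:
        # Refuse to silently round away sub-cent precision.
        raise ValueError(f"{price} has more precision than the currency allows")
    return int(cents)

def from_minor_units(cents: int) -> Decimal:
    """Convert a stored integer number of cents back to an exact Decimal."""
    return Decimal(cents) / MINOR_UNITS

# Hypothetical document shape: the price travels as a plain JSON integer.
doc = json.dumps({"sku": "widget-1", "price_cents": to_minor_units(Decimal("129.99"))})
restored = from_minor_units(json.loads(doc)["price_cents"])
print(doc)       # {"sku": "widget-1", "price_cents": 12999}
print(restored)  # 129.99
```

Raising on sub-cent input is a design choice; an application that wants rounding instead can drop the check and keep the quantize call.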
The semantics of integers and doubles are quite different. Lua recently added integer support too (albeit as a subtype of number). Python is, in a sense, complete in this respect, not to mention heavier-weight languages like C++, C#, and Java.
There are systems that treat integers very differently from floating-point numbers (or doubles), and when using JSON it is mentally taxing that every integer written to the wire comes back as a double. In high-level application logic one could probably distinguish them based on the property, but the extra double-to-int cast makes the code unintuitive and misleading; people ask questions like: Why do you cast here? Are you sure this is an integer?
So in a sense, the intent is not completely clear when there is no explicit integer support.
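To make the distinction concrete: the JSON grammar has a single number type, so whether 1 comes back as an integer or a double is decided by the parser. In this small Python sketch, the standard json module keeps integer literals as int, while passing parse_int=float imitates a parser that reads every number as a double (as JavaScript's JSON.parse does). Beyond 2**53, the doubles-only reading silently changes the value.

```python
import json

# 2**53 + 1 is the smallest positive integer a 64-bit double cannot represent.
doc = '{"count": 9007199254740993, "ratio": 1.0}'

# Python's json module returns int for integer literals and float otherwise.
native = json.loads(doc)

# parse_int=float mimics a parser that reads every JSON number as a double.
doubles_only = json.loads(doc, parse_int=float)

print(type(native["count"]).__name__, native["count"])              # int 9007199254740993
print(type(doubles_only["count"]).__name__, doubles_only["count"])  # float 9007199254740992.0
```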
Can anyone shed some light on this?
Thanks!
My source has dates in several different formats, as shown below, and I'm looking for an algorithm to identify the source date pattern. I tried Pentaho Data Integration with the Select Values and Fuzzy steps.
Date Column (String)
"20150210"
"20050822--"
"2014-02-May"
"20051509--"
"02-May-2014"
"2013-May-12"
"12DEC2013"
"15050815"
"May-02-2014"
"12312015"
I know that in PDI we can do this in a JavaScript step by writing if conditions for each pattern, but that is not a good idea: this approach kills the transformation when dealing with huge numbers of records. I'm looking for an efficient way to detect the date pattern.
I believe this is a very common issue in ETL projects. I'm trying to understand how enterprise vendors such as SAS Data Integration, Informatica, and SSIS provide an easy way to handle it.
Is there an algorithm to identify the source pattern? If so, which one?
The formats listed above are not exhaustive.
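One generic way to avoid a per-row chain of if conditions is to profile the column first: reduce each value to a "shape" (every digit becomes 9, every letter becomes A) and count the distinct shapes. This is a general data-profiling sketch in plain Python with hypothetical function names, not the method any of the tools named above is documented to use. A format then only has to be chosen once per shape, and shapes that remain ambiguous (such as 99999999, which covers yyyyMMdd, MMddyyyy and others) still need the data-set reasoning discussed in the answer.

```python
import re
from collections import Counter

def shape(value: str) -> str:
    """Reduce a raw date string to its shape: every digit -> '9', every letter -> 'A'."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"[0-9]", "9", value))

def profile(values) -> Counter:
    """Count how many rows share each shape, so formats are decided per shape, not per row."""
    return Counter(shape(v) for v in values)

sample = ["20150210", "20050822--", "2014-02-May", "20051509--", "02-May-2014",
          "2013-May-12", "12DEC2013", "15050815", "May-02-2014", "12312015"]

for pattern, count in profile(sample).most_common():
    print(f"{count:>3}  {pattern}")
```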
One cannot determine a single, unambiguous ("monovalent") format for an arbitrary input value.
Consider all of the following formats completely valid:
MM-dd-yy
dd-MM-yy
yy-MM-dd
As stated in a comment by #billinkc, what would you call 01-02-05 in that case?
If at all, your problem would be solvable only if you took a whole data set into account (e.g., you know that the next X rows all use the same date format). Then you can treat it as a linear problem with some constraints that help you determine the date format. Even then, you can't be sure of getting a definite answer; you can only increase the probability of getting one.
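A minimal Python sketch of that elimination idea, assuming every value in a batch shares one format: start from a list of candidate formats (here the three layouts above, written as strptime patterns) and drop any format under which some value fails to parse as a real date. The candidate list and function names are illustrative; as the answer says, a batch can still leave more than one candidate.

```python
from datetime import datetime

# Illustrative candidates: the three layouts above as strptime patterns.
CANDIDATE_FORMATS = ["%m-%d-%y", "%d-%m-%y", "%y-%m-%d"]

def matching_formats(value: str, formats=CANDIDATE_FORMATS) -> list:
    """Return every format under which value parses as a real calendar date."""
    matches = []
    for fmt in formats:
        try:
            datetime.strptime(value, fmt)
        except ValueError:
            continue  # wrong layout or impossible date (e.g. month 13, Feb 31)
        matches.append(fmt)
    return matches

def consistent_formats(values, formats=CANDIDATE_FORMATS) -> list:
    """Assuming all values share one format, keep only formats that fit every value."""
    remaining = list(formats)
    for value in values:
        remaining = matching_formats(value, remaining)
        if len(remaining) <= 1:
            break  # decided, or no candidate fits
    return remaining

print(matching_formats("01-02-05"))                              # all three formats
print(consistent_formats(["01-02-05", "13-02-05"]))              # still two left
print(consistent_formats(["01-02-05", "13-02-05", "25-02-31"]))  # ['%d-%m-%y']
```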
In my web project using Angular, Node, and MongoDB with JSON, dates are not natively supported by the JSON serializer. There is a workaround for this problem, as shown here. However, I wonder what the benefit is of saving a date as a Date object instead of a string in MongoDB. I'm not that far into the project yet, so I don't see the difference.
By saving your dates not as dates but as strings, you are missing out on some very useful features:
1. MongoDB can query date ranges with $gt and $lt.
2. In version 3.0, the aggregation framework got many useful aggregation operators for date handling. None of those work on strings, and few of them can be adequately substituted by string operators.
3. MongoDB dates are stored internally as milliseconds since the UNIX epoch (UTC), so nasty details like timestamps saved from different time zones or daylight saving time are a non-issue.
4. A BSON Date is just 8 bytes. A date in the minimal form YYYYMMDD takes 13 bytes as a BSON string (strings in BSON are prefixed with a 4-byte length and end with a null byte). If you store an ISO 8601 string that uses everything the standard has to offer (date, time accurate to the millisecond, and time zone offset), such as 2015-02-10T12:34:56.789+01:00, you need 34 bytes, more than four times the storage space.
You need to know if any of this matters for your project.
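The storage comparison can be checked by hand-encoding the two BSON value types with Python's struct module, following the BSON specification: a UTC datetime is a little-endian int64 of milliseconds since the epoch, and a string is an int32 byte count (including the trailing null), the UTF-8 bytes, then a null. This sketch covers only the value bytes; the element's type byte and field name cost the same for either representation.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def bson_date_value(dt: datetime) -> bytes:
    """BSON UTC datetime: signed 64-bit little-endian milliseconds since the UNIX epoch."""
    millis = (dt - EPOCH) // timedelta(milliseconds=1)
    return struct.pack("<q", millis)

def bson_string_value(s: str) -> bytes:
    """BSON string: int32 length (UTF-8 bytes + trailing NUL), the bytes, then NUL."""
    data = s.encode("utf-8")
    return struct.pack("<i", len(data) + 1) + data + b"\x00"

when = datetime(2015, 2, 10, 11, 34, 56, 789000, tzinfo=timezone.utc)
print(len(bson_date_value(when)))                               # 8
print(len(bson_string_value("20150210")))                       # 13
print(len(bson_string_value("2015-02-10T12:34:56.789+01:00")))  # 34
```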
If you really want to avoid the BSON Date type, consider storing your dates as a number representing the elapsed milliseconds/seconds/hours/days (whatever is appropriate for your use case) since a fixed point in time, rather than as a string. That way you retain every advantage except point 2.
You should at least use ISO dates if you go for this approach. I would, however, argue that there are benefits to storing date values as date objects. Storing dates as date objects will allow you to add indices and should also help with date range queries. That said, many developers seem happy to store dates as strings; see What is the best way to store dates in MongoDB?
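Why ISO dates matter if you do store strings: strings compare character by character, so only a most-significant-first layout such as YYYY-MM-DD sorts in chronological order. A small Python check with made-up dates, contrasting ISO strings with a US-style MM-DD-YYYY layout:

```python
from datetime import date, datetime

dates = [date(2014, 12, 1), date(2015, 2, 10), date(2015, 1, 5)]
chronological = sorted(dates)

iso_strings = sorted(d.isoformat() for d in dates)          # "YYYY-MM-DD"
us_strings = sorted(d.strftime("%m-%d-%Y") for d in dates)  # "MM-DD-YYYY"

# Parse the sorted strings back to see which layout preserved date order.
iso_order = [date.fromisoformat(s) for s in iso_strings]
us_order = [datetime.strptime(s, "%m-%d-%Y").date() for s in us_strings]

print(iso_order == chronological)  # True
print(us_order == chronological)   # False
```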
Last night I was thinking that programming languages could have a feature that lets us constrain the values assigned to primitive data types.
For example, I should be able to say that my variable of type int can only hold values between 0 and 100:
int<0, 100> progress;
This would then act as a normal integer in every scenario, except that you couldn't assign values outside the range defined in the constraint. The compiler would refuse to compile the code progress = 200.
This constraint would be carried along with the type information.
Is this possible? Is it done in any programming language? If so, which language has it and what is this technique called?
It is generally not possible. It makes little sense to use integers without any arithmetic operators. With arithmetic operators you have this:
int<0,100> x, u, v;
...
x = u + v; // is it in range?
If you're willing to do checks at run-time, then yes, several mainstream languages support it, starting with Pascal.
I believe Pascal (and Delphi) offers something similar with subrange types.
I don't think this is possible at all in Java or Ruby (well, in Ruby it probably is, but it requires some effort). I have no idea about other languages, though.
Ada allows something like what you describe with ranges:
type My_Int is range 1..100;
So if you try to assign a value less than 1 or greater than 100 to a My_Int, Ada will raise the exception Constraint_Error.
Note that I've never used Ada. I've only read about this feature, so do your research before you plunge in.
It is certainly possible. There are many different techniques to do that, but 'dependent types' is the most popular.
The constraints can even be checked statically, at compile time, by the compiler. See, for example, Agda2 and ATS (ats-lang.org).
Weaker forms of your 'range types' are possible without full dependent types, I think.
Some keywords to search for research papers:
- Guarded types
- Refinement types
- Subrange types
Certainly! In case you missed it: C. Do you C? You don't C? You don't count short as a constraint on Integer? Ok, so C only gives you pre-packaged constrained types.
BTW: the answer that merely says Pascal has subrange types seems to miss their point. In Pascal, array bounds violations are not possible, because the array index must be of the same type the array was declared with. In turn, this means that to use an integer index you must coerce it down to the subrange, and that is where the run-time check happens, not when the array is accessed.
This is a very important idea because it means a for loop over an array index type may access the array components safely without any run time checking.
Pascal has subranges. Ada extended that a bit, so you can define something like a subrange, or you can create an entirely new type with the characteristics of an existing type but not compatible with it (e.g., even if a value were in the right range, you wouldn't be able to assign an Integer to your new type based on Integer).
C++ doesn't support the idea directly, but is flexible enough that you can implement it if you want to. If you decide to support all the compound assignment operators (+=, -=, *=, etc.) this can be a lot of work though.
Other languages that support operator overloading (e.g., ML and company) can probably support it in much the same way as C++.
Also note that there are a few non-trivial decisions involved in the design. In particular, if the type is used in a way that could/does result in an intermediate result that overflows the specified range, but produces a final result that's within the specified range, what do you want to happen? Depending on your situation, that might be an error, or it might be entirely acceptable, and you'll have to decide which.
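A minimal sketch of the library approach, written in Python rather than C++ to keep the examples in one language; BoundedInt is a hypothetical name. It checks at run time, not at compile time, and it picks one answer to the intermediate-result question: every arithmetic result must itself be in range, which is stricter than checking only on assignment. One difference from C++: when a class defines __add__ but not __iadd__, Python evaluates x += y as x = x + y, so the compound operators come almost for free.

```python
class BoundedInt:
    """An integer restricted to [lo, hi]; every new value is checked at run time."""

    __slots__ = ("value", "lo", "hi")

    def __init__(self, value: int, lo: int, hi: int):
        if not lo <= value <= hi:
            raise ValueError(f"{value} is outside [{lo}, {hi}]")
        self.value, self.lo, self.hi = value, lo, hi

    @staticmethod
    def _raw(other) -> int:
        # Accept plain ints or other BoundedInts as operands.
        return other.value if isinstance(other, BoundedInt) else other

    def _checked(self, result: int) -> "BoundedInt":
        # Policy: every arithmetic result must itself be in range.
        return BoundedInt(result, self.lo, self.hi)

    def __add__(self, other):
        return self._checked(self.value + self._raw(other))

    def __sub__(self, other):
        return self._checked(self.value - self._raw(other))

    def __mul__(self, other):
        return self._checked(self.value * self._raw(other))

    def __int__(self):
        return self.value

    def __repr__(self):
        return f"BoundedInt({self.value}, {self.lo}, {self.hi})"


progress = BoundedInt(0, 0, 100)
progress += 40           # falls back to __add__; result 40 is checked
progress = progress * 2  # 80, still in range
try:
    progress += 30       # 110 is rejected; progress keeps its old value
except ValueError as exc:
    print(exc)           # 110 is outside [0, 100]
```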
I really doubt that you can do that. After all, these are primitive data types, with the emphasis on primitive!
Adding a constraint would make the type a subclass of its primitive form, thus extending it.
From Wikipedia:
a basic type is a data type provided by a programming language as a basic building block. Most languages allow more complicated composite types to be recursively constructed starting from basic types.
a built-in type is a data type for which the programming language provides built-in support.
So personally, even if it is possible, I wouldn't do it, since it's bad practice. Instead, just create an object that wraps a value of this type and enforces the constraints (which I'm sure you've already thought of).
SQL has domains, which consist of a base type together with a dynamically-checked constraint. For example, you might define the domain telephone_number as a character string with an appropriate number of digits, valid area code, etc.
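SQLite, which ships with Python, has no CREATE DOMAIN, so this sketch uses the closest thing it offers, a column CHECK constraint: the same dynamically checked rule on a base type, but attached to one column rather than defined once as a reusable named type. (In PostgreSQL, the domain version would be CREATE DOMAIN percent AS integer CHECK (VALUE BETWEEN 0 AND 100).) The table and function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK plays the role of the domain's constraint on the base type INTEGER.
conn.execute(
    "CREATE TABLE task (progress INTEGER NOT NULL CHECK (progress BETWEEN 0 AND 100))"
)

def try_set_progress(value: int) -> bool:
    """Insert a row; return False if the database rejects the value."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("INSERT INTO task (progress) VALUES (?)", (value,))
    except sqlite3.IntegrityError:
        return False
    return True

print(try_set_progress(42))   # True
print(try_set_progress(200))  # False
```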