I already know how to calculate x^n, where x is a floating point number and n is an integer. However, I want to implement a price curve, using the formula:
y = 0.5 * (x^1.5)
To do this, I need to be able to do exponentiation where the power is a floating-point (or fixed-point) value. How can I do this in Solidity?
Solidity does not offer built-in math functions, so you must use a library.
Paul Berg's PRB-math library provides fixed-point mathematics operations for Solidity.
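To make the idea concrete, here is a minimal sketch of the price curve in fixed-point arithmetic. It is written in Python for readability and is not PRB-math's API: the names SCALE, sqrt_fixed and price are hypothetical, and the 18-decimal scale is an assumption. It relies on the identity x^1.5 = x * sqrt(x), so only an integer square root is needed.

```python
from math import isqrt

# Assumption: values are stored as integers scaled by 10**18 (18 decimals).
SCALE = 10**18

def sqrt_fixed(x):
    """Square root of a fixed-point value, returned in the same scale.

    sqrt(x / SCALE) * SCALE equals sqrt(x * SCALE), so one integer
    square root of the rescaled value is enough.
    """
    return isqrt(x * SCALE)

def price(x):
    """y = 0.5 * x^1.5, computed as x * sqrt(x) / 2 on fixed-point integers."""
    return x * sqrt_fixed(x) // SCALE // 2

# 4^1.5 = 8, so the price for x = 4 is 4.
y = price(4 * SCALE)
```

In Solidity the same structure would need an integer square-root routine and care about overflow in the intermediate product, which is what a fixed-point library takes care of for you.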
What is the precision of the MySQL RAND() function?
I can't find it on the official page: the MySQL RAND() function is said to return a floating-point number, but unfortunately its precision is not stated clearly. It could be single precision, double precision, or some other kind of data.
What I would like to know exactly is this: what is the maximum integer range [0,N] in which I can generate random integers with FLOOR(RAND()*N) such that there won't be any "skips" and any number from 0 to N can be generated?
Another thing which I would like to know:
How can I generate numbers that are bigger than N in MySQL?
As written in the MySQL docs, the precision is system dependent, so there is no single answer to your question.
https://dev.mysql.com/doc/internals/en/floating-point-types.html
Since MySQL uses the machine-dependent binary representation of float and double to store values in the database, we have to care about these. Today, most systems use the IEEE standard 754 for binary floating-point arithmetic. It describes a representation for single precision numbers as 1 bit for sign, 8 bits for biased exponent and 23 bits for fraction and for double precision numbers as 1-bit sign, 11-bit biased exponent and 52-bit fraction. However, we can not rely on the fact that every system uses this representation. Luckily, the ISO C standard requires the standard C library to have a header float.h that describes some details of the floating point representation on a machine. The comment above describes the value DBL_DIG. There is an equivalent value FLT_DIG for the C data type float.
In the end, I have no clue why the precision of a random number would matter. I cannot see any use case for it.
I am new to the Solidity language and have read the documentation. Is there a floating-point data type?
There is no floating point in Solidity. You should keep numbers in whole number format.
You can handle the decimal places in your front-end code. Take a look at how the ERC20 token contract is designed.
Good read: https://medium.com/#jgm.orinoco/understanding-erc-20-token-contracts-a809a7310aa5
There is no native support for floating-point numbers in the core language, but they are available via libraries, such as ABDKMathQuad.
You could try Paul Razvan Berg's PRBMath.
You can write fractional literals in Solidity, but only where the final value is a whole number. The most common case is a literal with a unit suffix such as gwei (which multiplies by 10^9) or ether (which multiplies by 10^18). For example, if you write 0.1 ether in your Solidity code, it is turned into 100000000000000000 (= 10^17 wei). However, after multiplication by the multiplier in the unit suffix, there can't be any fractional part left. For example, you can't specify 1.00000000000000000000000001 ether, because that would leave you with a fractional number of wei.
More generally, what you are probably asking about is how to do fixed-point calculations in Solidity, since floating point is not supported. There are many tutorials online explaining how to calculate fractional values using fixed point: just search for "fixed point arithmetic tutorial". They cover a wide range of languages, and the principles apply directly to Solidity too. You can generally implement fixed point with very little code. Addition and subtraction work without any special care, as long as both operands have already been multiplied by the same fixed-point multiplier. To multiply two fixed-point numbers, multiply them and then divide the product by the multiplier once. Or you can use one of the math libraries linked in other answers.
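The rules described above can be sketched in a few lines. This is an illustration in Python rather than Solidity, with a hypothetical multiplier SCALE of 10^18 and made-up helper names; division is included as the mirror image of multiplication.

```python
# Assumption: every value is an integer already multiplied by SCALE.
SCALE = 10**18

def fx_add(a, b):
    # Both operands share the same multiplier, so plain addition works.
    return a + b

def fx_mul(a, b):
    # The raw product carries SCALE twice; divide once to remove one copy.
    return a * b // SCALE

def fx_div(a, b):
    # Pre-multiply the numerator so the quotient keeps one copy of SCALE.
    return a * SCALE // b

one_and_a_half = 15 * SCALE // 10   # 1.5
two = 2 * SCALE                     # 2.0

total = fx_add(one_and_a_half, two)     # 3.5
product = fx_mul(one_and_a_half, two)   # 3.0
quotient = fx_div(one_and_a_half, two)  # 0.75
```

Integer division truncates, so results are rounded toward zero at the last stored decimal. In Solidity you would also have to consider overflow of the intermediate product.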
I am writing a program for embedded hardware that only supports 32-bit single-precision floating-point arithmetic. The algorithm I am implementing, however, requires 64-bit double-precision addition and comparison. I am trying to emulate the double datatype using a tuple of two floats. So a double d will be emulated as a struct containing the tuple: (float d.hi, float d.low).
The comparison should be straightforward using a lexicographic ordering. The addition, however, is a bit tricky because I am not sure which base I should use. Should it be FLT_MAX? And how can I detect a carry?
How can this be done?
Edit (Clarity): I need the extra significant digits rather than the extra range.
Double-float is a technique that uses pairs of single-precision numbers to achieve almost twice the precision of single-precision arithmetic, accompanied by a slight reduction of the single-precision exponent range (due to intermediate underflow and overflow at the far ends of the range). The basic algorithms were developed by T.J. Dekker and William Kahan in the 1970s. Below I list two fairly recent papers that show how these techniques can be adapted to GPUs. Much of the material in these papers applies independently of platform, so it should be useful for the task at hand.
https://hal.archives-ouvertes.fr/hal-00021443
Guillaume Da Graça, David Defour
Implementation of float-float operators on graphics hardware,
7th conference on Real Numbers and Computers, RNC7.
http://andrewthall.org/papers/df64_qf128.pdf
Andrew Thall
Extended-Precision Floating-Point Numbers for GPU Computation.
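As a rough illustration of the double-float idea, here is a sketch of an error-free addition step (Knuth's two-sum) and a simple pair addition built on it. It is written in Python and simulates single precision by rounding every intermediate result through a 32-bit float; the helper names f32, two_sum and df_add are made up for this example, and the pair addition is a simplified variant, not the exact algorithm from either paper.

```python
import struct

def f32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def two_sum(a, b):
    """Return (s, err) with s = fl(a + b) and s + err == a + b exactly."""
    s = f32(a + b)
    bb = f32(s - a)
    err = f32(f32(a - f32(s - bb)) + f32(b - bb))
    return s, err

def df_add(x, y):
    """Add two (hi, lo) pairs and renormalise the result."""
    s, e = two_sum(x[0], y[0])
    e = f32(e + f32(x[1] + y[1]))
    hi = f32(s + e)
    lo = f32(e - f32(hi - s))
    return hi, lo

tiny = f32(1e-8)                 # too small to change 1.0 in single precision
plain = f32(1.0 + tiny)          # the small term is lost
pair = df_add((1.0, 0.0), (tiny, 0.0))  # the small term survives in lo
```

There is no base and no carry here: the low word simply holds the rounding error of the high word. As long as pairs are kept normalised this way, the lexicographic comparison mentioned in the question works on (hi, lo) directly.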
This is not going to be simple.
A float (IEEE 754 single-precision) has 1 sign bit, 8 exponent bits, and 23 bits of mantissa (well, effectively 24).
A double (IEEE 754 double-precision) has 1 sign bit, 11 exponent bits, and 52 bits of mantissa (effectively 53).
You can use the sign bit and 8 exponent bits from one of your floats, but how are you going to get 3 more exponent bits and 29 bits of mantissa out of the other?
Maybe somebody else can come up with something clever, but my answer is "this is impossible". (Or at least, "no easier than using a 64-bit struct and implementing your own operations")
It depends a bit on what types of operations you want to perform. If you only care about additions and subtractions, Kahan Summation can be a great solution.
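For reference, a minimal sketch of Kahan summation follows. It is in Python, where floats are doubles, so it shows the principle rather than the 32-bit case; the function names are chosen for this example. A naive loop is included for comparison.

```python
def naive_sum(values):
    """Plain left-to-right accumulation."""
    total = 0.0
    for value in values:
        total += value
    return total

def kahan_sum(values):
    """Compensated summation: carry the rounding error into the next step."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for value in values:
        y = value - compensation
        t = total + y
        compensation = (t - total) - y  # what was lost when forming t
        total = t
    return total

# Ten terms that are each too small to change 1.0 on their own.
values = [1.0] + [1e-16] * 10
lost = naive_sum(values)
kept = kahan_sum(values)
```

With these inputs the naive loop never moves away from 1.0, while the compensated loop accumulates the small terms.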
If you need both the precision and a wide range, you'll be needing a software implementation of double precision floating point, such as SoftFloat.
(For addition, the basic principle is to break the representation (e.g. 64 bits) of each value into its three constituent parts: sign, exponent and mantissa. Then shift the mantissa of one value based on the difference in the exponents, add it to or subtract it from the mantissa of the other value based on the sign bits, and possibly renormalise the result by shifting the mantissa and adjusting the exponent correspondingly. Along the way, there are a lot of fiddly details to account for, in order to avoid unnecessary loss of accuracy and to deal with special values such as infinities, NaNs, and denormalised numbers.)
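Only the first step, splitting a 64-bit value into its three fields, is sketched here; the alignment, rounding and special-value handling are the hard part and are left out. The sketch is in Python and the function names decompose and recompose are invented for illustration.

```python
import struct

def decompose(x):
    """Split an IEEE 754 double into (sign, biased exponent, mantissa field)."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF       # 11 bits, biased by 1023
    mantissa = bits & ((1 << 52) - 1)     # 52 stored bits, implicit leading 1
    return sign, exponent, mantissa

def recompose(sign, exponent, mantissa):
    """Pack the three fields back into a double."""
    bits = (sign << 63) | (exponent << 52) | mantissa
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

parts = decompose(1.5)          # sign 0, exponent 1023, top mantissa bit set
round_trip = recompose(*decompose(0.1))
```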
Given all the constraints for high precision over 23 magnitudes, I think the most fruitful method would be to implement a custom arithmetic package.
A quick survey shows Briggs' doubledouble C++ library should address your needs and then some. See this.[*] The default implementation is based on double to achieve 30 significant figure computation, but it is readily rewritten to use float to achieve 13 or 14 significant figures. That may be enough for your requirements if care is taken to segregate addition operations with similar magnitude values, only adding extremes together in the last operations.
Beware though, the comments mention messing around with the x87 control register. I didn't check into the details, but that might make the code too non-portable for your use.
[*] The C++ source is linked by that article, but only the gzipped tar was not a dead link.
This is similar to the double-double arithmetic used by many compilers for long double on some machines that have only hardware double calculation support. It's also used as float-float on older NVIDIA GPUs where there's no double support. See Emulating FP64 with 2 FP32 on a GPU. This way the calculation will be much faster than a software floating-point library.
However, most microcontrollers have no hardware support for floats, so float operations are implemented purely in software. Because of that, using float-float may not improve performance, and it adds some memory overhead because the exponent is stored twice.
If you really need the longer mantissa, try using a custom floating-point library. You can choose whatever size is enough for you: for example, adapt the library to a new 48-bit float type of your own if you only need 40 bits of mantissa and 7 bits of exponent. Then there is no need to spend time calculating or storing the unnecessary 16 bits. But such a library has to be very efficient to pay off, because compilers' own libraries often have assembly-level optimizations for their native float types.
Another software-based solution that might be of use: GNU MPFR
It handles many special cases that you would otherwise have to take care of yourself, and it allows arbitrary precision (better than 64-bit double).
That's not practical. If it were, every embedded 32-bit processor (or compiler) would emulate double precision by doing that. As it stands, none that I am aware of do. Most of them just substitute float for double.
If you need the precision and not the dynamic range, your best bet would be to use fixed point. If the compiler supports 64-bit integers, this will be easier too.
How can I compute the maximum of a smooth function defined on [a,b] in Fortran?
For simplicity, assume a polynomial function.
The background is that almost every numerical flux (a concept in numerical PDEs) involves computing the maximum of a certain function over an interval [a,b].
For a 1-D problem with smooth and readily-computed derivatives, use Newton-Raphson to find zeros of the first derivative.
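A small sketch of the 1-D approach for the polynomial case follows. It is written in Python rather than Fortran, and the function names, the number of starting points and the tolerances are all arbitrary choices for illustration. Because Newton-Raphson only finds critical points, the interval endpoints are added to the candidate list before taking the maximum.

```python
def horner(coeffs, x):
    """Evaluate a polynomial given highest-degree-first coefficients."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result

def derivative(coeffs):
    """Coefficients of the derivative polynomial."""
    n = len(coeffs) - 1
    return [c * (n - i) for i, c in enumerate(coeffs[:-1])]

def max_on_interval(coeffs, a, b, seeds=9, tol=1e-12, max_iter=50):
    """Return (x, f(x)) maximising the polynomial on [a, b]."""
    d1 = derivative(coeffs)
    d2 = derivative(d1)
    candidates = [a, b]  # the maximum may sit on an endpoint
    for k in range(seeds):
        x = a + (b - a) * k / (seeds - 1)
        for _ in range(max_iter):
            slope = horner(d2, x)
            if slope == 0.0:
                break  # Newton step undefined; give up on this seed
            step = horner(d1, x) / slope
            x -= step
            if abs(step) < tol:
                if a <= x <= b:
                    candidates.append(x)
                break
    best = max(candidates, key=lambda x: horner(coeffs, x))
    return best, horner(coeffs, best)

# f(x) = x^3 - 3x on [-2, 1.5]; the maximum is at the critical point x = -1.
x_max, f_max = max_on_interval([1.0, 0.0, -3.0, 0.0], -2.0, 1.5)
```

Several starting points are used because a single Newton run converges to only one root of the derivative, and which one depends on the starting point.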
For multiple dimensions and readily computed derivatives, you're better off using a method that approximates the Hessian. There are several methods of this type, but I've found the L-BFGS method to be reliable and efficient. There is a convenient, BSD-licensed package provided by a group at Northwestern University. There's also quite a bit of well-tested code at http://www.netlib.org/
I have a problem with the following statement
trace(10.12+13.75)
//output 23.869999999999997
Can anybody explain why this is so, and how to get exactly 23.87 out of this?
Thanks
Computer floating point numbers are not perfectly accurate. JavaScript (and I assume ActionScript, as it's a variant) uses 64-bit IEEE 754 values (ECMAScript spec ref). The best example of this imprecision is probably 0.1 + 0.2, which comes out to 0.30000000000000004. To get 23.87, you'll have to round.
If you're doing financial math, you may (or may not) be better off using a library that does decimal math rather than IEEE floating point (something akin to Java's BigDecimal class or C#'s decimal type). But note that decimal types have their own limitations, such as not being able to represent 1 / 3 accurately.
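The options mentioned above can be compared side by side. The sketch below uses Python, whose float is the same 64-bit IEEE 754 type, and its standard decimal module as a stand-in for the BigDecimal-style libraries mentioned; the integer-cents line is an extra assumption about how the amounts might be stored.

```python
from decimal import Decimal

# Binary floating point: neither 10.12 nor the sum is exactly representable.
binary_sum = 10.12 + 13.75

# Option 1: round the result to two decimal places.
rounded = round(binary_sum, 2)

# Option 2: decimal arithmetic, built from strings to avoid binary rounding.
decimal_sum = Decimal("10.12") + Decimal("13.75")

# Option 3 (assumption: amounts are money): keep integer cents.
cents = 1012 + 1375

# Decimal types have limits too: 1/3 is cut off at the context precision.
third = Decimal(1) / Decimal(3)
```

Here binary_sum differs from 23.87, while rounded, decimal_sum and cents all give the expected value. Multiplying third by 3 does not give back exactly 1.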
That happens because of the precision of IEEE format.
Simplest would be to use toFixed.
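var num:Number = 10.12 + 13.75;        // 23.869999999999997
var numStr:String = num.toFixed(2);    // "23.87" (toFixed returns a String)
var num2:Number = new Number(numStr);  // convert back to a Number if needed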