Negative numbers in a floating-point subtraction circuit

This is probably the wrong place to ask, but I will try.
I have to design a circuit that adds/subtracts floating-point numbers.
I tried to do it using sign-magnitude numbers in the IEEE 754 standard.
They are quite large, so I decided to start with something smaller just to prove the concept.
I found a few algorithms on the net for performing addition and subtraction of positive numbers.
Most look like this:
http://meseec.ce.rit.edu/eecc250-winter99/250-1-27-2000.pdf
They do not explain what happens with the sign bit.
Now I'm very confused. According to what I've found on the net, there is no difference between performing:
A - B and A - (-B)
Could someone point me to a link where the algorithm is explained in detail?
Thanks for all answers.
I've found this algebraic explanation useful: http://howardhuang.us/teaching/cs231/08-Subtraction.pdf
Currently my circuit performs A+B (disregarding the sign bit) and A-B, just like kfmfe04 wrote: I XOR B's input and add 1, so I get the result in two's complement. The second PDF suggests including the sign bit in the add/sub operation. I will try this in the morning; having spent so many hours exercising my brain, I feel a bit tired and can't think straight.
Now I just wonder if I should change my circuit so that the toggle add/sub button still XORs B [a + (-b)], but before that I also XOR the mantissas with their sign bits to convert them into two's complement. That way I could cover the case of subtracting negative numbers, (-A) - (-B). Sounds too complicated, though.
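For what it's worth, here is a minimal software sketch (Python, not hardware; the 8-bit width and the function name are illustrative assumptions) of the XOR-then-add-1 trick described above:

import math

# Python model of a two's-complement adder/subtractor datapath.
# Subtracting inverts B and injects a carry-in of 1, since
# a - b == a + ~b + 1 (mod 2^n).
WIDTH = 8
MASK = (1 << WIDTH) - 1

def add_sub(a, b, subtract):
    b_eff = (b ^ MASK) if subtract else b   # the XOR toggle on B's input
    carry_in = 1 if subtract else 0         # the "add 1" step
    return (a + b_eff + carry_in) & MASK

assert add_sub(7, 5, subtract=False) == 12
assert add_sub(7, 5, subtract=True) == 2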

The basic principle is that, internally, you have an addition and a subtraction part. Both routines only work on positive numbers.
Below, ADD and SUB are used to denote the internal routines. You have to divide the operation into different cases, depending on if the operands are positive or negative:
For addition:
pos1 + pos2 => pos1 ADD pos2
pos1 + neg2 => pos1 SUB abs(neg2)
neg1 + pos2 => pos2 SUB abs(neg1)
neg1 + neg2 => - (abs(neg1) ADD abs(neg2))
For subtraction:
pos1 - pos2 => pos1 SUB pos2
pos1 - neg2 => pos1 ADD abs(neg2)
neg1 - pos2 => - (abs(neg1) ADD pos2)
neg1 - neg2 => abs(neg2) SUB abs(neg1)
Of course, you could also simply define "A - B" as "A + -B" and only implement the cases for addition.
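For illustration, here is a hedged Python sketch of that case split (a software model, not HDL; the names ADD, SUB, signed_add and signed_sub are mine). Note that a magnitude-only SUB usually assumes its first operand has the larger magnitude, so the mixed-sign cases compare magnitudes first and fix up the sign afterwards:

def ADD(x, y):          # magnitude adder: x, y >= 0
    return x + y

def SUB(x, y):          # magnitude subtractor: x >= y >= 0
    return x - y

def signed_add(a, b):
    if a >= 0 and b >= 0:
        return ADD(a, b)                  # pos1 ADD pos2
    if a < 0 and b < 0:
        return -ADD(abs(a), abs(b))       # -(abs(neg1) ADD abs(neg2))
    # mixed signs: subtract the smaller magnitude from the larger;
    # the result takes the sign of the larger-magnitude operand
    big, small = (a, b) if abs(a) >= abs(b) else (b, a)
    mag = SUB(abs(big), abs(small))
    return mag if big >= 0 else -mag

def signed_sub(a, b):   # A - B, rewritten as A + (-B)
    return signed_add(a, -b)

assert signed_sub(-3, -5) == 2            # abs(neg2) SUB abs(neg1)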

Related

How can one implement | and & (bitor,bitand) using +-*/%?

How can I implement bitOR and bitAND operations (on two variable-sized ints, at least as small as 8 bits) using just basic arithmetic? I don't care about execution speed; what matters most is the simplicity and size of the code. I've managed to get negation, xor and shifts implemented.
If you already have xor, then I propose the following:
bitAnd(a,b) = ((a+b) - bitXor(a,b)) / 2
where / 2 is (truncated) integer quotient of division by 2 (or bit shift 1 place to the right).
Beware, the integers must be wide enough not to overflow!
If we lose the highest bit in the (a+b) operation, the result will be wrong.
Then you can reconstruct bitOr with one of:
bitOr(a,b) = bitXor(a,b) + bitAnd(a,b)
bitOr(a,b) = (a+b) - bitAnd(a,b)
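A quick sanity check of these identities (a Python sketch; the built-in ^ stands in for the xor you already have):

def bit_xor(a, b):
    return a ^ b                              # stand-in for your own xor

def bit_and(a, b):
    # a + b == (a ^ b) + 2*(a & b), hence:
    return ((a + b) - bit_xor(a, b)) // 2

def bit_or(a, b):
    return bit_xor(a, b) + bit_and(a, b)      # or: (a + b) - bit_and(a, b)

for a in range(256):
    for b in range(256):
        assert bit_and(a, b) == a & b
        assert bit_or(a, b) == a | b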

How to solve non-specific non-linear equations?

I am attempting to fit a circle to some data. This requires numerically solving a set of three non-linear simultaneous equations (see the Full Least Squares Method of this document).
To me it seems that the NEWTON function provided by IDL is fit for solving this problem. NEWTON requires the name of a function that will compute the values of the equation system for particular values of the independent variables:
FUNCTION newtfunction, X
  RETURN, [Some function of X, Some other function of X]
END
While this works fine, it requires that all parameters of the equation system (in this case the set of data points) are hard-coded in newtfunction. This is fine if there is only one data set to solve for; however, I have many thousands of data sets, and defining a new function for each by hand is not an option.
Is there a way around this? Is it possible to define functions programmatically in IDL, or even just pass in the data set in some other manner?
I am not an expert on this matter, but if I were to solve this problem, instead of solving a system of 3 non-linear equations for the three unknowns (i.e. xc, yc and r), I would use an optimization routine that converges to a solution from an initial guess. Steepest descent, conjugate gradient, or any other multivariate optimization method can be used.
I just quickly derived the least-squares objective for your problem as (please check before use):
F = sum_{i=1}^{N} ((xc - xi)^2 + (yc - yi)^2 - r^2)^2
Calculating the gradient of this function is fairly easy, since it is just a summation, so writing steepest-descent code to find xc, yc and r would be trivial.
I hope it helps.
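To make that concrete, here is a hedged steepest-descent sketch in Python/NumPy rather than IDL (the function name, step size and iteration count are illustrative assumptions and will likely need tuning per data set):

import numpy as np

# Minimize F(xc, yc, r) = sum_i ((xc - xi)^2 + (yc - yi)^2 - r^2)^2
# by plain steepest descent on the three unknowns.
def fit_circle(xs, ys, xc, yc, r, lr=1e-3, steps=50000):
    for _ in range(steps):
        e = (xc - xs)**2 + (yc - ys)**2 - r**2   # per-point residual
        g_xc = np.mean(4 * e * (xc - xs))        # dF/dxc (averaged over points)
        g_yc = np.mean(4 * e * (yc - ys))        # dF/dyc
        g_r  = np.mean(-4 * e * r)               # dF/dr
        xc, yc, r = xc - lr * g_xc, yc - lr * g_yc, r - lr * g_r
    return xc, yc, r

# Quick check on synthetic points from a circle centered at (3, -1), radius 2:
theta = np.linspace(0, 2 * np.pi, 50)
xs, ys = 3 + 2 * np.cos(theta), -1 + 2 * np.sin(theta)
print(fit_circle(xs, ys, xc=2.5, yc=-0.5, r=1.5))  # should approach (3, -1, 2)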
It's usual to use a COMMON block in these types of functions to pass in other parameters, cached values, etc. that are not part of the calling signature of the numeric routine.

Compute real roots of a quadratic equation in Pascal

I am trying to solve this problem:
Write a program to compute the real roots of a quadratic equation (ax^2 + bx + c = 0). The roots can be calculated using the following formulae:
x1 = (-b + sqrt(b^2 - 4ac))/2a
and
x2 = (-b - sqrt(b^2 - 4ac))/2a
I wrote the following code, but it's not correct:
program week7_lab2_a1;
var
  a, b, c, i: integer;
  x, x1, x2: real;
begin
  write('Enter the value of a :');
  readln(a);
  write('Enter the value of b :');
  readln(b);
  write('Enter the value of c :');
  readln(c);
  if (sqr(b)-4*a*c)>=0 then
  begin
    if ((a>0) and (b>0)) then
    begin
      x1:=(-1*b+sqrt(sqr(b)-4*a*c))/2*a;
      x2:=(-1*b-sqrt(sqr(b)-4*a*c))/2*a;
      writeln('x1=',x1:0:2);
      writeln('x2=',x2:0:2);
    end
    else if ((a=0) and (b=0)) then
      write('There is no solution')
    else if ((a=0) and (b<>0)) then
    begin
      x:=-1*c/b;
      write('The only root :',x:0:2);
    end;
  end
  else if (sqr(b)-4*a*c)<0 then
    write('There is no real root');
  readln;
end.
Do you know why?
And taking a = -6, b = 7, c = 8, can you desk-check it after writing the pseudocode?
You have an operator precedence error here:
x1:=(-1*b+sqrt(sqr(b)-4*a*c))/2*a;
x2:=(-1*b-sqrt(sqr(b)-4*a*c))/2*a;
Look at the end: the 2*a doesn't do what you think it does. It does divide the expression by 2, but then multiplies the result by a, because of precedence rules. This is what you want:
x1:=(-1*b+sqrt(sqr(b)-4*a*c))/(2*a);
x2:=(-1*b-sqrt(sqr(b)-4*a*c))/(2*a);
In fact, this happens because the expression is evaluated left-to-right (respecting brackets), and multiplication and division have the same precedence. So basically, once it has divided by 2, it says "I'm done with the division; now I'll multiply what I have by a, as told".
As it doesn't really seem clear from the formula you were given, the correct text-only form of the quadratic formula is x = (-b +- sqrt(b^2 - 4ac)) / (2a).
As you can see, you need to divide by 2a, so you must use brackets here to make it work properly.
Otherwise the code looks fine, if somewhat convoluted (for instance, you could discard the cases where (a = 0) and (b = 0) right after input, which would simplify the logic a bit later on). Did you really mean to exclude negative coefficients, though, or just zero coefficients? You should check that.
Also be careful with floating-point equality comparison: it works fine with 0, but will usually not work with most constants, so use an epsilon if you need to check whether one value equals another (like so: abs(a - b) < 1e-6).
Completely agree with what Thomas said in his answer. I just want to add some optimization remarks:
You check the discriminant value in the if-statement, and then compute it again:
if (sqr(b)-4*a*c)>=0 then
...
x1:=(-1*b+sqrt(sqr(b)-4*a*c))/2*a;
x2:=(-1*b-sqrt(sqr(b)-4*a*c))/2*a;
This is not quite efficient: instead of evaluating the discriminant once, you compute it several times. You should first compute the discriminant and store it in a variable:
D := sqr(b)-4*a*c;
and after that you can use your evaluated value in all expressions, like this:
if (D >= 0) then
...
x1:=(-b+sqrt(D))/(2*a);
x2:=(-b-sqrt(D))/(2*a);
and so on.
Also, I wouldn't write -1*b... Just use -b (or 0-b in the worst case), but not a multiplication; it isn't needed here.
EDIT:
One more note:
Your code:
if (sqr(b)-4*a*c)>=0 then
begin
...
end
else
if (sqr(b)-4*a*c)<0 then
write('There is no real root');
Here you check the same condition twice. Schematically, this is:
if (a) then
begin ... end
else
if (not a) then
...
The check for not a (in your code it corresponds to (sqr(b)-4*a*c) < 0) is redundant: inside the else branch, a is already known to be false, so not a is always true and there is no need to test it. You should just throw the second check out.

Function types declarations in Mathematica

I have bumped into this problem several times: what input type declarations does Mathematica understand for functions?
It seems Mathematica understands the following type declarations:
_Integer,
_List,
_?MatrixQ,
_?VectorQ
However, _Real and _Complex declarations, for instance, sometimes cause the function not to evaluate. Any idea why?
What's the general rule here?
When you do something like f[x_]:=Sin[x], what you are doing is defining a pattern replacement rule. If you instead say f[x_smth]:=5 (if you try both, do Clear[f] before the second example), you are really saying: "wherever you see f[x], check whether the head of x is smth and, if it is, replace by 5". Try, for instance,
Clear[f]
f[x_smth]:=5
f[5]
f[smth[5]]
So, to answer your question, the rule is that in f[x_hd]:=1;, hd can be anything and is matched to the head of x.
One can also have more complicated definitions, such as f[x_] := Sin[x] /; x > 12, which will match if x>12 (of course this can be made arbitrarily complicated).
Edit: I forgot about the Real part. You can certainly define Clear[f]; f[x_Real] = Sin[x] and it works for e.g. f[12.]. But keep in mind that, while Head[12.] is Real, Head[12] is Integer, so your definition won't match f[12].
Just a quick note since no one else has mentioned it. You can pattern match for multiple Heads - and this is quicker than using the conditional matching of ? or /;.
f[x:(_Integer|_Real)] := True (* function definition goes here *)
For simple functions acting on Real or Integer arguments, it runs in about 75% of the time of the similar definition
g[x_] /; Element[x, Reals] := True (* function definition goes here *)
(which, as WReach pointed out, runs in 75% of the time
of g[x_?(Element[#, Reals]&)] := True).
The advantage of the latter form is that it works with symbolic constants such as Pi - although if you want a purely numeric function, this can be fixed in the former form with the use of N.
The most likely problem is the input you're using to test the functions. For instance,
f[x_Complex]:= Conjugate[x]
f[x + I y]
f[3 + I 4]
returns
f[x + I y]
3 - I 4
The reason the second one works while the first one doesn't is revealed when looking at their FullForms
x + I y // FullForm == Plus[x, Times[ Complex[0,1], y]]
3 + I 4 // FullForm == Complex[3,4]
Internally, Mathematica transforms 3 + I 4 into a Complex object because each of the terms is numeric, but x + I y does not get the same treatment as x and y are Symbols. Similarly, if we define
g[x_Real] := -x
and apply it:
g[ 5 ] == g[ 5 ]
g[ 5. ] == -5.
The key here is that 5 is an Integer which is not recognized as a subset of Real, but by adding the decimal point it becomes Real.
As acl pointed out, the pattern _Something means match to anything with Head === Something, and both the _Real and _Complex cases are very restrictive in what is given those Heads.

Repeated application of functions

Reading this question got me thinking: For a given function f, how can we know that a loop of this form:
while (x > 2)
x = f(x)
will stop for any value x? Is there some simple criterion?
(The fact that f(x) < x for x > 2 doesn't seem to help since the series may converge).
Specifically, can we prove this for sqrt and for log?
For these functions, a proof that ceil(f(x)) < x for x > 2 would suffice: you could do one iteration to arrive at an integer, and then proceed by simple induction.
For the general case, probably the best idea is to use well-founded induction to prove this property. However, as Moron pointed out in the comments, this could be impossible in the general case and the right ordering is, in many cases, quite hard to find.
Edit, in reply to Amnon's comment:
If you wanted to use well-founded induction, you would have to define another strict order that is well-founded. In the case of the functions you mentioned this is not hard: take x << y if and only if ceil(x) < ceil(y), where << is a symbol for the new order. This order is of course well-founded on the numbers greater than 2, and both sqrt and log are decreasing with respect to it, so you can apply well-founded induction.
Of course, in general case such an order is much more difficult to find. This is also related, in some way, to total correctness assertions in Hoare logic, where you need to guarantee similar obligations on each loop construct.
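As an illustration (my own sketch, not part of the original answer): computationally, that well-founded order amounts to a natural-number measure, ceil(x), that strictly decreases on every iteration, which is easy to check empirically in Python (the helper name is hypothetical):

import math

def terminates(f, x, bound=2):
    measure = math.ceil(x)        # the << order compares these ceilings
    while x > bound:
        x = f(x)
        assert math.ceil(x) < measure, "measure must strictly decrease"
        measure = math.ceil(x)
    return True

print(terminates(math.sqrt, 1e6))   # True
print(terminates(math.log, 1e6))    # True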
There's a general theorem for when the sequence of iterations will converge. (A convergent sequence may not stop in a finite number of steps, but it gets closer and closer to a target; you can get as close to the target as you like by going far enough out in the sequence.)
The sequence x, f(x), f(f(x)), ... will converge if f is a contraction mapping. That is, there exists a positive constant k < 1 such that for all x and y, |f(x) - f(y)| <= k |x-y|.
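For example, sqrt is a contraction on [1, inf) with k = 1/2, since its derivative 1/(2*sqrt(x)) is at most 1/2 there. A small Python sketch (the loop is mine) makes the shrinking gap visible:

import math

# Watch |sqrt(x) - 1| shrink by at least the factor k = 1/2 per step
# (1 is the fixed point of sqrt on [1, inf)).
x = 100.0
for i in range(8):
    nxt = math.sqrt(x)
    print(i, x, abs(nxt - 1) / abs(x - 1))   # this ratio stays <= 0.5
    x = nxt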
(The fact that f(x) < x for x > 2 doesn't seem to help since the series may converge).
If we're talking about floats here, that's not true. If f(x) is strictly less than x for all x > n, the sequence will reach n at some point (because there is only a finite number of floating-point values between any two numbers).
Of course this means you need to prove that f(x) is actually less than x in floating-point arithmetic (i.e. proving it mathematically does not suffice, because f(x) = x may still hold in floats when the difference is too small to represent).
There is no general algorithm to determine whether that loop will terminate for a given function f and starting value x: the halting problem is reducible to this problem.
For sqrt and log we can safely say it terminates because we happen to know the mathematical properties of those functions: iterating sqrt approaches 1, and log eventually goes negative, so the condition x > 2 has to become false at some point.
Hope that helps.
In the general case, all that can be said is that the loop will terminate when it encounters x_i <= 2. That doesn't mean that the sequence converges, nor even that it is bounded below 2; it only means that the sequence contains a value that is not greater than 2.
That said, any sequence containing a subsequence that converges to a value strictly less than 2 will eventually halt. That is the case for x_{i+1} = sqrt(x_i), since it converges to 1. In the case of y_{i+1} = log(y_i), the sequence will contain a value less than 2 before becoming undefined over the reals (log is well defined on the extended complex plane C*, but I don't think the iteration will in general converge there, except at any stable points that may exist, i.e. where z = log(z)). Ultimately this means you need to do some upfront analysis of the sequence to better understand its behavior.
The standard test for convergence of a sequence x_i to a point z is: given ε > 0, there is an n such that for all i > n, |x_i - z| < ε.
As an aside, consider the Mandelbrot set M. The test for whether a particular point c in C is an element of M is whether the sequence z_{i+1} = z_i^2 + c is unbounded, which occurs whenever some |z_i| > 2. Some elements of M give convergent sequences (such as 0), but many do not (such as -1).
Sure. For all positive numbers x, the following inequality holds:
log(x) <= x - 1
(this is a pretty basic result from real analysis; it suffices to observe that the second derivative of log is always negative for all positive x, so the function is concave down, and that x-1 is tangent to the function at x = 1). From this it follows essentially immediately that your while loop must terminate within the first ceil(x) - 2 steps -- though in actuality it terminates much, much faster than that.
A similar argument will establish your result for f(x) = sqrt(x); specifically, you can use the fact that:
sqrt(x) <= x/(2 sqrt(2)) + 1/sqrt(2)
for all positive x.
If you're asking whether this result holds for actual programs, instead of mathematically, the answer is a little bit more nuanced, but not much. Basically, many languages don't actually have hard accuracy requirements for the log function, so if your particular language implementation had an absolutely terrible math library this property might fail to hold. That said, it would need to be a really, really terrible library; this property will hold for any reasonable implementation of log.
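Here is a quick empirical check of those bounds (a hedged Python sketch mirroring the loop in the question; the helper name is mine):

import math

def count_steps(f, x):
    n = 0
    while x > 2:    # the loop from the question
        x = f(x)
        n += 1
    return n

for x0 in (10.0, 1e6, 1e100):
    # log terminates far faster than the ceil(x) - 2 bound derived above
    print(x0, count_steps(math.log, x0), count_steps(math.sqrt, x0))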
I suggest reading this Wikipedia entry, which provides useful pointers. Without additional knowledge about f, nothing can be said.