Pascal memory issues - freepascal

I have a search problem that requires saving some values to prevent an infinite loop. Every possible state of the problem is expressed as a unique 8-digit code in base 6 (all digits are 0-5). When the program evaluates a position, I want a boolean set to true so that position is not evaluated again. However, an array indexed 1..55555555 is too large in memory, and converting the 8-digit code to decimal takes too much time. Also, not all combinations are possible in the problem (11 11 11 11, 11 11 55 12 and others are not valid), and I'd rather not waste memory on them. So, is there a way to store the value true at a block of memory with address e.g. 23451211, and, when I call the evaluating process, check whether 23451211 is true or unassigned?

6 to the power 8 = 1679616.
To mark each state as used or not you need one bit, so the whole table fits in 1679616 / 8 = 209952 bytes.
In recent versions of Free Pascal, bitpacked structures are declared as follows:
var
arr : bitpacked array [0..6*6*6*6*6*6*6*6-1] of boolean;
and arr[x] will give true or false.
The conversion time from base 6 to binary (not decimal!) will probably be shorter than trying to use large swaths of memory. It is just a Horner scheme: (((digit8*6 + digit7)*6 + digit6)*6 + ...) etc.
p.s. FPC does have an exponentiation operator, but not for constant expressions, so that's why 6^8 is written out like that.
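For illustration, here is the same idea as a Python sketch (the names seen, index_of, mark, and is_marked are made up for the example); the Pascal version would use the bitpacked array above together with the Horner conversion:

SIZE = 6 ** 8                     # 1679616 possible states

seen = bytearray(SIZE // 8)       # one bit per state = 209952 bytes

def index_of(digits):
    """Horner scheme: eight base-6 digits (most significant first) -> 0..6^8-1."""
    n = 0
    for d in digits:
        n = n * 6 + d
    return n

def mark(digits):
    i = index_of(digits)
    seen[i >> 3] |= 1 << (i & 7)

def is_marked(digits):
    i = index_of(digits)
    return bool(seen[i >> 3] & (1 << (i & 7)))

Invalid codes simply correspond to bits that are never set, so they cost nothing beyond their share of the 209952 bytes.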

Related

LC-3 algorithm for converting ASCII strings to Binary Values

Figure 10.4 provides an algorithm for converting ASCII strings to binary values. Suppose the decimal number is arbitrarily long. Rather than store a table of 10 values for the thousands-place digit, another table of 10 values for the ten-thousands-place digit, and so on, design an algorithm to do the conversion without resorting to any tables whatsoever.
I have attached pictures of figure 10.4. I am not looking for an answer to the problem, but rather can someone please explain this problem and perhaps give some direction on how to go about creating the algorithm?
Figure 10.4
Figure 10.4 second image
I am unsure as to what it means by tables and do not know where to start really.
The tables are those global, initialized arrays: one called Lookup10 holding 10, 20, 30, 40, ..., and another called Lookup100 holding 100, 200, 300, 400...
You can ignore the tables: as per the assignment instructions you're supposed to find a different way to accomplish this anyway.  Or, you can run that code in the simulator, or trace it mentally, to understand how it works.
The bottom line is that LC-3, while it can do anything (it is Turing complete), can't do much in any one instruction.  For arithmetic & logic, it can do add, not, and.  That's pretty much it!  But that's enough — let's note that modern hardware can be built from only one logic gate, namely NAND, which is a binary operator (so NAND is directly available; NOT by giving NAND the same operand for both inputs; AND by doing NOT after NAND; OR by applying NOT to both inputs first and then NAND; etc.).
For example, LC-3 cannot multiply, divide, modulus, or right shift directly — each of those operations takes many instructions and, in the general case, some looping construct.  Multiplication can be done by repetitive addition, and division/modulus by repetitive subtraction.  These are super inefficient for larger operands, and the much more efficient algorithms are also substantially more complex, so they greatly increase program complexity beyond the repetitive-operation approach.
That subroutine goes backwards through the user input string.  It takes a string length count in R1 as a parameter supplied by the caller (not shown).  It looks at the last character in the input and converts it from an ASCII character to a binary number.
(We would commonly do that conversion from ASCII character to numeric value using subtraction: moving the character values from the ASCII range 0x30..0x39 to numeric values in the range 0..9, but they do it with masking, which also works.  The subtraction approach integrates better with error detection (checking for an invalid digit character, which is not done here), whereas the masking approach is simpler for LC-3.)
The subroutine then obtains the 2nd last digit (moving backwards through the user's input string), converting it to binary using the mask approach.  That yields a number between 0 and 9, which is used as an index into the first table, Lookup10.  The value obtained from the table at that index position is basically the index × 10, so this table is a × 10 table.  The same approach is used for the third digit (the first one, or last going backwards), except it uses the 2nd table, which is a × 100 table.
The standard approach for string to binary is called atoi (search for it), standing for ASCII to integer.  It moves forwards through the string, and for every new digit it multiplies the existing value, computed so far, by 10 before adding in the next digit's numeric value.
So, if the string is 456, first it obtains 4; then, because there is another digit, 4 × 10 = 40, then + 5 for 45, then × 10 for 450, then + 6 for 456.
The advantage of this approach is that it can handle any number of digits (up to overflow).  The disadvantage, of course, is that it requires multiplication, which is a complication for LC-3.
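Here is a minimal sketch of that forward atoi loop, in Python for clarity (on LC-3, each step would of course be several instructions, with the × 10 done by one of the tricks discussed below):

def atoi(s):
    # forward pass: multiply the value so far by 10, then add the next digit
    value = 0
    for ch in s:
        value = value * 10 + (ord(ch) - 0x30)   # '0'..'9' -> 0..9 by subtraction
    return value

print(atoi("456"))   # 456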
Multiplication where one operand is the constant 10 is fairly easy even in LC-3's limited capabilities, and can be done with simple addition without looping.  Basically:
n × 10 = n + n + n + n + n + n + n + n + n + n
and LC-3 can do those 9 additions in just 9 instructions.  Still, we can also observe that:
n × 10 = n × 8 + n × 2
and also that:
n × 10 = (n × 4 + n) × 2     (which is n × 5 × 2)
which can be done in just 4 instructions on LC-3 (and none of these needs looping)!
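In higher-level code the 4-instruction version looks like this (a Python sketch; times10 is a made-up name, and each addition maps to one LC-3 ADD, since adding a register to itself doubles it):

def times10(n):
    n2 = n + n        # n * 2
    n4 = n2 + n2      # n * 4
    n5 = n4 + n       # n * 5
    return n5 + n5    # n * 10

print(times10(45))   # 450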
So, if you want to do this approach, you'll have to figure out how to go forwards through the string instead of backwards as the given table version does, and, how to multiply by 10 (use any one of the above suggestions).
There are other approaches as well if you study atoi.  You could keep the backwards approach, but then you will have to multiply by 10, by 100, by 1000, a different factor for each successive digit.  That might be done by repetitive addition.  Or keep a count of how many times to multiply by 10 — e.g. n × 1000 = n × 10 × 10 × 10.

Need some assistance understanding binary addition/subtraction using 2's complement

If A = 01110011, B = 10010100, how would I add these?
I did this:
i.e: 01110011 + 10010100 = 100000111
Though isn't it essentially 115 + (-108) = 7, whereas I'm getting -249?
Edit: I see that by removing the highest-order bit (overflow) I get 7, which is what I'm looking for, but I'm not getting why you wouldn't have the extra bit.
Edit**: OK, I figured it out. There was no overflow as I had assumed, because 7 is within [-128, 127] (8 bits). Instead, as Omar hinted, I was supposed to drop the "extra" 1 from the addition.
Your calculation is correct and the result is correct.
You stated that the second number is -108, so both your numbers are interpreted as signed 8-bit values. Thus, you should also interpret your result as an 8-bit signed value, this is why the 9th bit must be dropped, and so the result is 7 (00000111).
On real hardware, like an 8-bit CPU for example, all the registers are 8 bits wide, so you are only able to store the lowest 8 bits of the result, which here is 7 (00000111).
In some cases, the 9th bit may also be put inside a carry/overflow flag so it's not completely "dropped".
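Here is that behavior sketched in Python, where masking with 0xFF stands in for the 8-bit register (add8 is just an illustrative name):

def add8(a, b):
    result = (a + b) & 0xFF            # keep only the low 8 bits, like the hardware
    if result & 0x80:                  # reinterpret as signed two's complement
        result -= 256
    return result

print(add8(0b01110011, 0b10010100))    # 7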

Why is MS Access returning some results in scientific notation?

I have two fields, both with the size set to Double in the table properties. When I subtract one field from the other, some of the results are displayed in scientific notation when I click in the cell, and others just show regular standard format to two decimal places.
The data in both fields was updated with Round([Field01],2) and Round([Field2],2), so the numbers in the fields should not have more than 2 decimal places.
Here's an example:
Field1 = 7.01
Field2 = 7.00
But when I subtract Field1 from Field2, the Access display shows 0.01, but when I click on the result it displays -9.99999999999979E-03. So of course, when I try to filter on all results that have 0.01, the query comes back empty because it thinks the result is -9.99999999999979E-03.
Even stranger is if Field1 = 1.02 and Field2 = 1.00, the result is 0.02 and when I click on the result the display still shows 0.02 and I can filter on all results that equal 0.02.
Why would MS Access treat numbers in the same query differently? Why is it displaying in Scientific Notation and not filtering?
Thanks for any support.
Take this simple code in Access (or even Excel) and run it!
Public Sub TestAdd()
    Dim MyNumber As Single
    Dim I As Integer
    For I = 1 To 10
        MyNumber = MyNumber + 1.01
        Debug.Print MyNumber
    Next I
End Sub
Here is the output of the above:
1.01
2.02
3.03
4.04
5.05
6.06
7.070001
8.080001
9.090001
10.1
You can see that after JUST 7 simple little additions Access is now spitting out wrong numbers and has rounding errors!
More amazing? The above code runs the SAME in Excel!
Ok, I am sure I have your attention now!
If I recall, the FIRST day and first class in computing science? Computers don't store exact numbers when using floating point numbers.
So, then how is it possible that the WHOLE business community using Excel, or Access, or in fact your desktop calculator does not come crashing down?
You mean Access cannot add up 7 simple little numbers without having errors?
How can I even do payroll then?
The basic concept and ALL you need to know here is that computers store real (floating) numbers only as approximate.
And integer values are stored exact.
So, there are several approaches here. In fact, if you're writing ANY business software that needs to work with money values and not suffer rounding errors?
Then you are better off choosing some kind of "scaled" integer. Behind the scenes, the computer does NOT use floating-point numbers, but uses an integer value, and also keeps a "decimal" position.
In fact, in a lot of older business BASIC languages and others, we often had to do the scaling on our own (so we would choose a large integer format). In fact, this "scaling" feature still exists in Access!!! (and you see it in the format options).
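To make the contrast concrete, here is a small sketch (in Python, just for illustration) of float accumulation versus the scaled-integer idea:

# floating point: ten additions of 1.01 drift away from the exact answer
total = 0.0
for _ in range(10):
    total += 1.01
print(total)            # not exactly 10.1 (e.g. 10.099999999999998)

# scaled integer: store cents as whole numbers, scale only for display
cents = 0
for _ in range(10):
    cents += 101        # 1.01 represented as 101 cents
print(cents / 100)      # 10.1 (the integer arithmetic never drifts)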
So, two choices here. If you don't want "tiny" rounding errors, then use the "currency" data type. This may or may not be sufficient for you, since it only allows a max of 4 decimal places. But in most cases, it should suffice. And if you need "more" decimal places, then you can multiply the values by 1000, and then divide by 1000 when done with the calculations.
So, try changing the column type to currency, and that should work. (This type of data is how your desktop calculator also works - and thus you do not see funny rounding errors as a result, in most cases.)
but, the FIRST rule of the day? First computer course?
Computers do not store exact numbers for floating point numbers - they are approximations, and are subject to rounding errors. Now, if you really are using double for the table, then I don't think these rounding errors should show up - since you have "so many decimal places" available.
But I would try using the currency data type - it is a scaled integer, a so-called packed decimal.
You can ALSO choose to use a packed decimal in Access; it supports up to 28 digits, and you can set the "scale" (the decimal point location). However, since you can't declare a Decimal type in VBA, I would suggest using the currency data type both in the table and in VBA code.
If you need more than 4 decimal places, then consider scaling the currency values in your code, or perhaps at that point consider using a packed decimal type in the table; values in VBA will then have to use the Variant type, and they will correctly take on the data column's setting when assigned a value from the table(s) in question.
Needless to say, the first day you start dealing with computers, and that first day you are ANYTHING beyond an "end user"? Well, this is your first lesson of the day!
"The data in both fields was updated with Round([Field01],2) and Round([Filed2],2) so the numbers in the fields should not be any longer than 2 decimal places." instead of rounding up(which i think is the reason for the scientific notation) you can use number field as data type , then under field size choose double, then under decimal places choose 2.

Output_precision(); Octave

Hi, after calling this code (Octave) I get an answer with 7 digits of precision, but I need only 6. It is worth mentioning that on a different data set the output is normal (6 digits):
output_precision(6);
Prev
output:
Prev =
0.1855318
0.2181108
0.1796457
I know this is a little late but I wanted to add an answer for anyone with the same question in the future.
According to the function reference for output_precision(), the argument passed to the function (in this case, 6) specifies the minimum number of significant figures, which only guarantees that future numeric output won't have less than that number of significant figures.
From what I've seen, if you use output_precision(new_val) before displaying an array (e.g., Prev in the question), then Octave will round the element with the fewest digits before the decimal point to have new_val significant figures, and then all other elements will be rounded to have the same number of digits after the decimal point as that initial rounded result. If you output a single value instead of an array, then the output is just rounded to new_val significant figures. However, I don't know if this behavior is guaranteed.
Here's a short example of what I mean:
% array defined with values having 5 digits after decimal
F = [401.51670 313.70753 -88.55225 188.50067 280.21988 354.51821 54.51821 350];
output_precision(4)
F
output_precision(6)
F
Output:
F =
401.52 313.71 -88.55 188.50 280.22 354.52 54.52 350.00
F =
401.5167 313.7075 -88.5523 188.5007 280.2199 354.5182 54.5182 350.0000
It can be a little quirky if you try to round the values too much. When I used output_precision(3) and then output F, the numbers were actually rounded as if my system's default precision, 5, was still active. However, when I used elements with only 2 or 3 digits after the decimal to define another array, it displayed as expected with output_precision(3).
Check out Octave Forge if you ever need docs for Octave features. It's not perfect, but it's something. Hope this was helpful.

Compressing a binary matrix

We were asked to find a way to compress a square binary matrix as much as possible, and if possible, to add redundancy bits to check and maybe correct errors.
The redundancy part is easy to implement in my opinion. The complicated part is compressing the matrix. I thought about using run-length encoding after reshaping the matrix to a vector, because there will be more zeros than ones, but I only achieved a 40-bit compression (we are working on small sizes), although I thought it'd be better.
Also, after run-length an idea was Huffman coding the matrix, but a dictionary must be sent in order to recover the original information.
I'd like to know what would be the best way to compress a binary matrix?
After reading some comments: yes @Adam, you're right, the 14x14 matrix should be compressed into 128 bits, so if I only used the coordinates (rows & cols) of each non-zero element, it would still be 160 bits (since there are twenty ones). I'm not looking for an exact solution but for a useful idea.
You can only talk about compressing something if you have a distribution and a representation. That's the issue of the dictionary you have to send along: you always need some sort of dictionary or protocol to decompress something. It just so happens that things like .zip and .mpeg already have those dictionaries/codecs. Even something as simple as Huffman encoding is an algorithm; on the other side of the communication channel (you can think of compression as communication), the other person already has a bit of code (the dictionary) to perform the Huffman decompression scheme.
Thus you cannot even begin to talk about compressing something without first thinking "what kinds of matrices do I expect to see?", "is the data truly random, or is there order?", and if so "how can I represent the matrices to take advantage of order in the data?".
You cannot compress some matrices without increasing the size of other objects (by at least 1 bit). This is bad news if all matrices are equally probable, and you care equally about them all.
Addenda:
The answer to use sparse matrix machinery is not necessarily the right one. The matrix could for example be represented in Python as [[(r+c)%2 for c in range(cols)] for r in range(rows)] (a checkerboard pattern), and a sparse matrix wouldn't compress it at all, but the Kolmogorov complexity of the matrix is the above program's length.
Well, I know every matrix will have the same number of ones, so this is kind of deterministic. The only thing I don't know is where the 1s will be. Also, if I transmit the matrix with a dictionary and there are burst errors, maybe the dictionary gets affected, so... wouldn't the resulting information be corrupted? That's why I was trying to use lossless data compression such as run-length; the decoder just doesn't need a dictionary. --original poster
How many 1s does the matrix have as a fraction of its size, and what is its size (NxN -- what is N)?
Furthermore, this is an incorrect assertion and should not be used as a reason to desire run-length encoding (which still requires a program); when you transmit data over a channel, you can always add error-correction to this data. "Data" is just a blob of bits. You can transmit both the data and any required dictionaries over the channel. The error-correcting machinery does not care at all what the bits you transmit are for.
Addendum 2:
There are (14*14) choose 20 possible arrangements, which I assume are chosen uniformly at random. If this number were larger than 2^128, what you're trying to do would be impossible. Fortunately log_2((14*14) choose 20) ~= 90 bits < 128 bits, so it's possible.
The simple solution of writing down 20 numbers like 32,2,67,175,52,...,168 won't work because log_2(14*14) * 20 ~= 153 bits > 128 bits. This would be comparable to run-length encoding. We want to do something like this, but we are on a very strict budget and cannot afford to be "wasteful" with bits.
Because you care about each possibility equally, your "dictionary"/"program" will simulate a giant lookup table. Matlab's sparse matrix implementation may work but is not guaranteed to work and is thus not a correct solution.
If you can create a bijection between the number range [0,2^128) and subsets of size 20, you're good to go. This corresponds to enumerating ways to descend the pyramid in http://en.wikipedia.org/wiki/Binomial_coefficient to the 20th element of row 196. This is the same as enumerating all "k-combinations". See http://en.wikipedia.org/wiki/Combination#Enumerating_k-combinations
Fortunately I know that Mathematica and Sage and other CAS software can apparently generate the "5th" or "12th" or arbitrarily numbered k-subset. Looking through their documentation, we come upon a function called "rank", e.g. http://www.sagemath.org/doc/reference/sage/combinat/subset.html
So then we do some more searching, and come across some arcane Fortran code like http://people.sc.fsu.edu/~jburkardt/m_src/subset/ksub_rank.m and http://people.sc.fsu.edu/~jburkardt/m_src/subset/ksub_unrank.m
We could reverse-engineer it, but it's kind of dense. But now we have enough information to search for k-subset rank unrank, which leads us to http://www.site.uottawa.ca/~lucia/courses/5165-09/GenCombObj.pdf -- see the section "Generating k-subsets (of an n-set): Lexicographical Ordering" and the rank and unrank algorithms on the next few pages.
In order to achieve the exact theoretically optimal compression, in the case of a uniformly random distribution of 1s, we must thus use this technique to biject our matrices to output numbers in the range [0, 2^128). It just so happens that combinations have a natural ordering, known as ranking and unranking of combinations. You assign a number to each combination (ranking), and if you know the number, you automatically know the combination (unranking). Googling k-subset rank unrank will probably yield other algorithms.
Thus your solution would look like this:
serialize the matrix into a list
e.g. [[0,0,1],[0,1,1],[1,0,0]] -> [0,0,1,0,1,1,1,0,0]
take the indices of the 1s:
e.g. [0,0,1,0,1,1,1,0,0] -> [3,5,6,7] (1-based positions 1..9: a k=4-subset of an n=9 set)
take the rank
e.g. compressed = rank([3,5,6,7], n=9)
compressed==412 (or something, I made that up)
you're done!
e.g. 412 -binary-> 110011100 (at most n=9 bits, less than 2^n = 2^9 = 512)
to uncompress, unrank it
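Here is a sketch of lexicographic rank/unrank in Python (using 0-based positions, unlike the 1-based example above; math.comb supplies the binomial coefficients, and the algorithms follow the lexicographic scheme from the notes linked above):

from math import comb

def rank(combo, n):
    """Lexicographic rank of a sorted k-subset of {0, ..., n-1}."""
    k, r, prev = len(combo), 0, -1
    for i, c in enumerate(combo):
        # count the combinations that place a smaller element at position i
        for j in range(prev + 1, c):
            r += comb(n - 1 - j, k - 1 - i)
        prev = c
    return r

def unrank(r, n, k):
    """Inverse of rank: recover the k-subset from its number."""
    combo, c = [], 0
    for i in range(k):
        while comb(n - 1 - c, k - 1 - i) <= r:
            r -= comb(n - 1 - c, k - 1 - i)
            c += 1
        combo.append(c)
        c += 1
    return combo

# 20 ones in a 14x14 matrix: the rank always fits in 128 bits,
# because comb(196, 20) < 2**128 (its log2 is about 90).
positions = list(range(20))          # hypothetical example subset
r = rank(positions, 196)
assert unrank(r, 196, 20) == positions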
I'll get to 128 bits in a sec, first here's how you fit a 14x14 boolean matrix with exactly 20 nonzeros into 136 bits. It's based on the CSC sparse matrix format.
You have an array c with 14 4-bit counters that tell you how many nonzeros are in each column.
You have another array r with 20 4-bit row indices.
56 bits (c) + 80 bits (r) = 136 bits.
Let's squeeze 8 bits out of c:
Instead of 4-bit counters, use 2-bit ones. c is now 2*14 = 28 bits, but can't represent more than 3 nonzeros per column. This leaves us with 128-80-28 = 20 bits. Use that space for an array a4c of 5 4-bit elements, each of which "adds 4 to the element of c" named by its value. So, if a4c = {2,2,10,15,15}, that means c[2] += 4; c[2] += 4 (again); c[10] += 4 (15 is not a valid column index, so presumably it marks an unused slot).
The "most wasteful" distribution of nonzeros is one where the column count will require an add-4 to support 1 extra nonzero: so 5 columns with 4 nonzeros each. Luckily we have exactly 5 add-4s available.
Total space = 28 bits (c) + 20 bits (a4c) + 80 bits (r) = 128 bits.
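A sketch of that packing in Python (the exact field order is my assumption for illustration; 15 marks an unused a4c slot since no column 15 exists):

def encode128(matrix):
    """Pack a 14x14 0/1 matrix with 20 ones into one 128-bit integer."""
    counts = [sum(matrix[r][c] for r in range(14)) for c in range(14)]
    c2, a4c = [], []
    for col, n in enumerate(counts):
        c2.append(n & 3)               # low 2 bits of each column count
        a4c += [col] * (n >> 2)        # one "add 4" entry per overflow of 4
    assert len(a4c) <= 5, "at most 5 add-4 slots available"
    a4c += [15] * (5 - len(a4c))       # 15 = no such column, marks unused slots
    rows = [r for c in range(14) for r in range(14) if matrix[r][c]]
    assert len(rows) == 20
    bits = 0
    for v in c2:                       # 14 x 2 bits = 28
        bits = (bits << 2) | v
    for v in a4c:                      # 5 x 4 bits = 20
        bits = (bits << 4) | v
    for v in rows:                     # 20 x 4 bits = 80
        bits = (bits << 4) | v
    return bits                        # 28 + 20 + 80 = 128 bits total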
Your input is a perfect candidate for a sparse matrix. You said you're using Matlab, so you already have a good sparse matrix built for you.
spm = sparse(dense_matrix)
Matlab's sparse matrix implementation uses Compressed Sparse Columns, which has memory usage on the order of 2*(# of nonzeros) + (# of columns), which should be pretty good in your case of 20 nonzeros and 14 columns. Storing 20 values sure is better than storing 196...
Also remember that all matrices in Matlab are going to be composed of doubles. Just because your matrix can be stored as a 1-bit boolean doesn't mean Matlab won't stick it into a 64-bit floating point value... If you do need it as a boolean you're going to have to make your own type in C and use .mex files to interface with Matlab.
After thinking about this again, if all your matrices are going to be this small and they're all binary, then just store them as a binary vector (bitmask). Going off your 14x14 example, that requires 196 bits or 25 bytes (plus n, m if your dimensions are not constant). That same vector in Matlab would use 64 bits per element, or 1568 bytes. So storing the matrix as a bitmask takes as much space as 4 elements of the original matrix in Matlab, for a compression ratio of 62x.
Unfortunately I don't know if Matlab supports bitmasks natively or if you have to resort to .mex files. If you do get into C++ you can use STL's vector<bool> which implements a bitmask for you.
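For completeness, here is what the plain bitmask approach looks like as a Python sketch (pack and unpack are made-up names; vector<bool> or a .mex bitmask would play this role in practice):

def pack(matrix):
    """Row-major bitmask: a 14x14 binary matrix fits in 25 bytes."""
    bits = [b for row in matrix for b in row]
    out = bytearray((len(bits) + 7) // 8)
    for i, b in enumerate(bits):
        if b:
            out[i >> 3] |= 1 << (i & 7)
    return bytes(out)

def unpack(data, rows, cols):
    """Recover the matrix from the bitmask, given its dimensions."""
    return [[(data[(r * cols + c) >> 3] >> ((r * cols + c) & 7)) & 1
             for c in range(cols)] for r in range(rows)]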