How does a SHX-file work? - gis

If you have geographic data stored in ESRI shapes, you have at least three files: one ending with .shp containing the vector-data, one ending with .dbf containing attributes and a file ending with .shx containing an index.
I'm interested in the shx-file. How does it work? Does it contain a complete mapping, like 'first geometry maps to third row in the dbf and second geometry maps to the first row' for every geometry? Or does it work differently?

According to the spec the shx contains a 100 byte header followed by a sequence of 8 byte records. Each record stores a 4 byte offset and a 4 byte content length for a record in the main .shp data file.
+-----------------+------------------+
| header (100 bytes)                 |
+-----------------+------------------+
| offset(4 bytes) | length (4 bytes) |
+-----------------+------------------+
| offset(4 bytes) | length (4 bytes) |
+-----------------+------------------+
| offset(4 bytes) | length (4 bytes) |
+-----------------+------------------+
| offset(4 bytes) | length (4 bytes) |
+-----------------+------------------+
| ....                               |
+-----------------+------------------+
Note that the offset is specified in 16 bit words, so the offset for the first record is 50 (as the .shp header is 100 bytes, or 50 words, long). The content length is also specified in 16 bit words.
So, you can figure out the number of records from (index_file_length-100)/8, and use the index to access a particular shape record in the .shp file at random or in sequence.
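The layout described above can be read with a few lines of Python. This is only a sketch: the function name read_shx_index is made up for illustration, and the big-endian byte order of the two integers comes from the ESRI shapefile specification rather than from the answer itself.

```python
import struct

SHX_HEADER_BYTES = 100   # fixed-size header
SHX_RECORD_BYTES = 8     # 4-byte offset + 4-byte content length


def read_shx_index(data: bytes) -> list[tuple[int, int]]:
    """Return (byte_offset, content_bytes) for every record in the .shp file.

    `data` is the whole content of the .shx file. Both values are stored
    as big-endian 32-bit integers counted in 16-bit words, so they are
    doubled here to get bytes.
    """
    record_count = (len(data) - SHX_HEADER_BYTES) // SHX_RECORD_BYTES
    index = []
    for i in range(record_count):
        start = SHX_HEADER_BYTES + i * SHX_RECORD_BYTES
        offset_words, length_words = struct.unpack_from(">ii", data, start)
        index.append((offset_words * 2, length_words * 2))
    return index
```

To jump to the n-th shape you would seek to the n-th byte offset in the .shp file; reading the entries in order gives sequential access.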

Fine answer by Paul Dixon.
Though I was wondering what you are going to do with it! If you're going to write code to read or write SHP files I would strongly suggest using a library instead - there are some good free open source ones like GDAL, also some good commercial ones.

Related

MySQL character set for numbers compression

I would like to store many numbers in 1 cell and save space. The idea is to encode each one to a string of a constant length and store them in a text field (presumably MEDIUMTEXT). What characters can be used so that they are 1 byte only? I assume that special characters are stored in a way that uses more than 1 byte. I can use e.g. base64 but I am not sure how many encoding characters I can add to the base before MySQL uses actually more space to store them than I manage to save.
You say "numbers". What do you mean, really?
Digits? See above.
Integers? (no decimal point, no fraction)
Floats? (with exponent, etc)
Some notes on digits, compression, etc:
1 byte per Ascii character -- 8 bits
1 byte per digit, since it is an ascii character
One random digit, at maximum theoretical compression, takes about 3.32 bits (roughly 0.415 bytes). Visualize it this way: 1000 is 3 digits and 1024 is 10 bits.
MySQL's DECIMAL datatype puts 2 digits into one byte for smaller numbers; for larger numbers it stores 9 digits in 4 bytes.
If you zip up a million digits of pi, it will come very close to the above compression.
A simple Rule of Thumb is that "text" compresses 3:1.
Base64 expands bytes by 8/6 because each output character occupies an 8-bit byte but carries only 6 bits of data (one of 64 = 2^6 symbols).
Base64 is more useful for avoiding special characters; it is not really a compression technique.
A 4-byte MySQL INT (range of -2 billion to +2 billion, but usually just positive and not evenly distributed), when converted to base64 would take more than 5 bytes for 9-10 digits.
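A quick way to check these figures is to compute them. The sketch below uses only the Python standard library; the value 2,000,000,000 is just an example of a large 4-byte integer.

```python
import base64
import math

# Information content of one random decimal digit, in bits.
bits_per_digit = math.log2(10)

# A 4-byte integer as raw bytes versus its Base64 text form.
raw = (2_000_000_000).to_bytes(4, "big")
encoded = base64.b64encode(raw)            # includes '=' padding
unpadded = encoded.rstrip(b"=")

print(round(bits_per_digit, 2))            # 3.32
print(len(raw), len(unpadded), len(encoded))  # 4 6 8
```

So the Base64 text is longer than the raw 4 bytes, with or without padding.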
General techniques
Client compression: For 123,2345,88,22, here is one way to handle it. In fact, I recommend this for virtually any text handling where compression is desired in MySQL.
use compress() (or similar function) in your client.
use BLOB (up to 64KB) or MEDIUMBLOB (up to 16MB) in the table
use uncompress() after retrieving the blob.
For an array of numbers, use json_encode for the array, then feed into compress+blob, above. It will work for any sized "numbers" and provide nearly maximal compression.
You cannot efficiently reach into a MEDIUMTEXT or BLOB to get one number out of an array. The entire cell will be fetched.
That leads to another general statement... If you have a lot of stuff that you don't need to sort on, nor fetch individually, JSON is a good approach. From MySQL's point of view, think of it as an opaque blob. The application writes and rereads it as one big thing, then picks it apart.
The JSON will possibly encode the above example as ["123","2345","88","22"], which will be slightly fatter after compression. But, any good compression algorithm will notice and take advantage of the repetition.
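The encode-then-compress approach might look like this on the client side. It is a sketch, not the answer's exact code: Python's json and zlib modules stand in for the json_encode, compress() and uncompress() calls mentioned above, and the names pack_numbers and unpack_numbers are invented here.

```python
import json
import zlib


def pack_numbers(numbers):
    """Serialize a list of numbers to compact JSON, then compress it.

    The returned bytes are what would be stored in a BLOB or MEDIUMBLOB.
    """
    text = json.dumps(numbers, separators=(",", ":"))
    return zlib.compress(text.encode("ascii"))


def unpack_numbers(blob):
    """Reverse pack_numbers after fetching the blob from the table."""
    return json.loads(zlib.decompress(blob).decode("ascii"))


blob = pack_numbers([123, 2345, 88, 22])
print(unpack_numbers(blob))  # [123, 2345, 88, 22]
```

For very short lists the compressed form can be larger than the JSON text; the saving only shows up with longer arrays.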
Take advantage of the data
17,22738 48,77795 300
17,22792 48,77795 297
17,22853 48,77764 294
17,22874 48,77743 297
17,22887 48,77704 300
17,22968 48,77671 305
17,23069 48,77563 296
17,23092 48,77561 292
-->
17,22738 48,77795 300
54 0 -3
61 -31 -3
21 -21 3
13 -39 3
81 -33 5
101 -108 -9
23 -2 -4
The numbers stay relatively constant. Take advantage of it by starting with raw data, but then switching to deltas. Try it with about 10 times as much data; I suspect you will continue to get better than 2x compression before zipping, but maybe slightly less than 2x after zipping. (Zipping can take advantage of the repetition of 48,777; I am taking more advantage of it by tossing most of it.)
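The delta idea can be sketched as follows. The function names are made up, and the sketch assumes the coordinates have already been scaled to integers (17,22738 becomes 1722738) so that subtraction is exact.

```python
def delta_encode(rows):
    """Keep the first row as-is; replace each later row by its
    difference from the row before it."""
    if not rows:
        return []
    encoded = [tuple(rows[0])]
    for prev, cur in zip(rows, rows[1:]):
        encoded.append(tuple(c - p for p, c in zip(prev, cur)))
    return encoded


def delta_decode(encoded):
    """Rebuild the original rows by adding the deltas back up."""
    rows = []
    for i, row in enumerate(encoded):
        if i == 0:
            rows.append(tuple(row))
        else:
            rows.append(tuple(p + d for p, d in zip(rows[-1], row)))
    return rows


rows = [
    (1722738, 4877795, 300),
    (1722792, 4877795, 297),
    (1722853, 4877764, 294),
]
print(delta_encode(rows))
# [(1722738, 4877795, 300), (54, 0, -3), (61, -31, -3)]
```

The encoded rows can then be passed through JSON and compression like any other array.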

What is format of Google Drive's FileID ? I need to find out whether new File has been uploaded to my drive or not using API's

If FileID follows a particular format, such as new files having higher or lower values, I can keep track of the old values and find out whether I have uploaded a new file to Drive.
The format is a 264 bit number
bit 0    5 6             261 262 263
+--------+-----------------+-------+
| 110101 | 256 RANDOM BITS |  00   |
+--------+-----------------+-------+
url safe base64 encoded as
/1[a-zA-Z0-9_-]{42}[AEIMQUYcgkosw048]/
You can store just the random part as a Byte[32] but it is safer to treat it as a random string as this may change at any time
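If you do want to test a string against that pattern, something like the sketch below would work. The pattern is only this answer's observation and not a documented guarantee, the helper names are invented here, and the sample ID in the usage lines is made up.

```python
import re

# Pattern quoted in the answer above; an observation, not a documented format.
FILE_ID_PATTERN = re.compile(r"1[a-zA-Z0-9_-]{42}[AEIMQUYcgkosw048]")

# URL-safe Base64 alphabet, in index order 0..63.
URLSAFE_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"
)


def looks_like_file_id(candidate: str) -> bool:
    """True if the whole string matches the observed 44-character shape."""
    return FILE_ID_PATTERN.fullmatch(candidate) is not None


def leading_bits(candidate: str) -> str:
    """The six bits carried by the first Base64 character."""
    return format(URLSAFE_ALPHABET.index(candidate[0]), "06b")


sample = "1" + "a" * 42 + "A"      # made-up ID, for illustration only
print(looks_like_file_id(sample))  # True
print(leading_bits(sample))        # 110101
```

A match says nothing about whether a file is new, so this does not replace tracking changes through the API.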
File ID is just a random string - there is no format.
You are taking the wrong approach. Look at the changes resource https://developers.google.com/drive/v3/reference/changes

Representing a tic tac toe board in computer memory

I am trying to solve this problem: Design a method for representing the state of a tic-tac-toe board in computer memory. Can you fit your representation into three bytes?
This is from a textbook without solutions, thank you!
Any help is appreciated!
The state of a Tic-Tac-Toe board can be encoded using 3 bytes as follows.
To represent the state of each cell, 3 states are necessary, namely X, O and undefined. 3 states can be represented by 2 bits (2 bits can in fact represent 4 states, but only 3 are needed here - on the other hand, 1 bit is insufficient).
There are 9 cells in total, so in total
2 * 9 = 18
bits are necessary to represent the board. 18 bits can be encoded in 3 bytes (which in total have 24 bits, which means that 6 bits are not needed).
A Tic-Tac-Toe board consists of 9 fields. Each field can take 3 states: Empty, Circle, Cross. To represent each state you need 2 bits: 00, 01, 10.
With two bits for each field, you can easily represent the whole board in 3 bytes, using one byte for each row of the board (6 of its 8 bits are used).
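A minimal sketch of this packing, with one byte per row and two bits per field, is shown below. The function names and the choice to put the leftmost field in the higher bits are assumptions made for the example.

```python
EMPTY, CIRCLE, CROSS = 0b00, 0b01, 0b10


def pack_board(board):
    """Pack a 3x3 board (three rows of field codes) into 3 bytes."""
    packed = bytearray()
    for row in board:
        byte = 0
        for field in row:
            byte = (byte << 2) | field   # append this field's 2 bits
        packed.append(byte)              # only the low 6 bits are used
    return bytes(packed)


def unpack_board(packed):
    """Recover the 3x3 board from the 3 packed bytes."""
    return [
        [(byte >> shift) & 0b11 for shift in (4, 2, 0)]
        for byte in packed
    ]


board = [
    [CROSS, CIRCLE, EMPTY],
    [EMPTY, CROSS, EMPTY],
    [CIRCLE, EMPTY, CROSS],
]
packed = pack_board(board)
print(len(packed))  # 3
```

Unpacking the bytes again returns the same board, so no information is lost.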

Binary , hexadecimal and octal values in mysql

I am very interested in working with binary , hexadecimal and octal systems in mysql database. First of all please give me an advice why we need them during storing information , because of ton of information or why ?
Also which type of values must be stored in marked systems ?
In addition here are bit operator like "<<"
here is example => SELECT 50<<2 AS example;
this gives us result 200 , anyone can explain how it is calculating it ?
Thanks for answering :))
First of all please give me an advice why we need them during storing information
Computers store data in binary. Sometimes it's useful for us to think in terms of the actual bits that are stored, in which case our familiar decimal system can be a little awkward (as conversions are not straightforward); we could write the bits out in full, but that's often too cumbersome since even quite small numbers take up a lot of space to write (e.g. decimal 24521 is binary 101111111001001).
Instead, we tend to use bases which are some power of 2, since they're more compact than binary whilst having the property that each 'digit' represents an exact number of bits in the binary representation. For example, a hexadecimal (base-16) digit represents four bits (a "nibble") with the digits 0 through to F (decimal 15 / binary 1111); an octal (base-8) digit represents three bits with the digits 0 through to 7 (binary 111).
Our earlier example of decimal 24521 would be 5FC9 in hex or 57711 in octal: starting from the right you can see that each digit respectively represents 4 and 3 bits in the above binary representation. Therefore it is (relatively) easy for us humans to visualise the binary representation whilst looking at these compact representations in other bases.
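The conversions for this example are easy to verify. The sketch below uses Python rather than MySQL, purely for illustration.

```python
n = 24521

print(bin(n))  # 0b101111111001001
print(hex(n))  # 0x5fc9
print(oct(n))  # 0o57711

# Going the other way: parse each representation back to the same value.
from_binary = int("101111111001001", 2)
from_hex = int("5FC9", 16)
from_octal = int("57711", 8)
```

All three parsed values equal 24521, which shows that the bases are just different spellings of the same number.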
Also which type of values must be stored in marked systems?
I'm not sure what you mean by this. As indicated above, the same values can be represented in all of these systems. In MySQL, we can indicate a binary literal by prepending it with 0b and a hexadecimal literal by prepending it with 0x. MySQL does not support octal literals.
anyone can explain how it is calculating it ?
The << operator performs a bitwise left-shift. That is, it shifts the bits of the left-hand operand left by the number of places given by the right-hand operand.
For each position the bits of an integer are shifted left, the value represented by those bits increases two-fold. It's similar to the effect of shifting digits left in our decimal system, whereby values increase ten-fold (for example, 50 shifted one place to the left gives 500, which is a ten-fold increase; in binary 110 (decimal 6) shifted one place left gives 1100 (decimal 12), which is a two-fold increase).
In your case, shifting the bits of the number 50 (i.e. 110010) two places to the left yields 2 two-fold increases (i.e. a four-fold increase overall): 11001000 is decimal 200.
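The same shift can be reproduced outside MySQL. Here is a small Python sketch of the calculation described above.

```python
value = 50
shifted = value << 2   # move every bit two places to the left

print(format(value, "b"))    # 110010
print(format(shifted, "b"))  # 11001000
print(shifted)               # 200
```

Each place shifted doubles the value, so a shift by 2 is the same as multiplying by 2 * 2 = 4.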
Your first two questions are too vague to answer, but the third one is concrete enough that I'll respond.
The <<2 is shifting the bits to the left 2 places, as documented here. This is the equivalent of multiplying 50 by 2^2:
mysql> SELECT 50<<2 AS example;
+---------+
| example |
+---------+
|     200 |
+---------+
1 row in set (0.00 sec)
mysql>
mysql> SELECT 50 * POW(2,2) AS example;
+---------+
| example |
+---------+
|     200 |
+---------+
1 row in set (0.00 sec)

Binary to standard digit?

I'm going to make a computer in Minecraft. I understand how to build a computer where it can make binary operations but I want the outputs to be displayed as standard integer numbers. How you "convert" the binaries into standard digits? Is there any chart for that? And the digits will be shown like in old calculators; with 7 lines.
 --
|  |
 --
|  |
 --
In electronics, what you need is called a "binary to binary coded decimal" converter. "Binary coded decimal" stores each decimal digit as its own group of 4 bits, which a decoder can then turn into the signals for a 7 segment display. Here's a PDF describing how one of these chips works. Page 3 of the PDF shows the truth table needed to do the conversion as well as a picture of all of the NAND gates that implement it in hardware. You can use the truth table to build the set of boolean expressions needed in your program.
0 = 0
1 = 1
10 = 2
11 = 3
100 = 4
101 = 5
110 = 6
111 = 7
...
Do you see the pattern? Here's the formula:
number = 2^0 * (rightmost digit)
+ 2^1 * (rightmost-but-1 digit)
+ 2^2 * (rightmost-but-2 digit) + ...
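Written as code, the formula is a loop over the digits from right to left. This is a sketch with an invented function name; it takes the binary number as a string of 0s and 1s.

```python
def binary_to_int(bits: str) -> int:
    """Sum 2^position for every digit, counting positions from the right."""
    total = 0
    for position, digit in enumerate(reversed(bits)):
        total += (2 ** position) * int(digit)
    return total


print(binary_to_int("110"))  # 6
print(binary_to_int("111"))  # 7
```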
Maybe what you are looking for is called BCD or Binary Coded Decimal. There is a chart and a Karnaugh map for it that has been used for decades. A quick Google search for it gave me this technical page:
http://circuitscan.homestead.com/files/digelec/bcdto7seg.htm
How are you trying to build the computer?
Maybe that key word can at least help you find what you need. :)
Your problem has two parts:
Convert a binary number into digits, that is do a binary to BCD conversion.
Convert a digit into a set of segments to activate.
For the latter you can use a table that assigns the bitmap of active segments to each digit.
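Both parts can be sketched in software before building them in redstone. In the sketch below the function names are made up, the segment letters follow the common a-g naming (a = top, then clockwise, g = middle), and divmod stands in for whatever binary to BCD circuit you end up building.

```python
# Active segments for each decimal digit.
# a = top, b = top right, c = bottom right, d = bottom,
# e = bottom left, f = top left, g = middle.
SEGMENTS = {
    0: "abcdef", 1: "bc", 2: "abdeg", 3: "abcdg", 4: "bcfg",
    5: "acdfg", 6: "acdefg", 7: "abc", 8: "abcdefg", 9: "abcdfg",
}


def to_bcd_digits(value: int) -> list[int]:
    """Split a non-negative integer into decimal digits, most significant first."""
    digits = []
    while True:
        value, digit = divmod(value, 10)
        digits.append(digit)
        if value == 0:
            break
    return digits[::-1]


def segments_for(value: int) -> list[str]:
    """Look up the segments to light for every digit of the value."""
    return [SEGMENTS[d] for d in to_bcd_digits(value)]


print(to_bcd_digits(0b11001000))  # [2, 0, 0]
print(segments_for(17))           # ['bc', 'abc']
```

In hardware the lookup table becomes one small logic circuit per segment, driven by the 4 bits of each digit.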
I think that's two different questions.
There isn't a "binary string of 0/1" to integer conversion built in - you would normally just write your own to loop over the string and detect each power of 2.
You can also write your own 7-segment LED display - it's a little tricky because it's on multiple lines, but would be an interesting exercise.
Alternatively, most GUIs have an LCD font; Qt certainly does.