How is data stored in binary form?

I want to understand how data is stored in a computer system in binary form. There is a huge amount of data, so how many combinations does a computer system use to store all of it?

Binary is, as the name suggests, just two states. They are expressed in a number of ways depending on where you look, but all represent the same meaning:
1 or 0
1 or -1
+ or -
up or down
on or off
A computer does nothing more than put all of them together very fast to make things, similar to Morse code. In Morse code a series of on and off signals makes up letters, letters make up words, then sentences, paragraphs, chapters, books, libraries.
How is the data stored? It's nothing more than binary: a piece of metal that is magnetized (or a switch that is flipped) one way or the other to represent one of the two binary choices.
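As a tiny illustration of the analogy (Python, purely for demonstration), every character already has an agreed-upon on/off pattern:

    for ch in "Hi":
        print(ch, format(ord(ch), "08b"))
    # H 01001000
    # i 01101001

String enough of those patterns together and you get words, files, and eventually whole libraries.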
This was not the right question for this site and you should have searched and read up on this yourself.

Related

Cracking a binary file format if I have the contents of one of these files

I have about 300 measurements (each stored in a dat file) that I would like to read using MATLAB or Python. The files can be exported to text or csv using a proprietary program, but this has to be done one by one.
The question is: what would be the best approach to crack the format of the binary file using the known content from the exported file?
Not sure if this makes any difference in making the cracking easier, but the files are just two columns of (900k) numbers, and from the dat files' size (1,800,668 bytes) it appears as if each number is 16 bits (float) and there is some other information (possibly a header).
I tried using HEX-Editor, but wasn't able to pick up any trends from there.
Lastly, I want to make sure to specify that these are measurements I made and the data in them belongs to me. I am not trying to obtain data that I am not supposed to.
Thanks for any help.
EDIT: Reading up a little more, I realized that there may be some kind of compression going on. When you look at the data in StreamWare, it shows 7 decimal places, leading me to believe that it is a single-precision value (4 bytes). However, the size of the files suggests that each value only takes 2 bytes.
After thinking about it a little more, I finally figured it out. This is very specific, but just in case another Dantec StreamWare user runs into the same problem, it could save him/her a little time.
First, the data is actually only a single vector. The time column is calculated from the length of the recorded signal and the sampling frequency. That information is probably in the header (but I wasn't able to crack that portion).
To obtain the values in MATLAB, I skipped the header bytes using fseek(fid, 668, 'bof'), then I read the data as uint16 using fread(fid, 900000, 'uint16'). This gives you integers.
To get the float value, all you have to do is divide by 2^16 (it's a 16-bit resolution system) and multiply by ten. I assume the factor of ten depends on the range of your data acquisition system.
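For anyone doing this from Python instead of MATLAB, a rough equivalent of the same steps (just a sketch; it assumes the same 668-byte header, 900,000 samples, a little-endian byte order, and the factor of ten for the range):

    import numpy as np

    def read_streamware_dat(path, header_bytes=668, n_samples=900000, full_scale=10.0):
        # Skip the header, then read the samples as unsigned 16-bit integers
        raw = np.fromfile(path, dtype="<u2", count=n_samples, offset=header_bytes)
        # Scale to physical units: value / 2^16 * range (16-bit resolution system)
        return raw.astype(np.float64) / 2**16 * full_scale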
I hope this helps.

How do Keygens function?

A keygen implements the keying algorithm used to register software and games. Let's say there is no server-side verification to get in the way...
I would like to know how these can be replicated in C code so I can get a better understanding of creating my own keygens.
I'm currently a student learning Reverse Engineering and I am trying to reverse crackmes in order to build a keygen for the software.
What are the steps to creating my own keygen that I can debug and crack?
Are all keygens basically a sum of ASCII numbers?
Are there different variants of keygens?
The key generators you're thinking of work very similarly to cryptographic hash functions. They often (if not checking with an online authority) verify the entered key against some file or checksum, and the software simply stops execution if the hash/key wasn't found or wasn't entered correctly.
You're getting your terminology mixed up when speaking about "ASCII numbers". Any data on a computer can be converted to numbers, and if you choose to convert blocks of 8 bits to numbers between 0 and 255, each block maps to a character code. Often you will actually find that keys are encoded in hex, i.e. the characters 0-9 and A-F, using 4 bits per character.
And yes, there are many variants.
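For a feel of how such a scheme might look (a toy example only, not how any particular product does it), a keygen and its matching check can be as simple as hashing the user name and formatting part of the digest as the key:

    import hashlib

    def make_key(name: str) -> str:
        # Hash the name and keep the first 16 hex characters (0-9, A-F)
        digest = hashlib.md5(name.encode("utf-8")).hexdigest()[:16].upper()
        # Format as the familiar XXXX-XXXX-XXXX-XXXX serial layout
        return "-".join(digest[i:i + 4] for i in range(0, 16, 4))

    def check_key(name: str, key: str) -> bool:
        # The program ships only with the check; the keygen author rebuilds make_key()
        return key == make_key(name)

Real schemes range from trivial checksums over the name to public-key signatures that cannot be regenerated without the vendor's private key.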

Best way to store U.S. phone numbers (NANP)

I have a requirement to store NANP (North American Numbering Plan) numbers. This means I don't care about, and don't need to bother with, international numbers.
The numbering plan goes like this:
NPA-NXX-XXXX
I would filter and strip extra spaces or dashes (-) to normalize each number into the correct 10-digit format. Currently we use MySQL and CouchDB for some other things, but I would prefer to keep this in MySQL as the preferred storage system.
I'm looking for fast read operations to match numbers at runtime; writes can be a little slow, as most inserts/updates will happen during off hours.
Since it is given that NPA and NXX will never start with 0, they could be separated out and stored as integer types if we ever want to break the number down.
For the NoSQL case, it is possible to generate a separate document for each area code and then further isolate NXX and XXXX.
For the RDBMS case, the full number can be stored as an indexed integer for fast access.
What would be the best database design to store these numbers?
Thanks in advance.
I'm looking for fast read operations to match numbers at runtime
With CouchDB you can store every number as the ID of a doc, e.g.
{
  "_id": "NPA-NXX-XXXX",
  "_rev": "1-..."
}
To match any number you send a lightweight
HEAD path/to/CouchDB/dbname/NPA-NXX-XXXX
and it will respond with status code 200 (match) or 404 (no match).
Write operations can be done in large bulk batches (/dbname/_bulk_docs).
Because the numbers are stored as IDs, CouchDB's primary index can be used for the HEAD requests described above, which means every write is immediately available for reads.
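A minimal sketch of that lookup from Python using the requests library (the server URL and database name are placeholders):

    import requests

    COUCH = "http://localhost:5984"   # placeholder CouchDB endpoint
    DB = "phone_numbers"              # placeholder database name

    def number_exists(number: str) -> bool:
        # Normalize a bare 10-digit NANP number to the NPA-NXX-XXXX document ID
        digits = "".join(ch for ch in number if ch.isdigit())
        doc_id = f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"
        resp = requests.head(f"{COUCH}/{DB}/{doc_id}")
        return resp.status_code == 200   # 200 = match, 404 = no match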

store text of character length ~300,000 in mysql database

I have a column of data I would like to add to a mysql database table. The column is raw text and the longest piece of text contains approximately 300,000 characters. Is it possible to store this in the table? How?
I have been reading that even LONGTEXT columns are limited somewhat.
Presumably you have ruled out the alternative of storing these items of text in files, and storing their pathnames in your table. If you have not considered that choice, please do. It's often the most practical way to handle this sort of application. That's especially true if you're using a web server to deliver your information to your users: by putting those objects in your file system you avoid a very serious production bottleneck (fetching the objects from the DBMS and then sending them to the user).
MySQL's LOBs (large objects) will take 300k characters without problems. MEDIUMTEXT handles 16 megabytes. But the programming work necessary to load those objects into the DBMS and get them out again can be a bit challenging. You haven't mentioned your application stack, so it's hard to give you specific advice about that. Where to start? Read about the MySQL server parameter max_allowed_packet.
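For example, a quick way to see the current value from Python (connection details are placeholders):

    import mysql.connector  # assumes the mysql-connector-python package

    conn = mysql.connector.connect(host="localhost", user="me", password="secret", database="mydb")
    cur = conn.cursor()
    cur.execute("SHOW VARIABLES LIKE 'max_allowed_packet'")
    print(cur.fetchone())   # e.g. ('max_allowed_packet', '67108864')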
If this were my project, and for some reason using the file system was out of the question, I would store the large textual articles as segments in shorter rows. For example, instead of
textid   textval
(int)    (MEDIUMTEXT)
number   lots and lots and lots of text.
I'd make a table like this:
textid   segmentid   textval
(int)    (int)       (VARCHAR(250))
number   1           Lots and
number   2           lots and
number   3           lots of
number   4           text.
The segment lengths should probably be around 250 characters each. I think you'd be smart to break the segments on word boundaries if you can; it will make things like FULLTEXT search easier. This will end up with many shorter rows for your big text items, but it will make your programming, your backups, and everything else about your system easier to handle all around.
There is an upfront cost, but it's probably worth it.
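A minimal sketch of the loading side of that design (Python with mysql-connector; the table name text_segments and its columns are invented to match the illustration above):

    import textwrap
    import mysql.connector  # assumes the mysql-connector-python package

    def store_text(conn, textid, raw_text, width=250):
        # Break on word boundaries so segments stay friendly to FULLTEXT search
        segments = textwrap.wrap(raw_text, width=width)
        cur = conn.cursor()
        cur.executemany(
            "INSERT INTO text_segments (textid, segmentid, textval) VALUES (%s, %s, %s)",
            [(textid, i + 1, seg) for i, seg in enumerate(segments)],
        )
        conn.commit()

Reading an article back is then a SELECT ordered by segmentid with the pieces joined by spaces (the whitespace won't be byte-for-byte identical, which may or may not matter for raw text).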

url text compression (not shortening) and storing in mysql

I have a url table in MySQL which has only two fields, an id and a varchar(255) for the url. There are currently more than 50 million urls in it, and my boss has just given me a hint about the expansion of our current project, which will result in more urls being added to that table; the expected number is well around 150 million by the middle of next year.
Currently the database size is about 6GB, so I can safely say that if things are left the same way it will cross 20GB, which is not good. So I am thinking about some solution that can reduce the disk space needed for url storage.
I also want to make it clear that this table is not a busy table and there are not too many queries at the moment, so I am just looking to save disk space and, more importantly, to explore new ideas for short text compression and its storage in MySQL.
BUT in the future that table may also be accessed heavily, so it's better to optimize the table well before that time comes.
I worked quite a bit on converting the url into numeric form and storing it using BIGINT, but since that has a limit of 64 bits it didn't work out very well. The BIT data type has the same problem and imposes the 64-bit limit too.
My idea behind converting to numeric form is basically this: an 8-byte BIGINT stores 19 decimal digits, so if each digit pointed to a character in a set of all possible characters, it could store 19 characters in 8 bytes, provided the character set ranged over only 10 characters. In a real-world scenario, though, there are 52 English letters plus 10 digits and a few symbols, so it's well around a 100-character set, and in the worst case a BIGINT can still point to about 6 characters. This is not a final verdict; it still needs some work to know exactly what each digit points to (a 10+, 30+, or 80+ character set), but you get a pretty good idea of what I am thinking about.
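Roughly, the kind of packing I mean looks like this (a simplified sketch that packs directly in base N rather than mapping each character to decimal digits; the alphabet is only illustrative):

    import string

    ALPHABET = string.ascii_letters + string.digits + ":/.-_?&=%#~"   # 73 symbols
    BASE = len(ALPHABET)
    INDEX = {c: i for i, c in enumerate(ALPHABET)}

    def pack(url: str) -> int:
        # Treat the string as a number written in base BASE (bijective, so
        # leading characters are never lost)
        n = 0
        for c in url:
            n = n * BASE + INDEX[c] + 1
        return n

    def unpack(n: int) -> str:
        chars = []
        while n:
            n, r = divmod(n - 1, BASE)
            chars.append(ALPHABET[r])
        return "".join(reversed(chars))

Even packed this tightly, a 73-symbol alphabet fits only about 10 characters into 64 bits (73^10 is just under 2^63), so a single BIGINT still can't hold a whole url.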
One more important thing: since urls are of variable length, I am also trying to save disk space on small urls, so I don't want to use a fixed-length column type.
I have also looked into some text compression algorithms like smaz and Huffman coding, but I'm not really convinced, because they use some sort of dictionary of words and I am looking for a cleaner method.
And I don't want to use a binary data type, because it takes just as much space in bytes as varchar does.
Another idea to try might be to identify common strings and represent them with a bitmap. For instance, have two bits to represent the protocol (http, https, ftp or something else), another bit to indicate whether the domain starts with "www", and two bits to indicate whether the domain ends with ".com", ".org", ".edu" or something else. You'd have to do some analysis on your data to see whether these make sense, and whether there are any other common strings you can identify.
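A rough sketch of that bitmap idea, covering only the protocol bits and the "www" bit (the suffix bits and the exact layout would come out of your data analysis; everything here is illustrative):

    PROTOCOLS = ["http://", "https://", "ftp://", ""]   # 2 bits; "" = something else

    def encode_url(url: str) -> bytes:
        proto = next((i for i, p in enumerate(PROTOCOLS[:3]) if url.startswith(p)), 3)
        rest = url[len(PROTOCOLS[proto]):]
        www = rest.startswith("www.")
        if www:
            rest = rest[4:]
        header = (proto << 1) | int(www)        # bit layout: ppw
        return bytes([header]) + rest.encode("utf-8")

    def decode_url(blob: bytes) -> str:
        header, rest = blob[0], blob[1:].decode("utf-8")
        www = "www." if header & 1 else ""
        return PROTOCOLS[header >> 1] + www + rest

Stored in a VARBINARY column, that replaces a typical "http://www." prefix (11 bytes) with a single header byte.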
If you have a lot of URLs to the same site, you could also consider splitting your table into two different ones, one holding the domain and the other containing the domain-relative path (and query string & fragment id, if present). You'd have a link table that holds the id of the URL, the id of the domain and the id of the path, and you'd replace your original URL table with a view that joins the three tables. The domain table wouldn't have to be restricted to the domain; you could include as much of the URL as is common (e.g., 'http://stackoverflow.com/questions'). This wouldn't take too much code to implement, and it has the advantage of still being readable. Your numeric encoding could be more efficient once you get it figured out; you'll have to analyze your data to see which one makes more sense.
If you are looking for 128-bit integers then you can use BINARY(16), where 16 is the number of bytes. You can extend that up to 64 bytes (512 bits), so it doesn't take any more space than the BIT data type. You can think of the BINARY data type as an expansion of the BIT data type, but as a string variant.
Having said that, I would suggest dictionary algorithms to compress URLs and short strings, but blended with the techniques used by url shortening services: use three-character combinations of A-Z, a-z and 0-9 to replace large dictionary words, and you would have more combinations available (62 x 62 x 62) than there are words to replace.
Though I am not sure what level of compression you would achieve, it's not a bad idea to implement url compression this way.
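A sketch of what such a blended dictionary scheme might look like (the token list is invented; in practice it would come from a frequency analysis of your own urls):

    import string

    CODE_CHARS = string.ascii_uppercase + string.ascii_lowercase + string.digits  # 62 characters

    def code_for(index):
        # Turn a dictionary index into a fixed three-character base-62 code
        a, rem = divmod(index, 62 * 62)
        b, c = divmod(rem, 62)
        return CODE_CHARS[a] + CODE_CHARS[b] + CODE_CHARS[c]

    TOKENS = ["http://www.", "https://www.", ".com/", ".html", "index", "category"]
    ENCODE = {tok: "\x01" + code_for(i) for i, tok in enumerate(TOKENS)}  # \x01 marks a code; it never occurs in a url
    DECODE = {code: tok for tok, code in ENCODE.items()}

    def compress(url):
        out, i = [], 0
        toks = sorted(TOKENS, key=len, reverse=True)  # greedy, longest tokens first
        while i < len(url):
            for tok in toks:
                if url.startswith(tok, i):
                    out.append(ENCODE[tok])
                    i += len(tok)
                    break
            else:
                out.append(url[i])
                i += 1
        return "".join(out)

    def decompress(s):
        out, i = [], 0
        while i < len(s):
            if s[i] == "\x01":
                out.append(DECODE[s[i:i + 4]])
                i += 4
            else:
                out.append(s[i])
                i += 1
        return "".join(out)

With 62^3 = 238,328 possible codes the dictionary can be far larger than the handful of tokens shown here, and each replaced token costs a fixed four bytes.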