I know that the computer stores all information as 0s and 1s. For example, when I declare a variable in any programming language and assign the number 5 to it, how is the number 5 presented on the console? In other words, how does the computer draw the number 5 on the screen if the number is just stored in memory as 0s and 1s, say in 8 bytes of RAM?
Everything inside the computer works like that. You input a number or a string, it is stored as zeros and ones, and it is decoded back when it reaches your screen.
It's interesting to read about the ASCII code. For instance, if you want to find how many letters apart 'A' and 'F' are, you look up their codes (given that this encoding is used) and do the subtraction.
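For example, a quick check in Python (assuming an ASCII-compatible encoding, which is what ord() gives you for these letters):

    # 'A' is code 65 and 'F' is code 70 in ASCII, so they are 5 apart.
    distance = ord('F') - ord('A')
    print(distance)  # 5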
Usually more problem-oriented questions are expected here (rather than CS theory). Have a read on that too. I am happy to talk theory, but it's important to respect the Stack Overflow guidelines.
I have about 300 measurements (each stored in a dat file) that I would like to read using MATLAB or Python. The files can be exported to text or csv using a proprietary program, but this has to be done one by one.
The question is: what would be the best approach to crack the format of the binary file using the known content from the exported file?
Not sure if this makes the cracking any easier, but the files are just two columns of (900k) numbers, and from the dat files' size (1,800,668 bytes) it appears as if each number is 16 bits (a float?) and there is some other information (possibly a header).
I tried using HEX-Editor, but wasn't able to pick up any trends from there.
Lastly, I want to make sure to specify that these are measurements I made and the data in them belongs to me. I am not trying to obtain data that I am not supposed to.
Thanks for any help.
EDIT: Reading up a little more, I realized that there may be some kind of compression going on. When you look at the data in StreamWare, it gives 7 decimal places, leading me to believe that it is a single-precision value (4 bytes). However, the size of the files suggests that each value only takes 2 bytes.
After thinking about it a little more, I finally figured it out. This is very specific, but just in case another Dantec StreamWare user runs into the same problem, it could save him/her a little time.
First, the data is actually only a single vector. The time column is calculated from the length of the recorded signal and the sampling frequency. That information is probably in the header (but I wasn't able to crack that portion).
To obtain the values in MATLAB, I skipped the header bytes using fseek(fid, 668, 'bof'), then I read the data as uint16 using fread(fid, 900000, 'uint16'). This gives you integers.
To get the float value, all you have to do is divide by 2^16 (it's a 16 bit resolution system) and multiply by ten. I assume the factor of ten depends on the range of your data acquisition system.
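For anyone who prefers Python (the question mentions it as an option), here is a minimal sketch of the same steps, assuming the same 668-byte header, 900,000 samples, native byte order, and a factor of ten for the acquisition range; the file name is just a placeholder:

    import numpy as np

    with open("measurement.dat", "rb") as f:   # placeholder file name
        f.seek(668)                            # skip the header bytes
        raw = np.fromfile(f, dtype=np.uint16, count=900000)

    # Scale the raw integers: 16-bit resolution, factor of ten for the DAQ range.
    values = raw.astype(np.float64) / 2**16 * 10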
I hope this helps.
I'm interested in the keying algorithms used to register software and games. Let's say there is no server-side verification to get in the way...
I would like to know how these can be replicated in C code so I can have a better understanding of creating my own keygens.
I'm currently a student learning Reverse Engineering and I am trying to reverse crackmes in order to build a keygen for the software.
What are the steps to creating my own keygen that I can debug and crack?
Are all keygens basically a sum of ASCII numbers?
Are there different variants of keygens?
The key generators you're thinking of work very similarly to cryptographic hash functions. Often (if they don't check with an online authority) they verify the entered key against some file or checksum, and the software simply stops execution if the hash/key wasn't found or entered correctly.
You're getting your terminology mixed up when speaking about "ASCII numbers". Any data on a computer can be converted to numbers, and if you choose to interpret blocks of 8 bits as numbers between 0 and 255 you'll always find a corresponding ASCII character. In practice you will often find that keys are encoded in hex: characters from 0-9 and A-F, using 4 bits per character.
And yes, there are many different variants.
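To make the general shape concrete, here is a deliberately toy scheme in Python. It is made up purely for illustration (not taken from any real product): the "key" is just a hex-encoded checksum of the user name, which is the kind of generate/verify pair that simple crackmes imitate:

    def make_key(name):
        # Toy scheme: sum the character codes and encode the result as hex.
        checksum = sum(ord(c) for c in name) % 0xFFFF
        return format(checksum, "04X")

    def check_key(name, key):
        # The program would run this check and stop execution on a mismatch.
        return make_key(name) == key

    print(make_key("alice"))            # 01FE
    print(check_key("alice", "01FE"))   # True

A real scheme would use a proper cryptographic hash or a signature rather than a plain sum, but the generate/verify structure is the same.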
I want to understand how data is stored in a computer system in binary form. There is a huge amount of data, so how many combinations does a computer system need to store it all?
Binary is, as the name suggests, just two states. They are expressed in a number of ways depending on where you look, but all represent the same meaning:
1 or 0
1 or -1
+ or -
up or down
on or off
A computer does nothing more than put all of them together very fast to make things, similar to Morse code. In Morse code a series of on and off signals makes up letters, letters make up words, then sentences, paragraphs, chapters, books, libraries.
How is the data stored? It's nothing more than binary: a piece of metal that is magnetized (or a switch that is flipped) to be + or -, representing one of the two binary choices.
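A tiny Python illustration of the same point: the number 5 and the letter 'A' are both just bit patterns, and only the interpretation differs:

    print(format(5, "08b"))         # 00000101 -> the integer 5 as one byte
    print(format(ord("A"), "08b"))  # 01000001 -> the character 'A' (code 65)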
This was not the right question for this site and you should have searched and read up on this yourself.
I hope this isn't too opinionated for SO; it may not have a good answer.
In a portion of a library I'm writing, I have a byte array that gets populated with values supplied by the user. These values might be of type Float, Double, Int (of different sizes), etc. with binary representations you might expect from C, say. This is all we can say about the values.
I have an opportunity for an optimization: I can initialize my byte array with the byte MAGIC, and then whenever no byte of the user-supplied value is equal to MAGIC I can take a fast path, otherwise I need to take the slow path.
So my question is: what is a principled way to go about choosing my magic byte, such that it will be reasonably likely not to appear in the (variously-encoded and distributed) data I receive?
Part of my question, I suppose, is whether there's something like a Benford's law that can tell me something about the distribution of bytes in many sorts of data.
Capture real-world data from a diverse set of inputs that would be used by applications of your library.
Write a quick-and-dirty program to analyze the dataset. It sounds like what you want to know is which bytes are most frequently excluded entirely. So the output of the program would say, for each byte value, how many inputs do not contain it (see the sketch after these steps).
This is not the same as least frequent byte. In data analysis you need to be careful to mind exactly what you're measuring!
Use the analysis to define your architecture. If there is no byte value that is absent from a useful fraction of inputs, you can abandon the optimization entirely.
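Here is a quick-and-dirty sketch of that analysis in Python, assuming the captured sample inputs sit as files in a directory passed on the command line:

    import sys
    from pathlib import Path

    # For each byte value 0-255, count how many sample inputs never contain it.
    excluded_count = [0] * 256
    files = [p for p in Path(sys.argv[1]).iterdir() if p.is_file()]

    for path in files:
        present = set(path.read_bytes())        # distinct byte values in this input
        for b in range(256):
            if b not in present:
                excluded_count[b] += 1

    # The best magic-byte candidates are the values excluded from the most inputs.
    best = sorted(range(256), key=lambda b: excluded_count[b], reverse=True)[:5]
    for b in best:
        print("byte 0x%02X absent from %d of %d inputs" % (b, excluded_count[b], len(files)))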
I was inclined to use byte 255, but I discovered that it is also prevalent in MS Word files. So I now use byte 254 as the EOF code to terminate a file.
I have a url table in MySQL which has only two fields: id and a varchar(255) for the URL. There are currently more than 50 million URLs in it, and my boss has just given me a heads-up about the expansion of our current project, which will add more URLs to that table; the expected number is around 150 million by the middle of next year.
The current database size is about 6 GB, so I can safely say that if things are left as they are it will cross 20 GB, which is not good. So I am thinking of some solution that can reduce the disk space needed for URL storage.
I also want to make it clear that this is not a busy table and there are not many queries at the moment, so I am mainly looking to save disk space and, more importantly, to explore new ideas for short-text compression and its storage in MySQL.
BUT in the future the table may also be accessed heavily, so it is better to optimize it well before that time comes.
I worked quite a bit on converting the URL into numeric form and storing it in a BIGINT, but since that is limited to 64 bits it didn't work out well. The BIT data type has the same problem: it is also limited to 64 bits.
My idea behind converting to numeric form was basically this: an 8-byte BIGINT stores 19 decimal digits, so if each digit pointed to a character in a set of all possible characters, it could hold 19 characters in 8 bytes, provided the character set had only 10 symbols. In a real-world scenario, though, there are 52 English letters and 10 digits plus a few symbols, so the set is around 100 characters. Each character then needs two or three digits, and in the worst case a BIGINT can still only point to about 6 characters. This is not a final verdict; it still needs some work to determine exactly what each digit points to, whether the set has 10+, 30+, or 80+ symbols, but you get the idea of what I am thinking about.
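To make that packing idea concrete, here is a rough Python sketch with a made-up character set: each character becomes a two-decimal-digit code, so a 19-digit BIGINT holds at most about 9 such codes, far short of a 255-character URL:

    import math

    # Hypothetical character set of roughly 70+ symbols; extend as needed.
    ALPHABET = (
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789-._~:/?#&=%"
    )

    def pack(s):
        n = 0
        for ch in s:
            n = n * 100 + ALPHABET.index(ch) + 1   # +1 keeps leading codes non-zero
        return n

    # 64 bits is about 19 decimal digits, i.e. at most 9 two-digit codes.
    print(math.floor(64 * math.log10(2) / 2))   # 9
    print(pack("stackover") < 2**64)            # True: 9 characters fit
    print(pack("Stackoverf") < 2**64)           # False: this 10-character string overflows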
One more important thing: as URLs are of variable length, I am also trying to save disk space on short URLs, so I don't want to use a fixed-length column type.
I have also looked into some text-compression algorithms like smaz and Huffman coding, but I am not entirely convinced because they use some sort of dictionary of words, and I am looking for a cleaner method.
And I don't want to use a binary data type because, like varchar, it also takes too much space in bytes.
Another idea to try might be to identify common strings and represent them with a bitmap. For instance, use two bits to represent the protocol (http, https, ftp or something else), another bit to indicate whether the domain starts with "www", and two bits to indicate whether the domain ends with ".com", ".org", ".edu" or something else. You'd have to do some analysis on your data to see whether these make sense, and whether there are any other common strings you can identify.
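An illustrative Python sketch of that bitmap idea; the field widths and the protocol/TLD lists here are hypothetical and would come from your own data analysis:

    # Pack a few very common URL parts into one small header value and keep
    # only the remainder as text. Layout (hypothetical): 2 bits protocol,
    # 1 bit leading "www.", 2 bits common TLD.
    PROTOCOLS = ["http://", "https://", "ftp://"]   # index 3 means "other"
    TLDS = [".com", ".org", ".edu"]                 # index 3 means "other"

    def encode(url):
        proto = next((i for i, p in enumerate(PROTOCOLS) if url.startswith(p)), 3)
        if proto != 3:
            url = url[len(PROTOCOLS[proto]):]
        www = 1 if url.startswith("www.") else 0
        if www:
            url = url[4:]
        domain, sep, rest = url.partition("/")
        tld = next((i for i, t in enumerate(TLDS) if domain.endswith(t)), 3)
        if tld != 3:
            domain = domain[:-len(TLDS[tld])]
        header = (proto << 3) | (www << 2) | tld
        return header, domain + sep + rest

    print(encode("https://www.example.com/page?q=1"))   # (12, 'example/page?q=1')

Decoding would simply reverse the steps, re-inserting the stripped protocol, "www." and TLD before the first "/".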
If you have a lot of URLs to the same site, you could also consider splitting your table into two different ones, one holding the domain and the other containing the domain-relative path (and query string and fragment id, if present). You'd have a link table that held the id of the URL, the id of the domain and the id of the path, and you'd replace your original URL table with a view that joins the three tables. The domain table wouldn't have to be restricted to the domain; you could include as much of the URL as is common (e.g., 'http://stackoverflow.com/questions'). This wouldn't take too much code to implement, and it has the advantage of still being readable. Your numeric encoding could be more efficient once you get it figured out; you'll have to analyze your data to see which one makes more sense.
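A small Python sketch of the splitting step using urllib.parse; the table layout itself is up to you:

    from urllib.parse import urlsplit

    def split_url(url):
        # Split into the part shared by many rows (scheme + host) and the
        # row-specific remainder (path, query string, fragment id).
        parts = urlsplit(url)
        prefix = "%s://%s" % (parts.scheme, parts.netloc)
        remainder = parts.path
        if parts.query:
            remainder += "?" + parts.query
        if parts.fragment:
            remainder += "#" + parts.fragment
        return prefix, remainder

    print(split_url("http://stackoverflow.com/questions/123?tab=votes#header"))
    # ('http://stackoverflow.com', '/questions/123?tab=votes#header')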
If you are looking for 128-bit integers, you can use binary(16), where 16 is the number of bytes. You can extend it to 64 bytes (512 bits), so it doesn't take more space than the BIT data type. You can think of the BINARY data type as an expansion of the BIT data type, but as its string variant.
Having said that, I would suggest dictionary algorithms to compress URLs and short strings, blended with the techniques used by URL-shortening services: use three-character combinations of A-Z, a-z and 0-9 to replace large dictionary words. That gives you 62 x 62 x 62 = 238,328 combinations, more than the number of dictionary words you are likely to need.
Though I am not sure what level of compression you would achieve, it's not a bad idea to implement URL compression this way.
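For illustration, here is a minimal Python sketch of that dictionary idea, replacing frequent substrings with three-character base-62 codes; the dictionary contents and the '~' escape character are made up, and a real implementation would also escape any literal '~' in the input:

    import string

    BASE62 = string.digits + string.ascii_uppercase + string.ascii_lowercase   # 62 symbols

    def code(n):
        # Turn a dictionary index into a fixed three-character base-62 code.
        assert n < 62 ** 3
        digits = []
        for _ in range(3):
            n, r = divmod(n, 62)
            digits.append(BASE62[r])
        return "".join(reversed(digits))

    # Hypothetical dictionary of frequent substrings, longest first so longer
    # matches win during replacement.
    DICTIONARY = sorted(["http://www.", "https://www.", ".com/", "questions/"],
                        key=len, reverse=True)

    def compress(url, escape="~"):
        for i, word in enumerate(DICTIONARY):
            url = url.replace(word, escape + code(i))
        return url

    print(compress("http://www.example.com/questions/123"))   # ~001example~003~002123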