Does 1 KB (KiloByte) really equal 1024 bytes? - binary

Until now I believed that 1024 bytes equals 1 KB (kilobyte), but then I was reading on the internet about the decimal and binary systems.
So is 1024 bytes = 1 KB actually the correct definition, or is there simply general confusion?

What you are seeing is a marketing stunt.
Since non-technical people don't know the difference between the metric meg, gig, etc. and the binary meg, gig, etc., storage marketers use the metric calculation, thus 1000 bytes == 1 KiloByte.
This can cause issues for developers and other highly technical people, so you get the idea of a binary meg, gig, etc., which is designated with "bi" in place of part of the standard prefix (e.g. mebibyte vs. megabyte, or gibibyte vs. gigabyte).

There are two ways to represent big numbers: you can either display them in powers of 1000 (base 10) or powers of 1024 (base 2). If you divide by 1000, you probably use the SI prefix names; if you divide by 1024, you probably use the IEC prefix names. The problem starts with dividing by 1024. Many applications use the SI prefix names for it and some use the IEC prefix names. But it is important how it is written:
Using IEC standard:
1 KiB = 1,024 bytes (Note: big K)
1 MiB = 1,024 KiB = 1,048,576 bytes
Using SI standard:
1 kB = 1,000 bytes (Note: small k)
1 MB = 1,000 kB = 1,000,000 bytes
Source: Ubuntu units policy: https://wiki.ubuntu.com/UnitsPolicy
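For what it's worth, here's a minimal C sketch (my own illustration, not from the Ubuntu policy page) that prints one byte count under both conventions:

    #include <stdio.h>

    int main(void)
    {
        long long bytes = 1048576;

        /* SI convention: divide by powers of 1000 */
        printf("%lld bytes = %.3f kB = %.3f MB\n",
               bytes, bytes / 1e3, bytes / 1e6);

        /* IEC convention: divide by powers of 1024 */
        printf("%lld bytes = %.3f KiB = %.3f MiB\n",
               bytes, bytes / 1024.0, bytes / (1024.0 * 1024.0));
        return 0;
    }

The same 1048576 bytes come out as 1.049 MB under SI but exactly 1.000 MiB under IEC.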

In the normal world, most things go by powers of 10. This includes electricity, for example.
But the computer world is about half binary. For example, when manufacturers sell a hard drive, they sell it by powers of 10, so if it is a 1 KB drive, then it is 1000 B. But when the computer reads it, OSes usually read in units of 1024. This is why, when you read the amount of space available on a drive, it reads as much less than what was advertised. A 500 GB drive will read as only about 466 GB, because the computer is reading the drive by the binary 1024 version, not the power of 10 that it was sold and advertised by. The same goes for flash drives. RAM, however, is both sold and read by the computer in the binary 1024 version.
One thing to note: it is "B", not "b". There are 8 bits "b" in a byte "B". The reason I bring this up is that when you get internet service, providers usually advertise the speed in bits, not bytes, while the download box on the computer reads the speed in bytes. Say you have a 50 Mb internet connection; that is actually a 6.25 MB connection in the download speed box, because you have to divide the 50 by 8 since there are 8 bits in a byte. That is how the computer reads it. Another marketing strategy, too; after all, 50 Mb sounds much faster than 6.25 MB. Other than speeds through a network, most things are read in bytes "B". Some people do not realize that there is a difference between "B" and "b".
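Here's a small C sketch of the arithmetic behind the two examples in this answer (the 500 GB and 50 Mb figures are just the examples above):

    #include <stdio.h>

    int main(void)
    {
        /* A "500 GB" drive: sold by powers of 10, displayed by powers of 1024. */
        double advertised_bytes = 500e9;
        printf("500 GB drive shows as about %.1f GiB\n",
               advertised_bytes / (1024.0 * 1024.0 * 1024.0));   /* ~465.7 */

        /* A "50 Mb" connection: advertised in bits, displayed in bytes. */
        printf("50 Mb/s link downloads at %.2f MB/s\n", 50.0 / 8.0);   /* 6.25 */
        return 0;
    }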

Quite simple...
The word 'Byte' is a computing reference for which the letter 'B' is used as abbreviation.
It must follow then that any reference to bytes, e.g. KB, MB, etc., must be based on the well-known and widely accepted 1024 base.
Therefore 1 KB must equal 1024 bytes, 1 MB must equal 1048576 bytes (1024x1024), etc.
Any non-computing reference to kilo/mega, etc. is based on the decimal 1000 base, e.g. 1 kW or 1 kilowatt, which is 1000 watts.

Need a formula to get total LUN size using lunSizeLow and lunSizeHigh SNMP objects

I have 2 SNMP Objects/OIDs. Below are the details:
Object1:
Name: lunSizeLow
OID: 1.3.6.1.4.1.43906.1.4.3.2.3.1.9
Description: `LUN` size in bytes - low order bytes
Object2:
Name: lunSizeHigh
OID: 1.3.6.1.4.1.43906.1.4.3.2.3.1.10
Description: `LUN` size in bytes - high order bytes
My requirement:
I want to monitor LUN size through some script, but I didn't find any SNMP object which can give the total LUN size directly. I found 2 separate objects (lunSizeLow and lunSizeHigh), so I need a formula to get the total LUN size using these low-order and high-order SNMP objects (lunSizeLow and lunSizeHigh).
I have gone through many articles on the internet and found a couple of formulas on community.hpe.com.
But I'm not sure which one is correct.
Formula 1:
The max unsigned number that can be stored in a 32-bit counter is 4294967295.
Total size would be: LOW_ORDER_BYTES + HIGH_ORDER_BYTES * 4294967296
Formula 2:
Total size in GB is LOW_ORDER_BYTES / 1073741824 + HIGH_ORDER_BYTES * 4
Could anyone help me find the correct formula?
Most languages will have the bit-shift operator, allowing you to do something similar to the below (Java; the masks guard against sign extension in case the counters arrive as signed ints):

    long myBigInteger = lunSizeHigh & 0xFFFFFFFFL;  // treat the 32-bit counter as unsigned
    myBigInteger <<= 32;                            // shift the high bits into the high half of the long
    myBigInteger += lunSizeLow & 0xFFFFFFFFL;       // mask again so a negative int isn't sign-extended
This has two advantages over multiplying:
Bit shifting is often faster than multiplication, even though most compilers would optimize that particular multiplication into a bit shift anyway.
It is easier to read the code and understand why this would provide the correct answer, given the description from the MIB. Magic numbers should be avoided where possible.
That aside, putting some numbers into the Windows Calculator (using Programmer Mode) and trying formula 1, we can see that it works.
Now, you don't specify what language or environment you're working in, and in some languages you won't have any number type that supports the size of numbers you want to manipulate. (Same reason that this number had to be split into two counters to begin with - it's larger than the largest number representation available on some (primitive) platforms.) If you want to do it using multiplication, you'll have to make sure your implementation language can do better.
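As a sanity check with made-up numbers: suppose lunSizeHigh = 2 and lunSizeLow = 1073741824. Formula 1 gives 1073741824 + 2 * 4294967296 = 9663676416 bytes. Formula 2 gives 1073741824 / 1073741824 + 2 * 4 = 9 GB, and indeed 9663676416 / 1073741824 = 9, so the two formulas agree; formula 2 is just formula 1 divided by 2^30 (and its "GB" is really GiB, per the first question above).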

Why doesn't Google calculate data quantity conversions using binary, instead of just moving the decimal point left/right?

I already understand the fundamentals behind why the two calculations are different. I just want to know how I can get Google to give me the same binary conversion result that Bing does, because I don't feel like using Bing just to convert data quantities.
Use MiB and KiB when you want the 1024 version. From the Kilobyte Wikipedia entry: "In the International System of Quantities, the kilobyte (symbol kB) is 1000 bytes, while the kibibyte (symbol KiB) is 1024 bytes."
1 MiB to KiB

Searching through very large rainbow table file

I am looking for the best way to search through a very large rainbow table file (13GB file). It is a CSV-style file, looking something like this:
1f129c42de5e4f043cbd88ff6360486f; somestring
78f640ec8bf82c0f9264c277eb714bcf; anotherstring
4ed312643e945ec4a5a1a18a7ccd6a70; yetanotherstring
... you get the idea - there are about 900 million lines, always with a hash, semicolon, clear-text string.
So basically, the program should check whether a specific hash is listed in this file.
What's the fastest way to do this?
Obviously, I can't read the entire file into memory and then run strstr() on it.
So what's the most efficient way to do this?
read the file line by line, calling strstr() each time;
read larger chunks of the file (e.g. 10,000 lines) and run strstr() on each chunk
Or would it be more efficient to import all this data into a MySQL database and then search for the hash via SQL queries?
Any help is appreciated
The best way to do it would be to sort the file and then use a binary search-like algorithm on it. After sorting it, finding a particular entry will take around O(log n) time, where n is the number of entries you have. Your algorithm might look like this:
1. Keep a start offset and end offset. Initialize the start offset to zero and the end offset to the file size.
2. If start = end, there is no match.
3. Read some data from the offset (start + end) / 2.
4. Skip forward until you see a newline. (You may need to read more, but if you pick an appropriate size (bigger than most of your records) to read in step 3, you probably won't have to read any more.)
5. If the hash you're on is the hash you're looking for, go on to step 8.
6. Otherwise, if the hash you're on is less than the hash you're looking for, set start to the current position and go to step 2.
7. If the hash you're on is greater than the hash you're looking for, set end to the current position and go to step 2.
8. Skip to the semicolon and trailing space. The unhashed data will be from the current position to the next newline.
This can be easily converted into a while loop with breaks, as in the sketch below.
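Here's a rough C sketch of those steps. It assumes the file is already sorted in the same byte-wise order strncmp uses, that any record fits in one 512-byte read, and it skips error handling plus the edge case of the very first line (which a midpoint probe never lands on):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define CHUNK 512  /* bigger than any single record */

    /* Returns a malloc'd copy of the plain text for `hash`, or NULL. */
    char *lookup(FILE *f, const char *hash)
    {
        long start = 0, end;
        fseek(f, 0, SEEK_END);
        end = ftell(f);

        while (start < end) {                      /* step 2: start = end means no match */
            long mid = start + (end - start) / 2;  /* step 3 */
            char buf[CHUNK + 1];
            fseek(f, mid, SEEK_SET);
            size_t n = fread(buf, 1, CHUNK, f);
            buf[n] = '\0';

            char *line = strchr(buf, '\n');        /* step 4: skip the partial record */
            if (line == NULL)
                break;
            line++;

            int cmp = strncmp(line, hash, strlen(hash));
            if (cmp == 0) {                        /* steps 5 and 8 */
                char *text = strchr(line, ';');
                if (text == NULL)
                    break;
                text += 2;                         /* skip "; " */
                char *nl = strchr(text, '\n');
                if (nl != NULL)
                    *nl = '\0';
                return strdup(text);
            } else if (cmp < 0) {
                start = mid + (line - buf);        /* step 6: match is further down */
            } else {
                end = mid;                         /* step 7: match is further up */
            }
        }
        return NULL;
    }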
Importing it into MySQL with appropriate indexes would use a similarly efficient algorithm (or an even more efficient one, since the data is probably packed nicely there).
Your last solution might be the easiest one to implement, as you move all the performance optimization to the database (and databases are usually optimized for that).
strstr is not useful here, as it searches a string generically, but you know the specific format and can jump and compare in a more goal-oriented way. Think of strncmp and strchr.
The overhead of reading a single line at a time would be really high (as is often the case for file IO). So I'd recommend reading a larger chunk and performing your search on that chunk. I'd even think about parallelizing the search by reading the next chunk in another thread and doing the comparison there as well.
You can also think about using memory mapped IO instead of the standard C file API. Using this you can leave the whole contents loading to the operating system and don't have to care about caching yourself.
Of course, restructuring the data for faster access would help you too. For example, insert padding bytes so all datasets are equally long. This provides "random" access to your data stream, since you can easily calculate the position of the nth entry.
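To illustrate, here's a sketch combining the memory-mapped IO and padding ideas, assuming the file has been rewritten into sorted, fixed-length records; the file name and the 64-byte record length are made up, and error checks are omitted:

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    #define RECORD_LEN 64  /* 32 hash digits + "; " + padded string + '\n' */

    int main(void)
    {
        int fd = open("rainbow_fixed.txt", O_RDONLY);
        struct stat st;
        fstat(fd, &st);

        /* Let the OS handle loading and caching the file contents. */
        const char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        size_t nrecords = st.st_size / RECORD_LEN;

        const char *needle = "78f640ec8bf82c0f9264c277eb714bcf";
        size_t lo = 0, hi = nrecords;
        while (lo < hi) {                               /* plain binary search */
            size_t mid = lo + (hi - lo) / 2;
            const char *rec = data + mid * RECORD_LEN;  /* nth record by arithmetic */
            int cmp = strncmp(rec, needle, 32);
            if (cmp == 0) {
                printf("found: %.*s\n", RECORD_LEN - 34, rec + 34); /* padded text */
                break;
            } else if (cmp < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        munmap((void *)data, st.st_size);
        close(fd);
        return 0;
    }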
I'd start by splitting the single large file into 65536 smaller files, so that if the hash begins with 0000 it's in the file 00/00data.txt, if the hash begins with 0001 it's in the file 00/01data.txt, etc. If the full file was 12 GiB then each of the smaller files would be (on average) 208 KiB.
Next, separate the hash from the string; such that you've got 65536 "hash files" and 65536 "string files". Each hash file would contain the remainder of the hash (the last 12 digits only, because the first 4 digits aren't needed anymore) and the offset of the string in the corresponding string file. This would mean that (instead of 65536 files at an average of 208 KiB each) you'd have 65536 hash files at maybe 120 KiB each and 65536 string files at maybe 100 KiB each.
Next, the hash files should be in a binary format. 12 hexadecimal digits cost 48 bits (not 12*8 = 96 bits). This alone would halve the size of the hash files. If the strings are aligned on a 4-byte boundary in the strings file, then a 16-bit "offset of the string / 4" would be fine (as long as the string file is less than 256 KiB). Entries in the hash file should be sorted in order, and the corresponding strings file should be in the same order.
After all these changes, you'd use the highest 16 bits of the hash to find the right hash file, load the hash file and do a binary search. Then (if found) you'd get the offset for the start of the string (in the strings file) from the entry in the hash file, plus the offset for the next string from the next entry in the hash file. Then you'd load data from the strings file, starting at the start of the correct string and ending at the start of the next string.
Finally, you'd implement a "hash file cache" in memory. If your application can allocate 1.5 GiB of RAM, then that'd be enough to cache half of the hash files. In this case (half the hash files cached) you'd expect that half the time the only thing you'd need to load from disk is the string itself (e.g. probably less than 20 bytes) and the other half the time you'd need to load the hash file into the cache first (e.g. 60 KiB); so on average for each lookup you'd be loading about 30 KiB from disk. Of course more memory is better (and less is worse); and if you can allocate more than about 3 GiB of RAM you can cache all of the hash files and start thinking about caching some of the strings.
A faster way would be to have a reversible encoding, so that you can convert a string into an integer and then convert the integer back into the original string without doing any sort of lookup at all. For an example; if all your strings use lower case ASCII letters and are a max. of 13 characters long, then they could all be converted into a 64-bit integer and back (as 26^13 < 2^63). This could lead to a different approach - e.g. use a reversible encoding (with bit 64 of the integer/hash clear) where possible; and only use some sort of lookup (with bit 64 of the integer/hash set) for strings that can't be encoded in a reversible way. With a little knowledge (e.g. carefully selecting the best reversible encoding for your strings) this could slash the size of your 13 GiB file down to "small enough to fit in RAM easily" and be many orders of magnitude faster.
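Here's a concrete sketch of such a reversible encoding for strings of up to 13 lowercase ASCII letters. This variant uses base 27 (a zero digit marks "no letter"), which still fits since 27^13 < 2^63; it's one possible choice, not the only one:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* 27^13 < 2^63, so 13 letters plus a "no letter" digit fit in 63 bits. */
    uint64_t encode(const char *s)
    {
        uint64_t v = 0;
        for (int i = 12; i >= 0; i--) {
            v *= 27;
            if (i < (int)strlen(s))
                v += (uint64_t)(s[i] - 'a') + 1;  /* 'a'..'z' -> 1..26, absent -> 0 */
        }
        return v;
    }

    void decode(uint64_t v, char *out)
    {
        int n = 0;
        while (v != 0) {                          /* peel off least significant letter */
            out[n++] = (char)('a' + (v % 27) - 1);
            v /= 27;
        }
        out[n] = '\0';
    }

    int main(void)
    {
        char buf[14];
        decode(encode("somestring"), buf);
        printf("%s\n", buf);   /* prints "somestring" again, no lookup table needed */
        return 0;
    }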

cuda: effective Bandwidth in the sdk example of Reduction

In reduction.pdf, the reduction method is introduced through 7 steps, with 16777216 elements. In the 1st step the effective bandwidth is 2.083 GB/s. How does the 2.083 GB/s come out? And how does the 2nd step's bandwidth of 4.854 GB/s come out?
The bandwidth figures are calculated using the number of bytes in the reduction input data divided by the execution time (note there are 2^22 four-byte integers, i.e. 16777216 bytes). The calculation is clearly shown on page 10 of the pdf that ships in the SDK in reduction/doc.
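For example, pairing that byte count with the kernel times listed in the slides (8.054 ms for the first kernel and 3.456 ms for the second, assuming the commonly circulated version of the document): 16777216 bytes / 0.008054 s ≈ 2.083 GB/s, and 16777216 bytes / 0.003456 s ≈ 4.854 GB/s.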

Are "65k" and "65KB" the same?

Are "65k" and "65KB" the same?
From xkcd: [comic]
65KB normally means 66560 bytes. 65k means 65000, and says nothing about what it is 65000 of. If someone says 65k bytes, they might mean 65KB... but they're misspeaking if so. Some people argue for the use of KiB for the 1024-based unit (so 65 KiB = 66560 bytes), since k means 1000 in the metric system. Everyone ignores them, though.
Note: a lowercase b would mean bit, rather than bytes. 8Kb = 1KB. When talking about transmission rates, bits are usually used.
Edit: As Joel mentions, hard drive manufacturers often treat the K as meaning 1000. So hard disk space of 65KB would often mean 65000 bytes. Thumb drives and the like tend to use K as meaning 1024, though.
Probably.
Technically 65k just means 65 thousand (monkeys perhaps?). You would have to take into account the context.
65kB can be interpreted to mean either 65 * 1000 = 65,000 bytes or 65 * 2^10 = 66,560 bytes.
You can read about all this and kibibytes at Wikipedia.
65k is 65,000 of something
65KB is 66,560 bytes (65*1024)
Like most have said, 65KB is 66,560 and 65k is 65,000. 65KB means 66,560 bytes, while 65k is ambiguous about its units. So they're not the same.
Additionally, since there are a few people equating "8 bits = 1 byte", I thought I'd add a little bit about that.
Transmission rates are usually in bits per second, because the grouping into bytes might not be directly related to the actual transmission clock rate.
Take for instance 9600 baud with RS232 serial ports. There are always exactly 9600 bits going out per second (+/- maybe a 5% clock tolerance). However, if those bits are grouped as N-8-1, meaning "no parity, 8 data bits, 1 stop bit", then counting the start bit there are 10 bits per byte, and so the byte rate is 960 bytes/second maximum. However, if you have something odd like E-8-2, or "even parity, 8 data bits, 2 stop bits", then it's 12 bits per byte, or 800 bytes/second. The actual bits are going out at exactly the same rate, so it only makes sense to talk about the bits/second rate.
So 1 byte might be 8 bits, 9 bits (i.e. parity), 10 bits (i.e. N81, E71, N72), 11 bits (i.e. E81), 12 bits (i.e. E82), or whatever. There are lots of combinations of ways with just RS232-style transmission to get very odd byte rates. If you throw in Reed-Solomon or ECC correction, you could have even more bits per byte. Then there's 8b/10b, 6b/8b, Hamming codes, etc...
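A tiny C sketch of that framing arithmetic, using the frame layouts named above:

    #include <stdio.h>

    /* bits per frame = start bit + data bits + parity bits + stop bits */
    double byte_rate(double baud, int bits_per_frame)
    {
        return baud / bits_per_frame;
    }

    int main(void)
    {
        printf("N-8-1: %.0f bytes/s\n", byte_rate(9600, 10));  /* 960 */
        printf("E-8-2: %.0f bytes/s\n", byte_rate(9600, 12));  /* 800 */
        return 0;
    }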
In terms of data transfer rates - 65k implies 65 kilobits and 65KB implies 65 KiloBytes
Check this http://en.wikipedia.org/wiki/Data_rate_units
cheers
From Wikipedia for Kilobyte:
It is abbreviated in a number of ways: KB, kB, K and Kbyte.
In other words, they could both be abbreviations for Kilobyte. However, using only a lowercase 'k' is not a standard abbreviation, but most people will know what you mean.
There you go:
kB = kiloByte
KB = KelvinByte
kb = kilobit
Kb = Kelvinbit
Use kB and kb, the lowercase-k ones! But be aware that some people use 1024 instead of 1000 for k (kilo).
My opinion on this: kilo = 1000, so whoever first decided to use 1024 made the mistake. If I am not mistaken, 1024 was used first by IT engineers. Later someone (probably some marketing genius) found out that they could label things using 1000 as kilo and make things look bigger than they actually are. Since then, you can't be sure which value is used for kilo.
In general, yes, they're both 65 kilobytes (66,560 bytes).
Sometimes the abbreviations are tricky with casing. If it had been "65Kb", it would have correctly meant kilobits.
A kilobyte (KB) is 1024 bytes.
Kilo stands for 1000.
So, going purely by notation: (65k = 65,000) != (65KB = 66,560).
However, if you're talking about memory, you're probably always going to see KB (even if it's written as k).
Generally, KB = k. It's all very confusing really.
Strictly speaking, the former is not specifying the unit: 65,000 What? So, the two can't really be compared.
However, in general speech, most people use 65K (note it's normally uppercase) to mean 65 kilobytes (or 65 * 1024 bytes).
Note 65Kb usually denotes KiloBits.
"Officially", 65k is 65,000; however people say 65k all the time, even if the real number is something like 65,123.
Typically 65k means anywhere from 64.00001 to 65.99999998 KiB, or sometimes anywhere between 63500 and 64999 bytes... i.e., we aren't all that precise most of the time with the sizes of things. When someone cares to be precise, they will be explicit, or the meaning will be clear from context.
65 KiB means 65 * 1024 bytes. .... unless the person was rounding. Never trust a number unless you measure it yourself! ... :)
Hope that helps,
--- Dave
65k may be the same as 65KB, but remember, 65KB is larger than 65Kb.
Case is important, as are units.
Psto, you're right. This is an absolute minefield!
As many said, K is technically kilo, meaning thousand (of anything), and comes from Greek.
But you can assume different units depending on the context.
As data transfer rates are most often measured in bits, K in this context could be assumed to mean kilobits.
When talking about data storage, a file's size, etc., K can be assumed to mean kilobytes.