In the GIF specification, here:
http://www.w3.org/Graphics/GIF/spec-gif89a.txt
It refers to 'bytes', which I naturally assume are unsigned chars. If this is the case, what does it refer to when it says 'unsigned'? Unsigned... what? The precise definition is important as it lets me know how many bytes to read in.
Thank you for your time.
"unsigned" in the specification refers to a 16-bit integer, with the least significant byte first.
It should probably be noted that in C, unsigned by itself is a synonym for unsigned int, and at the time the GIF specification was written, it was probably reasonable to assume that int on most machines was 16 bits, so it's not entirely unreasonable for them to not define the terms they were using.
Wherever the word "unsigned" is mentioned in the document, the adjacent diagram shows the number of bytes taken by it. Looks like it's always 2 bytes.
Notice also that the appendix mentions:
Byte Ordering - Unless otherwise stated, multi-byte numeric fields are
ordered with the Least Significant Byte first.
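As a minimal sketch of what this means when reading the file: each "unsigned" field is two bytes, least significant byte first. The function name read_gif_screen_size and the sample bytes are made up for illustration. It assumes the width and height fields of the Logical Screen Descriptor directly follow the 6-byte signature, as the spec's diagrams show.

```python
import struct

def read_gif_screen_size(data: bytes):
    """Return (width, height) from the start of a GIF stream."""
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF stream")
    # "<HH" = two little-endian unsigned 16-bit integers, 2 bytes each.
    width, height = struct.unpack_from("<HH", data, 6)
    return width, height

# Hypothetical sample: signature followed by 0x0140 and 0x00F0, LSB first.
sample = b"GIF89a" + bytes([0x40, 0x01, 0xF0, 0x00])
print(read_gif_screen_size(sample))  # (320, 240)
```

So for every "unsigned" field you read exactly 2 bytes, and the first byte is the low-order one.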
I'm a bit confused by the MySQL documentation with regard to the storage requirements for various fields. I'm currently redesigning a database and I'm seeing TINYINT(4) as the data type. Previously I've never given any thought to this, but will this require one byte and just truncate the last digit off the number, or will it actually require 2 bytes and be converted to a SMALLINT internally?
EDIT - I know that the number represents the amount of digits that will be displayed, like TINYINT(2) will only show 2 digits or whatever, but what if that number is more than the data type can actually hold?
As you stated correctly, the TINYINT type uses 1 byte of storage for 256 possible integer values (-128 through 127), or 0 through 255 when UNSIGNED. See that -128? It takes four characters to display, and this is (along with ZEROFILL) the reason for the (4). But it won't get converted to a SMALLINT automatically, so choose your data type accordingly.
See this link, this blog deals with the topic (as mentioned in answer here).
Seems like BIGINT is the biggest integer available on MySQL, right?
What to do when you need to store a BIGINT(80) for example?
Why, in some cases, such as somewhere in the Twitter API docs, do they recommend storing these large integers as varchar?
What is the real reason behind choosing one type over another?
Big integers aren't actually limited to 20 digits, they're limited to the numbers that can be expressed in 64 bits (for example, the number 99,999,999,999,999,999,999 is not a valid big integer despite it being 20 digits long).
The reason you have this limitation is that native format integers can be manipulated relatively fast by the underlying hardware whereas textual versions of a number (tend to) need to be processed one digit at a time.
If you want a number larger than the largest 64-bit unsigned integer 18,446,744,073,709,551,615 then you will need to store it as a varchar (or other textual field) and hope that you don't need to do much mathematical manipulation on it.
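To make the 64-bit limit concrete, here is a small sketch in Python (whose integers are arbitrary precision) that checks whether a value would fit. The name fits_bigint is made up for illustration; the limits are simply the 64-bit bounds.

```python
UNSIGNED_BIGINT_MAX = 2**64 - 1          # 18,446,744,073,709,551,615
SIGNED_BIGINT_MIN = -(2**63)
SIGNED_BIGINT_MAX = 2**63 - 1

def fits_bigint(value: int, unsigned: bool = False) -> bool:
    """True if value is representable in a 64-bit integer column."""
    if unsigned:
        return 0 <= value <= UNSIGNED_BIGINT_MAX
    return SIGNED_BIGINT_MIN <= value <= SIGNED_BIGINT_MAX

print(fits_bigint(18_446_744_073_709_551_615, unsigned=True))  # True
print(fits_bigint(99_999_999_999_999_999_999, unsigned=True))  # False
```

Both numbers have 20 digits, but only the first fits, which is why counting digits is not a reliable test.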
Alternatively, you can look into floating point numbers which have a larger range but less precision, or decimal numbers which should be able to give you 65 digits for an integral value, with decimal(65,0) as the column type.
You can specify a numeric(65,0), but if you need to get larger, you'll need a varchar.
The reason to select one over another is usage, efficiency and space. Using an int is more efficient than a bigint or, I believe, a numeric if you need to do math on it.
You can store such big integers as an arbitrary binary string if you want maximum storage efficiency.
But I'm not sure it's worth it, because you'll have to deal with integers wider than 64 bits in your application too, which is not something you want to do without a strong reason.
Better keep things simple and use varchar.
BIGINT is limited by definition to 8 bytes. The maximum number of digits in the DECIMAL type is 65. You must use VARCHAR to store values of larger precision, and be aware that there is no direct math on such values.
I'm working with some database abstraction layers and most of them use attributes like "String", which is VARCHAR(250), or INTEGER, which has a length of 11 digits. But suppose, for example, I have something that will be less than 250 characters long. Should I make it shorter? Does it really make any meaningful difference?
Thanks in advance!
INT length does nothing. All INTs are 4 bytes. The number you can set, is only used for zerofill (and who uses that!?).
VARCHAR length does more. It's the maximum length of the field. VARCHAR is saved so that only the actual data is stored, so the declared length doesn't matter for storage. These days, you can have VARCHARs bigger than 255 bytes (up to 256^2-1 = 65,535). The difference is the number of bytes used to store the field length. VARCHAR(100), VARCHAR(8) and VARCHAR(255) use 1 byte to save the field length. VARCHAR(1000) uses 2.
Hope that helps =)
edit
I almost always make my VARCHARs 250 long. Actual length should be checked in the app anyway. For bigger fields I use TEXT (and those are stored differently, so can be much much longer).
edit
I don't know how current this is, but it used to help me (understand): http://help.scibit.com/Mascon/masconMySQL_Field_Types.html
First, remember that the database is meant to store facts and is designed to protect itself against bad data. Thus, the reason you do not want to allow a user to enter 250 characters for a first name is that a user will put all kinds of data in there that is not a first name. They'll put their whole name, their underwear size, a novel about what they did last summer and so on. Thus, you want to strive to enforce that the data is as correct as possible. It is a mistake to assume that the application is the sole protector against bad data. You want users to tell you that they had a problem stuffing War and Peace into a given column.
Thus, the most important question is, "What is the most appropriate value for the data being stored?" Ideally, you would use an int and a check constraint to ensure that the values have an appropriate range (e.g. greater than zero, less than a billion etc.). Unfortunately, this is one of MySQL's greatest weaknesses: it does not honor check constraints (they are parsed but ignored in versions before 8.0.16). That simply means you must implement those integrity checks in triggers, which admittedly is more cumbersome.
Will using an int (4 bytes) instead of a tinyint (1 byte) make an appreciable difference? Obviously, it depends on the amount of data. If you will have no more than 10 rows, the answer is obviously "No". If you will have 10 billion rows, the answer is obviously "Yes". However, IMO, this is premature optimization. It is far better to focus on ensuring correctness first.
For text, you should ask whether your data should support Chinese, Japanese or non-ANSI values (i.e., should you use nvarchar or varchar)? Does this value represent a real world code like a currency code, or bank code which has a specific specification?
Not so sure in MySQL, but in MS SQL it only makes a difference for sufficiently large databases. Typically, I like to use smaller fields for a) the space saving (it never hurts to practice good habits) and b) for the implied validation (if you know a certain field should never be more than 10 characters, why allow eleven, let alone 250?).
I think Rudie is wrong, not all INTs are 4 bytes... in MySQL you have:
tinyint = 1 byte,
smallint = 2 bytes,
mediumint = 3 bytes,
int = 4 bytes,
bigint = 8 bytes.
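The value range of each type follows directly from those byte counts. A small sketch (the dictionary and the function name int_range are made up for illustration):

```python
# Storage size in bytes for each MySQL integer type, as listed above.
INT_TYPE_BYTES = {
    "tinyint": 1,
    "smallint": 2,
    "mediumint": 3,
    "int": 4,
    "bigint": 8,
}

def int_range(type_name: str, unsigned: bool = False):
    """Return (min, max) for an integer type of the given width."""
    bits = INT_TYPE_BYTES[type_name] * 8
    if unsigned:
        return 0, 2**bits - 1
    # Two's complement: one bit is spent on the sign.
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

for name in INT_TYPE_BYTES:
    print(name, int_range(name), int_range(name, unsigned=True))
```

For example, tinyint comes out as -128..127 signed and 0..255 unsigned, regardless of any number written in parentheses after the type.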
I think Rudie refers to the "display width", that is, the number you put between parentheses when you are creating a column, e.g.:
age INT(3)
You're only telling the RDBMS the width to use when SHOWING the number (3 digits here); it does not limit the values that can be stored.
And VARCHARs are variable-length character strings, so if you declare, let's say, name varchar(5000) and you store a name like "Mario", you are only using 7 bytes (5 for the data and 2 for the length of the value).
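The arithmetic can be sketched like this. It is a rough model, not MySQL code: the function name varchar_storage_bytes is invented, and it assumes a single-byte character set by default (with multi-byte sets, the maximum byte length and the stored data both grow).

```python
def varchar_storage_bytes(value: str, declared_chars: int,
                          max_bytes_per_char: int = 1,
                          encoding: str = "latin-1") -> int:
    """Approximate bytes used to store value in a VARCHAR(declared_chars)."""
    max_bytes = declared_chars * max_bytes_per_char
    # 1 length byte if the column can never exceed 255 bytes, otherwise 2.
    length_prefix = 1 if max_bytes <= 255 else 2
    return length_prefix + len(value.encode(encoding))

print(varchar_storage_bytes("Mario", 5000))  # 7
print(varchar_storage_bytes("Mario", 100))   # 6
```

The declared length only changes the size of the length prefix; the data itself always takes just the bytes actually stored.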
The correct field size serves to limit the bad data that can be put in. For instance suppose you have a phone number field. If you allow 250 characters, you will often end up with things like the following in the phone field (an example not taken at random):
Call the good-looking blonde secretary instead.
So first limiting the length is part of how we enforce data integrity rules. As such it is critical.
Second, there is only so much space on a datapage and while some databases will allow you to create tables where the potential record is longer than the width of the data page, they often will not allow you to actually exceed it when storing the data. This can lead to some very hard to find bugs when suddenly one record can't be saved. I don't know about MySql and whether it does this but I know SQL Server does and it is very hard to figure out what is wrong. So making data the correct size can be critical to preventing bugs.
Is there any performance difference between the decimal(10,0) unsigned type and the int(10) unsigned type?
It may depend on the version of MySQL you are using. See here.
Prior to MySQL 5.0.3, the DECIMAL type was stored as a string and would typically be slower.
However, since MySQL 5.0.3 the DECIMAL type is stored in a binary format so with the size of your DECIMAL above, there may not be much difference in performance.
The main performance issue would have been the amount of space taken up by the different types (with DECIMAL being slower). With MySQL 5.0.3+ this appears to be less of an issue; however, if you will be performing numeric calculations on the values as part of the query, there may be some performance difference. This may be worth testing, as there is no indication in the documentation that I can see.
Edit:
With regard to the int(10) unsigned, I took this at face value as just being a 4-byte int. However, this has a maximum value of 4294967295, which strictly doesn't provide the same range of numbers as a DECIMAL(10,0) unsigned.
As @Unreason pointed out, you would need to use a bigint to cover the full range of 10-digit numbers, pushing the size up to 8 bytes.
A common mistake when specifying numeric column types in MySQL is to think that the number in the brackets has an impact on the size of the number that can be stored. It doesn't. The number range is purely based on the column type and whether it is signed or unsigned. The number in the brackets is for display purposes in results and has no impact on the values stored in the column. It will also have no impact on the display of the results unless you specify the ZEROFILL option on the column as well.
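The effect of the display width can be imitated in a few lines. This is only a Python imitation of the padding behaviour described above, not anything MySQL runs; the function name display is made up.

```python
def display(value: int, width: int, zerofill: bool = False) -> str:
    """Imitate how a display width affects output of a non-negative value."""
    text = str(value)
    # Without ZEROFILL the width is ignored; with it, short values are
    # left-padded with zeros. Values wider than the width are never cut.
    return text.zfill(width) if zerofill else text

print(display(42, 5))                     # 42
print(display(42, 5, zerofill=True))      # 00042
print(display(123456, 3, zerofill=True))  # 123456
```

In every case the stored value is unchanged; only the rendered text differs.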
According to the mysql data storage your decimal will require
DECIMAL(10,0): 4 bytes for 9 digits and 1 byte for the remaining 10th digit, so in total five bytes (assuming my reading of documentation is correct).
INT(10): will need BIGINT which is 8 bytes.
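The byte count above can be reproduced with a short sketch. It assumes the packing rule from the MySQL storage documentation as I recall it: 4 bytes per full group of 9 digits, with leftover digits taking 1 byte for 1-2 digits, 2 for 3-4, 3 for 5-6 and 4 for 7-8, applied separately to the integer and fractional parts. The function name decimal_storage_bytes is invented.

```python
# Bytes needed for the digits left over after full groups of nine.
LEFTOVER_BYTES = {0: 0, 1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 3, 7: 4, 8: 4}

def decimal_storage_bytes(precision: int, scale: int = 0) -> int:
    """Approximate storage for DECIMAL(precision, scale) in the binary format."""
    def part_bytes(digits: int) -> int:
        full_groups, leftover = divmod(digits, 9)
        return full_groups * 4 + LEFTOVER_BYTES[leftover]
    # Integer and fractional digits are packed separately.
    return part_bytes(precision - scale) + part_bytes(scale)

decimal_bytes = decimal_storage_bytes(10, 0)
bigint_bytes = 8
print(decimal_bytes)                 # 5
print(bigint_bytes / decimal_bytes)  # 1.6
```

Under these assumptions DECIMAL(10,0) takes 5 bytes against 8 for BIGINT.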
The difference is that the decimal is packed, and some operations on such a data type might be slower than on normal INT types, which map directly to machine-represented numbers.
Still I would do your own tests to confirm the above reasoning.
EDIT:
I noticed that I did not elaborate on the obvious point - assuming the above logic is sound the difference in size required is 60% more space needed for BIGINT variant.
However this does not directly translate to penalties due to the fact that data is normally not written byte by byte. In case of selects/updates of many rows you should see the performance loss/gain, but in case of selecting/updating a small number of rows the filesystem will fetch blocks from the disk(s) which will normally get/write multiple columns anyway.
The size (and speed) of indexes might be more directly impacted.
However, the question on how the packing influences various operations still remains open.
According to this similar question, yes, potentially there is a big performance hit because of difference in the way DECIMAL and INT are treated and threaded into the CPU when doing calculations.
See: Is there a performance hit using decimal data types (MySQL / Postgres)
I doubt such a difference can be performance related at all.
Most performance issues are tied to proper database design and an indexing plan, with server/hardware tuning as the next level.
In which cases would you use which? Is there much of a difference? Which is typically used by persistence engines to store booleans?
A TINYINT is an 8-bit integer value; a BIT field can store between 1 bit, BIT(1), and 64 bits, BIT(64). For boolean values, BIT(1) is pretty common.
From Overview of Numeric Types;
BIT[(M)]
A bit-field type. M indicates the number of bits per value, from 1 to 64. The default is 1 if M is omitted.
This data type was added in MySQL 5.0.3 for MyISAM, and extended in 5.0.5 to MEMORY, InnoDB, BDB, and NDBCLUSTER. Before 5.0.3, BIT is a synonym for TINYINT(1).
TINYINT[(M)] [UNSIGNED] [ZEROFILL]
A very small integer. The signed range is -128 to 127. The unsigned range is 0 to 255.
Additionally consider this;
BOOL, BOOLEAN
These types are synonyms for TINYINT(1). A value of zero is considered false. Non-zero values are considered true.
All these theoretical discussions are great, but in reality, at least if you're using MySQL and really for SQL Server as well, it's best to stick with non-binary data for your booleans, for the simple reason that it's easier to work with when you're outputting the data, querying and so on. It is especially important if you're trying to achieve interoperability between MySQL and SQL Server (i.e. you sync data between the two), because the handling of the BIT datatype is different in the two of them. So in practice you will have far fewer hassles if you stick with a numeric datatype. For MySQL, I would recommend sticking with BOOL or BOOLEAN, which gets stored as TINYINT(1). Even the way MySQL Workbench and MySQL Administrator display the BIT datatype isn't nice (it's a little symbol for binary data). So be practical and save yourself the hassles (and unfortunately I'm speaking from experience).
BIT should only allow 0 and 1 (and NULL, if the field is not defined as NOT NULL). TINYINT(1) allows any value that can be stored in a single byte, -128..127 or 0..255 depending on whether or not it's unsigned (the 1 shows that you intend to only use a single digit, but it does not prevent you from storing a larger value).
For versions older than 5.0.3, BIT is interpreted as TINYINT(1), so there's no difference there.
BIT has a "this is a boolean" semantic, and some apps will consider TINYINT(1) the same way (due to the way MySQL used to treat it), so apps may format the column as a check box if they check the type and decide upon a format based on that.
Might be wrong but:
Tinyint is an integer between 0 and 255
bit is either 1 or 0
Therefore to me bit is the choice for booleans
From my experience, I can tell you that BIT has problems on Linux systems (Ubuntu, for example).
I developed my db on Windows, and after I deployed everything on Linux, I had problems with queries that inserted into or selected from tables that had the BIT data type.
BIT is not safe for now.
I changed to tinyint(1) and it worked perfectly. You only need a value to differentiate between 1 and 0, and tinyint(1) is fine for that.