What is the difference between BIT and TINYINT in MySQL?

In which cases would you use which? Is there much of a difference? Which is typically used by persistence engines to store booleans?

A TINYINT is an 8-bit integer value; a BIT field can store between 1 bit, BIT(1), and 64 bits, BIT(64). For boolean values, BIT(1) is pretty common.

From "Overview of Numeric Types":
BIT[(M)]
A bit-field type. M indicates the number of bits per value, from 1 to 64. The default is 1 if M is omitted. This data type was added in MySQL 5.0.3 for MyISAM, and extended in 5.0.5 to MEMORY, InnoDB, BDB, and NDBCLUSTER. Before 5.0.3, BIT is a synonym for TINYINT(1).
TINYINT[(M)] [UNSIGNED] [ZEROFILL]
A very small integer. The signed range is -128 to 127. The unsigned range is 0 to 255.
Additionally, consider this:
BOOL, BOOLEAN
These types are synonyms for TINYINT(1). A value of zero is considered false. Non-zero values are considered true.
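The ranges above follow directly from the bit widths. As a small illustration, here is a Python sketch; the helper names are hypothetical, and the code only models the documented ranges (it does not talk to MySQL):

```python
def tinyint_range(unsigned: bool = False) -> tuple[int, int]:
    """TINYINT is 1 byte (8 bits); signed values use two's complement."""
    bits = 8
    if unsigned:
        return 0, 2**bits - 1
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def bit_range(m: int = 1) -> tuple[int, int]:
    """BIT(M) holds M bits, with M from 1 to 64 (default 1)."""
    if not 1 <= m <= 64:
        raise ValueError("BIT(M) requires 1 <= M <= 64")
    return 0, 2**m - 1


for label, rng in [
    ("TINYINT", tinyint_range()),
    ("TINYINT UNSIGNED", tinyint_range(unsigned=True)),
    ("BIT(1)", bit_range(1)),
    ("BIT(64)", bit_range(64)),
]:
    print(f"{label:<17} {rng}")
```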

All these theoretical discussions are great, but in reality, at least if you're using MySQL (and really for SQL Server as well), it's best to stick with non-binary data for your booleans, for the simple reason that it's easier to work with when you're outputting the data, querying, and so on. This is especially important if you're trying to achieve interoperability between MySQL and SQL Server (i.e. you sync data between the two), because the two handle the BIT data type differently. So in practice you will have far fewer hassles if you stick with a numeric data type. For MySQL, I would recommend sticking with BOOL or BOOLEAN, which gets stored as TINYINT(1). Even the way MySQL Workbench and MySQL Administrator display the BIT data type isn't nice (it's a little symbol for binary data). So be practical and save yourself the hassles (and unfortunately I'm speaking from experience).

BIT should only allow 0 and 1 (and NULL, if the field is not defined as NOT NULL). TINYINT(1) allows any value that can be stored in a single byte, -128..127 or 0..255 depending on whether or not it's unsigned (the 1 shows that you intend to only use a single digit, but it does not prevent you from storing a larger value).
For versions older than 5.0.3, BIT is interpreted as TINYINT(1), so there's no difference there.
BIT has a "this is a boolean" semantic, and some apps will consider TINYINT(1) the same way (due to the way MySQL used to treat it), so apps may format the column as a check box if they check the type and decide upon a format based on that.

Might be wrong, but:
TINYINT is an integer between -128 and 127 (0 and 255 if UNSIGNED)
BIT is either 1 or 0
Therefore, to me, BIT is the choice for booleans

From my experience, I'm telling you that BIT has problems on Linux (Ubuntu, for example).
I developed my DB on Windows, and after I deployed everything on Linux, I had problems with queries that inserted into or selected from tables that had BIT columns.
BIT is not safe for now.
I changed to TINYINT(1) and it worked perfectly. I mean, you only need a value to differentiate between 1 and 0, and TINYINT(1) is OK for that.

Related

Best performance in MYSQL 8 between BIT or BOOL for search

I'm using MySQL 8.0 CE. I have an attribute where I want to store a TRUE or FALSE status and I want it to use as little space as possible.
After reading the answers in this question:
MySQL: Smallest datatype for one bit and the MySQL documentation Bit-Value Type - BIT and Integer Types (Exact Value), I understand that at the storage level it is better to use BIT(1), because BOOL is actually a TINYINT(1) and therefore uses a full byte.
At the storage level it is clear that BIT(1) is the best option, but what about performance when searching for true or false?
If I understand correctly, BIT would store 1 or 0 while BOOL stores TRUE or FALSE.
Does that difference make one of the types better optimized for searching between the two possible values?
Thanks.
BIT(1) also requires at least 1 byte, so you're not saving any space compared to BOOL/TINYINT. Both take 1 byte.
Speaking to MySQL developers, they usually wince when I bring up the BIT data type. It's full of known bugs, and likely undiscovered bugs. The internal code is poorly understood. They told me to just use TINYINT.
By the way, MySQL doesn't have a true BOOL type. BOOL is just an alias for TINYINT(1), and there is no true or false value. The "false" value is literally the integer 0, and the "true" value is the integer 1. In other words, you can SUM() a column that is supposedly boolean, and you get an integer sum equal to the number of rows where the column is "true." This is not compliant with standard SQL (it makes no sense to SUM() a boolean column), but it's the way BOOL is implemented in MySQL.
In MySQL, BOOLEAN is the same as TINYINT(1). That said, BOOLEAN always uses 1 byte per column, but BIT(n) will use as few bytes as are needed to hold the given number of bits. BIT can save some space; however, I would use BOOLEAN because it makes things simpler when you want to query the database. Note that with BOOLEAN you can end up with values other than 0 or 1 if you are not careful. To avoid this, you can use the aliases TRUE and FALSE.
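To see where BIT(n) can actually save space, here is a Python sketch assuming the storage formula from the MySQL manual, roughly (M+7)/8 bytes for BIT(M), and 1 byte for each BOOL (TINYINT(1)) column. It ignores row-format overhead, and the function names are made up for illustration:

```python
def bit_column_bytes(m: int) -> int:
    """Bytes for one BIT(M) column: M bits rounded up to whole bytes."""
    if not 1 <= m <= 64:
        raise ValueError("BIT(M) requires 1 <= M <= 64")
    return (m + 7) // 8


def bool_columns_bytes(n: int) -> int:
    """Bytes for n separate BOOL (TINYINT(1)) columns, 1 byte each."""
    return n


for flags in (1, 8, 10, 64):
    print(f"{flags:>2} flags: {bool_columns_bytes(flags):>2} bytes as BOOL columns, "
          f"{bit_column_bytes(flags)} byte(s) as one BIT({flags}) column")
```

The saving only appears when several flags share one BIT(n) column; a lone BIT(1) column still rounds up to a full byte.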
(In addition to what Bill says...)
I have a Rule Of Thumb: "If a back-of-envelope estimate doesn't suggest at least a 10% improvement, move on to some other optimization". Couple that with the fact that even if a single-bit bool would shrink the row length by 7 bits, that would probably be less than 1% savings. So, I move on.
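The arithmetic behind that estimate is simple enough to sketch. The row lengths below are hypothetical examples, not measurements:

```python
def bool_shrink_percent(row_bytes: int, saved_bits: int = 7) -> float:
    """Percent of the row saved if a 1-byte bool shrank to a single bit."""
    return 100.0 * saved_bits / (row_bytes * 8)


for row_bytes in (50, 100, 200):
    print(f"{row_bytes:>3}-byte row: {bool_shrink_percent(row_bytes):.2f}% saved")
```

Shrinking a 1-byte flag to one bit saves 7 / (8 * row_bytes) of the row, so the saving drops below 1% for any row longer than 87.5 bytes.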
OTOH, if you have lots of booleans, then consider the SET datatype, which can handle up to 64 'booleans' in 8 bytes or less. But it's rather clumsy. So is a similar approach using the 64 bits of BIGINT UNSIGNED together with shifting and bitwise operators (<<, &, etc.).
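For the BIGINT UNSIGNED variant, the shifting and masking looks like the following. The sketch is in Python rather than SQL, the flag names are hypothetical, and the 64-bit mask stands in for the column's unsigned 64-bit range:

```python
MASK64 = (1 << 64) - 1  # keep results within BIGINT UNSIGNED's range

# Hypothetical flags, each mapped to a bit position 0..63
FLAGS = {"is_active": 0, "is_admin": 1, "email_verified": 2}


def set_flag(word: int, bit: int) -> int:
    """Turn one bit on with OR."""
    return (word | (1 << bit)) & MASK64


def clear_flag(word: int, bit: int) -> int:
    """Turn one bit off with AND NOT."""
    return word & ~(1 << bit) & MASK64


def has_flag(word: int, bit: int) -> bool:
    """Test one bit by shifting it down and masking."""
    return ((word >> bit) & 1) == 1


word = 0
word = set_flag(word, FLAGS["is_active"])
word = set_flag(word, FLAGS["email_verified"])
print(bin(word), has_flag(word, FLAGS["is_admin"]))
```

In MySQL the same operators (|, &, ~, <<, >>) are available for BIGINT UNSIGNED values.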
If you need to INDEX one or several booleans, forget it. About the only case where indexing works is in a composite (multi-column) index where one of the columns is [effectively] TINYINT NOT NULL

MySql - Which column type to use for storing multiple booleans in multiple columns?

I would like to save multiple booleans in a table. I want to use multiple columns rather than one Bit(N) column. Now I'm thinking about whether I should use Bool = tinyint(1) or bit(1) columns.
I read this older answer from a similar question and want to know if
but if you had more true/false columns i suggest you to use Bit as each value of the bit columns are placed in the same 1 Byte until it is filled.
is true. Can anybody confirm this? Which column type should I use in the year 2020 for this case?
Thanks,
Regards
The native BOOLEAN type is intended to be used to store booleans. Yes, apparently BOOLEAN takes up a byte instead of a bit (it may not actually be the case that BIT(1) only uses one bit of space; see notes below). But it's not going to make a noticeable difference in how much space your database takes up. Consider this: if you have 10 booleans in a table and you end up with a million records, that's just 10 MB of space taken up by the booleans vs 1.25 MB taken up if you used a bit. Even if you get to 100 million records, that's only 1 GB of space. If you have 100 million records, you'll have enough space that 1 GB won't matter.
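The figures above can be reproduced as follows, assuming decimal units (1 MB = 10**6 bytes) and an idealized one bit per flag for the packed case; as the notes below point out, BIT(1) does not actually achieve that:

```python
def boolean_bytes(flags_per_row: int, rows: int) -> int:
    """BOOLEAN = TINYINT(1) = 1 byte per flag."""
    return flags_per_row * rows


def bit_packed_bytes(flags_per_row: int, rows: int) -> float:
    """Idealized: 1 bit per flag, no per-column rounding."""
    return flags_per_row * rows / 8


for rows in (1_000_000, 100_000_000):
    print(f"{rows:>11,} rows: {boolean_bytes(10, rows) / 1e6:,.2f} MB as BOOLEAN, "
          f"{bit_packed_bytes(10, rows) / 1e6:,.2f} MB bit-packed")
```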
Here are some notes on BOOLEAN, TINYINT and BIT that might help clarify why you'd want to go with BOOLEAN:
BOOLEAN is intended to be used to store boolean values. You can trust the implementation details to the MySQL developers.
BOOLEAN carries semantic meaning; it clearly indicates the intended purpose of the column is to store a boolean.
It turns out that BIT(1) actually takes up 1 byte as well. From the documentation:
BIT(M) requires approximately (M+7)/8 bytes
So BIT(1) would require (1+7)/8 bytes or 1 byte.
You may read that, since BOOLEAN is synonymous with TINYINT, you can store values other than TRUE and FALSE in a BOOLEAN (e.g. a 22). If you insert e.g. 22 into a BOOLEAN column, MySQL will treat it as TRUE in a boolean context (and 0 as FALSE); see this SQL fiddle for an example. Note, however, that the stored value is still 22 and TRUE is just the integer 1, so a condition such as col = TRUE will not match that row, while col <> 0 or col IS TRUE will. To keep odd values out, insert only TRUE and FALSE (or 1 and 0).
You can also use a VARCHAR column to store what you would put in a BIT(N); it takes space according to what you actually store. For example, "0101000" represents 7 booleans, where the 2nd and 4th are TRUE and the others FALSE, and would use 7+1 bytes of storage (CMIIW).
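A sketch of that encoding in Python, assuming single-byte characters and a VARCHAR whose maximum length is at most 255 bytes (so the length prefix is 1 byte); the function names are hypothetical:

```python
def encode_flags(flags: list[bool]) -> str:
    """Turn a list of booleans into a '0'/'1' string."""
    return "".join("1" if f else "0" for f in flags)


def decode_flags(text: str) -> list[bool]:
    """Turn a '0'/'1' string back into booleans."""
    return [ch == "1" for ch in text]


def varchar_storage_bytes(text: str) -> int:
    """Data bytes plus a 1-byte length prefix (short VARCHAR assumed)."""
    return len(text.encode("ascii")) + 1


s = encode_flags([False, True, False, True, False, False, False])
print(s, varchar_storage_bytes(s))  # the 2nd and 4th flags are TRUE
```

For comparison, the (M+7)/8 formula quoted earlier gives 1 byte for a BIT(7) column, so the string form trades space for readability.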
cheers :)

mysql , bigint or decimal for storing > 32 bit values but less than 64 bits

We're needing to store integer values of up to 2^38. Are there any reasons to use decimal(12,0), or should we use bigint?
In my view, bigint would be better. It's stored as an integer that MySQL will understand natively without any conversion required, and will therefore (I imagine) be faster at manipulating. You should therefore expect MySQL to be marginally more efficient if you use bigint.
According to this manual page, the first 9 digits of your number will be stored in a four-byte block and the remaining digits (you require up to 12) will be stored in a two-byte block. That means your column takes up 6 bytes per row, as opposed to 8 bytes for bigint. I would suggest that unless a) you are going to be storing a truly obscene number of rows, such that the space taken up is a serious concern, and b) you are going to need to query the data in question very little, you should go with bigint.
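The 6-byte figure can be checked with a sketch of the packing rule from the MySQL manual's storage-requirements section: 4 bytes per full group of 9 digits, plus 0 to 4 bytes for the leftover digits, with the integer and fractional parts sized separately. The helper names are hypothetical:

```python
# Bytes needed for 0..8 leftover digits, per the manual's table
LEFTOVER_BYTES = [0, 1, 1, 2, 2, 3, 3, 4, 4]


def digits_bytes(digits: int) -> int:
    """4 bytes per full group of 9 digits, plus the leftover digits."""
    return (digits // 9) * 4 + LEFTOVER_BYTES[digits % 9]


def decimal_storage_bytes(m: int, d: int = 0) -> int:
    """Bytes for a DECIMAL(m, d) column: integer and fraction sized separately."""
    return digits_bytes(m - d) + digits_bytes(d)


print(len(str(2**38)))               # digits needed for 2^38
print(decimal_storage_bytes(12, 0))  # DECIMAL(12,0)
print(decimal_storage_bytes(10, 0))  # DECIMAL(10,0)
```

The same rule gives 5 bytes for DECIMAL(10,0), against 8 for BIGINT.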
This is an assumption, but I think it's a good one: on a 64-bit machine, I'm pretty sure accessing a 64-bit integer is very efficient, so you should stick with bigint. I don't know offhand how MySQL stores decimals, but I can't imagine how it would do so more efficiently than storing a 64-bit integer.

How to choose optimized datatypes for columns [innodb specific]?

I'm learning about the usage of datatypes for databases.
For example:
Which is better for email? VARCHAR(100), CHAR(100), or TINYINT (joking)?
Which is better for username? Should I use INT, BIGINT, or VARCHAR?
Please explain. Some of my friends say that using INT, BIGINT, or another numeric datatype is better (Facebook does it), e.g. u=123400023 refers to user 123400023, rather than user=thenameoftheuser, since numbers take less time to fetch.
Which is better for phone numbers? Posts (like in blogs or announcements)? Or maybe dates (I use DATETIME for that)? Maybe someone has done research they would like to share.
Product price (I use decimal(11,2), don't know about you guys)?
Or anything else that you have in mind, like, "I use serial datatype for blablabla".
Why do I mention InnoDB specifically?
Unless you are using the InnoDB table types (see Chapter 11, "Advanced MySQL," for more information), CHAR columns are faster to access than VARCHAR.
InnoDB has some difference that I don't know about.
I read that here.
Brief Summary:
(just my opinions)
for email address - VARCHAR(255)
for username - VARCHAR(100) or VARCHAR(255)
for id_username - use INT (unless you plan on over 2 billion users in your system)
phone numbers - INT or VARCHAR or maybe CHAR (depending on whether you want to store formatting)
posts - TEXT
dates - DATE or DATETIME (definitely include times for things like posts or emails)
money - DECIMAL(11,2)
misc - see below
As far as choosing InnoDB because VARCHAR is supposed to be faster, I wouldn't worry about that, or about speed in general. Use InnoDB because you need transactions and/or you want to use foreign key constraints (FK) for data integrity. Also, InnoDB uses row-level locking, whereas MyISAM only uses table-level locking; therefore, InnoDB handles higher levels of concurrency better than MyISAM. Use MyISAM for full-text indexes (InnoDB only supports them as of MySQL 5.6) and for somewhat less overhead.
More importantly for speed than the engine type: put indexes on the columns that you need to search on quickly. Always put indexes on your ID/PK columns, such as the id_username that I mentioned.
More details:
Here's a bunch of questions about MySQL datatypes and database design (warning, more than you asked for):
What DataType should I pick?
Table design question
Enum datatype versus table of data in MySQL?
mysql datatype for telephne number and address
Best mysql datatype for grams, milligrams, micrograms and kilojoule
MySQL 5-star rating datatype?
And a couple questions on when to use the InnoDB engine:
MyISAM versus InnoDB
When should you choose to use InnoDB in MySQL?
I just use tinyint for almost everything (seriously).
Edit - How to store "posts:"
Below are some links with more details, but here's the short version. For storing "posts," you need room for a long text string. CHAR max length is 255, so that's not an option, and of course CHAR would waste unused characters versus VARCHAR, which is variable length CHAR.
Prior to MySQL 5.0.3, VARCHAR max length was 255, so you'd be left with TEXT. However, in newer versions of MySQL, you can use VARCHAR or TEXT. The choice comes down to preference, but there are a couple of differences. The max length of VARCHAR and TEXT is now 65,535 for both, but you can set your own max on VARCHAR. Let's say you think your posts will only need 2000 characters max; you can set VARCHAR(2000). If you ever run into the limit, you can ALTER your table later and bump it to VARCHAR(3000). On the other hand, TEXT actually stores its data in a BLOB (1). I've heard that there may be performance differences between VARCHAR and TEXT, but I haven't seen any proof, so you may want to look into that more; you can always change that minor detail in the future.
More importantly, searching this "post" column using a Full-Text Index instead of LIKE would be much faster (2). However, before MySQL 5.6 you had to use the MyISAM engine for a full-text index, because InnoDB didn't support it (InnoDB supports FULLTEXT indexes as of MySQL 5.6). In a MySQL database, you can have a heterogeneous mix of engines for each table, so on older versions you would just need to make your "posts" table use MyISAM. However, if you absolutely need "posts" to use InnoDB (for transactions), then set up a trigger to update a MyISAM copy of your "posts" table and use the MyISAM copy for all your full-text searches.
See bottom for some useful quotes.
MySQL Data Type Chart (outdated)
MySQL Datatypes (outdated)
Chapter 10. Data Types (better details)
The BLOB and TEXT Types (1)
11.9. Full-Text Search Functions (2)
10.4.1. The CHAR and VARCHAR Types (3)
(3) "Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 255 before MySQL 5.0.3, and 0 to 65,535 in 5.0.3 and later versions.
Before MySQL 5.0.3, if you need a data type for which trailing spaces are not removed, consider using a BLOB or TEXT type.
When CHAR values are stored, they are right-padded with spaces to the specified length. When CHAR values are retrieved, trailing spaces are removed.
Before MySQL 5.0.3, trailing spaces are removed from values when they are stored into a VARCHAR column; this means that the spaces also are absent from retrieved values."
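A minimal Python model of the CHAR behaviour quoted above, assuming the default sql_mode (no PAD_CHAR_TO_FULL_LENGTH) and counting N in characters; the function names are hypothetical:

```python
def char_store(value: str, n: int) -> str:
    """Right-pad with spaces to n characters, as CHAR(n) does on storage."""
    # MySQL's handling of over-long values depends on sql_mode;
    # this sketch simply rejects them.
    if len(value) > n:
        raise ValueError(f"value longer than CHAR({n})")
    return value.ljust(n, " ")


def char_retrieve(stored: str) -> str:
    """Strip trailing spaces, as CHAR does on retrieval."""
    return stored.rstrip(" ")


stored = char_store("abc  ", 8)  # note the two trailing spaces in the input
print(repr(stored), repr(char_retrieve(stored)))
```

The input's own trailing spaces are lost on retrieval too, which is why the quote suggests other types when trailing spaces must be preserved.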
Lastly, here's a great post about the pros and cons of VARCHAR versus TEXT. It also speaks to the performance issue:
VARCHAR(n) Considered Harmful
There are multiple angles to approach your question.
From a design POV it is always best to choose the datatype which best expresses the quantity you want to model. That is, get the data domain and data size right so that illegal data cannot be stored in the database in the first place. But that is not where MySQL is strong, especially not with the default sql_mode (http://dev.mysql.com/doc/refman/5.1/en/server-sql-mode.html). If it works for you, try the TRADITIONAL sql_mode, which is a shorthand for many desirable flags.
From a performance POV, the question is entirely different. For example, regarding the storage of email bodies, you might want to read http://www.mysqlperformanceblog.com/2010/02/09/blob-storage-in-innodb/ and then think about that.
Removing redundancies and having short keys can be a big win. For example, in a project that I have seen, a log table has been storing http User-Agent information. By simply replacing each user agent string in the log table with a numeric id of a user agent string in a lookup table, data set size was considerably (more than 60%) reduced. By parsing the user agent further and then storing a bunch of ids (operating system, browser type, version index) data set size was reduced to 1% of the original size.
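The lookup-table idea can be sketched in a few lines of Python. The sample User-Agent strings are made up; in the database, the dictionary would be a lookup table and the id list an integer column in the log table:

```python
def normalize(values: list[str]) -> tuple[dict[str, int], list[int]]:
    """Replace repeated strings with small integer ids."""
    lookup: dict[str, int] = {}
    ids: list[int] = []
    for v in values:
        if v not in lookup:
            lookup[v] = len(lookup) + 1  # ids start at 1, like AUTO_INCREMENT
        ids.append(lookup[v])
    return lookup, ids


log = [
    "Mozilla/5.0 (X11; Linux x86_64)",
    "curl/8.0",
    "Mozilla/5.0 (X11; Linux x86_64)",
]
lookup, ids = normalize(log)
print(ids)     # what the log table now stores
print(lookup)  # each distinct string stored once
```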
Finally, there are a number of rules that can help you spot errors in schema design.
For example, anything that has id in the name and is not an unsigned integer type is probably a bug (especially in the context of innodb).
For example, anything that has price or cost in the name and is not unsigned is a potential source of fraud (fraudster creates article with negative price, and buys that).
For example, anything that works on monetary data and does not use a DECIMAL data type of the appropriate size is probably doing math wrong (DECIMAL does exact decimal, paper-style math with correct precision and rounding; DOUBLE and FLOAT are binary floating point and do not).
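The difference between DECIMAL and DOUBLE for money can be illustrated in Python, where the decimal module stands in for DECIMAL and float for DOUBLE. This is an analogy, not MySQL's implementation:

```python
from decimal import Decimal, ROUND_HALF_UP

float_total = 0.10 + 0.20                          # binary floating point (like DOUBLE)
decimal_total = Decimal("0.10") + Decimal("0.20")  # exact decimal (like DECIMAL)

print(float_total == 0.30)               # False: 0.1 and 0.2 are not exact in binary
print(decimal_total == Decimal("0.30"))  # True

# Rounding a price to 2 decimal places, half-up as on paper
price = Decimal("2.675")
print(price.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))  # 2.68
print(round(2.675, 2))  # 2.67: the float 2.675 is really 2.67499999...
```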
SQLyog has a "Calculate optimal datatype" feature, which helps find the optimal datatype based on the records inserted in a table.
It uses the
SELECT * FROM `table_name` PROCEDURE ANALYSE(1, 10);
query to find the optimal datatype. (PROCEDURE ANALYSE() was deprecated in MySQL 5.7.18 and removed in MySQL 8.0.)

Decimal VS Int in MySQL?

Are there any performance difference between decimal(10,0) unsigned type and int(10) unsigned type?
It may depend on the version of MySQL you are using. See here.
Prior to MySQL 5.0.3, the DECIMAL type was stored as a string and would typically be slower.
However, since MySQL 5.0.3 the DECIMAL type is stored in a binary format so with the size of your DECIMAL above, there may not be much difference in performance.
The main performance issue would have been the amount of space taken up by the different types (with DECIMAL being slower). With MySQL 5.0.3+ this appears to be less of an issue; however, if you will be performing numeric calculations on the values as part of the query, there may be some performance difference. This may be worth testing, as there is no indication in the documentation that I can see.
Edit:
With regard to the int(10) unsigned, I took this at face value as just being a 4-byte int. However, this has a maximum value of 4294967295, which doesn't provide the same range of numbers as a DECIMAL(10,0) unsigned.
As @Unreason pointed out, you would need to use a BIGINT to cover the full range of 10-digit numbers, pushing the size up to 8 bytes.
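A quick check of the ranges involved, as plain arithmetic in Python (the constant names are just labels):

```python
INT_UNSIGNED_MAX = 2**32 - 1     # INT UNSIGNED: 4 bytes
BIGINT_UNSIGNED_MAX = 2**64 - 1  # BIGINT UNSIGNED: 8 bytes
LARGEST_10_DIGIT = 10**10 - 1    # 9,999,999,999, the top of DECIMAL(10,0) UNSIGNED

print(INT_UNSIGNED_MAX)                         # 4294967295
print(LARGEST_10_DIGIT <= INT_UNSIGNED_MAX)     # False
print(LARGEST_10_DIGIT <= BIGINT_UNSIGNED_MAX)  # True
```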
A common mistake is that when specifying numeric column types in MySQL, people often think the number in the brackets has an impact on the size of the number they can store. It doesn't. The number range is purely based on the column type and whether it is signed or unsigned. The number in the brackets is for display purposes in results and has no impact on the values stored in the column. It will also have no impact on the display of the results unless you specify the ZEROFILL option on the column as well.
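A Python model of that point, covering only two signed types for brevity. The names are hypothetical, and since ZEROFILL columns are UNSIGNED, the display helper assumes a non-negative value:

```python
# Signed ranges depend only on the type, never on the display width
RANGES = {"TINYINT": (-128, 127), "INT": (-2**31, 2**31 - 1)}


def fits(type_name: str, value: int) -> bool:
    """Range check by type alone; no display width involved."""
    low, high = RANGES[type_name]
    return low <= value <= high


def display(value: int, width: int, zerofill: bool) -> str:
    """Width only pads the output, and only with ZEROFILL."""
    return str(value).zfill(width) if zerofill else str(value)


print(fits("TINYINT", 100))  # True, even though TINYINT(1) "looks" like 1 digit
print(display(42, 5, zerofill=True), display(42, 5, zerofill=False))  # 00042 42
```

A value wider than the display width is shown in full (display(123456, 5, True) returns "123456"), matching the idea that the width never limits what is stored.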
According to the MySQL data storage requirements, your DECIMAL will require:
DECIMAL(10,0): 4 bytes for 9 digits and 1 byte for the remaining 10th digit, so in total five bytes (assuming my reading of documentation is correct).
INT(10): to cover all 10-digit values you will need BIGINT, which is 8 bytes.
The difference is that DECIMAL is packed, and some operations on such a data type might be slower than on normal INT types, which map directly to machine-represented numbers.
Still, I would run your own tests to confirm the above reasoning.
EDIT:
I noticed that I did not elaborate on the obvious point: assuming the above logic is sound, the BIGINT variant needs 60% more space (8 bytes versus 5).
However this does not directly translate to penalties due to the fact that data is normally not written byte by byte. In case of selects/updates of many rows you should see the performance loss/gain, but in case of selecting/updating a small number of rows the filesystem will fetch blocks from the disk(s) which will normally get/write multiple columns anyway.
The size (and speed) of indexes might be more directly impacted.
However, the question on how the packing influences various operations still remains open.
According to this similar question, yes, there is potentially a big performance hit because of the difference in the way DECIMAL and INT are handled by the CPU when doing calculations.
See: Is there a performance hit using decimal data types (MySQL / Postgres)
I doubt such a difference matters for performance at all.
Most performance issues are tied to proper database design and an indexing plan, with server/hardware tuning as the next level.