Enum or Bool in MySQL?

Simple silly question. What is better?
A Bool or an Enum('y','n') ?

BOOLEAN is an alias for TINYINT(1) and is stored as one byte of data.
ENUM('y','n') is also stored as 1 byte of data.
So from a storage size point of view, neither is better.
However, you can store 9 in a BOOLEAN field and it will accept it. So if you want to force two states only, go for ENUM.

Here's the problem with storing boolean values as an enum:
SELECT count(*) FROM people WHERE is_active = true; #=> Returns 0 because true != 'true'
Which is misleading because:
SELECT count(*) FROM people WHERE is_active = 'true'; #=> Returns 10
If you're writing all of your own SQL queries, then you would know not to pass an expression into your query. But if you're using an ORM, you're going to run into trouble, since an ORM will typically convert the expression to something the database it's querying can understand ('t'/'f' for SQLite; 0/1 for MySQL, etc.).
In short, while one may not be faster than the other at the byte level, booleans should be stored as expressions so they can be compared with other expressions.
At least, that's how I see it.

TINYINT(1) - it looks like a Boolean, so make it one.
Never compare internally to things like y when a Boolean (0/1) is available.

Neither is best for storing a single bit (or boolean). The enum has a lookup table and stores the answer as an integer. The boolean is actually just an alias for "TINYINT(1)", which is technically 8 bits of information. The bit data type will only store as many bits as in its definition (like in the varchar type), so a bit(1) will literally only store one bit. However, if you only have one of these fields, then the question is moot: nothing will fill the remaining bits, so they will be unused space on each row (the space used is rounded up to at least a whole byte, typically 8 bits, per row).

A lot of default advice is to use BOOL/TINYINT(1), but as stated in the answer at https://stackoverflow.com/a/4180982/2045006, this allows many variations of TRUE (any nonzero value, such as 9).
In many cases this does not matter, but if your column will be part of a unique index then this will become quite a problem.
In the case that you will use the column in a unique index, I would recommend using BIT(1).
ENUM would also work well with a unique index (provided you have a suitable SQL Mode set.) However, I would use ENUM only when you want to work with string representations of true/false rather than actual boolean values.
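To see why this matters for a unique index, here is a minimal Python sketch (not MySQL itself) that models a unique key as a set of tuples. The column names `user_id` and `is_primary` are hypothetical, chosen only for illustration. Because 1 and 9 are different integers, both rows count as distinct keys even though both mean "true"; restricting the flag to 0/1 collapses them into one key.

```python
# Model a UNIQUE(user_id, is_primary) index as a set of key tuples.
# user_id and is_primary are hypothetical column names for illustration.

def unique_keys(rows):
    """Return the distinct (user_id, flag) keys, as a unique index would see them."""
    return {(user_id, flag) for user_id, flag in rows}

def normalize(rows):
    """Collapse every nonzero flag to 1, which is what a strict 0/1 column would hold."""
    return [(user_id, 1 if flag else 0) for user_id, flag in rows]

# Two rows for the same user; both flags are "true" in a boolean context.
rows = [(42, 1), (42, 9)]

raw_keys = unique_keys(rows)                 # 1 and 9 are distinct values
strict_keys = unique_keys(normalize(rows))   # both become 1

print(len(raw_keys), len(strict_keys))  # prints "2 1"
```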

Depending on the language you're using to interface with the database, you can run into case sensitivity issues by using enum, for example if your database uses a lowercase 'y' but your code expects an uppercase 'Y'. A bool/tinyint will always be 0 or 1 (or NULL) so it avoids this problem.
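If you are stuck with an enum flag, one way to sidestep the case issue is to normalize the value once, at the boundary between the database and your code. The following is only a sketch in Python; `flag_to_bool` is a hypothetical helper, not part of any driver or ORM.

```python
def flag_to_bool(value):
    """Convert an ENUM('y','n')-style string to a bool, ignoring case."""
    if value is None:
        return None  # keep SQL NULL as None
    normalized = value.strip().lower()
    if normalized == "y":
        return True
    if normalized == "n":
        return False
    # Anything else is treated as a data error rather than silently guessed.
    raise ValueError(f"unexpected flag value: {value!r}")

print(flag_to_bool("Y"), flag_to_bool("n"))  # prints "True False"
```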

There are 8 reasons for not using the ENUM data type.
So, instead of ENUM, either use boolean or a reference foreign table.

Related

Best performance in MYSQL 8 between BIT or BOOL for search

I'm using MySQL 8.0 CE. I have an attribute where I want to store a TRUE or FALSE status and I want it to use as little space as possible.
After reading the answers in this question:
MySQL: Smallest datatype for one bit and the MySQL documentation Bit-Value Type - BIT and Integer Types (Exact Value), I understand that at storage level it is better to use BIT(1), because BOOL is actually a TINYINT(1) and therefore uses a full 1 byte.
At storage level it is clear that BIT(1) is the best option, but what about performance when searching for true or false?
If I understand correctly BIT would store 1 or 0 while BOOL stores TRUE or FALSE.
Does that difference mean that, when searching between both possibilities, one of the types is better optimized for it?
Thanks.
BIT(1) also requires minimum 1 byte, so you're not saving any space compared to BOOL/TINYINT. Both take 1 byte.
Speaking to MySQL developers, they usually wince when I bring up the BIT data type. It's full of known bugs, and likely undiscovered bugs. The internal code is poorly understood. They told me to just use TINYINT.
By the way, MySQL doesn't have a true BOOL type. BOOL is just an alias for TINYINT(1), and there is no true or false value. The "false" value is literally the integer 0, and the "true" value is the integer 1. In other words, you can SUM() a column that is supposedly boolean, and you get an integer sum equal to the number of rows where the column is "true." This is not compliant with standard SQL (it makes no sense to SUM() a boolean column), but it's the way BOOL is implemented in MySQL.
Consider that in MySQL, BOOLEAN is the same as TINYINT(1). That said, BOOLEAN always uses 1 byte per column, whereas BIT(n) uses as few bytes as are needed to hold the given number of bits. BIT can save some space when one column holds several bits; however, I would use BOOLEAN because it makes things simpler when you want to query the database. Note that a BOOLEAN column can hold values other than 0 or 1 if you are not careful. To avoid this, you can use the aliases TRUE and FALSE.
(In addition to what Bill says...)
I have a Rule Of Thumb: "If a back-of-envelope estimate doesn't suggest at least a 10% improvement, move on to some other optimization". Couple that with the fact that even if a single-bit bool would shrink the row length by 7 bits, that would probably be less than 1% savings. So, I move on.
OTOH, If you have lots of booleans, then consider the SET datatype, which can handle up to 64 'booleans' in 8 bytes or less. But it's rather clumsy. So is a similar thing with the 64 bits of BIGINT UNSIGNED, together with shifting and boolean operators (<<, &, etc.)
If you need to INDEX one or several booleans, forget it. About the only case where indexing works is in a composite (multi-column) index where one of the columns is [effectively] TINYINT NOT NULL.
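For what it's worth, the BIGINT UNSIGNED approach with shifting and boolean operators looks roughly like this on the application side. This is a Python sketch under my own assumptions: the flag names are hypothetical, and the 64-bit mask only mimics the width of BIGINT UNSIGNED.

```python
# Hypothetical flag names; each one is assigned its own bit position.
FLAG_ACTIVE = 1 << 0
FLAG_BANNED = 1 << 1
FLAG_VERIFIED = 1 << 2

MASK_64 = (1 << 64) - 1  # BIGINT UNSIGNED holds 64 bits

def set_flag(flags, flag):
    """Turn the given bit on."""
    return (flags | flag) & MASK_64

def clear_flag(flags, flag):
    """Turn the given bit off."""
    return flags & ~flag & MASK_64

def has_flag(flags, flag):
    """Check whether the given bit is on."""
    return (flags & flag) != 0

flags = 0
flags = set_flag(flags, FLAG_ACTIVE)
flags = set_flag(flags, FLAG_VERIFIED)
flags = clear_flag(flags, FLAG_ACTIVE)

print(flags, has_flag(flags, FLAG_VERIFIED), has_flag(flags, FLAG_ACTIVE))  # prints "4 True False"
```

It works, but as said above, it is clumsy: every reader of the schema has to know which bit means what.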

MySql - Which column type to use for storing multiple booleans in multiple columns?

I would like to save multiple booleans in a table. I want to use multiple columns rather than one Bit(N) column. Now I'm thinking about whether I should use Bool = tinyint(1) or bit(1) columns.
I read this older answer from a similar question and want to know if
but if you had more true/false columns i suggest you to use Bit as each value of the bit columns are placed in the same 1 Byte until it is filled.
is true. Can anybody confirm this? Which column type should I use in the year 2020 for this case?
Thanks,
Regards
The native BOOLEAN type is intended to be used to store booleans. Yes, apparently BOOLEAN takes up a byte instead of a bit (it may not actually be the case that BIT(1) only uses one bit of space; see notes below). But it's not going to make a noticeable difference in how much space your database takes up. Consider this: if you have 10 booleans in a table and you end up with a million records, that's just 10MB of space taken up by the booleans vs 1.25MB taken up if you used a bit. Even if you get to 100 million records, that's only 1GB of space. If you have 100 million records, you'll have enough space that 1GB won't matter.
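The arithmetic behind those figures can be checked with a few lines of Python. This is a back-of-envelope sketch only: `boolean_storage_bytes` is a hypothetical helper, it ignores row overhead and indexes, and the one-bit case is the idealized best case rather than what MySQL necessarily does.

```python
def boolean_storage_bytes(rows, columns, bits_per_column=8):
    """Total bytes used by `columns` flag columns over `rows` rows."""
    return rows * columns * bits_per_column / 8

one_million = 1_000_000

# 10 flags at 1 byte each, over a million rows.
as_bytes = boolean_storage_bytes(one_million, 10)
# The same 10 flags at an idealized 1 bit each.
as_bits = boolean_storage_bytes(one_million, 10, bits_per_column=1)
# 10 flags at 1 byte each, over 100 million rows.
large_table = boolean_storage_bytes(100 * one_million, 10)

print(as_bytes / 1e6, "MB vs", as_bits / 1e6, "MB")  # 10.0 MB vs 1.25 MB
print(large_table / 1e9, "GB")                       # 1.0 GB
```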
Here are some notes on BOOLEAN, TINYINT and BIT that might help clarify why you'd want to go with BOOLEAN:
BOOLEAN is intended to be used to store boolean values. You can trust the implementation details to the MySQL developers.
BOOLEAN carries semantic meaning; it clearly indicates the intended purpose of the column is to store a boolean.
It turns out that BIT(1) actually takes up 1 byte as well. From the documentation:
BIT(M) requires approximately (M+7)/8 bytes
So BIT(1) would require (1+7)/8 bytes or 1 byte.
You may read that, since BOOLEAN is synonymous with TINYINT, you can store values other than TRUE and FALSE in a BOOLEAN (e.g. you might be able to store a 22). However, if you insert e.g. 22 into a BOOLEAN column, MySQL will interpret it as TRUE in a boolean context (and it will interpret a 0 as FALSE), although the stored value is still 22 and a comparison such as = TRUE (that is, = 1) will not match it. So, weird values are mostly harmless as long as you test the column in a boolean context rather than comparing it to 1. See this SQL fiddle for an example.
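The (M+7)/8 rule quoted in the notes above is easy to try out for other widths. Here is a small Python sketch, assuming the division is integer division (which is what makes BIT(1) come out as exactly 1 byte); `bit_storage_bytes` is a hypothetical helper name.

```python
def bit_storage_bytes(m):
    """Approximate bytes needed for a BIT(m) column, using the (M+7)/8 rule."""
    if not 1 <= m <= 64:
        raise ValueError("BIT(M) accepts M from 1 to 64")
    return (m + 7) // 8  # integer division rounds up to whole bytes

for m in (1, 8, 9, 64):
    print(m, bit_storage_bytes(m))
# prints:
# 1 1
# 8 1
# 9 2
# 64 8
```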
You can also use the VARCHAR data type to store the bit(N) values, and it will take space according to what you store. For example, "0101000" means you have 7 booleans: the 2nd and 4th would be TRUE and the others would be FALSE. That would use 7+1 bytes in your storage (CMIIW).
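On the application side, such a string has to be decoded and encoded by hand. A minimal Python sketch of that, with hypothetical helper names `decode_flags` and `encode_flags`:

```python
def decode_flags(packed):
    """Turn a string such as '0101000' into a list of booleans."""
    if set(packed) - {"0", "1"}:
        raise ValueError(f"only '0' and '1' are allowed: {packed!r}")
    return [ch == "1" for ch in packed]

def encode_flags(flags):
    """Turn a list of booleans back into a '0'/'1' string."""
    return "".join("1" if flag else "0" for flag in flags)

flags = decode_flags("0101000")
print(flags)  # [False, True, False, True, False, False, False]
```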
cheers :)

What use has mysql data type 'BINARY(0)'?

Right now I'm trying to learn the details of MySQL. The type BINARY needs as many storage bytes as provided via its parameter, so for example, if I define a column as BINARY(8) it consumes 8 bytes.
On the site https://dev.mysql.com/doc/refman/8.0/en/storage-requirements.html#data-types-storage-reqs-strings, there is a table mapping the types to their storage requirements. And it says that I can define a BINARY(0). But in my opinion, it does not make sense. BINARY(0) would mean that I can store 0 bytes - so nothing. Do I miss a thing? What use does it have? Or what is the reason for that?
On the other hand, I cannot define a bigger BINARY-column than one with 255 bytes. I always thought the reason for 255 is that you start counting at 0. But when you don't need a BINARY(0) you could define a BINARY(256) without problems...
I had to poke around on this one, because I didn't know myself. From this link, we can see that BINARY(0) can store two types of values:
NULL
empty string
So, you could use a BINARY(0) column much in the same way you would use a non nullable BIT(1) column, namely as a true/false or yes/no column. However, the storage requirement of BINARY(0) is just one bit, which requires no additional storage beyond the boundary for nullable columns.
Since the non NULL state of the BINARY(0) column would be empty string, which translates to zero, you could find all such records using:
SELECT *
FROM yourTable
WHERE bin_zero_column = 0;
The unmarked NULL records could be found using WHERE bin_zero_column IS NULL.

Exclude records with empty binary column data

I have a column with type binary(16) not null in a MySQL table. Not all records have data in this column, but because it is set up to disallow NULL, such records have an all-zero value.
I want to exclude these all-zero records from a query.
So far, the only solution I can find is to HEX the value and compare that:
SELECT uuid
FROM example
WHERE HEX(uuid) != '00000000000000000000000000000000'
which works, but is there a better way?
To match a binary type using a literal, use \0 for each zero (NUL) byte.
In your case with a binary(16), this is how to exclude only zeroes:
where uuid != '\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0'
See SQLFiddle.
The advantage of using a plain comparison with a literal (like this) is that an index on the column can be used and it's a lot faster. If you invoke functions to make the comparison, indexes will not be used and the function(s) must be invoked on every row, which can cause big performance problems on large tables.
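If you would rather pass the all-zero value as a query parameter, or filter rows after fetching them, the same sixteen zero bytes are easy to build in application code. A Python sketch, assuming your driver hands the BINARY(16) column back as raw bytes; `ZERO_UUID` and `is_empty_uuid` are hypothetical names:

```python
# Sixteen 0x00 bytes: the same value the '\0' x 16 literal stands for.
ZERO_UUID = bytes(16)

def is_empty_uuid(raw):
    """True when a BINARY(16) value, given as raw bytes, is all zeroes."""
    return raw == ZERO_UUID

# The hex form matches the string used in the HEX() comparison.
print(ZERO_UUID.hex())  # 00000000000000000000000000000000
print(is_empty_uuid(bytes(16)), is_empty_uuid(b"\x01" + bytes(15)))  # True False
```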
SELECT uuid FROM example WHERE TRIM('\0' FROM uuid)!='';
Note that Bohemian's answer is a lot neater; I just use this when I am not sure about the length of the field (which probably comes down to bad design on some level, feel free to educate me).
select uuid from example where uuid!=cast('' as binary(N));
where N is the length of the UUID column. Not pretty, but working.

Is it bad to use enum('y','n') instead of a boolean field in a MySQL table?

So a few years ago I saw the DB schema of a system developed by a 3rd party and noticed they used enum('y','n') instead of a boolean (tinyint) field. I don't know why but I loved it so much, I found it made things easier to read (totally subjective I know) but I adopted it and started using it ever since then. I suppose I could swap it for "true" and "false" but, what can I say, I just liked it.
Now that being said, are there any setbacks to doing things this way -- aside from maybe slightly annoying a programmer who'd come in late in the game?
Yes, it is bad. You lose intuitive boolean logic with it (SELECT * FROM user WHERE NOT banned becomes SELECT * FROM user WHERE banned = 'n'), and you receive strings instead of booleans on your application side, so your boolean conditions there become cumbersome as well. Other people who work with your schema will get bitten by seeing flag-like column names and attempting to use boolean logic on them.
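To illustrate the application-side problem: in Python, for instance, any non-empty string is truthy, so 'n' passes a plain boolean test. The row below is a hypothetical example of what a driver might return for an enum('y','n') column.

```python
# Hypothetical row as a driver might return it for an ENUM('y','n') column.
row = {"name": "alice", "banned": "n"}

# Pitfall: any non-empty string is truthy, so 'n' passes a plain boolean test.
naive_banned = bool(row["banned"])

# The flag has to be compared against the literal instead.
actual_banned = row["banned"] == "y"

print(naive_banned, actual_banned)  # prints "True False"
```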
As explained in the manual:
If you insert an invalid value into an ENUM (that is, a string not present in the list of permitted values), the empty string is inserted instead as a special error value. This string can be distinguished from a “normal” empty string by the fact that this string has the numeric value 0. See Section 11.4.4, “Index Values for Enumeration Literals” for details about the numeric indexes for the enumeration values.
If strict SQL mode is enabled, attempts to insert invalid ENUM values result in an error.
In this respect, an ENUM results in different behaviour to a BOOLEAN type; otherwise I'm inclined to agree with @lanzz's answer that it makes integration with one's application that little bit less direct.
One factor to consider is whether the people who wrote the original schema limit it to MySQL or not. If it is only intended to run on MySQL, then adapting to MySQL makes sense. If the same schema is intended to be usable with other DBMS, then a more generic schema design that works in all the relevant DBMS may be better for the people making the design.
With that said, the enum is moderately MySQL specific, but something equivalent to enum can easily be created in other DBMS:
CREATE TABLE ...
(
...
FlagColumn CHAR(1) NOT NULL CHECK(FlagColumn IN ('y', 'n')),
...
);
The way that different DBMS handle BOOLEAN is not as uniform as you'd like it to be, SQL Standard notwithstanding (and the reason is, as ever, history; the less conformant systems had a variation on the theme of BOOLEAN before the standard did, and changing their implementation breaks the code of their existing customers).
So, I would not automatically condemn the use of enum over boolean, but it is better to use boolean for boolean flags.