I'm designing DB tables for a log system, and I have two ideas in mind for a field. Should I create three BIT(1) columns or one ENUM column?
is_error bit(1)
is_test bit(1)
is_embedded bit(1)
or
boolErrors enum('is_error_true', 'is_error_false', 'is_test_true', 'is_test_false', 'is_embedded_true', 'is_embedded_false')
Obviously, an ENUM seems improper both semantically and in terms of space, but what about performance? Does fetch time increase when I have 3 columns instead of 1?
If, as it seems, the flags represent states (that is, only one flag may be true at any given point in time), then I would recommend a single column with an integer datatype. Instead of using ENUM, you can use a referential table to store all possible flags and their names, and reference it from the original table through the integer column.
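A minimal sketch of the referential-table idea, assuming the states are mutually exclusive. SQLite (through Python's built-in sqlite3 module) stands in for MySQL here, and the table and column names (log_status, logs, status_id) are hypothetical; in MySQL you would typically declare status_id as a small integer with a FOREIGN KEY.

```python
import sqlite3

# Hypothetical schema; SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE log_status (
        status_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL UNIQUE
    );
    CREATE TABLE logs (
        log_id    INTEGER PRIMARY KEY,
        message   TEXT NOT NULL,
        status_id INTEGER NOT NULL REFERENCES log_status(status_id)
    );
    INSERT INTO log_status (status_id, name)
        VALUES (1, 'error'), (2, 'test'), (3, 'embedded');
    INSERT INTO logs (message, status_id)
        VALUES ('disk full', 1), ('smoke run', 2), ('sensor ping', 3);
""")

def logs_with_status(name):
    """Return the messages of all logs whose (single) state is `name`."""
    rows = conn.execute(
        """SELECT l.message
             FROM logs AS l
             JOIN log_status AS s ON s.status_id = l.status_id
            WHERE s.name = ?
            ORDER BY l.log_id""",
        (name,),
    )
    return [message for (message,) in rows]

print(logs_with_status("error"))  # ['disk full']
```

Adding a new state is then an INSERT into log_status rather than an ALTER TABLE on the log table.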
On the other hand, if several flags may be on (say, both is_error and is_test), then a single column is not sufficient. You can either create several columns (if the list of flags never changes), or use a bridge table to store each status on a separate row.
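For the case where several flags can be on at once, here is a sketch of the bridge-table variant, again with SQLite standing in for MySQL and hypothetical names (flags, log_flags). Each "on" flag is one row, and "has all of these flags" becomes a GROUP BY ... HAVING COUNT(*) query.

```python
import sqlite3

# Hypothetical schema: one log_flags row per flag that is "on" for a log entry.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE logs  (log_id  INTEGER PRIMARY KEY, message TEXT NOT NULL);
    CREATE TABLE flags (flag_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE log_flags (
        log_id  INTEGER NOT NULL REFERENCES logs(log_id),
        flag_id INTEGER NOT NULL REFERENCES flags(flag_id),
        PRIMARY KEY (log_id, flag_id)
    );
    INSERT INTO logs  VALUES (1, 'disk full'), (2, 'smoke run'), (3, 'flaky test');
    INSERT INTO flags VALUES (1, 'is_error'), (2, 'is_test'), (3, 'is_embedded');
    -- log 3 is both an error and a test
    INSERT INTO log_flags VALUES (1, 1), (2, 2), (3, 1), (3, 2);
""")

def logs_with_all_flags(*names):
    """Return ids of logs that have every one of the (distinct) given flags."""
    placeholders = ", ".join("?" for _ in names)
    rows = conn.execute(
        f"""SELECT lf.log_id
              FROM log_flags AS lf
              JOIN flags AS f ON f.flag_id = lf.flag_id
             WHERE f.name IN ({placeholders})
             GROUP BY lf.log_id
            HAVING COUNT(*) = ?
             ORDER BY lf.log_id""",
        (*names, len(names)),
    )
    return [log_id for (log_id,) in rows]

print(logs_with_all_flags("is_error", "is_test"))  # [3]
```

The composite primary key prevents duplicate (log, flag) pairs, which is what makes the COUNT(*) comparison valid.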
If only one of those flags can be set at a time, use ENUM.
If multiple flags can be set at the same time, use SET.
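To make the ENUM/SET distinction concrete: an ENUM column holds exactly one of its listed values, while MySQL stores a SET value numerically, with the first declared member as the lowest bit. (In MySQL, SELECT set_col + 0 shows that number, and FIND_IN_SET() tests membership.) The sketch below only mimics that encoding in Python; the MEMBERS tuple is a hypothetical column definition.

```python
# Mimics MySQL's numeric SET representation: member i (0-based, in
# declaration order) corresponds to bit 2**i. MEMBERS is hypothetical.
MEMBERS = ("is_error", "is_test", "is_embedded")

def set_to_int(flags):
    """Encode an iterable of member names as the integer MySQL would store."""
    value = 0
    for name in flags:
        value |= 1 << MEMBERS.index(name)  # raises ValueError for unknown members
    return value

def int_to_set(value):
    """Decode a stored integer back into a comma-separated SET string."""
    return ",".join(m for i, m in enumerate(MEMBERS) if value & (1 << i))

print(set_to_int(["is_error", "is_embedded"]))  # 5
print(int_to_set(5))                            # is_error,is_embedded
```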
Performance is not really something to worry about. The main "cost" in working with a row in a table is fetching the row, not the details of what goes on in the columns.
Sure, "smaller is better" for several reasons -- I/O, etc. But an ENUM is 1 or 2 bytes; a SET is up to 8 bytes (for up to 64 flags). Both of those are reasonably small for any use case.
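The size rules behind those numbers can be written down as small helpers. They follow the MySQL storage-requirements page (SET uses (N+7)/8 bytes, rounded up to 1, 2, 3, 4, or 8). The docs only say ENUM uses "1 or 2 bytes"; the 255-value cutoff used here is an assumption.

```python
def enum_bytes(n_values):
    """ENUM storage; ASSUMED cutoff: 1 byte up to 255 values, else 2 bytes."""
    if not 1 <= n_values <= 65_535:
        raise ValueError("ENUM supports 1..65535 values")
    return 1 if n_values <= 255 else 2

def set_bytes(n_members):
    """SET storage: (N+7)//8 bytes, with results of 5..8 rounded up to 8."""
    if not 1 <= n_members <= 64:
        raise ValueError("SET supports 1..64 members")
    size = (n_members + 7) // 8
    return size if size <= 4 else 8

print(enum_bytes(6), set_bytes(3), set_bytes(50), set_bytes(64))  # 1 1 8 8
```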
As for speed and indexability, let's see the main queries.
Related
I would like to save multiple booleans in a table. I want to use multiple columns rather than one Bit(N) column. Now I'm thinking about whether I should use Bool = tinyint(1) or bit(1) columns.
I read this older answer from a similar question and want to know if
but if you had more true/false columns i suggest you to use Bit as each value of the bit columns are placed in the same 1 Byte until it is filled.
is true. Can anybody confirm this? Which column type should I use in the year 2020 for this case?
Thanks,
Regards
The native BOOLEAN type is intended to be used to store booleans. Yes, apparently BOOLEAN takes up a byte instead of a bit (it may not actually be the case that BIT(1) only uses one bit of space; see notes below). But it's not going to make a noticeable difference in how much space your database takes up. Consider this: if you have 10 booleans in a table and you end up with a million records, that's just 10MB of space taken up by the booleans, vs 1.25MB if you used a bit. Even if you get to 100 million records, that's only 1GB of space. If you have 100 million records, you'll have enough space that 1GB won't matter.
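The arithmetic above, spelled out. The "packed" case is the idealized one bit per flag with no per-row rounding, which is the assumption behind the 1.25MB figure; decimal megabytes and gigabytes are used, matching the text.

```python
def boolean_storage_bytes(rows, n_flags, packed):
    """Bytes used by n_flags booleans over `rows` rows.

    packed=False: one byte per flag (BOOLEAN / TINYINT(1)).
    packed=True:  idealized one bit per flag, ignoring per-row rounding.
    """
    return rows * n_flags / 8 if packed else rows * n_flags

MB, GB = 10**6, 10**9  # decimal units, matching the figures in the text
print(boolean_storage_bytes(10**6, 10, packed=False) / MB)  # 10.0
print(boolean_storage_bytes(10**6, 10, packed=True) / MB)   # 1.25
print(boolean_storage_bytes(10**8, 10, packed=False) / GB)  # 1.0
```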
Here are some notes on BOOLEAN, TINYINT and BIT that might help clarify why you'd want to go with BOOLEAN:
BOOLEAN is intended to be used to store boolean values. You can trust the implementation details to the MySQL developers.
BOOLEAN carries semantic meaning; it clearly indicates the intended purpose of the column is to store a boolean.
It turns out that BIT(1) actually takes up 1 byte as well. From the documentation:
BIT(M) requires approximately (M+7)/8 bytes
So BIT(1) would require (1+7)/8 bytes or 1 byte.
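Applying that formula per column shows why separate BIT(1) columns don't save space: each one rounds up to a whole byte, and bits are only packed together inside a single BIT(M) column. This sketch assumes the documented approximation holds; engine-specific layouts may differ.

```python
def bit_column_bytes(m):
    """Approximate storage for one BIT(M) column: (M+7)//8 bytes, M in 1..64."""
    if not 1 <= m <= 64:
        raise ValueError("BIT(M) requires 1 <= M <= 64")
    return (m + 7) // 8

ten_separate = sum(bit_column_bytes(1) for _ in range(10))  # ten BIT(1) columns
one_combined = bit_column_bytes(10)                         # one BIT(10) column
print(ten_separate, one_combined)  # 10 2
```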
You may read that, since BOOLEAN is a synonym for TINYINT(1), you can store values other than TRUE and FALSE in a BOOLEAN column (e.g. a 22). That is true: TRUE and FALSE are just aliases for 1 and 0, and MySQL will store 22 as-is. However, in a boolean context (e.g. WHERE flag_col or IF(flag_col, ...)) MySQL treats any nonzero value as TRUE and 0 as FALSE, so such values still behave as true. Be careful with comparisons like flag_col = TRUE, though: that compares against 1 and will not match 22. See this SQL fiddle for an example.
You can also use a VARCHAR column to store the flags as a bit string; it only takes as much space as what you store. For example, "0101000" represents 7 booleans, where the 2nd and 4th are TRUE and the others FALSE, and it would use 7+1 bytes of storage (CMIIW).
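A small sketch of that encoding, to show what reading one flag back involves (the equivalent of SUBSTRING(col, position, 1) = '1' in MySQL). The helper names are hypothetical, and the +1 length byte assumes a single-byte character set and a VARCHAR of at most 255 bytes.

```python
def flags_to_bitstring(flags):
    """Encode booleans as a string like '0101000' (position 1 = first flag)."""
    return "".join("1" if f else "0" for f in flags)

def flag_at(bitstring, position):
    """Read the flag at a 1-based position, like SUBSTRING(col, position, 1) = '1'."""
    return bitstring[position - 1] == "1"

s = flags_to_bitstring([False, True, False, True, False, False, False])
print(s)                             # 0101000
print(len(s) + 1)                    # 8 (7 characters + 1 length byte)
print(flag_at(s, 2), flag_at(s, 3))  # True False
```

Note that every lookup of a single flag is a string operation, which is the main argument against this layout in the thread below.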
cheers :)
Right now I'm trying to learn the details of MySQL. The type BINARY needs as many storage bytes as provided via its parameter, so for example, if I define a column as BINARY(8) it consumes 8 bytes.
On the site https://dev.mysql.com/doc/refman/8.0/en/storage-requirements.html#data-types-storage-reqs-strings, there is a table mapping the types to their storage requirements, and it says that I can define a BINARY(0). But in my opinion that does not make sense: BINARY(0) would mean that I can store 0 bytes, so nothing. Am I missing something? What use does it have, or what is the reason for it?
On the other hand, I cannot define a BINARY column bigger than 255 bytes. I always thought the reason for 255 is that you start counting at 0. But if BINARY(0) isn't needed, you could define a BINARY(256) without problems...
I had to poke around on this one, because I didn't know myself. From this link, we can see that BINARY(0) can store two types of values:
NULL
empty string
So, you could use a BINARY(0) column much the same way you would use a non-nullable BIT(1) column, namely as a true/false or yes/no column. And the only storage a BINARY(0) column needs is its one bit among the row's NULL flags, so it costs nothing beyond what nullable columns already require.
Since the non NULL state of the BINARY(0) column would be empty string, which translates to zero, you could find all such records using:
SELECT *
FROM yourTable
WHERE bin_zero_column = 0;
The unmarked NULL records can be found using WHERE bin_zero_column IS NULL.
I was just wondering about the efficiency of storing a large amount of boolean values inside of a CHAR or VARCHAR
data
"TFTFTTF"
vs
isFoo isBar isText
false true false
Would switching to storing these values this way be worth the worse performance? I figured it would be easier to set a single value than to have all of those other fields.
thanks
Don't do it. MySQL offers types such as char(1) and tinyint that occupy the same space as a single character. In addition, MySQL offers enumerated types, if you want your flags to have more than one value -- and for the values to be recognizable.
That last point is the critical point. You want your code to make sense. The string 'FTF' does not make sense. The columns isFoo, isBar, and isText do make sense.
There is no need to obfuscate your data model.
This would be a bad idea. Not only does it have no advantage in terms of space used, it also hurts query performance and the comprehensibility of your data model.
Disk Space
In terms of storage usage, it makes no real difference whether the data is stored in a single varchar(n) or char(n) column or in multiple tinyint, char(1) or bit(1) columns. The only difference is that varchar needs 1 to 2 extra bytes per entry for its length prefix.
For more information about the storage requirements of the different data types, see the MySQL documentation.
Query Performance
If boolean values were stored in a VARCHAR, searching for all entries where a specific value is true would take much longer, since string operations would be needed to find the matching entries. Even when searching for a combination of boolean values such as "TFTFTFTFTT", the query would still take longer than if the values were stored in individual columns. Furthermore, you can index single columns such as isFoo or isBar, which can improve query performance considerably.
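One way to see the difference is to ask the database for its query plan. The sketch below uses SQLite's EXPLAIN QUERY PLAN as a stand-in for MySQL's EXPLAIN (where the analogous outcomes would show up as access type ALL versus ref); table and index names are hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL; table and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE flags_packed (id INTEGER PRIMARY KEY, flags TEXT NOT NULL);
    CREATE TABLE flags_split  (id INTEGER PRIMARY KEY, isFoo INTEGER NOT NULL,
                               isBar INTEGER NOT NULL, isText INTEGER NOT NULL);
    CREATE INDEX idx_isfoo ON flags_split (isFoo);
""")

def plan(sql):
    """Return the detail text of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Reading one flag out of a packed string forces a full table scan.
packed_plan = plan("SELECT id FROM flags_packed WHERE substr(flags, 1, 1) = 'T'")
# A separate, indexed column lets the planner search the index instead.
split_plan = plan("SELECT id FROM flags_split WHERE isFoo = 1")
print(packed_plan)
print(split_plan)
```

Whether an index on a low-cardinality boolean column actually helps depends on how selective the value is; the point here is only that the packed string cannot use an ordinary index on the flag at all.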
Data Model
A data model should be as comprehensible as possible and if possible independent of any kind of implementation considerations.
Realistically, a database field should only contain one atomic value, that is to say: a value that can't be subdivided into separate parts.
Columns that do not contain atomic values, at least with respect to any single component (such as one flag):
cannot be sorted by it
cannot be grouped by it
cannot be indexed by it
So let's say you want to find all rows where isFoo is true: you wouldn't be able to do it without string operations like "take the third character of this string and check whether it equals 'T'". That implies a full table scan on every query, which would degrade performance quite dramatically.
It depends on what you want to do after storing the data in this format.
After retrieving a record, you will have to do further processing on the server side, which worsens performance if you want to load the data by checking specific conditions. The server-side logic would also become complex.
The columns isFoo, isBar, and isText would help you to write queries better.
I have a properties table and each property can have many amenities.
Examples of amenities are "hasTerrace", "hasGarden" and "hasPrivateParking". There are about 50 amenities.
The list of amenities is unlikely to change in the future. I see two options:
Add boolean flags to the properties table for the 50 or so fields such as "hasTerrace".
Create separate tables called amenities and property_amenity and have a many-to-many relationship.
A typical use case would be querying for all properties that have any given number of amenities.
I am inclined to keep everything in one table and use booleans because:
The list of amenities is unlikely to change.
I think that my queries will be faster (probably a crap reason given all of the db tweaking that could be done).
My queries will be simpler.
I will write less code and that code will be less complex.
However, having 60 or so fields in a table seems quite high.
Which would be the best database design for the above problem?
Consider using
SET ('hasTerrace', 'hasGarden', ...) NOT NULL
With 50 amenities this takes only 8 bytes (MySQL rounds SET storage up to 1, 2, 3, 4, or 8 bytes, so anything over 32 members uses 8). Setting and testing is a bit complex; see the manual.
Similarly, a BIGINT UNSIGNED would allow up to 64 flags in 8 bytes, but you would need to use bit numbers 0..63 and "shift" (e.g., 1 << 22).
With the BIGINT approach, you can 'easily' test for any number of the flags being on/off in a single WHERE clause.
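A sketch of those "all of these" and "any of these" tests, with SQLite standing in for MySQL (the WHERE clauses, (amenities & mask) = mask and (amenities & mask) <> 0, are written the same way in MySQL). The bit positions in AMENITY_BIT are hypothetical. SQLite integers are signed 64-bit, so bit 63 would come out negative there, unlike MySQL's BIGINT UNSIGNED.

```python
import sqlite3

# Hypothetical bit positions; the real list would have ~50 amenities.
AMENITY_BIT = {"hasTerrace": 0, "hasGarden": 1, "hasPrivateParking": 22}

def mask(*names):
    """OR together 1 << position for each named amenity."""
    m = 0
    for name in names:
        m |= 1 << AMENITY_BIT[name]
    return m

conn = sqlite3.connect(":memory:")  # SQLite standing in for MySQL
conn.execute("CREATE TABLE properties (id INTEGER PRIMARY KEY, amenities INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO properties VALUES (?, ?)",
    [(1, mask("hasTerrace")),
     (2, mask("hasTerrace", "hasGarden")),
     (3, mask("hasGarden", "hasPrivateParking"))],
)

def with_all(*names):
    """Properties that have every listed amenity: (amenities & m) = m."""
    m = mask(*names)
    sql = "SELECT id FROM properties WHERE (amenities & ?) = ? ORDER BY id"
    return [pid for (pid,) in conn.execute(sql, (m, m))]

def with_any(*names):
    """Properties that have at least one listed amenity: (amenities & m) <> 0."""
    sql = "SELECT id FROM properties WHERE (amenities & ?) <> 0 ORDER BY id"
    return [pid for (pid,) in conn.execute(sql, (mask(*names),))]

print(with_all("hasTerrace", "hasGarden"))          # [2]
print(with_any("hasTerrace", "hasPrivateParking"))  # [1, 2, 3]
```

The trade-off is that neither condition can use an ordinary index on amenities, so each of these queries scans the table.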
I cannot recommend one versus the other without knowing what types of queries you will need to perform.
As I understand, MySQL doesn't have any special SPARSE COLUMN directive, and any question like this is purely situational, so I'm wondering if there is a good rule of thumb for when to use a sparse column vs. when to create another table.
As a specific example, I have a table called Lessons. We want to add a lesProgramNumber, but it will only apply to about 10% of all lessons at any given time (it will be NULL for the other 90%). We could easily avoid a lot of NULL data with a LessonsProgramNumber table, but then some queries need an additional JOIN. Is there an easy way to decide what I need? What if Lessons only has 500 rows? What if it has 500 million?
InnoDB's COMPACT row format (the default before MySQL 5.7.9; DYNAMIC, the default since then, handles NULLs the same way) stores nothing for a NULL column beyond one bit in the record header's NULL bitmap; it just skips that column in the on-disk row storage. So the cost of sparse columns is not so bad.
See http://dev.mysql.com/doc/refman/5.1/en/innodb-physical-record.html
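A rough back-of-the-envelope sketch for the Lessons example. It assumes lesProgramNumber would be a 4-byte INT (the question doesn't say) and counts only the non-NULL values plus the NULL bitmap, which takes CEILING(N/8) bytes for N nullable columns; page fill, indexes, and other overhead are ignored.

```python
import math

def null_bitmap_bytes(n_nullable_columns):
    """COMPACT/DYNAMIC record header: one bit per nullable column, rounded up to bytes."""
    return math.ceil(n_nullable_columns / 8)

def sparse_int_bytes(rows, fill_ratio, value_bytes=4):
    """Rough data bytes for a nullable column where only fill_ratio of rows are non-NULL.

    ASSUMES a 4-byte INT by default; NULL rows contribute nothing here because
    their only cost is the bitmap bit (often absorbed by an existing bitmap byte).
    """
    return round(rows * fill_ratio) * value_bytes

print(null_bitmap_bytes(1), null_bitmap_bytes(8), null_bitmap_bytes(9))  # 1 1 2
print(sparse_int_bytes(500, 0.10))          # 200
print(sparse_int_bytes(500_000_000, 0.10))  # 200000000
```

Under these assumptions the 90% of NULL rows cost roughly nothing, so the choice between a sparse column and a side table comes down more to query patterns (the extra JOIN) than to storage.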