enum or char(1) in MySQL

Sometimes I am not sure whether to use enum or char(1) in MySQL. For instance, I store the statuses of posts. Normally, I only need Active or Passive values in the status field. I have two options:
// CHAR
status char(1);
// ENUM (but too limited)
status enum('A', 'P');
What if I want to add one more status type (e.g. Hidden) in the future? With a small amount of data it won't be an issue, but with a very large table I think altering the ENUM type will be a problem.
So what's your advice if we also consider MySQL performance? Which way should I go?

Neither. You'd typically use tinyint with a lookup table.
char(1) will be slightly slower because comparisons use the collation
confusion: as you extend to more than A and P
using a letter limits you as you add more types. See the last point.
every system I've seen has more than one client, e.g. reporting. A and P have to be resolved to Active and Passive in each client's code
extensibility: to add one more type ("S" for "Suspended") you can either add one row to a lookup table, or change a lot of code and constraints, and your client code too
maintenance: the logic is in 3 places: database constraint, database code and client code. With a lookup table and a foreign key, it can be in one place
Enum is not portable
On the plus side of using a single letter or Enum
Note: there is a related DBA.SE MySQL question about Enums. The recommendation is to use a lookup table there too.
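To make the lookup-table idea concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere; the table and column names (post_status, post, status_id) are made up for illustration, and in MySQL the status_id column would be a TINYINT.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE post_status (
        status_id INTEGER PRIMARY KEY,   -- would be TINYINT in MySQL
        name      TEXT NOT NULL UNIQUE
    );
    CREATE TABLE post (
        post_id   INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        status_id INTEGER NOT NULL REFERENCES post_status(status_id)
    );
    INSERT INTO post_status (status_id, name) VALUES (1, 'Active'), (2, 'Passive');
    INSERT INTO post (title, status_id) VALUES ('first', 1), ('second', 2);
""")

# Adding a status later is one INSERT into the lookup table, not a schema change.
conn.execute("INSERT INTO post_status (status_id, name) VALUES (3, 'Hidden')")
conn.execute("INSERT INTO post (title, status_id) VALUES ('third', 3)")

# Clients resolve the label with a join instead of hard-coding 'A' / 'P'.
rows = conn.execute("""
    SELECT p.title, s.name
    FROM post AS p JOIN post_status AS s USING (status_id)
    ORDER BY p.post_id
""").fetchall()

# A status that is not in the lookup table is rejected by the foreign key.
try:
    conn.execute("INSERT INTO post (title, status_id) VALUES ('bad', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

The point of the design is that the list of valid statuses and their labels lives in one table, so every client reads the same definition.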

You can use
status enum('Active', 'Passive');
It will not store a string in the row; it only stores a number that references the enum member in the table definition, so the size is the same but it's more readable than char(1) or your enum.
Editing enum is not a problem no matter how big your data is

I would use a binary SET field for this, but without labelling the options specifically within the database. All the "labelling" would be done within your code, but it does provide some very flexible options.
For example, you could create a SET containing eight "options" such as;
`column_name` SET('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h') NOT NULL DEFAULT ''
Within your application, you can then define the 'a' as denoting "Active" or "Passive", the 'b' can denote "Hidden", and the rest can be left undefined until you need them.
You can then use all sorts of useful binary operations on the field. In a numeric context a SET value is a bitmask in which the first member is 1, the second is 2, the third is 4 and so on, so you could extract all those which are "Hidden" ('b') by running;
WHERE `column_name` & 2
And all those which are "Active" AND "Hidden" by running;
WHERE `column_name` & 1 AND `column_name` & 2
You can even use the LIKE and FIND_IN_SET operators to do even more useful queries.
Read the MySQL documentation for further information;
http://dev.mysql.com/doc/refman/5.1/en/set.html
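The "labelling done within your code" part could look roughly like this in Python. This is only a sketch: PostFlags, its member names and has_all are invented for the example, and it assumes the application reads the column's numeric value (for instance with SELECT column_name+0) rather than the comma-separated string.

```python
from enum import IntFlag


class PostFlags(IntFlag):
    # Bit values mirror each member's position in the SET definition:
    # 'a' -> 1, 'b' -> 2, 'c' -> 4, ...
    ACTIVE = 1  # hypothetical meaning assigned to 'a'
    HIDDEN = 2  # hypothetical meaning assigned to 'b'


def has_all(value: int, flags: PostFlags) -> bool:
    """Return True when every bit in `flags` is set in the stored value."""
    return (value & flags) == flags


stored = 3  # numeric value of a SET column that holds 'a,b'
is_active_and_hidden = has_all(stored, PostFlags.ACTIVE | PostFlags.HIDDEN)
is_hidden_only_active = has_all(1, PostFlags.HIDDEN)
```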
Hope it helps!
Dave

Hard to tell without knowing the semantics of your statuses, but to me "hidden" doesn't seem like an alternative to "active" or "passive", i.e. you might want to have both "active hidden" and "passive hidden". Since this would degenerate with each new non-exclusive "status", it would be better to implement your schema with boolean flags: one for the active/passive distinction, and one for the hidden/visible distinction. Queries become more readable when your condition is "WHERE NOT hidden" or "WHERE active", instead of "WHERE status = 'A'".

Related

Queries with one item in the list in `__in` are extremely slow. Otherwise, super fast

I am retrieving event_id's by name with the code below:
events = Events.objects.values_list('event__id', flat=True). \
filter(name__in=names).distinct()
Everything is working great except when names consists of just one name. If I change my code to:
events = Events.objects.values_list('event__id', flat=True). \
filter(name__in=names + ['x']).distinct()
Once again, it becomes super fast. I am seriously going crazy because this makes no sense. I used print(events.query) and it is basically the same query; only the list changes. How is this possible?
With one name in the list the query takes 30-60 seconds; otherwise it takes just 100-1000 ms. The number of event_ids doesn't change dramatically, so it's not a size issue.
I used EXPLAIN and the difference seems to be:
Extra: Using where; Using index
Extra: Using index
And:
type: range
type: ref
More details and clarification would definitely help.
Such as:
Event model (will help reproducing the issue and give required background)
events.query SQL statement (very helpful)
values_list('event__id') suggests the Event model may have a ForeignKey to itself; combined with "retrieving event_id's by name", that adds to the confusion (though it may in fact be valid)
how many records are in the events table? 100-1000 ms is not a great query time either
First thing to suggest: take a look at distinct().
To make sure only the selected columns are present in the SELECT, so that DISTINCT applies to just this one column and the query plan is simpler, clear the ordering from the QuerySet with an empty order_by().
events = Events.objects.values_list('event__id', flat=True). \
filter(name__in=names + ['x']).order_by().distinct()
Description:
With distinct(), Django performs a SELECT DISTINCT SQL query to remove duplicate rows. Note that "duplicate rows" means rows that are identical across all columns of the SELECT, not rows with the same value in one specific column.
values_list('event__id', flat=True) may at first glance suggest that only event_id is present in the SELECT (i.e. SELECT DISTINCT event_id FROM events ...), but that is not necessarily the case: Django just takes the values of the columns listed in values_list from the result, while the SELECT may contain any other columns Django thinks are required for the query.
So your events.query may actually look like SELECT DISTINCT event_id, col_2, name FROM events ..., which not only produces different results than a distinct on one column (though in some cases the results happen to be the same) but may also result in a more complicated query plan. Also, col_2 may not even be mentioned in the QuerySet.
Django includes the columns it thinks are required to run the QuerySet. For example, this may be the default ordering column set on the model, which is used if no ordering is set on the QuerySet.
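The difference between a distinct over one column and a distinct over several can be seen without Django. The sketch below uses Python's sqlite3 module with a made-up events table and made-up rows; it only illustrates SELECT DISTINCT semantics and says nothing about what your actual query or query plan looks like.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table and data, for illustration only.
    CREATE TABLE events (id INTEGER PRIMARY KEY, event_id INTEGER, name TEXT);
    INSERT INTO events (event_id, name) VALUES (10, 'a'), (10, 'b'), (20, 'a');
""")

# DISTINCT over a single column collapses the two rows with event_id = 10.
one_col = conn.execute(
    "SELECT DISTINCT event_id FROM events ORDER BY event_id"
).fetchall()

# With an extra column in the SELECT, the rows (10, 'a') and (10, 'b')
# are no longer duplicates, so nothing is removed.
two_cols = conn.execute(
    "SELECT DISTINCT event_id, name FROM events ORDER BY event_id, name"
).fetchall()
```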
Have you checked the type of names when there is just one name? It should work the same regardless of the length of names, whether it is a list, a tuple, etc. However, it may be that when you have only one name, names is a string, not a list.
Check the example in the documentation: if you pass a string, Django, and Python in general, treats the string as a sequence of characters.
Then, if names='Django Reinhardt':
filter(name__in=names)
would become:
filter(name__in=['D', 'j', 'a', 'n', 'g', 'o', ' ',
'R', 'e', 'i', 'n', 'h', 'a', 'r', 'd', 't'])
which surely isn't the desired behavior in your case.
Be sure to enforce that names is a list even when just one name is provided. So when names=['Django Reinhardt']
your code would evaluate to:
filter(name__in=['Django Reinhardt'])
If you provide more details on how you obtain/construct `names`, I could provide more help on this.
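One way to enforce this is a small helper that wraps a lone string before it reaches the filter. This is just a sketch; as_name_list is a name made up for the example and is not part of Django.

```python
def as_name_list(names):
    """Return `names` as a list, wrapping a lone string in a one-item list."""
    if isinstance(names, str):
        return [names]
    return list(names)


# A bare string is iterated character by character ...
chars = list("Django")
# ... while the helper keeps it as a single name.
single = as_name_list("Django Reinhardt")
several = as_name_list(("Django Reinhardt", "Stephane Grappelli"))
```

It would then be used as filter(name__in=as_name_list(names)).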

How to manipulate SET datatype in MySQL with fast binary operations?

I want to efficiently store and efficiently manipulate bit flags for a record in MySQL. The SET datatype satisfies the first wish because up to 64 flags are stored as a single number. But what about the second? I have seen only awkward solutions like
UPDATE table_name SET set_col = (set_col | 4) WHERE condition;
UPDATE table_name SET set_col = (set_col & ~4) WHERE condition;
to respectively include and exclude a member into the value. I.e. I have to use numeric constants, which renders the code unmaintainable. Then I could have used INT datatype as well. If set_col definition gets changed (adding, removing or reordering the possible members), the code with hard-coded constants becomes a mess. I could try to enforce some discipline on coders to use only named variables in application language instead of numeric constants which would make maintenance easier, but not totally error-proof. Is there a solution where MySQL would resolve the symbolic names of set members to their correct numeric values? E.g. this does not work:
UPDATE person SET tag=tag | 'MGR'
To stem useless answers, I know about database normalization and a separate m-to-n relationship table, that is not the topic here. If you need a more concrete example, here you are:
CREATE TABLE `coder` (
`name` VARCHAR(50) NOT NULL,
`languages` SET('Perl','PHP','Java','Scala') NOT NULL
)
Changes to the set definition are unlikely but possible, maybe every other year, like splitting "Perl" into "Perl5" and "Perl6".
I found the answer here:
https://dev.mysql.com/doc/refman/5.7/en/set.html
Posted by John Kozura on April 12, 2011
Note that MySQL, at least 5.1+, seems to deal just fine with extra commas, so setting/deleting individual bits by name can be done very simply without creating a "proper" list. So even something like SET flags=',,,foo,,bar,,' works fine, if you don't care about a truncated data warning.
add bits:
UPDATE tbl SET flags=CONCAT_WS(',', flags, 'flagtoadd');
delete bits:
UPDATE tbl SET flags=REPLACE(flags, 'flagtoremove', '')
..or if you have a bit whose name is a substring of another bit's name, like "foo" and "foot", it is slightly more complicated:
UPDATE tbl SET flags=REPLACE(CONCAT(',', flags, ','), ',foo,', ',')
If the warnings do cause issues for you, then the solutions posted above work:
add:
UPDATE tbl SET flags=TRIM(',' FROM CONCAT(flags, ',', 'flagtoadd'))
delete:
UPDATE tbl SET flags=TRIM(',' FROM REPLACE(CONCAT(',', flags, ','), ',flagtoremove,', ','))
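If you do prefer the numeric route from the question, the "named values in the application language" idea can at least be derived from a single list instead of scattered constants. The sketch below is an assumption-laden illustration: LANGUAGES has to be kept in the same order as the SET definition of the languages column, and bit_of, mask_of and decode are hypothetical helper names.

```python
# Must list the members in the same order as the SET column definition.
LANGUAGES = ("Perl", "PHP", "Java", "Scala")


def bit_of(member, members=LANGUAGES):
    """Numeric bit of one SET member: first member is 1, second is 2, ..."""
    return 1 << members.index(member)


def mask_of(names, members=LANGUAGES):
    """Combined bitmask for several member names."""
    mask = 0
    for name in names:
        mask |= bit_of(name, members)
    return mask


def decode(value, members=LANGUAGES):
    """Member names whose bits are set in a numeric SET value."""
    return [m for i, m in enumerate(members) if value & (1 << i)]


value = mask_of(["Perl", "Scala"])  # bits for 'Perl,Scala'
value_added = value | bit_of("PHP")  # include a member
value_removed = value_added & ~bit_of("Perl")  # exclude a member
```

If the set definition changes, only the LANGUAGES tuple needs to be updated, which is easier to review than hard-coded 4s and 8s, though it is still not error-proof.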

Should I go with ENUM or TINYINT

My table contains a field 'priority'. Now I have following priorities to consider 'low', 'medium', 'high'.
What I am confused with is that:
Should I create an ENUM type field for priority values?
Should I create a TINYINT type field and store values as 1, 2, 3?
Please note I will need to search and sort data based on this field.
Also, there will be indexing on this field.
You should use ENUM if you are sure no priority will be added in the future, because in that case you would have to alter the table... But ENUM gives assurance of consistent data: no other values can be inserted.
You should go with ENUM. As you said, you have to search / sort; storing them as tinyint would require additional processing, like converting 1, 2, 3 back to 'low', 'medium', 'high' when displaying.
Enum is ideal for such situations.
ENUM is a non-standard MySQL extension. You should avoid it, especially if you can achieve the same results in a standard way. So it's better to go with tinyint.
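For the TINYINT option, the "convert 1, 2, 3 back to labels" step mentioned above can be kept small on the application side. A rough sketch in Python, where Priority and the sample tasks are invented for illustration:

```python
from enum import IntEnum


class Priority(IntEnum):
    # Values as they would be stored in the TINYINT column.
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical (title, priority) rows as they might come back from a query.
tasks = [("write docs", 1), ("fix crash", 3), ("refactor", 2)]

# Sorting works directly on the stored numbers, highest priority first.
by_priority = sorted(tasks, key=lambda task: task[1], reverse=True)

# Labels are resolved only for display.
labels = [(title, Priority(p).name.lower()) for title, p in by_priority]
```

The numbers sort in priority order on their own, which is the behaviour you want for ORDER BY on the column as well.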

different column attributes for default values

Can anybody give me an example when to use
allow null
default 0
default '' (empty string)
In which situations should I use these different configurations?
In general, avoid NULLs. NULL tends to require extra coding effort. Treatment for NULL versus empty string varies by RDBMS. Sorting of NULL within a set varies by RDBMS.
That said, you may wish to:
Use NULLs on foreign key columns when the related row is optional.
Use NULLs when you want values to be eliminated from aggregate operations. For example, if you have an "age" column, but don't require this information for all records, you would still be able to get meaningful information from: SELECT AVG(age) FROM mytable
Use NULLs when you need ternary logic.
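The aggregate and ternary-logic points can be checked with a small experiment. The sketch below uses Python's sqlite3 module and made-up rows; the table mytable_zero is a hypothetical variant that stores 0 instead of NULL, added only for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: bob's age is unknown.
    CREATE TABLE mytable (name TEXT NOT NULL, age INTEGER);
    INSERT INTO mytable VALUES ('ann', 30), ('bob', NULL), ('cy', 40);

    -- Same data, but with "default 0" standing in for the unknown age.
    CREATE TABLE mytable_zero (name TEXT NOT NULL, age INTEGER NOT NULL DEFAULT 0);
    INSERT INTO mytable_zero VALUES ('ann', 30), ('bob', 0), ('cy', 40);
""")

# AVG skips NULLs, so only the two known ages are averaged.
avg_with_null = conn.execute("SELECT AVG(age) FROM mytable").fetchone()[0]

# With 0 as a placeholder, the unknown age drags the average down.
avg_with_zero = conn.execute("SELECT AVG(age) FROM mytable_zero").fetchone()[0]

# Ternary logic: comparing NULL with NULL yields NULL, not true or false.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]
```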
1. A NULL value represents the absence of a value for a record in a field (other software also calls it a missing value).
2. An empty value is a "field-formatted" value with no significant data in it.
3. NULL isn't allocated any memory; a string with a NULL value is just a pointer that points nowhere in memory. An empty string, however, IS allocated a memory location, although the value stored there is "".
4. NULL has no bounds: it can be used for string, integer, date, etc. fields in a database. An empty string applies only to strings; it's a string like 'asdfasdf' is, but it just has no length. If you have no value for a field, use NULL, not an empty string.
5. NULL is the database's way of representing the absence of a value logically, so to speak. You can query like: where FIELD_NAME is NULL

MySQL Tri-state field

I need to create a good/neutral/bad field. Which one would be the more understandable/correct way?
A binary field with null (1=good, null=neutral, 0=bad)
An int (1=good, 2=neutral, 3=bad)
An enum (good, neutral, bad)
Any other
It's only an informative field and I will not need to search by it.
NULL values should be reserved for either:
unknown values; or
not-applicable values;
neither of which is the case here.
I would simply store a CHAR value myself, one of the set {'G','N','B'}. That's probably the easiest solution and takes up little space while still providing mnemonic value (easily converting 'G' to 'Good' for example).
If you're less concerned about space, then you could even store them as varchar(7) or equivalent and store the actual values {'Good','Neutral','Bad'} so that no translation at all would be needed in your select statements (assuming those are the actual values you will be printing).
In MySQL you ought to be using an enum type. You can pick any names you like without worrying about space, because MySQL stores the data as a short integer. See 10.4.4. The ENUM Type in the documentation.