Difference in performance between two similar sql queries - mysql

What is the difference between doing:
SELECT * FROM table WHERE column IS NULL
or -
SELECT * FROM table WHERE column = 0
Is doing IS NULL significantly worse off than equating to a constant?
The use case comes up where I have something like:
SELECT * FROM users WHERE paying IS NULL
(or adding an additional column)
SELECT * FROM users WHERE is_paying = 0

If I understand your question correctly, you are asking about the relative benefits/problems with the two situations:
where is_paying = 0
where paying is null
Given that both are in the data table, I cannot think of why one would perform better than the other. I do think the first is clearer on what the query is doing, so that is the version I would prefer. But from a performance perspective, they should be the same.
Someone else mentioned -- and I'm sure you are aware -- that NULL and 0 are different beasts. They can also behave differently in the optimization of joins and other elements. But, for simple filtering, I would expect them to have the same performance.
Well, there is one technicality. The comparison to "0" is probably built into the CPU. The comparison to NULL is probably a bit operation that requires something like a mask, shift, and comparison, which might take an iota of time longer. However, this performance difference is negligible compared to the cost of reading the data from disk in the first place.

Comparing to NULL and comparing to zero are two different things. Zero is a known value, while NULL is UNKNOWN. Zero specifically means that the value was set to zero; NULL means that the value was not set, or was explicitly set to NULL.

You'll get entirely different results from these two queries; it's not simply a matter of performance.
Suppose you have a variety of users. Some have non-zero values for the "paying" column, some have 0, and some don't have a value whatsoever. The last case is what "null" more or less represents.
As for performance, do you have an index on the "paying" column? If you only have a few hundred rows in the table, this is probably irrelevant. If you have many thousands of rows, you are basically telling the query to iterate over every row of the table unless you have some indexing in place. This is true regardless of whether you are searching for "paying = 0" or "paying is null".
But again, just to reemphasize, the two queries will give you completely different results.
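To make the "different results" point concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs anywhere); the users table and its three rows are made up for illustration. The NULL comparison rules shown are standard SQL and should behave the same way in MySQL.

```python
import sqlite3

# SQLite stands in for MySQL here; the table and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, paying INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", 1), ("bob", 0), ("carol", None)],  # None is stored as NULL
)

def names(where_clause):
    """Return the user names matching the given WHERE clause."""
    sql = "SELECT name FROM users WHERE " + where_clause + " ORDER BY name"
    return [row[0] for row in conn.execute(sql)]

is_null_rows = names("paying IS NULL")  # only the row with no value
is_zero_rows = names("paying = 0")      # only the row explicitly set to 0
eq_null_rows = names("paying = NULL")   # "= NULL" is never true, so no rows

print(is_null_rows)  # ['carol']
print(is_zero_rows)  # ['bob']
print(eq_null_rows)  # []
```

Note that the third query returns nothing at all, which is why NULL has to be tested with IS NULL rather than with "=".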

As far as I know comparing to NULL is as fast as comparing to 0, so you should choose based on:
Simplicity - use the option which makes your code simpler
Minimal size - use the option which makes your table smaller
In this case making the paying column NULL-able will probably be better.
You should also check out these questions:
NULL in MySQL (Performance & Storage)
MySQL: NULL vs “”

Related

Which is better in MySQL, an IFNULL or OR logic

I have added deleted columns to lots of tables, and I have a query that LEFT JOINs across 9 tables where I want to check the deleted column for each of the tables.
I made the deleted column a TINYINT rather than a BIT for some flexibility in terms of more than one "deleted" value for workflow reasons. I want NULL or zero to mean "not deleted" and any other non-null, non-zero value to mean "deleted". I can see two approaches in the WHERE clause:
WHERE (k.deleted IS NULL OR k.deleted = 0)
AND (c.deleted IS NULL OR c.deleted = 0)
...
Or alternatively
WHERE IFNULL(k.deleted,0) = 0
AND IFNULL(c.deleted,0) = 0
...
Efficiency matters a lot in this query as it is a 9 table LEFT JOIN that returns zero or one record and it runs a lot so I really need maximum efficiency. I think the IFNULL looks more elegant, but I have a nagging feeling that MySQL might optimize queries with functions differently than AND / OR logic in WHERE clauses. Unless I hear otherwise, I am going with the more verbose "OR" form just to be on the safe side.
They are both going to be pretty bad, because both preclude indexes. One suggestion is to default the value to 0 instead of NULL. That will at least let the WHERE clause use indexes. This query is much better because it can use indexes:
WHERE k.deleted = 0 AND c.deleted = 0
For clarity, I would use the ANSI standard COALESCE() rather than IFNULL(). And my personal preference is for the OR, because I think it is clearer.
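As a rough check that the OR form and the COALESCE() form select the same rows, and that the plain "= 0" form only matches them once NULLs have been replaced by 0, here is a small sketch. It runs on Python's sqlite3 module rather than MySQL (an assumption for portability), with a made-up single table k. It only compares results; it says nothing about index usage, which has to be checked with EXPLAIN on the real MySQL tables.

```python
import sqlite3

# SQLite stands in for MySQL; table k and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE k (id INTEGER, deleted INTEGER)")
conn.executemany(
    "INSERT INTO k VALUES (?, ?)",
    [(1, None), (2, 0), (3, 1), (4, 2)],  # NULL and 0 both mean "not deleted"
)

def ids(where_clause):
    """Return the ids matching the given WHERE clause, in order."""
    sql = "SELECT id FROM k WHERE " + where_clause + " ORDER BY id"
    return [row[0] for row in conn.execute(sql)]

or_ids = ids("(deleted IS NULL OR deleted = 0)")
coalesce_ids = ids("COALESCE(deleted, 0) = 0")
plain_ids = ids("deleted = 0")  # misses the NULL row

# Normalise the data as suggested: store 0 instead of NULL.
conn.execute("UPDATE k SET deleted = 0 WHERE deleted IS NULL")
plain_ids_after = ids("deleted = 0")

print(or_ids, coalesce_ids, plain_ids, plain_ids_after)
```

After the UPDATE, the simple equality finds the same rows as the two NULL-aware forms, which is the point of defaulting the column to 0.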

What are the pros and cons of Using NULL in MySql Structure in this specific case?

I have a "roles" table that contains records related to users' roles.
It has a column "is_archived" (INT) that I use to tell whether a role still exists or has been deleted.
I am considering two values for that column:
"NULL" => the role still exists (like TRUE),
"1" => the role is deleted/inactive (like FALSE)
Most records in the table will contain NULL for this column, and the default value is also NULL.
My dilemma is whether using NULL instead of 0 causes any performance issue in this case.
I need to know the pros and cons (search performance, storage, indexing, etc.).
And if there are cons, what are the best alternatives?
My opinion is that NULL is for "out of band", not for kludging an in-band value. If there is any performance or space difference, it is insignificant.
For true/false, use TINYINT NOT NULL. It is only 1 byte. You could use ENUM('false', 'true'); it is also 1 byte.
INT, regardless of the number after it, takes 4 bytes. Don't use INT for something of such low cardinality.
Leave NULL to mean "not yet known" or any other situation where you can't yet say "true" or "false". (Since you probably always know if it is 'archived', NULL has no place here.)
You could even use ENUM('male', 'female', 'decline_to_state', 'transgender', 'gay', 'lesbian', 'identifies_as_male', 'North_Carolina_resident', 'other'). (Caveat: That is only a partial list; it may be better to set up a table and JOIN to it.)
I agree with @RickJames about NULL. Don't use NULL where you mean to use a real value like true. Likewise, don't use a real value like 0 or '' to signify absence of a value.
As for performance impact, you should know that to search for the presence/absence of NULL you would use the predicate is_archived IS [NOT] NULL.
If you use EXPLAIN on the query, you'll see that this predicate counts as a "range" access type, whereas searching for a single specific value, e.g. is_archived = 1 or is_archived = 0, is a "ref" access type.
That will have performance implications for some queries. For example if you have an index on (is_archived, created_on) and you try to do a query like:
SELECT ... FROM roles
WHERE is_archived IS NULL AND created_on = '2017-01-31'
Then the index will only be half-useful. The WHERE clause cannot search the second column in the index.
But if you use real values, then the query like:
SELECT ... FROM roles
WHERE is_archived = 0 AND created_on = '2017-01-31'
Will use both columns in the index.
Re your comment about NULL storage:
Yes, in the InnoDB storage engine, internally each row stores a bitfield with 1 bit per column, where the bits indicate whether each column is NULL or not. These bits are stored compactly, i.e. one byte contains up to 8 bits. Following the bitfield is the series of column values. A column that is NULL stores no value. So yes, technically it is true that using a NULL reduces storage.
However, I urge you to simplify your data management and use false when you mean false. Do not use NULL for one of your values. I suppose there's an exception if you manage data at a scale where saving one byte per row matters. For example, if you are managing tens of billions of rows.
But at a smaller scale than that, the potential space savings aren't worth the extra complexity you add to your project.
To put it in perspective, InnoDB only fills each data page 15/16 full anyway. So the overhead of the InnoDB page format is likely to be greater than the savings you could get from micro-optimizing boolean storage.
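For a back-of-the-envelope feel for the numbers, the bitfield description above can be turned into a tiny calculation. This is only a sketch under the assumptions stated in this answer (one bit per column packed eight to a byte, and a NULL storing no value bytes); the function names are made up, and real InnoDB row and page overhead is ignored.

```python
import math

def null_bitmap_bytes(columns: int) -> int:
    """Bytes needed for a bitfield holding one bit per column, 8 bits per byte."""
    return math.ceil(columns / 8)

def bytes_saved(rows: int, value_bytes: int = 1) -> int:
    """Upper bound on the space saved if every row stores NULL instead of a value."""
    return rows * value_bytes

print(null_bitmap_bytes(8))     # 1 byte covers up to 8 columns
print(null_bitmap_bytes(9))     # a ninth column needs a second byte
print(bytes_saved(10_000_000))  # at most about 10 MB for ten million 1-byte flags
```

Ten million rows save at most about 10 MB this way, which is small next to the 1/16 of every page that is left unfilled anyway.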

MySQL - query by number or letter?

I need to set values to a "Yes or No" column name STATUS. And I'm thinking about 2 methods.
method 1 (use letter): set value Y/N then find all rows that have value Y in field STATUS by a query like:
SELECT * FROM post WHERE status="Y"
method 2 (use number): set value 1/0 then find all rows that have value 1 in field STATUS by a query like:
SELECT * FROM post WHERE status=1
Should I use method 1 or method 2? Which one is faster? Which one is better?
The two are essentially equivalent, so this becomes a question of which is better for your application.
If you are concerned about space, then the smallest storage for one character is char(1), using 8 bits. With a number, you can use the bit or set types to pack multiple flags. But this only makes a difference if you have lots of flags.
The store-it-as-a-number approach has a slight advantage, where you can count the "Yes" values by doing:
select sum(status)
(Of course, in MySQL, this is only a marginal improvement on sum(status = 'Y').)
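Here is a small sketch of the two counting styles side by side. It uses Python's sqlite3 module in place of MySQL (an assumption for portability; both treat a comparison as 1 or 0 inside SUM), and the post table with one numeric and one letter column is made up so both styles can be shown on the same rows.

```python
import sqlite3

# SQLite stands in for MySQL; the table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (id INTEGER, status_num INTEGER, status_chr TEXT)")
conn.executemany(
    "INSERT INTO post VALUES (?, ?, ?)",
    [(1, 1, "Y"), (2, 0, "N"), (3, 1, "Y")],
)

# Numeric flag: the sum of the column is the number of "Yes" rows.
yes_by_number = conn.execute("SELECT SUM(status_num) FROM post").fetchone()[0]

# Letter flag: the comparison evaluates to 1 or 0, so summing it counts matches.
yes_by_letter = conn.execute("SELECT SUM(status_chr = 'Y') FROM post").fetchone()[0]

print(yes_by_number, yes_by_letter)  # 2 2
```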
The store-it-as-a-letter approach has a slight advantage if you decide to include "Maybe" or other values at some point in the future.
Finally, any difference in performance in different ways of representing these values is going to be very, very minimal. You would need a table with millions and millions of rows to start to notice a problem. So, use the mechanism that works best for your application and way of representing the value.
The second one is definitely faster, primarily because whenever you involve something within quotes, it is meaningless to SQL. It would be better to use non-string types in order to get better performance. I would suggest using METHOD 2.
The fastest way would be:
SELECT * FROM post WHERE `status` = FIND_IN_SET(`status`,'y');
I think you should create the column as ENUM('n','y'). MySQL stores this type in an optimal way. It will also help you store only allowed values in the field.
You can also make it more human-friendly with ENUM('no','yes') without affecting performance, because the strings 'no' and 'yes' are stored only once, in the ENUM definition. MySQL stores only the index of the value in each row.
I think method 1 is better if you are concerned with storage.
Storing an integer (i.e. 1/2) takes 4 bytes of memory, whereas a character takes only 1 byte. So it's better to use method 1.
This may improve performance somewhat.

MYSQL IN vs <> performance

I have a table where I have a status field which can have values like 1,2,3,4,5. I need to select all the rows from the table with status != 1. I have the following 2 options:
Note that the table has an INDEX on the status field.
SELECT ... FROM my_tbl WHERE status <> 1;
or
SELECT ... FROM my_tbl WHERE status IN(2,3,4,5);
Which of the above is a better choice? (my_tbl is expected to grow very big).
You can run your own tests to find out, because it will vary depending on the underlying tables.
More than that, please don't worry about "fastest" without having first done some sort of measurement that it matters.
Rather than worrying about fastest, think about which way is clearest.
In databases especially, think about which way is going to protect you from data errors.
It doesn't matter how fast your program is if it's buggy or gives incorrect answers.
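As an illustration of the correctness point, the two WHERE clauses only agree while the set of status values is exactly 1 to 5. The sketch below uses Python's sqlite3 module instead of MySQL (an assumption for portability) and a made-up my_tbl; the status value 6 is hypothetical, standing for any value added later.

```python
import sqlite3

# SQLite stands in for MySQL; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_tbl (id INTEGER, status INTEGER)")
conn.executemany(
    "INSERT INTO my_tbl VALUES (?, ?)",
    [(1, 1), (2, 2), (3, 5), (4, None)],
)

def ids(where_clause):
    """Return the ids matching the given WHERE clause, in order."""
    sql = "SELECT id FROM my_tbl WHERE " + where_clause + " ORDER BY id"
    return [row[0] for row in conn.execute(sql)]

not_equal = ids("status <> 1")
in_list = ids("status IN (2, 3, 4, 5)")

# Later, a new status value appears that the IN list does not know about.
conn.execute("INSERT INTO my_tbl VALUES (5, 6)")
not_equal_after = ids("status <> 1")
in_list_after = ids("status IN (2, 3, 4, 5)")

print(not_equal, in_list)              # [2, 3] [2, 3]
print(not_equal_after, in_list_after)  # [2, 3, 5] [2, 3]
```

Both forms skip the row whose status is NULL, but only "<> 1" picks up the new value, so which one is "right" depends on what the query is supposed to mean.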
How many rows have the value "1"? If less than ~20%, you will get a table scan regardless of how you formulate the WHERE (IN, <>, BETWEEN). That's assuming you have INDEX(status).
But indexing ENUMs, flags, and other things with poor cardinality is rarely useful.
An IN clause with 50K items causes memory problems (or at least used to), but not performance problems. They are sorted, and a binary search is used.
Rule of Thumb: The cost of evaluation of expressions (IN, <>, functions, etc) is mostly irrelevant in performance. The main cost is fetching the rows, especially if they need to be fetched from disk.
An INDEX may assist in minimizing the number of rows fetched.
You can use BENCHMARK() to test it yourself.
http://sqlfiddle.com/#!2/d41d8/29606/2
The first one is faster, which makes sense since it only has to compare 1 number instead of 4 numbers.

mysql: 'WHERE something!=true' excludes fields with NULL

I have a 2 tables, one in which I have groups, the other where I set user restrictions of which groups are seen.
When I do a LEFT JOIN and specify no condition, it shows me all records. When I add WHERE group_hide.hide!='true', it only shows those records that have the enum value false set. With the JOIN, the other groups get the hide field set to "NULL".
How can I make it exclude only those that are set to true, and show everything else that has either NULL or false?
In MySQL you must use IS NULL or IS NOT NULL when dealing with nullable values.
Here you should use (group_hide.hide IS NULL OR group_hide.hide != 'true')
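A minimal sketch of why the original filter drops the NULL rows and how the extra IS NULL test brings them back. It uses Python's sqlite3 module in place of MySQL (an assumption for portability), and the table grp with its rows is made up; only group_hide.hide comes from the question.

```python
import sqlite3

# SQLite stands in for MySQL; grp and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grp (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE group_hide (group_id INTEGER, hide TEXT)")
conn.executemany("INSERT INTO grp VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
# Group 3 has no row in group_hide, so the LEFT JOIN gives it hide = NULL.
conn.executemany("INSERT INTO group_hide VALUES (?, ?)", [(1, "true"), (2, "false")])

base = (
    "SELECT grp.id FROM grp "
    "LEFT JOIN group_hide ON group_hide.group_id = grp.id "
    "WHERE {} ORDER BY grp.id"
)

# NULL != 'true' evaluates to NULL, which WHERE treats as not true.
naive_ids = [r[0] for r in conn.execute(base.format("group_hide.hide != 'true'"))]

# Testing for NULL explicitly keeps the unmatched groups.
fixed_ids = [
    r[0]
    for r in conn.execute(
        base.format("(group_hide.hide IS NULL OR group_hide.hide != 'true')")
    )
]

print(naive_ids)  # [2]
print(fixed_ids)  # [2, 3]
```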
Don already provided a good answer to the question you asked, and it will solve your immediate problem.
However, let me address the point of the wrong data type domain. Normally you would make hide a BOOLEAN, but MySQL does not really implement it completely. It converts it to TINYINT(1), which allows values from -128 to 127 (see the overview of data types for MySQL). Since MySQL did not enforce CHECK constraints before version 8.0.16, on older versions you are left with the options of using a trigger or a foreign key reference to properly enforce the domain.
Here are the problems with wrong data domain (your case), in order of importance:
The disadvantage of allowing NULL for a field that can only be 1 or 0 is that you have to employ three-valued logic (true, false, null), which, by the way, is not perfectly implemented in SQL. This makes certain queries more complex and slower than they need to be. If you can make a column NOT NULL, do.
The disadvantages of using VARCHAR for a field that can only be 1 or 0 are slower queries, due to the extra I/O, and bigger storage needs (this slows down reads and writes, makes indexes bigger if the field is part of an index, and increases the size of backups; keep in mind that none of these effects may be noticeable when a single field has the wrong domain in a small table, but if data types are consistently set too big or the table has a serious number of records, the effects will bite). Also, you will always need to convert the VARCHAR to a 1 or 0 to use natural MySQL boolean operators, increasing the complexity of queries.
The disadvantage of MySQL using TINYINT(1) for BOOL is that the RDBMS allows certain values that should not be allowed, theoretically letting meaningless values be stored in the system. In this case your application layer must guarantee data integrity, and it is always better if the RDBMS guarantees integrity, as it would protect you from certain bugs in the application layer and also from mistakes made by a database administrator.
An obvious answer would be:
WHERE (group_hide.hide is null or group_hide.hide ='false')
I'm not sure off the top of my head what the null behaviour rules are.