Which is better in MySQL, an IFNULL or OR logic

I have added deleted columns to lots of tables and I have a query that LEFT JOINs across 9 tables and want to check the deleted column for each of the tables.
I made the deleted column a TINYINT rather than a BIT for some flexibility in terms of more than one "deleted" value for workflow reasons. I want NULL or zero to mean "not deleted" and any other non-null, non-zero value to mean "deleted". I can see two approaches in the WHERE clause:
WHERE (k.deleted IS NULL OR k.deleted = 0)
AND (c.deleted IS NULL OR c.deleted = 0)
...
Or alternatively
WHERE IFNULL(k.deleted,0) = 0
AND IFNULL(c.deleted,0) = 0
...
Efficiency matters a lot in this query as it is a 9 table LEFT JOIN that returns zero or one record and it runs a lot so I really need maximum efficiency. I think the IFNULL looks more elegant, but I have a nagging feeling that MySQL might optimize queries with functions differently than AND / OR logic in WHERE clauses. Unless I hear otherwise, I am going with the more verbose "OR" form just to be on the safe side.

They are both going to be pretty bad, because both preclude indexes. One suggestion is to default the value so it is 0 instead of NULL. That will at least make the WHERE clause able to use indexes. This query is much more optimal because it can use indexes:
WHERE k.deleted = 0 AND c.deleted = 0
For clarity, I would use the ANSI standard COALESCE() rather than IFNULL(). And my personal preference is for the OR, because I think it is clearer.
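As a rough illustration of the three spellings discussed above, here is a minimal sketch in Python using the standard sqlite3 module as a stand-in for MySQL. That substitution is an assumption: SQLite also supports IFNULL and COALESCE, but its optimizer is different, so this only checks that the filters select the same rows and says nothing about index usage in MySQL. The table k and its four rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE k (id INTEGER PRIMARY KEY, deleted TINYINT)")

# Hypothetical rows: NULL and 0 mean "not deleted", anything else means "deleted".
conn.executemany(
    "INSERT INTO k (id, deleted) VALUES (?, ?)",
    [(1, None), (2, 0), (3, 1), (4, 2)],
)

def ids(where_clause):
    """Return the ids matched by the given WHERE clause, in id order."""
    sql = "SELECT id FROM k WHERE " + where_clause + " ORDER BY id"
    return [row[0] for row in conn.execute(sql)]

or_form = ids("(deleted IS NULL OR deleted = 0)")
ifnull_form = ids("IFNULL(deleted, 0) = 0")
coalesce_form = ids("COALESCE(deleted, 0) = 0")

print(or_form, ifnull_form, coalesce_form)
```

All three lists come out the same, so the choice between them is about readability and about what the optimizer of the database actually in use can do with each form.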

Related

Order of WHERE predicates and the SQL optimizer

When writing SQL queries with various WHERE clauses (I only work with MySQL and sqlite), I usually wonder whether to reorder the clauses to put the "best ones" first (those which will remove the most rows), and other "cosmetic" clauses later (which will barely change the output). In other words, I'm in doubt about whether I will really help the optimizer run faster by reordering clauses (especially when there are indexes in play), or whether it could be another case of premature optimization. Optimizers are usually smarter than me.
For example:
select address.* from address inner join
user on address.user = user.id
where address.zip is not null and address.country == user.country
If we know that usually address.zip is not null, that check will be 90% true, and if the query order is respected, there will be a lot of dummy checks which can be avoided by placing the country check before.
Should I take care of that? In other words, does the order of WHERE clauses matter or not?
The MySQL optimizer is well documented, and you can find many interesting considerations in the official documentation: http://dev.mysql.com/doc/refman/5.7/en/where-optimizations.html
One very simple fact should be taken into account: SQL is not a procedural language but a declarative one. This means the order in which the parts are written is not important; what matters is only which elements are declared. This is evident in the MySQL optimization documentation, where the focus is only on the components of a query and how the optimizer transforms them into internal components.
The order is mostly irrelevant.
In MySQL, with WHERE ... AND ...,
The Optimizer will first look for which part can use an index. If one can and one can't, the optimizer will use the index; the order becomes irrelevant
If both sides of the AND can use an index, MySQL will usually pick the 'better' one. (Sometimes it goofs.) Again, the order is ignored.
If neither side can use an index, it is evaluated left to right. But... Fetching rows is the bulk of effort in performing the query, so if one side of the AND is a little slower than the other, you probably won't notice. (Sure, if one side does SLEEP(3) you will notice.)
There's another issue in your example query (aside from the syntax error): The Optimizer will make a conscious decision of which table to start with.
If it decides to start with user, address needs INDEX(user, country) in either order.
If it decides to start with address, user needs (id, country) in either order.
It is unclear whether the Optimizer will bother with the NOT NULL test, even if that column is indexed.
Bottom line: Spend your time focusing on optimal indexes.
The answer is definitely maybe.
Mysterious are the ways of the optimizer.
Here is a demonstration based on exception caused due to division by zero.
create table t (i int);
insert into t (i) values (0);
The following query succeeds for Oracle, SQL Server, Postgres and Teradata (we'll skip the version information for now):
select 1 from t where i < 1 or 1/i < 1;
The following query fails for SQL Server and Postgres but succeeds for Oracle and Teradata
select 1 from t where 1/i < 1 or i < 1;
However, the following query does fail for Oracle and Teradata:
select 1 from t where 1/i < 1 or i/1 < 1;
What do we learn?
That some optimizers seem to respect the order of the predicates (or at least in some manner) and some seem to reorder the predicates by their estimated cost (e.g 1/i < 1 is more costly than i < 1 but not i/1 < 1).
For those that respect the order of the predicates, we can probably improve performance by putting the lightweight predicates first for OR operators and the frequently false predicates first for AND operators.
That being said, since databases do not guarantee to preserve the order of the predicates even if some of them currently seem to do that, you definitely can't count on it.
MySQL 5.7.11
This query returns immediately:
select 1 from t where i < 1 or sleep(3);
This query returns after 3 seconds:
select 1 from t where sleep(3) or i < 1

What difference does it make which column SQL COUNT() is run on?

Firstly, this is not asking In SQL, what's the difference between count(column) and count(*)?.
Say I have a users table with a primary key user_id and another field logged_in which describes if the user is logged in right now.
Is there a difference between running
SELECT COUNT(user_id) FROM users WHERE logged_in=1
and
SELECT COUNT(logged_in) FROM users WHERE logged_in=1
to see how many users are marked as logged in? Maybe a difference with indexes?
I'm running MySQL if there are DB-specific nuances to this.
In MySQL, the count function will not count null expressions, so the results of your two queries may be different. As mentioned in the comments and Remus' answer, this is a general rule for SQL and part of the spec.
For example, consider this data:
user_id logged_in
1 1
null 1
SELECT COUNT(user_id) on this table will return 1, but SELECT COUNT(logged_in) will return 2.
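To make the example above concrete, here is a minimal sketch in Python using the standard sqlite3 module as a stand-in for MySQL (an assumption; both follow the SQL rule that COUNT(expr) skips NULLs). The users table is declared with a nullable user_id purely to reproduce the two sample rows, which a real primary key would not allow.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")

# user_id is deliberately nullable here to mirror the sample data above.
conn.execute("CREATE TABLE users (user_id INTEGER, logged_in INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, 1), (None, 1)])

# COUNT(column) counts non-NULL values only; COUNT(*) counts rows.
count_user_id, count_logged_in, count_star = conn.execute(
    "SELECT COUNT(user_id), COUNT(logged_in), COUNT(*) "
    "FROM users WHERE logged_in = 1"
).fetchone()

print(count_user_id, count_logged_in, count_star)
```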
As a practical matter, the results from the example in the question ought to always be the same, as long as the table is properly constructed, but the utilized indexes and query plans may differ, even though the results will be the same. Additionally, if that's a simplified example, counting on different columns may change the results as well.
See also this question: MySQL COUNT() and nulls
For the record: the two queries return different results. As the spec says:
Returns a count of the number of non-NULL values of expr in the rows
retrieved by a SELECT statement.
You may argue that given the condition for logged_in=1 the NULL logged_in rows are filtered out anyway, and user_id will not have NULLs in a table users. While this may be true, it does not change the fundamentals that the queries are different. You are asking the query optimizer to make all the logical deductions above; for you they may be obvious, but for the optimizer they may not be.
Now, assuming that the results are in practice always identical between the two, the answer is simple: don't run such a query in production (and I mean either of them). It is a scan, no matter how you slice it. logged_in has too low cardinality to matter. Keep a counter, update it at each log in and each log out event. It will drift in time, so refresh it as often as needed (once a day, once an hour).
As for the question itself: SELECT COUNT(somefield) FROM sometable can use a narrow index on somefield resulting in less IO. The recommendation is to use * because this leaves room for the optimizer to use any index it sees fit (this will vary from product to product though, depending on how smart a query optimizer we are dealing with, YMMV). But as you start adding WHERE clauses the possible alternatives (=indexes to use) quickly vanish.

Difference in performance between two similar sql queries

What is the difference between doing:
SELECT * FROM table WHERE column IS NULL
or -
SELECT * FROM table WHERE column = 0
Is doing IS NULL significantly worse off than equating to a constant?
The use case comes up where I have something like:
SELECT * FROM users WHERE paying IS NULL
(or adding an additional column)
SELECT * FROM users WHERE is_paying = 0
If I understand your question correctly, you are asking about the relative benefits/problems with the two situations:
where is_paying = 0
where paying is null
Given that both are in the data table, I cannot think of why one would perform better than the other. I do think the first is clearer on what the query is doing, so that is the version I would prefer. But from a performance perspective, they should be the same.
Someone else mentioned -- and I'm sure you are aware -- that NULL and 0 are different beasts. They can also behave differently in the optimization of joins and other elements. But, for simple filtering, I would expect them to have the same performance.
Well, there is one technicality. The comparison to "0" is probably built into the CPU. The comparison to NULL is probably a bit operation that requires something like a mask, shift, and comparison -- which might take an iota of time longer. However, this performance difference is negligible when compared to the fact that you are reading the data from disk to begin with.
Comparing to NULL and comparing to zero are two different things. Zero is a value (known value) while NULL is UNKNOWN. The zero specifically means that the value was set to be zero; null means that the value was not set, or was set to null.
You'll get entirely different results using these queries, it's not simply a matter of performance.
Suppose you have a variety of users. Some have non-zero values for the "paying" column, some have 0, and some don't have a value whatsoever. The last case is what "null" more or less represents.
As for performance, do you have an index on the "paying" column? If you only have a few hundred rows in the table, this is probably irrelevant. If you have many thousands of rows, you are basically telling the query to iterate over every row of the table unless you have some indexing in place. This is true regardless of whether you are searching for "paying = 0" or "paying is null".
But again, just to reemphasize, the two queries will give you completely different results.
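A minimal sketch of that point, in Python with the standard sqlite3 module standing in for MySQL (an assumption; the NULL comparison semantics shown are standard SQL). The three users rows are hypothetical: one with a non-zero paying value, one with 0, and one with no value at all.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, paying INTEGER)")

# Hypothetical rows: non-zero, zero, and NULL.
conn.executemany(
    "INSERT INTO users (id, paying) VALUES (?, ?)",
    [(1, 5), (2, 0), (3, None)],
)

def ids(where_clause):
    """Return the ids matched by the given WHERE clause, in id order."""
    sql = "SELECT id FROM users WHERE " + where_clause + " ORDER BY id"
    return [row[0] for row in conn.execute(sql)]

is_null_ids = ids("paying IS NULL")   # only the row with no value
is_zero_ids = ids("paying = 0")       # only the row explicitly set to 0
eq_null_ids = ids("paying = NULL")    # never true, so no rows at all

print(is_null_ids, is_zero_ids, eq_null_ids)
```

The two filters pick disjoint sets of rows, which is why the choice between them has to be made on meaning first and performance second.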
As far as I know comparing to NULL is as fast as comparing to 0, so you should choose based on:
Simplicity - use the option which makes your code simpler
Minimal size - use the option which makes your table smaller
In this case making the paying column NULL-able will probably be better.
You should also check out these questions:
NULL in MySQL (Performance & Storage)
MySQL: NULL vs “”

Is it faster in MySQL to have one filter or several for the same column?

In MySQL, I always consider:
.... WHERE type > 5
to be faster than:
... WHERE type != 4 AND type != 6 AND type != 10
That is, I believe it is faster to have one 'larger than' or 'smaller than' statement than to have several comparisons (equal to's or not equal to's). However, I have absolutely no idea if this is a valid thought. Anybody any idea?
I think this is virtually the same, as long as you don't have 1000 conditions in the WHERE clause. When trying to optimize sql queries, focus more on the multi-table operations, joins etc. This is a minor thing. Don't focus on this too much. If you try too much to optimize you can complicate your database and code with almost no benefit.
Once I was in a situation where it was better to have multiple conditions OR'ed in the WHERE clause than to join another table. Using multiple WHERE conditions was much more efficient because I saved one join.

mysql where condition

I'm interested in where condition; if I write:
Select * from table_name
where insert_date > '2010-01-03'
and text like '%friend%';
is it different from:
Select * from table_name
where text like '%friend%'
and insert_date > '2010-01-03';
I mean, if the table is very big and has a lot of rows, and MySQL first takes the records that satisfy the condition " where insert_date > '2010-01-03' " and then searches those records for the word "friend", it can be much faster than first searching for "friend" rows and then looking at the date field.
Is it important to write the where condition smartly, or does MySQL analyze the condition and rewrite it in the best way?
thanks
No, the two where clauses should be equivalent. The optimizer should pick the same index whichever you use.
The order of columns in an index does matter though.
If you think the optimizer is using the wrong index, you could give it a hint. More often than not though, there's a good reason for using the index it has chosen to use, so unless you know exactly what you are doing, giving the optimizer hints will often make things worse not better.
I don't know about MySQL in particular, but typically this kind of optimization is left to the database engine, as which order is faster depends on indexes, cardinality of data, and quantity of data among other things.
I think it's true that both where clauses are equivalent as far as the database is concerned.
By definition, a logical conjunction (the AND operator) is commutative. This means that WHERE A AND B is equal to WHERE B AND A.
It makes no difference in which order you write your conditions.
However, what makes a difference is what indexes you have in place on your table. The query analyzer takes these into account. It is also smart enough to find the part of the condition that is easiest to check and apply that one first.
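One way to check this kind of claim is to compare the plans the database reports for both spellings. Below is a minimal sketch in Python using the standard sqlite3 module and its EXPLAIN QUERY PLAN statement as a stand-in for MySQL's EXPLAIN (an assumption: the two optimizers are different, so this shows the method rather than MySQL's behaviour). The index name idx_insert_date and the id column are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_name ("
    "id INTEGER PRIMARY KEY, insert_date TEXT, text TEXT)"
)
# Hypothetical index on the date column.
conn.execute("CREATE INDEX idx_insert_date ON table_name (insert_date)")

def plan(sql):
    """Return the human-readable detail strings of the query plan."""
    # The last column of each EXPLAIN QUERY PLAN row is the detail text.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

plan_date_first = plan(
    "SELECT * FROM table_name "
    "WHERE insert_date > '2010-01-03' AND text LIKE '%friend%'"
)
plan_like_first = plan(
    "SELECT * FROM table_name "
    "WHERE text LIKE '%friend%' AND insert_date > '2010-01-03'"
)

print(plan_date_first)
print(plan_like_first)
print("same plan:", plan_date_first == plan_like_first)
```

On MySQL itself the equivalent check would be to run EXPLAIN on both versions of the query and compare the chosen key.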