MySQL Optimizing: replace OR statement with UNION [duplicate] - mysql

I just read part of an optimization article and segfaulted on the following statement:
When using SQL, replace statements that use OR with a UNION:
select username from users where company = 'bbc' or company = 'itv';
to:
select username from users where company = 'bbc' union
select username from users where company = 'itv';
From a quick EXPLAIN of the OR version and of the UNION version (plan output not reproduced here):
Doesn't this mean UNION does double the work?
While I appreciate UNION may be more performant for certain RDBMSes and certain table schemas, this is not categorically true as the author suggests.
Question
Am I wrong?

Either the article you read used a bad example, or you misinterpreted their point.
select username from users where company = 'bbc' or company = 'itv';
This is equivalent to:
select username from users where company IN ('bbc', 'itv');
MySQL can use an index on company for this query just fine. There's no need to do any UNION.
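As a quick sanity check of that equivalence, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the sample rows are made up for illustration. It only checks that the two forms return the same rows; it says nothing about how MySQL plans them.

```python
import sqlite3

# SQLite stands in for MySQL here, purely to compare result sets.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, company TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "bbc"), ("bob", "itv"), ("carol", "sky"), ("dave", "bbc")],
)

# Same filter written with OR and with IN.
or_rows = conn.execute(
    "SELECT username FROM users "
    "WHERE company = 'bbc' OR company = 'itv' ORDER BY username"
).fetchall()
in_rows = conn.execute(
    "SELECT username FROM users "
    "WHERE company IN ('bbc', 'itv') ORDER BY username"
).fetchall()

print(or_rows == in_rows)
```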
The more tricky case is where you have an OR condition that involves two different columns.
select username from users where company = 'bbc' or city = 'London';
Suppose there's an index on company and a separate index on city. Given that MySQL usually uses only one index per table in a given query, which index should it use? If it uses the index on company, it would still have to do a table-scan to find rows where city is London. If it uses the index on city, it would have to do a table-scan for rows where company is bbc.
The UNION solution is for this type of case.
select username from users where company = 'bbc'
union
select username from users where city = 'London';
Now each subquery can use the index for its own search, and the results of the two subqueries are combined by the UNION.
An anonymous user proposed an edit to my answer above, but a moderator rejected the edit. It should have been a comment, not an edit. The claim of the proposed edit was that UNION has to sort the result set to eliminate duplicate rows. This makes the query run slower, and the index optimization is therefore a wash.
My response is that the indexes help to reduce the result set to a small number of rows before the UNION happens. UNION does in fact eliminate duplicates, but to do that it only has to sort the small result set. There might be cases where the WHERE clauses match a significant portion of the table, and sorting during UNION is as expensive as simply doing the table-scan. But it's more common for the result set to be reduced by the indexed searches, so the sorting is much less costly than the table-scan.
The difference depends on the data in the table, and the terms being searched. The only way to determine the best solution for a given query is to try both methods in the MySQL query profiler and compare their performance.
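For what "try both and compare" might look like, here is a rough sketch of a comparison harness. It is not a MySQL benchmark: it uses Python's sqlite3 module as a stand-in, the table contents and the index names idx_company and idx_city are invented, and SQLite's planner differs from MySQL's. The point is the shape of the test (same data, same indexes, check both forms return the same rows, then time each); on a real MySQL server you would run both queries against your own data and look at EXPLAIN.

```python
import sqlite3
import time

# Stand-in data set: 50,000 users with one index per searched column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, company TEXT, city TEXT)")
conn.execute("CREATE INDEX idx_company ON users (company)")
conn.execute("CREATE INDEX idx_city ON users (city)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(f"user{i}", f"company{i % 500}", f"city{i % 700}") for i in range(50000)],
)

OR_SQL = "SELECT username FROM users WHERE company = 'company7' OR city = 'city7'"
UNION_SQL = (
    "SELECT username FROM users WHERE company = 'company7' "
    "UNION "
    "SELECT username FROM users WHERE city = 'city7'"
)

def timed(sql, repeat=20):
    """Run sql `repeat` times; return (rows of the last run, best time in seconds)."""
    best = float("inf")
    result = []
    for _ in range(repeat):
        start = time.perf_counter()
        result = conn.execute(sql).fetchall()
        best = min(best, time.perf_counter() - start)
    return result, best

or_result, or_time = timed(OR_SQL)
union_result, union_time = timed(UNION_SQL)

# Usernames are unique in this data, so both forms should return the same rows.
print("same rows:", set(or_result) == set(union_result))
print(f"OR:    {len(or_result)} rows, best {or_time:.6f}s")
print(f"UNION: {len(union_result)} rows, best {union_time:.6f}s")
```

Whichever form wins here tells you little about MySQL; the timings only mean something when measured on the actual server, schema, and data.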

Those are not the same query.
I don't have much experience with MySQL, so I am not sure what its query optimizer does or does not do, but here are my thoughts from my general background (primarily MS SQL Server).
Typically, the query analyzer can take two equivalent queries and produce the exact same plan for both, so it wouldn't matter which one you write. I would suspect that there is no performance difference between these two queries, which are equivalent:
select distinct username from users where company = 'bbc' or company = 'itv';
and
select username from users where company = 'bbc'
union
select username from users where company = 'itv';
Now, the question is whether there would be a difference between the following two queries. I actually don't know, but I would suspect that the optimizer would treat the second more like the first:
select username from users where company = 'bbc' or company = 'itv';
and
select username from users where company = 'bbc'
union all
select username from users where company = 'itv';
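To make the "not the same query" point concrete, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL, with made-up rows in which one username appears under two companies. It only illustrates the duplicate-handling semantics of the four forms, not their performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, company TEXT)")
# "alice" appears twice, once per company, so duplicates are possible.
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "bbc"), ("alice", "itv"), ("bob", "bbc"), ("carol", "sky")],
)

def names(sql):
    """Return the usernames produced by sql as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

plain_or = names(
    "SELECT username FROM users WHERE company = 'bbc' OR company = 'itv'"
)
distinct_or = names(
    "SELECT DISTINCT username FROM users WHERE company = 'bbc' OR company = 'itv'"
)
union = names(
    "SELECT username FROM users WHERE company = 'bbc' "
    "UNION SELECT username FROM users WHERE company = 'itv'"
)
union_all = names(
    "SELECT username FROM users WHERE company = 'bbc' "
    "UNION ALL SELECT username FROM users WHERE company = 'itv'"
)

print("OR:         ", plain_or)
print("DISTINCT OR:", distinct_or)
print("UNION:      ", union)
print("UNION ALL:  ", union_all)
```

The plain OR matches UNION ALL here only because the two branches cannot match the same row; with overlapping conditions (for example on two different columns) UNION ALL would return such a row twice.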

It depends on what the optimizer ends up doing based on the size of the data, indexes, software version, etc.
I would guess that using OR would give the optimizer a better chance at finding some efficiencies, since everything is in a single logical statement.
Also, UNION has some overhead, since it creates a result set with no duplicates.
Each statement in the UNION should execute pretty quickly if company is indexed... not sure it'd really be doing double the work.
Bottom line
Unless you really have a burning need to squeeze every bit of speed out of your query, it's probably better to just go with the form that best communicates your intention: the OR.
Update
I also meant to mention IN. I believe the following query will give better performance than the OR (it's also the form I prefer):
select username from users where company in ('bbc', 'itv');

This is my benchmark result:
With UNION: query took 13.8699 seconds; rows examined (primary select type): 247685
With OR: query took 0.0126 seconds; rows examined (primary select type): 495371
MySQL uses one index per query, so with OR, MySQL uses the index on one column and scans the full table for the other column.
With UNION, on the other hand, the same work can be done twice.
That's why OR is faster than UNION.

In almost all cases, the union or union all version is going to do two full table scans of the users table.
The or version is much better in practice, since it will only scan the table once. It will also use an index only once, if available.
The original statement just seems wrong, for just about any database and any situation.

Bill Karwin's answer is pretty much right. When both parts of the OR condition have their own index, it's better to use UNION, because once you have a small subset of results, it's easier to sort them and eliminate duplicates. The total cost is almost always less than using only one index (for one of the columns) plus a table scan for the other column (because MySQL uses only one index, on one of the columns).
In general it depends on the table's structure and your needs, but on large tables UNION gave me better results.

Related

MySql Explain ignoring the unique index in a particular query

I started looking into indexes in depth for the first time and began analyzing our database, starting with the users table. I searched SO for a similar question but was not able to frame my search well, I guess.
While going through a particular concept, this first observation left me wondering about the difference between these two EXPLAINs (the first query uses 'a%', while the second uses 'ab%').
[Total number of rows in users table = 9193]:
1) explain select * from users where email_address like 'a%';
(Actual matching rows = 1240)
2) explain select * from users where email_address like 'ab%';
(Actual matching rows = 109)
The index looks like this :
My question:
Why is the index totally ignored in the first query? Does MySQL think that it is a better idea not to use the index in case 1? If yes, why?
If MySQL estimates, based on the statistics it collects on the distribution of the values, that the query will match more than a certain fraction of the total rows (typically 1/11 of the total), it deems it more efficient to simply scan the whole table, reading the disk pages sequentially, rather than use the index and jump around the disk pages in random order.
You could try your luck with this query, which may use the index:
where email_address between 'a' and 'az'
Although doing the full scan may actually be faster.
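One caveat on that rewrite: the BETWEEN range is not quite the same filter as LIKE 'a%', since any value that sorts after 'az' (for example one starting with 'azz') falls outside it. The sketch below illustrates this with Python's sqlite3 module standing in for MySQL and a few invented addresses; a half-open range (>= 'a' AND < 'b') is shown as one possible alternative. Exact comparison behaviour in MySQL depends on the column's collation, so treat this as an illustration of the idea only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email_address TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?)",
    [("a@example.com",), ("ab@example.com",), ("azz@example.com",), ("b@example.com",)],
)

def emails(where):
    """Return the addresses matching the given WHERE clause, sorted."""
    sql = "SELECT email_address FROM users WHERE " + where
    return sorted(row[0] for row in conn.execute(sql))

like_rows = emails("email_address LIKE 'a%'")
between_rows = emails("email_address BETWEEN 'a' AND 'az'")
half_open_rows = emails("email_address >= 'a' AND email_address < 'b'")

print("LIKE 'a%':        ", like_rows)
print("BETWEEN 'a'..'az':", between_rows)   # misses azz@example.com
print(">= 'a' AND < 'b': ", half_open_rows)
```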
This is not a direct answer to your question, but I still want to point it out (in case you don't already know):
Try:
explain select email_address from users where email_address like 'a%';
explain select email_address from users where email_address like 'ab%';
MySQL would now use the index in both of the queries above, since the column of interest is directly available from the index.
Probably in the case where you do a "select *", index access is more costly, since the optimizer has to go through the index records, find the row ids, and then go back to the table to retrieve the other column values.
But in the queries above, where you only do a "select email_address", the optimizer knows all the information desired is available right from the index, and hence it would use the index irrespective of the 30% rule.
Experts, please correct me if I am wrong.
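The covering-index idea can be seen in miniature with SQLite, used here through Python's sqlite3 module as a stand-in. This is SQLite's planner and SQLite's EXPLAIN QUERY PLAN output, not MySQL's; in MySQL the corresponding hint is "Using index" in the Extra column of EXPLAIN. The table layout, the index name idx_users_email, and the data are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email_address TEXT, name TEXT)"
)
conn.execute("CREATE UNIQUE INDEX idx_users_email ON users (email_address)")
conn.executemany(
    "INSERT INTO users (email_address, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

def plan(sql):
    """Return SQLite's plan text for sql (the last column of each plan row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql)
    return " | ".join(row[-1] for row in rows)

RANGE = "WHERE email_address >= 'a' AND email_address < 'b'"

# Only the indexed column is selected, so the index alone can answer the query.
covering_plan = plan("SELECT email_address FROM users " + RANGE)
# SELECT * also needs the name column, which is not stored in the index.
star_plan = plan("SELECT * FROM users " + RANGE)

print(covering_plan)
print(star_plan)
```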

Optimize mysql query using indexes

I have a problem with this query:
SELECT DISTINCT s.city, pc.start, pc.end
FROM postal_codes pc LEFT JOIN suspects s ON (s.postalcode BETWEEN pc.start AND pc.end)
WHERE pc.user_id = "username"
ORDER BY pc.start
The suspects table has about 340,000 entries, and there is an index on postalcode. I have several users, but this individual query takes about 0.5s. When I run this SQL with EXPLAIN, I get something like this: http://my.jetscreenshot.com/7536/20111225-myhj-41kb.jpg. Do these NULLs mean that the query isn't using an index? The index is a BTREE, so I think this should run a little faster.
Can you please help me with this? If any other information is needed, just let me know.
Edit: I have indexes on suspects.postalcode, postal_codes.start, postal_codes.end, postal_codes.user_id.
Basically, what I'm trying to achieve: I have a table where each user ID has multiple postalcode ranges assigned, so it looks like:
user_id | start | end
Then I have a table of suspects where each suspect has an address (which contains a postalcode), so in this query I'm trying to get each postalcode range (start and end) and also the names of the cities in that range.
Hope this helps.
Whenever a LEFT JOIN is used, all the records of the first table are picked up, rather than being selected on the basis of an index. I would suggest using an inner join, something like the query below.
select distinct
s.city,
pc.start,
pc.end
from postal_codes pc
inner join suspects s
on s.postalcode between pc.start and pc.end
where pc.user_id = "username"
order by pc.start
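Note that switching from LEFT JOIN to INNER JOIN can change the result, not just the plan: ranges with no matching suspect disappear instead of coming back with a NULL city. A small sketch of that difference follows, using Python's sqlite3 module as a stand-in for MySQL. The rows are invented, and the start and end column names are double-quoted only to keep the stand-in engine from treating them as keywords.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE postal_codes (user_id TEXT, "start" INTEGER, "end" INTEGER)'
)
conn.execute("CREATE TABLE suspects (city TEXT, postalcode INTEGER)")
conn.executemany(
    "INSERT INTO postal_codes VALUES (?, ?, ?)",
    [("username", 1000, 1999), ("username", 5000, 5999)],
)
# No suspect falls inside the 5000-5999 range.
conn.executemany(
    "INSERT INTO suspects VALUES (?, ?)",
    [("Springfield", 1500), ("Shelbyville", 1700), ("Elsewhere", 9000)],
)

QUERY = """
SELECT DISTINCT s.city, pc."start", pc."end"
FROM postal_codes pc {join} suspects s
  ON s.postalcode BETWEEN pc."start" AND pc."end"
WHERE pc.user_id = 'username'
ORDER BY pc."start", s.city
"""

left_rows = conn.execute(QUERY.format(join="LEFT JOIN")).fetchall()
inner_rows = conn.execute(QUERY.format(join="INNER JOIN")).fetchall()

print("LEFT JOIN: ", left_rows)   # keeps the empty range with city = None
print("INNER JOIN:", inner_rows)  # drops the empty range
```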
It's using only one index, and not for the fields involved in the join. Try creating an index for the start and end fields, or using >= and <= instead of BETWEEN
Not 100% sure, but this might be relevant:
Sometimes MySQL does not use an index, even if one is available. One circumstance under which this occurs is when the optimizer estimates that using the index would require MySQL to access a very large percentage of the rows in the table. (In this case, a table scan is likely to be much faster because it requires fewer seeks.) However, if such a query uses LIMIT to retrieve only some of the rows, MySQL uses an index anyway, because it can much more quickly find the few rows to return in the result.
So try testing with LIMIT, and if it uses the index then, you found your cause.
I have to say I'm a little confused by your table naming convention; I would expect the "suspects" table to have the user_id rather than the postal_codes table, but you must have your reasons. If you leave this query as it is, you can add an index on postal_codes (start, end) to avoid the complete table scan.
I think you can restructure your query like the following:
SELECT DISTINCT s.city, pc1.start, pc1.end FROM
(SELECT pc.start, pc.end FROM postal_codes pc WHERE pc.user_id = "username") AS pc1, suspects s
WHERE s.postalcode BETWEEN pc1.start AND pc1.end ORDER BY pc1.start
Your query is not picking up the index on the s table because of the left join and your BETWEEN condition. Having an index on your table doesn't necessarily mean that it will be used in all queries.
Try FORCE INDEX.

Mysql count vs mysql SELECT, which one is faster?

If I want to check a name, I want to see how many rows with that name exist in the "username" column of the users table. Let's say there are thousands, or hundreds of thousands, of rows. Should I use:
count(name),
count(*) or
SELECT username FROM users where username = 'name'
Which one is more appropriate? Or will they give the same result in terms of speed/response?
EDIT:
Thanks guys, I found the answer: count() will definitely be faster.
Is this query correct?
SELECT COUNT( username )
FROM users
WHERE `username` = 'tim'
COUNT(*) and COUNT(Name) might produce different values. COUNT will not include NULL values, so if there are any instances of Name that equal NULL they will not be counted.
COUNT(*) will also perform better than Count(Name). By specifying COUNT(*) you are leaving the optimizer free to use any index it wishes. By specifying COUNT(Name) you are forcing the query engine to use the table, or at least an index that contains the NAME column.
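The NULL behaviour is easy to check. Here is a minimal sketch with Python's sqlite3 module standing in for MySQL and a made-up users table containing one NULL username; it only shows the difference in what gets counted, not any performance difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT)")
# One row has a NULL username.
conn.executemany(
    "INSERT INTO users VALUES (?)", [("tim",), ("tim",), (None,), ("ann",)]
)

def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]

count_star = scalar("SELECT COUNT(*) FROM users")         # counts every row
count_name = scalar("SELECT COUNT(username) FROM users")  # skips NULL usernames
count_tim = scalar("SELECT COUNT(username) FROM users WHERE username = 'tim'")
count_tim_star = scalar("SELECT COUNT(*) FROM users WHERE username = 'tim'")

print(count_star, count_name, count_tim, count_tim_star)
```

With a filter such as username = 'tim', the NULL rows are already excluded by the WHERE clause, so both forms return the same number.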
COUNT(name) or COUNT(*) will be somewhat faster because they do not need to return much data.
(See Andrew Shepherd's reply on the semantic difference between these two forms of COUNT, as well as COUNT().) Since the focus is to "check a name", these differences matter little if you use the following trick: instead of COUNT, you can also use
SELECT username FROM users where username = 'name' LIMIT 1;
This has the effect of checking the existence of the name, but returns as soon as one is found.
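From application code, that existence check could be wrapped roughly as follows. This is a sketch only: it uses Python's sqlite3 module as a stand-in for a MySQL driver, and the helper name username_exists is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT)")
conn.execute("CREATE INDEX idx_username ON users (username)")
conn.executemany("INSERT INTO users VALUES (?)", [("tim",), ("tim",), ("ann",)])

def username_exists(connection, name):
    """Return True if at least one row has this username (hypothetical helper)."""
    # LIMIT 1 lets the engine stop at the first match instead of finding them all.
    row = connection.execute(
        "SELECT 1 FROM users WHERE username = ? LIMIT 1", (name,)
    ).fetchone()
    return row is not None

print(username_exists(conn, "tim"))
print(username_exists(conn, "bob"))
```

Passing the name as a bound parameter rather than concatenating it into the SQL string also avoids quoting problems and SQL injection.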
COUNT(*) would be faster, as the MySQL engine is free to choose which index to count with.
The SELECT statement produces more traffic if there are a lot of users with the same name (lots of rows returned instead of one).
Try all three and use whichever performs the best. If they all perform about the same, this is an example of premature optimization, and you should probably just use whichever one you feel most comfortable with and tweak it later if necessary. If you're superstitious, you could also consider using count(1), which I have been told could have performance advantages as well.
I would say you should use a select top 1 (LIMIT 1 in MySQL), but you're checking a username, which is probably an indexed unique column, so theoretically count should perform just as well, considering there can only be one.