MySQL COUNT vs. SELECT: which one is faster?

I want to check a name: I want to see how many rows with that name exist in the "username" column of the users table. With thousands or even hundreds of thousands of rows, should I use:
count(name),
count(*) or
SELECT username FROM users where username = 'name'
Which one is more appropriate? Or will they give the same result in terms of speed/response?
EDIT:
Thanks guys, I found the answer: count() will definitely be faster.
Is this query correct?
SELECT COUNT( username )
FROM users
WHERE `username` = 'tim'

COUNT(*) and COUNT(Name) might produce different values. COUNT(Name) does not include NULL values, so any rows where Name is NULL will not be counted.
COUNT(*) can also perform better than COUNT(Name). By specifying COUNT(*) you leave the optimizer free to use any index it wishes. By specifying COUNT(Name) you force the query engine to use the table, or at least an index that contains the Name column.
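To make the NULL difference concrete, here is a minimal sketch. It assumes an in-memory SQLite database as a stand-in for MySQL (the NULL rule for COUNT is standard SQL, so the counts should match what MySQL reports); the table contents are made up for illustration, and nothing here says anything about speed.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (name) VALUES (?)",
    [("tim",), ("ann",), (None,)],  # the third row has a NULL name
)

# COUNT(*) counts rows; COUNT(name) counts only rows where name IS NOT NULL.
count_star, count_name = conn.execute(
    "SELECT COUNT(*), COUNT(name) FROM users"
).fetchone()

print(count_star, count_name)  # 3 2
```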

COUNT(name) or COUNT(*) will be somewhat faster because they do not need to return much data.
(See Andrew Shepherd's reply on the semantic difference between these two forms of COUNT, as well as COUNT().) Since the focus is to "check a name", these differences matter little with the following trick: instead of counting, you can also use
SELECT username FROM users where username = 'name' LIMIT 1;
This has the effect of checking the existence of the name, but returns as soon as one match is found.
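From application code, the LIMIT 1 check could look roughly like this. It is only a sketch: SQLite is used in place of MySQL, and the `username_exists` helper and the index name are hypothetical.

```python
import sqlite3

def username_exists(conn, username):
    """Return True if at least one row has this username."""
    row = conn.execute(
        "SELECT username FROM users WHERE username = ? LIMIT 1",
        (username,),
    ).fetchone()
    # fetchone() returns None when the query produced no rows.
    return row is not None

# Assumption: SQLite stands in for MySQL; the data is made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT)")
conn.execute("CREATE INDEX idx_users_username ON users (username)")
conn.executemany("INSERT INTO users VALUES (?)", [("tim",), ("tim",), ("ann",)])

print(username_exists(conn, "tim"))  # True
print(username_exists(conn, "bob"))  # False
```

At most one row travels back to the client, however many rows share the name.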

COUNT(*) would be faster, as the MySQL engine is free to choose which index to use for counting.

The SELECT statement produces more traffic if there are a lot of users with the same name (many rows returned instead of one).

Try all three and use whichever performs best. If they all perform about the same, this is an example of premature optimization, and you should probably just use whichever one you feel most comfortable with and tweak it later if necessary. If you are superstitious you could also consider using count(1), which I have been told could have performance advantages as well.
I would say you should use a select top 1 (LIMIT 1 in MySQL), but you're checking a username, which is probably an indexed unique column, so theoretically count should perform just as well, considering there can only be one match.

Related

MySQL UNION ALL Providing 0 speed increase over equivalent OR statement [duplicate]

I just read part of an optimization article and segfaulted on the following statement:
When using SQL replace statements using OR with a UNION:
select username from users where company = 'bbc' or company = 'itv';
to:
select username from users where company = 'bbc' union
select username from users where company = 'itv';
From a quick EXPLAIN:
Using OR:
Using UNION:
Doesn't this mean UNION does double the work?
While I appreciate UNION may be more performant for certain RDBMSes and certain table schemas, this is not categorically true as the author suggests.
Question
Am I wrong?
Either the article you read used a bad example, or you misinterpreted their point.
select username from users where company = 'bbc' or company = 'itv';
This is equivalent to:
select username from users where company IN ('bbc', 'itv');
MySQL can use an index on company for this query just fine. There's no need to do any UNION.
The more tricky case is where you have an OR condition that involves two different columns.
select username from users where company = 'bbc' or city = 'London';
Suppose there's an index on company and a separate index on city. Given that MySQL usually uses only one index per table in a given query, which index should it use? If it uses the index on company, it would still have to do a table-scan to find rows where city is London. If it uses the index on city, it would have to do a table-scan for rows where company is bbc.
The UNION solution is for this type of case.
select username from users where company = 'bbc'
union
select username from users where city = 'London';
Now each sub-query can use the index for its search, and the results of the sub-queries are combined by the UNION.
An anonymous user proposed an edit to my answer above, but a moderator rejected the edit. It should have been a comment, not an edit. The claim of the proposed edit was that UNION has to sort the result set to eliminate duplicate rows. This makes the query run slower, and the index optimization is therefore a wash.
My response is that the indexes help to reduce the result set to a small number of rows before the UNION happens. UNION does in fact eliminate duplicates, but to do that it only has to sort the small result set. There might be cases where the WHERE clauses match a significant portion of the table, and sorting during UNION is as expensive as simply doing the table-scan. But it's more common for the result set to be reduced by the indexed searches, so the sorting is much less costly than the table-scan.
The difference depends on the data in the table, and the terms being searched. The only way to determine the best solution for a given query is to try both methods in the MySQL query profiler and compare their performance.
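As a quick sanity check that the two-column OR and its UNION rewrite select the same usernames, here is a small sketch. The assumptions are loud ones: SQLite stands in for MySQL, the index names and sample rows are invented, and usernames are unique. It shows result equivalence only; it does not reproduce MySQL's index choices or timings.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; rows and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, company TEXT, city TEXT)")
conn.execute("CREATE INDEX idx_company ON users (company)")
conn.execute("CREATE INDEX idx_city ON users (city)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        ("alice", "bbc", "London"),  # matches both conditions
        ("bob", "bbc", "Leeds"),     # matches company only
        ("carol", "itv", "London"),  # matches city only
        ("dave", "itv", "Leeds"),    # matches neither
    ],
)

or_rows = conn.execute(
    "SELECT username FROM users WHERE company = 'bbc' OR city = 'London'"
).fetchall()

union_rows = conn.execute(
    "SELECT username FROM users WHERE company = 'bbc' "
    "UNION "
    "SELECT username FROM users WHERE city = 'London'"
).fetchall()

or_names = sorted(name for (name,) in or_rows)
union_names = sorted(name for (name,) in union_rows)
print(or_names)     # ['alice', 'bob', 'carol']
print(union_names)  # ['alice', 'bob', 'carol']
```

Note that alice matches both branches but appears once in the UNION result, because UNION removes duplicates. If usernames were not unique, the OR form would keep repeated names while UNION would collapse them.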
Those are not the same query.
I don't have much experience with MySQL, so I am not sure what the query optimizer does or does not do, but here are my thoughts from my general background (primarily ms sql server).
Typically, the query analyzer can take the above two queries and make the exact same plan out of them (if they were the same), so it wouldn't matter. I would suspect that there is no performance difference between these queries (which are equivalent)
select distinct username from users where company = 'bbc' or company = 'itv';
and
select username from users where company = 'bbc'
union
select username from users where company = 'itv';
Now, the question is whether there would be a difference between the following queries. I actually don't know, but I would suspect that the optimizer would make it more like the first query:
select username from users where company = 'bbc' or company = 'itv';
and
select username from users where company = 'bbc'
union all
select username from users where company = 'itv';
It depends on what the optimizer ends up doing based on the size of the data, indexes, software version, etc.
I would guess that using OR would give the optimizer a better chance at finding some efficiencies, since everything is in a single logical statement.
Also, UNION has some overhead, since it creates a result set with no duplicates.
Each statement in the UNION should execute pretty quickly if company is indexed... not sure it'd really be doing double the work.
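The "not the same query" point is easy to see with a username that appears under both companies. A minimal sketch, assuming SQLite in place of MySQL and made-up rows (duplicate handling for UNION, UNION ALL and DISTINCT is standard SQL):

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, company TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "bbc"), ("alice", "itv"), ("bob", "bbc")],
)

def names(sql):
    """Run a query and return the usernames as a sorted list."""
    return sorted(name for (name,) in conn.execute(sql).fetchall())

plain_or = names(
    "SELECT username FROM users WHERE company = 'bbc' OR company = 'itv'"
)
distinct_or = names(
    "SELECT DISTINCT username FROM users WHERE company = 'bbc' OR company = 'itv'"
)
union = names(
    "SELECT username FROM users WHERE company = 'bbc' "
    "UNION SELECT username FROM users WHERE company = 'itv'"
)
union_all = names(
    "SELECT username FROM users WHERE company = 'bbc' "
    "UNION ALL SELECT username FROM users WHERE company = 'itv'"
)

print(plain_or)     # ['alice', 'alice', 'bob']
print(union_all)    # ['alice', 'alice', 'bob']
print(distinct_or)  # ['alice', 'bob']
print(union)        # ['alice', 'bob']
```

So the plain OR pairs with UNION ALL, and SELECT DISTINCT with OR pairs with UNION, which matches the two groupings described above.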
Bottom line
Unless you really have a burning need to squeeze every bit of speed out of your query, it's probably better to just go with the form that best communicates your intention... the OR
Update
I also meant to mention IN. I believe the following query will give better performance than the OR (it's also the form I prefer):
select username from users where company in ('bbc', 'itv');
This is my benchmark result:
With UNION: query took 13.8699 seconds; rows examined (PRIMARY select type): 247685
With OR: query took 0.0126 seconds; rows examined (PRIMARY select type): 495371
MySQL uses one index per query, so with OR it uses the index on one column and scans the full table for the other column.
With UNION, on the other hand, the same work can be done two times.
That's why OR is faster than UNION.
In almost all cases, the union or union all version is going to do two full table scans of the users table.
The or version is much better in practice, since it will only scan the table once. It will also use an index only once, if available.
The original statement just seems wrong, for just about any database and any situation.
Bill Karwin's answer is pretty much right. When both parts of the OR condition have their own index, UNION is the better choice, because once you have a small subset of results it is easy to sort them and eliminate duplicates. The total cost is almost always less than using only one index (for one of the columns) and a table scan for the other column (because MySQL uses only one index for the query).
It generally depends on the table's structure and your needs, but on large tables UNION gave me better results.

MySQL: SELECT(x) WHERE vs COUNT WHERE?

This is going to be one of those questions but I need to ask it.
I have a large table which may or may not have one unique row. I therefore need a MySQL query that will just tell me TRUE or FALSE.
With my current knowledge, I see two options (pseudo code):
[id = primary key]
OPTION 1:
SELECT id FROM table WHERE x=1 LIMIT 1
... and then determine in PHP whether a result was returned.
OPTION 2:
SELECT COUNT(id) FROM table WHERE x=1
... and then just use the count.
Is either of these preferable for any reason, or is there perhaps an even better solution?
Thanks.
If the selection criterion is truly unique (i.e. yields at most one result), you are going to see massive performance improvement by having an index on the column (or columns) involved in that criterion.
create index my_unique_index on table(x)
If you want to enforce the uniqueness, a plain index is not even an option; you must have
create unique index my_unique_index on table(x)
Having this index, querying on the unique criterion will perform very well, regardless of minor SQL tweaks like count(*), count(id), count(x), limit 1 and so on.
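To illustrate what the unique index buys you, here is a rough sketch. It assumes SQLite instead of MySQL, and a hypothetical table called `items` (the original uses `table` only as a placeholder). With the unique index in place the duplicate insert is rejected, so the count for a given x can only be 0 or 1.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; `items` is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, x INTEGER)")
conn.execute("CREATE UNIQUE INDEX my_unique_index ON items (x)")
conn.execute("INSERT INTO items (x) VALUES (1)")

try:
    # Second row with the same x violates the unique index.
    conn.execute("INSERT INTO items (x) VALUES (1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

(row_count,) = conn.execute(
    "SELECT COUNT(*) FROM items WHERE x = ?", (1,)
).fetchone()

print(duplicate_rejected, row_count)  # True 1
```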
For clarity, I would write
select count(*) from table where x = ?
I would avoid LIMIT 1 for two other reasons:
It is non-standard SQL. I am not religious about that, use the MySQL-specific stuff where necessary (i.e. for paging data), but it is not necessary here.
If for some reason, you have more than one row of data, that is probably a serious bug in your application. With LIMIT 1, you are never going to see the problem. This is like counting dinosaurs in Jurassic Park with the assumption that the number can only possibly go down.
AFAIK, if you have an index on your ID column both queries will be more or less equal performance. The second query will need 1 less line of code in your program but that's not going to make any performance impact either.
Personally I typically do the first one of selecting the id from the row and limiting to 1 row. I like this better from a coding perspective. Instead of having to actually retrieve the data, I just check the number of rows returned.
If I were to compare speeds, I would say not doing a count in MySQL would be faster. I don't have any proof, but my guess would be that MySQL has to get all of the rows and then count how many there are. Although, on second thought, it would have to do that in the first option as well, so that the code knows how many rows there are. But since you have COUNT(id) vs COUNT(*), I would say it might be slightly slower.
Intuitively, the first one could be faster, since it can abort the table (or index) scan when it finds the first value. But you should retrieve x, not id: if the engine is using an index on x, it doesn't need to go to the block where the row actually is.
Another option could be:
select exists(select 1 from mytable where x = ?) from dual
Which already returns a boolean.
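Called from code, the EXISTS form might look like the sketch below. Assumptions: SQLite stands in for MySQL (so `from dual` is left out, since SQLite has no such table), the `row_exists` helper is hypothetical, and the rows are made up. The database hands back 1 or 0, which the helper converts to a Python bool.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, x INTEGER)")
conn.executemany("INSERT INTO mytable (x) VALUES (?)", [(1,), (2,)])

def row_exists(conn, x):
    """Return True if at least one row has the given x."""
    (flag,) = conn.execute(
        "SELECT EXISTS(SELECT 1 FROM mytable WHERE x = ?)", (x,)
    ).fetchone()
    return bool(flag)  # EXISTS yields 1 or 0

print(row_exists(conn, 1))   # True
print(row_exists(conn, 99))  # False
```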
Typically, you use a GROUP BY with a HAVING clause to determine whether there are duplicate rows in a table. Say you have a table with an id and a name (assuming id is the primary key, and you want to know whether name is unique or repeated). You would use
select name, count(*) as total from mytable group by name having total > 1;
The above will return the names that are repeated and the number of times each one appears.
If you just want one query to get your answer as true or false, you can use a nested query, e.g.
select if(count(*) >= 1, True, False) from (select name, count(*) as total from mytable group by name having total > 1) a;
The above should return true, if your table has duplicate rows, otherwise false.
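Both duplicate checks can be tried out with a few rows. This is a sketch under assumptions: SQLite stands in for MySQL, the names are invented, and the HAVING clause repeats COUNT(*) instead of using the `total` alias so that it stays portable.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the sample names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO mytable (name) VALUES (?)",
    [("tim",), ("tim",), ("ann",), ("bob",), ("bob",), ("bob",)],
)

# Names that occur more than once, with how often each occurs.
repeated = conn.execute(
    "SELECT name, COUNT(*) AS total FROM mytable "
    "GROUP BY name HAVING COUNT(*) > 1 ORDER BY name"
).fetchall()

# Single true/false answer: does any name occur more than once?
(flag,) = conn.execute(
    "SELECT COUNT(*) >= 1 FROM ("
    "  SELECT name FROM mytable GROUP BY name HAVING COUNT(*) > 1"
    ") AS a"
).fetchone()
has_duplicates = bool(flag)

print(repeated)        # [('bob', 3), ('tim', 2)]
print(has_duplicates)  # True
```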

Why does MySQL allow "group by" queries WITHOUT aggregate functions?

Surprise -- this is a perfectly valid query in MySQL:
select X, Y from someTable group by X
If you tried this query in Oracle or SQL Server, you’d get the natural error message:
Column 'Y' is invalid in the select list because it is not contained in
either an aggregate function or the GROUP BY clause.
So how does MySQL determine which Y to show for each X? It just picks one. From what I can tell, it just picks the first Y it finds. The rationale being, if Y is neither an aggregate function nor in the group by clause, then specifying “select Y” in your query makes no sense to begin with. Therefore, I as the database engine will return whatever I want, and you’ll like it.
There’s even a MySQL configuration parameter to turn off this “looseness”.
http://dev.mysql.com/doc/refman/5.7/en/sql-mode.html#sqlmode_only_full_group_by
This article even mentions how MySQL has been criticized for being ANSI-SQL non-compliant in this regard.
http://www.oreillynet.com/databases/blog/2007/05/debunking_group_by_myths.html
My question is: Why was MySQL designed this way? What was their rationale for breaking with ANSI-SQL?
According to this page (the 5.0 online manual), it's for better performance and user convenience.
I believe that it was to handle the case where grouping by one field would imply other fields are also being grouped:
SELECT user.id, user.name, COUNT(post.owner_id) AS posts
FROM user
LEFT OUTER JOIN post ON post.owner_id=user.id
GROUP BY user.id
In this case the user.name will always be unique per user.id, so there is convenience in not requiring the user.name in the GROUP BY clause (although, as you say, there is definite scope for problems)
Unfortunately almost all the SQL varieties have situations where they break ANSI and have unpredictable results.
It sounds to me like they intended it to be treated like the "FIRST(Y)" function that many other systems have.
More than likely, this construct is something that the MySQL team regret, but don't want to stop supporting because of the number of applications that would break.
MySQL treats this as a single-column DISTINCT when you use GROUP BY without an aggregate function. Using other options you either have the whole result be distinct, or have to use subqueries, etc. The question is whether the results are truly predictable.
Also, good info is in this thread.
From what I have read in the mysql reference page, it says:
"You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group."
I suggest you read this page (link to the MySQL reference manual):
http://dev.mysql.com/doc/refman/5.5/en//group-by-extensions.html
It's actually a very useful tool that the other fields don't all have to be in an aggregate function when you group by a field. You can manipulate which result is returned by simply ordering it first and then grouping it afterwards. For instance, if I wanted to get user login information and see the last time the user logged in, I would do this.
Tables
USER
user_id | name
USER_LOGIN_HISTORY
user_id | date_logged_in
USER_LOGIN_HISTORY has multiple rows for one user, so if I joined users to it, it would return many rows. As I am only interested in the last entry, I would do this:
select
    user_id,
    name,
    date_logged_in
from (
    select
        u.user_id,
        u.name,
        ulh.date_logged_in
    from users as u
    join user_login_history as ulh
        on u.user_id = ulh.user_id
    where u.user_id = 1234
    order by ulh.date_logged_in desc
) as table1
group by user_id
This would return one row with the name of the user and the last time that user logged in.
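One caveat on the order-then-group trick: the MySQL manual describes the value picked for a non-aggregated column as nondeterministic and says an ORDER BY cannot influence it, so the result is not guaranteed. A more portable way to get the last login is to ask for MAX explicitly. The sketch below is a swapped-in alternative, not the query from the post; it assumes SQLite in place of MySQL, ISO-formatted date strings, and made-up rows.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; dates are ISO strings so that
# MAX() on text picks the latest one. All rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE user_login_history (user_id INTEGER, date_logged_in TEXT)"
)
conn.execute("INSERT INTO users VALUES (1234, 'tim')")
conn.executemany(
    "INSERT INTO user_login_history VALUES (?, ?)",
    [(1234, "2020-01-05"), (1234, "2020-03-01"), (1234, "2020-02-10")],
)

# Every selected column is either grouped or aggregated, so the query
# is also valid with ONLY_FULL_GROUP_BY enabled.
row = conn.execute(
    """
    SELECT u.user_id, u.name, MAX(ulh.date_logged_in) AS last_login
    FROM users AS u
    JOIN user_login_history AS ulh ON u.user_id = ulh.user_id
    WHERE u.user_id = ?
    GROUP BY u.user_id, u.name
    """,
    (1234,),
).fetchone()

print(row)  # (1234, 'tim', '2020-03-01')
```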