This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
COUNT(*) vs. COUNT(1) vs. COUNT(pk): which is better?
count(*) and count(column_name), what's the diff?
count(*) vs count(column-name) - which is more correct?
The benefit of using count(*) in a select statement is that I can use it with any table and that makes automating scripts easier:
count_sql = 'select count(*) ' + getRestOfSQL('tablename');
But, is it less efficient than using count(specific_field)?
For InnoDB
If specific_field is not nullable, they are equivalent and have the same performance.
If specific_field is nullable, they don't do the same thing. COUNT(specific_field) counts only the rows in which specific_field is not NULL. This requires looking at the value of specific_field for each row. COUNT(*) simply counts the number of rows and in this case can be faster, as it does not require examining the value of specific_field.
For MyISAM
There is a special optimization for the following so that it does not even need to fetch all rows:
SELECT COUNT(*) FROM yourtable
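To make the nullable case concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL (an assumption made only so the example runs anywhere), and the sample rows are made up. It shows only the difference in results, not the performance characteristics of InnoDB or MyISAM.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable (id INTEGER PRIMARY KEY, specific_field TEXT)"
)

# Four rows, two of which have NULL in the nullable column (made-up data).
conn.executemany(
    "INSERT INTO yourtable (specific_field) VALUES (?)",
    [("a",), (None,), ("b",), (None,)],
)

# COUNT(*) counts rows; COUNT(specific_field) skips rows where the field is NULL.
count_all, count_field = conn.execute(
    "SELECT COUNT(*), COUNT(specific_field) FROM yourtable"
).fetchone()

print(count_all, count_field)
conn.close()
```

With this data the two counts differ, so the two forms are only interchangeable when the column is declared NOT NULL.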
Generally, it wouldn't matter so much, as we're returning the same number of rows.
This link covers it nicely
count(*) vs count(column-name) - which is more correct?
Count(*) vs Count(1)
This link also explains more, specifically with Oracle
I have a simple online leaderboard which also encodes replays in a string. Although the leaderboard (currently) stores every laptime reported, the retrieval PHP script returns only the best time for each unique player, thus:
SELECT driver
, MIN(time)
, track
, replay
FROM Laptimes
WHERE track = '$track'
GROUP
BY driver
ORDER
BY MIN(time) ASC
LIMIT 10
This correctly reports the fastest laptime, but does NOT select the replay associated with that laptime.
Instead you just get the first replay submitted for that driver.
I'm 100% sure the replays are correctly stored in the database, because if I remove the MIN() I get every laptime by every player, and can watch each replay without any problem.
I just can't seem to convince SQL to give me the replay associated with the minimum laptime.
You want entire rows, so you need to filter rather than aggregate. A simple approach uses a correlated subquery:
select l.*
from laptimes l
where
track = ?
and l.time = (select min(l1.time) from laptimes l1 where l1.driver = l.driver and l1.track = l.track)
Note that, as commented by JNevill, your original query is not valid standard SQL, because the select and group by clauses are not consistent. MySQL might tolerate it (if you have option ONLY_FULL_GROUP_BY disabled, which is the default in old versions), but then you get arbitrary values in the non-aggregated columns that are not present in the group by clause. This might be simpler to understand when the query is written as follows (which is equivalent to your original code, and is valid MySQL code):
SELECT driver, MIN(time), ANY_VALUE(track), ANY_VALUE(replay)
FROM Laptimes
WHERE (track='$track')
GROUP BY driver
ORDER BY MIN(time) ASC LIMIT 10
Note #2: use prepared statements! Do not munge parameters into the query string; this is both inefficient and unsafe.
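As a rough sketch of both points together (filtering with a correlated subquery, and binding the track as a parameter instead of concatenating it), the following uses Python's sqlite3 module with an in-memory SQLite database in place of PHP and MySQL. The helper name best_laps and the sample rows are hypothetical; only the table and column names come from the question.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Laptimes (driver TEXT, time REAL, track TEXT, replay TEXT)"
)
# Made-up sample data: two drivers, several laps each.
conn.executemany(
    "INSERT INTO Laptimes VALUES (?, ?, ?, ?)",
    [
        ("alice", 62.5, "monza", "replay-a1"),
        ("alice", 60.1, "monza", "replay-a2"),
        ("bob", 61.0, "monza", "replay-b1"),
        ("bob", 59.9, "spa", "replay-b2"),
    ],
)


def best_laps(conn, track, limit=10):
    """Return each driver's fastest lap on a track, with its own replay.

    Hypothetical helper: the track and limit are bound as parameters,
    never concatenated into the SQL string.
    """
    sql = """
        SELECT l.driver, l.time, l.track, l.replay
        FROM Laptimes l
        WHERE l.track = ?
          AND l.time = (SELECT MIN(l1.time)
                        FROM Laptimes l1
                        WHERE l1.driver = l.driver
                          AND l1.track = l.track)
        ORDER BY l.time ASC
        LIMIT ?
    """
    return conn.execute(sql, (track, limit)).fetchall()


rows = best_laps(conn, "monza")
for row in rows:
    print(row)
```

One thing to keep in mind with this approach: if a driver has two laps with exactly the same best time on a track, both rows are returned.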
This is my first post here, since most of the time I have already found a suitable solution :)
However this time nothing seems to help properly.
I'm trying to migrate information from a MySQL database to which I have only read-only access.
My problem is similar to this one: Group by doesn't give me the newest group
I also need to get the latest information out of some tables, but my tables have more than 300k entries, so checking whether the "time-attribute-value" is the same as in the subquery (as suggested in the first answer) would be too slow (I once tried "... WHERE EXISTS ..." and the server hung).
In addition, I can hardly ever find the important information (e.g. time) in a single attribute, and there is never a single primary key. Until now I did it as suggested in the second answer, by joining with a subquery that contains the latest "time-attribute-entry" and some primary keys, but that gets me into a huge mess once I use multiple joins and unions with the results.
Therefore I would prefer using the HAVING clause, as here: Select entry with maximum value of column after grouping
But when I tried it out and looked for a good candidate for the "time-attribute", I noticed that these queries give me two different results (more = 39721, less = 37870):
SELECT COUNT(MATNR) AS MORE
FROM(
SELECT DISTINCT
LAB_MTKNR AS MATNR,
LAB_STG AS FACH,
LAB_STGNR AS STUDIENGANG
FROM
FKT_LAB
) AS TEMP1
SELECT COUNT(MATNR) AS LESS
FROM(
SELECT
LAB_MTKNR AS MATNR,
LAB_STG AS FACH,
LAB_STGNR AS STUDIENGANG,
LAB_PDATUM
FROM
FKT_LAB
GROUP BY
LAB_MTKNR,
LAB_STG,
LAB_STGNR
HAVING LAB_PDATUM = MAX(LAB_PDATUM)
)AS TEMP2
This happens although both are applied to the same table and use "GROUP BY" / "SELECT DISTINCT" on the same entries.
Any ideas?
If nothing helps and I have to go back to my mess, I will use string variables as placeholders to tidy it up, but then I lose the overview of how many subqueries, joins and unions I have in one query. How many temporary tables will the server be able to cope with?
Your second query is not doing what you expect it to be doing. This is the query:
SELECT COUNT(MATNR) AS LESS
FROM (SELECT LAB_MTKNR AS MATNR, LAB_STG AS FACH, LAB_STGNR AS STUDIENGANG, LAB_PDATUM
FROM FKT_LAB
GROUP BY LAB_MTKNR, LAB_STG, LAB_STGNR
HAVING LAB_PDATUM = MAX(LAB_PDATUM)
) TEMP2;
The problem is the having clause. You are mixing an unaggregated column (LAB_PDATUM) with an aggregated value (MAX(LAB_PDATUM)). What MySQL does is choose an arbitrary value for the column and compare it to the max.
Often, the arbitrary value will not be the maximum value, so the rows get filtered. The reference you give (although an accepted answer) is incorrect. I have put a comment there.
If you want the most recent value, here is a relatively easy way:
SELECT COUNT(MATNR) AS LESS
FROM (SELECT LAB_MTKNR AS MATNR, LAB_STG AS FACH, LAB_STGNR AS STUDIENGANG,
max(LAB_PDATUM) as maxLAB_PDATUM
FROM FKT_LAB
GROUP BY LAB_MTKNR, LAB_STG, LAB_STGNR
) TEMP2;
Selecting the maximum this way does not, however, affect the outer count, which is simply the number of groups.
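A small sketch of the aggregated form, again using Python's sqlite3 module with an in-memory SQLite database as a stand-in for MySQL (an assumption) and made-up rows. It only illustrates that grouping with MAX() yields one row per group, so the outer count matches the SELECT DISTINCT count. It does not try to reproduce MySQL's arbitrary-value behaviour in the HAVING clause, which is engine specific.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE FKT_LAB ("
    "LAB_MTKNR INTEGER, LAB_STG TEXT, LAB_STGNR INTEGER, LAB_PDATUM TEXT)"
)
# Made-up sample data: three distinct (MTKNR, STG, STGNR) groups, five rows.
conn.executemany(
    "INSERT INTO FKT_LAB VALUES (?, ?, ?, ?)",
    [
        (1, "INF", 1, "2012-01-10"),
        (1, "INF", 1, "2013-02-20"),
        (2, "MAT", 1, "2012-03-05"),
        (2, "MAT", 2, "2011-07-01"),
        (2, "MAT", 2, "2012-09-15"),
    ],
)

# Count of distinct key combinations, as in the first query of the question.
distinct_count = conn.execute(
    """
    SELECT COUNT(MATNR) FROM (
        SELECT DISTINCT LAB_MTKNR AS MATNR, LAB_STG AS FACH,
                        LAB_STGNR AS STUDIENGANG
        FROM FKT_LAB
    ) AS TEMP1
    """
).fetchone()[0]

# One row per group, carrying the most recent date of that group.
latest = conn.execute(
    """
    SELECT LAB_MTKNR AS MATNR, LAB_STG AS FACH, LAB_STGNR AS STUDIENGANG,
           MAX(LAB_PDATUM) AS maxLAB_PDATUM
    FROM FKT_LAB
    GROUP BY LAB_MTKNR, LAB_STG, LAB_STGNR
    ORDER BY MATNR, FACH, STUDIENGANG
    """
).fetchall()

print(distinct_count, len(latest))
for row in latest:
    print(row)
```

Because no group is filtered out, the number of grouped rows equals the distinct count in this sketch.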
I was wondering what the difference is between:
SELECT * FROM table WHERE cat = 'cat1' OR cat = 'cat2' OR cat = 'cat3'
And:
SELECT * FROM table WHERE cat in ('cat1', 'cat2', 'cat3')
Is there any difference? When I tried them, they both gave the same result.
They are identical; IN() is just a shorthand for listing out all of the OR conditions.
IN() just makes for a much shorter, and easier to read, syntax especially when you have a lot of OR clauses.
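For what it's worth, here is a quick check of that equivalence, sketched with Python's sqlite3 module and an in-memory SQLite database rather than MySQL (an assumption). The table name items and its rows are hypothetical; the column name cat and the values come from the question.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, cat TEXT)")
conn.executemany(
    "INSERT INTO items (cat) VALUES (?)",
    [("cat1",), ("cat2",), ("cat3",), ("cat4",), (None,)],
)

# Chain of OR comparisons, as in the first query of the question.
with_or = conn.execute(
    "SELECT id FROM items "
    "WHERE cat = 'cat1' OR cat = 'cat2' OR cat = 'cat3' ORDER BY id"
).fetchall()

# The same filter with IN; one bound placeholder per value in the list.
cats = ["cat1", "cat2", "cat3"]
placeholders = ", ".join("?" for _ in cats)
with_in = conn.execute(
    f"SELECT id FROM items WHERE cat IN ({placeholders}) ORDER BY id",
    cats,
).fetchall()

print(with_or == with_in, with_in)
```

Building the placeholder list from the number of values keeps the IN form usable with bound parameters when the list length varies.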
There is one more thing that you can do quite easily with IN but you can't with =.
SELECT *
FROM `table`
WHERE `column` IN (
SELECT col_name
FROM `table2` -- or the same table
WHERE `some_column` = 5
)
So basically you search in a subset of another table, which sometimes comes in handy.
Real life usage:
You have a list of administrable user types (each with its own permissions) in another table, and you want to enforce that the user type actually exists.
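A minimal sketch of that kind of check, using Python's sqlite3 module with an in-memory SQLite database in place of MySQL (an assumption). The tables user_types and users, their columns, and the rows are all hypothetical names chosen for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
# Hypothetical schema: a lookup table of user types and a table of users.
conn.execute(
    "CREATE TABLE user_types (name TEXT PRIMARY KEY, administrable INTEGER)"
)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, user_type TEXT)")
conn.executemany(
    "INSERT INTO user_types VALUES (?, ?)",
    [("editor", 1), ("moderator", 1), ("root", 0)],
)
conn.executemany(
    "INSERT INTO users (user_type) VALUES (?)",
    [("editor",), ("root",), ("ghost",), ("moderator",)],
)

# Keep only users whose type exists in the other table and is administrable.
valid = conn.execute(
    """
    SELECT id, user_type
    FROM users
    WHERE user_type IN (
        SELECT name FROM user_types WHERE administrable = 1
    )
    ORDER BY id
    """
).fetchall()

print(valid)
```

Here the user with the unknown type "ghost" and the one with the non-administrable type "root" are both filtered out.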
I want to retrieve the count from a select query.
What is faster: count(*) or count(table_field_name)?
I want to know which way is faster for performance.
The difference is that COUNT(field) returns the count of NOT NULL values in the field, whereas COUNT(*) returns the count of rows.
COUNT(*) in MyISAM should be faster.
http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_count
At least on MyISAM tables, count(*) should be faster than count(fieldname), as it allows MySQL to use an index (most often the primary key) to do the counting. If the given fieldname is the primary key, it won't make any difference.
With *, MySQL won't be so dumb as to "load the data of the entire row", as others said. count(*) is always the fastest or one of the fastest options, while count(fieldname) could be slower, depending on which field is given.
EDIT:
The documentation says:
COUNT(*) is optimized to return very quickly [...]. This optimization applies only to MyISAM tables only
Read the documentation for more information about this topic.
Important note: count(*) returns the total count of rows, while count(fieldname) returns the count of rows where the given field isn't NULL. This is logically consistent, as with * MySQL can't know which NULL values to leave out. Always keep this in mind when using count(), as it may have a big impact on the result.
Related (SQL Server): Count(*) vs Count(1)
Could you please tell me what is better in performance (MySQL)? Count(*) or count(1)?
This is a MySQL answer.
They perform exactly the same - unless you are using MyISAM, then a special case for COUNT(*) exists. I always use COUNT(*) anyway.
https://dev.mysql.com/doc/refman/5.6/en/aggregate-functions.html#function_count
For MyISAM tables, COUNT(*) is optimized to return very quickly if the
SELECT retrieves from one table, no other columns are retrieved, and
there is no WHERE clause. For example:
mysql> SELECT COUNT(*) FROM student;
This optimization only applies to MyISAM
tables, because an exact row count is stored for this storage engine
and can be accessed very quickly. COUNT(1) is only subject to the same
optimization if the first column is defined as NOT NULL.
EDIT:
Some of you may have missed the dark attempt at humour. I prefer to keep this as a non-duplicate question for any such day when MySQL will do something different to SQL Server. So I threw a vote to reopen the question (with a clearly wrong answer).
The above MyISAM optimization applies equally to
COUNT(*)
COUNT(1)
COUNT(pk-column)
COUNT(any-non-nullable-column)
So the real answer is that they are always the same.
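The four forms listed above can at least be compared for their results. This sketch uses Python's sqlite3 module with an in-memory SQLite database instead of MySQL (an assumption), so it says nothing about the MyISAM optimization or about speed; the columns of the student table and its rows are hypothetical. A nullable column is included for contrast.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
# Hypothetical columns: a primary key, a NOT NULL column, a nullable column.
conn.execute(
    "CREATE TABLE student ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, nickname TEXT)"
)
conn.executemany(
    "INSERT INTO student (name, nickname) VALUES (?, ?)",
    [("Ann", "A"), ("Bob", None), ("Cy", None)],
)

counts = conn.execute(
    "SELECT COUNT(*), COUNT(1), COUNT(id), COUNT(name), COUNT(nickname) "
    "FROM student"
).fetchone()

# The first four agree; only the nullable column gives a different number.
print(counts)
conn.close()
```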