count(*) and count(column_name), what's the diff? - mysql

What is the difference between COUNT(*) and COUNT(column_name) in MySQL?

COUNT(*) counts all rows in the result set (or group if using GROUP BY).
COUNT(column_name) only counts those rows where column_name is NOT NULL. This may be slower in some situations even if there are no NULL values because the value has to be checked (unless the column is not nullable).
COUNT(1) is the same as COUNT(*) since 1 can never be NULL.
To see the difference in the results you can try this little experiment:
CREATE TABLE table1 (x INT NULL);
INSERT INTO table1 (x) VALUES (1), (2), (NULL);
SELECT
COUNT(*) AS a,
COUNT(x) AS b,
COUNT(1) AS c
FROM table1;
Result:
a b c
3 2 3

Depending on the column definition (i.e., whether your column allows NULL), you could get different results, and COUNT(column) could be slower in some situations, as Mark already said.

There is usually little or no performance difference between COUNT(*) and COUNT(1).
With COUNT(ColumnName), however, the database has to check whether the column holds a NULL value, because NULLs are eliminated from aggregates. So COUNT(*) or COUNT(1) is preferable to COUNT(ColumnName), unless you specifically want to skip NULLs or you need COUNT(DISTINCT ColumnName).

In most cases there's little difference, and COUNT(*) or COUNT(1) is generally preferred. However, there's one important situation where you must use COUNT(columnname): outer joins.
If you're performing an outer join from a parent table to a child table, and you want zero counts for parent rows that have no related rows in the child table, you have to use COUNT(column in child table). When there are no matches, that column is NULL in the joined row, so it is not counted and you get the desired zero count (COUNT itself never returns NULL, so no IFNULL() or COALESCE() wrapper is needed). If you use COUNT(*), it counts the joined row that carries the parent table's data, so you get a count of 1.
SELECT c.name, COUNT(o.id) AS order_count
FROM customers AS c
LEFT JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.id, c.name;
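The outer-join behaviour can be checked without a MySQL server. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in (an assumption; this aspect of COUNT is standard SQL and should behave the same way in MySQL). The customers and orders rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
# Hypothetical sample data: Alice has two orders, Bob has none.
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1), (11, 1);
""")

# COUNT(*) counts every joined row; COUNT(o.id) skips rows where o.id is NULL.
rows = conn.execute("""
    SELECT c.name,
           COUNT(*)    AS star_count,
           COUNT(o.id) AS order_count
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.id
""").fetchall()

for name, star_count, order_count in rows:
    print(name, star_count, order_count)
# Alice 2 2
# Bob 1 0
```

For Bob, the outer join produces one row whose o.id is NULL, so COUNT(*) reports 1 while COUNT(o.id) reports 0.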

Related

Apache Drill: Providing a limit in the subquery for a lateral join is not returning the correct results

I am trying to create a simple query with an inner lateral join, but I want to restrict the join to a single result in the subquery:
select b.`CODE`
from foo.bar.`BRANCH` b
inner join lateral (
select branch_id
from foo.bar.`BRANCH_DISTANCE`
where branch_id=b.CODE
and distance < 100
limit 1
) on true
The BRANCH_DISTANCE table contains the distances between any two branches and I want to return all branches that are within 100 km of another branch, which is why in the subquery, as long as there is one record that contains the branch and its distance is less than 100, it should return the branch (and stop looking for any further matches).
But when I add the limit, the query returns only one record. On removing the limit, around 2000 records are returned.
If I replace the select b.CODE with select distinct b.CODE, I get around 500 results (which is the correct answer).
My objective is to not use the distinct keyword in the select statement and that is why I was adding the limit in the subquery so that the join is done not on every record in the BRANCH_DISTANCE table that contains the branch code and distance < 100 (because it is possible for a branch to be less than 100 km away from more than one branch).
A join can multiply the number of result rows when the join column contains duplicate values (here, branch_id, b.CODE, or both may contain duplicates).
To keep each branch to a single result row regardless of how many rows match in the subquery, use an IN clause instead.
So something like this should work as expected:
select b.`CODE`
from foo.bar.`BRANCH` b
where b.`CODE` in (
select branch_id
from foo.bar.`BRANCH_DISTANCE`
where distance < 100
)
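To see why the IN form avoids duplicates, here is a small sketch using Python's sqlite3 module. SQLite stands in for Apache Drill here (an assumption), and the table layout, the other_id column, and the distances are hypothetical.

```python
import sqlite3

# SQLite stand-in for Drill (assumption); schema and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE branch (code TEXT PRIMARY KEY);
    CREATE TABLE branch_distance (branch_id TEXT, other_id TEXT, distance REAL);
    INSERT INTO branch VALUES ('A'), ('B'), ('C'), ('D');
    INSERT INTO branch_distance VALUES
        ('A', 'B', 40), ('A', 'C', 90),
        ('B', 'A', 40), ('B', 'C', 150),
        ('C', 'A', 90), ('C', 'B', 150),
        ('D', 'A', 500);
""")

# Plain join: branch A matches two distance rows, so it appears twice.
joined = [r[0] for r in conn.execute("""
    SELECT b.code
    FROM branch AS b
    JOIN branch_distance AS d ON d.branch_id = b.code
    WHERE d.distance < 100
    ORDER BY b.code
""")]

# IN subquery: each branch is returned at most once, without DISTINCT.
filtered = [r[0] for r in conn.execute("""
    SELECT b.code
    FROM branch AS b
    WHERE b.code IN (
        SELECT branch_id FROM branch_distance WHERE distance < 100
    )
    ORDER BY b.code
""")]

print(joined)    # ['A', 'A', 'B', 'C']
print(filtered)  # ['A', 'B', 'C']
```

The IN predicate only tests membership, so the number of matching distance rows does not affect how many times a branch is returned.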

Explaining MySQL query with multiple tables listed in FROM

Tables a and b are not directly related.
What effect does listing both a and b have on the results?
select * from a,b where b.id in (1,2,3)
Can you explain this SQL?
Since you haven't specified a relationship between a and b, this produces a cross product. It's equivalent to:
SELECT *
FROM a
CROSS JOIN b
WHERE b.id IN (1, 2, 3)
It will combine every row in a with each row of b whose id is 1, 2, or 3. If a has 100 rows and b has one row for each of those three ids, the result will be 300 rows.
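The row arithmetic can be confirmed with a quick sketch in Python's sqlite3 module (SQLite is used as a stand-in for MySQL, which is an assumption; the table sizes are made up). It runs both the comma form and the explicit CROSS JOIN form.

```python
import sqlite3

# SQLite stand-in for MySQL (assumption): 100 rows in a, 10 rows in b.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE b (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO a VALUES (?)", [(i,) for i in range(1, 101)])
conn.executemany("INSERT INTO b VALUES (?)", [(i,) for i in range(1, 11)])

# Comma-separated tables with no join condition.
implicit = conn.execute(
    "SELECT COUNT(*) FROM a, b WHERE b.id IN (1, 2, 3)"
).fetchone()[0]

# The same query written as an explicit cross join.
explicit = conn.execute(
    "SELECT COUNT(*) FROM a CROSS JOIN b WHERE b.id IN (1, 2, 3)"
).fetchone()[0]

print(implicit, explicit)  # 300 300
```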
What you are using is a multitable SELECT.
Multitable SELECT (M-SELECT) is similar to the join operation. You
select values from different tables, use WHERE clause to limit the
rows returned and send the resulting single table back to the
originator of the query.
The difference with M-SELECT is that it would return multiple tables
as the result set. For more details: https://dev.mysql.com/worklog/task/?id=358
In other words, your query is:
SELECT *
FROM a
CROSS JOIN b
WHERE b.id in (1,2,3)

How to join a derived table

I have a complex query which results in a table which includes a time column. There are always two rows with the same time.
The result also contains a value column. The value of two rows with the same time is always different.
I now want to extend the query to join the rows with the same time together. So my thought was to join the derived table like this:
SELECT A.time, A.value AS valueA, B.value as valueB FROM
(
OLD_QUERY
) AS A INNER JOIN A AS B ON
A.time=B.time AND
A.value <> B.value;
However, the JOIN A AS B part of the query does not work. A is not recognized as the derived table. MySQL is searching for a table A in the database and does not find it.
So the question is: How can I join a derived table?
You cannot join a single reference to a table (or subquery) to itself; a subquery must be repeated.
Example: You cannot even do
SELECT A.* FROM sometable AS A INNER JOIN A ...
The A after the INNER JOIN is invalid unless you actually have a real table called A.
You can insert the subquery's results into another table and use that; but it cannot be a true TEMPORARY table, as those cannot be joined to themselves or referenced twice at all in almost any query. By "referenced twice", I mean joined, unioned, or used in a "WHERE IN" subquery when it is already referenced in the FROM.
If nothing else distinguishes the rows, you can just use aggregation to get the two values:
select time, min(value), max(value)
from (<your query here>) a
group by time;
In MySQL 8+, you can use a CTE (common table expression):
with a as (
<your query here>
)
select a1.time, a1.value, a2.value
from a a1 join
a a2
on a1.time = a2.time and a1.value <> a2.value;
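As a rough check of the CTE approach, here is a sketch using Python's sqlite3 module (SQLite stands in for MySQL 8+, which is an assumption). The readings table is a hypothetical placeholder for the complex query. The sketch joins on a1.value < a2.value instead of <>, a deliberate variation so that each pair is returned once; with <>, every pair would come back twice, mirrored.

```python
import sqlite3

# SQLite stand-in for MySQL 8+ (assumption); "readings" is a hypothetical
# placeholder for the result of the complex query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE readings (time TEXT, value INTEGER);
    INSERT INTO readings VALUES
        ('10:00', 1), ('10:00', 7),
        ('11:00', 3), ('11:00', 5);
""")

# The CTE is defined once and referenced twice in the self-join.
pairs = conn.execute("""
    WITH a AS (
        SELECT time, value FROM readings
    )
    SELECT a1.time, a1.value, a2.value
    FROM a AS a1
    JOIN a AS a2
      ON a1.time = a2.time AND a1.value < a2.value
    ORDER BY a1.time
""").fetchall()

# The aggregation approach yields the same pairs when each time has two rows.
agg = conn.execute("""
    SELECT time, MIN(value), MAX(value)
    FROM readings
    GROUP BY time
    ORDER BY time
""").fetchall()

print(pairs)  # [('10:00', 1, 7), ('11:00', 3, 5)]
print(agg)    # [('10:00', 1, 7), ('11:00', 3, 5)]
```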

select distinct count(id) vs select count(distinct id)

I'm trying to get distinct values from a table. When I ran select distinct count(id) from table I got a count of over a million. However, when I ran select count(distinct id) from table I got a count of only around 300k. What is the difference between the two queries?
Thanks
When you do select distinct count(id) then you are basically doing:
select distinct cnt
from (select count(id) as cnt from t) t;
Because the inner query only returns one row, the distinct is not doing anything. The query counts the number of rows in the table (well, more accurately, the number of rows where id is not null).
On the other hand, when you do:
select count(distinct id)
from t;
Then the query counts the number of different values that id takes on in the table. This would appear to be what you want.
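A tiny experiment makes the difference concrete. This sketch uses Python's sqlite3 module as a stand-in for MySQL (an assumption), with a made-up table t holding duplicate ids and one NULL.

```python
import sqlite3

# SQLite stand-in for MySQL (assumption); the ids are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [(5,), (5,), (5,), (8,), (None,)],
)

# DISTINCT is applied to the single row produced by COUNT(id).
distinct_of_count = conn.execute("SELECT DISTINCT COUNT(id) FROM t").fetchall()

# DISTINCT is applied to the ids before they are counted.
count_of_distinct = conn.execute("SELECT COUNT(DISTINCT id) FROM t").fetchall()

print(distinct_of_count)  # [(4,)]
print(count_of_distinct)  # [(2,)]
```

Four rows have a non-NULL id, but only two different values (5 and 8) occur.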
If id is the primary key, the value returned by select distinct count(id) will match the value returned by select count(distinct id), because every id is unique and non-NULL.
If id is not the primary key but has a unique constraint (on id alone, not in combination with any other column), the two values will also be equal, as in the primary key case.
If id is just another column, select distinct count(id) from table will return one row containing the number of records where the id column is NOT NULL, whereas select count(distinct id) from table will return one row containing the number of distinct non-NULL ids in the table.
In no case will either count exceed the total number of rows in your table.
The second select is definitely what you want, because it will aggregate the id's (if you have 10 records with id=5 then they will all be counted as one record) and the select will return "how many distinct id's were in the table".
However, the first select will do something odd, and I'm not entirely sure what it will do.

Get number of duplicate rows resulting from a DISTINCT query

I have a table with rows where a, b, and c are commonly the same.
I have a query that gives me each unique record. I'm trying to get the count of the duplicate records for each distinct record returned.
SELECT DISTINCT
a,
b,
c,
COUNT(id) as counted
FROM
table
The COUNT here returns the count for all the records. What I was looking for was the count of records identical to the unique record.
SELECT a,b,c,COUNT(*) FROM table GROUP BY a,b,c
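The GROUP BY form can be tried on a small data set. This sketch uses Python's sqlite3 module as a stand-in for MySQL (an assumption); the items table and its rows are hypothetical.

```python
import sqlite3

# SQLite stand-in for MySQL (assumption); table name and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, a TEXT, b TEXT, c TEXT)"
)
conn.executemany(
    "INSERT INTO items (a, b, c) VALUES (?, ?, ?)",
    [("x", "y", "z"), ("x", "y", "z"), ("x", "y", "w"), ("p", "q", "r")],
)

# One output row per distinct (a, b, c), with the number of rows in that group.
counts = conn.execute("""
    SELECT a, b, c, COUNT(*) AS counted
    FROM items
    GROUP BY a, b, c
    ORDER BY a, b, c
""").fetchall()

for row in counts:
    print(row)
# ('p', 'q', 'r', 1)
# ('x', 'y', 'w', 1)
# ('x', 'y', 'z', 2)
```

Rows that share a and b but differ in c, such as (x, y, w) and (x, y, z), are counted separately because all three columns are part of the grouping key.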
SELECT DISTINCT
a,
b,
c,
(
SELECT
COUNT(id)
FROM
table_name t1
WHERE
t2.a = t1.a
AND t2.b = t1.b
AND t2.c = t1.c
) AS counted
FROM
table_name t2
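The subquery above is known as an inline (correlated) subquery. In its WHERE clause, t1 and t2 are treated by the query as separate tables, although they refer to a single table in the database. For each row of t2, the subquery checks the equality against t1 and counts the matching rows. Because DISTINCT is applied to the selected columns, repeated rows collapse into one.
I hope I was able to explain it.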
Ah, figured this one out from a duplicate as I was writing the question - I figured I'd share my results as they were different enough from the answer I got mine from.
I have to use a subquery to query the non-distinct records. Then, I can use the values from the outer query in the subquery's WHERE clause.
SELECT DISTINCT
a,
b,
c,
(
SELECT
COUNT(id)
FROM
table_name t1
WHERE
t2.a = t1.a
AND t2.b = t1.b
AND t2.c = t1.c
) AS counted
FROM
table_name t2
This works. Let me know if there are gaps in my understanding.
With help from this answer: https://stackoverflow.com/a/14110336/1270996