How is COUNT(*) different to COUNT(id)?

I have the following query:
select column_name, count(column_name)
from table
group by column_name
having count(column_name) > 1;
What would be the difference if I replaced all calls to count(column_name) with count(*)?
This question was inspired by How do I find duplicate values in a table in Oracle?.
To clarify the accepted answer (and maybe my question), replacing count(column_name) with count(*) would return an extra row in the result that contains a null and the count of null values in the column.
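The extra NULL row described above can be reproduced with a small sketch. It uses Python's built-in sqlite3 module rather than Oracle or MySQL, and the table name demo and its sample values are made up for illustration; the NULL handling of COUNT shown here is standard SQL behaviour.

```python
import sqlite3

# In-memory database; "demo" and its contents are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (column_name TEXT)")
conn.executemany(
    "INSERT INTO demo VALUES (?)",
    [("a",), ("a",), ("b",), (None,), (None,)],
)

# COUNT(column_name) ignores NULLs, so the NULL group counts as 0 and is filtered out.
by_column = conn.execute(
    "SELECT column_name, COUNT(column_name) FROM demo "
    "GROUP BY column_name HAVING COUNT(column_name) > 1"
).fetchall()

# COUNT(*) counts rows, so the NULL group counts as 2 and survives the HAVING filter.
by_star = conn.execute(
    "SELECT column_name, COUNT(*) FROM demo "
    "GROUP BY column_name HAVING COUNT(*) > 1"
).fetchall()

print(by_column)
print(by_star)
```

With count(*) the result gains one row whose column value is NULL, which matches the clarification above.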

count(*) counts NULLs and count(column) does not
[edit] added this code so that people can run it
create table #bla(id int,id2 int)
insert #bla values(null,null)
insert #bla values(1,null)
insert #bla values(null,1)
insert #bla values(1,null)
insert #bla values(null,1)
insert #bla values(1,null)
insert #bla values(null,null)
select count(*),count(id),count(id2)
from #bla
results
7 3 2
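For readers without SQL Server at hand, here is a rough equivalent of the script above using Python's built-in sqlite3 module. The table is named bla instead of #bla because SQLite has no # temp-table syntax; that renaming is an adaptation, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bla (id INTEGER, id2 INTEGER)")

# Same seven rows as the SQL Server script; None is stored as NULL.
rows = [
    (None, None),
    (1, None),
    (None, 1),
    (1, None),
    (None, 1),
    (1, None),
    (None, None),
]
conn.executemany("INSERT INTO bla VALUES (?, ?)", rows)

# COUNT(*) counts every row; COUNT(col) counts only rows where col is not NULL.
counts = conn.execute(
    "SELECT COUNT(*), COUNT(id), COUNT(id2) FROM bla"
).fetchone()
print(counts)  # (7, 3, 2)
```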

Another minor difference, between using * and a specific column, is that in the column case you can add the keyword DISTINCT, and restrict the count to distinct values:
select column_a, count(distinct column_b)
from table
group by column_a
having count(distinct column_b) > 1;
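A small sketch of the DISTINCT variant, again using Python's sqlite3 module. The table name t and its rows are invented for illustration; only the column names column_a and column_b come from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (column_a TEXT, column_b INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("x", 1), ("x", 1), ("x", 2), ("y", 3), ("y", 3), ("y", None)],
)

# COUNT(column_b) counts non-NULL values; COUNT(DISTINCT column_b) counts
# each non-NULL value once per group.
per_group = conn.execute(
    "SELECT column_a, COUNT(column_b), COUNT(DISTINCT column_b) "
    "FROM t GROUP BY column_a ORDER BY column_a"
).fetchall()

# Only groups with more than one distinct value pass the HAVING filter.
multi_valued = conn.execute(
    "SELECT column_a, COUNT(DISTINCT column_b) FROM t "
    "GROUP BY column_a HAVING COUNT(DISTINCT column_b) > 1"
).fetchall()

print(per_group)     # [('x', 3, 2), ('y', 2, 1)]
print(multi_valued)  # [('x', 2)]
```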

A further and perhaps subtle difference is that in some database implementations the count(*) is computed by looking at the indexes on the table in question rather than the actual data rows. Since no specific column is specified, there is no need to bother with the actual rows and their values (as there would be if you counted a specific column). Allowing the database to use the index data can be significantly faster than making it count "real" rows.

The explanation in the docs helps to explain this:
COUNT(*) returns the number of items in a group, including NULL values and duplicates.
COUNT(expression) evaluates expression for each row in a group and returns the number of nonnull values.
So count(*) includes nulls, the other method doesn't.

We can use the Stack Exchange Data Explorer to illustrate the difference with a simple query. The Users table in Stack Overflow's database has columns that are often left blank, like the user's Website URL.
-- count(column_name) vs. count(*)
-- Illustrates the difference between counting a column
-- that can hold null values, a 'not null' column, and count(*)
select count(WebsiteUrl), count(Id), count(*) from Users
If you run the query above in the Data Explorer, you'll see that the count is the same for count(Id) and count(*) because the Id column doesn't allow null values. The WebsiteUrl count is much lower, though, because that column allows null.

The COUNT(*) expression tells SQL Server to count all the rows in a table, including rows that contain NULLs.
COUNT(column_name) counts only the rows that have a non-null value in that column.
Please see the following code, tested on SQL Server 2008:
-- Table variable
DECLARE @Table TABLE
(
CustomerId int NULL
, Name nvarchar(50) NULL
)
-- Insert some records for tests
INSERT INTO @Table VALUES( NULL, 'Pedro')
INSERT INTO @Table VALUES( 1, 'Juan')
INSERT INTO @Table VALUES( 2, 'Pablo')
INSERT INTO @Table VALUES( 3, 'Marcelo')
INSERT INTO @Table VALUES( NULL, 'Leonardo')
INSERT INTO @Table VALUES( 4, 'Ignacio')
-- Count all the rows by indicating *
SELECT COUNT(*) AS 'AllRowsCount'
FROM @Table
-- Count only rows with a value in the column (excludes NULLs)
SELECT COUNT(CustomerId) AS 'OnlyNotNullCounts'
FROM @Table

COUNT(*): returns the total number of records in a table (including records with NULL values).
COUNT(Column Name): returns the total number of non-NULL records. That is, it does not count records where that particular column is NULL.

Basically, the COUNT(*) function counts all the rows in a table whereas COUNT(COLUMN_NAME) does not; that is, it excludes null values, as everyone here has already answered.
The more interesting part is performance: to keep queries and the database optimized, it is generally better to use COUNT(*) rather than COUNT(COLUMN_NAME), unless you are doing multiple counts or a complex query. Otherwise, counting a column can noticeably lower database performance when dealing with a large volume of data.

Further elaborating on the answers given by @SQLMenace and @Brannon, here is an example that uses the GROUP BY clause, which was mentioned by the OP but is not present in the answer by @SQLMenace:
CREATE TABLE table1 (
id INT
);
INSERT INTO table1 VALUES
(1),
(2),
(NULL),
(2),
(NULL),
(3),
(1),
(4),
(NULL),
(2);
SELECT * FROM table1;
+------+
| id |
+------+
| 1 |
| 2 |
| NULL |
| 2 |
| NULL |
| 3 |
| 1 |
| 4 |
| NULL |
| 2 |
+------+
10 rows in set (0.00 sec)
SELECT id, COUNT(*) FROM table1 GROUP BY id;
+------+----------+
| id | COUNT(*) |
+------+----------+
| 1 | 2 |
| 2 | 3 |
| NULL | 3 |
| 3 | 1 |
| 4 | 1 |
+------+----------+
5 rows in set (0.00 sec)
Here, COUNT(*) counts the number of occurrences of each type of id including NULL
SELECT id, COUNT(id) FROM table1 GROUP BY id;
+------+-----------+
| id | COUNT(id) |
+------+-----------+
| 1 | 2 |
| 2 | 3 |
| NULL | 0 |
| 3 | 1 |
| 4 | 1 |
+------+-----------+
5 rows in set (0.00 sec)
Here, COUNT(id) counts the number of occurrences of each type of id but does not count the number of occurrences of NULL
SELECT id, COUNT(DISTINCT id) FROM table1 GROUP BY id;
+------+--------------------+
| id | COUNT(DISTINCT id) |
+------+--------------------+
| NULL | 0 |
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 1 |
+------+--------------------+
5 rows in set (0.00 sec)
Here, COUNT(DISTINCT id) counts the number of occurrences of each type of id only once (does not count duplicates) and also does not count the number of occurrences of NULL

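It is often suggested to use
Count(1) in place of a column name or *
to count the number of rows in a table, on the theory that it is faster because it never has to check whether the column name exists in the table. In practice, engines such as MySQL (InnoDB) and SQL Server handle Count(1) and Count(*) the same way, so no speed-up should be expected.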

There is no difference if the column is fixed (cannot be NULL) in your table. If you want to count more than one column, you have to specify which columns you need to count.

As mentioned in the previous answers, Count(*) counts every row, even when the column is NULL, whereas count(Columnname) counts a row only if the column has a value.
It's always best practice to avoid * (Select *, count *, …)

Related

What is a self join and how does it work? I know it is used to join a table to itself; I am trying to understand the working mechanism behind it

Why is the output different if I self join the table using a.b instead of a.c? Moreover, why is the count for Y equal to 1 and not 2?
CREATE TABLE A (
B INT,
C CHAR(20)
);
INSERT INTO A VALUES (1,"X"),(2,"X"),(3,"Y"),(1,"T"),(2,"T");
SELECT
*
FROM
A;
SELECT
a.c, COUNT(a.c) AS c1
FROM
A a
JOIN
A a1 ON a.c = a1.c
GROUP BY a.c;
The reason you get duplicates is that there is no GROUP BY or DISTINCT clause in your query reducing the results to distinct rows based on the column value. Your query returns a count for every row of the table; that is normal.
The reason your results are different is that you're joining on a different column: if you join on the number column, you are counting the numbers, not the letters.
TableA
If your raw table values are as follows:
SELECT * FROM tableA;
id | letter
-----------
1 | X
1 | T
2 | X
2 | T
3 | Y
Example 1
You can manually write a query in your select statement as follows. Effectively, a separate query is performed for every row returned.
EXPLAIN -- show query breakdown
SELECT
DISTINCT -- get distinct letter, no duplicate rows.
a1.letter
, (SELECT count(*) FROM tableA a2 WHERE a1.letter = a2.letter) letter_cnt
FROM
tableA a1
;
This query is less efficient, requiring two separate queries.
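id | select_type | table | type | possible_keys | key | ref | rows | filtered | Extra
------------------------------------------------------------------------------------
1 | PRIMARY | a1 | ALL | NULL | NULL | NULL | 5 | 100 | NULL
2 | DEPENDENT SUBQUERY | a2 | ALL | NULL | NULL | NULL | 5 | 20 | Using where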
letter | letter_cnt
-----------
X | 2
Y | 1
T | 2
Example 2
EXPLAIN -- show query breakdown
SELECT
a.letter
, COUNT(a.letter) AS cnt
FROM
tableA a
GROUP BY
a.letter;
This method is more efficient: it uses a single query and groups by the first column, giving you distinct letter rows.
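id | select_type | table | type | possible_keys | key | ref | rows | filtered | Extra
------------------------------------------------------------------------------------
1 | SIMPLE | a | ALL | NULL | NULL | NULL | 5 | 100 | Using temporary; Using filesort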
letter | cnt
-----------
T | 2
X | 2
Y | 1
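To see the mechanism of the self join itself, the sketch below (Python's sqlite3 module, with the five rows from the question loaded into a table named tableA) compares a plain GROUP BY count with the count after joining the table to itself on the letter column. The join pairs every row with every row that shares its letter, so a letter with n rows yields n * n joined rows. That is why Y, which has a single row, stays at 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (id INTEGER, letter TEXT)")
conn.executemany(
    "INSERT INTO tableA VALUES (?, ?)",
    [(1, "X"), (2, "X"), (3, "Y"), (1, "T"), (2, "T")],
)

# Plain grouping: one count per letter.
plain = dict(conn.execute(
    "SELECT letter, COUNT(letter) FROM tableA GROUP BY letter"
).fetchall())

# Self join on letter: each row is paired with every row that has the same
# letter (including itself), so the counts are squared.
joined = dict(conn.execute(
    "SELECT a.letter, COUNT(a.letter) FROM tableA a "
    "JOIN tableA a1 ON a.letter = a1.letter GROUP BY a.letter"
).fetchall())

print(plain)
print(joined)
```

Here plain maps X and T to 2 and Y to 1, while joined maps X and T to 4 and Y to 1.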

Kindly explain clearly the CTE Query?

I saw this query for deleting duplicate rows:
WITH tblTemp as
(
SELECT ROW_NUMBER() Over(PARTITION BY Name,Department ORDER BY Name) As RowNumber, *
FROM <table_name>
)
DELETE FROM tblTemp where RowNumber > 1
However, I can't understand the query. Could you please explain it clearly?
The most instructive thing for you to do would be to select from the CTE:
SELECT *
FROM tblTemp
ORDER BY Department, Name;
Off the top of my head, you might see a result set looking like this:
Name | Department | RowNumber
Jon Skeet | Software | 1
Gordon Linoff | Database | 1
Gordon Linoff | Database | 2
The (Gordon Linoff, Database) record appears in duplicate, so there is a row number value in the second record which is greater than 1. Your delete logic would remove this duplicate, but would not affect the (Jon Skeet, Software) record, which has no duplicate.
The common table expression is locating duplicate rows, then the delete query that follows removes all but one row if there are duplicates.
SELECT
ROW_NUMBER() Over(PARTITION BY Name,Department ORDER BY Name) As RowNumber
,*
FROM <table_name>
In the over() clause, the partition by and order by define where numbering restarts at 1; within each partition, each additional row gets a larger row number. Hence the rows where [RowNumber] = 1 form a unique set of rows for name and department.
see: Over() and row_number()
Demonstration
CREATE TABLE mytable(
name VARCHAR(6) NOT NULL
,department VARCHAR(11) NOT NULL
);
INSERT INTO mytable(name,department) VALUES ('fred','sales');
INSERT INTO mytable(name,department) VALUES ('fred','sales');
INSERT INTO mytable(name,department) VALUES ('barney','admin');
INSERT INTO mytable(name,department) VALUES ('barney','admin');
INSERT INTO mytable(name,department) VALUES ('wilma','engineering');
WITH tblTemp as
(
SELECT ROW_NUMBER() Over(PARTITION BY Name,Department ORDER BY Name) As RowNumber, *
FROM mytable
)
DELETE FROM tblTemp where RowNumber > 1
;
select
*
from mytable
;
+---+--------+-------------+
| | name | department |
+---+--------+-------------+
| 1 | fred | sales |
| 2 | barney | admin |
| 3 | wilma | engineering |
+---+--------+-------------+
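The same idea can be tried outside SQL Server with Python's sqlite3 module. This is only a sketch under two assumptions: the bundled SQLite is version 3.25 or newer (needed for window functions), and, because SQLite cannot DELETE through a CTE, the delete step targets SQLite's implicit rowid instead. The numbering logic is the same ROW_NUMBER() ... PARTITION BY as above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (name TEXT NOT NULL, department TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [
        ("fred", "sales"),
        ("fred", "sales"),
        ("barney", "admin"),
        ("barney", "admin"),
        ("wilma", "engineering"),
    ],
)

# Step 1: look at the numbering. Duplicates get row numbers 2, 3, ...
numbered = conn.execute(
    "SELECT name, department, "
    "ROW_NUMBER() OVER (PARTITION BY name, department ORDER BY name) AS rn "
    "FROM mytable ORDER BY name, department, rn"
).fetchall()

# Step 2: delete every row whose number within its partition is above 1.
# SQLite adaptation: identify the rows by rowid.
conn.execute(
    "DELETE FROM mytable WHERE rowid IN ("
    "  SELECT rid FROM ("
    "    SELECT rowid AS rid, "
    "    ROW_NUMBER() OVER (PARTITION BY name, department ORDER BY rowid) AS rn "
    "    FROM mytable"
    "  ) WHERE rn > 1"
    ")"
)
remaining = conn.execute(
    "SELECT name, department FROM mytable ORDER BY name"
).fetchall()

print(numbered)
print(remaining)
```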

SELECT where the first two numbers are equal

I have this in my database:
75012
75016
94400
94500
94300
78400
I would like to group the strings whose first two digits match and show how many there are of each, so it will output 75 = 2, 94 = 3, 78 = 1.
Here is what I tried:
select cpostal from fiche_personne WHERE cpostal LIKE LEFT(cpostal, 2);
You need to use a GROUP BY clause in your query.
SELECT LEFT(cpostal,2), COUNT(*) AS total
FROM fiche_personne
GROUP BY LEFT(cpostal,2)
Please note that COUNT(*) isn't necessarily the best way to complete the query, but I don't know your actual table structure, so you may want to change this to an actual column name.
select count(cpostal) from fiche_personne WHERE LEFT(cpostal, 2) = '94';
Resource: https://www.w3schools.com/sql/func_mysql_count.asp
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(i INT NOT NULL PRIMARY KEY);
INSERT INTO my_table VALUES
(75012),
(75016),
(94400),
(94500),
(94300),
(78400);
SELECT MIN(i) i, COUNT(*) total FROM my_table GROUP BY LEFT(i,2);
+-------+-------+
| i | total |
+-------+-------+
| 75012 | 2 |
| 78400 | 1 |
| 94300 | 3 |
+-------+-------+
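A quick way to check the grouping logic is the sketch below, using Python's sqlite3 module. SQLite has no LEFT() function, so substr(cpostal, 1, 2) stands in for LEFT(cpostal, 2); storing cpostal as text is also an assumption, since the question does not give the column type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fiche_personne (cpostal TEXT)")
conn.executemany(
    "INSERT INTO fiche_personne VALUES (?)",
    [("75012",), ("75016",), ("94400",), ("94500",), ("94300",), ("78400",)],
)

# Group on the first two characters and count the rows in each group.
prefix_counts = conn.execute(
    "SELECT substr(cpostal, 1, 2) AS prefix, COUNT(*) AS total "
    "FROM fiche_personne "
    "GROUP BY substr(cpostal, 1, 2) "
    "ORDER BY prefix"
).fetchall()

print(prefix_counts)  # [('75', 2), ('78', 1), ('94', 3)]
```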

SELECT that returns list of values not occurring in any row

Query:
select id from users where id in (1,2,3,4,5)
If the users table contains ids 1, 2, 3, this would return 1, 2, and 3. I want a query that would return 4 and 5. In other words, I don't want the query to return any rows that exist in the table, I want to give it a list of numbers and get the values from that list that don't appear in the table.
(updated to clarify question following several inapplicable answers)
If you don't want to (explicitly) use temporary tables, this will work:
SELECT id FROM (
(SELECT 1 AS id) UNION ALL
(SELECT 2 AS id) UNION ALL
(SELECT 3 AS id) UNION ALL
(SELECT 4 AS id) UNION ALL
(SELECT 5 AS id)
) AS list
LEFT JOIN users USING (id)
WHERE users.id IS NULL
However, it is quite ugly, quite long, and I am dubious about how it would perform if the list of IDs is long.
Had the same need and built on the answer by BugFinder using a temporary table in session. This way it will automatically be destroyed after I'm done with the query, so I don't have to deal with house cleaning as I will run this type of query often.
Create the temporary table:
CREATE TEMPORARY TABLE tmp_table (id INT UNSIGNED);
Populate tmp_table with the values you will check:
INSERT INTO tmp_table (id) values (1),(2),(3),(4),(5);
With the table created and populated, run the query as with any regular table:
SELECT tmp_table.id
FROM tmp_table
LEFT JOIN users u
ON tmp_table.id = u.id
WHERE u.id IS NULL;
This info on MySQL Temporary Tables was also useful
Given that the numbers are a fixed list, the quickest way I can think of is to have a test table populated with those numbers and join against it.
This select statement is untested, but you will follow the principle:
select test.number
from test
left join
users
on
test.number = users.id
where users.id is null
Then you'll get back all the numbers that don't have a matching users.id, and so can fill in the holes.
A different option is to use another table containing all possible ids and then do a select from there:
mysql> describe ids;
+-------+-----------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-----------------+------+-----+---------+-------+
| id | int(5) unsigned | NO | | 0 | |
+-------+-----------------+------+-----+---------+-------+
1 row in set (0.05 sec)
mysql> select * from ids;
+----+
| id |
+----+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
+----+
5 rows in set (0.00 sec)
mysql> select * from users;
+----+
| id |
+----+
| 1 |
| 2 |
| 3 |
+----+
3 rows in set (0.00 sec)
mysql> select id from ids where id not in (select id from users);
+----+
| id |
+----+
| 4 |
| 5 |
+----+
2 rows in set (0.04 sec)
Added side effect - allows you to expand the result list by inserting into the ids table
select missing.id
from
(select ELT(@indx, 1,2,3,4,5) as id, @indx:=@indx+1
from (select @indx:=1) init,
users
where ELT(@indx, 1,2,3,4,5) is not null
) missing
left join users u using(id)
where u.id is null;
What you have in here is:
ELT together with the variable @indx allows you to 'transpose' the list into a column.
(select @indx:=1) is needed to initialize @indx to 1.
The users table in the inner select is needed so that MySQL has something to iterate on. This means the size of your list cannot exceed the number of rows in the users table; if it does, you can use any other table that is big enough in place of the inner users. The table itself does not matter, only its size, because it just gives MySQL something to iterate on.
The ELT(@indx, 1,2,3,4,5) is not null condition in the nested select stops the iteration once the index exceeds your list size.
The rest is simple - left join and check for null.
Sorry, I cannot add comments, @Austin.
You have my +1.
Anyway, I am not sure if this works on MySQL, but you could replace all those single-row selects joined by unions with a value set, so you have something like this:
SELECT id FROM (
VALUES (1), (2), (3), (4), (5)
) AS list(id)
LEFT JOIN users USING (id)
WHERE users.id IS NULL
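The value-list anti-join can be tried with Python's sqlite3 module, where a VALUES list inside a CTE plays the role of the derived table. This is a sketch only: the helper name missing_ids and the CTE name wanted are invented here, and SQLite's syntax differs slightly from MySQL's.

```python
import sqlite3


def missing_ids(conn, candidates):
    """Return the candidate ids that have no matching row in users."""
    if not candidates:
        return []
    # One "(?)" per candidate builds the VALUES list; values are bound safely.
    placeholders = ", ".join("(?)" for _ in candidates)
    sql = (
        f"WITH wanted(id) AS (VALUES {placeholders}) "
        "SELECT wanted.id FROM wanted "
        "LEFT JOIN users ON users.id = wanted.id "
        "WHERE users.id IS NULL "  # no match in users means the id is missing
        "ORDER BY wanted.id"
    )
    return [row[0] for row in conn.execute(sql, list(candidates))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO users VALUES (?)", [(1,), (2,), (3,)])

missing = missing_ids(conn, [1, 2, 3, 4, 5])
print(missing)  # [4, 5]
```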

Duplicating rows in sql query

I have a table containing several fields. The primary key is userId. Currently the user id column contains values '1,2,3,4...etc' like so:
+------+
|userId|
+------+
| 1 |
| 2 |
| 3 |
| 4 |
...etc
I now want to add new rows ending in a,b,c, like so:
+------+
|userId|
+------+
| 1 |
| 1a |
| 1b |
| 1c |
| 2 |
| 2a |
| 2b |
| 2c |
...etc
The new rows should be identical to their parent row, except for the userId. (i.e. 1a,1b & 1c should match 1)
Also I can't guarantee that there won't already be a few 'a', 'b' or 'c's in userid column.
Is there a way to write an sql query to do this quickly and easily?
DON'T DO IT! You will run into more problems than the one you are trying to solve.
Add a new column to store the letter and make the primary key cover the original UserId and this new column.
If you ever just want the userId, you need to split the letter portion off, which will be expensive for your query and be a real pain.
I agree with KM. I'm not sure why you're creating these duplicate/composite IDs, but it feels like an uncomfortable direction to take.
That said, there is only really one obstacle to overcome: apparently you can't select from and insert into the same table in MySQL.
So, you need to insert into a Temporary Table first, then insert into the real table...
CREATE TEMPORARY TABLE MyNewUserIDs (
UserID VARCHAR(32)
);
INSERT INTO
MyNewUserIDs
SELECT
CONCAT(myTable.UserID, suffix.val)
FROM
myTable
INNER JOIN
(SELECT 'A' as val UNION ALL SELECT 'B' UNION ALL SELECT 'C' UNION ALL SELECT 'D') AS suffix
ON RIGHT(myTable.UserID, 1) <> suffix.val
WHERE
NOT EXISTS (SELECT * FROM myTable AS lookup WHERE UserID = CONCAT(myTable.UserID, suffix.val));
INSERT INTO
myTable
SELECT
UserID
FROM
MyNewUserIDs;
Depending on your environment, you may want to look into locking the tables, so that changes are not made between creating the list of IDs and inserting them into your table.
From a SQL perspective it is quite simple to generate the extra rows, so I'll do that here.
@KM's answer tells you how to store it as 2 distinct values, which I've assumed here. Feel free to concatenate userid and suffix if you prefer.
INSERT myTable (userid, suffix, col1, col2, ...coln)
SELECT M.userid, X.suffix, M.col1, M.col2, ..., M.coln
FROM
myTable M
CROSS JOIN
(SELECT 'a' AS Suffix UNION ALL SELECT 'b' UNION ALL SELECT 'c') X
WHERE
NOT EXISTS (SELECT *
FROM
myTable M2
WHERE
M2.userid = M.userid AND M2.Suffix = X.Suffix)
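A runnable sketch of the two-column approach, using Python's sqlite3 module. Several details are assumptions made for the demo and are not in the answers above: the table has a single payload column called name, parent rows carry an empty-string suffix, and the extra M.suffix = '' filter makes sure only parent rows are copied so that an existing 'a' row is not duplicated again.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable ("
    "  userid TEXT NOT NULL,"
    "  suffix TEXT NOT NULL DEFAULT '',"
    "  name TEXT,"
    "  PRIMARY KEY (userid, suffix)"
    ")"
)
# Hypothetical starting data: user 1 already has an 'a' copy.
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?)",
    [("1", "", "alice"), ("1", "a", "alice"), ("2", "", "bob")],
)

# Cross join each parent row with the suffix list, skipping pairs that exist.
conn.execute(
    "INSERT INTO myTable (userid, suffix, name) "
    "SELECT M.userid, X.suffix, M.name "
    "FROM myTable M "
    "CROSS JOIN (SELECT 'a' AS suffix UNION ALL SELECT 'b' UNION ALL SELECT 'c') X "
    "WHERE M.suffix = '' "
    "AND NOT EXISTS ("
    "  SELECT 1 FROM myTable M2 "
    "  WHERE M2.userid = M.userid AND M2.suffix = X.suffix"
    ")"
)

keys = conn.execute(
    "SELECT userid, suffix FROM myTable ORDER BY userid, suffix"
).fetchall()
print(keys)
```

After the insert, each user has the parent row plus one row per suffix, and the pre-existing ('1', 'a') row is left untouched.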