SQL query which includes COUNT(*) in its SELECT clause confuses me - mysql

I'm new to SQL and still finding my way.
I have a schema diagram (with the tables ord_rec and include used below), and I've been asked to:
"Produce a list of number of items from each product which was ordered
in June 2004. Assume there's a function MONTH() and YEAR()"
The given solution is:
SELECT cat_num, COUNT(*)
FROM ord_rec AS O, include AS I
WHERE O.ord_num = I.ord_num AND
MONTH(O.ord_date) = 6 AND
YEAR(O.ord_date) = 2004
GROUP BY cat_num;
What confuses me is the COUNT(*), specifically the asterisk inside it.
Does it count all rows returned by the query? In other words, does the asterisk refer to all of the returned rows, or am I far off?
Is it any different from:
SELECT cat_num, COUNT(cat_num)
Thanks!

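COUNT(*) returns the number of rows in the result of the SELECT statement. It counts every row, including rows that contain NULL values and duplicate rows. With GROUP BY, the count is computed separately for each group, so in your query COUNT(*) gives the number of joined rows for each cat_num.
COUNT(cat_num) returns the number of rows in which cat_num is not NULL. In your grouped query the two differ only for a group where cat_num is NULL: COUNT(*) counts that group's rows, while COUNT(cat_num) returns 0 for it.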
Consider an example:
+-------+--------+
| Block | Range  |
+-------+--------+
| A     | 1-10   |
| A     | 10-1   |
| B     | (NULL) |
| B     | (NULL) |
| B     | (NULL) |
+-------+--------+
For this data, using the query:
SELECT
COUNT(*),
COUNT(t.`Block`),
COUNT(t.`Range`)
FROM
`test_table` t;
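you'll obtain these results:
+----------+----------------+----------------+
| count(*) | count(t.Block) | count(t.Range) |
+----------+----------------+----------------+
| 5        | 5              | 2              |
+----------+----------------+----------------+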
I hope that clears your confusion.
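A minimal sketch that reproduces the example above with Python's built-in sqlite3 module. An in-memory SQLite database stands in for MySQL here (an assumption for the sake of a runnable example); both treat NULL the same way in COUNT. The table and column names come from the example, and "Range" is double-quoted defensively because RANGE is a keyword in window-function syntax.

```python
import sqlite3


def count_example():
    """Reproduce the Block/Range example: COUNT(*) vs COUNT(column)."""
    conn = sqlite3.connect(":memory:")
    # "Range" is quoted defensively: RANGE is a keyword in window-function syntax.
    conn.execute('CREATE TABLE test_table ("Block" TEXT, "Range" TEXT)')
    conn.executemany(
        'INSERT INTO test_table ("Block", "Range") VALUES (?, ?)',
        [("A", "1-10"), ("A", "10-1"), ("B", None), ("B", None), ("B", None)],
    )
    # COUNT(*) counts every row; COUNT(col) skips rows where col IS NULL.
    row = conn.execute(
        'SELECT COUNT(*), COUNT(t."Block"), COUNT(t."Range") FROM test_table t'
    ).fetchone()
    conn.close()
    return row


print(count_example())  # (5, 5, 2)
```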

COUNT(*) returns the number of rows produced by the query (or by each group, when GROUP BY is used). It counts duplicate rows and rows that contain NULL values.
In general, COUNT() accepts *, ALL expression, DISTINCT expression, or a plain expression as its argument. Depending on that argument, it counts either all of the rows or only the rows (or distinct values) that meet some condition.
When * is used, COUNT() counts every record (row), even if some of its fields are NULL, whereas COUNT(column_name) skips any record whose column_name field is NULL.

Related

Multiple counts in a join

I am trying to get the count of ids from one table with a LEFT JOIN in a MySQL query. It works well with one count, but when I add a second count, its result is the same as the first. How can I fix this query to get two different counts?
Note: the first count should be based on the join condition;
the second count should be an overall count, not based on the join.
SELECT COUNT(*) counts all rows.
SELECT COUNT(column_name) counts only the values in that column that are not NULL.
So in your case, the first count should be COUNT() on a column from your joined (right-hand) table, which is NULL on left-table rows that found no match, and the second count should be COUNT(*).
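A small sketch of the two counts over a LEFT JOIN, using sqlite3 as a stand-in for MySQL. The authors/books schema and its data are hypothetical, made up only to show how NULLs from unmatched rows affect COUNT(column).

```python
import sqlite3


def join_counts():
    """Count matches vs. all joined rows in a LEFT JOIN (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE authors (id INTEGER PRIMARY KEY);
        CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER);
        INSERT INTO authors (id) VALUES (1), (2), (3);
        INSERT INTO books (id, author_id) VALUES (10, 1), (11, 1), (12, 2);
        """
    )
    row = conn.execute(
        """
        SELECT COUNT(b.id),          -- rows where the join found a book
               COUNT(*),             -- every row of the joined result
               COUNT(DISTINCT a.id)  -- distinct authors, matched or not
        FROM authors a
        LEFT JOIN books b ON b.author_id = a.id
        """
    ).fetchone()
    conn.close()
    return row


print(join_counts())  # (3, 4, 3)
```

Author 3 has no book, so b.id is NULL on its row and COUNT(b.id) skips it. COUNT(*) is 4 rather than 3 because author 1 matches two books; if the overall count should be the number of left-table rows, COUNT(DISTINCT a.id) avoids that inflation.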
For special cases you can also use boolean expressions, for example:
SELECT SUM(my_column = 'foo')
This counts only the rows where my_column is 'foo', because in MySQL a comparison evaluates to 1 if true and 0 if false (and to NULL when my_column is NULL, which SUM ignores).
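A quick sketch of the SUM-over-a-comparison trick, run in SQLite through sqlite3 (SQLite, like MySQL, evaluates a comparison to 1 or 0). The table name t and its rows are hypothetical; the NULL row shows that SUM simply skips it.

```python
import sqlite3


def conditional_count():
    """SUM over a comparison counts the rows where it is true."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (my_column TEXT)")
    conn.executemany(
        "INSERT INTO t (my_column) VALUES (?)",
        [("foo",), ("bar",), ("foo",), (None,)],
    )
    # my_column = 'foo' yields 1, 0, 1 and NULL; SUM skips the NULL.
    row = conn.execute("SELECT SUM(my_column = 'foo'), COUNT(*) FROM t").fetchone()
    conn.close()
    return row


print(conditional_count())  # (2, 4)
```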

Why does HAVING MAX() return a different value than SELECT MAX()?

I have a table log that contains, among others, a DateTime column called TimeOfLog and a foreign key Logger_ID.
What I was trying to do was get the newest entry per Logger_ID.
SELECT l.TimeOfLog AS TimeOfLog, l.Logger_ID AS Logger_ID
FROM `log` `l`
GROUP BY l.Logger_ID
HAVING MAX(l.TimeOfLog)
However, this returns a more or less random TimeOfLog for each Logger_ID. If I then run
SELECT MAX(l.TimeOfLog) AS TimeOfLog, l.Logger_ID AS Logger_ID
FROM `log` `l`
GROUP BY l.Logger_ID
I get the expected, newest, result. However, I'm pretty sure the Logger_ID is not the one belonging to that TimeOfLog.
Why is that/What am I misunderstanding here?
To get the maximum row, don't think group by; think filtering. Here is one method:
select l.*
from log l
where l.timeoflog = (select max(l2.timeoflog)
from log l2
where l2.logger_id = l.logger_id
);
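The filtering approach above can be tried out in SQLite via sqlite3. This is a sketch: the message column and the sample rows are hypothetical, and timestamps are stored as ISO-8601 text, which sorts chronologically so MAX() picks the newest one.

```python
import sqlite3


def latest_per_logger():
    """Keep only the rows whose timeoflog equals the per-logger maximum."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log (logger_id INTEGER, timeoflog TEXT, message TEXT)")
    conn.executemany(
        "INSERT INTO log VALUES (?, ?, ?)",
        [
            (1, "2024-01-01 10:00:00", "a"),
            (1, "2024-01-02 09:00:00", "b"),
            (2, "2024-01-01 12:00:00", "c"),
            (2, "2024-01-01 08:00:00", "d"),
        ],
    )
    # Correlated subquery: the maximum is computed per logger_id of the outer row.
    rows = conn.execute(
        """
        SELECT l.logger_id, l.timeoflog, l.message
        FROM log l
        WHERE l.timeoflog = (SELECT MAX(l2.timeoflog)
                             FROM log l2
                             WHERE l2.logger_id = l.logger_id)
        ORDER BY l.logger_id
        """
    ).fetchall()
    conn.close()
    return rows


print(latest_per_logger())
# [(1, '2024-01-02 09:00:00', 'b'), (2, '2024-01-01 12:00:00', 'c')]
```

If two rows of the same logger share the maximum timestamp, this query returns both of them.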
If you just want the maximum time, then aggregation is appropriate:
select logger_id, max(timeoflog)
from log l
group by logger_id;
You have the expression:
HAVING MAX(l.TimeOfLog)
This only checks that the maximum is neither 0 nor NULL; it does not pick out the row that holds the maximum.
You are misunderstanding how GROUP BY and HAVING work.
GROUP BY collapses all rows that have the same values in the specified columns into one group. If you select a column that is neither listed in GROUP BY nor wrapped in an aggregate function, MySQL returns an indeterminate value from one of the grouped rows (with the ONLY_FULL_GROUP_BY SQL mode, enabled by default since MySQL 5.7.5, such a query is usually rejected instead).
If you use an aggregate function such as MAX(), the function is computed over all rows of the group, and its result is returned.
HAVING is a filter similar to WHERE, except that WHERE is applied before grouping and HAVING is applied after grouping.
That is why you can use aggregate functions in it. A typical use of HAVING is, for example:
SELECT my_column
FROM my_table
GROUP BY my_column
HAVING COUNT(*) > 1;
This query returns only the values of my_column that appear more than once.
In your query, HAVING MAX(l.TimeOfLog) is true for every group whose maximum TimeOfLog is neither NULL nor 0, so it filters out practically nothing.
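A sketch contrasting a real HAVING filter with HAVING MAX(...) used as a bare truth test, run in SQLite via sqlite3. The readings table and its integer values are hypothetical; logger 2's maximum is 0, which is the only way such a HAVING clause drops a group.

```python
import sqlite3


def having_examples():
    """Contrast a real HAVING filter with HAVING MAX(...) used as a truth test."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (logger_id INTEGER, reading INTEGER)")
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?)",
        [(1, 5), (1, 7), (2, 0), (3, 4)],
    )
    # Groups with more than one row: only logger 1.
    dupes = [r[0] for r in conn.execute(
        "SELECT logger_id FROM readings GROUP BY logger_id "
        "HAVING COUNT(*) > 1 ORDER BY logger_id"
    )]
    # MAX(reading) is only tested for truth: the group whose maximum is 0 drops out.
    truthy = [r[0] for r in conn.execute(
        "SELECT logger_id FROM readings GROUP BY logger_id "
        "HAVING MAX(reading) ORDER BY logger_id"
    )]
    conn.close()
    return dupes, truthy


print(having_examples())  # ([1], [1, 3])
```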

Why can't this subquery return more than one row?

This query is generated by the Django ORM using RawSQL:
SELECT `productos`.`codigo_barras`, (
SELECT
articulos.costo_us * (1 + articulos.iva_coef)
FROM
articulos
INNER JOIN (
SELECT
articulos.id, MAX(encargosProveedor.fecha_entrega)
FROM
articulos, encargosProveedor_listado_articulos, encargosProveedor, itemArticulosProveedor
WHERE
articulos.id = itemArticulosProveedor.articulos_id AND
encargosProveedor.id = encargosProveedor_listado_articulos.encargosproveedor_id
GROUP BY
articulos.producto_id
)
AS ultimos ON articulos.id = ultimos.id
) AS `ultimo_precio` FROM `productos`
It's giving an error
1242 - Subquery returns more than 1 row
This is the result of the subquery
+----+--------------------------------------+
| id | MAX(encargosProveedor.fecha_entrega) |
+----+--------------------------------------+
| 1 | 2019-04-17 |
+----+--------------------------------------+
| 3 | 2019-04-17 |
+----+--------------------------------------+
I read the MySQL documentation, but I can't understand why returning two rows is a problem. I've tried a lot of alternatives.
Where is the problem?
Subqueries included as columns of a SELECT statement are called "scalar subqueries". A scalar subquery should be able to produce zero or one row only since its value (the scalar) will be placed in the returned row of the result set of the query, where there's room for one value only. Therefore, if a subquery returns more than a single row, it cannot be used directly as a SELECT column.
One option is to force it to produce one row at most, maybe using an aggregation function such as MAX(), MIN(), COUNT(), etc.
Another option is to join the subquery as a "table expression", where there is no restriction on the number of returned rows.
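A sketch of both options in SQLite via sqlite3, with a simplified, hypothetical products/items schema standing in for the Django tables. One caveat: SQLite does not raise an equivalent of MySQL error 1242; a scalar subquery there silently yields the first row's value, so this example only shows the two fixes, not the error.

```python
import sqlite3


def one_row_per_product():
    """Two ways to get one value per product (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE products (id INTEGER PRIMARY KEY, barcode TEXT);
        CREATE TABLE items (id INTEGER PRIMARY KEY, product_id INTEGER, price REAL);
        INSERT INTO products VALUES (1, 'P1'), (2, 'P2');
        INSERT INTO items VALUES (1, 1, 10.0), (2, 1, 12.0), (3, 2, 7.5);
        """
    )
    # Option 1: an aggregate keeps the correlated scalar subquery to one row.
    scalar = conn.execute(
        """
        SELECT p.barcode,
               (SELECT MAX(i.price) FROM items i WHERE i.product_id = p.id)
        FROM products p
        ORDER BY p.barcode
        """
    ).fetchall()
    # Option 2: a derived table in FROM may return any number of rows.
    joined = conn.execute(
        """
        SELECT p.barcode, m.max_price
        FROM products p
        JOIN (SELECT product_id, MAX(price) AS max_price
              FROM items
              GROUP BY product_id) AS m ON m.product_id = p.id
        ORDER BY p.barcode
        """
    ).fetchall()
    conn.close()
    return scalar, joined


print(one_row_per_product())
# ([('P1', 12.0), ('P2', 7.5)], [('P1', 12.0), ('P2', 7.5)])
```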
Too long for a comment.
It's not the
SELECT articulos.id, MAX(encargosProveedor.fecha_entrega)
FROM ...
subquery that's the problem. As that is part of a JOIN expression it is allowed to return more than one row. However, since that returns more than one row, the surrounding subquery:
SELECT articulos.costo_us * (1 + articulos.iva_coef)
FROM articulos
INNER JOIN (SELECT articulos.id, MAX(encargosProveedor.fecha_entrega)
FROM ...)
will also return more than one row.
You need to find a way to prevent the outer subquery from returning more than one row even when the inner one does, possibly by using an aggregate function such as MIN or MAX. Alternatively, you need to distinguish between rows in the inner subquery that share the same MAX(encargosProveedor.fecha_entrega) value (perhaps by ordering by another column and adding LIMIT 1) so that the query returns only one row. Note also that the outer subquery never references productos, so it is not correlated with the current product: it returns the same rows for every row of productos.
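The ORDER BY ... LIMIT 1 idea, combined with correlating the subquery to the outer product row, might look like this in SQLite via sqlite3. The schema is a hypothetical simplification (delivered plays the role of fecha_entrega); ties on the latest date are broken by the higher id, which is just one possible choice.

```python
import sqlite3


def last_price_per_product():
    """Price of the most recent delivery per product, ties broken by id."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE products (id INTEGER PRIMARY KEY, barcode TEXT);
        CREATE TABLE items (id INTEGER PRIMARY KEY, product_id INTEGER,
                            price REAL, delivered TEXT);
        INSERT INTO products VALUES (1, 'P1'), (2, 'P2'), (3, 'P3');
        INSERT INTO items VALUES
            (1, 1, 10.0, '2019-04-10'),
            (2, 1, 12.0, '2019-04-17'),
            (3, 1, 11.0, '2019-04-17'),
            (4, 2, 7.5, '2019-04-17');
        """
    )
    rows = conn.execute(
        """
        SELECT p.barcode,
               (SELECT i.price
                FROM items i
                WHERE i.product_id = p.id      -- correlate with the outer row
                ORDER BY i.delivered DESC, i.id DESC
                LIMIT 1) AS last_price         -- at most one row
        FROM products p
        ORDER BY p.barcode
        """
    ).fetchall()
    conn.close()
    return rows


print(last_price_per_product())
# [('P1', 11.0), ('P2', 7.5), ('P3', None)]
```

A product with no items (P3) gets NULL, because a scalar subquery that returns zero rows evaluates to NULL.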

Multiple counts in one SQL statement

Let's say I have a table with a column of ages.
Here is the list of ages
1
2
3
1
1
3
I want SQL to count how many 1s, how many 2s, and how many 3s there are.
The code:
SELECT COUNT(age) AS age1 FROM table_name WHERE age = '1';
SELECT COUNT(age) AS age2 FROM table_name WHERE age = '2';
SELECT COUNT(age) AS age3 FROM table_name WHERE age = '3';
This should work, but is there a way to get all three counts with a single query?
This is an instance where the GROUP BY clause really shines:
SELECT age, COUNT(age)
FROM table_name
GROUP BY age
Just an additional tip:
You shouldn't use single quotes in your query here:
WHERE age = '1';
This is because age is an INT column, so the value should be written as a number, not a string. MySQL will implicitly convert '1' to a number for you, and here the overhead is negligible. But implicit type conversions are worth avoiding in general: in a JOIN of two tables with millions of rows, comparing values of different types adds overhead and can prevent MySQL from using an index.
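The GROUP BY query above can be checked against the example ages with sqlite3 (SQLite standing in for MySQL). The table name table_name is the placeholder from the answer; ORDER BY is added only so the output order is predictable.

```python
import sqlite3


def counts_per_age():
    """One row per distinct age, with the number of rows holding it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_name (age INTEGER)")
    conn.executemany(
        "INSERT INTO table_name (age) VALUES (?)",
        [(1,), (2,), (3,), (1,), (1,), (3,)],
    )
    rows = conn.execute(
        "SELECT age, COUNT(age) FROM table_name GROUP BY age ORDER BY age"
    ).fetchall()
    conn.close()
    return rows


print(counts_per_age())  # [(1, 3), (2, 1), (3, 2)]
```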
If you only need these three ages, try this. An aggregate function used without GROUP BY produces a single row, and SUM() over a condition counts the rows where the condition is true, because in MySQL a comparison evaluates to 1 (true) or 0 (false):
SELECT SUM(age = 1) AS age1,
SUM(age = 2) AS age2,
SUM(age = 3) AS age3
FROM table_name;
SELECT SUM(CASE WHEN age = 1 THEN 1 ELSE 0 END) AS age1,
SUM(CASE WHEN age = 2 THEN 1 ELSE 0 END) AS age2,
SUM(CASE WHEN age = 3 THEN 1 ELSE 0 END) AS age3
FROM YourTable
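Both single-row forms can be compared side by side in SQLite via sqlite3, using the example ages and the placeholder name table_name. The CASE form is standard SQL; summing a bare comparison relies on the database treating booleans as 1/0, which MySQL and SQLite do but not every engine does.

```python
import sqlite3


def ages_in_one_row():
    """Pivot the three counts into one row with SUM(condition) and SUM(CASE ...)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_name (age INTEGER)")
    conn.executemany(
        "INSERT INTO table_name (age) VALUES (?)",
        [(1,), (2,), (3,), (1,), (1,), (3,)],
    )
    # A comparison evaluates to 1 or 0, so summing it counts matching rows.
    by_comparison = conn.execute(
        "SELECT SUM(age = 1), SUM(age = 2), SUM(age = 3) FROM table_name"
    ).fetchone()
    # The CASE form gives the same result.
    by_case = conn.execute(
        """
        SELECT SUM(CASE WHEN age = 1 THEN 1 ELSE 0 END),
               SUM(CASE WHEN age = 2 THEN 1 ELSE 0 END),
               SUM(CASE WHEN age = 3 THEN 1 ELSE 0 END)
        FROM table_name
        """
    ).fetchone()
    conn.close()
    return by_comparison, by_case


print(ages_in_one_row())  # ((3, 1, 2), (3, 1, 2))
```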
If you only need the count for each value of one column (age in this case), you can use COUNT with GROUP BY:
SELECT age, COUNT(1) AS qty
FROM yourTable
GROUP BY age;
Remember that any additional non-aggregated column you select must also be included in the GROUP BY clause.
Select age as Age_Group, count(age) as Total_count from table1 group by age;
select age, count(age) from SomeTable group by age
http://sqlfiddle.com/#!2/b40da/2
The group by clause works like this:
When you use an aggregate function, like the count function, without a group by clause, the function applies to the entire dataset determined by the from and where clauses. A count will, for instance, count the number of rows in the result set, and sum over a specific column will add up that column's values across all rows of the result set.
What the group by clause allows us to do is divide the result set determined by the from and where clauses into partitions, so that the aggregate functions no longer apply to the result set as a whole, but to each partition of the result set separately.
When you specify a column to group by, what you are saying is something like "for each distinct value of column x in the result set, create a partition containing any row in the result set with this particular value in column x". Then, instead of yielding one result covering the entire resultset, aggregate functions will yield one result for each distinct value of column x in the result set.
With your example input of:
1
2
3
1
1
3
let's analyze the above query. As always, we should look at the from clause and the where clause first. The from clause tells us that we are selecting from SomeTable and only this, and the lack of a where clause tells us that we are selecting from the full contents of SomeTable.
Next, we'll look at the group by clause. It's present, and it groups by the age column, which is the only column in our example. The presence of the group by clause changes our dataset completely! Instead of selecting from the entire row set of SomeTable, we are now selecting from a set of partitions, one for each distinct value of the age-column in our original result set (which was every row in SomeTable).
Finally, we'll look at the select-clause. Since we are now selecting from partitions rather than regular rows, the select-clause has fewer options for what it can contain; in effect it has only two: the column it is grouped by, or an aggregate function.
Now, in our example we only have one column, but consider that we had another column, like here:
http://sqlfiddle.com/#!2/d5479/2
Now, imagine that our data set has two rows, both with age='1', but with different values in the other column. If we included this other column in a query grouped by the age-column (which we now know returns one row for each partition over the age-column), which value should appear in the result? It makes no sense to include any column other than the one you grouped by. (I'll leave multiple columns in the group by clause out of this; in my experience one usually just wants one.)
But back to our select-clause: knowing that our dataset has the distinct values {1, 2, 3} in the age-column, we should expect 3 rows in our result set. The first thing selected is the age-column, which yields the values [1, 2, 3] in the three rows. Next in the select-list is the aggregate function count(age), which we now know counts the number of rows in each partition. So, for the row in the result where age='1', it counts the number of rows with age='1'; for the row where age='2', it counts the number of rows where age='2', and so on.
The result would look something like this:
age count(age)
1 3
2 1
3 2
(Of course, you are free to rename the second column in the result with the as-operator.)
And that concludes today's lesson.

func.count(distinct(...)) does not give the same result as distinct().count()

I have a column with null entries, e.g. the possible values in this column are None, 1, 2, 3
When I count the number of unique entries in the column with session.query(func.count(distinct(Entry.col))).scalar() I get back '3'.
But when I perform the count with session.query(Entry.col).distinct().count(), I get back '4'.
Why does the latter method count the None, but the first doesn't?
In the first case, the resulting query will look like this:
SELECT COUNT(DISTINCT(col)) FROM Entry
... and, as you probably already know, COUNT here won't actually count the NULL values.
In the second case, however, the query is different, as shown in the doc:
SELECT count(1) AS count_1 FROM (
SELECT DISTINCT(col) FROM Entry
) AS anon_1
That query just counts the total number of rows returned by the SELECT DISTINCT query, which is 4, because NULL is included in the output of a DISTINCT query.
The reason is simple: the purpose of query.count() is to return the number of rows the query would have returned if run without the count. This method doesn't give you control over which columns are used for counting; that's what func.count(...) is for.
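The two SQL shapes quoted above can be run directly with sqlite3 to see the 3 vs 4 difference. This is plain SQL executed through Python's sqlite3, not SQLAlchemy itself; the Entry table and its values (including a repeated 1 and two NULLs) are a hypothetical stand-in for the question's column.

```python
import sqlite3


def distinct_counts():
    """Compare COUNT(DISTINCT col) with counting the rows of SELECT DISTINCT."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Entry (col INTEGER)")
    conn.executemany(
        "INSERT INTO Entry (col) VALUES (?)",
        [(None,), (1,), (2,), (3,), (1,), (None,)],
    )
    # Shape of func.count(distinct(Entry.col)): NULL is ignored.
    count_distinct = conn.execute("SELECT COUNT(DISTINCT col) FROM Entry").fetchone()[0]
    # Shape of query(Entry.col).distinct().count(): NULL is one of the distinct rows.
    count_rows = conn.execute(
        "SELECT count(1) AS count_1 FROM (SELECT DISTINCT col FROM Entry) AS anon_1"
    ).fetchone()[0]
    conn.close()
    return count_distinct, count_rows


print(distinct_counts())  # (3, 4)
```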
MySQL's COUNT(column) doesn't count NULL values, so when you count a column that contains NULLs, the rows where it is NULL are not counted.
DISTINCT, on the other hand, returns the different values themselves, and NULL is included among them.