MySQL COUNT() and duplicates - mysql

I have this query:
select count(name) as nr
from team where city='ny' and name=ANY
(select teamName from contract where playerCode=ANY
(select code from player where name='X' and surname='Y'));
I don't understand why the count() function doesn't count the duplicates even if there is no distinct clause.
These are the tables:
Player(code, name, surname)
Contract(id, playerCode, teamName, year)
Team(name, city)
With these integrity constraints:
Contract(playerCode)-->Player(code)
Contract(teamName)-->Team(name)
The query counts the teams in NY that have a contract with the player named X Y.
Thanks.

Your query will count duplicates. My guess is that you are expecting duplicates from multiple matches in the subquery. However, the subquery is not a JOIN and so will not be duplicating results from the team table, regardless of how many matches there are in the subquery.
Each row in the team table will only have the WHERE conditions evaluated once, and will be included (once) based on whether those conditions are satisfied.
Assuming there are no NULL values in team.name, if you select both COUNT(name) and COUNT(*), the two columns should have the same value.
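To make the subquery-versus-join point concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, chosen only so the example runs on its own). SQLite has no = ANY, so the sketch uses IN, which is equivalent here. The team names and contract rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE player   (code INTEGER PRIMARY KEY, name TEXT, surname TEXT);
CREATE TABLE team     (name TEXT PRIMARY KEY, city TEXT);
CREATE TABLE contract (id INTEGER PRIMARY KEY, playerCode INTEGER,
                       teamName TEXT, year INTEGER);
INSERT INTO player VALUES (1, 'X', 'Y');
INSERT INTO team VALUES ('TeamA', 'ny'), ('TeamB', 'ny');
-- Hypothetical data: player X Y has two contracts with TeamA, one with TeamB.
INSERT INTO contract VALUES (1, 1, 'TeamA', 2010),
                            (2, 1, 'TeamA', 2011),
                            (3, 1, 'TeamB', 2012);
""")

# Subquery form: each team row is tested once, so it is counted at most once.
subquery_count = conn.execute("""
    SELECT COUNT(name) FROM team
    WHERE city = 'ny' AND name IN
        (SELECT teamName FROM contract WHERE playerCode IN
            (SELECT code FROM player WHERE name = 'X' AND surname = 'Y'))
""").fetchone()[0]

# Join form: a team row is repeated once per matching contract row.
join_count = conn.execute("""
    SELECT COUNT(t.name)
    FROM team t
    JOIN contract c ON c.teamName = t.name
    JOIN player p ON p.code = c.playerCode
    WHERE t.city = 'ny' AND p.name = 'X' AND p.surname = 'Y'
""").fetchone()[0]

print(subquery_count, join_count)
```

With this sample data the subquery form counts each team once, while the join form counts TeamA twice because it has two matching contracts.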

Related

Ages of people above average MySQL ages WHERE, HAVING How? [duplicate]

I have the following two tables:
1. Lecturers (LectID, Fname, Lname, degree).
2. Lecturers_Specialization (LectID, Expertise).
I want to find the lecturer with the most specializations.
When I try this, it is not working:
SELECT
L.LectID,
Fname,
Lname
FROM Lecturers L,
Lecturers_Specialization S
WHERE L.LectID = S.LectID
AND COUNT(S.Expertise) >= ALL (SELECT
COUNT(Expertise)
FROM Lecturers_Specialization
GROUP BY LectID);
But when I try this, it works:
SELECT
L.LectID,
Fname,
Lname
FROM Lecturers L,
Lecturers_Specialization S
WHERE L.LectID = S.LectID
GROUP BY L.LectID,
Fname,
Lname
HAVING COUNT(S.Expertise) >= ALL (SELECT
COUNT(Expertise)
FROM Lecturers_Specialization
GROUP BY LectID);
What is the reason? Thanks.
The WHERE clause introduces a condition on individual rows; the HAVING clause introduces a condition on aggregations, i.e. results where a single value, such as a count, average, min, max, or sum, has been produced from multiple rows. Your query calls for the second kind of condition (a condition on an aggregation), hence HAVING works correctly.
As a rule of thumb, use WHERE before GROUP BY and HAVING after GROUP BY. It is a rather primitive rule, but it is useful in more than 90% of the cases.
While you're at it, you may want to rewrite your query using the ANSI version of the join:
SELECT L.LectID, Fname, Lname
FROM Lecturers L
JOIN Lecturers_Specialization S ON L.LectID=S.LectID
GROUP BY L.LectID, Fname, Lname
HAVING COUNT(S.Expertise)>=ALL
(SELECT COUNT(Expertise) FROM Lecturers_Specialization GROUP BY LectID)
This would eliminate WHERE that was used as a theta join condition.
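Here is a small sketch of the same behaviour that can be run without a MySQL server. It uses Python's built-in sqlite3 module (an assumption made only to keep it self-contained). SQLite does not support >= ALL, so the sketch compares against the maximum count instead, which selects the same lecturers. The lecturer rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Lecturers (LectID INTEGER PRIMARY KEY, Fname TEXT,
                        Lname TEXT, degree TEXT);
CREATE TABLE Lecturers_Specialization (LectID INTEGER, Expertise TEXT);
-- Hypothetical sample data.
INSERT INTO Lecturers VALUES (1, 'Ann', 'Lee', 'PhD'), (2, 'Bob', 'Ray', 'MSc');
INSERT INTO Lecturers_Specialization VALUES
    (1, 'Databases'), (1, 'Networks'), (2, 'Graphics');
""")

# An aggregate in WHERE is rejected: WHERE sees single rows, not groups.
try:
    conn.execute("""
        SELECT L.LectID FROM Lecturers L
        JOIN Lecturers_Specialization S ON L.LectID = S.LectID
        WHERE COUNT(S.Expertise) >= 2
    """).fetchall()
    where_error = None
except sqlite3.Error as exc:
    where_error = str(exc)

# The same kind of condition in HAVING is evaluated once per group.
top = conn.execute("""
    SELECT L.LectID, Fname, Lname
    FROM Lecturers L
    JOIN Lecturers_Specialization S ON L.LectID = S.LectID
    GROUP BY L.LectID, Fname, Lname
    HAVING COUNT(S.Expertise) = (
        SELECT MAX(cnt) FROM (
            SELECT COUNT(Expertise) AS cnt
            FROM Lecturers_Specialization
            GROUP BY LectID))
""").fetchall()

print(where_error)
print(top)
```

The first statement fails before returning any rows, and the second returns only the lecturer with the highest number of specializations.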
First we should know the logical order in which the clauses are evaluated, i.e.
FROM > WHERE > GROUP BY > HAVING > SELECT > DISTINCT > ORDER BY.
Since the WHERE clause is evaluated before the GROUP BY clause, WHERE cannot filter records that have already been grouped.
"HAVING is the same as the WHERE clause but is applied to grouped records".
First the WHERE clause fetches the records that satisfy its condition, then the GROUP BY clause groups them, and then the HAVING clause keeps the groups that satisfy the HAVING condition.
HAVING operates on aggregates. Since COUNT is an aggregate function, you can't use it in a WHERE clause.
Here's some reading from MSDN on aggregate functions.
The WHERE clause can be used with SELECT, UPDATE, and DELETE statements, whereas HAVING can be used only with a SELECT statement.
WHERE filters rows before aggregation (GROUP BY), whereas HAVING filters groups after aggregations are performed.
An aggregate function cannot be used in a WHERE clause unless it is in a subquery contained in a HAVING clause, whereas aggregate functions can be used directly in a HAVING clause.
Source
Didn't see an example of both in one query. So this example might help.
/**
INTERNATIONAL_ORDERS - table of orders by company by location by day
companyId, country, city, total, date
**/
SELECT country, city, sum(total) totalCityOrders
FROM INTERNATIONAL_ORDERS with (nolock)
WHERE companyId = 884501253109
GROUP BY country, city
HAVING country = 'MX'
ORDER BY sum(total) DESC
This filters the table first by the companyId, then groups it (by country and city) and additionally filters it down to just city aggregations of Mexico. The companyId was not needed in the aggregation but we were able to use WHERE to filter out just the rows we wanted before using GROUP BY.
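A variation on the example above, as a sketch: the row-level conditions go in WHERE and a condition on the aggregate goes in HAVING. It runs on Python's built-in sqlite3 module rather than SQL Server or MySQL (an assumption for the sake of a runnable example), so the with (nolock) hint is dropped. The company id, rows, and threshold are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE international_orders (companyId INTEGER, country TEXT,
                                   city TEXT, total REAL);
-- Hypothetical sample data.
INSERT INTO international_orders VALUES
    (1, 'MX', 'Cancun', 100), (1, 'MX', 'Cancun', 50),
    (1, 'MX', 'Puebla', 30),  (1, 'US', 'Austin', 500),
    (2, 'MX', 'Cancun', 999);
""")

rows = conn.execute("""
    SELECT country, city, SUM(total) AS totalCityOrders
    FROM international_orders
    WHERE companyId = 1 AND country = 'MX'   -- row filter, before grouping
    GROUP BY country, city
    HAVING SUM(total) > 40                   -- group filter, after aggregation
    ORDER BY totalCityOrders DESC
""").fetchall()

print(rows)
```

Rows for other companies and countries never reach the grouping step, and the Puebla group is dropped afterwards because its sum does not pass the HAVING condition.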
You cannot use an aggregate function in a WHERE clause because WHERE goes through the table record by record and keeps or discards each record based on the given condition, so no aggregate value exists yet at that point. The HAVING clause works on the grouped result set that the rest of the query produces.
Example query:
select empName, sum(Bonus)
from employees
group by empName
having sum(Bonus) > 5000;
The grouped result set is built first, and then the HAVING clause does its work on it. So we can easily use aggregate functions here.
1. Aggregate functions (e.g. MIN, MAX, AVG) can be used in a HAVING clause but not in a WHERE clause.
2. The WHERE clause eliminates records tuple by tuple; the HAVING clause eliminates entire groups from the collection of groups.
Mostly, HAVING is used when you have groups of data and WHERE is used when you have data in rows.
The WHERE clause is used to eliminate tuples in a relation, and the HAVING clause is used to eliminate groups in a relation.
The HAVING clause is used with aggregate functions such as MIN, MAX, COUNT and SUM. Always put a GROUP BY clause before the HAVING clause to avoid errors.
Both WHERE and HAVING are used to filter data.
In case of a WHERE statement, data filtering happens before you pull the data for operation.
SELECT name, age
FROM employees
WHERE age > 30;
Here the WHERE clause filters rows before the SELECT operation is performed.
SELECT department, avg(age) avg_age
FROM employees
GROUP BY department
HAVING avg_age> 35;
HAVING filters the data after the SELECT operation is performed. Here the operation of computing (aggregation) is done first and then a filter is applied to the result using a HAVING clause.

select distinct count(id) vs select count(distinct id)

I'm trying to get distinct values from a table. When I ran select distinct count(id) from table I got a count of over a million. However, when I ran select count(distinct id) from table I got only around 300k. What is the difference between the two queries?
Thanks
When you do select distinct count(id) then you are basically doing:
select distinct cnt
from (select count(id) as cnt from t) t;
Because the inner query only returns one row, the distinct is not doing anything. The query counts the number of rows in the table (well, more accurately, the number of rows where id is not null).
On the other hand, when you do:
select count(distinct id)
from t;
Then the query counts the number of different values that id takes on in the table. This would appear to be what you want.
If id is the primary key, select distinct count(id) and select count(distinct id) return the same number.
If id is not the primary key but has a unique constraint (on id alone, not in combination with any other column), the two queries also return the same number, as in the primary key case.
If id is just another column, select distinct count(id) from table returns one row with the number of records where the id column is NOT NULL, whereas select count(distinct id) from table returns one row with the number of distinct non-NULL ids in the table.
In no case will either count exceed the total number of rows in your table.
The second select is definitely what you want, because it will aggregate the id's (if you have 10 records with id=5 then they will all be counted as one record) and the select will return "how many distinct id's were in the table".
However, the first select will do something odd, and I'm not entirely sure what it will do.
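For anyone who wants to see the two forms side by side, here is a minimal sketch using Python's built-in sqlite3 module (an assumption; the behaviour shown matches what the answers above describe for MySQL). The table t and its values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER);
-- Hypothetical data: three rows with id 5, one with id 7, one NULL.
INSERT INTO t VALUES (5), (5), (5), (7), (NULL);
""")

# COUNT(id) yields a single row; DISTINCT then has nothing left to remove.
distinct_of_count = conn.execute("SELECT DISTINCT COUNT(id) FROM t").fetchall()

# COUNT(DISTINCT id) counts the different non-NULL values of id.
count_of_distinct = conn.execute("SELECT COUNT(DISTINCT id) FROM t").fetchall()

print(distinct_of_count, count_of_distinct)
```

The first query reports the number of non-NULL ids, the second the number of different ids.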

Mysql query to find distinct rows shows incorrect result

I wish to find the total number of distinct records in a table.
I have a table with the following columns
id, name, product, rating, manufacturer, price
This has around 128 rows with some duplicates based on different column names.
I only want to select distinct rows:
select distinct name, product, rating, manufacturer, price from table
This returns 47 rows
For pagination purposes, I need to find the total number of distinct records, so I have another statement:
select distinct count(name), product, rating, manufacturer, price from table
But this returns 128 instead of 47.
How can I get the total number of distinct rows? Any help will be much appreciated. Thanks
You have the distinct and count reversed.
SELECT COUNT(DISTINCT column_name) FROM table_name
Also, I would drop the extra fields when counting; the values returned for those other fields will be unpredictable.
It is not quite clear if you want to get the count in the SAME query with the results or if you want to run a different query. Here go both solutions. In the result as a new column:
select distinct name, product, rating, manufacturer, price, (
select count(*) from (
select distinct name, product, rating, manufacturer, price from table1
) as resultCount) as resultCount
from table1
Notice that the previous solution repeats the count for every row, which is neither efficient nor visually appealing. Try running two queries instead, one to get the actual data and another to get the number of records that match it:
select distinct name, product, rating, manufacturer, price from table1;
select count(*) from (
select distinct name, product, rating, manufacturer, price from table1
) as result;
Hope this helps
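The two-query approach can be tried out with the sketch below. It uses Python's built-in sqlite3 module instead of MySQL (an assumption to keep it self-contained), and the rows in table1 are invented. It also shows, for comparison, that COUNT(DISTINCT name) counts names rather than whole rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT, product TEXT,
                     rating INTEGER, manufacturer TEXT, price REAL);
-- Hypothetical data: the first two rows are duplicates apart from id.
INSERT INTO table1 (name, product, rating, manufacturer, price) VALUES
    ('a', 'p1', 5, 'm1', 9.99),
    ('a', 'p1', 5, 'm1', 9.99),
    ('a', 'p2', 4, 'm1', 9.99),
    ('b', 'p1', 5, 'm2', 1.50);
""")

cols = "name, product, rating, manufacturer, price"

# Query 1: the distinct rows themselves.
distinct_rows = conn.execute(f"SELECT DISTINCT {cols} FROM table1").fetchall()

# Query 2: how many distinct rows there are, via a derived table.
total = conn.execute(
    f"SELECT COUNT(*) FROM (SELECT DISTINCT {cols} FROM table1) AS result"
).fetchone()[0]

# For comparison: this counts distinct names only, not distinct rows.
name_count = conn.execute(
    "SELECT COUNT(DISTINCT name) FROM table1"
).fetchone()[0]

print(len(distinct_rows), total, name_count)
```

The derived-table count matches the number of rows the first query returns, which is what pagination needs.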
Try adding a GROUP BY name, product, rating, manufacturer, price clause
It would require running your actual query TWICE: an inner query for the distinct rows, whose count is returned as a single row, which is then joined to the original select distinct.
select distinct
t1.name,
t1.product,
t1.rating,
t1.manufacturer,
t1.price,
JustTheCount.DistCnt
from
table t1,
( select count(*) as DistCnt
from ( select distinct
t2.name,
t2.product,
t2.rating,
t2.manufacturer,
t2.price
from
table t2 ) DistinctRows
) JustTheCount
In the following query, DISTINCT applies to the whole select list, so you get rows whose combination of name, product, rating, manufacturer, and price is distinct:
SELECT DISTINCT name, product, rating, manufacturer, price FROM table
However, to get a count, use the following format:
SELECT COUNT(DISTINCT name) FROM table
Notice that DISTINCT goes inside of the COUNT function so that you're counting the distinct names. To count distinct combinations instead, MySQL also accepts several expressions, as in COUNT(DISTINCT name, product, rating, manufacturer, price), although combinations containing a NULL are not counted. You probably don't want to include the other columns in the count query because their values will come from an arbitrary row of the set. Of course, if you want an arbitrary sample, then include them.
Most applications will run the count query first, followed by the query to return the results. Also keep in mind that the count is only a snapshot: if rows are inserted or deleted between the two queries, it may differ from the actual number of records returned.
SELECT DISTINCT COUNT(name), product FROM table isn't even a valid query in MySQL 4.x. You can't mix aggregate and non-aggregate columns. In 5.x, it'll run, but the values for the non-aggregate columns will come from an arbitrary row of the set.
At the risk of sparking some flames here, you could always use
SQL_CALC_FOUND_ROWS as the first part of your SELECT. This is very MySQL-specific, though.
http://dev.mysql.com/doc/refman/5.0/en/information-functions.html
mysql> SELECT SQL_CALC_FOUND_ROWS * FROM tbl_name
-> WHERE id > 100 LIMIT 10;
mysql> SELECT FOUND_ROWS();