How does this count work? - sql-server-2008

My query is given below:
select vend_id,
COUNT(*) as num_prods
from Products
group by vend_id;
Please explain how this part works: select vend_id, COUNT(vend_id), as opposed to select COUNT(vend_id)?

select COUNT(vend_id)
That will return the number of rows where the vendor ID is not null
select vend_id, COUNT(*) as num_prods
from Products
group by vend_id
That will group the rows by vend_id and return, for each vend_id, the number of rows that have it.
An example:
ID  name    salary  start_date               city       region
--  ------  ------  -----------------------  ---------  ------
1   Jason   40420   1994-02-01 00:00:00.000  New York   W
2   Robert  14420   1995-01-02 00:00:00.000  Vancouver  N
3   Celia   24020   1996-12-03 00:00:00.000  Toronto    W
4   Linda   40620   1997-11-04 00:00:00.000  New York   N
5   David   80026   1998-10-05 00:00:00.000  Vancouver  W
6   James   70060   1999-09-06 00:00:00.000  Toronto    N
7   Alison  90620   2000-08-07 00:00:00.000  New York   W
8   Chris   26020   2001-07-08 00:00:00.000  Vancouver  N
If you run the query below, you get one row per city, and you can apply an aggregate function (in this case, COUNT) to the rows in each group. So, for each city, you get the count of rows. You can also use other aggregate functions, such as SUM or AVG.
SELECT City, COUNT(*) as Employees
FROM Employee
GROUP BY City
The result is:
City Employees
--------- ---------
New York 3
Toronto 2
Vancouver 3
As you can see, each count matches the number of rows for that city in the table above.
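The grouping above can be reproduced with a small script. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, loads just the columns the query needs, and adds an ORDER BY (not in the original query) so the output order is predictable.

```python
import sqlite3

# Hypothetical in-memory copy of the Employee table (only ID, name and city).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (ID INTEGER, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [
        (1, "Jason", "New York"), (2, "Robert", "Vancouver"),
        (3, "Celia", "Toronto"), (4, "Linda", "New York"),
        (5, "David", "Vancouver"), (6, "James", "Toronto"),
        (7, "Alison", "New York"), (8, "Chris", "Vancouver"),
    ],
)

# One output row per distinct city; COUNT(*) counts the rows in each group.
# ORDER BY is added here only to make the row order deterministic.
city_counts = conn.execute(
    "SELECT city, COUNT(*) AS Employees FROM Employee GROUP BY city ORDER BY city"
).fetchall()
print(city_counts)  # [('New York', 3), ('Toronto', 2), ('Vancouver', 3)]
```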

When you simply select COUNT(vend_id) with no GROUP BY clause, you get one row with the total count of rows with a non-NULL vendor ID - that last bit is important and is one reason why you may prefer COUNT(*) so as to avoid "missing" rows. Some people may argue that COUNT(*) is somehow less efficient but that's true in no DBMS I've used. In any case, if you are using a brain-dead DBMS, you can always try COUNT(1).
When you group by vend_id, you get one row per vendor ID with the count being the number of rows for that ID.
In step-by-step detail (conceptually, though there are almost certainly efficiencies to be gained by optimising), the first query:
SELECT COUNT(vend_id) AS num_prods FROM products
Get a list of all rows in products.
Count the rows where vend_id is not NULL, then deliver one row containing that count in the single num_prods column.
For the grouping one:
SELECT vend_id, COUNT(vend_id) AS num_prods FROM products GROUP BY vend_id
Get a list of all rows in products.
For each value of vend_id:
Count the rows matching that vend_id where vend_id is not NULL, then deliver one row containing the vend_id in the first column and that count in the second num_prods column.
Note that those rows with a null vend_id do not contribute to the aggregate function (count in this case).
In the first query, that simply means they don't appear in the overall total.
In the second case, it means that the output row still exists but the count will be zero. That's another good reason to use COUNT(*) or COUNT(1).
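The NULL behaviour described above can be checked with a short script. This is a sketch under assumptions: it runs on SQLite through Python's sqlite3 module rather than on SQL Server, and the Products rows (including the prod_id column and the V1/V2 vendor IDs) are made up for illustration.

```python
import sqlite3

# Hypothetical Products table: two rows have no vendor ID (NULL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (prod_id INTEGER, vend_id TEXT)")
conn.executemany(
    "INSERT INTO Products VALUES (?, ?)",
    [(1, "V1"), (2, "V1"), (3, "V2"), (4, None), (5, None)],
)

# Without GROUP BY: a single row. COUNT(vend_id) skips NULLs, COUNT(*) does not.
non_null_total = conn.execute("SELECT COUNT(vend_id) FROM Products").fetchone()[0]
all_rows_total = conn.execute("SELECT COUNT(*) FROM Products").fetchone()[0]

# With GROUP BY: one row per vend_id, including one row for the NULL group.
per_vendor = {
    vend_id: (count_col, count_star)
    for vend_id, count_col, count_star in conn.execute(
        "SELECT vend_id, COUNT(vend_id), COUNT(*) FROM Products GROUP BY vend_id"
    )
}

print(non_null_total, all_rows_total)  # 3 5
print(per_vendor[None])                # (0, 2): the NULL group exists, COUNT(vend_id) is 0
print(per_vendor["V1"])                # (2, 2)
```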

select vend_id will only select the vend_id field, whereas select * will select all the fields.

select vend_id, COUNT(vend_id) and select COUNT(vend_id) give the same result for the count column as long as you use group by vend_id. When you use select vend_id, COUNT(vend_id), you must group by vend_id.

Related

Select Distinct whilst adding tuples together in SQL

I have a table that contains random data against a key with duplicate entries. I'm looking to remove the duplicates (a projection as it is called in relational algebra), but rather than discarding the attached data, sum it together. For example:
orderID cost
1 5
1 2
1 10
2 3
2 3
3 15
Should remove duplicates from orderID whilst summing each orderID's values:
orderID cost
1 17 (5 + 2 + 10)
2 6
3 15
My assumption is I'd use SELECT DISTINCT somehow, but I don't know how I'd go about doing so. I understand GROUP BY might be able to do something but I am unsure.
This is a very basic aggregation:
SELECT orderId, SUM(cost) AS cost
FROM MyTable
GROUP BY orderId
This says, for each "orderId" grouping, sum the "cost" field and return one value per group.
You can use the group by clause to get one row per distinct value of the column(s) you're grouping by - orderId in this case. You can then apply an aggregate function to the columns you aren't grouping by - sum, in this case:
SELECT orderId, SUM(cost)
FROM mytable
GROUP BY orderId
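As a quick check of the aggregation, here is a sketch that runs the same query on the sample data from the question. It assumes SQLite (via Python's sqlite3) as the engine, keeps the placeholder table name mytable from the answer, and adds an ORDER BY only to fix the output order.

```python
import sqlite3

# In-memory table holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (orderId INTEGER, cost INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, 5), (1, 2), (1, 10), (2, 3), (2, 3), (3, 15)],
)

# One row per orderId, with the costs of that order summed.
order_totals = conn.execute(
    "SELECT orderId, SUM(cost) AS cost FROM mytable GROUP BY orderId ORDER BY orderId"
).fetchall()
print(order_totals)  # [(1, 17), (2, 6), (3, 15)]
```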

get AVG() after GROUP BY in MYSQL

I have just started learning MySQL and ran into the following problem.
The table looks like this:
id name moneySpent
1 Alex 3
2 Alex 1
3 Bill 4
4 Alex 2
5 Alex 1
6 Chris 5
7 Chris 3
Let's say I want to know the average money spent per person. I tried to do that using SUM(), GROUP BY and AVG(), but I got stuck at AVG().
SELECT name, sum(moneySpent) AS total FROM table GROUP BY name;
then this will return
name total
Alex 7
Bill 4
Chris 8
Then how can I get (7+4+8)/3 using AVG()?
You can get average per person using:
SELECT AVG(total) AS AVERAGE
FROM (SELECT name, sum(moneySpent) AS total
FROM table GROUP BY name) A
;
Output:
AVERAGE
6.3333
You can use an inner query to get the sum per name and an outer query to derive the average of those sums, as below.
SELECT Avg(sum1) AS AVERAGE_AMOUNT_SPENT FROM (
SELECT Sum(amount) AS sum1
FROM table1
GROUP BY NAME
) T1
It will generate the output below.
AVERAGE_AMOUNT_SPENT
------------------
6.3333
which is the output you want, i.e. (7+4+8)/3 = 6.333
So there are two ways to do this. The first is to use a derived table (a subquery in the FROM clause) to hold the result of the SELECT. It is much easier, but may take more space.
The second is the one suggested by jarlh: it occurred to me that I do not need to GROUP BY the whole table; I can just add up all moneySpent and divide by the count of distinct names.
Thanks people!
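Both approaches can be compared side by side. The following is a sketch, not a MySQL session: it uses SQLite through Python's sqlite3, and the table name spending and the alias per_person are hypothetical (the question's placeholder name, table, is a reserved word). The multiplication by 1.0 is there because SQLite would otherwise do integer division.

```python
import sqlite3

# Hypothetical table "spending" with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spending (id INTEGER, name TEXT, moneySpent INTEGER)")
conn.executemany(
    "INSERT INTO spending VALUES (?, ?, ?)",
    [(1, "Alex", 3), (2, "Alex", 1), (3, "Bill", 4), (4, "Alex", 2),
     (5, "Alex", 1), (6, "Chris", 5), (7, "Chris", 3)],
)

# Way 1: sum per person in a derived table, then average those sums.
avg_of_sums = conn.execute(
    "SELECT AVG(total) FROM "
    "(SELECT name, SUM(moneySpent) AS total FROM spending GROUP BY name) AS per_person"
).fetchone()[0]

# Way 2: grand total divided by the number of distinct names.
# "* 1.0" forces floating-point division in SQLite.
total_over_people = conn.execute(
    "SELECT SUM(moneySpent) * 1.0 / COUNT(DISTINCT name) FROM spending"
).fetchone()[0]

print(round(avg_of_sums, 4), round(total_over_people, 4))  # 6.3333 6.3333
```

The two results agree here because every row has a name; rows with a NULL name would form their own group in the first query but would not be counted by COUNT(DISTINCT name) in the second.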
select avg(total) as average from (SELECT name, sum(moneySpent) AS total FROM table GROUP BY name) AS t;
You can use this query to get your desired output
OUTPUT:
AVERAGE
6.3333333333333333

SQL query for selecting maximum from 2 different columns

I got a question in my homework for SQL about selecting the maximum values from the same table that have different class "Letters"
For example:
ID Student Group Avg(value)
-------------------------------------
1 stud1 A 9
2 stud2 A 9.5
3 stud3 B 8
4 stud4 B 8.5
What my query should do is show stud2 and stud4, the students with the maximum in their respective groups.
I managed to do it in the end, but the query was very long, so I thought that maybe there's a shorter way. Any ideas? I first looked up the ID of the student with the max avg(value) in group A, intersected it with the ID of the student with the max avg(value) in group B, put everything into one big select, and then used those intersected IDs in another query that showed some other details about those IDs. But as I said, it looked far too long, and I thought that maybe there's a shorter way.
Try this (I renamed group to grp and avg to avg_val as those are reserved keywords):
select t1.*
from your_table t1
inner join (
select grp, max(avg_val) avg_val
from your_table
group by grp
) t2 on t1.grp = t2.grp
and t1.avg_val = t2.avg_val;
It finds maximum avg value per group and joins it with original table to get the corresponding students.
Please note that if multiple students share the maximum avg value of their group, all of those students will be returned.
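The join and the tie behaviour mentioned above can be tried out with a small sketch. It assumes SQLite through Python's sqlite3, uses the renamed columns grp and avg_val from the answer, and adds one made-up row (stud5) that ties with stud4 so the tie case is visible. The ORDER BY is added only to fix the output order.

```python
import sqlite3

# Sample rows from the question, plus a hypothetical stud5 that ties with stud4.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (id INTEGER, student TEXT, grp TEXT, avg_val REAL)"
)
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?, ?)",
    [(1, "stud1", "A", 9.0), (2, "stud2", "A", 9.5),
     (3, "stud3", "B", 8.0), (4, "stud4", "B", 8.5),
     (5, "stud5", "B", 8.5)],
)

# Join each row to the maximum of its own group; only rows equal to the max survive.
top_students = conn.execute(
    """
    SELECT t1.*
    FROM your_table t1
    INNER JOIN (
        SELECT grp, MAX(avg_val) AS avg_val
        FROM your_table
        GROUP BY grp
    ) t2 ON t1.grp = t2.grp AND t1.avg_val = t2.avg_val
    ORDER BY t1.id
    """
).fetchall()
print([row[1] for row in top_students])  # ['stud2', 'stud4', 'stud5']
```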

How to combine integers from rows and make it one distinct row

My current SQL query outputs something like this:
Team Amount
A 10.00
B 20.00
C 40.00
C 30.00
I was wondering how I could make the query only output a single row for multiple "teams" and add the integers together for all teams - basically all teams display only once and the amount is the sum of all that team's entries in the database.
For example, the correct way I want the example above to output would be like this:
Team Amount
A 10.00
B 20.00
C 70.00
You need a straightforward sum and group-by:
select team, sum(amount) as amount
from mytable
group by team
order by team
It is unclear whether you want arbitrary (ie no) ordering, ordering by team, or ordering by the sum. If you want to order by the sum, change the order-by clause to:
order by sum(amount)
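To see the effect of the two orderings, here is a sketch on the sample data, run on SQLite through Python's sqlite3. The DESC keyword in the second query is an addition for illustration: with this data, ascending order by sum happens to match the order by team, so descending makes the difference visible.

```python
import sqlite3

# Sample rows from the question, in the placeholder table "mytable".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (team TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [("A", 10.0), ("B", 20.0), ("C", 40.0), ("C", 30.0)],
)

# One row per team, ordered by team name.
by_team = conn.execute(
    "SELECT team, SUM(amount) AS amount FROM mytable GROUP BY team ORDER BY team"
).fetchall()

# Same grouping, ordered by the summed amount, largest first (DESC added for the demo).
by_amount = conn.execute(
    "SELECT team, SUM(amount) AS amount FROM mytable "
    "GROUP BY team ORDER BY SUM(amount) DESC"
).fetchall()

print(by_team)    # [('A', 10.0), ('B', 20.0), ('C', 70.0)]
print(by_amount)  # [('C', 70.0), ('B', 20.0), ('A', 10.0)]
```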

Select distinct column along with some other columns in MySQL

I can't seem to find a suitable solution for the following (probably an age old) problem so hoping someone can shed some light. I need to return 1 distinct column along with other non distinct columns in mySQL.
I have the following table in mySQL:
id name destination rating country
----------------------------------------------------
1 James Barbados 5 WI
2 Andrew Antigua 6 WI
3 James Barbados 3 WI
4 Declan Trinidad 2 WI
5 Steve Barbados 4 WI
6 Declan Trinidad 3 WI
I would like an SQL statement to return each DISTINCT name along with its destination and rating, filtered by country.
id name destination rating country
----------------------------------------------------
1 James Barbados 5 WI
2 Andrew Antigua 6 WI
4 Declan Trinidad 2 WI
5 Steve Barbados 4 WI
As you can see, James and Declan have different ratings, but the same name, so they are returned only once.
The following query returns all rows because the ratings are different. Is there any way I can return the above result set?
SELECT (distinct name), destination, rating
FROM table
WHERE country = 'WI'
ORDER BY id
Using a subquery, you can get the highest id for each name, then select the rest of the rows based on that:
SELECT * FROM table
WHERE id IN (
SELECT MAX(id) FROM table GROUP BY name
)
If you'd prefer, use MIN(id) to get the first record for each name instead of the last.
It can also be done with an INNER JOIN against the subquery. For this purpose the performance should be similar, and sometimes you need to join on two columns from the subquery.
SELECT
table.*
FROM
table
INNER JOIN (
SELECT MAX(id) AS id FROM table GROUP BY name
) maxid ON table.id = maxid.id
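Here is a sketch of the subquery approach on the sample data. It assumes SQLite through Python's sqlite3 instead of MySQL, uses the hypothetical table name trips (table is a reserved word), picks MIN(id) because the expected result in the question keeps the first row per name, and adds the country filter from the question's own query.

```python
import sqlite3

# Hypothetical table "trips" with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trips "
    "(id INTEGER, name TEXT, destination TEXT, rating INTEGER, country TEXT)"
)
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?, ?, ?)",
    [(1, "James", "Barbados", 5, "WI"), (2, "Andrew", "Antigua", 6, "WI"),
     (3, "James", "Barbados", 3, "WI"), (4, "Declan", "Trinidad", 2, "WI"),
     (5, "Steve", "Barbados", 4, "WI"), (6, "Declan", "Trinidad", 3, "WI")],
)

# The subquery picks the lowest id per name; the outer query returns those full rows.
first_per_name = conn.execute(
    """
    SELECT * FROM trips
    WHERE id IN (
        SELECT MIN(id) FROM trips WHERE country = 'WI' GROUP BY name
    )
    ORDER BY id
    """
).fetchall()
print([row[0] for row in first_per_name])  # [1, 2, 4, 5]
```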
The problem is that distinct works across the entire return set and not just the first field. Otherwise MySQL wouldn't know what record to return. So, you want to have some sort of group function on rating, whether MAX, MIN, GROUP_CONCAT, AVG, or several other functions.
Michael has already posted a good answer, so I'm not going to re-write the query.
I agree with @rcdmk. Using a DEPENDENT subquery can kill performance; GROUP BY seems more suitable, provided that you have already indexed the country field and only a few rows will reach the server. Rewriting the query given by @rcdmk, I added the ORDER BY NULL clause to suppress the implicit ordering by GROUP BY, to make it a little faster:
SELECT MIN(id) as id, name, destination, AVG(rating) as rating, country
FROM table WHERE country = 'WI'
GROUP BY name, destination ORDER BY NULL
You can do a GROUP BY clause:
SELECT MIN(id) AS id, name, destination, AVG(rating) AS rating, country
FROM TABLE_NAME
GROUP BY name, destination, country
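This query is likely to perform better on large datasets than the subquery alternatives, and it can be easier to read as well. Note that it returns the average rating per name and destination rather than the rating of one particular row.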