I appreciate that this may seem like a dumb question to many, but I cannot find a clear explanation anywhere of what effect "group by" has on a select max(...) from SQL statement.
I have the following data (there is another column image of type mediumblob which is not shown):
id title test_id
1 bomb 0
2 Soft watch 2
3 Dali 1
4 Narciss 1
5 The Woman In Green 0
6 A summer in Vetheuil 0
7 Artist's Garden 2
8 Beech Forest 2
9 Claude Monet 0
I know if I perform
select max(id) from images
where image is not null;
I get the max value of id i.e.:
max(id)
9
However, can someone please explain what is happening when I perform
select max(id), title, test_id
from images
where image is not null
group by id;
I find that max(id) serves no useful purpose here (results shown below):
max(id) title test_id
1 bomb 0
2 Soft watch 2
3 Dali 1
4 Narciss 1
5 The Woman In Green 0
6 A summer in Vetheuil 0
7 Artist's Garden 2
8 Beech Forest 2
9 Claude Monet 0
When you use MAX(), the GROUP BY clause tells the query engine how to group the rows from which each maximum is determined. In your first example there was no GROUP BY, so the whole result set was treated as a single group and you got one maximum. In your second example you selected several columns and added a GROUP BY, so the engine computes one maximum per group, and the GROUP BY columns decide which rows are compared with each other.
You told it to group by the id column, which means it compares records that have the same id and gives you the maximum for each unique id. Since every record has a different id, the clause effectively did nothing.
It grouped all records with an id of 1 (which was a single record), and returned the record with the maximum id from that group (which was that record). It did the same for 2, 3, etc.
In the case of the three columns shown here, the only place where it would make sense to group your records would be on the test_id column. Something like this:
SELECT MAX(id), title, test_id
FROM images
WHERE image IS NOT null
GROUP BY test_id
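This groups the records by test_id, so the result has one row per group: a maximum id of 9 for test_id 0, 4 for test_id 1, and 8 for test_id 2. By splitting the records into those three groups based on the three unique values in the test_id column, it can find a "maximum" id within each group. Note that title is neither aggregated nor part of the GROUP BY, so MySQL either rejects the query (when ONLY_FULL_GROUP_BY is enabled) or returns a title from an arbitrary row of each group, not necessarily the row holding the maximum id.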
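The grouping described above can be checked with a small script. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL, fills the image column with a placeholder byte, and adds a join-back query (not part of the original answer) as one way to fetch the title that actually belongs to each maximum id.

```python
import sqlite3

# In-memory copy of the question's table; the image bytes are placeholders.
rows = [
    (1, "bomb", 0), (2, "Soft watch", 2), (3, "Dali", 1), (4, "Narciss", 1),
    (5, "The Woman In Green", 0), (6, "A summer in Vetheuil", 0),
    (7, "Artist's Garden", 2), (8, "Beech Forest", 2), (9, "Claude Monet", 0),
]
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE images (id INTEGER PRIMARY KEY, title TEXT, "
    "test_id INTEGER, image BLOB)"
)
conn.executemany("INSERT INTO images VALUES (?, ?, ?, x'00')", rows)

# One maximum per test_id group.
max_per_group = conn.execute(
    "SELECT test_id, MAX(id) FROM images WHERE image IS NOT NULL "
    "GROUP BY test_id ORDER BY test_id"
).fetchall()
print(max_per_group)  # [(0, 9), (1, 4), (2, 8)]

# Join the per-group maxima back to the table to get the matching titles.
full_rows = conn.execute(
    "SELECT i.id, i.title, i.test_id FROM images AS i "
    "JOIN (SELECT test_id, MAX(id) AS max_id FROM images "
    "      WHERE image IS NOT NULL GROUP BY test_id) AS m "
    "ON i.id = m.max_id ORDER BY i.test_id"
).fetchall()
print(full_rows)
```

The join-back form avoids relying on how the engine picks a title for a non-aggregated column.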
Yes, in your example it serves no useful purpose.
You're grouping by ID then finding the maximum ID. But that doesn't make sense since there's only one of each ID. Normally MAX() is used on quantities, like prices or item counts or such like.
GROUP BY is not meant for this kind of query.
It is used for queries on data like this:
OId OrderDate OrderPrice Customer
1 2008/11/12 1000 Hansen
2 2008/10/23 1600 Nilsen
3 2008/09/02 700 Hansen
4 2008/09/03 300 Hansen
5 2008/08/30 2000 Jensen
6 2008/10/04 100 Nilsen
Now, if you want the total order price for each of these customers, you use GROUP BY:
SELECT Customer,SUM(OrderPrice) FROM Orders
GROUP BY Customer
customer SUM(OrderPrice)
Hansen 2000
Nilsen 1700
Jensen 2000
In your case id is unique, so grouping by id does not make sense.
Related
I need a MySQL query for following:
product Occur
---------------
alpha 3
beta 5
gamma 2
beta 1
gamma 6
alpha 0
and what I want:
Product total(Occur)
-------------------
alpha 3
beta 6
gamma 8
SELECT product, SUM(occur) FROM tableName GROUP BY product
Replace tableName with the name of your table. This takes the values from the occur column, groups them by product, and sums the values within each group.
SELECT product, SUM(occur) AS total FROM tableName GROUP BY product
This returns the sums in a column named "total"; the alias can be any name you want.
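As a quick check of what the alias does, here is a minimal sketch using Python's sqlite3 module in place of MySQL. The table name product_counts is a made-up placeholder for your own table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# product_counts is a hypothetical table name standing in for "tableName".
conn.execute("CREATE TABLE product_counts (product TEXT, occur INTEGER)")
conn.executemany(
    "INSERT INTO product_counts VALUES (?, ?)",
    [("alpha", 3), ("beta", 5), ("gamma", 2),
     ("beta", 1), ("gamma", 6), ("alpha", 0)],
)

cur = conn.execute(
    "SELECT product, SUM(occur) AS total FROM product_counts "
    "GROUP BY product ORDER BY product"
)
# The alias only names the output column; ORDER BY does the sorting.
column_names = [d[0] for d in cur.description]
totals = cur.fetchall()
print(column_names)  # ['product', 'total']
print(totals)        # [('alpha', 3), ('beta', 6), ('gamma', 8)]
```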
I am very new to Microsoft reporting. I have the following table in my database:
CategoryName Id
Normal 1
High 2
Normal 3
Low 4
Normal 5
Normal 6
Normal 7
Normal 8
Low 9
Low 10
Low 11
High 12
I want to group by Category and also show the count of each category. Here is what I did:
I inserted a two column table and I grouped by the categoryName in the first column and in the second column, I tried doing
=CountDistinct(Fields!CategoryName.Value)
This is what I am seeing in the report
High 1
1
Normal 1
1
1
1
1
1
1
Low 1
1
1
1
I want to see something like this:
Category Count
Normal 6
Low 4
High 2
any help will be highly appreciated
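Delete the Detail row and put the count expression in the Group row. Within a CategoryName group, CountDistinct(Fields!CategoryName.Value) is always 1, so use something like =Count(Fields!Id.Value) to count the rows in each group.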
You'd probably be better off doing this in the query. Easier and leaves less room for error. Something like the following should work.
SELECT categoryName, COUNT(*)
FROM your_table
GROUP BY categoryName
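The query-side approach can be tried against the sample data with a short sketch. It assumes Python's sqlite3 module instead of the reporting database, and the table name categories is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "categories" is a hypothetical name for the table shown in the question.
conn.execute("CREATE TABLE categories (CategoryName TEXT, Id INTEGER)")
names = ["Normal", "High", "Normal", "Low", "Normal", "Normal",
         "Normal", "Normal", "Low", "Low", "Low", "High"]
conn.executemany(
    "INSERT INTO categories VALUES (?, ?)",
    [(name, i) for i, name in enumerate(names, start=1)],
)

# One row per category with the number of rows in that category.
counts = conn.execute(
    "SELECT CategoryName, COUNT(*) AS category_count FROM categories "
    "GROUP BY CategoryName ORDER BY category_count DESC"
).fetchall()
print(counts)  # [('Normal', 6), ('Low', 4), ('High', 2)]
```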
Is there a way to use an Over and Intersect function to get the average sales for the first 3 periods (not always consecutive months, sometimes a month is skipped) for each Employee?
For example:
EmpID 1 is 71.67 ((80 + 60 + 75)/3) despite skipping "3/1/2007"
EmpID 3 is 250 ((350 + 250 + 150)/3).
I'm not sure how EmpID 2 would work because there are just two data points.
I've used a workaround: a calculated column using DenseRank over Date, "asc", EmpID, then another Boolean calculated column that is true where the DenseRank column is <= 3, and then Over functions restricted to the rows where that Boolean is TRUE. But I want to figure out the correct way to do this.
There are Last 'n' Period functions but I haven't seen anything resembling a First 'n' Period function.
EmpID Date Sales
1 1/1/2007 80
1 2/1/2007 60
1 4/1/2007 75
1 5/1/2007 30
1 9/1/2007 100
2 2/1/2007 200
2 3/1/2007 100
3 12/1/2006 350
3 1/1/2007 250
3 3/1/2007 150
3 4/1/2007 275
3 8/1/2007 375
3 9/1/2007 475
3 10/1/2007 300
3 12/1/2007 200
I suppose the solution depends on where you want this data represented, but here is one example
If((Rank([Date],"asc",[EmpID])<=3) and (Max(Rank([Date],"asc",[EmpID])) OVER ([EmpID])>=3),Avg([Sales]) over ([EmpID]))
You can insert this as a calculated column and it will give you what you want (assuming your data is sorted by date when imported).
You may want to see the row numbering, and in that case insert this as a calculated column as well and name it RN
Rank([Date],"asc",[EmpID])
Explanation
Rank([Date],"asc",[EmpID])
This part of the function is basically applying a row number (labeled as RN in the results below) to each EmpID grouping.
Rank([Date],"asc",[EmpID])<=3
This is how we are taking the top 3 rows regardless if Months are skipped. If your data isn't sorted, we'd have to create one additional calculated column but the same logic applies.
(Max(Rank([Date],"asc",[EmpID])) OVER ([EmpID])>=3)
This is where we are basically ignoring EmpID = 2, or any EmpID who doesn't have at least 3 rows. Removing this would give you the average (dynamically) for each EmpID based on their first 1, 2, or 3 months respectively.
Avg([Sales]) over ([EmpID])
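Finally, take the average for each EmpID. Keep in mind that OVER ([EmpID]) on its own spans all of an employee's rows, not only those passing the If test; if the average must cover just the first three rows, the rank test probably needs to sit inside the aggregate, for example Avg(If([RN]<=3,[Sales])) over ([EmpID]), using the RN column described above.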
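Outside Spotfire, the intended numbers can be cross-checked with a few lines of plain Python. This is a sketch, not a Spotfire expression: the function name first_n_average is made up, and it follows the answer's choice of skipping employees with fewer than three rows.

```python
from datetime import datetime
from itertools import groupby

# Sample data from the question: (EmpID, Date, Sales).
sales = [
    (1, "1/1/2007", 80), (1, "2/1/2007", 60), (1, "4/1/2007", 75),
    (1, "5/1/2007", 30), (1, "9/1/2007", 100),
    (2, "2/1/2007", 200), (2, "3/1/2007", 100),
    (3, "12/1/2006", 350), (3, "1/1/2007", 250), (3, "3/1/2007", 150),
    (3, "4/1/2007", 275), (3, "8/1/2007", 375), (3, "9/1/2007", 475),
    (3, "10/1/2007", 300), (3, "12/1/2007", 200),
]

def first_n_average(records, n=3):
    """Average the first n sales (by date) per employee, or None if fewer than n."""
    # Sorting the tuples orders rows by employee, then by parsed date.
    parsed = sorted(
        (emp, datetime.strptime(day, "%m/%d/%Y"), amount)
        for emp, day, amount in records
    )
    result = {}
    for emp, group in groupby(parsed, key=lambda row: row[0]):
        first = [amount for _, _, amount in group][:n]
        result[emp] = sum(first) / n if len(first) == n else None
    return result

averages = first_n_average(sales)
print(averages)
```

Skipped months do not matter here because rows are ranked by date order, not by calendar distance.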
@Chris: Here is the solution I came up with
Step 1: Inserted a calculated column 'rank' with the expression below
DenseRank([Date],"asc",[EmpID])
Step 2: Created a cross table visualization from the data table and limited data with the expression below
Table 1:Domain Link Result
======================================================================
||Column1(words) ||Column2(links) ||Column3(frequency) ||
======================================================================
1 1 Any Number
2 1 Any Number
3 1 Any Number
4 1 Any Number
1 2 Any Number
2 2 Any Number
3 2 Any Number
4 2 Any Number
Table 2:Sub Link Result
======================================================================
||Column1(words) ||Column2(sublinks) ||Column3(frequency) ||
======================================================================
1 a Any Number
2 b Any Number
3 c Any Number
4 d Any Number
1 e Any Number
2 f Any Number
3 g Any Number
4 h Any Number
And so on.
In the above scenario user entered 4 words and 2 domain links. Now the frequency of 4 keywords is calculated on domain links as well sublinks and stored in separate tables as shown above. I want an aggregate result like below:
Table 3:Final Result
==================================================================================
||Column1(words) ||Column2(Domain links) ||Column3(Total frequency) ||
==================================================================================
Row1: 1 1 Total of frequency in both tables
2 for word "1"
----------------------------------------------------------------------------------
Row2: 2 1 Total of frequency in both tables
2 for word "2"
----------------------------------------------------------------------------------
Row3: 3 1 Total of frequency in both tables
2 for word "3"
----------------------------------------------------------------------------------
Row4: 4 1 Total of frequency in both tables
2 for word "4"
----------------------------------------------------------------------------------
I tried the following query in MySQL:
SELECT t.`keyword`, t.`link`, SUM( t.`frequency` ) AS total
FROM (
SELECT `frequency`
FROM `domain_link_result`
WHERE `keyword` = 'national'
UNION ALL
SELECT `frequency`
FROM `sub_link_result`
WHERE `keyword` = 'national'
)t GROUP BY `keyword`
But in Column 2 of the final result I get only first link instead of two links for row 1. How can I get both links or any number of links entered by user in a single row ?
Words and Links have VARCHAR as type and frequency has INT type.
If you want to collapse several rows into one and still be able to see the information, you have to use GROUP_CONCAT:
http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_group-concat
This outputs the collapsed values as a single comma-separated string. In your programming language you can split this string if you need the individual values.
Your query would look something like this:
SELECT keyword, GROUP_CONCAT(links), SUM(frequency)
FROM (subquery)
GROUP BY keyword
Which would output something like this:
==================================================================================
||Column1(words) ||Column2(Domain links) ||Column3(Total frequency) ||
==================================================================================
Row1: 1 1,2 sum of freq.
----------------------------------------------------------------------------------
Row2: 2 1,2 sum of freq.
----------------------------------------------------------------------------------
Row3: 3 1,2 sum of freq.
----------------------------------------------------------------------------------
Row4: 4 1,2 sum of freq.
EDIT: Extra help for your query
Your query looks a little confusing to me. Try a JOIN approach, aggregating each table first so that the join does not multiply rows and double count frequencies:
SELECT d.word AS word,
d.domain_links,
d.frequency + s.frequency AS total_frequency
FROM (SELECT word, GROUP_CONCAT(links) AS domain_links, SUM(frequency) AS frequency
FROM domain_link_results
WHERE word = 'national'
GROUP BY word) AS d
INNER JOIN (SELECT word, SUM(frequency) AS frequency
FROM sub_link_results
WHERE word = 'national'
GROUP BY word) AS s
ON d.word = s.word
On the other hand, it might be better to have all the links in the same table, and an extra field to determine if it's a domain link or a sublink. Without knowing more about your system it is hard to say.
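To see the combination of GROUP_CONCAT and a summed frequency in action, here is a rough sketch using Python's sqlite3 module, whose group_concat behaves much like MySQL's for this purpose. The table and column names follow the question's query where possible; the sublink column name and all frequency values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Frequencies below are made-up sample values; "sublink" is an assumed column name.
conn.executescript("""
CREATE TABLE domain_link_result (keyword TEXT, link TEXT, frequency INTEGER);
CREATE TABLE sub_link_result (keyword TEXT, sublink TEXT, frequency INTEGER);
INSERT INTO domain_link_result VALUES
    ('national', '1', 5), ('national', '2', 3),
    ('local', '1', 2), ('local', '2', 1);
INSERT INTO sub_link_result VALUES
    ('national', 'a', 4), ('national', 'e', 6),
    ('local', 'b', 7), ('local', 'f', 1);
""")

# Aggregate each table per keyword first, then join the two one-row-per-keyword
# results, so no frequency is counted more than once.
result = conn.execute("""
SELECT d.keyword, d.links, d.freq + s.freq AS total
FROM (SELECT keyword, GROUP_CONCAT(link) AS links, SUM(frequency) AS freq
      FROM domain_link_result GROUP BY keyword) AS d
JOIN (SELECT keyword, SUM(frequency) AS freq
      FROM sub_link_result GROUP BY keyword) AS s
  ON d.keyword = s.keyword
ORDER BY d.keyword
""").fetchall()

totals = {keyword: total for keyword, _, total in result}
# The concatenation order is not guaranteed, so sort after splitting.
links = {keyword: sorted(joined.split(",")) for keyword, joined, _ in result}
print(totals)
print(links)
```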
I found it hard to find a fitting title. For simplicity let's say I have the following table:
cook_id cook_rating
1 2
1 1
1 3
1 4
1 2
1 2
1 1
1 3
1 5
1 4
2 5
2 2
Now I would like to get an output of 'good' cooks. A good cook is someone for whom at least 70% of the ratings are 1, 2 or 3, rather than 4 or 5.
So in my example table, the cook with id 1 has a total of 10 ratings, 7 of which have type 1, 2 and 3. Only three have type 4 or 5. Therefore the cook with id 1 would be a 'good' cook, and the output should be the cook's id with the number of good ratings.
cook_id cook_rating
1 7
The cook with id 2, however, doesn't satisfy my condition, therefore should not be listed at all.
select cook_id, count(cook_rating) - sum(case when cook_rating = 4 OR cook_rating = 5 then 1 else 0 end) as numberOfGoodRatings from cook
where cook_rating in (1,2,3,4,5)
group by cook_id
order by numberOfGoodRatings desc
However, this doesn't take into account the fact that there might be more 4 or 5 than good ratings, resulting in negative outputs. Plus, the requirement of at least 70% is not included.
You can get this with a comparison in your HAVING clause. If you must have only the two columns in the result set, wrap the query in a sub-select: select cook_id, positive_ratings FROM (...)
SELECT
cook_id,
count(IF(cook_rating < 4, 1, NULL)) as positive_ratings,
count(*) as total_ratings
FROM cook
GROUP BY cook_id
HAVING (positive_ratings / total_ratings) >= 0.70
ORDER BY positive_ratings DESC
Edit: Note that the count is intended to include only rows where the rating is less than 4. The MySQL documentation says that count only counts non-NULL values, but a false comparison evaluates to 0 rather than NULL, so a plain count(cook_rating < 4) would count every rated row. That is why the comparison is wrapped in IF(cook_rating < 4, 1, NULL).
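The difference between counting a comparison and counting only the matching rows is easy to verify. The sketch below uses Python's sqlite3 module rather than MySQL; it swaps in a portable CASE expression and spells out the ratio in HAVING with * 1.0, because SQLite would otherwise do integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cook (cook_id INTEGER, cook_rating INTEGER)")
ratings = [(1, r) for r in (2, 1, 3, 4, 2, 2, 1, 3, 5, 4)] + [(2, 5), (2, 2)]
conn.executemany("INSERT INTO cook VALUES (?, ?)", ratings)

good_cooks = conn.execute("""
SELECT cook_id,
       SUM(CASE WHEN cook_rating < 4 THEN 1 ELSE 0 END) AS positive_ratings,
       COUNT(cook_rating < 4) AS naive_count  -- counts every non-NULL rating
FROM cook
GROUP BY cook_id
HAVING SUM(CASE WHEN cook_rating < 4 THEN 1 ELSE 0 END) * 1.0 / COUNT(*) >= 0.70
ORDER BY positive_ratings DESC
""").fetchall()
print(good_cooks)  # [(1, 7, 10)]
```

The naive_count column is included only to show that counting the bare comparison returns all 10 ratings for cook 1, not the 7 good ones.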
I suggest changing your schema slightly to make this kind of query trivial.
Suppose you add 5 columns to your cook table that simply count the number of ratings of each value:
nb_ratings_1 nb_ratings_2 nb_ratings_3 nb_ratings_4 nb_ratings_5
Updating such a table when a new rating is entered in the DB is trivial, as is recomputing those numbers if the redundancy makes you nervous. And it makes all filtering and sorting fast and easy.
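One way the counter-column idea might look in practice is sketched below with Python's sqlite3 module. The cook_summary table and the add_rating helper are hypothetical names introduced for illustration; the answer itself only names the five nb_ratings_N columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# cook_summary is a hypothetical table holding one counter per rating value.
conn.execute("""
CREATE TABLE cook_summary (
    cook_id INTEGER PRIMARY KEY,
    nb_ratings_1 INTEGER DEFAULT 0, nb_ratings_2 INTEGER DEFAULT 0,
    nb_ratings_3 INTEGER DEFAULT 0, nb_ratings_4 INTEGER DEFAULT 0,
    nb_ratings_5 INTEGER DEFAULT 0
)
""")

def add_rating(conn, cook_id, rating):
    """Increment the counter column that matches the new rating."""
    if rating not in (1, 2, 3, 4, 5):
        raise ValueError("rating must be between 1 and 5")
    column = f"nb_ratings_{rating}"  # safe: rating was validated above
    conn.execute("INSERT OR IGNORE INTO cook_summary (cook_id) VALUES (?)", (cook_id,))
    conn.execute(
        f"UPDATE cook_summary SET {column} = {column} + 1 WHERE cook_id = ?",
        (cook_id,),
    )

for r in (2, 1, 3, 4, 2, 2, 1, 3, 5, 4):
    add_rating(conn, 1, r)
for r in (5, 2):
    add_rating(conn, 2, r)

# "At least 70% good" becomes integer arithmetic on the counters.
good_cooks = conn.execute("""
SELECT cook_id, nb_ratings_1 + nb_ratings_2 + nb_ratings_3 AS good_ratings
FROM cook_summary
WHERE 10 * (nb_ratings_1 + nb_ratings_2 + nb_ratings_3)
      >= 7 * (nb_ratings_1 + nb_ratings_2 + nb_ratings_3
              + nb_ratings_4 + nb_ratings_5)
""").fetchall()
print(good_cooks)  # [(1, 7)]
```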