SQL group by and order by issue - mysql

Let's say I have a table, tasks, with the following data:
task user when_added
---------------------------
run 1 2012-08-09
walk 2 2012-08-07
bike 2 2012-08-07
car 1 2012-08-06
run 2 2012-08-06
car 1 2012-08-05
bike 1 2012-08-04
run 1 2012-08-04
As you can see, tasks repeat.
My question is about what happens when I show the data with, e.g.:
select * from tasks group by task order by when_added desc
How does the group by affect the results? Does group by group the rows in any particular order, and can I control that order?
The reason I ask is that I have a large table that I display as above. If I drop the group by and just show the results in date order, I get some rows that do not show up with the group by. That means the task has been done before, but the group by seems to keep the oldest date, and I want the newest date at the top of the pile.
I hope this makes sense. Is it possible to affect the group by order?

Is that what you want?
select task, group_concat(user) as users, max(when_added) as last_added
from tasks
group by task
order by last_added desc
group by is a clause that is used together with aggregate functions. MySQL lets you select non-aggregated columns anyway (unless the ONLY_FULL_GROUP_BY SQL mode is enabled), but you should not do that.
If you group by a column, the result has one row per distinct value of that column, and all other data is grouped around it. So there may be several rows where task is run, for instance. Selecting the other columns directly returns a value from an arbitrary row of the group. You should instead pick a specific result from the group, such as max, min or sum, or concatenate the values.
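As a rough sketch of the idea above, the following runs a grouped query against the sample data in an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for MySQL here only so that the example is self-contained; the alias latest and the tie-break on task are additions for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (task TEXT, user INTEGER, when_added TEXT)")
conn.executemany(
    "INSERT INTO tasks VALUES (?, ?, ?)",
    [
        ("run", 1, "2012-08-09"),
        ("walk", 2, "2012-08-07"),
        ("bike", 2, "2012-08-07"),
        ("car", 1, "2012-08-06"),
        ("run", 2, "2012-08-06"),
        ("car", 1, "2012-08-05"),
        ("bike", 1, "2012-08-04"),
        ("run", 1, "2012-08-04"),
    ],
)

# One row per task, carrying the newest date of that task, newest first.
# The secondary sort on task only makes the order of ties deterministic.
latest_per_task = conn.execute(
    """
    SELECT task, MAX(when_added) AS latest
    FROM tasks
    GROUP BY task
    ORDER BY latest DESC, task
    """
).fetchall()

for row in latest_per_task:
    print(row)
```

Ordering by the aggregated value, rather than by the bare when_added column, is what puts the newest date of each task at the top.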

Related

Is it possible to order by two fields in a given table, one in ascending and the other in descending order at the same time?

Recently, I have come across a question that has been asked in an interview which states that:
You have mysql database with a table students. Write a query string to
select all the items of the table students and order by two fields one
ascending and the other descending.
Let's have a table "students" for example:
From this example, if we order by Score in descending order then there is no way to order by roll_no in ascending order at the same time.
From the point of view of the question, can any query be written to obtain the desired result? Or is the question ambiguous or wrong, or is my understanding of it wrong?
In your order by, you ask the system to order by the first column first, and then by the second column. So it has to:
-> Keep the ordering of the first column.
-> Order by the second column too.
So the second ordering is applied within each group of rows that share the same value in the first column. For example:
Table Test
A| B
--------
1 1
1 3
1 2
2 2
2 4
Select * from test order by A desc, B asc
Output
Table Test
A| B
--------
2 2
2 4
1 1
1 2
1 3
So in your case, if you order by score desc, roll_no asc, it will order
First, all students according to their scores, descending
And then, if two or more students have the same score, say 60, the students within that group will be ordered by roll number, ascending
Hope this clears up your doubt.
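The Test table example above can be reproduced with a short script. This is only a sketch: it uses Python's sqlite3 module with an in-memory SQLite database in place of MySQL, which is an assumption made so that the example runs on its own.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (A INTEGER, B INTEGER)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [(1, 1), (1, 3), (1, 2), (2, 2), (2, 4)],
)

# A is sorted descending; B is sorted ascending within each value of A.
ordered = conn.execute(
    "SELECT A, B FROM test ORDER BY A DESC, B ASC"
).fetchall()

for row in ordered:
    print(row)
```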
When you use the order by clause, you can specify the direction, asc or desc, for each column.
In your example, with your sample data, there is no point in ordering by score and then by roll_no, because there are no duplicates in the score column.
If your real table has scores which appear more than once, you can order by score desc, roll_no.
(asc is the default direction.)

ORDER BY and GROUP BY those results in a single query

I am trying to query a dataset from a single table, which contains quiz answers/entries from multiple users. I want to pull out the highest scoring entry from each individual user.
My data looks like the following:
ID TP_ID quiz_id name num_questions correct incorrect percent created_at
1 10154312970149546 1 Joe 3 2 1 67 2015-09-20 22:47:10
2 10154312970149546 1 Joe 3 3 0 100 2015-09-21 20:15:20
3 125564674465289 1 Test User 3 1 2 33 2015-09-23 08:07:18
4 10153627558393996 1 Bob 3 3 0 100 2015-09-23 11:27:02
My query looks like the following:
SELECT * FROM `entries`
WHERE `TP_ID` IN('10153627558393996', '10154312970149546')
GROUP BY `TP_ID`
ORDER BY `correct` DESC
In my mind, what that should do is get the two users from the IN clause, order them by the number of correct answers and then group them together, so I should be left with the 2 highest scores from those two users.
In reality it's giving me two results, but the one from Joe gives me the lower of the two values (2), with Bob first with a score of 3. Swapping to ASC ordering keeps the scores the same but places Joe first.
So, how could I achieve what I need?
You're after the groupwise maximum, which can be obtained by joining the grouped results back to the table:
SELECT * FROM entries NATURAL JOIN (
SELECT TP_ID, MAX(correct) correct
FROM entries
WHERE TP_ID IN ('10153627558393996', '10154312970149546')
GROUP BY TP_ID
) t
Of course, if a user has multiple records with the maximal score, it will return all of them; should you only want some subset, you'll need to express the logic for determining which.
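A minimal sketch of this join-back approach, using the sample rows from the question. It runs on an in-memory SQLite database through Python's sqlite3 module rather than on MySQL, keeps only a subset of the columns, and uses an explicit JOIN ... ON instead of NATURAL JOIN so that the join columns are visible; all three are choices made for the illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
# Only a subset of the question's columns is kept to shorten the example.
conn.execute(
    "CREATE TABLE entries (ID INTEGER, TP_ID TEXT, name TEXT, correct INTEGER)"
)
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?, ?)",
    [
        (1, "10154312970149546", "Joe", 2),
        (2, "10154312970149546", "Joe", 3),
        (3, "125564674465289", "Test User", 1),
        (4, "10153627558393996", "Bob", 3),
    ],
)

# The inner query finds the best score per user; joining it back to the
# table on (TP_ID, correct) recovers the full rows that hold that score.
best_entries = conn.execute(
    """
    SELECT e.ID, e.name, e.correct
    FROM entries e
    JOIN (
        SELECT TP_ID, MAX(correct) AS correct
        FROM entries
        WHERE TP_ID IN ('10153627558393996', '10154312970149546')
        GROUP BY TP_ID
    ) t ON e.TP_ID = t.TP_ID AND e.correct = t.correct
    ORDER BY e.ID
    """
).fetchall()

for row in best_entries:
    print(row)
```

Joe's higher-scoring entry (ID 2) is returned instead of the arbitrary row that the original GROUP BY query picked.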
MySQL is quite lax when it comes to group by clauses, but as a rule of thumb you should follow the rule that other DBMSs enforce:
In a group by query, each selected column should either be part of the group by clause or be wrapped in an aggregate function.
For your query I would suggest:
SELECT `TP_ID`,`name`,max(`correct`) FROM `entries`
WHERE `TP_ID` IN('10153627558393996', '10154312970149546')
GROUP BY `TP_ID`,`name`
Since your table seems quite denormalized, the name part of the group by could be omitted, but it might be necessary in other cases.
ORDER BY only specifies the order in which results are returned; it has no effect on which results are returned. So you need to apply the max() function to get the highest number of right answers.

Average value for top n records?

I have this SQL schema: http://sqlfiddle.com/#!9/eb34d
In particular these are the relevant columns for this question:
ut_id,ob_punti
I need to get the average of the top n (where n is 4) values of "ob_punti" for each user (ut_id).
This query returns the AVG of all values of ob_punti grouped by ut_id:
SELECT ut_id, SUM(ob_punti), AVG(ob_punti) as coefficiente
FROM vw_obiettivi_2015
GROUP BY ut_id ORDER BY ob_punti DESC
But I can't figure out how to get the AVG for only the top 4 values.
Can you please help?
It will give SUM and AVG of top 4. You may replace 4 by n to get top n.
-- the alias is rnk because RANK is a reserved word in MySQL 8.0 and later
select ut_id, SUM(ob_punti), AVG(ob_punti) from (
select @rank:=if(@prev_cat=ut_id,@rank+1,1) as rnk, ut_id, ob_punti, @prev_cat:=ut_id
from Table1,(select @rank:=0, @prev_cat:="")t
order by ut_id, ob_punti desc
) temp
where temp.rnk<=4
group by ut_id;
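For comparison, here is a sketch of the same top-n-per-group idea written with the ROW_NUMBER() window function instead of user variables. This is a swapped-in technique, not the query from the answer: it needs window function support (SQLite 3.25 or later, as used here through Python's sqlite3, or MySQL 8.0 or later). The table name punti and its rows are invented for the illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
# Hypothetical table and data; only the two relevant columns are modelled.
conn.execute("CREATE TABLE punti (ut_id INTEGER, ob_punti INTEGER)")
conn.executemany(
    "INSERT INTO punti VALUES (?, ?)",
    [(1, 10), (1, 8), (1, 6), (1, 4), (1, 2), (2, 5), (2, 3)],
)

# Number the rows of each user from the highest ob_punti downwards,
# keep the first 4 per user, then aggregate what is left.
top4 = conn.execute(
    """
    SELECT ut_id, SUM(ob_punti), AVG(ob_punti)
    FROM (
        SELECT ut_id, ob_punti,
               ROW_NUMBER() OVER (
                   PARTITION BY ut_id ORDER BY ob_punti DESC
               ) AS rn
        FROM punti
    ) ranked
    WHERE rn <= 4
    GROUP BY ut_id
    ORDER BY ut_id
    """
).fetchall()

for row in top4:
    print(row)
```

User 1 has five rows, so the lowest value (2) is excluded from the sum and average; user 2 has fewer than four rows, so all of them count.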
This is not exactly related to the question asked; I am posting it because someone might benefit.
I got a HackerEarth problem: write a MySQL query to fetch the top 10 records based on the average product quantity in stock.
SELECT productName, avg(quantityInStock) as avg_quantity from products
group by productName
order by avg_quantity desc
limit 10
Note: if someone can improve the query above, please feel free to modify it.

Assistance with complex MySQL query (using LIMIT ?)

I wonder if anyone could help with a MySQL query I am trying to write to return relevant results.
I have a big table of change log data, and I want to retrieve a number of record 'groups'. For example, in this case a group would be where two or more records are entered with the same timestamp.
Here is a sample table.
==============================================
ID DATA TIMESTAMP
==============================================
1 Some text 1379000000
2 Something 1379011111
3 More data 1379011111
3 Interesting data 1379022222
3 Fascinating text 1379033333
If I wanted the first two grouped sets, I could use LIMIT 0,2 but this would miss the third record. The ideal query would return three rows (as two rows have the same timestamp).
==============================================
ID DATA TIMESTAMP
==============================================
1 Some text 1379000000
2 Something 1379011111
3 More data 1379011111
Currently I've been using PHP to process the entire table, which mostly works, but for a table of 1000+ records, this is not very efficient on memory usage!
Many thanks in advance for any help you can give...
Get the timestamps for the filtering using a join. For instance, the following would make sure that the second timestamp is in a completed group:
select t.*
from t join
(select timestamp
from t
order by timestamp
limit 2
) tt
on t.timestamp = tt.timestamp;
The following would get the first three groups, no matter what their size:
select t.*
from t join
(select distinct timestamp
from t
order by timestamp
limit 3
) tt
on t.timestamp = tt.timestamp;
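The distinct-timestamp variant can be tried on the sample data from the question. The sketch below limits the subquery to 2 groups and runs on an in-memory SQLite database through Python's sqlite3 module instead of MySQL, an assumption made only to keep the example self-contained; the lowercase column names are also chosen for the illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, data TEXT, timestamp INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "Some text", 1379000000),
        (2, "Something", 1379011111),
        (3, "More data", 1379011111),
        (3, "Interesting data", 1379022222),
        (3, "Fascinating text", 1379033333),
    ],
)

# The derived table picks the first 2 distinct timestamps; the join then
# returns every row belonging to those groups, whatever their size.
first_two_groups = conn.execute(
    """
    SELECT t.id, t.data, t.timestamp
    FROM t
    JOIN (
        SELECT DISTINCT timestamp
        FROM t
        ORDER BY timestamp
        LIMIT 2
    ) tt ON t.timestamp = tt.timestamp
    ORDER BY t.timestamp, t.id
    """
).fetchall()

for row in first_two_groups:
    print(row)
```

Three rows come back for two groups, which matches the ideal result described in the question.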

Mysql subquery with sum causing problems

This is a summary version of the problems I am encountering, but hits the nub of my problem. The real problem involves huge UNION groups of monthly data tables, but the SQL would be huge and add nothing. So:
SELECT entity_id,
sum(day_call_time) as day_call_time
from (
SELECT entity_id,
sum(answered_day_call_time) as day_call_time
FROM XCDRDNCSum201108
where (day_of_the_month >= 10 AND day_of_the_month<=24)
and LPAD(core_range,4,"0")="0987"
and LPAD(subrange,3,"0")="654"
and SUBSTR(LPAD(core_number,7,"0"),4,7)="3210"
) as summary
is the problem. When the subquery on XCDRDNCSum201108 matches no rows, it still returns a single row, because it uses sum, and the column values in that row are null. entity_id is part of the primary key and cannot be null.
If I take out the sum and just query entity_id, the subquery contains no rows, and so the outer query does not fail. But when I use sum, I get error 1048: Column 'entity_id' cannot be null.
How do I work around this problem? Sometimes there is no data.
You are completely overworking the query: it pre-sums inside and then sums again outside. In addition, I understand you are not a DBA, but whenever you do an aggregation, you typically need to state the criteria it is grouped by. In the case presented here, you are getting the sum of calls for all entity IDs, so you must have a group by on any non-aggregated columns. However, if all you care about is the grand total without respect to entity_id, you can skip the group by, but then you should also not select the actual entity ID.
If you want to show the actual time per specific entity ID:
SELECT
entity_id,
sum(answered_day_call_time) as day_call_time,
count(*) number_of_calls
FROM
XCDRDNCSum201108
where
(day_of_the_month >= 10 AND day_of_the_month<=24)
and LPAD(core_range,4,"0")="0987"
and LPAD(subrange,3,"0")="654"
and SUBSTR(LPAD(core_number,7,"0"),4,7)="3210"
group by
entity_id
This would result in something like (fictitious data)
Entity_ID Day_Call_Time Number_Of_Calls
1 10 3
2 45 4
3 27 2
If all you cared about were the total call times
SELECT
sum(answered_day_call_time) as day_call_time,
count(*) number_of_calls
FROM
XCDRDNCSum201108
where
(day_of_the_month >= 10 AND day_of_the_month<=24)
and LPAD(core_range,4,"0")="0987"
and LPAD(subrange,3,"0")="654"
and SUBSTR(LPAD(core_number,7,"0"),4,7)="3210"
This would result in something like (fictitious data)
Day_Call_Time Number_Of_Calls
82 9
Would changing
sum(answered_day_call_time) as day_call_time
to
ifnull(sum(answered_day_call_time),0) as day_call_time
work? I'm assuming MySQL here, but the coalesce function should work too.
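The behaviour behind the error can be seen in a small experiment. The sketch below uses an in-memory SQLite database through Python's sqlite3 module in place of MySQL, and a hypothetical, much reduced calls table, so it only illustrates the general behaviour of sum over an empty set of rows, not the original schema.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
# Hypothetical, simplified stand-in for the monthly summary table.
conn.execute(
    "CREATE TABLE calls ("
    "entity_id INTEGER, day_of_the_month INTEGER, "
    "answered_day_call_time INTEGER)"
)
# The only row falls outside the day range, so the filter matches nothing.
conn.execute("INSERT INTO calls VALUES (1, 5, 10)")

where = "WHERE day_of_the_month >= 10 AND day_of_the_month <= 24"

# Aggregate without GROUP BY: one row is returned even though nothing
# matched, and both columns in it are NULL.
no_group = conn.execute(
    f"SELECT entity_id, SUM(answered_day_call_time) FROM calls {where}"
).fetchall()

# Aggregate with GROUP BY: no matching rows means no groups and no output.
grouped = conn.execute(
    f"SELECT entity_id, SUM(answered_day_call_time) FROM calls {where} "
    "GROUP BY entity_id"
).fetchall()

# COALESCE (or IFNULL) turns the NULL sum of the grand total into 0.
total = conn.execute(
    f"SELECT COALESCE(SUM(answered_day_call_time), 0) FROM calls {where}"
).fetchall()

print(no_group, grouped, total)
```

In this sketch the null entity_id only appears in the form without group by, while the grouped form returns no rows at all. Wrapping the sum in ifnull or coalesce replaces the null total, but it does not by itself give entity_id a value.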