SELECT specific rows when using GROUP BY


I have the following SQL table that keeps track of a user's score at a particular timepoint. A user can have multiple scores per day.
+-------+------------+-------+-----+
| user | date | score | ... |
+-------+------------+-------+-----+
| bob | 2014-04-19 | 100 | ... |
| mary | 2014-04-19 | 100 | ... |
| alice | 2014-04-20 | 100 | ... |
| bob | 2014-04-20 | 110 | ... |
| bob | 2014-04-20 | 125 | ... |
| mary | 2014-04-20 | 105 | ... |
| bob | 2014-04-21 | 115 | ... |
+-------+------------+-------+-----+
Given a particular user (let's say bob), how would I generate a report of that user's scores that uses only the highest submitted score per day? (Getting the specific row with the highest score is important as well, not just the highest score.)
SELECT * FROM `user_score` WHERE `user` = 'bob' GROUP BY `date`
is the base query that I'm building off of. It results in the following result set:
+-------+------------+-------+-----+
| user | date | score | ... |
+-------+------------+-------+-----+
| bob | 2014-04-19 | 100 | ... |
| bob | 2014-04-20 | 110 | ... |
| bob | 2014-04-21 | 115 | ... |
+-------+------------+-------+-----+
bob's higher score of 125 from 2014-04-20 is missing. I tried rectifying that with MAX(score)
SELECT *, MAX(score) FROM `user_score` WHERE `user` = 'bob' GROUP BY `date`
This returns the highest score for the day, but not the row that has the highest score. Other column values on that row are important.
+-------+------------+-------+-----+------------+
| user | date | score | ... | max(score) |
+-------+------------+-------+-----+------------+
| bob | 2014-04-19 | 100 | ... | 100 |
| bob | 2014-04-20 | 110 | ... | 125 |
| bob | 2014-04-21 | 115 | ... | 115 |
+-------+------------+-------+-----+------------+
Lastly, I tried
SELECT *, MAX(score) FROM `user_score` WHERE `user` = 'bob' AND score = MAX(score) GROUP BY `date`
But that results in an invalid use of GROUP BY.
The question "Selecting a row with specific value from a group?" is on the right track with what I am trying to accomplish, but I don't know the specific score to filter by.
EDIT:
SQLFiddle: http://sqlfiddle.com/#!2/ee6a2

If you want all the fields, the easiest (and fastest) way in MySQL is to use not exists:
SELECT *
FROM `user_score` us
WHERE `user` = 'bob' AND
NOT EXISTS (SELECT 1
FROM user_score us2
WHERE us2.`user` = us.`user` AND
us2.date = us.date AND
us2.score > us.score
);
This may seem like a strange approach, and I'll admit that it is. What it does is pretty simple: "Get me all rows for Bob from user_score where there is no higher score (for Bob) on the same date". That is equivalent to getting the row with the maximum score for each day. With an index on user_score(user, date, score), this is probably the most efficient way to do what you want.
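To see the NOT EXISTS filter at work on the sample data from the question, here is a minimal sketch. It runs the query through Python's sqlite3 module, with SQLite standing in for MySQL (an assumption made only so the example is self-contained), and the `...` columns are left out.

```python
import sqlite3

# SQLite stands in for MySQL here; the NOT EXISTS query is unchanged.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_score (user TEXT, date TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO user_score VALUES (?, ?, ?)",
    [
        ("bob", "2014-04-19", 100),
        ("mary", "2014-04-19", 100),
        ("alice", "2014-04-20", 100),
        ("bob", "2014-04-20", 110),
        ("bob", "2014-04-20", 125),
        ("mary", "2014-04-20", 105),
        ("bob", "2014-04-21", 115),
    ],
)

# Keep a row only if no other row for the same user and date scores higher.
query = """
SELECT us.user, us.date, us.score
FROM user_score us
WHERE us.user = 'bob'
  AND NOT EXISTS (SELECT 1
                  FROM user_score us2
                  WHERE us2.user = us.user
                    AND us2.date = us.date
                    AND us2.score > us.score)
ORDER BY us.date
"""
best_rows = conn.execute(query).fetchall()
for row in best_rows:
    print(row)
```

The 110 row for 2014-04-20 is dropped because the 125 row on the same date beats it; the other two days have a single row each, so they pass through untouched.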

You can use a JOIN:
SELECT a.*
FROM `user_score` as a
INNER JOIN (SELECT `user`, `date`, MAX(score) MaxScore
FROM `user_score`
GROUP BY `user`, `date`) as b
ON a.`user` = b.`user`
AND a.`date` = b.`date`
AND a.score = b.MaxScore
WHERE a.`user` = 'bob'

One option is to use an inline view and a JOIN operation. If there is more than one row with the "high score" value for a given day, this query will return all the rows. (If (user,date,score) is unique, then this isn't a problem.)
For example:
SELECT t.user
, t.date
, t.score
, t.`...`
FROM user_score t
JOIN ( SELECT d.user
, d.date
, MAX(d.score) AS score
FROM user_score d
WHERE d.user = 'bob'
GROUP BY d.user, d.date
) s
ON s.user = t.user
AND s.date = t.date
AND s.score = t.score
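The remark about ties can be checked with a small sketch. It uses Python's sqlite3 module as a stand-in for MySQL, and the `note` column is a hypothetical extra column added only to tell the tied rows apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "note" is a hypothetical column used to distinguish otherwise identical rows.
conn.execute(
    "CREATE TABLE user_score (user TEXT, date TEXT, score INTEGER, note TEXT)"
)
conn.executemany(
    "INSERT INTO user_score VALUES (?, ?, ?, ?)",
    [
        ("bob", "2014-04-20", 110, "first"),
        ("bob", "2014-04-20", 125, "second"),
        ("bob", "2014-04-20", 125, "third"),  # ties with "second"
        ("bob", "2014-04-21", 115, "only"),
    ],
)

# Join each row against the per-day maximum for bob.
query = """
SELECT t.user, t.date, t.score, t.note
FROM user_score t
JOIN (SELECT d.user, d.date, MAX(d.score) AS score
      FROM user_score d
      WHERE d.user = 'bob'
      GROUP BY d.user, d.date) s
  ON s.user = t.user
 AND s.date = t.date
 AND s.score = t.score
ORDER BY t.date, t.note
"""
tied_rows = conn.execute(query).fetchall()
for row in tied_rows:
    print(row)
```

Both 125 rows for 2014-04-20 come back. If only one row per day is wanted, some tie-breaker (for example a unique id column) would have to be added to the join.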

Related

MySQL - Return Latest Date and Total Sum from two rows in a column for multiple entries

For every ID_Number, there is a bill_date and then two types of bills that happen. I want to return the latest date (max date) for each ID number and then add together the two types of bill amounts. So, based on the table below, it should return:
| 1 | 201604 | 10.00 |
| 2 | 201701 | 28.00 |
tbl_charges
+-----------+-----------+-----------+--------+
| ID_Number | Bill_Date | Bill_Type | Amount |
+-----------+-----------+-----------+--------+
| 1 | 201601 | A | 5.00 |
| 1 | 201601 | B | 7.00 |
| 1 | 201604 | A | 4.00 |
| 1 | 201604 | B | 6.00 |
| 2 | 201701 | A | 15.00 |
| 2 | 201701 | B | 13.00 |
+-----------+-----------+-----------+--------+
Then, if possible, I want to be able to do this in a join in another query, using ID_Number as the column for the join. Would that change the query here?
Note: I am initially only wanting to run the query for about 200 distinct ID_Numbers out of about 10 million. I will be adding an 'IN' clause for those IDs. When I do the join for the final product, I will need to know how to get those latest dates out of all the other join possibilities. (ie, how do I get ID_Number 1 to join with 201604 and not 201601?)

I would use NOT EXISTS and GROUP BY
select t1.id_number, max(t1.bill_date), sum(t1.amount)
from tbl_charges t1
where not exists (
select 1
from tbl_charges t2
where t1.id_number = t2.id_number and
t1.bill_date < t2.bill_date
)
group by t1.id_number
NOT EXISTS filters out the irrelevant rows and GROUP BY does the sum.
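As a quick check of this approach against the sample rows from the question, here is a sketch that runs it through Python's sqlite3 module (SQLite is used as a stand-in for MySQL; the column types are assumptions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are assumed; the question only shows the values.
conn.execute(
    "CREATE TABLE tbl_charges "
    "(id_number INTEGER, bill_date INTEGER, bill_type TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO tbl_charges VALUES (?, ?, ?, ?)",
    [
        (1, 201601, "A", 5.0),
        (1, 201601, "B", 7.0),
        (1, 201604, "A", 4.0),
        (1, 201604, "B", 6.0),
        (2, 201701, "A", 15.0),
        (2, 201701, "B", 13.0),
    ],
)

# Drop every row that has a later bill_date for the same id_number,
# then sum what is left per id_number.
query = """
SELECT t1.id_number, MAX(t1.bill_date), SUM(t1.amount)
FROM tbl_charges t1
WHERE NOT EXISTS (SELECT 1
                  FROM tbl_charges t2
                  WHERE t2.id_number = t1.id_number
                    AND t2.bill_date > t1.bill_date)
GROUP BY t1.id_number
ORDER BY t1.id_number
"""
latest_totals = conn.execute(query).fetchall()
for row in latest_totals:
    print(row)
```

To restrict the report to a few hundred ids, an `id_number IN (...)` condition on `t1` should be enough, since the subquery is already correlated on `id_number`.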

I would be inclined to filter in the where:
select c.id_number, sum(c.amount)
from tbl_charges c
where c.bill_date = (select max(c2.bill_date)
                     from tbl_charges c2
                     where c2.id_number = c.id_number and c2.bill_type = c.bill_type
                    )
group by c.id_number;
Or, another fun way is to use in with tuples:
select c.id_number, sum(c.amount)
from tbl_charges c
where (c.id_number, c.bill_type, c.bill_date) in
      (select c2.id_number, c2.bill_type, max(c2.bill_date)
       from tbl_charges c2
       group by c2.id_number, c2.bill_type
      )
group by c.id_number;

MAX aggregate function with VIEW

I have this query :
CREATE VIEW MOSTACTIVESELLER AS
Select a.* from
(
SELECT a.ownerID, b.sellerName, count(distinct a.ITEMID) as item_qty
FROM item AS a
INNER JOIN seller AS b ON a.ownerID = b.sellerID
GROUP BY a.ownerID,b.sellerName
) a
The result of this view query is:
+------+----------+-----+
| ID | seller | qty |
+------+----------+-----+
| 1000 | Nick | 3 |
| 1001 | Morgan | 2 |
| 1002 | stancly | 1 |
| 1003 | chandler | 1 |
| 1004 | chiptle | 3 |
| 1005 | samir | 2 |
| 1006 | matuidi | 3 |
| 1007 | medjek | 1 |
| 1008 | leo | 1 |
| 1009 | georgi | 1 |
| 1010 | bocheli | 2 |
+------+----------+-----+
So what I want is to use an aggregate function like MAX to return only the most active seller in this list (ONLY 1). As you can see, 3 sellers have qty 3; I think that the MAX function will return one of them. If not, I could maybe order the view DESC and return the top value, but I could not make that work. I tried to use MAX with the view MOSTACTIVESELLER, but I don't know how to do that.

I'm surprised your query works in MySQL -- usually subqueries are not allowed in the FROM clause in a view. But this should work as a view:
CREATE VIEW MOSTACTIVESELLER AS
SELECT s.sellerID, s.sellerName, count(distinct i.ITEMID) as item_qty
FROM item i INNER JOIN
seller s
ON i.ownerID = s.sellerID
GROUP BY s.sellerID, s.sellerName
HAVING count(distinct i.ITEMID) = (SELECT count(distinct i.ITEMID)
FROM item i
GROUP BY i.ownerID
ORDER BY 1 DESC
LIMIT 1
);
EDIT:
I thought LIMIT was allowed with =, but MySQL has strange limits on the use of LIMIT in a subquery. It is easy enough to replace:
HAVING count(distinct i.ITEMID) >= ALL (SELECT count(distinct i.ITEMID)
FROM item i
GROUP BY i.ownerID
);
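One thing worth seeing on real rows is that a HAVING filter of this kind keeps every seller tied for the top count, not just one. The sketch below runs the first form of the query (scalar subquery with LIMIT 1) through Python's sqlite3 module. SQLite is a stand-in for MySQL, the table contents are made up, and the inner alias is renamed to `i2` for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seller (sellerID INTEGER, sellerName TEXT)")
conn.execute("CREATE TABLE item (ITEMID INTEGER, ownerID INTEGER)")
# Made-up rows: Nick and chiptle own 3 items each, Morgan owns 2.
conn.executemany(
    "INSERT INTO seller VALUES (?, ?)",
    [(1000, "Nick"), (1001, "Morgan"), (1004, "chiptle")],
)
conn.executemany(
    "INSERT INTO item VALUES (?, ?)",
    [(1, 1000), (2, 1000), (3, 1000), (4, 1001), (5, 1001),
     (6, 1004), (7, 1004), (8, 1004)],
)

# Keep only sellers whose item count equals the largest per-owner count.
query = """
SELECT s.sellerID, s.sellerName, COUNT(DISTINCT i.ITEMID) AS item_qty
FROM item i
INNER JOIN seller s ON i.ownerID = s.sellerID
GROUP BY s.sellerID, s.sellerName
HAVING COUNT(DISTINCT i.ITEMID) = (SELECT COUNT(DISTINCT i2.ITEMID)
                                   FROM item i2
                                   GROUP BY i2.ownerID
                                   ORDER BY 1 DESC
                                   LIMIT 1)
ORDER BY s.sellerID
"""
top_sellers = conn.execute(query).fetchall()
for row in top_sellers:
    print(row)
```

Both sellers with three items are returned. If exactly one row is required, an extra tie-breaker such as `ORDER BY sellerID LIMIT 1` on the outer query would still be needed.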

SQL Query for selecting multiple rows but highest value for each PK

I know that the title sounds horrible but I have no idea how to summarize it better. I'm pretty sure that somebody had the same problem before but I couldn't find anything. RDBMS: MySQL.
Problem:
I have the following (simplified) table:
+------+------------+---------------------------------+
| name | date | score |
+------+------------+---------------------------------+
| A | 01.01.2015 | 1 |
| A | 01.02.2015 | 3 |
| A | 01.03.2015 | 4 |
| B | 01.01.2015 | 3 |
| B | 01.02.2015 | 4 |
| B | 01.03.2015 | 5 |
| C | 01.01.2015 | 1 |
| C | 01.02.2015 | 2 |
| C | 01.03.2015 | 3 |
+------+------------+---------------------------------+
There is no unique constraint or PK defined.
The table represents the high-score list of a game. Every day the scores of all players are inserted with values such as: name, points, now(), ...
The data represent a snapshot of the score of each player at a specific time.
I want only the most recent entry for each user, and only for the highest X players. So the result should look like
+------+------------+---------------------------------+
| name | date | score |
+------+------------+---------------------------------+
| A | 01.03.2015 | 4 |
| B | 01.03.2015 | 5 |
+------+------------+---------------------------------+
C doesn't appear since he's not in the top 2 (by score)
A appears with the most recent row (by date)
B appears, like A, with the most recent row (by date) and because he is in the top 2
I hope it becomes clear what I mean.
Thanks in advance!

I understand that what you need is to first select the X players who've gotten the highest score and then get their latest performance. In this case, you should do this:
SELECT t.*
FROM tablename t
JOIN
(
    SELECT t2.name, max(t2.date) as max_date
    FROM tablename t2
    JOIN
    (
        SELECT name
        FROM
        (
            SELECT name, max(score) as max_score
            FROM tablename
            GROUP BY name
        ) all_highscores
        ORDER BY max_score DESC
        LIMIT X
    ) top_scores
    ON top_scores.name = t2.name
    GROUP BY t2.name
) top_last
ON t.name = top_last.name
AND t.date = top_last.max_date;
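Here is a small sketch of the two-step idea (pick the top X players by best score, then fetch each one's latest row), run through Python's sqlite3 module as a stand-in for MySQL. It assumes X = 2 and stores the dates as ISO strings so that MAX() orders them chronologically; with the `01.03.2015` format from the question a real DATE column would be needed instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (name TEXT, date TEXT, score INTEGER)")
# Dates rewritten as ISO strings (assumption) so string MAX() is chronological.
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?)",
    [
        ("A", "2015-01-01", 1), ("A", "2015-02-01", 3), ("A", "2015-03-01", 4),
        ("B", "2015-01-01", 3), ("B", "2015-02-01", 4), ("B", "2015-03-01", 5),
        ("C", "2015-01-01", 1), ("C", "2015-02-01", 2), ("C", "2015-03-01", 3),
    ],
)

query = """
SELECT t.name, t.date, t.score
FROM tablename t
JOIN (
    -- latest date per player, restricted to the top 2 players
    SELECT t2.name, MAX(t2.date) AS max_date
    FROM tablename t2
    JOIN (
        -- the 2 players with the highest best score
        SELECT name
        FROM (SELECT name, MAX(score) AS max_score
              FROM tablename
              GROUP BY name) all_highscores
        ORDER BY max_score DESC
        LIMIT 2
    ) top_scores ON top_scores.name = t2.name
    GROUP BY t2.name
) top_last
  ON t.name = top_last.name
 AND t.date = top_last.max_date
ORDER BY t.name
"""
latest_top = conn.execute(query).fetchall()
for row in latest_top:
    print(row)
```

Player C never enters the innermost top-2 list, so none of C's rows survive the joins.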

unable to join two tables, one with multiple rows

I have two tables that I am attempting to join in MySQL:
reviews:
| review_id | comment | reviewer_id | user_id |
-----------------------------------------------------------
| 1 | some text. | 501 | 100 |
| 2 | lorem ipsum | 606 | 100 |
| 3 | blah blah. | 798 | 120 |
| 4 | foo bar! | 798 | 133 |
-----------------------------------------------------------
review_status:
| review_id | status | timestamp |
----------------------------------------
| 1 | 10 | 1364507521 |
| 1 | 101 | 1364508057 |
| 2 | 100 | 1364509033 |
| 1 | 150 | 1364509149 |
| 2 | 120 | 1364509283 |
| 2 | 122 | 1364855948 |
| 3 | 120 | 1364509283 |
| 3 | 122 | 1364855948 |
| 1 | 110 | 1364855945 |
| 4 | 100 | 1364509283 |
| 4 | 115 | 1364855948 |
| 4 | 210 | 1364855945 |
----------------------------------------
What I WANT is a result that looks something like this:
result
| review_id | comment | reviewer_id | user_id | status | timestamp |
--------------------------------------------------------------------------
| 1 | some text. | 501 | 100 | 110 | 1364855945 |
| 2 | lorem ipsum | 606 | 100 | 122 | 1364855948 |
--------------------------------------------------------------------------
I'm after: 1) The newest entry from the review_status table 2) A certain range of status codes (100 - 199 in this case) 3) And multiple user_id's from the review table.
This is currently my query, that I can't get to work for the life of me:
SELECT r.review_id, r.comment, r.reviewer_id, r.user_id
FROM reviews AS r
INNER JOIN
(SELECT s.status, max(s.timestamp)
FROM review_status AS s
WHERE s.status < 200
AND s.status > 99;
GROUP BY s.review_id) AS r_s
ON r.review_id = r_s.review_id
WHERE r.user_id IN (100,120);
Any help is greatly appreciated! Thanks.

You have a few issues with your current query:
The subquery does not return review_id, so you cannot use it in the join.
You have an extra semicolon in the subquery.
I might suggest rewriting the query to use the following:
SELECT r.review_id, r.comment, r.reviewer_id, r.user_id,
rs.status, rs.timestamp
FROM reviews AS r
INNER JOIN review_status rs
ON r.review_id = rs.review_id
INNER JOIN
(
SELECT s.review_id, max(s.timestamp) MaxDate
FROM review_status AS s
WHERE s.status < 200
AND s.status > 99
GROUP BY s.review_id
) AS r_s
ON rs.review_id = r_s.review_id
AND rs.timestamp = r_s.MaxDate
WHERE r.user_id IN (100,120)
and rs.status < 200
AND rs.status > 99
See SQL Fiddle with Demo.
The main reason for writing the query this way is that your current query groups by review_id but returns the status. MySQL has an extension to the GROUP BY clause that allows columns in the select list that are neither named in the GROUP BY nor wrapped in an aggregate function, but this can cause unexpected results (see MySQL Extensions to GROUP BY).
From the MySQL Docs:
MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. ... You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. Sorting of the result set occurs after values have been chosen, and ORDER BY does not affect which values the server chooses.
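To make the join-on-latest-timestamp pattern from this answer concrete, here is a sketch that loads the two tables from the question and runs the suggested query through Python's sqlite3 module (SQLite standing in for MySQL; column types are assumptions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reviews "
    "(review_id INTEGER, comment TEXT, reviewer_id INTEGER, user_id INTEGER)"
)
conn.execute(
    "CREATE TABLE review_status "
    "(review_id INTEGER, status INTEGER, timestamp INTEGER)"
)
conn.executemany("INSERT INTO reviews VALUES (?, ?, ?, ?)", [
    (1, "some text.", 501, 100), (2, "lorem ipsum", 606, 100),
    (3, "blah blah.", 798, 120), (4, "foo bar!", 798, 133),
])
conn.executemany("INSERT INTO review_status VALUES (?, ?, ?)", [
    (1, 10, 1364507521), (1, 101, 1364508057), (2, 100, 1364509033),
    (1, 150, 1364509149), (2, 120, 1364509283), (2, 122, 1364855948),
    (3, 120, 1364509283), (3, 122, 1364855948), (1, 110, 1364855945),
    (4, 100, 1364509283), (4, 115, 1364855948), (4, 210, 1364855945),
])

# r_s finds the newest in-range timestamp per review; rs supplies the
# full status row that carries that timestamp.
query = """
SELECT r.review_id, r.comment, r.reviewer_id, r.user_id,
       rs.status, rs.timestamp
FROM reviews AS r
INNER JOIN review_status rs ON r.review_id = rs.review_id
INNER JOIN (SELECT s.review_id, MAX(s.timestamp) AS MaxDate
            FROM review_status AS s
            WHERE s.status < 200 AND s.status > 99
            GROUP BY s.review_id) AS r_s
   ON rs.review_id = r_s.review_id
  AND rs.timestamp = r_s.MaxDate
WHERE r.user_id IN (100, 120)
  AND rs.status < 200 AND rs.status > 99
ORDER BY r.review_id
"""
latest_status = conn.execute(query).fetchall()
for row in latest_status:
    print(row)
```

With this data, three reviews belong to users 100 and 120, so three rows come back, each carrying the status from its newest in-range entry.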

Try this:
SELECT r.*, r_s.*
FROM review_status r_s LEFT JOIN reviews r
ON r.review_id = r_s.review_id
WHERE r.user_id IN (100, 120)
ORDER BY r_s.timestamp DESC;

SELECT r.review_id, r.comment, r.reviewer_id, r.user_id, tt.status, tt.timestamp
FROM (
    SELECT rs2.review_id, rs2.status, rs2.timestamp
    FROM (
        SELECT rs.review_id, MAX(rs.timestamp) as mts
        FROM reviews rr
        JOIN review_status AS rs ON rs.review_id = rr.review_id
        WHERE rs.status < 200 AND rs.status > 99
        AND rr.user_id IN (100,120)
        GROUP BY rs.review_id
    ) as t
    JOIN review_status rs2 ON rs2.review_id = t.review_id AND rs2.timestamp = t.mts
    GROUP BY rs2.review_id #remove duplicate statuses with the same timestamp
) as tt
JOIN reviews as r ON r.review_id = tt.review_id
The user_id and status filters have to be in the innermost query to avoid selecting and joining the entire statuses table every time.

Here's my attempt with one JOIN and one correlated sub-query:
SELECT r.*, rs.*
FROM Reviews AS r
INNER JOIN Review_status AS rs ON r.review_id = rs.review_id
WHERE rs.status BETWEEN 100 AND 199 AND
r.user_id IN (100,120) AND
rs.timestamp = (SELECT MAX(timestamp) FROM Review_status
WHERE review_id = r.review_id
AND status BETWEEN 100 AND 199)
ORDER BY r.review_id;
Its SQL Fiddle: http://sqlfiddle.com/#!2/02f18/6

query to fetch records and their rank in the DB

I have a table that holds usernames and results.
When a user inserts his result into the DB, I want to execute a query that returns the top X results (with their rank in the DB) and also that user's own result and his rank in the DB.
The result should look like this:
1 playername 4500
2 otherplayer 4100
3 anotherone 3900
...
134 current player 140
I have tried a query with UNION, but then I didn't get the current player's rank.
Any ideas?
The DB is MySQL.
Thanks a lot and have a great weekend :)
EDIT
This is what I have tried:
(select substr(first_name,1,10) as first_name, result
FROM top_scores ts
WHERE result_date >= NOW() - INTERVAL 1 DAY
LIMIT 10)
union
(select substr(first_name,1,10) as first_name, result
FROM top_scores ts
where first_name='XXX' and result=3030);

SET @X = 0;
SELECT @X:=@X+1 AS rank, username, result
FROM myTable
ORDER BY result DESC
LIMIT 10;
Re your comment:
How about this:
SET @X = 0;
SELECT ranked.*
FROM (
SELECT @X:=@X+1 AS rank, username, result
FROM myTable
ORDER BY result DESC
) AS ranked
WHERE ranked.rank <= 10 OR username = 'current';
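The running-counter idea can also be tried outside MySQL. SQLite has no session variables, so the sketch below swaps in a different technique and says so plainly: it fetches the rows already ordered by result and numbers them client-side with Python's `enumerate`, then keeps the top N plus the current player. The sample rows, the cutoff of 3 and the username 'current' are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (username TEXT, result INTEGER)")
# Made-up rows; 'current' is the player who just submitted a result.
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?)",
    [("p1", 4500), ("p2", 4100), ("p3", 3900), ("p4", 3800), ("current", 140)],
)

TOP_N = 3  # hypothetical cutoff, plays the role of "10" in the query above

ordered = conn.execute(
    "SELECT username, result FROM myTable ORDER BY result DESC"
).fetchall()

# enumerate() plays the role of the counter variable: one rank per row.
ranked = [
    (rank, username, result)
    for rank, (username, result) in enumerate(ordered, start=1)
]

# Same filter as the outer WHERE: top N rows, or the current player's row.
report = [row for row in ranked if row[0] <= TOP_N or row[1] == "current"]
for row in report:
    print(row)
```

Ranking every row before filtering is what lets the current player keep their true rank (5 here) instead of being renumbered.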

Based on what I am reading here:
Your table structure is:
+--------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+-------+
| name | varchar(50) | YES | | NULL | |
| result | int(11) | YES | | NULL | |
+--------+-------------+------+-----+---------+-------+
Table Data looks like:
+---------+--------+
| name | result |
+---------+--------+
| Player1 | 4500 |
| Player2 | 4100 |
| Player3 | 3900 |
| Player4 | 3800 |
| Player5 | 3700 |
| Player6 | 3600 |
| Player7 | 3500 |
| Player8 | 3400 |
+---------+--------+
You want a result set to look like this:
+------+---------+--------+
| rank | name | result |
+------+---------+--------+
| 1 | Player1 | 4500 |
| 2 | Player2 | 4100 |
| 3 | Player3 | 3900 |
| 4 | Player4 | 3800 |
| 5 | Player5 | 3700 |
| 6 | Player6 | 3600 |
| 7 | Player7 | 3500 |
| 8 | Player8 | 3400 |
+------+---------+--------+
SQL:
set @rank = 0;
select
top_scores.*
from
(select @rank:=@rank+1 AS rank, name, result from ranks order by result desc) top_scores
where
top_scores.rank <= 5
or (top_scores.result = 3400 and top_scores.name = 'Player8');
That will do what you want it to do.

assuming your table has the following columns:
playername
score
calculated_rank
your query should look something like:
select calculated_rank,playername, score
from tablename
order by calculated_rank limit 5

I assume you have a PRIMARY KEY on this table. If you don't, just create one. My table structure (because you didn't supply your own) is like this:
id INTEGER
result INTEGER
first_name VARCHAR
The SQL query should look like this:
(SELECT @i := @i+1 AS position, first_name, result FROM top_scores, (SELECT @i := 0) t ORDER BY result DESC, id LIMIT 10)
UNION
(SELECT (SELECT COUNT(id) + 1 FROM top_scores t2 WHERE t2.result > t1.result OR (t2.result = t1.result AND t2.id < t1.id)) AS position, first_name, result FROM top_scores t1 WHERE id = LAST_INSERT_ID());
I added an additional condition to the subquery ("OR (t2.result = t1.result AND t2.id < t1.id)") so that several people with the same result do not share a position: among equal results, the row with the lower id ranks higher.
EDIT: If you have some login system, it would be better to save userid with result and get current user result using it.
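A count-based position for the freshly inserted row can be sketched as follows. This is an illustrative variant, not necessarily the exact query above: the position is one plus the number of rows that either have a higher result or have the same result and a lower id. It runs through Python's sqlite3 module, with `cursor.lastrowid` standing in for MySQL's LAST_INSERT_ID(), and the sample names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE top_scores "
    "(id INTEGER PRIMARY KEY, first_name TEXT, result INTEGER)"
)
# Made-up existing scores.
conn.executemany(
    "INSERT INTO top_scores (first_name, result) VALUES (?, ?)",
    [("ann", 4500), ("ben", 4100), ("cat", 3900), ("dan", 3800)],
)

# The new submission ties with cat; lastrowid stands in for LAST_INSERT_ID().
cur = conn.execute(
    "INSERT INTO top_scores (first_name, result) VALUES (?, ?)",
    ("newbie", 3900),
)
new_id = cur.lastrowid

# Position = 1 + rows that beat this one (higher result, or equal result
# with an earlier id as the tie-breaker).
position_query = """
SELECT 1 + (SELECT COUNT(*)
            FROM top_scores t2
            WHERE t2.result > t1.result
               OR (t2.result = t1.result AND t2.id < t1.id)) AS position,
       t1.first_name, t1.result
FROM top_scores t1
WHERE t1.id = ?
"""
current_row = conn.execute(position_query, (new_id,)).fetchone()
print(current_row)
```

Because the tie-breaker uses the id, two players with the same result never end up with the same position, and the earlier submission is ranked first.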
