I'm learning Hive these days and have run into a problem.
I have a table called SAMPLE:
USER_ID PRODUCT_ID NUMBER
1 3 20
1 4 30
1 2 25
1 6 50
1 5 40
2 1 10
2 3 15
2 2 40
2 5 30
2 3 35
How can I use Hive to group the table by user_id, order the records within each group by NUMBER in descending order, and keep at most 3 records per group?
The result I want looks like:
USER_ID PRODUCT_ID NUMBER(optional column)
1 6 50
1 5 40
1 4 30
2 2 40
2 3 35
2 5 30
or
USER_ID PRODUCT_IDs
1 [6,5,4]
2 [2,3,5]
Could someone help me?
Thanks very much!
Try this. The ORDER BY goes inside the OVER clause, because ordering a subquery does not guarantee the order in which ROW_NUMBER() numbers the rows:
select user_id, product_id, number
from (
select user_id, product_id, number,
ROW_NUMBER() over (partition by user_id order by number desc) as rnum
from SAMPLE
) t
where rnum <= 3
Output:
1 6 50
1 5 40
1 4 30
2 2 40
2 3 35
2 5 30
Window functions require Hive 0.11 or later. Let me know if your version is lower.
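For anyone who wants to check the top-3-per-group logic without a Hive cluster, here is a minimal sketch in Python. It assumes an in-memory SQLite database (3.25 or later, for window functions) as a stand-in for Hive; the table and column names follow the question, and the outer ORDER BY is added only to make the printed order deterministic.

```python
import sqlite3

# Assumption: SQLite (>= 3.25) stands in for Hive; both support ROW_NUMBER() OVER (...).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (user_id INTEGER, product_id INTEGER, number INTEGER)")
rows = [(1, 3, 20), (1, 4, 30), (1, 2, 25), (1, 6, 50), (1, 5, 40),
        (2, 1, 10), (2, 3, 15), (2, 2, 40), (2, 5, 30), (2, 3, 35)]
conn.executemany("INSERT INTO sample VALUES (?, ?, ?)", rows)

# Number the rows of each user from highest to lowest NUMBER, then keep the first three.
top3 = conn.execute("""
    SELECT user_id, product_id, number
    FROM (
        SELECT user_id, product_id, number,
               ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY number DESC) AS rnum
        FROM sample
    ) t
    WHERE rnum <= 3
    ORDER BY user_id, number DESC
""").fetchall()

for row in top3:
    print(row)
```

With the sample data from the question this should reproduce the six rows listed as the desired result.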
I have an order table with the following information:
Order ID, Product ID, Quantity ordered
OID PID Qty
1 10 1
1 20 2
2 10 2
2 40 4
3 50 1
3 20 3
4 30 1
4 90 2
4 90 5
5 10 2
5 20 2
5 70 5
5 60 1
6 80 2
If I run the following query
select `Qty`, count(`Qty`)
from `table`
group by `Qty`
I get the distribution of quantities in the table, which is
Qty count(`Qty`)
1 4
2 6
3 1
4 1
5 2
I want to find the distribution of quantity at the order line item level too.
That is, among the orders that have one line item, how many items had quantity 1, quantity 2, and so on, and likewise for orders with two, three, or more line items. Something like:
Count(Order_line_item) Qty Count(Qty)
1 2 1
2 1 2
2 2 2
2 3 1
2 4 1
3 1 1
3 2 1
3 5 1
4 1 1
4 2 2
4 5 1
What modification should I make to the above query to achieve this?
Try this query:
SELECT count_order_line_items, `Qty`, count(*)
FROM (
SELECT count(*) over (partition by `OID`) as count_order_line_items,
`Qty`
FROM Table1
) x
GROUP BY count_order_line_items, `Qty`
Demo: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=07dfb27a7d434eca1f0b9641aadd53c8
If your MySQL version is lower than 8, window functions are not available, so try this one instead:
SELECT count_order_line_items, `Qty`, count(*)
FROM Table1 t1
JOIN (
SELECT `OID`, count(*) as count_order_line_items
FROM Table1
GROUP BY `OID`
) t2 ON t1.`OID` = t2.`OID`
GROUP BY count_order_line_items, `Qty`
Demo: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=28c291a4693f31029a103f5c41a97d77
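To sanity-check the join-based version against the sample orders, here is a small sketch. It assumes an in-memory SQLite database in place of MySQL, and the alias `line_items` and the final ORDER BY are my own additions for readability and a stable output order.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the join + GROUP BY logic is the same.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (oid INTEGER, pid INTEGER, qty INTEGER)")
orders = [(1, 10, 1), (1, 20, 2), (2, 10, 2), (2, 40, 4), (3, 50, 1), (3, 20, 3),
          (4, 30, 1), (4, 90, 2), (4, 90, 5), (5, 10, 2), (5, 20, 2), (5, 70, 5),
          (5, 60, 1), (6, 80, 2)]
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", orders)

# t2 counts line items per order; the outer query counts rows per (line item count, qty).
distribution = conn.execute("""
    SELECT t2.line_items, t1.qty, COUNT(*)
    FROM table1 t1
    JOIN (
        SELECT oid, COUNT(*) AS line_items
        FROM table1
        GROUP BY oid
    ) t2 ON t1.oid = t2.oid
    GROUP BY t2.line_items, t1.qty
    ORDER BY t2.line_items, t1.qty
""").fetchall()

for row in distribution:
    print(row)
```

The rows printed should match the table given in the question, for example one order with a single line item of quantity 2, and two quantity-2 items among the four-line orders.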
I have this data:
id Date CatId itemid itemname price
1 10/5/2019 1 1 ABC 20
2 10/5/2019 1 2 XYZ 30
3 10/5/2019 2 1 ABC 20
4 10/5/2019 3 1 ABC 20
5 11/5/2019 1 2 XYZ 30
6 11/5/2019 2 1 ABC 20
7 11/5/2019 2 3 PQR 40
8 12/5/2019 3 1 ABC 20
9 12/5/2019 3 2 XYZ 30
10 12/5/2019 1 2 XYZ 30
11 12/5/2019 2 1 ABC 20
Expected result:
date CatId total
10/5/2019 1 50
10/5/2019 2 20
10/5/2019 3 20
11/5/2019 1 30
11/5/2019 2 60
12/5/2019 1 30
12/5/2019 2 20
12/5/2019 3 50
I want the sum of price grouped by date and catid, with the result ordered by date.
I have tried several queries and spent a lot of time on this, but none of them gives the expected result.
I have tried the queries below:
SELECT * FROM (SELECT * FROM `table` ORDER BY `date` DESC) as tbl
GROUP BY tbl.`catid`
SELECT *
FROM `table`
GROUP BY `date` ORDER BY `date` DESC
SELECT *
FROM (
SELECT * FROM `table`
ORDER BY `date` DESC
) AS sub
GROUP BY `catid` ORDER BY `date` DESC
The columns you use in ORDER BY and GROUP BY can be different.
You can use GROUP BY and SUM directly. Ordering by date and then catid matches the order of your expected result:
select date, catid, sum(price) as total
from my_table
group by date, catid
order by date, catid
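Here is a quick way to try the grouping on the sample rows. This is only a sketch: it assumes an in-memory SQLite database instead of MySQL and stores the dates as the text values shown in the question, which happen to sort correctly here but would not in general (a real DATE column is safer).

```python
import sqlite3

# Assumption: SQLite stands in for MySQL, and dates are kept as text exactly as in the question.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE my_table
                (id INTEGER, date TEXT, catid INTEGER, itemid INTEGER, itemname TEXT, price INTEGER)""")
data = [(1, "10/5/2019", 1, 1, "ABC", 20), (2, "10/5/2019", 1, 2, "XYZ", 30),
        (3, "10/5/2019", 2, 1, "ABC", 20), (4, "10/5/2019", 3, 1, "ABC", 20),
        (5, "11/5/2019", 1, 2, "XYZ", 30), (6, "11/5/2019", 2, 1, "ABC", 20),
        (7, "11/5/2019", 2, 3, "PQR", 40), (8, "12/5/2019", 3, 1, "ABC", 20),
        (9, "12/5/2019", 3, 2, "XYZ", 30), (10, "12/5/2019", 1, 2, "XYZ", 30),
        (11, "12/5/2019", 2, 1, "ABC", 20)]
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?, ?, ?)", data)

# One output row per (date, catid) pair, with the prices of that pair summed.
totals = conn.execute("""
    SELECT date, catid, SUM(price) AS total
    FROM my_table
    GROUP BY date, catid
    ORDER BY date, catid
""").fetchall()

for row in totals:
    print(row)
```

Ordering by date and then catid gives the same row order as the expected result in the question.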
I have a table that looks like this:
ID GameID DateID Points Place
-------------------------------------
10 1 1 100 1
11 1 1 90 2
12 1 1 80 3
13 1 1 70 4
14 1 1 60 5
10 1 1 100 1
10 1 1 50 1
10 1 1 100 1
10 1 1 100 1
10 1 1 100 1
10 1 1 100 1
10 1 1 100 1
10 1 1 100 1
10 1 1 100 1
10 1 1 50 5
10 1 1 50 5
12 1 1 100 1
-------------------------------------
I want a table with two columns: one for the ID of a player and one for that player's total points (the sum of their scores). But only ten scores may be counted per player, so if a player played thirteen times, only the ten highest scores are counted.
For the example above I want a table that looks like this:
ID totalPoints
-------------------
10 950
11 90
12 180
13 70
14 60
------------------
At the moment I tried this:
SELECT ID,
sum(Points) AS totalPoints
FROM (SELECT Points, ID
FROM Gamer
ORDER BY Points DESC LIMIT 10) AS totalPoints
ORDER BY Points DESC
But it limits the total number of entries to ten, not to ten per player.
I hope somebody can help me :)
In all existing versions:
DELIMITER $
CREATE FUNCTION `totalPoints`(gamer_id INT) RETURNS int(11)
BEGIN
DECLARE s INT DEFAULT 0;
SELECT SUM(Points) INTO s FROM ( SELECT Points FROM Gamer WHERE ID=gamer_id ORDER BY Points DESC LIMIT 10) sq;
RETURN s;
END$
DELIMITER ;
SELECT DISTINCT ID, totalPoints(ID) FROM Gamer;
Alternative in MariaDB 10.2 (currently Beta), which has window functions:
SELECT ID, SUM(Points) FROM (
SELECT ID, Points, ROW_NUMBER()
OVER (PARTITION BY ID ORDER BY Points DESC) AS nm
FROM Gamer
) sq WHERE nm <= 10 GROUP BY ID;
I'm pretty sure there are other ways to do the same; these two are the first that came to mind.
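The window-function variant can be tried on the sample scores with the sketch below. It assumes an in-memory SQLite database (3.25 or later) rather than MariaDB, and keeps only the ID and Points columns since the others do not affect the result.

```python
import sqlite3

# Assumption: SQLite (>= 3.25) stands in for MariaDB 10.2; only ID and Points are loaded.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gamer (id INTEGER, points INTEGER)")
scores = ([(10, 100), (11, 90), (12, 80), (13, 70), (14, 60), (10, 100), (10, 50)]
          + [(10, 100)] * 7
          + [(10, 50), (10, 50), (12, 100)])
conn.executemany("INSERT INTO gamer VALUES (?, ?)", scores)

# Rank each player's scores from best to worst, then sum only the ten best per player.
totals = conn.execute("""
    SELECT id, SUM(points) AS total_points
    FROM (
        SELECT id, points,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY points DESC) AS nm
        FROM gamer
    ) sq
    WHERE nm <= 10
    GROUP BY id
    ORDER BY id
""").fetchall()

for row in totals:
    print(row)
```

Player 10 has twelve scores in the sample (nine of 100 and three of 50), so only nine 100s and one 50 are counted, which gives the 950 from the question.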
I can't figure out the MySQL query to extract the data I want from this table "timeEntry":
hours creationDate userId clientId projectId taskId
20 2012-02-18 1 1 1 1
40 2012-02-18 1 1 1 1
30 2012-02-21 2 1 1 1
20 2012-02-22 2 1 1 2
30 2012-02-22 2 1 1 2
80 2012-02-23 1 2 2 2
10 2012-02-23 3 2 2 2
15 2012-02-23 1 2 2 3
40 2012-02-23 1 2 4 1
And I would like to have this kind of result as another table, a CSV/Excel file, or a PHP array, where totalHours is the sum of the hours for a userId over a given period of time (let's say between 2012-02-01 and 2012-02-25):
clientId projectId taskId userId totalHours
1        1         1      1      60
                          2      30
                   2      2      50
2        2         2      1      80
                          3      10
                   3      1      15
         4         1      1      40
I guess I have to use multiple group by, I tried something like:
SELECT clientId, projectId, taskId, userId, sum(hours)
FROM `timeEntry`
WHERE date_creation >= "2012-02-01"
AND date_creation <= "2012-02-25"
GROUP BY clientId, projectId, taskId, userId;
But it didn't work.
Thanks in advance.
WHERE needs to go before GROUP BY.
If you want to filter rows by date before grouping, keep the WHERE clause you have, placed before the GROUP BY.
If you'd instead like to filter entire groups in or out, use HAVING:
...
GROUP BY ...
HAVING max(date) <= someValue AND min(date) >= someValue
SELECT clientId, projectId, taskId, userId, SUM(hours) AS total_hours
FROM timeEntry
GROUP BY clientId, projectId, taskId, userId;
You can use WITH ROLLUP to generate sub-aggregates if you want.
Here is a simple solution. You need to group by all the columns you select, but not by creationDate, otherwise the totals are split per day:
SELECT
t.clientId,
t.projectId,
t.taskId,
t.userId,
SUM(t.hours) AS Total
FROM test AS t
GROUP BY t.clientId, t.projectId, t.taskId, t.userId
OUTPUT:
clientId projectId taskId userId Total
____________________________________________________________
1 1 1 1 60
1 1 1 2 30
1 1 2 2 50
2 2 2 1 80
2 2 3 1 15
2 4 1 1 40
2 2 2 3 10
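None of the queries above include the date range from the question, so here is a sketch that combines the WHERE filter with the GROUP BY. It assumes an in-memory SQLite database in place of MySQL and uses the column name creationDate from the table definition (the question's attempt used date_creation, which is not a column of the table shown).

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; dates are ISO text, so string comparison works.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE timeEntry
                (hours INTEGER, creationDate TEXT, userId INTEGER,
                 clientId INTEGER, projectId INTEGER, taskId INTEGER)""")
entries = [(20, "2012-02-18", 1, 1, 1, 1), (40, "2012-02-18", 1, 1, 1, 1),
           (30, "2012-02-21", 2, 1, 1, 1), (20, "2012-02-22", 2, 1, 1, 2),
           (30, "2012-02-22", 2, 1, 1, 2), (80, "2012-02-23", 1, 2, 2, 2),
           (10, "2012-02-23", 3, 2, 2, 2), (15, "2012-02-23", 1, 2, 2, 3),
           (40, "2012-02-23", 1, 2, 4, 1)]
conn.executemany("INSERT INTO timeEntry VALUES (?, ?, ?, ?, ?, ?)", entries)

# Filter rows to the period first (WHERE), then total the hours per client/project/task/user.
report = conn.execute("""
    SELECT clientId, projectId, taskId, userId, SUM(hours) AS totalHours
    FROM timeEntry
    WHERE creationDate BETWEEN '2012-02-01' AND '2012-02-25'
    GROUP BY clientId, projectId, taskId, userId
    ORDER BY clientId, projectId, taskId, userId
""").fetchall()

for row in report:
    print(row)
```

The blank cells in the desired layout (repeated clientId, projectId, and taskId values left empty) are a presentation detail that is probably easier to handle in PHP or the spreadsheet than in SQL.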
I have the following main tables in my database.
Season --> seasonID
Trials --> trialID
Competition --> CID,name
Camps --> campID,DivisionID(FK)
Divisions ---> DivisionID
Contestants --->ContestantID
A contestant belongs to (is a member of) a division.
A division, in turn, belongs to a camp.
All this leads to my Performance table.
PERFORMANCE TABLE
SeasonID|TrialID|CampID|DivID|CompetitionID|ContestantID|Score1|Score2|Total
1 1 1 1 1 1 20 20 40
1 1 1 1 2 1 20 15 30
1 2 1 1 1 2 10 5 15
1 2 1 1 2 2 5 5 10
1 2 1 1 1 1 10 30 40
1 2 1 1 2 1 20 10 30
How can I query this performance table to give me the competition name, total score and rank (ranking over total score) of each contestant in each competition by trials and by seasons?
Example:
In season 1 and trial 2 I want to have:
SeasonID| TrialID | ContestantID| Competition | TotalScore | Rank
1 2 1 1 40 1
1 2 2 1 15 2
1 2 1 2 30 1
1 2 2 2 10 2
How do I go about this? I have tried table variables, pivot and joins, but I can only rank by competition, and I don't know how to aggregate the results to get the output above!
I'm not exactly sure how you calculated your desired results. I think this is what you are after but, if so, the TotalScore in the desired results of your question should be 10 for the last record, not 20.
SELECT SeasonID, TrialID, ContestantID, CompetitionID, Total,
DENSE_RANK() OVER(PARTITION BY SeasonID, TrialID, CompetitionID ORDER BY Total DESC) AS [Rank]
FROM PerformanceTable
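To check the ranking against the example in the question (season 1, trial 2), here is a sketch that assumes an in-memory SQLite database (3.25 or later) in place of SQL Server. The partition includes SeasonID and TrialID so that ranks restart for every season, trial, and competition; the alias `rnk` and the WHERE filter are my additions for the example.

```python
import sqlite3

# Assumption: SQLite (>= 3.25) stands in for SQL Server; DENSE_RANK() behaves the same way.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE performance
                (SeasonID INTEGER, TrialID INTEGER, CampID INTEGER, DivID INTEGER,
                 CompetitionID INTEGER, ContestantID INTEGER,
                 Score1 INTEGER, Score2 INTEGER, Total INTEGER)""")
rows = [(1, 1, 1, 1, 1, 1, 20, 20, 40), (1, 1, 1, 1, 2, 1, 20, 15, 30),
        (1, 2, 1, 1, 1, 2, 10, 5, 15), (1, 2, 1, 1, 2, 2, 5, 5, 10),
        (1, 2, 1, 1, 1, 1, 10, 30, 40), (1, 2, 1, 1, 2, 1, 20, 10, 30)]
conn.executemany("INSERT INTO performance VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", rows)

# Rank contestants by Total within each (season, trial, competition) group.
ranking = conn.execute("""
    SELECT SeasonID, TrialID, ContestantID, CompetitionID, Total,
           DENSE_RANK() OVER (PARTITION BY SeasonID, TrialID, CompetitionID
                              ORDER BY Total DESC) AS rnk
    FROM performance
    WHERE SeasonID = 1 AND TrialID = 2
    ORDER BY CompetitionID, rnk
""").fetchall()

for row in ranking:
    print(row)
```

Getting the competition name instead of its ID would additionally need a join to the Competition table on CID, which is left out here.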