I need a Hive query that returns the top 2 names for each 6-minute interval, starting from 00:00:00.
The data looks like this:
Name Time
A 00:00:00
B 00:03:53
C 00:01:16
A 00:04:34
A 00:07:32
A 00:18:36
C 00:16:12
C 00:05:04
B 00:01:50
B 00:12:05
A 00:11:20
B 00:04:27
B 00:02:47
A 00:00:23
A 00:00:23
B 00:36:21
B 00:02:46
I would like to write the query in Hive, which is very new to me, but even a MySQL query would help, since I can adapt it to Hive.
select
*
from
(
select NAME
, time_interval_6
, ct
-- rank names within each interval; partitioning by NAME as well would give every row rank 1
, rank() over (partition by time_interval_6 order by ct desc) as ranking
from
(select count(1) as ct
, NAME
, floor((floor(cast(substring(time,1,2) as int)*60 + cast(substring(time,4,2) as int)))/6) as time_interval_6
FROM MY_TABLE
group by NAME, floor((floor(cast(substring(time,1,2) as int)*60 + cast(substring(time,4,2) as int)))/6)
) a
)b
where ranking <= 2
;
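As a quick check of the bucketing and ranking logic, here is a minimal sketch that runs the same idea against the sample data. It uses Python's built-in sqlite3 module rather than Hive, so `substr`, integer division, and the lowercase table name `my_table` are assumptions of this sketch, and it needs SQLite 3.25 or later for window functions.

```python
import sqlite3

# Sample rows from the question: (name, time as HH:MM:SS)
rows = [
    ("A", "00:00:00"), ("B", "00:03:53"), ("C", "00:01:16"), ("A", "00:04:34"),
    ("A", "00:07:32"), ("A", "00:18:36"), ("C", "00:16:12"), ("C", "00:05:04"),
    ("B", "00:01:50"), ("B", "00:12:05"), ("A", "00:11:20"), ("B", "00:04:27"),
    ("B", "00:02:47"), ("A", "00:00:23"), ("A", "00:00:23"), ("B", "00:36:21"),
    ("B", "00:02:46"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (name TEXT, time TEXT)")
conn.executemany("INSERT INTO my_table VALUES (?, ?)", rows)

query = """
SELECT name, time_interval_6, ct, ranking
FROM (
    SELECT name, time_interval_6, ct,
           -- rank names inside each 6-minute interval by their row count
           RANK() OVER (PARTITION BY time_interval_6 ORDER BY ct DESC) AS ranking
    FROM (
        SELECT name,
               -- minutes since midnight, integer-divided into 6-minute buckets
               (CAST(substr(time, 1, 2) AS INTEGER) * 60
                + CAST(substr(time, 4, 2) AS INTEGER)) / 6 AS time_interval_6,
               COUNT(*) AS ct
        FROM my_table
        GROUP BY name, time_interval_6
    )
)
WHERE ranking <= 2
ORDER BY time_interval_6, ranking, name
"""
top = conn.execute(query).fetchall()
for row in top:
    print(row)
```

Note that `rank()` keeps ties, so an interval can return more than two names when several share the same count; `row_number()` would cap it at two at the cost of an arbitrary tie-break.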
Let's say I have a table like this:
some_id date
1 2022-02-01
2 2022-02-02
3 2022-02-03
3 2022-02-04
3 2022-02-05
3 2022-02-06
I want to count all rows that share an id with the row where a given date was found.
I tried this, but it's not working:
SELECT COUNT(id) FROM dates WHERE date = '2022-02-04'
The expected output is 4, since the id found on 2022-02-04 (id 3) appears in 4 rows.
This should do the job:
SELECT COUNT(*) FROM tbl
WHERE id IN (
SELECT id FROM tbl WHERE `date`='2022-02-04'
)
An exists query should do it:
SELECT id, COUNT(*)
FROM t
WHERE EXISTS (
SELECT 1
FROM t AS x
WHERE x.id = t.id
AND x.date = '2022-02-04'
)
GROUP BY id
Using exists logic we can try:
SELECT COUNT(*)
FROM dates d1
WHERE EXISTS (SELECT 1 FROM dates d2
WHERE d2.some_id = d1.some_id AND
d2.date = '2022-02-04');
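To see why the original attempt returns 1 while the EXISTS form returns 4, here is a small sketch using Python's built-in sqlite3 module with the sample rows from the question. The choice of SQLite is an assumption for illustration; the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dates (some_id INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO dates VALUES (?, ?)",
    [(1, "2022-02-01"), (2, "2022-02-02"), (3, "2022-02-03"),
     (3, "2022-02-04"), (3, "2022-02-05"), (3, "2022-02-06")],
)

# Filtering on the date directly only sees the single matching row.
naive = conn.execute(
    "SELECT COUNT(some_id) FROM dates WHERE date = '2022-02-04'"
).fetchone()[0]

# EXISTS keeps every row whose id also appears on the target date.
with_exists = conn.execute("""
    SELECT COUNT(*)
    FROM dates d1
    WHERE EXISTS (SELECT 1 FROM dates d2
                  WHERE d2.some_id = d1.some_id
                    AND d2.date = '2022-02-04')
""").fetchone()[0]

print(naive, with_exists)
```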
Please help me write this query. I have tried a left join, but it's not working.
I have two tables, tdate and tollname. The tdate table contains only dates, say for one month, and the tollname table contains toll names with dates.
I want to find, for each toll, the dates that are missing from the tollname table.
Table name: tdate
Dates
1
2
3
4
...
30
Table name: Tollname
Dates TollName
1 A
1 B
1 C
5 A
5 B
6 C
9 B
12 A
12 B
12 C
28 A
28 B
30 C
You can use a cross join combined with not exists (or, equivalently, a left join or not in). This generates every combination of toll name and date, and then returns the ones that are not present in your table:
select d.Dates, t.tollname
from tdate d cross join
(select distinct tollname from tollname) t
where not exists (select 1
from tollname t2
where d.Dates = t2.Dates and t.tollname = t2.tollname
);
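A minimal sketch of the cross join plus NOT EXISTS approach, run with Python's built-in sqlite3 module against the sample data. SQLite and the lowercase column names `dates` and `tollname` are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tdate (dates INTEGER)")
conn.execute("CREATE TABLE tollname (dates INTEGER, tollname TEXT)")

# tdate holds every day of the month: 1..30
conn.executemany("INSERT INTO tdate VALUES (?)", [(d,) for d in range(1, 31)])
conn.executemany(
    "INSERT INTO tollname VALUES (?, ?)",
    [(1, "A"), (1, "B"), (1, "C"), (5, "A"), (5, "B"), (6, "C"), (9, "B"),
     (12, "A"), (12, "B"), (12, "C"), (28, "A"), (28, "B"), (30, "C")],
)

missing = conn.execute("""
    SELECT d.dates, t.tollname
    FROM tdate d
    CROSS JOIN (SELECT DISTINCT tollname FROM tollname) t  -- every date x toll pair
    WHERE NOT EXISTS (SELECT 1
                      FROM tollname t2
                      WHERE t2.dates = d.dates
                        AND t2.tollname = t.tollname)
    ORDER BY t.tollname, d.dates
""").fetchall()

print(len(missing))   # number of (date, toll) pairs with no row
print(missing[:3])    # first few missing pairs for toll A
```

A plain left join from tdate to Tollname on the date alone only finds dates that no toll has at all, which is why the per-toll cross join is needed here.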
If you have a separate table with the toll names, you can use that in place of the subquery.
SELECT D.*
FROM tdate D
LEFT JOIN Tollname T
ON D.Dates = T.Dates
WHERE T.Dates IS NULL
SELECT d.* from tdate d left join Tollname t on d.Dates = t.Dates
WHERE t.TollName is null
Here's the data:
empID Date Type
----- -------- ----
1 1/1/2012 u
1 1/2/2012 u
1 1/3/2012 u
1 2/2/2012 u
4 1/1/2012 u
4 1/3/2012 u
4 1/4/2012 u
4 1/6/2012 u
Would return:
empID count
----- -----
1 2
4 3
Consecutive dates ("together") count as one occurrence; dates separated by a gap count as separate occurrences. This is for tracking employee attendance. How would the SQL statement look to group consecutive dates and count each group as 1? I'm really struggling with the logic.
SELECT
empID
, COUNT(*) AS cnt
FROM
tableX AS x
WHERE
NOT EXISTS
( SELECT *
FROM tableX AS y
WHERE y.empID = x.empID
AND DATEADD ("d", -1, x.[Date]) = y.[Date]
)
GROUP BY
empID ;
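The idea in the query above is to count only the first day of each run, meaning rows with no row for the same employee on the previous day. Here is a minimal sketch of that idea using Python's built-in sqlite3 module. It assumes the sample dates are month/day/year and stores them as ISO strings, and it uses SQLite's `date(x, '-1 day')` in place of `DATEADD`; the table name `attendance` and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attendance (emp_id INTEGER, day TEXT, type TEXT)")
conn.executemany(
    "INSERT INTO attendance VALUES (?, ?, 'u')",
    [(1, "2012-01-01"), (1, "2012-01-02"), (1, "2012-01-03"), (1, "2012-02-02"),
     (4, "2012-01-01"), (4, "2012-01-03"), (4, "2012-01-04"), (4, "2012-01-06")],
)

occurrences = conn.execute("""
    SELECT x.emp_id, COUNT(*) AS cnt
    FROM attendance x
    WHERE NOT EXISTS (SELECT 1
                      FROM attendance y
                      WHERE y.emp_id = x.emp_id
                        AND y.day = date(x.day, '-1 day'))  -- previous day present?
    GROUP BY x.emp_id
    ORDER BY x.emp_id
""").fetchall()

print(occurrences)
```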
Try this:
;WITH CTE as
(select *,ROW_NUMBER() over (partition by empID order by date) as rn from test2 t1)
select empID,COUNT(*) as count
from CTE c1
where isnull((DATEDIFF(day,(select date from CTE where c1.rn=rn+1 and empID=c1.empID ),c1.date)),0) <> 1
group by empID
I want to count how many times each user has rows whose values are within 5 of each other.
For example, Don - 501 and Don - 504 should be counted, while Don - 501 and Don - 1600 should not be counted.
Start:
Name value
_________ ______________
Don 1235
Don 6012
Don 6014
Don 6300
James 9000
James 9502
James 9600
Sarah 1110
Sarah 1111
Sarah 1112
Sarah 1500
Becca 0500
Becca 0508
Becca 0709
Finish:
Name difference_5
__________ _____________
Don 1
James 0
Sarah 2
Becca 0
Use the ABS() function together with a self-join in a subquery, something like:
SELECT name, COUNT(*) / 2 AS difference_5
FROM (
SELECT a.name name, ABS(a.value - b.value)
FROM tbl a JOIN tbl b USING(name)
WHERE ABS(a.value - b.value) BETWEEN 1 AND 5
) AS t GROUP BY name
edited as per Andreas' comment.
Assuming that each name -> value pair is unique, this will get you the count of times the value is within 5 per name:
SELECT a.name,
COUNT(b.name) / 2 AS difference_5
FROM tbl a
LEFT JOIN tbl b ON a.name = b.name AND
a.value <> b.value AND
ABS(a.value - b.value) <= 5
GROUP BY a.name
As you'll notice, we also have to exclude the pairs that are equal to themselves.
But if you wanted to count the number of times each name's values came within 5 of any value in the table, you can use:
SELECT a.name,
COUNT(b.name) / 2 AS difference_5
FROM tbl a
LEFT JOIN tbl b ON NOT (a.name = b.name AND a.value = b.value) AND
ABS(a.value - b.value) <= 5
GROUP BY a.name
See the SQLFiddle Demo for both solutions.
Because the OP also wants the zero counts, we'll need a self left join. Extra logic is needed if one person has two identical values; these should also be counted only once.
WITH cnts AS (
WITH pair AS (
SELECT t1.zname,t1.zvalue
FROM ztable t1
JOIN ztable t2
ON t1.zname = t2.zname
WHERE ( t1.zvalue < t2.zvalue
AND t1.zvalue >= t2.zvalue - 5 )
OR (t1.zvalue = t2.zvalue AND t1.ctid < t2.ctid)
)
SELECT DISTINCT zname
, COUNT(*) AS znumber
FROM pair
GROUP BY zname
)
, names AS (
SELECT distinct zname AS zname
FROM ztable
GROUP BY zname
)
SELECT n.zname
, COALESCE(c.znumber,0) AS znumber
FROM names n
LEFT JOIN cnts c ON n.zname = c.zname
;
RESULT:
DROP SCHEMA
CREATE SCHEMA
SET
CREATE TABLE
INSERT 0 14
zname | znumber
-------+---------
Sarah | 3
Don | 1
Becca | 0
James | 0
(4 rows)
NOTE: sorry for the CTE, I had not seen the mysql tag, I just liked the problem ;-)
SELECT
A.Name,
SUM(CASE WHEN (A.Value < B.Value) AND (A.Value >= B.Value - 5) THEN 1 ELSE 0 END) Difference_5
FROM
tbl A INNER JOIN
tbl B USING(Name)
GROUP BY
A.Name
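The pair-counting queries above count every pair within 5, which gives Sarah 3 (1110-1111, 1110-1112, 1111-1112) rather than the 2 shown in the expected output. One reading that reproduces the expected output is to count each row that has a larger value within 5 for the same name. The sketch below tries that reading with Python's built-in sqlite3 module; the interpretation and the use of SQLite are assumptions, not something the question confirms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (name TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [("Don", 1235), ("Don", 6012), ("Don", 6014), ("Don", 6300),
     ("James", 9000), ("James", 9502), ("James", 9600),
     ("Sarah", 1110), ("Sarah", 1111), ("Sarah", 1112), ("Sarah", 1500),
     ("Becca", 500), ("Becca", 508), ("Becca", 709)],
)

counts = conn.execute("""
    SELECT n.name, COUNT(c.value) AS difference_5
    FROM (SELECT DISTINCT name FROM tbl) n        -- keeps names with zero matches
    LEFT JOIN (
        SELECT a.name, a.value
        FROM tbl a
        WHERE EXISTS (SELECT 1
                      FROM tbl b
                      WHERE b.name = a.name
                        AND b.value > a.value
                        AND b.value - a.value <= 5)  -- a larger value within 5
    ) c ON c.name = n.name
    GROUP BY n.name
    ORDER BY n.name
""").fetchall()

print(counts)
```

Driving the query from the distinct names and left joining the matches is what keeps James and Becca in the result with a count of 0.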
I posted something similar to this yesterday, but now I'd like something a little different from my query.
I'm trying to query a database to retrieve the number of one-time users who have visited a website over time. The data looks something like this:
Day | UserID
1 | A
1 | B
2 | B
3 | A
4 | B
4 | C
5 | D
I'd like the query result to look like this:
Time Span | COUNT(DISTINCT UserID)
Day 1 to Day 1 | 2
Day 1 to Day 2 | 1
Day 1 to Day 3 | 0
Day 1 to Day 4 | 1
Day 1 to Day 5 | 2
The result is 2, 1, 0, 1, 2 because that is the number of users who have visited exactly once by the end of each of those days. For example, at the end of day 5, users C and D have each visited only once.
I think I'm looking for a query similar to this:
select d.day, (select count(distinct userid) from visits where day<=d.day)
from (select distinct day from visits) d
The difference between the query above and what I'm looking for is that I'd like this new query to consider only one-time users for each time span, and not repeat users.
Thanks
This subquery should work for the clarified requirements.
select d.day, count(distinct case when b.userid is null then a.userid end)
from (select day from visits group by day) d
inner join
(
select a.day, a.userid, count(*) c
from visits a
join visits b on a.userid=b.userid and b.day <= a.day
group by a.day, a.userid
having count(*) = 1
) a on a.day <= d.day
left join
(
select a.day, a.userid, count(*) c
from visits a
join visits b on a.userid=b.userid and b.day <= a.day
group by a.day, a.userid
having count(*) > 1
) b on a.userid = b.userid and b.day <= d.day
group by d.day
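As a cross-check of the expected 2, 1, 0, 1, 2, here is a shorter alternative sketch, not the answer's query. It pairs every day with all visits up to that day, keeps the (day, user) groups with exactly one visit, and counts them per day. It runs on Python's built-in sqlite3 module, which is an assumption for illustration; the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (day INTEGER, userid TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [(1, "A"), (1, "B"), (2, "B"), (3, "A"), (4, "B"), (4, "C"), (5, "D")],
)

one_time = conn.execute("""
    SELECT d.day, COUNT(c.userid) AS one_time_users
    FROM (SELECT DISTINCT day FROM visits) d
    LEFT JOIN (
        -- users with exactly one visit in the span day 1 .. d2.day
        SELECT d2.day, v.userid
        FROM (SELECT DISTINCT day FROM visits) d2
        JOIN visits v ON v.day <= d2.day
        GROUP BY d2.day, v.userid
        HAVING COUNT(*) = 1
    ) c ON c.day = d.day
    GROUP BY d.day
    ORDER BY d.day
""").fetchall()

print(one_time)
```

The outer left join keeps days such as day 3, where no user has visited exactly once, in the result with a count of 0.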
Original
You must have taken the idea from SQL Server - it is the only RDBMS (IIRC) that will allow you to reference a twice removed (nesting) query. Please indicate what you want and we can rewrite the query.
For the exact query shown, you don't need 2 levels of subquery
SELECT
C.col_c1 AS Data,
(
SELECT count(col_b1)
FROM tbl
WHERE col_b2 <= C.col_c1
) A
FROM (
SELECT col_c1 # subquery to get distinct c1
FROM tbl
GROUP BY col_c1) C;