I have a table in Hive that looks something like this:
cust_id prod_id timestamp
1 11 2011-01-01 03:30:23
2 22 2011-01-01 03:34:53
1 22 2011-01-01 04:21:03
2 33 2011-01-01 04:44:09
3 33 2011-01-01 04:54:49
and so on.
For each record, I want to count how many unique products the customer bought within the last 24 hours, excluding the current transaction. The output should look something like this:
1 0
2 0
1 1
2 1
3 0
My Hive query looks something like this:
select * from(
select t1.cust_id, count(distinct t1.prod_id) as freq from temp_table t1
left outer join temp_table t2 on (t1.cust_id=t2.cust_id)
where t1.timestamp>=t2.timestamp
and unix_timestamp(t1.timestamp)-unix_timestamp(t2.timestamp) < 24*60*60
group by t1.cust_id
union all
select t2.cust_id, 0 as freq from temp_table t2
)unioned;
Just get all the rows for the last 24 hours, group by cust_id, and output count(distinct prod_id) - 1. The overall query would look something like this:
select cust_id, COUNT(distinct prod_id) - 1 from table_name where
unix_timestamp() - unix_timestamp(timestamp) < 24*60*60
GROUP BY cust_id
*I am subtracting 1 here to exclude the user's latest transaction (I hope this is what you meant). Note that this measures the 24 hours back from the current time, not from each transaction.
You can join to a derived table that contains the number of distinct products purchased in the past 24 hours for each customer/timestamp pair.
select t1.cust_id, t1.prod_id, t1.timestamp, t2.count_distinct_prod_id - 1
from mytable t1
join (
select t2.cust_id, t2.timestamp, count(distinct t3.prod_id) count_distinct_prod_id
from mytable t2
join mytable t3 on t3.cust_id = t2.cust_id
where t3.timestamp <= t2.timestamp
and unix_timestamp(t2.timestamp) - unix_timestamp(t3.timestamp) < 24*60*60
group by t2.cust_id, t2.timestamp
) t2 on t1.cust_id = t2.cust_id and t1.timestamp = t2.timestamp
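As a rough, runnable check of the self-join idea, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Hive (an assumption made purely so the example can be executed; Hive syntax differs, for example unix_timestamp versus SQLite's strftime). The table name purchases and the column name ts are hypothetical. Instead of subtracting 1, this variant only joins to strictly earlier transactions, so the current one is never counted. It also assumes each (cust_id, ts) pair is unique.

```python
import sqlite3

# SQLite is only a stand-in for Hive here; names "purchases" and "ts" are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (cust_id INTEGER, prod_id INTEGER, ts TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [
        (1, 11, "2011-01-01 03:30:23"),
        (2, 22, "2011-01-01 03:34:53"),
        (1, 22, "2011-01-01 04:21:03"),
        (2, 33, "2011-01-01 04:44:09"),
        (3, 33, "2011-01-01 04:54:49"),
    ],
)

# For every transaction, count distinct products in strictly earlier
# transactions by the same customer that fall within the previous 24 hours.
# The LEFT JOIN keeps transactions with no earlier purchases (count 0).
QUERY = """
SELECT t1.cust_id, t1.ts, COUNT(DISTINCT t2.prod_id) AS freq
FROM purchases t1
LEFT JOIN purchases t2
  ON t2.cust_id = t1.cust_id
 AND t2.ts < t1.ts
 AND CAST(strftime('%s', t1.ts) AS INTEGER)
     - CAST(strftime('%s', t2.ts) AS INTEGER) < 24 * 60 * 60
GROUP BY t1.cust_id, t1.ts
ORDER BY t1.ts
"""

rows = conn.execute(QUERY).fetchall()
for cust_id, ts, freq in rows:
    print(cust_id, freq)
```

Because the comparison is a strict `<`, two transactions by the same customer with identical timestamps would not count each other; whether that is desirable depends on the data.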
Let's say I have a table like this:
some_id date
1 2022-02-01
2 2022-02-02
3 2022-02-03
3 2022-02-04
3 2022-02-05
3 2022-02-06
I want to count the rows that share an id with the row where a given date was found.
I tried this, but it's not working:
SELECT COUNT(id) FROM dates WHERE date = '2022-02-04'
The expected output is 4, since there are 4 rows with the same id as the row where 2022-02-04 was found.
This should do the job:
SELECT COUNT(*) FROM tbl
WHERE id IN (
SELECT id FROM tbl WHERE `date`='2022-02-04'
)
An exists query should do it:
SELECT id, COUNT(*)
FROM t
WHERE EXISTS (
SELECT 1
FROM t AS x
WHERE x.id = t.id
AND x.date = '2022-02-04'
)
GROUP BY id
Using exists logic we can try:
SELECT COUNT(*)
FROM dates d1
WHERE EXISTS (SELECT 1 FROM dates d2
WHERE d2.some_id = d1.some_id AND
d2.date = '2022-02-04');
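To check the EXISTS approach against the sample data, here is a small sketch using Python's built-in sqlite3 module (SQLite is an assumption here; the question does not say which database is in use). It uses the table and column names from the question, dates and some_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dates (some_id INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO dates VALUES (?, ?)",
    [
        (1, "2022-02-01"),
        (2, "2022-02-02"),
        (3, "2022-02-03"),
        (3, "2022-02-04"),
        (3, "2022-02-05"),
        (3, "2022-02-06"),
    ],
)

# Count every row whose some_id also appears on a row with the target date.
QUERY = """
SELECT COUNT(*)
FROM dates d1
WHERE EXISTS (
    SELECT 1
    FROM dates d2
    WHERE d2.some_id = d1.some_id
      AND d2.date = ?
)
"""

matching_rows = conn.execute(QUERY, ("2022-02-04",)).fetchone()[0]
no_match = conn.execute(QUERY, ("2022-03-01",)).fetchone()[0]
print(matching_rows, no_match)
```

A date that appears on no row simply yields a count of 0, since the EXISTS subquery never matches.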
Following are the tables
Table 1
price col1 col2 time
10 1 1 10
100 1 1 13
150 1 1 15
Table 2
id startTm endTm col1 col2
1 12 20 1 1
2 15 26 1 1
3 11 13 1 1
I want all the rows from table 2 satisfying startTm >= x and endTm <= y. For each row in the result, I want the count of all records in table 1 where table1.time lies between startTm and endTm of that particular row.
Something like this:
SELECT (#sTime:=T2.startTm) AS startTm,JT.totalNo, JT.totalPrice,
(#eTime:=T2.endTm) AS endTm
some more columns FROM table 2 AS T2
LEFT JOIN (SELECT COUNT(id) AS totalNo,col1, col2 SUM(price) AS
totalPrice FROM table 1 WHERE time BETWEEN #sTime AND #eTime GROUP
BY col1, col2)
AS JT ON JT.col1 = T2.col1
WHERE T2.startTm >= some value AND T2.endTm <= some value.
There are no related foreign keys. I am not getting the proper results. How is it done?
Edit
I want all the records from table 2 within a specified time range, for example startTm >= 10 and endTm <= 20,
so the output table will be:
startTm endTm totalNo totalPrice some more col
12 20 2 250 ...
11 13 1 100 ...
To calculate the total price and total number, I want to use the startTm and endTm of that particular row.
Is this what you want?
SELECT startTm, endTm, COUNT(price), SUM(price), t3_others, t1_others
FROM
(
SELECT T3.startTm AS startTm, T3.endTm AS endTm, T3.others AS t3_others, T1.price AS price, T1.others AS t1_others
FROM T1
RIGHT JOIN
(
SELECT T2.startTm, T2.endTm, T2.others
FROM T2
WHERE T2.startTm >= 10 AND T2.endTm <= 20 AND T2.col1 = col1_value AND T2.col2 = col2_value
) AS T3
ON T1.time >= T3.startTm AND T1.time <= T3.endTm
) AS T4
GROUP BY startTm, endTm;
Add more fields as needed.
Try the following query:
Select t1.* , t2.* from Table 1 as t1 RIGHT JOIN Table 2 as t2 ON t1.col1 = t2.col1 WHERE t2.startTm >= 'your value' AND t2.endTm <= 'your value'
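The per-row range condition can also be expressed directly in the join. The following is a minimal sketch using Python's built-in sqlite3 module (SQLite is an assumption, chosen only so the query can be run). The table names t1 and t2 stand for "Table 1" and "Table 2", and matching on both col1 and col2 is also an assumption, since the question's attempt joins on col1 only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# t1 / t2 are hypothetical names standing in for "Table 1" and "Table 2".
conn.execute("CREATE TABLE t1 (price INTEGER, col1 INTEGER, col2 INTEGER, time INTEGER)")
conn.execute(
    "CREATE TABLE t2 (id INTEGER, startTm INTEGER, endTm INTEGER, col1 INTEGER, col2 INTEGER)"
)
conn.executemany("INSERT INTO t1 VALUES (?, ?, ?, ?)",
                 [(10, 1, 1, 10), (100, 1, 1, 13), (150, 1, 1, 15)])
conn.executemany("INSERT INTO t2 VALUES (?, ?, ?, ?, ?)",
                 [(1, 12, 20, 1, 1), (2, 15, 26, 1, 1), (3, 11, 13, 1, 1)])

# The range test lives in the ON clause, so each t2 row is matched against
# its own startTm/endTm. LEFT JOIN keeps t2 rows with no matching t1 rows.
QUERY = """
SELECT t2.startTm,
       t2.endTm,
       COUNT(t1.price)              AS totalNo,
       COALESCE(SUM(t1.price), 0)   AS totalPrice
FROM t2
LEFT JOIN t1
  ON t1.col1 = t2.col1
 AND t1.col2 = t2.col2
 AND t1.time BETWEEN t2.startTm AND t2.endTm
WHERE t2.startTm >= ? AND t2.endTm <= ?
GROUP BY t2.id, t2.startTm, t2.endTm
ORDER BY t2.id
"""

result = conn.execute(QUERY, (10, 20)).fetchall()
for row in result:
    print(row)
```

Grouping by t2.id rather than only by the time bounds keeps two table 2 rows with identical startTm and endTm from being merged into one.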
I have a table like this:
-----------------------------
id pid key value
-----------------------------
1 3 all 120
2 3 today 180
3 9 all 200
4 9 today 150
5 9 others 0
-----------------------------
How do I select the rows where the value for key "all" (120) is less than the value for key "today" (180) and both rows have the same pid (3)?
The result should be:
---------------------------
id pid key value
---------------------------
1 3 all 120
2 3 today 180
Assuming you want all entries whose pid has an "all" value smaller than its "today" value:
select t.* from `table` t where exists(
select 1 from `table` a
join `table` d on d.pid = a.pid
where a.pid = t.pid and a.`key` = 'all' and d.`key` = 'today' and a.`value` < d.`value`
)
Try this:
select t1.*
from yourtable t1
inner join (
select pid
from yourtable
group by pid
having sum(if(`key` = 'all', `value`, 0)) < sum(if(`key` = 'today', `value`, 0))
) t2 on t1.pid = t2.pid
SQLFiddle Demo
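The conditional-aggregation idea from the answer above can be tried out with a short sketch. It uses Python's built-in sqlite3 module, so `if(...)` is replaced by the portable `CASE WHEN` form (SQLite and the table name kv are assumptions; the answers themselves appear to target MySQL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "kv" is a hypothetical table name; key/value are quoted to avoid keyword clashes.
conn.execute('CREATE TABLE kv (id INTEGER, pid INTEGER, "key" TEXT, "value" INTEGER)')
conn.executemany(
    "INSERT INTO kv VALUES (?, ?, ?, ?)",
    [
        (1, 3, "all", 120),
        (2, 3, "today", 180),
        (3, 9, "all", 200),
        (4, 9, "today", 150),
        (5, 9, "others", 0),
    ],
)

# The subquery pivots each pid's 'all' and 'today' values with CASE and keeps
# only pids where all < today; the outer join returns that pid's rows.
QUERY = """
SELECT t.id, t.pid, t."key", t."value"
FROM kv t
JOIN (
    SELECT pid
    FROM kv
    GROUP BY pid
    HAVING SUM(CASE WHEN "key" = 'all'   THEN "value" ELSE 0 END)
         < SUM(CASE WHEN "key" = 'today' THEN "value" ELSE 0 END)
) q ON q.pid = t.pid
ORDER BY t.id
"""

selected = conn.execute(QUERY).fetchall()
for row in selected:
    print(row)
```

Note that this returns every row of a qualifying pid, including keys other than "all" and "today"; add a filter on "key" in the outer query if only those two rows are wanted.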
I have two tables, as follows:
table-1
CalenderType periodNumber periodstartdate
1 1 01-01-2013
1 2 11-01-2013
1 3 15-01-2013
1 4 25-01-2013
2 1 01-01-2013
2 2 15-01-2013
2 3 20-01-2013
2 4 25-01-2013
table2
Incidents Date
xyz 02-01-2013
xxyyzz 03-01-2013
ccvvb 12-01-2013
vvfg 16-01-2013
x3 17-01-2013
x5 24-01-2013
Now I want to find the number of incidents that took place in each period (the calendar type may change at runtime).
The query should look something like this:
select .......
from ......
where CalenderType=1
which should return:
CalenderType PeriodNumber Incidents
1 1 2
1 2 1
1 3 3
1 4 0
Can someone suggest an approach or method for achieving this?
Note: each period is variable in size. Period 1 may have 10 days, period 2 may have 5 days, etc.
I think this does what you want, although I don't understand how you arrived at your sample output:
select t.CalenderType, t.periodNumber, count(*) as Incidents
from Table1 t
inner join (
select t2.Date, t2.Incidents, max(t1.periodstartdate) as PeriodStartDate
from Table2 t2
inner join Table1 t1 on t2.Date >= t1.periodstartdate
where CalenderType = 1
group by t2.Date, t2.Incidents
) a on t.periodstartdate = a.PeriodStartDate
where CalenderType=1
group by t.CalenderType, t.periodNumber
SQL Fiddle Example
Try this, a slightly more general solution. SQLFiddle (thanks to RedFilter for the schema):
SELECT t1.CalenderType, t1.periodNumber, count(Incidents)
FROM Table1 t1, Table1 t11, Table2
WHERE
(
(
t1.CalenderType = t11.CalenderType
AND t1.periodNumber = t11.periodNumber - 1
AND Date >= t1.periodstartdate AND Date < t11.periodstartdate
)
OR
(
t1.periodNumber = (SELECT MAX(periodNumber) FROM Table1 WHERE t1.CalenderType = CalenderType)
AND Date >= t1.periodstartdate
)
)
GROUP BY t1.CalenderType, t1.periodNumber
ORDER BY t1.CalenderType, t1.periodNumber
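One more way to look at the problem is to compute each period's end as the start of the next period, then count the incidents that fall in that half-open range. Below is a minimal sketch using Python's built-in sqlite3 module (SQLite is an assumption, as is the helper name count_incidents). Dates are stored in ISO format (YYYY-MM-DD) so that plain string comparison orders them correctly, and the LEFT JOIN keeps periods with zero incidents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Table1 (CalenderType INTEGER, periodNumber INTEGER, periodstartdate TEXT)"
)
conn.execute("CREATE TABLE Table2 (Incidents TEXT, Date TEXT)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [
        (1, 1, "2013-01-01"), (1, 2, "2013-01-11"),
        (1, 3, "2013-01-15"), (1, 4, "2013-01-25"),
        (2, 1, "2013-01-01"), (2, 2, "2013-01-15"),
        (2, 3, "2013-01-20"), (2, 4, "2013-01-25"),
    ],
)
conn.executemany(
    "INSERT INTO Table2 VALUES (?, ?)",
    [
        ("xyz", "2013-01-02"), ("xxyyzz", "2013-01-03"), ("ccvvb", "2013-01-12"),
        ("vvfg", "2013-01-16"), ("x3", "2013-01-17"), ("x5", "2013-01-24"),
    ],
)

# "bounds" gives each period a [start_date, next_start) range; next_start is
# NULL for the last period, which is treated as open-ended.
QUERY = """
WITH bounds AS (
    SELECT p.CalenderType,
           p.periodNumber,
           p.periodstartdate AS start_date,
           (SELECT MIN(n.periodstartdate)
              FROM Table1 n
             WHERE n.CalenderType = p.CalenderType
               AND n.periodstartdate > p.periodstartdate) AS next_start
    FROM Table1 p
    WHERE p.CalenderType = ?
)
SELECT b.CalenderType, b.periodNumber, COUNT(i.Incidents) AS Incidents
FROM bounds b
LEFT JOIN Table2 i
  ON i.Date >= b.start_date
 AND (b.next_start IS NULL OR i.Date < b.next_start)
GROUP BY b.CalenderType, b.periodNumber
ORDER BY b.periodNumber
"""


def count_incidents(connection, calender_type):
    """Return (CalenderType, periodNumber, incident count) rows for one calendar type."""
    return connection.execute(QUERY, (calender_type,)).fetchall()


type1 = count_incidents(conn, 1)
type2 = count_incidents(conn, 2)
print(type1)
print(type2)
```

Using a half-open range means an incident dated exactly on a period's start date is counted once, in the period that starts that day.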
I have 3 tables:
Table 1
id date
1 1132123123
2 1232342341
etc
Table 2
id date
1 1132123123
2 1232342341
etc
Table 3
id date
1 1132123123
2 1232342341
etc
All "date" columns are unix timestamps.
I am trying to join these 3 tables and count totals for each table respectively grouped by:
FROM_UNIXTIME(date, '%m-%d-%Y')
Ideally, I'd like this result:
formatteddate t1count t2count t3count
04-12-2011 2 2 2
04-13-2011 1 2 3
NOTE: The result doesn't match the example data, but I think the intent is clear.
Here's what I've tried so far:
SELECT
FROM_UNIXTIME(t1.date, '%m-%d-%Y') as t1date,
FROM_UNIXTIME(t2.date, '%m-%d-%Y') as t2date,
FROM_UNIXTIME(t3.date, '%m-%d-%Y') as t3date,
count(t1.id) as t1count,
count(t2.id) as t2count,
count(t3.id) as t3count
FROM
t1,t2,t3
GROUP BY
t1date
The query never finishes running. t3 contains a lot of data (over 1 million records); t1 and t2 contain far less.
select from_unixtime(date,'%m-%d-%Y') as d,
sum(tb=1) as tb1,
sum(tb=2) as tb2,
sum(tb=3) as tb3
from (
select date,1 as tb from t1
union all
select date,2 from t2
union all
select date,3 from t3) as t
group by d
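The UNION ALL approach above avoids the three-way cross join entirely: each table contributes its rows once, tagged with a source number, and the per-table counts come from conditional sums. Here is a minimal sketch of the same idea using Python's built-in sqlite3 module (SQLite is an assumption; it uses strftime with the 'unixepoch' modifier in place of MySQL's FROM_UNIXTIME, and the helper name ts is hypothetical). The sample timestamps are invented so that the output matches the desired result in the question.

```python
import sqlite3
from datetime import datetime, timezone


def ts(year, month, day, hour=12):
    """Build a unix timestamp (UTC) for the sample data; hypothetical helper."""
    return int(datetime(year, month, day, hour, tzinfo=timezone.utc).timestamp())


conn = sqlite3.connect(":memory:")
sample = {
    "t1": [ts(2011, 4, 12), ts(2011, 4, 12, 15), ts(2011, 4, 13)],
    "t2": [ts(2011, 4, 12), ts(2011, 4, 12, 15), ts(2011, 4, 13), ts(2011, 4, 13, 15)],
    "t3": [ts(2011, 4, 12), ts(2011, 4, 12, 15),
           ts(2011, 4, 13), ts(2011, 4, 13, 15), ts(2011, 4, 13, 18)],
}
for name, stamps in sample.items():
    conn.execute(f"CREATE TABLE {name} (id INTEGER PRIMARY KEY, date INTEGER)")
    conn.executemany(f"INSERT INTO {name} (date) VALUES (?)", [(s,) for s in stamps])

# Stack the three tables with a source tag, then count per day and per source.
# A comparison such as (tb = 1) evaluates to 0 or 1, so SUM counts matches.
QUERY = """
SELECT strftime('%m-%d-%Y', date, 'unixepoch') AS d,
       SUM(tb = 1) AS t1count,
       SUM(tb = 2) AS t2count,
       SUM(tb = 3) AS t3count
FROM (
    SELECT date, 1 AS tb FROM t1
    UNION ALL
    SELECT date, 2 FROM t2
    UNION ALL
    SELECT date, 3 FROM t3
) AS t
GROUP BY d
ORDER BY d
"""

totals = conn.execute(QUERY).fetchall()
for row in totals:
    print(row)
```

SQLite's 'unixepoch' modifier formats the timestamp in UTC, whereas MySQL's FROM_UNIXTIME uses the session time zone, so day boundaries can differ between the two.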