Get distinct values in union all in hive


I have a table in Hive that looks something like this:
cust_id prod_id timestamp
1 11 2011-01-01 03:30:23
2 22 2011-01-01 03:34:53
1 22 2011-01-01 04:21:03
2 33 2011-01-01 04:44:09
3 33 2011-01-01 04:54:49
and so on.
For each record, I want to find how many unique products the customer has bought within the last 24 hours, excluding the current transaction. The output should look something like this:
1 0
2 0
1 1
2 1
3 0
My Hive query looks something like this:
select * from(
select t1.cust_id, count(distinct t1.prod_id) as freq from temp_table t1
left outer join temp_table t2 on (t1.cust_id=t2.cust_id)
where t1.timestamp>=t2.timestamp
and unix_timestamp(t1.timestamp)-unix_timestamp(t2.timestamp) < 24*60*60
group by t1.cust_id
union all
select t.cust_id, 0 as freq from temp_table t2
)unioned;

Select the rows from the last 24 hours, group by cust_id, and output count(distinct prod_id) - 1. The overall query would look something like this:
select cust_id, count(distinct prod_id) - 1
from table_name
-- assumes "last 24 hours" is measured from the current time
where unix_timestamp() - unix_timestamp(`timestamp`) < 24*60*60
group by cust_id
*I subtract 1 to exclude the user's latest transaction (I hope this is what you meant).

You can join to a derived table that contains the distinct number of products purchased in the past 24 hours for each customer/timestamp pair.
select t1.cust_id, t1.prod_id, t1.timestamp, t2.count_distinct_prod_id - 1
from mytable t1
join (
select t2.cust_id, t2.timestamp, count(distinct t3.prod_id) count_distinct_prod_id
from mytable t2
join mytable t3 on t3.cust_id = t2.cust_id
where t3.timestamp <= t2.timestamp -- only purchases up to and including the current one
and unix_timestamp(t2.timestamp) - unix_timestamp(t3.timestamp) < 24*60*60
group by t2.cust_id, t2.timestamp
) t2 on t1.cust_id = t2.cust_id and t1.timestamp = t2.timestamp
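A variant of the self-join idea can be checked against the sample data from the question. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for Hive, the table name purchases and the column name ts are hypothetical, and SQLite's strftime('%s', ...) replaces unix_timestamp. It counts distinct products from strictly earlier purchases in the 24-hour window, so no "- 1" is needed. Older Hive versions only accept equality conditions in ON, so there the extra conditions may have to move elsewhere.

```python
import sqlite3

# In-memory SQLite database standing in for the Hive table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (cust_id INTEGER, prod_id INTEGER, ts TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [
        (1, 11, "2011-01-01 03:30:23"),
        (2, 22, "2011-01-01 03:34:53"),
        (1, 22, "2011-01-01 04:21:03"),
        (2, 33, "2011-01-01 04:44:09"),
        (3, 33, "2011-01-01 04:54:49"),
    ],
)

# For each purchase (cur), join the same customer's strictly earlier
# purchases (prev) from the preceding 24 hours and count distinct products.
# The LEFT JOIN keeps purchases with no earlier match, giving a count of 0.
query = """
SELECT cur.cust_id, COUNT(DISTINCT prev.prod_id) AS freq
FROM purchases AS cur
LEFT JOIN purchases AS prev
  ON prev.cust_id = cur.cust_id
 AND prev.ts < cur.ts
 AND CAST(strftime('%s', cur.ts) AS INTEGER)
   - CAST(strftime('%s', prev.ts) AS INTEGER) < 24 * 60 * 60
GROUP BY cur.cust_id, cur.ts
ORDER BY cur.ts
"""
rows = conn.execute(query).fetchall()
for cust_id, freq in rows:
    print(cust_id, freq)
```

On the five sample rows this prints 1 0, 2 0, 1 1, 2 1, 3 0, which matches the output requested in the question. Grouping by (cust_id, ts) assumes a customer never has two transactions with the same timestamp.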

Related

How to get number of same id's where the item was found in mysql?

Let's say I have a table like this:
some_id date
1 2022-02-01
2 2022-02-02
3 2022-02-03
3 2022-02-04
3 2022-02-05
3 2022-02-06
I want to count the rows that share an id with the row where a given date was found.
I tried this, but it's not working:
SELECT COUNT(id) FROM dates WHERE date = '2022-02-04'
The expected output is 4, since there are 4 rows with the same id as the row dated 2022-02-04.

This should do the job:
SELECT COUNT(*) FROM tbl
WHERE id IN (
SELECT id FROM tbl WHERE `date`='2022-02-04'
)

An exists query should do it:
SELECT id, COUNT(*)
FROM t
WHERE EXISTS (
SELECT 1
FROM t AS x
WHERE x.id = t.id
AND x.date = '2022-02-04'
)
GROUP BY id

Using exists logic we can try:
SELECT COUNT(*)
FROM dates d1
WHERE EXISTS (SELECT 1 FROM dates d2
WHERE d2.some_id = d1.some_id AND
d2.date = '2022-02-04');
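The EXISTS approach can be tried on the question's sample rows. This is a minimal sketch that assumes SQLite (through Python's sqlite3 module) behaves like MySQL for this query; the table name dates and column names some_id and date follow the last answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dates (some_id INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO dates VALUES (?, ?)",
    [
        (1, "2022-02-01"),
        (2, "2022-02-02"),
        (3, "2022-02-03"),
        (3, "2022-02-04"),
        (3, "2022-02-05"),
        (3, "2022-02-06"),
    ],
)

# Count every row whose some_id also appears on a row dated 2022-02-04.
count = conn.execute(
    """
    SELECT COUNT(*)
    FROM dates AS d1
    WHERE EXISTS (
        SELECT 1
        FROM dates AS d2
        WHERE d2.some_id = d1.some_id
          AND d2.date = '2022-02-04'
    )
    """
).fetchone()[0]
print(count)
```

With the sample data the count is 4, because some_id 3 is the only id found on 2022-02-04 and it has four rows.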

How to select all rows from the first table and get the count of matching rows from the other table for each row retrieved from the first table

Following are the tables
Table 1
price col1 col2 time
10 1 1 10
100 1 1 13
150 1 1 15
Table 2
id startTm endTm col1 col2
1 12 20 1 1
2 15 26 1 1
3 11 13 1 1
I want all the rows from table 2 satisfying startTm >= x and endTm <= y. For each row in the result, I want the count of all records in table 1 whose time lies between that row's startTm and endTm.
Something like this:
SELECT (@sTime:=T2.startTm) AS startTm, JT.totalNo, JT.totalPrice,
(@eTime:=T2.endTm) AS endTm,
some more columns FROM table 2 AS T2
LEFT JOIN (SELECT COUNT(id) AS totalNo, col1, col2, SUM(price) AS totalPrice
FROM table 1 WHERE time BETWEEN @sTime AND @eTime
GROUP BY col1, col2)
AS JT ON JT.col1 = T2.col1
WHERE T2.startTm >= some value AND T2.endTm <= some value
There are no related foreign keys. I am not getting proper results. How is it done?
Edit
I want all the records from table 2 within a specified time range, say startTm >= 10 and endTm <= 20,
so the output table will be
startTm endTm totalNo totalPrice some more col
12 20 2 250 ...
11 13 1 100 ...
To calculate the total price and total number, I want to use the startTm and endTm of that particular row.

Is this what you want?
SELECT startTm, endTm, COUNT(price), SUM(price), t3_others, t1_others
FROM
(
SELECT T3.startTm AS startTm, T3.endTm AS endTm, T3.others AS t3_others, T1.price AS price, T1.others AS t1_others
FROM T1
RIGHT JOIN
(
SELECT T2.startTm, T2.endTm, T2.others
FROM T2
WHERE T2.startTm >= 10 AND T2.endTm <= 20 AND T2.col1 = col1_value AND T2.col2 = col2_value
) AS T3
ON T1.time >= T3.startTm AND T1.time <= T3.endTm
) AS T4
GROUP BY startTm, endTm;
Add other fields as needed.
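The range-join idea can be verified against the question's sample data. The sketch below is an assumption-laden illustration: it runs in SQLite via Python's sqlite3 module, names the tables table1 and table2, uses a LEFT JOIN from table2 (equivalent in effect to the RIGHT JOIN above), and also matches on col1 and col2, which the question only implies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE table1 (price INTEGER, col1 INTEGER, col2 INTEGER, "time" INTEGER)')
conn.execute("CREATE TABLE table2 (id INTEGER, startTm INTEGER, endTm INTEGER, col1 INTEGER, col2 INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)",
                 [(10, 1, 1, 10), (100, 1, 1, 13), (150, 1, 1, 15)])
conn.executemany("INSERT INTO table2 VALUES (?, ?, ?, ?, ?)",
                 [(1, 12, 20, 1, 1), (2, 15, 26, 1, 1), (3, 11, 13, 1, 1)])

# Keep every table2 row inside the requested range, then attach the table1
# rows whose time falls within that row's own [startTm, endTm] interval.
query = """
SELECT t2.startTm, t2.endTm,
       COUNT(t1.price) AS totalNo,
       SUM(t1.price)   AS totalPrice
FROM table2 AS t2
LEFT JOIN table1 AS t1
  ON t1."time" BETWEEN t2.startTm AND t2.endTm
 AND t1.col1 = t2.col1
 AND t1.col2 = t2.col2
WHERE t2.startTm >= 10 AND t2.endTm <= 20
GROUP BY t2.id, t2.startTm, t2.endTm
ORDER BY t2.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

For the sample data this yields (12, 20, 2, 250) and (11, 13, 1, 100), the two rows shown in the question's edit. Counting t1.price rather than * means a table2 row with no matching table1 rows gets a count of 0.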

Try the following query:
Select t1.* , t2.* from Table 1 as t1 RIGHT JOIN Table 2 as t2 ON t1.col1 = t2.col1 WHERE t2.startTm >= 'your value' AND t2.endTm <= 'your value'

Mysql SELECT value compare?

I have a table like this:
-----------------------------
id pid key value
-----------------------------
1 3 all 120
2 3 today 180
3 9 all 200
4 9 today 150
5 9 others 0
-----------------------------
How can I select the rows for a pid where the value for key "all" (120) is less than the value for key "today" (180)? Both rows share the same pid (3).
The result should be:
---------------------------
id pid key value
---------------------------
1 3 all 120
2 3 today 180

Assuming you want all entries whose pid has an "all" value smaller than its "today" value:
select t.* from `table` t where exists (
select 1 from `table` a
join `table` b on b.pid = a.pid
where a.pid = t.pid and a.`key` = 'all' and b.`key` = 'today' and a.`value` < b.`value`
)

Try this:
select t1.*
from yourtable t1
inner join (
select pid
from yourtable
group by pid
having sum(if(`key` = 'all', `value`, 0)) < sum(if(`key` = 'today', `value`, 0))
) t2 on t1.pid = t2.pid
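The conditional-aggregation approach can be tried on the question's rows. This is a sketch under assumptions: it runs in SQLite through Python's sqlite3 module, the table name entries is hypothetical, and CASE WHEN replaces MySQL's if() because SQLite has no if() function in older versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE entries (id INTEGER, pid INTEGER, "key" TEXT, "value" INTEGER)')
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?, ?)",
    [
        (1, 3, "all", 120),
        (2, 3, "today", 180),
        (3, 9, "all", 200),
        (4, 9, "today", 150),
        (5, 9, "others", 0),
    ],
)

# The subquery keeps each pid whose 'all' total is below its 'today' total;
# the outer join then returns every row belonging to those pids.
query = """
SELECT e.id, e.pid, e."key", e."value"
FROM entries AS e
JOIN (
    SELECT pid
    FROM entries
    GROUP BY pid
    HAVING SUM(CASE WHEN "key" = 'all'   THEN "value" ELSE 0 END)
         < SUM(CASE WHEN "key" = 'today' THEN "value" ELSE 0 END)
) AS matched ON matched.pid = e.pid
ORDER BY e.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Only pid 3 qualifies (120 < 180), so the two rows from the expected result are returned. A pid with other keys besides "all" and "today" would have those extra rows returned as well.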

Group dates based on variable periods

I have two tables as follows:
table-1
CalenderType periodNumber periodstartdate
1 1 01-01-2013
1 2 11-01-2013
1 3 15-01-2013
1 4 25-01-2013
2 1 01-01-2013
2 2 15-01-2013
2 3 20-01-2013
2 4 25-01-2013
table2
Incidents Date
xyz 02-01-2013
xxyyzz 03-01-2013
ccvvb 12-01-2013
vvfg 16-01-2013
x3 17-01-2013
x5 24-01-2013
Now I want to find the number of incidents that took place in a given period (the calendar type may change at runtime).
The query should look something like this:
select .......
from ......
where CalendarType=1
which should return
CalendarType PeriodNumber Incidents
1 1 2
1 2 1
1 3 3
1 4 0
Can someone suggest an approach for achieving this?
Note: each period is variable in size. Period 1 may have 10 days, period 2 may have 5 days, etc.

I think this does what you want, although I don't understand how you arrived at your sample output:
select t.CalenderType, t.periodNumber, count(*) as Incidents
from Table1 t
inner join (
select t2.Date, t2.Incidents, max(t1.periodstartdate) as PeriodStartDate
from Table2 t2
inner join Table1 t1 on t2.Date >= t1.periodstartdate
where CalenderType = 1
group by t2.Date, t2.Incidents
) a on t.periodstartdate = a.PeriodStartDate
where CalenderType=1
group by t.CalenderType, t.periodNumber

Try this, a slightly more general solution (thanks to RedFilter for the schema):
SELECT t1.CalenderType, t1.periodNumber, count(Incidents)
FROM Table1 t1, Table1 t11, Table2
WHERE
(
(
t1.CalenderType = t11.CalenderType
AND t1.periodNumber = t11.periodNumber - 1
AND Date BETWEEN t1.periodstartdate AND t11.periodstartdate
)
OR
(
t1.periodNumber = (SELECT MAX(periodNumber) FROM Table1 WHERE t1.CalenderType = CalenderType)
AND Date > t1.periodstartdate
)
)
GROUP BY t1.CalenderType, t1.periodNumber
ORDER BY t1.CalenderType, t1.periodNumber
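One way to reproduce the question's sample output, including the period with zero incidents, is to pair each period with the next one and treat the interval as half-open. The sketch below is not either answerer's query. It assumes SQLite via Python's sqlite3 module, dates stored as ISO strings (YYYY-MM-DD) so they compare correctly as text, and consecutive periodNumber values within a calendar type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (CalenderType INTEGER, periodNumber INTEGER, periodstartdate TEXT)")
conn.execute("CREATE TABLE Table2 (Incidents TEXT, Date TEXT)")
conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?)", [
    (1, 1, "2013-01-01"), (1, 2, "2013-01-11"), (1, 3, "2013-01-15"), (1, 4, "2013-01-25"),
    (2, 1, "2013-01-01"), (2, 2, "2013-01-15"), (2, 3, "2013-01-20"), (2, 4, "2013-01-25"),
])
conn.executemany("INSERT INTO Table2 VALUES (?, ?)", [
    ("xyz", "2013-01-02"), ("xxyyzz", "2013-01-03"), ("ccvvb", "2013-01-12"),
    ("vvfg", "2013-01-16"), ("x3", "2013-01-17"), ("x5", "2013-01-24"),
])

# nxt is the following period of the same calendar type (NULL for the last).
# An incident belongs to a period when start <= Date < next start.
# LEFT JOINs keep periods without incidents, and COUNT(i.Date) reports 0.
query = """
SELECT p.CalenderType, p.periodNumber, COUNT(i.Date) AS Incidents
FROM Table1 AS p
LEFT JOIN Table1 AS nxt
  ON nxt.CalenderType = p.CalenderType
 AND nxt.periodNumber = p.periodNumber + 1
LEFT JOIN Table2 AS i
  ON i.Date >= p.periodstartdate
 AND (nxt.periodstartdate IS NULL OR i.Date < nxt.periodstartdate)
WHERE p.CalenderType = ?
GROUP BY p.CalenderType, p.periodNumber
ORDER BY p.periodNumber
"""
rows = conn.execute(query, (1,)).fetchall()
for row in rows:
    print(row)
```

For CalenderType 1 this gives 2, 1, 3 and 0 incidents for periods 1 to 4, matching the question. The calendar type is passed as a parameter, so it can change at runtime.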

Join ON Date Group

I have 3 tables:
Table 1
id date
1 1132123123
2 1232342341
etc
Table 2
id date
1 1132123123
2 1232342341
etc
Table 3
id date
1 1132123123
2 1232342341
etc
All "date" columns are unix timestamps.
I am trying to join these 3 tables and count the totals for each table, grouped by:
FROM_UNIXTIME(date, '%m-%d-%Y')
Ideally, I'd like this result:
formatteddate t1count t2count t3count
04-12-2011 2 2 2
04-13-2011 1 2 3
NOTE: The result doesn't match the example data, but I think it's pretty straightforward.
Here's what I've tried so far:
SELECT
FROM_UNIXTIME(t1.date, '%m-%d-%Y') as t1date,
FROM_UNIXTIME(t2.date, '%m-%d-%Y') as t2date,
FROM_UNIXTIME(t3.date, '%m-%d-%Y') as t3date,
count(t1.id) as t1count,
count(t2.id) as t2count,
count(t3.id) as t3count
FROM
t1,t2,t3
GROUP BY
t1date
The query doesn't even load. t3 contains a lot of data (over 1 million records); t1 and t2, not so much.

select from_unixtime(date,'%m-%d-%Y') as d,
sum(tb=1) as tb1,
sum(tb=2) as tb2,
sum(tb=3) as tb3
from (
select date,1 as tb from t1
union all
select date,2 from t2
union all
select date,3 from t3) as t
group by d
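The query above stacks the three tables with UNION ALL, tags each row with its source table, and sums a boolean per tag, which avoids the cross join in the question. The sketch below illustrates this on made-up rows. It assumes SQLite via Python's sqlite3 module, where strftime with the 'unixepoch' modifier stands in for MySQL's FROM_UNIXTIME and formats in UTC; the helper noon_utc and the sample timestamps are hypothetical.

```python
import calendar
import sqlite3

def noon_utc(year, month, day):
    """Unix timestamp for 12:00 UTC on the given date (sample-data helper)."""
    return calendar.timegm((year, month, day, 12, 0, 0))

d12, d13 = noon_utc(2011, 4, 12), noon_utc(2011, 4, 13)
sample = {
    "t1": [d12, d12, d13],
    "t2": [d12, d12, d13, d13],
    "t3": [d12, d12, d13, d13, d13],
}

conn = sqlite3.connect(":memory:")
for table, stamps in sample.items():
    conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, date INTEGER)")
    conn.executemany(f"INSERT INTO {table} (date) VALUES (?)", [(s,) for s in stamps])

# tb records which table a row came from; (tb = 1) evaluates to 1 or 0,
# so SUM(tb = 1) counts the rows contributed by t1 for each day.
query = """
SELECT strftime('%m-%d-%Y', date, 'unixepoch') AS d,
       SUM(tb = 1) AS t1count,
       SUM(tb = 2) AS t2count,
       SUM(tb = 3) AS t3count
FROM (
    SELECT date, 1 AS tb FROM t1
    UNION ALL
    SELECT date, 2 FROM t2
    UNION ALL
    SELECT date, 3 FROM t3
) AS t
GROUP BY d
ORDER BY d
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each table is scanned once, so the work grows with the total number of rows rather than with the product of the table sizes.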
