mysql LEFT JOIN followed by GROUP BY with BETWEEN Ranges

I am stumped with how I should proceed. Here is my current LEFT JOIN command which works just fine:
SELECT t1.avg_temperature as T_aver,
t2.new_confirmed as count
FROM 3_day_avg as t1
LEFT JOIN table2 as t2 on t1.date = t2.date
And this works great to make this table:
T_aver |count|
-----------------
-0.2 | 2 |
3 | 2 |
5 | 1 |
-2.3 | 4 |
22 | 0 |
But now I want to take it one step further and group by ranges of T_aver (bins like 0-5, 6-10, 11-15, etc.) and SUM() the count column. For example, if I applied the ranges -10 to 0 and 0 to 30 to the LEFT JOIN result above, the final table would look like this:
Trange |count|
-----------------
-10 - 0 | 6 |
0 - 30 | 3 |
This transformation is where I am stumped. I fear that, to make my life simpler, I just need to create one big table instead...
Thanks in advance

You were very close! Just start with your existing query and then wrap it.
For your bins use another lookup table if you don't want to be bound to a fixed interval:
bins:
mintemp | maxtemp
-10 | 0
0 | 30
I will use num instead of count, as I avoid using SQL keywords as column names:
SELECT
CONCAT(mintemp, ' - ', maxtemp) AS Trange,
SUM(baseview.num) AS num
FROM bins
INNER JOIN (
SELECT
t1.avg_temperature as T_aver,
t2.new_confirmed as num
FROM 3_day_avg as t1
LEFT JOIN table2 as t2 on t1.date = t2.date
) AS baseview
ON baseview.T_aver>bins.mintemp AND baseview.T_aver<=bins.maxtemp
GROUP BY bins.mintemp, bins.maxtemp;
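To check the bin join against the sample rows from the question, here is a minimal sketch that loads them into an in-memory SQLite database from Python. SQLite is only a stand-in for MySQL here, the dates are invented (the question shows only the joined output), and the function name binned_counts is hypothetical. The label is built in Python rather than with CONCAT to keep the SQL portable.

```python
import sqlite3


def binned_counts():
    """Sum new_confirmed per temperature bin for the question's sample rows."""
    con = sqlite3.connect(":memory:")
    # The dates below are made up. Only the temperatures and counts
    # are taken from the question's sample output.
    con.executescript("""
        CREATE TABLE "3_day_avg" (date TEXT, avg_temperature REAL);
        CREATE TABLE table2 (date TEXT, new_confirmed INTEGER);
        CREATE TABLE bins (mintemp INTEGER, maxtemp INTEGER);
        INSERT INTO "3_day_avg" VALUES
            ('2020-03-01', -0.2), ('2020-03-02', 3), ('2020-03-03', 5),
            ('2020-03-04', -2.3), ('2020-03-05', 22);
        INSERT INTO table2 VALUES
            ('2020-03-01', 2), ('2020-03-02', 2), ('2020-03-03', 1),
            ('2020-03-04', 4), ('2020-03-05', 0);
        INSERT INTO bins VALUES (-10, 0), (0, 30);
    """)
    rows = con.execute("""
        SELECT b.mintemp, b.maxtemp, SUM(v.num) AS num
        FROM bins AS b
        JOIN (
            SELECT t1.avg_temperature AS T_aver, t2.new_confirmed AS num
            FROM "3_day_avg" AS t1
            LEFT JOIN table2 AS t2 ON t1.date = t2.date
        ) AS v
          ON v.T_aver > b.mintemp AND v.T_aver <= b.maxtemp
        GROUP BY b.mintemp, b.maxtemp
        ORDER BY b.mintemp
    """).fetchall()
    con.close()
    # Build the "min - max" label on the Python side.
    return [(f"{lo} - {hi}", num) for lo, hi, num in rows]


print(binned_counts())
```

Note that with these comparison signs a temperature of exactly 0 lands in the first bin, and a temperature of exactly -10 lands in no bin at all.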

SELECT ranges.caption Trange,
SUM(t2.new_confirmed) as `count`
FROM 3_day_avg as t1
LEFT JOIN table2 as t2 on t1.date = t2.date
JOIN ( SELECT -10 t_from, 0 t_to, '-10 - 0' caption
UNION ALL
SELECT 0, 30, '0 - 30' ) ranges ON t1.avg_temperature >= ranges.t_from
AND t1.avg_temperature < ranges.t_to
GROUP BY ranges.caption;
It is better to create a static ranges table than to generate the ranges dynamically. A static table lets you create and store many predefined sets of ranges.

You can group by a CASE expression which contains your bins:
SELECT
CASE
WHEN t1.avg_temperature >= -10 and t1.avg_temperature <= 0 THEN '-10 - 0'
WHEN t1.avg_temperature > 0 and t1.avg_temperature <=30 THEN '0 - 30'
END AS Trange,
SUM(t2.new_confirmed) AS count
FROM 3_day_avg AS t1 LEFT JOIN table2 AS t2
ON t1.date = t2.date
GROUP BY Trange
You may add more bins in the CASE expression, change the ranges and the inequality signs to suit your requirement.

Related

Get distinct values in union all in hive

I have a table in hive that looks something like this
cust_id prod_id timestamp
1 11 2011-01-01 03:30:23
2 22 2011-01-01 03:34:53
1 22 2011-01-01 04:21:03
2 33 2011-01-01 04:44:09
3 33 2011-01-01 04:54:49
so on and so forth.
For each record I want to check how many unique products this customer has bought within the last 24 hours, excluding the current transaction. The output should look something like this:
1 0
2 0
1 1
2 1
3 0
My hive query looks something like this
select * from(
select t1.cust_id, count(distinct t1.prod_id) as freq from temp_table t1
left outer join temp_table t2 on (t1.cust_id=t2.cust_id)
where t1.timestamp>=t2.timestamp
and unix_timestamp(t1.timestamp)-unix_timestamp(t2.timestamp) < 24*60*60
group by t1.cust_id
union all
select t2.cust_id, 0 as freq from temp_table t2
)unioned;

Fetch all the rows from the last 24 hours, group by cust_id, and return count(distinct prod_id) - 1. The overall query would look something like this:
select cust_id, count(distinct prod_id) - 1 as freq
from table_name
where unix_timestamp() - unix_timestamp(timestamp) < 24*60*60
group by cust_id
*I am subtracting 1 here to exclude the user's latest transaction (I hope this is what you meant).

You can join to a derived table that contains the distinct number of products purchased in the past 24 hours for each customer/timestamp pair.
select t1.cust_id, t1.prod_id, t1.timestamp, t2.count_distinct_prod_id - 1
from mytable t1
join (
select t2.cust_id, t2.timestamp, count(distinct t3.prod_id) count_distinct_prod_id
from mytable t2
join mytable t3 on t3.cust_id = t2.cust_id
where t3.timestamp <= t2.timestamp
and unix_timestamp(t2.timestamp) - unix_timestamp(t3.timestamp) < 24*60*60
group by t2.cust_id, t2.timestamp
) t2 on t1.cust_id = t2.cust_id and t1.timestamp = t2.timestamp
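As a rough cross-check of the per-record requirement, the sketch below runs a self left join on the question's sample rows in an in-memory SQLite database. This is an assumption-laden stand-in for Hive: strftime('%s', ...) replaces unix_timestamp, the column is renamed ts, and the function name products_in_prior_24h is hypothetical. The strict "earlier than" condition is what excludes the current transaction.

```python
import sqlite3


def products_in_prior_24h():
    """Per transaction: distinct products the same customer bought in the prior 24 hours."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE temp_table (cust_id INTEGER, prod_id INTEGER, ts TEXT)")
    con.executemany(
        "INSERT INTO temp_table VALUES (?, ?, ?)",
        [
            (1, 11, "2011-01-01 03:30:23"),
            (2, 22, "2011-01-01 03:34:53"),
            (1, 22, "2011-01-01 04:21:03"),
            (2, 33, "2011-01-01 04:44:09"),
            (3, 33, "2011-01-01 04:54:49"),
        ],
    )
    rows = con.execute("""
        SELECT t1.cust_id, COUNT(DISTINCT t2.prod_id) AS freq
        FROM temp_table AS t1
        LEFT JOIN temp_table AS t2
          ON t2.cust_id = t1.cust_id
         AND t2.ts < t1.ts  -- strictly earlier, so the current row is excluded
         AND strftime('%s', t1.ts) - strftime('%s', t2.ts) < 24 * 60 * 60
        GROUP BY t1.cust_id, t1.ts
        ORDER BY t1.ts
    """).fetchall()
    con.close()
    return rows


print(products_in_prior_24h())
```

Because the join is a LEFT JOIN, customers with no earlier purchase still appear with a count of 0, so no UNION ALL with a zero row is needed. Two transactions by the same customer at the identical timestamp would collapse into one output row in this sketch.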

Column calculated by column with grouping

I have a simple table -
id | date | type | value
-------------------------
1 1/1/14 A 1
2 1/1/14 A 10
3 2/1/14 A 10
4 2/1/14 A 15
5 2/1/14 B 15
6 2/1/14 B 20
I would like to create a new column which calculates the minimum value per day per type. So giving the following results -
id | date | type | value | min_day
-----------------------------------
1 1/1/14 A 1 1
2 1/1/14 A 10 1
3 2/1/14 A 10 10
4 2/1/14 A 15 10
5 2/1/14 B 15 15
6 2/1/14 B 20 15
Is this possible? If so how would I go about it? I've been looking into triggers.
Thanks for any help.

First create a field named min_day in your table. Then you can use JOIN in an UPDATE query.
Try this:
UPDATE TableName T1 JOIN
(SELECT date,type,MIN(value) as MinValue
FROM TableName
GROUP BY type,date) T2 ON T1.type=T2.type AND T1.date=T2.date
SET T1.min_day = T2.MinValue
An example in SQL Fiddle.
EDIT:
To group by month instead of by exact date:
UPDATE TableName T1 JOIN
(SELECT MONTH(date) as mon,type,MIN(value) as MinValue
FROM TableName
GROUP BY type,MONTH(date)) T2 ON T1.type=T2.type AND MONTH(T1.date)=T2.mon
SET T1.min_day = T2.MinValue
Result in SQL Fiddle.

Assuming that your table's name is mytable, try this:
SELECT mt.id,
mt.date,
mt.type,
mt.value,
md.min_value AS min_day
FROM mytable mt
LEFT JOIN
(SELECT date, type, MIN(value) min_value FROM mytable GROUP BY date, type
) md
ON mt.date=md.date AND mt.type=md.type;

SELECT t1.*,
t2.min_day
FROM Table1 AS t1
JOIN
(SELECT date,TYPE,
min(value) AS min_day
FROM Table1
GROUP BY date,TYPE) AS t2 ON t1.TYPE = t2.TYPE
AND t1.date = t2.date

Count value variation from 0 to 1 in mysql table

I have a table with two columns: one is TIMESTAMP and the other is DIGITAL_BIT.
The digital bit can be either 0 or 1 and changes a few times during the day. A row is stored for every minute of the day. I need to find out how many times a day this value changed from 0 to 1.
Is it possible to write a query that returns the count of these changes? What I have in mind is something like this:
select * from mytable where digital_bit = 1 and digital_bit (of previous row) = 0 order by timestamp
Can this be done with a query, or do I have to process all the data in my program?
Thanks
SAMPLE
timestamp | digital_bit
100000 | 0
100001 | 0
100002 | 1
100003 | 1
100004 | 0
100005 | 1
100006 | 0
100007 | 0
100008 | 1
The above should return 3, because the digital bit went from 0 to 1 three times. I need to count how often the value CHANGES from 0 to 1.

Here you go. This will get you a count of how many times digital_bit switched from 0 to 1 (in your example, this will return 3).
SELECT COUNT(*)
FROM mytable curr
WHERE curr.digital_bit = 1
AND (
SELECT digital_bit
FROM mytable prev
WHERE prev.timestamp < curr.timestamp
ORDER BY timestamp DESC
LIMIT 1
) = 0
SQLFiddle link
(Original answer relied on the timestamps being sequential: e.g. no jumps from 100001 to 100003. Answer has now been updated not to have that restriction.)

If you have one result per minute, you can simply join the table with itself,
using timestamp+1 and leftbit != rightbit as the join conditions.
http://sqlfiddle.com/#!8/791c0/6
ALL Changes:
SELECT
COUNT(*)
FROM
test a
INNER JOIN
test b
ON
a.digital_bit != b.digital_bit
AND b.timestamp = a.timestamp+1;
Changes from 0 to 1
SELECT
COUNT(*)
FROM
test a
INNER JOIN
test b
ON
a.digital_bit = 0 AND
a.digital_bit != b.digital_bit
AND b.timestamp = a.timestamp+1;
Changes from 1 to 0
SELECT
COUNT(*)
FROM
test a
INNER JOIN
test b
ON
a.digital_bit = 1 AND
a.digital_bit != b.digital_bit
AND b.timestamp = a.timestamp+1;

Adapted from: How do I query distinct values within multiple sub record sets
select count(*)
from (select t1.*,
(select digital_bit
from mytable t2
where t2.timestamp < t1.timestamp
order by timestamp desc LIMIT 1
) as prevvalue
from mytable t1
) t1
where prevvalue <> digital_bit and digital_bit = 1;
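The previous-value lookup can also be written with the LAG window function where it is available (MySQL 8.0+, or SQLite 3.25+ as in this sketch). The harness below is a hedged illustration on the question's sample data, not a benchmark; the function name count_rising_edges is hypothetical.

```python
import sqlite3


def count_rising_edges(rows):
    """Count how often digital_bit goes from 0 to 1 in timestamp order."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (timestamp INTEGER, digital_bit INTEGER)")
    con.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
    (count,) = con.execute("""
        SELECT COUNT(*)
        FROM (
            SELECT digital_bit,
                   LAG(digital_bit) OVER (ORDER BY timestamp) AS prev_bit
            FROM mytable
        ) AS t
        WHERE digital_bit = 1 AND prev_bit = 0
    """).fetchone()
    con.close()
    return count


sample = [
    (100000, 0), (100001, 0), (100002, 1),
    (100003, 1), (100004, 0), (100005, 1),
    (100006, 0), (100007, 0), (100008, 1),
]
print(count_rising_edges(sample))
```

The first row has no predecessor, so its prev_bit is NULL and it is never counted as a change. Unlike the timestamp+1 self-join, this does not require the timestamps to be consecutive.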

This isn't likely to be efficient with a lot of data, but you can get all the rows and calculate a sequence number for them, then do the same again but with the sequence number offset by 1. Then join the 2 lots together where those calculated sequence numbers match but the first one has a digital bit of 0 and the other a digital bit of 1:-
SELECT COUNT(*)
FROM
(
SELECT mytable.timestamp, mytable.digital_bit, @aCount1:=@aCount1+1 AS SeqCount
FROM mytable
CROSS JOIN (SELECT @aCount1:=1) sub1
ORDER BY timestamp
) a
INNER JOIN
(
SELECT mytable.timestamp, mytable.digital_bit, @aCount2:=@aCount2+1 AS SeqCount
FROM mytable
CROSS JOIN (SELECT @aCount2:=0) sub1
ORDER BY timestamp
) b
ON a.SeqCount = b.SeqCount
AND a.digital_bit = 0
AND b.digital_bit = 1
EDIT - alternative solution and I would be interested to see how this performs. It avoids the need for adding a sequence number and also avoids a correlated sub query:-
SELECT COUNT(*)
FROM
(
SELECT curr.timestamp, MAX(curr2.timestamp) AS MaxTimeStamp
FROM mytable curr
INNER JOIN mytable curr2
ON curr.timestamp > curr2.timestamp
AND curr.digital_bit = 1
GROUP BY curr.timestamp
) Sub1
INNER JOIN mytable curr
ON Sub1.MaxTimeStamp = curr.timestamp
AND curr.digital_bit = 0

As I understand it, you insert one row every minute, so performance should not be a problem.
You can add a flag column:
timestamp | digital_bit | changed
100000 | 0 | 0
100001 | 0 | 0
100002 | 1 | 1
100003 | 1 | 0
100004 | 0 | 1
100005 | 1 | 1
100006 | 0 | 1
100007 | 0 | 0
100008 | 1 | 1
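And check the latest value before each insert:
SELECT digital_bit
FROM mytable
ORDER BY timestamp DESC
LIMIT 1
If the new digital_bit differs from it, insert the new row with changed = 1; otherwise insert it with changed = 0.
Then you can simply count the flags. Since the flag marks changes in both directions, filter on digital_bit = 1 to count only the changes from 0 to 1 (start and end are placeholders for the day's boundaries):
SELECT COUNT(*)
FROM mytable
WHERE timestamp BETWEEN start AND end
AND changed = 1
AND digital_bit = 1
I hope to see a better solution in the other answers.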

Group dates based on variable periods

I have two tables as follows:
table-1
CalenderType periodNumber periodstartdate
1 1 01-01-2013
1 2 11-01-2013
1 3 15-01-2013
1 4 25-01-2013
2 1 01-01-2013
2 2 15-01-2013
2 3 20-01-2013
2 4 25-01-2013
table2
Incidents Date
xyz 02-01-2013
xxyyzz 03-01-2013
ccvvb 12-01-2013
vvfg 16-01-2013
x3 17-01-2013
x5 24-01-2013
Now I want to find the number of incidents that took place in each period (the calendar type may change at runtime).
The query should look something like this:
select .......
from ......
where CalendarType=1
which should return
CalendarType PeriodNumber Incidents
1 1 2
1 2 1
1 3 3
1 4 0
Can someone suggest an approach for how this can be achieved?
Note: each period is variable in size. Period 1 may have 10 days, period 2 may have 5 days, etc.

I think this does what you want, although I don't understand how you arrived at your sample output:
select t.CalenderType, t.periodNumber, count(*) as Incidents
from Table1 t
inner join (
select t2.Date, t2.Incidents, max(t1.periodstartdate) as PeriodStartDate
from Table2 t2
inner join Table1 t1 on t2.Date >= t1.periodstartdate
where CalenderType = 1
group by t2.Date, t2.Incidents
) a on t.periodstartdate = a.PeriodStartDate
where CalenderType=1
group by t.CalenderType, t.periodNumber
SQL Fiddle Example
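The expected output in the question includes period 4 with 0 incidents, which an inner join drops. One possible way to keep empty periods is to derive each period's end from the next start date with LEAD and then LEFT JOIN the incidents. The sketch below tries this on the sample data in SQLite 3.25+ (MySQL 8.0+ has the same window function). The dates are rewritten from dd-mm-yyyy to ISO format so that string comparison orders them correctly, and the function name incidents_per_period is hypothetical.

```python
import sqlite3


def incidents_per_period(calender_type):
    """Count incidents per period, keeping periods that have none."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Table1 (CalenderType INTEGER, periodNumber INTEGER, periodstartdate TEXT);
        CREATE TABLE Table2 (Incidents TEXT, Date TEXT);
        INSERT INTO Table1 VALUES
            (1, 1, '2013-01-01'), (1, 2, '2013-01-11'),
            (1, 3, '2013-01-15'), (1, 4, '2013-01-25'),
            (2, 1, '2013-01-01'), (2, 2, '2013-01-15'),
            (2, 3, '2013-01-20'), (2, 4, '2013-01-25');
        INSERT INTO Table2 VALUES
            ('xyz', '2013-01-02'), ('xxyyzz', '2013-01-03'), ('ccvvb', '2013-01-12'),
            ('vvfg', '2013-01-16'), ('x3', '2013-01-17'), ('x5', '2013-01-24');
    """)
    rows = con.execute("""
        WITH periods AS (
            SELECT CalenderType, periodNumber, periodstartdate,
                   LEAD(periodstartdate) OVER (
                       PARTITION BY CalenderType ORDER BY periodNumber
                   ) AS nextstartdate
            FROM Table1
        )
        SELECT p.CalenderType, p.periodNumber, COUNT(t2.Incidents) AS Incidents
        FROM periods AS p
        LEFT JOIN Table2 AS t2
          ON t2.Date >= p.periodstartdate
         AND (p.nextstartdate IS NULL OR t2.Date < p.nextstartdate)
        WHERE p.CalenderType = ?
        GROUP BY p.CalenderType, p.periodNumber
        ORDER BY p.periodNumber
    """, (calender_type,)).fetchall()
    con.close()
    return rows


print(incidents_per_period(1))
```

Each period is treated as a half-open interval (start inclusive, next start exclusive), so an incident on a boundary date is counted once, in the period that begins on that date. The last period has no next start date and therefore stays open-ended.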

Try this, a slightly more general solution (SQLFiddle; thanks to RedFilter for the schema):
SELECT t1.CalenderType, t1.periodNumber, count(Incidents)
FROM Table1 t1, Table1 t11, Table2
WHERE
(
(
t1.CalenderType = t11.CalenderType
AND t1.periodNumber = t11.periodNumber - 1
AND Date >= t1.periodstartdate AND Date < t11.periodstartdate
)
OR
(
t11.CalenderType = t1.CalenderType
AND t11.periodNumber = t1.periodNumber
AND t1.periodNumber = (SELECT MAX(periodNumber) FROM Table1 WHERE t1.CalenderType = CalenderType)
AND Date >= t1.periodstartdate
)
)
GROUP BY t1.CalenderType, t1.periodNumber
ORDER BY t1.CalenderType, t1.periodNumber

Select distinct values from two columns

I have a table with the following structure:
itemId | direction | uid | created
133 0 17 1268497139
432 1 140 1268497423
133 0 17 1268498130
133 1 17 1268501451
I need to select distinct values for two columns - itemId and direction, so the output would be like this:
itemId | direction | uid | created
432 1 140 1268497423
133 0 17 1268498130
133 1 17 1268501451
In the original table we have two rows with itemId 133 and direction 0, but we need only one of these rows: the one with the latest created time.
Thank you for any suggestions!

Use:
SELECT t.itemid,
t.direction,
t.uid,
t.created
FROM tbl t
JOIN (SELECT a.itemid,
a.direction,
MAX(a.created) AS max_created
FROM tbl a
GROUP BY a.itemid, a.direction) b ON b.itemid = t.itemid
AND b.direction = t.direction
AND b.max_created = t.created
You have to use an aggregate (i.e. MAX) to get the largest created value per itemid and direction pair, and join that onto an unaltered copy of the table to get the values associated with the maximum created value for each pair.

select t1.itemid, t1.direction, t1.uid, t1.created
from (select t2.itemid, t2.direction, max(t2.created) as maxdate
from tbl t2
group by itemid, direction) x
inner join tbl t1
on t1.itemid = x.itemid
and t1.direction = x.direction
and t1.created = x.maxdate
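Where window functions are available (MySQL 8.0+, or SQLite 3.25+ as in this sketch), the latest row per itemId and direction pair can also be picked with ROW_NUMBER. This is a minimal illustration on the question's sample rows; the table name tbl is borrowed from the answer above and the function name latest_per_item_and_direction is hypothetical.

```python
import sqlite3


def latest_per_item_and_direction():
    """Keep only the most recent row for each (itemId, direction) pair."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE tbl (itemId INTEGER, direction INTEGER, uid INTEGER, created INTEGER)"
    )
    con.executemany(
        "INSERT INTO tbl VALUES (?, ?, ?, ?)",
        [
            (133, 0, 17, 1268497139),
            (432, 1, 140, 1268497423),
            (133, 0, 17, 1268498130),
            (133, 1, 17, 1268501451),
        ],
    )
    rows = con.execute("""
        SELECT itemId, direction, uid, created
        FROM (
            SELECT t.*,
                   ROW_NUMBER() OVER (
                       PARTITION BY itemId, direction ORDER BY created DESC
                   ) AS rn
            FROM tbl AS t
        ) AS ranked
        WHERE rn = 1
        ORDER BY created
    """).fetchall()
    con.close()
    return rows


for row in latest_per_item_and_direction():
    print(row)
```

Unlike the MAX-and-join approach, ROW_NUMBER returns exactly one row per pair even if two rows share the same latest created value; which of the tied rows is kept is then arbitrary.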
