I am looking at a case in which we have a number of tanks filled with liquid. The fill level of each tank is measured every 5 minutes and stored in a database. The following information is stored:
tankId
FillLevel
TimeStamp
Each tank is categorized in one of the following 'fill-level' ranges:
Range A: 0 - 40%
Range B: 40 - 75%
Range C: 75 - 100%
Per range I count the number of events per tankId.
SELECT sum(
    CASE
        WHEN fill_level >= 0 AND fill_level < 40
        THEN 1
        ELSE 0
    END) AS 'Range A',
sum(
    CASE
        WHEN fill_level >= 40 AND fill_level < 75
        THEN 1
        ELSE 0
    END) AS 'Range B',
sum(
    CASE
        WHEN fill_level >= 75 AND fill_level <= 100
        THEN 1
        ELSE 0
    END) AS 'Range C'
FROM TEST;
The challenge is to ONLY count the latest record for each tank. So for each tankId there is only one count (and that must be the record with the latest time stamp).
For the following data:
insert into tank_db1.`TEST` (ts, tankId, fill_level) values
('2017-08-11 03:31:18', 'tank1', 10),
('2017-08-11 03:41:18', 'tank1', 45),
('2017-08-11 03:51:18', 'tank1', 95),
('2017-08-11 03:31:18', 'tank2', 20),
('2017-08-11 03:41:18', 'tank2', 30),
('2017-08-11 03:51:18', 'tank2', 80),
('2017-08-11 03:31:18', 'tank3', 30),
('2017-08-11 03:41:18', 'tank3', 45),
('2017-08-11 03:51:18', 'tank4', 55);
I would expect the outcome to be (only the records with the latest timestamp per tankId are counted):
- RANGE A: 0
- RANGE B: 1 (tankId 3)
- RANGE C: 2 (tankId 1 and tankId 2)
Probably easy if you are an expert, but for me it is really hard to see what the options are.
Thanks
You can use the following query to get the latest per group timestamp value:
select tankId, max(ts) as max_ts
from test
group by tankId;
Output:
tankId max_ts
--------------------------------
1 tank1 11.08.2017 03:51:18
2 tank2 11.08.2017 03:51:18
3 tank3 11.08.2017 03:41:18
4 tank4 11.08.2017 03:51:18
Using the above query as a derived table you can extract the latest per group fill_level value. This way you can apply the logic that computes each range level:
select sum(
    CASE
        WHEN t1.fill_level >= 0 and t1.fill_level < 40
        THEN 1
        ELSE 0
    END) AS 'Range A',
sum(
    CASE
        WHEN t1.fill_level >= 40 and t1.fill_level < 75
        THEN 1
        ELSE 0
    END) AS 'Range B',
sum(
    CASE
        WHEN t1.fill_level >= 75 and t1.fill_level <= 100
        THEN 1
        ELSE 0
    END) AS 'Range C'
from test as t1
join (
    select tankId, max(ts) as max_ts
    from test
    group by tankId
) as t2 on t1.tankId = t2.tankId and t1.ts = t2.max_ts
Output:
Range A Range B Range C
---------------------------
1 0 2 2
Demo here
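For readers without a MySQL server at hand, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database from Python and runs the same derived-table join. SQLite standing in for MySQL is an assumption, the table is called `test` without the `tank_db1` schema, and the range boundaries follow the 40/75 split stated in the question (with 40 and 75 assigned to the upper range).

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (ts TEXT, tankId TEXT, fill_level INTEGER)")
conn.executemany(
    "INSERT INTO test (ts, tankId, fill_level) VALUES (?, ?, ?)",
    [
        ("2017-08-11 03:31:18", "tank1", 10),
        ("2017-08-11 03:41:18", "tank1", 45),
        ("2017-08-11 03:51:18", "tank1", 95),
        ("2017-08-11 03:31:18", "tank2", 20),
        ("2017-08-11 03:41:18", "tank2", 30),
        ("2017-08-11 03:51:18", "tank2", 80),
        ("2017-08-11 03:31:18", "tank3", 30),
        ("2017-08-11 03:41:18", "tank3", 45),
        ("2017-08-11 03:51:18", "tank4", 55),
    ],
)

# Join each tank to its latest timestamp, then bucket the surviving rows.
query = """
SELECT
    SUM(CASE WHEN t1.fill_level >= 0  AND t1.fill_level < 40   THEN 1 ELSE 0 END) AS range_a,
    SUM(CASE WHEN t1.fill_level >= 40 AND t1.fill_level < 75   THEN 1 ELSE 0 END) AS range_b,
    SUM(CASE WHEN t1.fill_level >= 75 AND t1.fill_level <= 100 THEN 1 ELSE 0 END) AS range_c
FROM test AS t1
JOIN (
    SELECT tankId, MAX(ts) AS max_ts
    FROM test
    GROUP BY tankId
) AS t2 ON t1.tankId = t2.tankId AND t1.ts = t2.max_ts
"""
range_counts = conn.execute(query).fetchone()
print(range_counts)  # (0, 2, 2)
```

Only the four latest rows (95, 80, 45, 55) are counted, so tank3 and tank4 land in range B and tank1 and tank2 in range C.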
I get a different result (oh, well, same result as GB):
SELECT GROUP_CONCAT(CASE WHEN fill_level < 40 THEN x.tankid END) range_a
, GROUP_CONCAT(CASE WHEN fill_level BETWEEN 40 AND 75 THEN x.tankid END) range_b
, GROUP_CONCAT(CASE WHEN fill_level > 75 THEN x.tankid END) range_c
FROM test x
JOIN (SELECT tankid,MAX(ts) ts FROM test GROUP BY tankid) y
ON y.tankid = x.tankid AND y.ts = x.ts;
+---------+-------------+-------------+
| range_a | range_b | range_c |
+---------+-------------+-------------+
| NULL | tank3,tank4 | tank1,tank2 |
+---------+-------------+-------------+
EDIT:
If I was solving this problem, and wanted to include the tank names in the result, then I'd probably execute the following...
SELECT x.*
FROM test x
JOIN
( SELECT tankid,MAX(ts) ts FROM test GROUP BY tankid) y
ON y.tankid = x.tankid
AND y.ts = x.ts
...and handle all the other problems, concerning counts, ranges, and missing/'0' values in application code.
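A rough sketch of what that application-side handling could look like in Python. The `latest_rows` list stands for the rows returned by the query above (hard-coded here from the question's sample data, reduced to tank name and fill level), and `count_ranges` is a hypothetical helper name. It assumes that the boundary values 40 and 75 belong to the upper range. Starting every counter at zero takes care of ranges that contain no tanks.

```python
def count_ranges(latest_rows):
    """Count tanks per fill-level range; ranges without tanks stay at 0."""
    counts = {"A": 0, "B": 0, "C": 0}
    for _tank_id, fill_level in latest_rows:
        if fill_level < 40:
            counts["A"] += 1
        elif fill_level < 75:
            counts["B"] += 1
        else:
            counts["C"] += 1
    return counts


# Latest row per tank, as the SQL query above would return for the sample data.
latest_rows = [("tank1", 95), ("tank2", 80), ("tank3", 45), ("tank4", 55)]
range_counts = count_ranges(latest_rows)
print(range_counts)  # {'A': 0, 'B': 2, 'C': 2}
```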
Related
How do you rewrite this code correctly in Snowflake?
select account_code, date,
sum(box_revenue_recognition_amount) as box_revenue_recognition_amount
, sum(case when box_flg = 1 then box_sku_quantity end) as box_sku_quantity
, sum(box_revenue_recognition_refund_amount) as box_revenue_recognition_refund_amount
, sum(box_discount_amount) as box_discount_amount
, sum(box_shipping_amount) as box_shipping_amount
, sum(box_cogs) as box_cogs
, max(invoice_number) as invoice_number
, max(order_number) as order_number
, min(box_refund_date) as box_refund_date
, first (case when order_season_rank = 1 then box_type end) as box_type
, first (case when order_season_rank = 1 then box_order_season end) as box_order_season
, first (case when order_season_rank = 1 then box_product_name end) as box_product_name
, first (case when order_season_rank = 1 then box_coupon_code end) as box_coupon_code
, first (case when order_season_rank = 1 then revenue_recognition_reason end) as revenue_recognition_reason
from dedupe_sub_user_day
group by account_code, date
I have tried to apply the window function as explained in the Snowflake first_value documentation, to no avail; it fails with the SQL compilation error: ... is not a valid group by expression
select account_code, date,
first_value(case when order_season_rank = 1 then box_type end) over (order by box_type ) as box_type,
first_value(case when order_season_rank = 1 then box_order_season end) over (order by box_order_season ) as box_order_season,
first_value(case when order_season_rank = 1 then box_product_name end) over (order by box_product_name ) as box_product_name,
first_value(case when order_season_rank = 1 then box_coupon_code end) over (order by box_coupon_code ) as box_coupon_code,
first_value(case when order_season_rank = 1 then revenue_recognition_reason end) over (order by revenue_recognition_reason ) as revenue_recognition_reason
, sum(box_revenue_recognition_amount) as box_revenue_recognition_amount
, sum(case when box_flg = 1 then box_sku_quantity end) as box_sku_quantity
, sum(box_revenue_recognition_refund_amount) as box_revenue_recognition_refund_amount
, sum(box_discount_amount) as box_discount_amount
, sum(box_shipping_amount) as box_shipping_amount
, sum(box_cogs) as box_cogs
, max(invoice_number) as invoice_number
, max(order_number) as order_number
, min(box_refund_date) as box_refund_date
from dedupe_sub_user_day
group by 1,2
FIRST_VALUE is not an aggregate function but a window function, so you get an error when you use it together with a GROUP BY. If you want to use it with a GROUP BY, wrap it in ANY_VALUE.
Here is some data, defined as a CTE, that I will use below:
with data(id, seq, val) as (
select * from values
(1, 1, 10),
(1, 2, 11),
(1, 3, 12),
(1, 4, 13),
(2, 1, 20),
(2, 2, 21),
(2, 3, 22)
)
So to show that FIRST_VALUE is a window function, we can just use it:
select *
,first_value(val)over(partition by id order by seq) as first_val
from data
ID |SEQ |VAL |FIRST_VAL
1 |1 |10 |10
1 |2 |11 |10
1 |3 |12 |10
1 |4 |13 |10
2 |1 |20 |20
2 |2 |21 |20
2 |3 |22 |20
So if we GROUP BY id, to avoid an error we have to wrap the FIRST_VALUE result in an aggregate function. Given that the values are all equal within a group, ANY_VALUE is a good pick, and it seems the window function needs to be in another layer of SQL:
select id
,count(*) as count
,any_value(first_val) as first_val
from (
select *
,first_value(val)over(partition by id order by seq) as first_val
from data
)
group by 1
order by 1;
ID |COUNT |FIRST_VAL
1 |4 |10
2 |3 |20
MAX can also be fun to use together with ROW_NUMBER() to pick the best value:
select id
,count(*) as count
,max(first_val) as first_val
from (
select *
,row_number() over (partition by id order by seq) as rn
,iff(rn=1, val, null) as first_val
from data
)
group by 1
order by 1;
This is arguably more complex than the ANY_VALUE solution, though I feel the performance would be better. If the two perform in the same order of magnitude, I would always choose whichever is more readable to you and your team over a small performance difference.
The way you've written your CASE expression leads me to believe that there is only one row with order_season_rank = 1 when grouping by account_code and date.
If that is true, then you can use several of Snowflake's aggregate functions and you will get what you want. Rather than trying to get the first value, you could use min, max, any_value, mode (or really any aggregate function that will ignore nulls) to return the only non-null value in the aggregation.
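To illustrate that idea, here is a small sketch run from Python against an in-memory SQLite database. SQLite is only a stand-in for Snowflake, and the table is a cut-down, made-up version of `dedupe_sub_user_day` with a handful of its columns and invented rows. `MAX` skips the NULLs produced by the `CASE` expression, so it returns the single non-null `box_type` in each group.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Snowflake (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dedupe_sub_user_day "
    "(account_code TEXT, date TEXT, order_season_rank INTEGER, "
    "box_type TEXT, box_cogs REAL)"
)
# Invented sample rows: exactly one row per group has order_season_rank = 1.
conn.executemany(
    "INSERT INTO dedupe_sub_user_day VALUES (?, ?, ?, ?, ?)",
    [
        ("acc1", "2021-01-01", 1, "premium", 10.0),
        ("acc1", "2021-01-01", 2, "basic", 5.0),
        ("acc2", "2021-01-01", 2, "basic", 4.0),
        ("acc2", "2021-01-01", 1, "gift", 6.0),
    ],
)

rows = conn.execute(
    """
    SELECT account_code, date,
           SUM(box_cogs) AS box_cogs,
           -- CASE yields NULL for every row except the rank-1 row,
           -- and MAX ignores NULLs, so the rank-1 value is returned.
           MAX(CASE WHEN order_season_rank = 1 THEN box_type END) AS box_type
    FROM dedupe_sub_user_day
    GROUP BY account_code, date
    ORDER BY account_code
    """
).fetchall()
print(rows)
# [('acc1', '2021-01-01', 15.0, 'premium'), ('acc2', '2021-01-01', 10.0, 'gift')]
```

If a group could contain several rank-1 rows, `MAX` would silently pick the alphabetically largest value, so this shortcut only holds under the one-row assumption stated above.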
first(): this link suggests FIRST is only supported by MS Access, however you've tagged the question with MySQL and Snowflake. Could you confirm which DBMS you are using?
By moving the first_value() function outside the aggregation, it seems to work fine.
I have two tables that I am trying to LEFT join but I am not getting the expected results.
Rooms have multiple Children on different days, however Children are only counted in a Room after they have started and if they have hours allocated on that day. The output I am trying to achieve is this.
Room | MaxNum | Mon(Week1) | Tue(Week1) | Mon(Week2) | Tue(Week2)
Blue | 5 | 4 | 4 | 3 | 2
Green | 10 | 10 | 10 | 9 | 9
Red | 15 | 15 | 15 | 15 | 15
Here is the schema and some data...
create table Rooms(
id INT,
RoomName VARCHAR(10),
MaxNum INT
);
create table Children (
id INT,
RoomID INT,
MonHrs INT,
TueHrs INT,
StartDate DATE
);
INSERT INTO Rooms VALUES (1, 'Blue', 5);
INSERT INTO Rooms VALUES (2, 'Green', 10);
INSERT INTO Rooms VALUES (3, 'Red', 15);
INSERT INTO Children VALUES (1, 1, 5, 0, '2018-12-02');
INSERT INTO Children VALUES (2, 1, 0, 5, '2018-12-02');
INSERT INTO Children VALUES (3, 1, 5, 5, '2018-12-09');
INSERT INTO Children VALUES (4, 1, 0, 5, '2018-12-09');
INSERT INTO Children VALUES (5, 2, 5, 0, '2018-12-09');
INSERT INTO Children VALUES (6, 2, 0, 5, '2018-12-09');
The SQL I am having trouble with is this. It may not be the correct approach.
SELECT R.RoomName, R.MaxNum,
R.MaxNum - SUM(CASE WHEN C1.MonHrs > 0 THEN 1 ELSE 0 END) AS Mon1,
R.MaxNum - SUM(CASE WHEN C1.TueHrs > 0 THEN 1 ELSE 0 END) AS Tue1,
R.MaxNum - SUM(CASE WHEN C2.MonHrs > 0 THEN 1 ELSE 0 END) AS Mon2,
R.MaxNum - SUM(CASE WHEN C2.TueHrs > 0 THEN 1 ELSE 0 END) AS Tue2
FROM Rooms R
LEFT JOIN Children C1
ON R.id = C1.RoomID
AND C1.StartDate <= '2018-12-02'
LEFT JOIN Children C2
ON R.id = C2.RoomID
AND C2.StartDate <= '2018-12-09'
GROUP BY R.RoomName;
There is a doubling up of rows in the LEFT JOINs that is throwing the counts way off, and I don't know how to prevent it. You can see the effect if you replace the SELECT list with *.
Any suggestions would help a lot.
This sort of problem usually arises from aggregating at too broad a point in the query, which results in records being counted more than once. Try aggregating the Children table in a separate subquery:
SELECT
R.RoomName,
R.MaxNum,
R.MaxNum - COALESCE(C.Mon1, 0) AS Mon1,
R.MaxNum - COALESCE(C.Tue1, 0) AS Tue1,
R.MaxNum - COALESCE(C.Mon2, 0) AS Mon2,
R.MaxNum - COALESCE(C.Tue2, 0) AS Tue2
FROM Rooms R
LEFT JOIN
(
SELECT
RoomID,
COUNT(CASE WHEN MonHrs > 0 AND StartDate <= '2018-12-02'
THEN 1 END) AS Mon1,
COUNT(CASE WHEN TueHrs > 0 AND StartDate <= '2018-12-02'
THEN 1 END) AS Tue1,
COUNT(CASE WHEN MonHrs > 0 AND StartDate <= '2018-12-09'
THEN 1 END) AS Mon2,
COUNT(CASE WHEN TueHrs > 0 AND StartDate <= '2018-12-09'
THEN 1 END) AS Tue2
FROM Children
GROUP BY RoomID
) C
ON R.id = C.RoomID;
Note that we can avoid the double left join in your original query by instead using conditional aggregation on the start date.
Late edit: You probably don't even need a subquery at all, q.v. the answer by @Salman. But either of our answers should resolve the double counting problem.
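To make the double counting visible, the sketch below loads the question's sample data into an in-memory SQLite database from Python and counts how many joined rows each room produces under the original two LEFT JOINs. SQLite standing in for MySQL is an assumption, and `StartDate` is declared as TEXT here so that the ISO date strings compare as plain text.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Rooms (id INT, RoomName VARCHAR(10), MaxNum INT);
    CREATE TABLE Children (id INT, RoomID INT, MonHrs INT, TueHrs INT, StartDate TEXT);
    INSERT INTO Rooms VALUES (1, 'Blue', 5), (2, 'Green', 10), (3, 'Red', 15);
    INSERT INTO Children VALUES
        (1, 1, 5, 0, '2018-12-02'),
        (2, 1, 0, 5, '2018-12-02'),
        (3, 1, 5, 5, '2018-12-09'),
        (4, 1, 0, 5, '2018-12-09'),
        (5, 2, 5, 0, '2018-12-09'),
        (6, 2, 0, 5, '2018-12-09');
    """
)

# Same joins as in the question, but counting the joined rows per room.
joined_rows = conn.execute(
    """
    SELECT R.RoomName, COUNT(*) AS joined_rows
    FROM Rooms R
    LEFT JOIN Children C1
           ON R.id = C1.RoomID AND C1.StartDate <= '2018-12-02'
    LEFT JOIN Children C2
           ON R.id = C2.RoomID AND C2.StartDate <= '2018-12-09'
    GROUP BY R.RoomName
    ORDER BY R.RoomName
    """
).fetchall()
print(joined_rows)  # [('Blue', 8), ('Green', 2), ('Red', 1)]
```

The Blue room has 2 children matching the first join and 4 matching the second, so every C2 row appears twice (2 x 4 = 8 joined rows) and any SUM over C2 columns is doubled.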
You need to use one LEFT JOIN and move the date filter from JOIN condition to the aggregate:
SELECT R.id, R.RoomName, R.MaxNum
, R.MaxNum - COUNT(CASE WHEN C.StartDate <= '2018-12-02' AND C.MonHrs > 0 THEN 1 END) AS Mon1
, R.MaxNum - COUNT(CASE WHEN C.StartDate <= '2018-12-02' AND C.TueHrs > 0 THEN 1 END) AS Tue1
, R.MaxNum - COUNT(CASE WHEN C.StartDate <= '2018-12-09' AND C.MonHrs > 0 THEN 1 END) AS Mon2
, R.MaxNum - COUNT(CASE WHEN C.StartDate <= '2018-12-09' AND C.TueHrs > 0 THEN 1 END) AS Tue2
FROM Rooms R
LEFT JOIN Children C ON R.id = C.RoomID
GROUP BY R.id, R.RoomName, R.MaxNum
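As a quick check, this sketch runs the single-join query against the question's sample data in an in-memory SQLite database from Python. SQLite in place of MySQL is an assumption, `StartDate` is stored as TEXT, and an `ORDER BY R.id` has been added only to make the output order predictable.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Rooms (id INT, RoomName VARCHAR(10), MaxNum INT);
    CREATE TABLE Children (id INT, RoomID INT, MonHrs INT, TueHrs INT, StartDate TEXT);
    INSERT INTO Rooms VALUES (1, 'Blue', 5), (2, 'Green', 10), (3, 'Red', 15);
    INSERT INTO Children VALUES
        (1, 1, 5, 0, '2018-12-02'),
        (2, 1, 0, 5, '2018-12-02'),
        (3, 1, 5, 5, '2018-12-09'),
        (4, 1, 0, 5, '2018-12-09'),
        (5, 2, 5, 0, '2018-12-09'),
        (6, 2, 0, 5, '2018-12-09');
    """
)

# One LEFT JOIN; the date filter lives inside each COUNT(CASE ...).
availability = conn.execute(
    """
    SELECT R.id, R.RoomName, R.MaxNum
         , R.MaxNum - COUNT(CASE WHEN C.StartDate <= '2018-12-02' AND C.MonHrs > 0 THEN 1 END) AS Mon1
         , R.MaxNum - COUNT(CASE WHEN C.StartDate <= '2018-12-02' AND C.TueHrs > 0 THEN 1 END) AS Tue1
         , R.MaxNum - COUNT(CASE WHEN C.StartDate <= '2018-12-09' AND C.MonHrs > 0 THEN 1 END) AS Mon2
         , R.MaxNum - COUNT(CASE WHEN C.StartDate <= '2018-12-09' AND C.TueHrs > 0 THEN 1 END) AS Tue2
    FROM Rooms R
    LEFT JOIN Children C ON R.id = C.RoomID
    GROUP BY R.id, R.RoomName, R.MaxNum
    ORDER BY R.id
    """
).fetchall()
for row in availability:
    print(row)
# (1, 'Blue', 5, 4, 4, 3, 2)
# (2, 'Green', 10, 10, 10, 9, 9)
# (3, 'Red', 15, 15, 15, 15, 15)
```

The Red room has no children, so its single joined row contains only NULLs for `C`; `COUNT` skips those and the room keeps its full capacity.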
I have this table in mysql:
| player1 | player2 | date | fs_1 | fs_2 |
| Jack | Tom | 2015-03-02 | 10 | 2 |
| Mark | Riddley | 2015-05-02 | 3 | 1 |
...
I need to know how many aces (fs_1) player 1 has served BEFORE the match reported in date_g (in the 10 days before it, for example).
This is what I tried without success:
OPTION 1
SELECT
players_atp.name_p AS 'PLAYER 1',
P.name_p AS 'PLAYER 2',
DATE(date_g) AS 'DATE',
result_g AS 'RESULT',
FS_1,
FS_2,
SUM(IF(date_sub(date_g, interval 10 day)< date_g, FS_1, 0)) AS 'last 10 days'
FROM
stat_atp stat_atp
JOIN
backup3.players_atp ON ID1 = id_P
JOIN
backup3.players_atp P ON P.id_p = id2
JOIN
backup3.games_atp ON id1_g = id1 AND id2_g = id2
AND id_t_g = id_t
AND id_r_g = id_r
WHERE
date_g > '2015-01-01'
GROUP BY ID1;
OPTION 2
SELECT
players_atp.name_p AS 'PLAYER 1',
P.name_p AS 'PLAYER 2',
DATE(date_g) AS 'DATE',
result_g AS 'RESULT',
FS_1,
FS_2,
SUM(CASE WHEN date_g between date_g and date_sub(date_g, interval 10 day) then fs_1 else 0 end) AS 'last 10 days'
FROM
stat_atp stat_atp
JOIN
backup3.players_atp ON ID1 = id_P
JOIN
backup3.players_atp P ON P.id_p = id2
JOIN
backup3.games_atp ON id1_g = id1 AND id2_g = id2
AND id_t_g = id_t
AND id_r_g = id_r
WHERE
date_g > '2015-01-01'
GROUP BY ID1;
I have edited the code; it is now easier to read and understand.
SELECT
id1 AS 'PLAYER 1',
id2 AS 'PLAYER 2',
DATE(date_g) AS 'DATE',
result_g AS 'RESULT',
FS_1,
FS_2,
SUM(CASE
WHEN date_g BETWEEN date_g AND DATE_SUB(date_g, INTERVAL 10 DAY) THEN fs_1
END) AS 'last 10 days'
FROM
stat_atp stat_atp
JOIN
backup3.games_atp ON id1_g = id1 AND id2_g = id2
AND id_t_g = id_t
AND id_r_g = id_r GROUP BY ID1;
Thanx in advance.
Maybe this could help you:
SELECT
id1,
SUM(fs_1)
FROM
stat_atp
WHERE
date_g <= DATE_SUB('2015-03-02', INTERVAL 1 DAY) AND date_g >= DATE_SUB('2015-03-02', INTERVAL 10 DAY)
AND
id1='Jack'
GROUP BY id1;
Remember that an RDBMS is meant to hold rigorous data sets that are linked to each other by clear ids (keys, in SQL terms). Things get easier if you respect the first three normal forms, which is why you should use a key to identify each match itself. That way you can use subqueries (subsets) to achieve your goal.
Also keep in mind that SQL is STRUCTURED. That is both its strength and its weakness, because you can't readily use it like a general-purpose programming language with loops and conditions. In return, you will always be able to find the same structure for a query. You can still process a SQL result set in another language and apply loops and conditions to the result set itself. That's up to you.
Anyway, you may want to read about MySQL's handling of the GROUP BY clause, which differs from the ISO SQL form: https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html
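Building on the advice about keys and subqueries, the following sketch computes, for every match, the aces the player served in the 10 days before it, instead of hard-coding a single date. It runs from Python against an in-memory SQLite database as a stand-in for MySQL. The `matches` table, its `match_id` key, and the sample rows are all made up for illustration, and SQLite's `date(x, '-10 day')` plays the role of MySQL's `DATE_SUB(x, INTERVAL 10 DAY)`.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
# Hypothetical, simplified schema: one row per match of player1, with a key.
conn.execute(
    "CREATE TABLE matches (match_id INTEGER PRIMARY KEY, player1 TEXT, "
    "match_date TEXT, fs_1 INTEGER)"
)
conn.executemany(
    "INSERT INTO matches VALUES (?, ?, ?, ?)",
    [
        (1, "Jack", "2015-02-10", 4),   # more than 10 days before match 3
        (2, "Jack", "2015-02-25", 7),   # within the 10 days before match 3
        (3, "Jack", "2015-03-02", 10),
        (4, "Mark", "2015-05-02", 3),
    ],
)

# Correlated subquery: sum fs_1 over the same player's earlier matches
# that fall inside the 10-day window ending just before the current match.
rows = conn.execute(
    """
    SELECT m.match_id, m.player1, m.match_date,
           COALESCE((
               SELECT SUM(p.fs_1)
               FROM matches p
               WHERE p.player1 = m.player1
                 AND p.match_date <  m.match_date
                 AND p.match_date >= date(m.match_date, '-10 day')
           ), 0) AS aces_last_10_days
    FROM matches m
    ORDER BY m.match_id
    """
).fetchall()
for row in rows:
    print(row)
# (1, 'Jack', '2015-02-10', 0)
# (2, 'Jack', '2015-02-25', 0)
# (3, 'Jack', '2015-03-02', 7)
# (4, 'Mark', '2015-05-02', 0)
```

The strict `<` on the upper bound keeps the current match's own aces out of the total, and `COALESCE` turns the NULL of an empty window into 0.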
I need to calculate the average number of occurrences in a dataset for a given value in a column. I made a simple example, but my actual query contains around 2 inner joins that reduce the data to 100k records. I need to perform the following select distinct statement for 10 columns.
My current design forces an inner join for each column. Another constraint is that I need to perform it over at least 50-100 rows for each name in this example.
I need to figure out an efficient way to calculate these values without using too many resources while keeping the query fast.
http://sqlfiddle.com/#!9/c2378/3
My expected Result is:
Name | R Avg dir | L Avg dir 1 | L Avg dir 2 | L Avg dir 3
A | 0 | .5 | .25 | .25
Create table query:
CREATE TABLE demo
(`id` int, `name` varchar(10),`hand` varchar(1), `dir` int)
;
INSERT INTO demo
(`id`, `name`, `hand`, `dir`)
VALUES
(1, 'A', 'L', 1),
(2, 'A', 'L', 1),
(3, 'A', 'L', 2),
(4, 'A', 'L', 3),
(5, 'A', 'R', 3),
(6, 'A', 'R', 3)
;
Example Query:
SELECT distinct name,
COALESCE(( (Select count(id) as 'cd' from demo where hand = 'L' AND dir = 1) /(Select count(id) as 'fd' from demo where hand = 'L')),0) as 'L AVG dir'
FROM
demo
where hand = 'L' AND dir = 1 AND name = 'A'
One option is to use conditional aggregation:
SELECT name,
count(case when hand = 'L' and dir = 1 then 1 end) /
count(case when hand = 'L' then 1 end) L1Avg,
count(case when hand = 'L' and dir = 2 then 1 end) /
count(case when hand = 'L' then 1 end) L2Avg,
count(case when hand = 'L' and dir = 3 then 1 end) /
count(case when hand = 'L' then 1 end) L3Avg,
count(case when hand = 'R' and dir = 3 then 1 end) /
count(case when hand = 'R' then 1 end) RAvg
FROM demo
WHERE name = 'A'
GROUP BY name
Updated Fiddle Demo
Please note, I wasn't 100% sure why you wanted your RAvg to be 0 -- I assumed you meant 100%. If not, you can adjust the above accordingly.
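For anyone who wants to check the numbers, here is a sketch that runs a variant of the conditional aggregation from Python against an in-memory SQLite database. It is not the exact query above: SQLite (used here as a stand-in for MySQL, which is an assumption) performs integer division on two counts, so this version takes `AVG` over a 0/1 flag instead. The `CASE` returns NULL for rows of the other hand, and `AVG` ignores those, which gives the same ratios.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (id INT, name VARCHAR(10), hand VARCHAR(1), dir INT)")
conn.executemany(
    "INSERT INTO demo (id, name, hand, dir) VALUES (?, ?, ?, ?)",
    [
        (1, "A", "L", 1),
        (2, "A", "L", 1),
        (3, "A", "L", 2),
        (4, "A", "L", 3),
        (5, "A", "R", 3),
        (6, "A", "R", 3),
    ],
)

# (dir = N) evaluates to 1 or 0; rows of the other hand become NULL
# and are therefore left out of the average.
averages = conn.execute(
    """
    SELECT name,
           AVG(CASE WHEN hand = 'L' THEN (dir = 1) END) AS L1Avg,
           AVG(CASE WHEN hand = 'L' THEN (dir = 2) END) AS L2Avg,
           AVG(CASE WHEN hand = 'L' THEN (dir = 3) END) AS L3Avg,
           AVG(CASE WHEN hand = 'R' THEN (dir = 3) END) AS RAvg
    FROM demo
    WHERE name = 'A'
    GROUP BY name
    """
).fetchone()
print(averages)  # ('A', 0.5, 0.25, 0.25, 1.0)
```

The table is scanned once regardless of how many ratio columns are added, which is the main saving over one subquery or join per column.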
I have a table
id, location, status
1, london, 1
2, london, 0
3, boston, 1
4, boston, 1
I'd like my query to generate something like this: -
location, status_1, status_0
london, 1, 1
boston, 2, 0
so far I have: -
select count(id) as freq, location from my_table
group by location order by freq desc;
I'm completely lost as to where to go from here.
That sort of transformation is better done in whatever client is issuing the query, but if you have to do it in the database, then
select location,
sum(status = 1) AS status_1,
sum(status = 0) AS status_0
from my_table
group by location
It's a bit hackish, but when status really is 1, 'status = 1' evaluates to a boolean true, which MySQL will politely cast to an integer 1, which then gets summed up. The same goes for when status is 0 and 'status = 0' evaluates to true.
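The boolean-to-integer behaviour can be tried out with the sample table from the question. This sketch uses Python with an in-memory SQLite database, which also evaluates comparisons to 1 or 0; treating SQLite as a stand-in for MySQL is an assumption, and the `ORDER BY` is added only to make the output order predictable.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INT, location TEXT, status INT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(1, "london", 1), (2, "london", 0), (3, "boston", 1), (4, "boston", 1)],
)

# Each comparison yields 1 (true) or 0 (false); SUM adds them up per location.
status_counts = conn.execute(
    """
    SELECT location,
           SUM(status = 1) AS status_1,
           SUM(status = 0) AS status_0
    FROM my_table
    GROUP BY location
    ORDER BY status_1 DESC
    """
).fetchall()
print(status_counts)  # [('boston', 2, 0), ('london', 1, 1)]
```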
So you want to count the records for each status per city?
In the query below, I group by location (like you did), and then add a sum for each status. Inside the sum is a case that returns 1 if the record matches the desired status, or 0 otherwise. That way, you effectively count the number of records for each status per location.
select
a.location,
sum(case when a.status = 1 then 1 else 0 end) as status_1,
sum(case when a.status = 0 then 1 else 0 end) as status_0
from
YourTable a
group by
a.location
select location,
sum(case when status = 1 then 1 else 0 end) as status_1,
sum(case when status = 0 then 1 else 0 end) as status_0
from my_table
group by location;