I need to calculate the average of occurrences in a dataset for a given value in a column. I made an easy example but in my current database contains around 2 inner joins to reduce it to 100k records. I need to perform the following select distinct statement for 10 columns.
My current design forces an inner join for each column. Another constraint is that I need to perform it at least 50-100 rows for each name in this example.
I need to figure out an efficient way to calculate this values without using too many resources while making the query fast.
http://sqlfiddle.com/#!9/c2378/3
My expected Result is:
Name | R Avg dir | L Avg dir 1 | L Avg dir 2 | L Avg dir 3
A 0 .5 .25 .25
Create table query:
CREATE TABLE demo
(`id` int, `name` varchar(10),`hand` varchar(1), `dir` int)
;
INSERT INTO demo
(`id`, `name`, `hand`, `dir`)
VALUES
(1, 'A', 'L', 1),
(2, 'A', 'L', 1),
(3, 'A', 'L', 2),
(4, 'A', 'L', 3),
(5, 'A', 'R', 3),
(6, 'A', 'R', 3)
;
Example Query:
SELECT distinct name,
COALESCE(( (Select count(id) as 'cd' from demo where hand = 'L' AND dir = 1) /(Select count(id) as 'fd' from demo where hand = 'L')),0) as 'L AVG dir'
FROM
demo
where hand = 'L' AND dir = 1 AND name = 'A'
One option is to use conditional aggregation:
SELECT name,
count(case when hand = 'L' and dir = 1 then 1 end) /
count(case when hand = 'L' then 1 end) L1Avg,
count(case when hand = 'L' and dir = 2 then 1 end) /
count(case when hand = 'L' then 1 end) L2Avg,
count(case when hand = 'L' and dir = 3 then 1 end) /
count(case when hand = 'L' then 1 end) L3Avg,
count(case when hand = 'R' and dir = 3 then 1 end) /
count(case when hand = 'R' then 1 end) RAvg
FROM demo
WHERE name = 'A'
GROUP BY name
Updated Fiddle Demo
Please note, I wasn't 100% sure why you wanted your RAvg to be 0 -- I assumed you meant 100%. If not, you can adjust the above accordingly.
Related
I have two tables that I am trying to LEFT join but I am not getting the expected results.
Rooms have multiple Children on different days, however Children are only counted in a Room after they have started and if they have hours allocated on that day. The output I am trying to achieve is this.
Room | MaxNum | Mon(Week1) | Tue(Week1) | Mon(Week2) | Tue(Week2)
Blue | 5 | 4 | 4 | 3 | 2
Green | 10 | 10 | 10 | 9 | 9
Red | 15 | 15 | 15 | 15 | 15
Here is the schema and some data...
create table Rooms(
id INT,
RoomName VARCHAR(10),
MaxNum INT
);
create table Children (
id INT,
RoomID INT,
MonHrs INT,
TueHrs INT,
StartDate DATE
);
INSERT INTO Rooms VALUES (1, 'Blue', 5);
INSERT INTO Rooms VALUES (2, 'Green', 10);
INSERT INTO Rooms VALUES (3, 'Red', 15);
INSERT INTO Children VALUES (1, 1, 5, 0, '2018-12-02');
INSERT INTO Children VALUES (2, 1, 0, 5, '2018-12-02');
INSERT INTO Children VALUES (3, 1, 5, 5, '2018-12-09');
INSERT INTO Children VALUES (4, 1, 0, 5, '2018-12-09');
INSERT INTO Children VALUES (5, 2, 5, 0, '2018-12-09');
INSERT INTO Children VALUES (6, 2, 0, 5, '2018-12-09');
The SQL I am having trouble with is this. It may not be the correct approach.
SELECT R.RoomName, R.MaxNum,
R.MaxNum - SUM(CASE WHEN C1.MonHrs > 0 THEN 1 ELSE 0 END) AS Mon1,
R.MaxNum - SUM(CASE WHEN C1.TueHrs > 0 THEN 1 ELSE 0 END) AS Tue1,
R.MaxNum - SUM(CASE WHEN C2.MonHrs > 0 THEN 1 ELSE 0 END) AS Mon2,
R.MaxNum - SUM(CASE WHEN C2.TueHrs > 0 THEN 1 ELSE 0 END) AS Tue2
FROM Rooms R
LEFT JOIN Children C1
ON R.id = C1.RoomID
AND C1.StartDate <= '2018-12-02'
LEFT JOIN Children C2
ON R.id = C2.RoomID
AND C2.StartDate <= '2018-12-09'
GROUP BY R.RoomName;
There is a double up happening on the Rows in the LEFT JOINs that is throwing the counts way off and I don't know how to prevent them. You can see the effect if you replace the SELECT with *
Any suggestions would help a lot.
This sort of problem usually surfaces from doing an aggregation in a too broad point in the query, which then results in duplicate counting of records. Try aggregating the Children table in a separate subquery:
SELECT
R.RoomName,
R.MaxNum,
R.MaxNum - C.Mon1 AS Mon1,
R.MaxNum - C.Tue1 AS Tue1,
R.MaxNum - C.Mon2 AS Mon2,
R.MaxNum - C.Tue2 AS Tue2
FROM Rooms R
LEFT JOIN
(
SELECT
RoomID,
COUNT(CASE WHEN MonHrs > 0 AND StartDate <= '2018-12-02'
THEN 1 END) AS Mon1,
COUNT(CASE WHEN TueHrs > 0 AND StartDate <= '2018-12-02'
THEN 1 END) AS Tue1,
COUNT(CASE WHEN MonHrs > 0 AND StartDate <= '2018-12-09'
THEN 1 END) AS Mon2,
COUNT(CASE WHEN TueHrs > 0 AND StartDate <= '2018-12-09'
THEN 1 END) AS Tue2
FROM Children
GROUP BY RoomID
) C
ON R.id = C.RoomID;
Note that we can avoid the double left join in your original query by instead using conditional aggregation on the start date.
Late edit: You probably don't even need a subquery at all, q.v. the answer by #Salman. But either of our answers should resolve the double counting problem.
You need to use one LEFT JOIN and move the date filter from JOIN condition to the aggregate:
SELECT R.id, R.RoomName, R.MaxNum
, R.MaxNum - COUNT(CASE WHEN C.StartDate <= '2018-12-02' AND C.MonHrs > 0 THEN 1 END) AS Mon1
, R.MaxNum - COUNT(CASE WHEN C.StartDate <= '2018-12-02' AND C.TueHrs > 0 THEN 1 END) AS Tue1
, R.MaxNum - COUNT(CASE WHEN C.StartDate <= '2018-12-09' AND C.MonHrs > 0 THEN 1 END) AS Mon2
, R.MaxNum - COUNT(CASE WHEN C.StartDate <= '2018-12-09' AND C.TueHrs > 0 THEN 1 END) AS Tue2
FROM Rooms R
LEFT JOIN Children C ON R.id = C.RoomID
GROUP BY R.id, R.RoomName, R.MaxNum
I am looking at a case in which we have a number of tanks filled with liquid. The amount of liquid is measured and information is stored in a database. This update is done every 5 minutes. Here the following information is stored:
tankId
FillLevel
TimeStamp
Each tank is categorized in one of the following 'fill-level' ranges:
Range A: 0 - 40%
Range B: 40 - 75%
Range C: 75 - 100%
Per range I count the amount of events per tankId.
SELECT sum(
CASE
WHEN filllevel>=0 and filllevel<40
THEN 1
ELSE 0
END) AS 'Range A',
sum(
CASE
WHEN filllevel>=40 and filllevel<=79
THEN 1
ELSE 0
END) AS 'Range B',
sum(
CASE
WHEN filllevel>79 and filllevel<=100
THEN 1
ELSE 0
END) AS 'Range C'
FROM TEST ;
The challenge is to ONLY count the latest record for each tank. So for each tankId there is only one count (and that must be the record with the latest time stamp).
For the following data:
insert into tank_db1.`TEST` (ts, tankId, fill_level) values
('2017-08-11 03:31:18', 'tank1', 10),
('2017-08-11 03:41:18', 'tank1', 45),
('2017-08-11 03:51:18', 'tank1', 95),
('2017-08-11 03:31:18', 'tank2', 20),
('2017-08-11 03:41:18', 'tank2', 30),
('2017-08-11 03:51:18', 'tank2', 80),
('2017-08-11 03:31:18', 'tank3', 30),
('2017-08-11 03:41:18', 'tank3', 45),
('2017-08-11 03:51:18', 'tank4', 55);
I would expect the outcome to be (only the records with the latest timestamp per tankId are counted):
- RANGE A: 0
- RANGE B: 1 (tankdId 3)
- RANGE C: 2 (tankId 1 and tankId2)
Probably easy if you are an expert, but for me it is real hard to see what the options are.
Thanks
You can use the following query to get the latest per group timestamp value:
select tankId, max(ts) as max_ts
from test
group by tankId;
Output:
tankId max_ts
--------------------------------
1 tank1 11.08.2017 03:51:18
2 tank2 11.08.2017 03:51:18
3 tank3 11.08.2017 03:41:18
4 tank4 11.08.2017 03:51:18
Using the above query as a derived table you can extract the latest per group fill_level value. This way you can apply the logic that computes each range level:
select sum(
CASE
WHEN t1.fill_level>=0 and t1.fill_level<40
THEN 1
ELSE 0
END) AS 'Range A',
sum(
CASE
WHEN t1.fill_level>=40 and t1.fill_level<=79
THEN 1
ELSE 0
END) AS 'Range B',
sum(
CASE
WHEN t1.fill_level>79 and t1.fill_level<=100
THEN 1
ELSE 0
END) AS 'Range C'
from test as t1
join (
select tankId, max(ts) as max_ts
from test
group by tankId
) as t2 on t1.tankId = t2.tankId and t1.ts = t2.max_ts
Output:
Range A Range B Range C
---------------------------
1 0 2 2
Demo here
I get a different result (oh, well, same result as GB):
SELECT GROUP_CONCAT(CASE WHEN fill_level < 40 THEN x.tankid END) range_a
, GROUP_CONCAT(CASE WHEN fill_level BETWEEN 40 AND 75 THEN x.tankid END) range_b
, GROUP_CONCAT(CASE WHEN fill_level > 75 THEN x.tankid END) range_c
FROM test x
JOIN (SELECT tankid,MAX(ts) ts FROM test GROUP BY tankid) y
ON y.tankid = x.tankid AND y.ts = x.ts;
+---------+-------------+-------------+
| range_a | range_b | range_c |
+---------+-------------+-------------+
| NULL | tank3,tank4 | tank1,tank2 |
+---------+-------------+-------------+
EDIT:
If I was solving this problem, and wanted to include the tank names in the result, then I'd probably execute the following...
SELECT x.*
FROM test x
JOIN
( SELECT tankid,MAX(ts) ts FROM test GROUP BY tankid) y
ON y.tankid = x.tankid
AND y.ts = x.ts
...and handle all the other problems, concerning counts, ranges, and missing/'0' values in application code.
DISCLAIMER : I Know this has been asked numerous times, but all I want is an alternative.
The table is as below :
create table
Account
(Name varchar(20),
TType varchar(5),
Amount int);
insert into Account Values
('abc' ,'c', 500),
('abc', 'c', 700),
('abc', 'd', 100),
('abc', 'd', 200),
('ab' ,'c', 300),
('ab', 'c', 700),
('ab', 'd', 200),
('ab', 'd', 200);
Expected result is simple:
Name Balance
------ -----------
ab 600
abc 900
The query that worked is :
select Name, sum(case TType when 'c' then Amount
when 'd' then Amount * -1 end) as balance
from Account a1
group by Name.
All I want is, is there any query sans the 'case' statement (like subquery or self join ) for the same result?
Sure. You can use a second query with a where clause and a union all:
select name
, sum(Amount) balance
from Account a1
where TType when 'c'
group
by Name
union
all
select name
, sum(Amount * -1) balance
from Account a1
where TType when 'd'
group
by Name
Or this, using a join with an inline view:
select name
, sum(Amount * o.mult) balance
from Account a1
join ( select 'c' cd
, 1 mult
from dual
union all
select 'd'
, -1
from dual
) o
on o.cd = a1.TType
group
by Name
To be honest, I would suggest to use case...
Use the ASCII code of the char and try to go from there. It is 100 for 'd' and 99 for 'c'. Untested example:
select Name, sum((ASCII(TType) - 100) * Amount * (-1)) + sum((ASCII(TType) - 99) * Amount * (-1)))) as balance from Account a1 group by Name.
I would not recommend using this method but it is a way of achieving what you want.
select t.Name, sum(t.cr) - sum(t.dr) as balance from (select Name, case TType when 'c' then sum(Amount) else 0 end as cr, case TType when 'd' then sum(Amount) else 0 end as dr from Account group by Name, TType) t group by t.Name;
This will surely help you!!
The following worked for me on Microsoft SQL server. It has the Brought Forward balance as well
WITH tempDebitCredit AS (
Select 0 As Details_ID, null As Creation_Date, null As Reference_ID, 'Brought
Forward' As Transaction_Kind, null As Amount_Debit, null As Amount_Credit,
isNull(Sum(Amount_Debit - Amount_Credit), 0) 'diff'
From _YourTable_Name
where Account_ID = #Account_ID
And Creation_Date < #Query_Start_Date
Union All
SELECT a.Details_ID, a.Creation_Date, a.Reference_ID, a.Transaction_Kind,
a.Amount_Debit, a.Amount_Credit, a.Amount_Debit - a.Amount_Credit 'diff'
FROM _YourTable_Name a
where Account_ID = #Account_ID
And Creation_Date >= #Query_Start_Date And Creation_Date <= #Query_End_Date
)
SELECT a.Details_ID, a.Creation_Date, a.Reference_ID, a.Transaction_Kind,
a.Amount_Debit, a.Amount_Credit, SUM(b.diff) 'Balance'
FROM tempDebitCredit a, tempDebitCredit b
WHERE b.Details_ID <= a.Details_ID
GROUP BY a.Details_ID, a.Creation_Date, a.Reference_ID, a.Transaction_Kind,
a.Amount_Debit, a.Amount_Credit
Order By a.Details_ID Desc
I want to be able to select the amount of times the data in columns Somedata_A and Somedata_B has changed from the from the previous row within its column. I've tried using DISTINCT and it works to some degree. {1,2,3,2,1,1} will show 3 when I want it to show 4 course there's 5 different values in sequence.
Example:
A,B,C,D,E,F
{1,2,3,2,1,1}
A compare to B gives a difference, B compare to C gives a difference . . . E compare to F gives not difference. All in all it gives 4 differences within a set of 6 values.
I have gotten DISTINCT to work but it does not really do the trick for me. And to add more to the question I'm really not interested it the whole range, lets say just the 2 last days/entries per Title.
Second I'm concern about performance issues. I tried the query below on a real set of data and it got interrupted probably due to timeout.
SQL Fiddle
MySQL 5.5.32 Schema Setup:
CREATE TABLE testdata(
Title varchar(10),
Date varchar(10),
Somedata_A int(5),
Somedata_B int(5));
INSERT INTO testdata (Title, Date, Somedata_A, Somedata_B) VALUES
("Alpha", '123', 1, 2),
("Alpha", '234', 2, 2),
("Alpha", '345', 1, 2),
("Alpha", '349', 1, 2),
("Alpha", '456', 1, 2),
("Omega", '123', 1, 1),
("Omega", '234', 2, 2),
("Omega", '345', 3, 3),
("Omega", '349', 4, 3),
("Omega", '456', 5, 4),
("Delta", '123', 1, 1),
("Delta", '234', 2, 2),
("Delta", '345', 1, 3),
("Delta", '349', 2, 3),
("Delta", '456', 1, 4);
Query 1:
SELECT t.Title, (SELECT COUNT(DISTINCT Somedata_A) FROM testdata AS tt WHERE t.Title = tt.Title) AS A,
(SELECT COUNT(DISTINCT Somedata_B) FROM testdata AS tt WHERE t.Title = tt.Title) AS B
FROM testdata AS t
GROUP BY t.Title
Results:
| TITLE | A | B |
|-------|---|---|
| Alpha | 2 | 1 |
| Delta | 2 | 4 |
| Omega | 5 | 4 |
Something like this may work: it uses a variable for row number, joins on an offset of 1 and then counts differences for A and B.
http://sqlfiddle.com/#!2/3bbc8/9/2
set #i = 0;
set #j = 0;
Select
A.Title aTitle,
sum(Case when A.SomeData_A <> B.SomeData_A then 1 else 0 end) AVar,
sum(Case when A.SomeData_B <> B.SomeData_B then 1 else 0 end) BVar
from
(SELECT Title, #i:=#i+1 as ROWID, SomeData_A, SomeData_B
FROM testdata
ORDER BY Title, date desc) as A
INNER JOIN
(SELECT Title, #j:=#j+1 as ROWID, SomeData_A, SomeData_B
FROM testdata
ORDER BY Title, date desc) as B
ON A.RowID= B.RowID + 1
AND A.Title=B.Title
Group by A.Title
This works (see here) (FYI: Your results in the question do not match your data - for instance, for Alpha, ColumnA: it never changes from 1. The answer should be 0)
Hopefully you can adapt this Statement to your actual data model
SELECT t1.title, SUM(t1.Somedata_A<>t2.Somedata_a) as SomeData_A
,SUM(t1.Somedata_b<>t2.Somedata_b) as SomeData_B
FROM testdata AS t1
JOIN testdata AS t2
ON t1.title = t2.title
AND t2.date = DATE_ADD(t1.date, INTERVAL 1 DAY)
GROUP BY t1.title
ORDER BY t1.title;
I have a table like this:
ID Type
----------
1 sent
1 sent
1 open
1 bounce
1 click
2 sent
2 sent
2 open
2 open
2 click
I want a query to return results like this:
ID sent open bounce click
1 2 1 1 1
2 2 2 0 1
Just can't work out how to do it. Thanks.
try PIVOT
SELECT ID,[sent],[open],[bounce],[click]
FROM your_table
PIVOT (COUNT([Type])
FOR [Type] in ([sent],[open],[bounce],[click]))p
SQL Fiddle Demo
Select Id,
count(case When type='sent' then 1 else 0 end) as sent,
count(case when type='open' then 1 else 0 end) as open
From table
Group by Id
If that won't give you the exact answer then try count (distinct case....) :)
You can get such result by using PIVOT or GROUP BY, you can even get results if you have variable values in Type column:
Test data:
CREATE TABLE #t(ID INT, Type VARCHAR(100))
INSERT #t
VALUES
(1, 'sent'),
(1, 'sent'),
(1, 'open'),
(1, 'bounce'),
(1, 'click'),
(2, 'sent'),
(2, 'sent'),
(2, 'open'),
(2, 'open'),
(2, 'click')
PIVOT approach:
SELECT pvt.*
FROM #t
PIVOT
(
COUNT(Type) FOR Type IN ([sent], [open], [bounce], [click])
) pvt
If there are other possible values for Type and you don't know them in advance use dynamic PIVOT:
DECLARE #cols NVARCHAR(1000) = STUFF(
(
SELECT DISTINCT ',[' + Type + ']'
FROM #t
FOR XML PATH('')
), 1, 1, '')
DECLARE #query NVARCHAR(2000) =
'
SELECT pvt.*
FROM #t
PIVOT
(
COUNT(Type) FOR Type IN ('+#cols+')
) pvt
'
EXEC(#query)
If you have known fixed values for Type, you can also use:
SELECT ID,
COUNT(CASE WHEN Type = 'sent' THEN 1 END) [sent],
COUNT(CASE WHEN Type = 'open' THEN 1 END) [open],
COUNT(CASE WHEN Type = 'bounce' THEN 1 END) bounce,
COUNT(CASE WHEN Type = 'click' THEN 1 END) click
FROM #t
GROUP BY ID