MySQL crosstab: wrong sum total

I have a problem with the sum total in a MySQL crosstab.
My code is as follows:
SELECT IFNULL(Prtype,'Total') AS Prtype, SUM(t.data) AS Total,
SUM(IF(office ='A',`data`, NULL)) AS 'A',
SUM(IF(office ='B',`data`, NULL)) AS 'B',
SUM(IF(office ='C',`data`, NULL)) AS 'C'
FROM (SELECT Prtype, office, `data` AS data
FROM TBLGETDATAALL_1 GROUP BY office, Prtype, data) t
GROUP BY Prtype
The problem is that the total is not equal to the sum of all the offices.
Sample data:
Type Total A B C
P1 3 2 1 1
P2 6 2 2 1
P3 6 3 1 1
Sample data 2:
Total: 50,455
Sum of the office columns: (1,333 + 1,352 + 1,216 + 2,127 + 1,520 + 2,700 + 1,174 + 1,250 + 2,458 + 1,374 + 2,877 + 970 + 2,458 + 2,930 + 1,365 + 2,655 + 1,184 + 3,001 + 2,421 + 2,689 + 2,220 + 1,590 + 2,678 + 2,212 + 1,329) = 49,083
Why is the total 50,455 while the sum of the office columns is 49,083?
---------
Table TBLGETDATAALL_1:
Prtype office data
p1 A 2
P2 B 3
P3 C 1
... ... .... ....
----------

Try this:
select Prtype, A, B, C, A + B + C as total from (
SELECT IFNULL(Prtype,'Total') as Prtype,
SUM(IF(office ='A',`data`, 0)) AS A,
SUM(IF(office ='B',`data`, 0)) AS B,
SUM(IF(office ='C',`data`, 0)) AS C
FROM TBLGETDATAALL_1
GROUP BY Prtype ) t

Use 0 instead of NULL:
SELECT IFNULL(Prtype,'Total') AS Prtype, SUM(t.data) AS Total,
SUM(IF(office ='A',`data`, 0)) AS 'A',
SUM(IF(office ='B',`data`, 0)) AS 'B',
SUM(IF(office ='C',`data`, 0)) AS 'C'
FROM (SELECT Prtype, office, `data` AS data
FROM TBLGETDATAALL_1 GROUP BY office, Prtype, data) t
GROUP BY Prtype
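A common reason for this kind of gap is that Total sums every row, while the A, B and C columns only sum rows whose office has a column of its own. Any row whose office is missing from the column list (or is NULL) is counted in Total but in none of the office columns. SUM() also skips NULLs, so returning 0 instead of NULL only changes groups with no matching rows (NULL becomes 0). The minimal sketch below illustrates this with Python's built-in sqlite3 module standing in for MySQL. It uses CASE WHEN in place of MySQL's IF(), and the rows, including the extra office 'D', are hypothetical.

```python
import sqlite3

# Hypothetical rows: office 'D' appears in the data but has no column of its own.
ROWS = [
    ("P1", "A", 2), ("P1", "B", 1), ("P1", "C", 1), ("P1", "D", 5),
    ("P2", "A", 3), ("P2", "B", 2),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TBLGETDATAALL_1 (Prtype TEXT, office TEXT, data INTEGER)")
conn.executemany("INSERT INTO TBLGETDATAALL_1 VALUES (?, ?, ?)", ROWS)

# Total adds up every row; A, B and C only add up rows for their own office.
report = conn.execute("""
    SELECT Prtype,
           SUM(data) AS Total,
           SUM(CASE WHEN office = 'A' THEN data ELSE 0 END) AS A,
           SUM(CASE WHEN office = 'B' THEN data ELSE 0 END) AS B,
           SUM(CASE WHEN office = 'C' THEN data ELSE 0 END) AS C
    FROM TBLGETDATAALL_1
    GROUP BY Prtype
    ORDER BY Prtype
""").fetchall()

for prtype, total, a, b, c in report:
    status = "ok" if total == a + b + c else f"mismatch: A+B+C = {a + b + c}"
    print(prtype, total, a, b, c, status)
```

Here P1 reports Total = 9 but A + B + C = 4, because the office 'D' row is counted only in Total. Listing every office as a column, or computing the total as A + B + C as in the first answer, makes the two agree.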

Related

Pivot data in snowflake like pandas pivot

Lately, I have been trying to pivot a table in Snowflake to replicate a transformation that is currently done in pandas, as follows:
I have a dataframe like the below:
I have been able to convert this into the following format:
Using the code below:
import pandas as pd
dd = pd.pivot(df[['customerid', 'filter_', 'sum', 'count', 'max']], index='customerid', columns='filter_')
dd = dd.set_axis(dd.columns.map('_'.join), axis=1).reset_index()
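The before/after frames were posted as images that are not included here. As a rough illustration of the target shape, this stdlib-only Python sketch (it does not use pandas) does the same long-to-wide reshaping. It produces one entry per customerid and one key per "<metric>_<filter>" pair, which mirrors what '_'.join produces from pivot's (metric, filter) column tuples. The sample values come from the VALUES list in the Snowflake attempt, and the helper name pivot_wide is hypothetical.

```python
from collections import defaultdict

# Sample rows from the VALUES list in the Snowflake attempt (perfiosid omitted).
ROWS = [
    ("a", "c", 10, 100, 1000), ("a", "c1", 9, 900, 9000), ("a", "c2", 80, 800, 8000),
    ("x", "c", 10, 100, 1000), ("x", "c1", 9, 900, 9000), ("x", "c2", 80, 800, 8000),
]
METRICS = ("sum", "count", "max")


def pivot_wide(rows):
    """Hypothetical helper: long rows -> {customerid: {"<metric>_<filter>": value}}."""
    wide = defaultdict(dict)
    for customerid, filter_, *values in rows:
        for metric, value in zip(METRICS, values):
            # Same naming as '_'.join over pandas' (metric, filter) column tuples.
            wide[customerid][f"{metric}_{filter_}"] = value
    return dict(wide)


wide = pivot_wide(ROWS)
print(sorted(wide["a"]))
```

With 3K filters and 18 metrics, the same reshaping yields 54K keys per customer. That is the width the question asks Snowflake to produce.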
I have been trying to do this in Snowflake but am unable to get the same format. Here's what I have tried:
with temp as (
SELECT $1 as customerid, $2 as perfiosid, $3 as filter_, $4 as sum_, $5 as count_, $6 as max_
FROM
VALUES ('a', 'b', 'c', 10, 100, 1000),
('a', 'b', 'c1', 9, 900, 9000),
('a', 'b', 'c2', 80, 800, 8000),
('x', 'b', 'c', 10, 100, 1000),
('x', 'b', 'c1', 9, 900, 9000),
('x', 'b', 'c2', 80, 800, 8000))
,
cte as (
select *, 'SUM_' as idx
from temp pivot ( max(sum_) for filter_ in ('c', 'c1', 'c2'))
union all
select *, 'COUNT_' as idx
from temp pivot ( max(count_) for filter_ in ('c', 'c1', 'c2'))
union all
select *, 'MAX_' as idx
from temp pivot ( max(max_) for filter_ in ('c', 'c1', 'c2'))
order by customerid, perfiosid
)
-- select * from cte;
select customerid, perfiosid, idx, max("'c'") as c, max("'c1'") as c1, max("'c2'") as c2
from cte
group by 1, 2, 3
order by 1, 2, 3
The output I get from this is:
Note: I have 3K fixed filters per customerid and 18 columns such as sum, count, max, min, stddev, etc., so the final output must have 54K columns for each customerid. How can I achieve this while staying within Snowflake's 1 MB statement limit?
Using conditional aggregation:
with temp as (
SELECT $1 as customerid, $2 as perfiosid, $3 as filter_, $4 as sum_, $5 as count_, $6 as max_
FROM
VALUES ('a', 'b', 'c', 10, 100, 1000),
('a', 'b', 'c1', 9, 900, 9000),
('a', 'b', 'c2', 80, 800, 8000),
('x', 'b', 'c', 10, 100, 1000),
('x', 'b', 'c1', 9, 900, 9000),
('x', 'b', 'c2', 80, 800, 8000)
)
SELECT customerid,
SUM(CASE WHEN FILTER_ = 'c' THEN SUM_ END) AS SUM_C,
SUM(CASE WHEN FILTER_ = 'c1' THEN SUM_ END) AS SUM_C1,
SUM(CASE WHEN FILTER_ = 'c2' THEN SUM_ END) AS SUM_C2,
SUM(CASE WHEN FILTER_ = 'c' THEN COUNT_ END) AS COUNT_C,
SUM(CASE WHEN FILTER_ = 'c1' THEN COUNT_ END) AS COUNT_C1,
SUM(CASE WHEN FILTER_ = 'c2' THEN COUNT_ END) AS COUNT_C2,
MAX(CASE WHEN FILTER_ = 'c' THEN MAX_ END) AS MAX_C,
MAX(CASE WHEN FILTER_ = 'c1' THEN MAX_ END) AS MAX_C1,
MAX(CASE WHEN FILTER_ = 'c2' THEN MAX_ END) AS MAX_C2
FROM temp
GROUP BY customerid;
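With thousands of filters, the CASE expressions would have to be generated rather than typed. The minimal sketch below assumes the filter and metric lists are known up front. The hypothetical helper build_pivot_sql emits the same kind of conditional-aggregation query as above. The query then runs against an in-memory SQLite database through Python's sqlite3 module, standing in for Snowflake. The table name tab is also an assumption.

```python
import sqlite3

# Assumption: filter values and metrics are fixed and known up front.
FILTERS = ["c", "c1", "c2"]
METRICS = [("SUM", "sum_"), ("SUM", "count_"), ("MAX", "max_")]  # (aggregate, source column)


def build_pivot_sql(filters, metrics, table="tab"):
    """Hypothetical helper: one conditional aggregate per (metric, filter) pair.

    Filter values are inlined as string literals, so they must not contain quotes.
    """
    cols = [
        f"{agg}(CASE WHEN filter_ = '{f}' THEN {col} END) AS {col.rstrip('_').upper()}_{f.upper()}"
        for agg, col in metrics
        for f in filters
    ]
    return (
        "SELECT customerid,\n  "
        + ",\n  ".join(cols)
        + f"\nFROM {table}\nGROUP BY customerid\nORDER BY customerid"
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tab (customerid TEXT, perfiosid TEXT, filter_ TEXT,"
    " sum_ INTEGER, count_ INTEGER, max_ INTEGER)"
)
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("a", "b", "c", 10, 100, 1000), ("a", "b", "c1", 9, 900, 9000),
        ("a", "b", "c2", 80, 800, 8000), ("x", "b", "c", 10, 100, 1000),
        ("x", "b", "c1", 9, 900, 9000), ("x", "b", "c2", 80, 800, 8000),
    ],
)
sql = build_pivot_sql(FILTERS, METRICS)
result = conn.execute(sql).fetchall()
print(result)
```

The column order follows the metric list first and then the filter list, giving SUM_C, SUM_C1, SUM_C2, COUNT_C and so on.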
Output:
To stay within the 1 MB query limit, the output could be split and materialized in temporary tables first, like this:
CREATE TEMPORARY TABLE t_SUM
AS
SELECT customer_id,
SUM(...)
FROM tab
GROUP BY customer_id;
CREATE TEMPORARY TABLE t_COUNT
AS
SELECT customer_id,
SUM(...)
FROM tab
GROUP BY customer_id;
CREATE TEMPORARY TABLE t_MAX
AS
SELECT customer_id,
MAX(...)
FROM tab
GROUP BY customer_id;
Combined query:
SELECT *
FROM t_SUM AS s
JOIN t_COUNT AS c
ON s.customer_id = c.customer_id
JOIN t_MAX AS m
ON m.customer_id = c.customer_id
-- ...
You cannot ask for 54K sets of three columns in a single query, because:
the 50,000th set looks like this (if precomputed into tables as Lukasz suggests):
s.s_50000 as sum_50000,
c.c_50000 as count_50000,
m.m_50000 as max_50000,
which is about 75 bytes, and 54K * 75 = 4,050,000 bytes. Even if you only need 54K columns in total (that is, 18K sets of 3 columns), 18K * 75 bytes is about 1.3 MB, which is still too large.
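You can check the byte estimate by generating the select list and measuring it. The rough sketch below assumes the sN/cN/mN column naming from the three lines above and one newline per line. The helpers set_bytes and select_list_bytes are hypothetical. Each set with a five-digit index comes to 74 bytes, in line with the ~75-byte estimate. 18K sets total roughly 1.27 million bytes, which is over 1 MB whether that means 1,000,000 or 1,048,576 bytes.

```python
def set_bytes(i: int) -> int:
    """Bytes for the i-th set of three column lines (naming is hypothetical)."""
    lines = (
        f"s.s_{i} as sum_{i},\n"
        f"c.c_{i} as count_{i},\n"
        f"m.m_{i} as max_{i},\n"
    )
    return len(lines.encode("utf-8"))


def select_list_bytes(n_sets: int) -> int:
    """Total bytes for sets 1..n_sets of the select list."""
    return sum(set_bytes(i) for i in range(1, n_sets + 1))


print(set_bytes(50000))          # one set with a five-digit index
print(select_list_bytes(18000))  # 18K sets = 54K columns
```

Shorter indexes make the early sets smaller. The total therefore comes in a little under the flat 18K * 75 estimate, but it is still well above the limit.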
This means that even after building your temp tables as Lukasz suggests, you would have to use:
select s.customer_id, s.*, c.*, m.*
from sums as s
join counts as c on s.customer_id = c.customer_id
join maxs as m on m.customer_id = c.customer_id
but building those temp tables requires 18K column expressions like
SUM(IFF(FILTER_='c18000',SUM_,null)) AS SUM_18000
which is about 50 bytes each, so 18K of those lines take about 900 KB, just under the limit, so that might work.
But you may then run into problems like this person, who started having trouble with 8K columns:
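To check this estimate, you can generate the IFF lines and measure them. The sketch below assumes one line per column, with filters named c1 to c18000 as in the example line above. It ignores the few bytes of SELECT/FROM/GROUP BY around them. The helpers iff_line and statement_bytes are hypothetical. 18K lines come to roughly 896,000 bytes, just under 1 MB, so one such CREATE TABLE statement could fit.

```python
def iff_line(i: int) -> str:
    """The i-th column expression, in the style of the SUM_18000 example (hypothetical naming)."""
    return f"SUM(IFF(FILTER_='c{i}',SUM_,null)) AS SUM_{i},\n"


def statement_bytes(n_columns: int) -> int:
    """Bytes of n_columns IFF lines, ignoring the surrounding SELECT/FROM/GROUP BY."""
    return sum(len(iff_line(i).encode("utf-8")) for i in range(1, n_columns + 1))


print(len(iff_line(18000)))   # includes the trailing comma and newline
print(statement_bytes(18000))
```

With the question's actual layout (3K filters per metric, one temp table per metric), each statement would be several times smaller still.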
https://community.snowflake.com/s/question/0D50Z00007CZcqmSAD/what-is-limit-on-number-of-columns-how-to-do-a-sparse-table
Which is all to say: what you are doing seems of very low value. What system is going to make sense of 50K+ columns of data, yet cannot handle processing many rows? It feels like "Tool A knows how to do Z and not Y, so Tool B must produce answers in Z format" instead of using the natural concepts of Y.

Insert new row for binary systems with SQL

I have the following table
(cl1 , cl2)
---- ----
(a , 1)
(a , 2)
(b , 2)
(c , 1)
(c , 2)
Each of a, b and c can take two values (1, 2, or both).
My question is:
How do I insert a new row (with 0 in cl2) for every cl1 that has only 1 or only 2, but NOT both? In the example, I would like to insert the following row:
----
(b , 0)
----
I'm sure there are better ways, but here is one way to do it using group by and a having clause to enforce your rules (I'm assuming Oracle syntax):
insert into tbl (cl1, cl2)
(select cl1, 0
from tbl
group by cl1
having count(case when cl2 in (1, 2) then 'X' end) != 0 -- contains 1 or 2
and (count(case when cl2 = 1 then 'X' end) = 0 -- but not both
or count(case when cl2 = 2 then 'X' end) = 0)
)
EDIT
A much simpler way:
insert into tbl (cl1, cl2)
(select cl1, 0
from tbl
where cl2 in (1, 2)
group by cl1
having count(distinct cl2) = 1
)
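A quick way to try the simpler statement is an in-memory SQLite database through Python's sqlite3 module. SQLite is standing in for Oracle here, which is an assumption, and the outer parentheses around the SELECT are dropped. With the question's five rows, only b gets a new (b, 0) row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (cl1 TEXT, cl2 INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [("a", 1), ("a", 2), ("b", 2), ("c", 1), ("c", 2)],
)

# Same logic as the simpler statement: keep cl1 groups with exactly one
# distinct value among 1 and 2, and insert a 0 row for each of them.
conn.execute("""
    INSERT INTO tbl (cl1, cl2)
    SELECT cl1, 0
    FROM tbl
    WHERE cl2 IN (1, 2)
    GROUP BY cl1
    HAVING COUNT(DISTINCT cl2) = 1
""")
rows = conn.execute("SELECT cl1, cl2 FROM tbl WHERE cl2 = 0").fetchall()
print(rows)
```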
I am assuming the DB is Oracle. I hope the snippet below helps.
SELECT B.CL1,
0
FROM
(SELECT A.CL1,
CASE
WHEN LISTAGG(A.CL2, ',') WITHIN GROUP (ORDER BY A.CL2) LIKE '%1%'
AND LISTAGG(A.CL2, ',') WITHIN GROUP (ORDER BY A.CL2) LIKE '%2%'
THEN 'both'
ELSE 'one'
END rnk
FROM
(SELECT 'a' cl1,1 cl2 FROM dual
UNION ALL
SELECT 'a' cl1,2 cl2 FROM dual
UNION ALL
SELECT 'b' cl1,2 cl2 FROM dual
UNION ALL
SELECT 'c' cl1,1 cl2 FROM dual
UNION ALL
SELECT 'c' cl1,2 cl2 FROM dual
)A
GROUP BY A.CL1
)B
WHERE B.rnk = 'one';
CREATE TABLE TestTable (cl1 VARCHAR(2), cl2 INT);
INSERT INTO TestTable (cl1, cl2) VALUES ('a', 1), ('a', 2), ('b', 1), ('c', 1), ('c', 2);
INSERT INTO TestTable (cl1, cl2)
SELECT DISTINCT cl1, 0
FROM TestTable
WHERE cl1 NOT IN (
SELECT cl1
FROM TestTable
WHERE cl2 IN (1, 2)
GROUP BY cl1
HAVING COUNT(DISTINCT cl2) = 2
);
MySQL Demo: http://rextester.com/XWHGF50183
The block below returns the cl1 values that have both 1 and 2 in cl2. Using NOT IN against that result gives the rows to insert.
SELECT cl1
FROM TestTable
WHERE cl2 IN (1, 2)
GROUP BY cl1
HAVING COUNT(DISTINCT cl2) = 2
Help from this answer
Here you go:
insert into [YOUR TABLE NAME]
select cl1,0 from [YOUR TABLE NAME]
group by cl1 having count(distinct cl2)<> 2
;

Condition on multiple values in the same column in SQL

Firstly, thanks in advance for helping. This will be my first question on SOF.
I have the following SQL database tables.
qualificationTable:
QualId studentNo CourseName Percentage
1 1 A 91
2 1 B 81
3 1 C 71
4 1 D 61
5 2 A 91
6 2 B 81
7 2 C 71
8 2 D 59
testTable:
TestId studentNo testNo Percentage dateTaken
1 1 1 91 2016-05-02
2 1 2 41 2015-05-02
3 1 3 71 2016-04-02
4 1 1 95 2014-05-02
5 1 2 83 2016-01-02
6 1 3 28 2015-05-02
7 2 1 90 2016-05-02
8 2 2 99 2016-05-02
9 2 3 87 2016-05-02
I have the minimum percentages specified individually for courses A, B, C and D. I need to find the students who meet the minimum criteria for ALL the courses.
Part 2:
Each such student should also meet the criteria in testTable (minimum percentages specified individually for tests 1, 2 and 3).
In other words, if a student meets the minimum percentage specified individually for every course, he should be selected. The same goes for testTable: that student (selected from qualificationTable) must also meet the minimum percentage specified individually for each of the three tests (1, 2 and 3) in the testNo column.
Edit:
I have updated testTable; there are now multiple attempts per test for a student. I need to check whether the student meets the minimum required percentage for all 3 tests, but only the most recently taken attempt of each test number (1, 2 and 3) should count. If the student does not meet the minimum for the most recent attempt, he should not be included.
Test Case:
Minimum qualification percentage required:
Course A: 90 Course B: 80 Course C: 70 Course D: 60
Minimum tests percentage required:
Test 1: 90 Test 2: 80 Test 3: 70
Expected Output
studentNo
1
I've just figured it out for your sample data and Test Case:
Minimum qualification percentage required:
Course A: 90 Course B: 80 Course C: 70 Course D: 60
Minimum tests percentage required:
Test 1: 90 Test 2: 80 Test 3: 70
Try this; it may help you :)
SQL Fiddle
MySQL Schema:
CREATE TABLE qualificationTable
(`QualId` int, `studentNo` int, `CourseName` varchar(1), `Percentage` int)
;
INSERT INTO qualificationTable
(`QualId`, `studentNo`, `CourseName`, `Percentage`)
VALUES
(1, 1, 'A', 91),
(2, 1, 'B', 81),
(3, 1, 'C', 71),
(4, 1, 'D', 61),
(5, 2, 'A', 91),
(6, 2, 'B', 81),
(7, 2, 'C', 71),
(8, 2, 'D', 50)
;
CREATE TABLE testTable
(`TestId` int, `studentNo` int, `testNo` int, `Percentage` int)
;
INSERT INTO testTable
(`TestId`, `studentNo`, `testNo`, `Percentage`)
VALUES
(1, 1, 1, 91),
(2, 1, 2, 81),
(3, 1, 3, 71),
(4, 2, 1, 80),
(5, 2, 2, 99),
(6, 2, 3, 87)
;
Query 1:
select t1.studentNo
from
(
select studentNo from qualificationTable
where (CourseName = 'A' and Percentage >= 90)
or (CourseName = 'B' and Percentage >= 80)
or (CourseName = 'C' and Percentage >= 70)
or (CourseName = 'D' and Percentage >= 60)
group by studentNo
having count(1) = 4
) t1 join
( select studentNo from testTable
where (testNo = '1' and Percentage >= 90)
or (testNo = '2' and Percentage >= 80)
or (testNo = '3' and Percentage >= 70)
group by studentNo
having count(1) = 3
) t2 on t1.studentNo = t2.studentNo
I'll pick t1, one of the two subqueries, to explain how it works.
GROUP BY gives us a result like this:
| studentNo |
|-----------|
| 1 |
| 2 |
COUNT gives the total count of each group. For your sample data, both studentNo 1 and studentNo 2 have 4 rows. But there is also a WHERE clause, and by those criteria only the following records match:
(1, 1, 'A', 91),
(2, 1, 'B', 81),
(3, 1, 'C', 71),
(4, 1, 'D', 61),
(5, 2, 'A', 91),
(6, 2, 'B', 81),
(7, 2, 'C', 71)
This means COUNT gives 4 for studentNo 1 and 3 for studentNo 2, so when MySQL applies having count(1) = 4, this subquery returns only studentNo 1.
Subquery t2 works the same way, and joining the two subqueries on studentNo returns the result you expect.
Results:
| studentNo |
|-----------|
| 1 |
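To see the counts described above, here is a minimal sketch that loads the schema above into an in-memory SQLite database via Python's sqlite3, standing in for MySQL. It runs both the t1 subquery on its own (without HAVING) and Query 1. testNo is compared with integers rather than the quoted strings, and the Python variable names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE qualificationTable (QualId INT, studentNo INT, CourseName TEXT, Percentage INT);
INSERT INTO qualificationTable VALUES
  (1, 1, 'A', 91), (2, 1, 'B', 81), (3, 1, 'C', 71), (4, 1, 'D', 61),
  (5, 2, 'A', 91), (6, 2, 'B', 81), (7, 2, 'C', 71), (8, 2, 'D', 50);
CREATE TABLE testTable (TestId INT, studentNo INT, testNo INT, Percentage INT);
INSERT INTO testTable VALUES
  (1, 1, 1, 91), (2, 1, 2, 81), (3, 1, 3, 71),
  (4, 2, 1, 80), (5, 2, 2, 99), (6, 2, 3, 87);
""")

COURSE_FILTER = """
  (CourseName = 'A' AND Percentage >= 90) OR (CourseName = 'B' AND Percentage >= 80)
  OR (CourseName = 'C' AND Percentage >= 70) OR (CourseName = 'D' AND Percentage >= 60)
"""
TEST_FILTER = """
  (testNo = 1 AND Percentage >= 90) OR (testNo = 2 AND Percentage >= 80)
  OR (testNo = 3 AND Percentage >= 70)
"""

# Matching rows per student before HAVING is applied.
t1_counts = conn.execute(
    f"SELECT studentNo, COUNT(*) FROM qualificationTable WHERE {COURSE_FILTER} "
    "GROUP BY studentNo ORDER BY studentNo"
).fetchall()

# Query 1: students passing all 4 courses joined with students passing all 3 tests.
query1 = conn.execute(f"""
SELECT t1.studentNo
FROM (SELECT studentNo FROM qualificationTable WHERE {COURSE_FILTER}
      GROUP BY studentNo HAVING COUNT(*) = 4) t1
JOIN (SELECT studentNo FROM testTable WHERE {TEST_FILTER}
      GROUP BY studentNo HAVING COUNT(*) = 3) t2
  ON t1.studentNo = t2.studentNo
""").fetchall()
print(t1_counts, query1)
```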
Edited:
select t1.studentNo
from
(
select studentNo from qualificationTable
where (CourseName = 'A' and Percentage >= 90)
or (CourseName = 'B' and Percentage >= 80)
or (CourseName = 'C' and Percentage >= 70)
or (CourseName = 'D' and Percentage >= 60)
group by studentNo
having count(1) = 4
) t1 join
( select studentNo
from (
select *
from testTable
where (studentNo, testNo, dateTaken) in (
select studentNo, testNo, Max(dateTaken) from testTable group by studentNo, testNo
)
) tmp
where (testNo = '1' and Percentage >= 90)
or (testNo = '2' and Percentage >= 80)
or (testNo = '3' and Percentage >= 70)
group by studentNo
having count(1) = 3
) t2 on t1.studentNo = t2.studentNo
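The subquery that picks the latest attempt deserves a check. Grouping MAX(dateTaken) by testNo alone returns the latest date per test across all students. Grouping by studentNo and testNo returns each student's own latest attempt. The sketch below runs both variants on the question's updated data in an in-memory SQLite database via Python's sqlite3. It assumes SQLite 3.15+ for the row-value IN syntax, and dates are stored as ISO text, which sorts chronologically.

With the per-test grouping, no student is returned. Student 1's latest test 2 and test 3 rows are filtered out because student 2 took those tests later. With the per-student grouping, student 1 is returned as expected.

```python
import sqlite3

SCHEMA = """
CREATE TABLE qualificationTable (QualId INT, studentNo INT, CourseName TEXT, Percentage INT);
INSERT INTO qualificationTable VALUES
  (1,1,'A',91),(2,1,'B',81),(3,1,'C',71),(4,1,'D',61),
  (5,2,'A',91),(6,2,'B',81),(7,2,'C',71),(8,2,'D',59);
CREATE TABLE testTable (TestId INT, studentNo INT, testNo INT, Percentage INT, dateTaken TEXT);
INSERT INTO testTable VALUES
  (1,1,1,91,'2016-05-02'),(2,1,2,41,'2015-05-02'),(3,1,3,71,'2016-04-02'),
  (4,1,1,95,'2014-05-02'),(5,1,2,83,'2016-01-02'),(6,1,3,28,'2015-05-02'),
  (7,2,1,90,'2016-05-02'),(8,2,2,99,'2016-05-02'),(9,2,3,87,'2016-05-02');
"""

QUERY = """
SELECT t1.studentNo
FROM (
  SELECT studentNo FROM qualificationTable
  WHERE (CourseName = 'A' AND Percentage >= 90)
     OR (CourseName = 'B' AND Percentage >= 80)
     OR (CourseName = 'C' AND Percentage >= 70)
     OR (CourseName = 'D' AND Percentage >= 60)
  GROUP BY studentNo
  HAVING COUNT(*) = 4
) t1
JOIN (
  SELECT studentNo
  FROM testTable
  WHERE {latest_filter}
    AND ((testNo = 1 AND Percentage >= 90)
      OR (testNo = 2 AND Percentage >= 80)
      OR (testNo = 3 AND Percentage >= 70))
  GROUP BY studentNo
  HAVING COUNT(*) = 3
) t2 ON t1.studentNo = t2.studentNo
ORDER BY t1.studentNo
"""

# Latest date per test number, across ALL students.
PER_TEST = ("(testNo, dateTaken) IN "
            "(SELECT testNo, MAX(dateTaken) FROM testTable GROUP BY testNo)")
# Latest date per student AND test number.
PER_STUDENT_TEST = ("(studentNo, testNo, dateTaken) IN "
                    "(SELECT studentNo, testNo, MAX(dateTaken) FROM testTable "
                    "GROUP BY studentNo, testNo)")

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
per_test = conn.execute(QUERY.format(latest_filter=PER_TEST)).fetchall()
per_student_test = conn.execute(QUERY.format(latest_filter=PER_STUDENT_TEST)).fetchall()
print(per_test, per_student_test)
```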
First, find the students who do not meet the minimum percentage:
select distinct studentNo
from stdqualificationmaster
where case when CourseName='A' and Percentage<90 then 'F'
when CourseName='B' and Percentage<80 then 'F'
when CourseName='C' and Percentage<70 then 'F'
when CourseName='D' and Percentage<60 then 'F'
end='F'
As a second step, use the set of unqualified students above as a filter for the required result set:
select * from stdqualificationmaster where studentNo not in
( select distinct studentNo
from stdqualificationmaster
where case when CourseName='A' and Percentage<90 then 'F'
when CourseName='B' and Percentage<80 then 'F'
when CourseName='C' and Percentage<70 then 'F'
when CourseName='D' and Percentage<60 then 'F'
end='F')

Select change column value if in list

I am trying to query my table to count the number of votes. If the voting method is in the list ['C', 'M', 'S', 'L', 'T', 'V', 'B', 'E'], those methods should be counted as one and the voting_method replaced with 'L'.
Right now I have the following query, which returns the right counts but doesn't merge the duplicates:
select `election_lbl`, `voting_method`, count(*) as numVotes
from `gen2014` group by `election_lbl`, `voting_method` order by `election_lbl` asc
election_lbl voting_method numVotes
2014-09-04 M 1
2014-09-05 M 2
2014-09-05 S 1
2014-09-08 C 16
2014-09-08 M 5
2014-09-08 S 9
2014-09-09 10 5
2014-09-09 C 46
2014-09-09 M 4
2014-09-09 S 5
2014-09-10 C 92
2014-09-10 M 3
2014-09-10 S 7
2014-09-11 C 96
2014-09-11 M 3
2014-09-11 S 2
2014-09-12 C 104
2014-09-12 M 10
2014-09-12 S 3
2014-09-15 C 243
2014-09-15 M 18
2014-09-15 S 3
2014-09-16 10 1
2014-09-16 C 161
2014-09-16 M 4
2014-09-16 S 3
2014-09-17 C 157
2014-09-17 M 5
2014-09-17 S 12
You can see that for 2014-09-05 I have two voting methods, M and S, both of which are in the list. Ideally, the result would merge rows for the same date when their methods are in the list, so it would be 2014-09-05 'L' 3. I don't want the votes for that date to disappear; the results should count them as one group.
I changed the query to this, but MySQL reports a syntax error:
select `election_lbl`, `voting_method`, count(*) as numVotes from `gen2014`
(case `voting_method` when in ('C', 'M', 'S', 'L', 'T', 'V', 'B', 'E')
then 'L' END) group by `election_lbl`, `voting_method` order by `election_lbl` asc
Table Schema
CREATE TABLE `gen2014` (
`voting_method` varchar(255) DEFAULT NULL,
`election_lbl` date DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
SELECT election_lbl
, CASE WHEN voting_method IN ('C','M','S','L','T','V','B','E')
THEN 'L'
ELSE voting_method END my_voting_method
, COUNT(*)
FROM my_table
GROUP
BY my_voting_method -- or vice
, election_lbl; -- versa
If you just want the total votes using those methods for each date, listed as method 'L', leave the method out of the GROUP BY and have the SELECT return 'L' as voting_method:
select `election_lbl`, 'L' AS `voting_method`, count(*) as numVotes
from `gen2014`
where voting_method IN ('C', 'M', 'S', 'L', 'T', 'V', 'B', 'E')
group by `election_lbl`
order by `election_lbl` asc
select x.`election_lbl`, x.`voting_method`, count(*) as numVotes
from (
select `election_lbl`,
CASE when `voting_method` in ('C', 'M', 'S', 'L', 'T', 'V', 'B', 'E')
then 'L'
else `voting_method`
END as `voting_method`
from `gen2014`) x
group by x.`election_lbl`, x.`voting_method`
order by x.`election_lbl` asc
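Here is a minimal sketch of the derived-table approach, run in an in-memory SQLite database via Python's sqlite3 (standing in for MySQL). The rows are hypothetical, generated to reproduce the question's counts for 2014-09-05 and 2014-09-09. The method list is passed as bound parameters rather than inlined.

```python
import sqlite3

LIST_METHODS = ("C", "M", "S", "L", "T", "V", "B", "E")

# Hypothetical rows matching the question's counts for two dates.
COUNTS = {
    ("2014-09-05", "M"): 2, ("2014-09-05", "S"): 1,
    ("2014-09-09", "10"): 5, ("2014-09-09", "C"): 46,
    ("2014-09-09", "M"): 4, ("2014-09-09", "S"): 5,
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gen2014 (voting_method TEXT, election_lbl TEXT)")
conn.executemany(
    "INSERT INTO gen2014 VALUES (?, ?)",
    [(method, day) for (day, method), n in COUNTS.items() for _ in range(n)],
)

# Map listed methods to 'L' first, then group on the mapped value.
placeholders = ", ".join("?" for _ in LIST_METHODS)
sql = f"""
SELECT x.election_lbl, x.voting_method, COUNT(*) AS numVotes
FROM (
  SELECT election_lbl,
         CASE WHEN voting_method IN ({placeholders}) THEN 'L'
              ELSE voting_method END AS voting_method
  FROM gen2014
) x
GROUP BY x.election_lbl, x.voting_method
ORDER BY x.election_lbl, x.voting_method
"""
result = conn.execute(sql, LIST_METHODS).fetchall()
print(result)
```

Methods outside the list, such as "10", keep their own row.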

MySQL Count frequency of records

Tables:
laterecords
-----------
studentid - varchar
latetime - datetime
reason - varchar
students
--------
studentid - varchar -- Primary
class - varchar
I would like a query that shows the following:
Sample Report
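| Class   | No of Students late | 1 times | 2 times | 3 times | 4 times | 5 & more |
|---------|---------------------|---------|---------|---------|---------|----------|
| Class A | 3                   | 1       | 0       | 2       | 0       | 0        |
| Class B | 1                   | 0       | 1       | 0       | 0       | 0        |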
My query below produces the first column:
SELECT count(DISTINCT laterecords.studentid), class FROM laterecords, students
WHERE students.studentid = laterecords.studentid
GROUP BY class
The only approach I can think of is getting the results for each column separately, storing them in PHP arrays, and then echoing them into an HTML table.
Is there a better SQL way to do this? How should I write the MySQL query?
Try this:
SELECT
a.class,
COUNT(b.studentid) AS 'No of Students late',
SUM(b.onetime) AS '1 times',
SUM(b.twotime) AS '2 times',
SUM(b.threetime) AS '3 times',
SUM(b.fourtime) AS '4 times',
SUM(b.fiveormore) AS '5 & more'
FROM
students a
LEFT JOIN
(
SELECT
aa.studentid,
IF(COUNT(*) = 1, 1, 0) AS onetime,
IF(COUNT(*) = 2, 1, 0) AS twotime,
IF(COUNT(*) = 3, 1, 0) AS threetime,
IF(COUNT(*) = 4, 1, 0) AS fourtime,
IF(COUNT(*) >= 5, 1, 0) AS fiveormore
FROM
students aa
INNER JOIN
laterecords bb ON aa.studentid = bb.studentid
GROUP BY
aa.studentid
) b ON a.studentid = b.studentid
GROUP BY
a.class
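To check the bucketing, here is a sketch running the same query against an in-memory SQLite database via Python's sqlite3, standing in for MySQL, with CASE WHEN in place of IF(). The students and late records are hypothetical, chosen so the result matches the sample report. Student s5 is never late, and because of the LEFT JOIN and COUNT(b.studentid), s5 is not counted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (studentid TEXT PRIMARY KEY, class TEXT);
CREATE TABLE laterecords (studentid TEXT, latetime TEXT, reason TEXT);
""")
# Hypothetical data: s5 is never late.
conn.executemany("INSERT INTO students VALUES (?, ?)", [
    ("s1", "Class A"), ("s2", "Class A"), ("s3", "Class A"),
    ("s4", "Class B"), ("s5", "Class B"),
])
LATE_COUNTS = {"s1": 1, "s2": 3, "s3": 3, "s4": 2}
conn.executemany("INSERT INTO laterecords VALUES (?, ?, ?)", [
    (sid, f"2014-09-{day:02d}08:15:00", "late bus")
    for sid, n in LATE_COUNTS.items() for day in range(1, n + 1)
])

# Inner query: one row per late student with a 1 in the matching bucket.
report = conn.execute("""
SELECT a.class,
       COUNT(b.studentid), SUM(b.onetime), SUM(b.twotime),
       SUM(b.threetime), SUM(b.fourtime), SUM(b.fiveormore)
FROM students a
LEFT JOIN (
  SELECT aa.studentid,
         CASE WHEN COUNT(*) = 1 THEN 1 ELSE 0 END AS onetime,
         CASE WHEN COUNT(*) = 2 THEN 1 ELSE 0 END AS twotime,
         CASE WHEN COUNT(*) = 3 THEN 1 ELSE 0 END AS threetime,
         CASE WHEN COUNT(*) = 4 THEN 1 ELSE 0 END AS fourtime,
         CASE WHEN COUNT(*) >= 5 THEN 1 ELSE 0 END AS fiveormore
  FROM students aa
  JOIN laterecords bb ON aa.studentid = bb.studentid
  GROUP BY aa.studentid
) b ON a.studentid = b.studentid
GROUP BY a.class
ORDER BY a.class
""").fetchall()
print(report)
```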
How about:
SELECT numlates, `class`, count(numlates)
FROM
(SELECT count(laterecords.studentid) AS numlates, `class`, laterecords.studentid
FROM laterecords,
students
WHERE students.studentid=laterecords.studentid
GROUP BY laterecords.studentid, `class`) aliastbl
GROUP BY `class`, numlates