I am trying to write a query that takes the average of Value for each group and marks it in an Average column for that group. If a group in the input has blanks, the average should not be calculated, and the output should be left blank.
How should I do this so that even those blanks get handled?
I tried this:
select avg(value) over (partition by "Group") from table
AVG calculates the average of a set of numbers, so the column cannot contain blanks (white space); the "blanks" are NULLs.
AVG ignores all NULL values, which is not what you want here, because for the set 2, 4, NULL you want the result NULL and not (2 + 4) / 2 = 3.
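A quick way to see that behavior, using a throwaway derived table (no real table needed):

select avg(v) as avg_v
from (select 2 as v union all select 4 union all select null) t;
-- returns 3.0000: the NULL row is simply ignored by AVG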
But you can check whether the set contains a NULL. E.g.:
select
  grp,
  value,
  case when count(value) over (partition by grp) = count(*) over (partition by grp)
       then avg(value) over (partition by grp)
       else null
  end as average_value
from mytable
order by grp;
Demo: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=e5783cd10c5d26e1798c2e1b1e022189
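To try it locally, here is hypothetical sample data (table and column names assumed to match the query above):

create table mytable (grp int, value int);
insert into mytable values (1, 2), (1, 4), (2, 2), (2, null);
-- group 1 has no NULLs, so average_value = 3.0000 on both of its rows;
-- group 2 contains a NULL, so average_value is NULL on both of its rows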
I have a quick question in relation to windowing in MySQL
SELECT
  Client,
  User,
  Date,
  Flag,
  lag(Date) over (partition by Client, User order by Date asc) as last_date,
  lag(Flag) over (partition by Client, User order by Date asc) as last_flag,
  -- the alias last_flag cannot be reused in the same select list,
  -- so the LAG expression is repeated here
  case when Flag = 1
        and lag(Flag) over (partition by Client, User order by Date asc) = 1
       then 1 else 0 end as consecutive
FROM db.tbl
This query returns something like the below. I am trying to work out the number of consecutive times the Flag column was most recently 1 for each user: if the history is 11110000111, we should take the final three occurrences of 1 and determine a consecutive flag count of 3.
I need to extract the start and end date for the consecutive flag.
How would I go about doing this, can anyone help me :)
If we use the example of 11110000111, then we should extract only the trailing 111 and therefore the 3 most recent dates for that customer. So in the below, we would take 10.01.2023 as the first date and 24.01.2023 as the last date. The consecutive count should be 3.
Output:
Use aggregation and string functions:
WITH cte AS (
SELECT Client, User,
GROUP_CONCAT(CASE WHEN Flag THEN Date END ORDER BY Date) AS dates,
CHAR_LENGTH(SUBSTRING_INDEX(GROUP_CONCAT(Flag ORDER BY Date SEPARATOR ''), '0', -1)) AS consecutive
FROM tablename
GROUP BY Client, User
)
SELECT Client, User,
NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(dates, ',', -consecutive), ',', 1), '') AS first_date,
CASE WHEN consecutive > 0 THEN SUBSTRING_INDEX(dates, ',', -1) END AS last_date,
consecutive
FROM cte;
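The string trick may be easier to follow on a single value; a minimal sketch, assuming the flag history 11110000111:

-- GROUP_CONCAT(Flag ORDER BY Date SEPARATOR '') builds '11110000111';
-- SUBSTRING_INDEX(..., '0', -1) keeps everything after the last '0';
-- CHAR_LENGTH then measures the trailing run of 1s
select char_length(substring_index('11110000111', '0', -1)) as consecutive;
-- returns 3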
Another solution with window functions and conditional aggregation:
WITH
cte1 AS (SELECT *, SUM(NOT Flag) OVER (PARTITION BY Client, User ORDER BY Date) AS grp FROM tablename),
cte2 AS (SELECT *, MAX(grp) OVER (PARTITION BY Client, User) AS max_grp FROM cte1)
SELECT Client, User,
MIN(CASE WHEN Flag THEN Date END) AS first_date,
MAX(CASE WHEN Flag THEN Date END) AS last_date,
SUM(Flag) AS consecutive
FROM cte2
WHERE grp = max_grp
GROUP BY Client, User;
See the demo.
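To see why grp = max_grp isolates the latest streak, here is a minimal, self-contained illustration (dates and flags assumed):

select Date, Flag, sum(not Flag) over (order by Date) as grp
from (select 1 as Date, 1 as Flag
      union all select 2, 1
      union all select 3, 0
      union all select 4, 1
      union all select 5, 1) t;
-- grp comes out as 0, 0, 1, 1, 1: every zero starts a new group, so the
-- trailing 1s share the maximum grp and WHERE grp = max_grp keeps only them
-- (the zero row itself contributes nothing, since SUM(Flag) adds 0 for it)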
I made an attempt to get the result with simpler queries; here is my approach, which takes advantage of the lastDate and lastFlag columns too.
Run here
WITH eTT AS (
  SELECT Client, User,
         NULLIF(MAX(Date),
                (SELECT MAX(Date) FROM tt t2
                 WHERE t1.Client = t2.Client AND t1.User = t2.User)) AS endDate
  FROM tt t1
  WHERE LastFlag = 0 OR LastFlag IS NULL
  GROUP BY Client, User
)
SELECT Client, User,
       (CASE WHEN MAX(endDate) IS NULL THEN NULL ELSE MIN(Date) END) AS first_date,
       (CASE WHEN MAX(endDate) IS NULL THEN NULL ELSE MAX(Date) END) AS last_date,
       (CASE WHEN MAX(endDate) IS NULL THEN NULL ELSE COUNT(endDate) END) AS consecutive
FROM tt LEFT JOIN eTT USING (Client, User)
WHERE Date >= endDate OR endDate IS NULL
GROUP BY Client, User;
EDIT
The original table doesn't have the LastDate and LastFlag columns; they were created using the OP's initial query. That method is apparently not supported directly, but I get the impression that the OP somehow manages it on their side. Hence another CTE called tt, containing that query, can be added before eTT.
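A sketch of that extra CTE, assuming the windowed query from the question (db.tbl and the column names are taken from there):

WITH tt AS (
  SELECT Client, User, Date, Flag,
         LAG(Date) OVER (PARTITION BY Client, User ORDER BY Date) AS LastDate,
         LAG(Flag) OVER (PARTITION BY Client, User ORDER BY Date) AS LastFlag
  FROM db.tbl
)
SELECT * FROM tt;
-- replace SELECT * FROM tt with the eTT definition and final query above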
I'm facing the following problem...
Given this data:
table : votes
=========
value
=========
10
25
38
90
92
93
98
100
120
I would like to return a value only if the difference between it and the previously accepted value is bigger than 10% of the latter:
if abs(int(a) - int(b)) * 100 / int(a) < 10:
    return True
So the end list should be (I have added the % difference in parentheses):
==========
result
==========
10 ()
25 (150%)
38 (52%)
90 (136%)
100 (11%)
120 (20%)
The query should also sort those values first.
I'm able to do it in code (as shown above), but haven't come close to a direct query.
MySQL v.8.0.19
Since you are on a modern version of MySQL (8.x), you can use LAG(). For example:
select
concat('', value,
case when prev_value is null then ''
else concat('', 100 * (value - prev_value) / prev_value, '%')
end
) as result
from (
select
value,
lag(value) over (order by value) as prev_value
from t
) x
where prev_value is null or value > prev_value * 1.1
order by value
In MySQL 8.0, you can do this with lag(). Assuming that you want to sort rows by value, that would be:
select value
from (
select
value,
lag(value, 1, 0) over(order by value) lag_value
from mytable t
) t
where value > lag_value * 1.10
If you want to use a different ordering column, then you can change the order by clause to use the relevant column.
In earlier versions, one option is a correlated subquery:
select value
from mytable t
where value > 1.10 * coalesce(
(
select t1.value
from mytable t1
where t1.value < t.value
order by t1.value desc
limit 1
),
0
)
To use another ordering column here, you need to change the where clause and the order by clause of the subquery.
On the other hand, if you want to select the next row according to the ratio against the previously selected row, then that's a different question. You need some kind of iterative process: in SQL, one approach is a recursive query:
with recursive
data as (
  select value, row_number() over (order by value) as rn
  from mytable t
),
cte as (
  select 1 as is_valid, value, rn from data where rn = 1
  union all
  select
    (d.value > 1.1 * c.value),
    case when d.value > 1.1 * c.value then d.value else c.value end,
    d.rn
  from cte c
  inner join data d on d.rn = c.rn + 1
)
select value from cte where is_valid order by value
The query enumerates the values, then walks the dataset sequentially while keeping track of the last selected value, and setting flags on records that should appear in the final resultset.
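A runnable version against the sample votes, with a derived table standing in for the real one, which I would expect to reproduce the list from the question:

with recursive
data as (
  select value, row_number() over (order by value) as rn
  from (select 10 as value union all select 25 union all select 38
        union all select 90 union all select 92 union all select 93
        union all select 98 union all select 100 union all select 120) v
),
cte as (
  select 1 as is_valid, value, rn from data where rn = 1
  union all
  select (d.value > 1.1 * c.value),
         case when d.value > 1.1 * c.value then d.value else c.value end,
         d.rn
  from cte c
  inner join data d on d.rn = c.rn + 1
)
select value from cte where is_valid order by value;
-- 10, 25, 38, 90, 100, 120: note that 100 is kept because the last accepted
-- value was 90, even though its immediate predecessor row was 98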
Thank you for coming in.
I have a table like this:
And here is what I want to do: grouped by id, I want to sum up Val based on the condition.
For example, for id=1, I want the sum of Val until condition A first appears, then another sum of Val between the first A and the second A, then a sum of Val between the second and third A, and so on. The sums for condition = B follow the same logic, but should not be influenced by A. Finally, each sum of Val only covers rows with the same id.
How should I do this? I tried GROUP BY and PARTITION BY, but was unable to obtain the desired result. The ideal output would be like the Sum column in the picture.
Thank you very much.
Assuming that there is a column that specifies the ordering, then you can do what you want. SQL tables represent unordered sets. There is no ordering unless a column specifies the ordering.
You seem to want to define groups for As and Bs. You can do this using window functions. This is a little strange, because you want different groupings -- a case expression can handle that. Here is the idea:
select t.*,
(case when condition = 'A'
then sum(val) over (partition by id, grp_a order by <ordering col>)
when condition = 'B'
then sum(val) over (partition by id order by <ordering col>)
end) as calculation
from (select t.*,
sum(case when condition = 'A' then 1 else 0 end) over (partition by id order by <ordering col> desc) as grp_a
from t
) t;
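The reverse running count is the key trick; a small self-contained example of just the grouping (ord is an assumed ordering column):

select id, ord, cond, val,
       sum(cond = 'A') over (partition by id order by ord desc) as grp_a
from (select 1 as id, 1 as ord, 'x' as cond, 5 as val
      union all select 1, 2, 'A', 3
      union all select 1, 3, 'x', 2
      union all select 1, 4, 'A', 7) t
order by ord;
-- grp_a is 2, 2, 1, 1: each segment of rows ending at an 'A' shares one
-- grp_a value, so summing val per (id, grp_a) yields 8 and 9, the totals
-- up to the first A and between the first and second A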
You are looking for something like this:
WITH somatory as (SELECT id, cond, v,
SUM(v) OVER (PARTITION BY id ORDER BY id ROWS UNBOUNDED PRECEDING) as sumV
from foo),
conditional_somatory as (
select id, cond, v, sumV,
case when cond = 'A' then sumV end as somatoryA,
case when cond = 'B' then sumV end as somatoryB
from somatory
),
last_somatories as (
select id, cond, v, sumV,
SomatoryA,
max(coalesce(somatoryA,0)) over (partition by id order by id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) as lastSomatoryA,
SomatoryB,
max(coalesce(somatoryB,0)) over (partition by id order by id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) as lastSomatoryB
from conditional_somatory)
select id, cond, v,
case when cond = 'A' then sumV - lastSomatoryA
when cond = 'B' then sumV - lastSomatoryB
end as somatory
from last_somatories
See working fiddle.
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=2b2d23b1421aa8fb45aa82ed8bad8b32
Can someone show me how to represent the following SQL statements without the use of aggregate functions?
SELECT COUNT(column) FROM table;
SELECT AVG(column) FROM table;
SELECT MAX(column) FROM table;
SELECT MIN(column) FROM table;
MIN() and MAX() can be done with simple subqueries:
select (select column from table order by column is not null desc, column asc limit 1) as "MIN",
(select column from table order by column is not null desc, column desc limit 1) as "MAX"
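The double ORDER BY is what makes these match MIN()/MAX() on NULLs: non-NULL values sort first, so a NULL only wins when there is nothing else. A tiny check against an assumed derived table:

select (select v
        from (select null as v union all select 3 union all select 1) t
        order by v is not null desc, v asc
        limit 1) as "MIN";
-- returns 1, just like MIN(v); with only NULLs (or no rows) it returns NULL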
COUNT() and AVG() require the use of variables, if you don't allow any aggregations:
select rn as "COUNT", sumcol / rnaas "AVG"
from (select t.*
from (select t.*,
(#rn := #rn + 1) as rn,
(#rna := #rna + if(column is not null, 1, 0)) as rna,
(#sum := #sum + coalesce(column, 0)) as sumcol
from table t cross join
(select #rn := 0, #rna := 0, #sum := 0) const
order by column
) t
order by rn desc
limit 1
) t
This latter formulation only works in MySQL.
EDIT:
The empty table is a challenge. Let's do this with a left outer join:
select cast(coalesce(rn, 0) as signed) as "COUNT",
       (case when rna > 0 then sumcol / rna end) as "AVG"
from (select 1 as n
     ) n left outer join
     (select t.*
      from (select t.*,
                   (@rn := @rn + 1) as rn,
                   (@rna := @rna + if(`column` is not null, 1, 0)) as rna,
                   (@sum := @sum + coalesce(`column`, 0)) as sumcol
            from `table` t cross join
                 (select @rn := 0, @rna := 0, @sum := 0) const
            order by `column`
           ) t
      order by rn desc
      limit 1
     ) t
     on n.n = 1;
Notes. This will return 0 for the count if the table is empty. That is correct. If the table is empty, it will return NULL for the average, and that is also correct.
If the table is not empty, but the values are all NULL, then it will also return NULL. The types for the count are always integers, so that should be ok. The type of the average is more problematic, but the variables will return some sort of generic numeric type, which seems compatible in spirit.
min/max can be replaced with something like this:
select t1.pk_column,
       t1.some_column
from the_table t1
where t1.some_column < ALL (select t2.some_column
                            from the_table t2
                            where t2.pk_column <> t1.pk_column);
For getting the max you need to replace < with >. pk_column is the primary key column of the table and is needed to avoid comparing each row to itself (it doesn't have to be a PK; it only needs to be unique).
I don't think there is an alternative for count() or avg() (at least I can't think of one)
I used some_column and the_table because column and table are reserved words.
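A quick check with assumed data (note two caveats: a duplicated minimum defeats the strict <, and a NULL in some_column makes the ALL comparison unknown):

create table the_table (pk_column int primary key, some_column int);
insert into the_table values (1, 5), (2, 3), (3, 9);
-- the query above returns the row (2, 3), matching MIN(some_column);
-- with > ALL instead, it returns (3, 9), matching MAX(some_column)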
SET @t1 = 0, @t2 = 0, @t3 = 0, @t4 = 0;
COUNT:
Select @t1 := @t1 + 1 as CNT from `table`
order by @t1 := @t1 + 1 DESC
LIMIT 1
Similar methods could be put together for Avg and max/min using limits...
Still thinking about Min/Max...
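One way to finish that thought without variables: borrow the ORDER BY ... LIMIT 1 trick from the earlier answer, which needs no aggregate at all:

SELECT `column` AS MN FROM `table`
ORDER BY `column` IS NOT NULL DESC, `column` ASC
LIMIT 1;
-- use DESC on the second ORDER BY key for the max; the IS NOT NULL key
-- keeps NULLs from masquerading as the minimum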
Not to supersede the excellent answer from Gordon Linoff, but there's a little more work involved to accurately emulate the AVG(), COUNT(), and SUM() functions. (The MIN and MAX part of Gordon's answer is spot on.)
There's a corner case when the table is empty. In order to emulate the SQL aggregate functions, we need our query to return a single row. But at the same time, we need a test of whether or not the table contains at least one row.
Here's a query that is a more precise emulation:
-- create an empty table
CREATE TABLE `foo` (col INT);
-- TRUNCATE TABLE `foo`;
SELECT IF(s.ne IS NULL,0,s.rn) AS `COUNT(*)`
, IF(s.cc>0,s.tc,NULL) AS `SUM(col)`
, IF(s.cc>0,s.tc/s.cc,NULL) AS `AVG(col)`
FROM ( SELECT v.rn
, v.cc
, v.tc
, e.ne
FROM ( SELECT @rn := @rn + 1 AS rn
, @cc := @cc + (t.col IS NOT NULL) AS cc
, @tc := @tc + IFNULL(t.col,0) AS tc
FROM (SELECT @rn := 0, @cc := 0, @tc := 0) c
LEFT
JOIN `foo` t
ON 1=1
) v
LEFT
JOIN (SELECT 1 AS ne FROM `foo` z LIMIT 1) e
ON 1=1
ORDER BY v.rn DESC
LIMIT 1
) s
NOTES:
The purpose of the inline view aliased as e is to give us a way to determine whether or not the table contains any rows. If the table contains at least one row, we'll get a value of 1 returned as column ne (not empty). If the table is empty, that query won't return a row, and e.ne will be NULL, which is something we can test in the outer query.
In order to return a row, so we can return a value (like a 0 for COUNT), we need to ensure that we return at least one row from the inline view v. Since we are guaranteed exactly one row from the inline view aliased as c (which initializes our user-defined variables), we use that as the "driving" table for a LEFT [OUTER] JOIN operation.
But if the table is empty, our row counter (@rn) coming out of v is going to have a value of 1. We can deal with that: we check e.ne to know whether the count should really be returned as 0.
In order to calculate the average, we can't divide by the row counter; we have to divide by the number of rows where col was not NULL. We use the @cc user-defined variable to keep track of the count of those rows.
Similarly, for the SUM (and the average) we need to accumulate only the non-NULL values. (If we were to add a NULL, it would turn the whole total to NULL, basically wiping out our accumulation.) So we test whether t.col IS NULL to avoid accidentally wiping out the accumulation. Our accumulator will be 0 if there aren't any non-NULL rows, but that's not a problem, because we check @cc to see whether any rows were included. We need to check it anyway, to avoid a divide-by-zero issue.
To test, run against the empty table foo. It will return a count of 0, and NULL for SUM and AVG, equivalent to the result we get from:
SELECT COUNT(*), SUM(col), AVG(col) FROM foo;
We can also test the query against a table containing only NULL values for col:
INSERT INTO `foo` (col) VALUES (NULL);
As well as some non-NULL values:
INSERT INTO `foo` (col) VALUES (2),(3),(5),(7),(11),(13),(17),(19);
And compare the results of the two queries.
This is essentially the same as the answer from Gordon Linoff, with just a little more precision to work around the corner cases of NULL values and the empty table.
I have this mysql table:
DATE | VALUE
and I wish to become a select which shows me this information as:
DATE | COUNT TOTAL | COUNT VAL=1 | COUNT VAL=2
Any ideas how I can achieve this?
SELECT date,
       COUNT(*),
       COUNT( IF( value = 1, 1, NULL ) ),
       COUNT( IF( value = 2, 1, NULL ) )
FROM my_table
GROUP BY date
I think with SUM() you get neater code, since it adds up the value of the expression for each row (a boolean expression evaluates to 1 or 0):
SELECT date,
       COUNT(*),
       SUM( value = 1 ),
       SUM( value = 2 )
FROM my_table
GROUP BY date
Official Documentation can be found here.
SELECT `date`, COUNT(*) AS `COUNT TOTAL`,
       COUNT(CASE `value` WHEN 1 THEN `value` END) AS `COUNT VAL=1`,
       COUNT(CASE `value` WHEN 2 THEN `value` END) AS `COUNT VAL=2`
FROM mytable
GROUP BY `date`
The CASE expressions will be null when there is no match. Nulls are not counted by COUNT().
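All three variants above should agree; a quick comparison on assumed sample data (table names vary between the answers, adjust as needed):

create table my_table (`date` date, value int);
insert into my_table values
  ('2023-01-01', 1), ('2023-01-01', 2), ('2023-01-01', 2), ('2023-01-02', 1);
-- for 2023-01-01: COUNT TOTAL = 3, COUNT VAL=1 -> 1, COUNT VAL=2 -> 2
-- for 2023-01-02: COUNT TOTAL = 1, COUNT VAL=1 -> 1, COUNT VAL=2 -> 0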
I imagine you might want a dynamic number of columns, one column for each value found in the data. This is not possible in SQL. The columns must be known at the time you write the query.
So you have two options to get subtotals per value:
First query the distinct values from all rows of value and construct an SQL query dynamically, appending one column to the select-list for each distinct value you want to report. Then run this SQL query (see the sketch after this list).
Alternatively, fetch all the rows as rows. Count the subtotals per value in application code.
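For the first option, a hedged sketch of the dynamic-SQL approach using PREPARE (table and column names assumed, and value assumed numeric, since splicing strings into SQL invites injection):

-- build one SUM(...) column per distinct value, then run the generated query
-- (watch group_concat_max_len if there are many distinct values)
SET @sql = (
  SELECT CONCAT(
           'SELECT `date`, COUNT(*) AS `COUNT TOTAL`, ',
           GROUP_CONCAT(DISTINCT
             CONCAT('SUM(value = ', value, ') AS `COUNT VAL=', value, '`')),
           ' FROM mytable GROUP BY `date`')
  FROM mytable
);
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;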
One further alternative is to count subtotals by groups, and include totals:
SELECT `value`, `date`, COUNT(*) AS `COUNT SUBTOTAL`
FROM mytable
GROUP BY `value`, `date` WITH ROLLUP
But that doesn't report the subtotals in columns as you requested, it reports the subtotals in rows.