MySQL aggregate sum of count

I have a simple group by query:
SELECT timestamp, COUNT(users)
FROM my_table
GROUP BY timestamp
How do I add a sum_each_day column that keeps a running total of the user counts, so that each row adds its count to the total of the rows before it?
The output should be like this:
timestamp  | users | sum_each_day
2015-11-27 | 1     | 1
2015-11-28 | 5     | 6
2015-11-29 | 3     | 9
2015-11-30 | 7     | 16
Thanks in advance

You could use a sub-query, like this:
SELECT timestamp,
num_users,
(SELECT COUNT(users)
FROM my_table
WHERE timestamp <= main.timestamp) sum_users
FROM (
SELECT timestamp,
COUNT(users) num_users
FROM my_table
GROUP BY timestamp
) main
ORDER BY timestamp;

If you really need this in MySQL, it will cost some performance, but I believe a subquery with a count will solve it:
SELECT t1.timestamp, COUNT(*) AS users, (SELECT COUNT(*) FROM my_table t2 WHERE t2.timestamp <= t1.timestamp) AS sum_each_day
FROM my_table t1 GROUP BY t1.timestamp;
If you display this data through a scripting language such as PHP, it is easier to keep a counter in the script and display the running total for each row.
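Here is a minimal sketch of the script-side counter, written in Python rather than PHP. It assumes the per-day rows have already been fetched from a `GROUP BY timestamp` query as (day, count) pairs, and it uses `itertools.accumulate` to keep the running total. The sample values come from the question.

```python
from itertools import accumulate


def add_running_total(rows):
    """Return (day, count, running_total) tuples for a list of (day, count) rows."""
    totals = accumulate(count for _, count in rows)  # 1, 1+5, 1+5+3, ...
    return [(day, count, total) for (day, count), total in zip(rows, totals)]


# Per-day rows as a GROUP BY timestamp query might return them (values from the question).
rows = [("2015-11-27", 1), ("2015-11-28", 5), ("2015-11-29", 3), ("2015-11-30", 7)]
for day, users, sum_each_day in add_running_total(rows):
    print(day, users, sum_each_day)
```

Because the database only does the grouping, the query stays simple and the accumulation costs one pass over the already-fetched rows.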

I would do this using a user-defined variable:
SET @total := 0;
SELECT timestamp, DayCount, (@total := @total + DayCount) AS Total
FROM
(SELECT timestamp, COUNT(users) AS DayCount
FROM my_table
GROUP BY timestamp) AS t1
Fiddle: I am not using your table structure there, but you can get the idea.
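On MySQL 8.0 and later, a window function produces the same running total without user variables. Setting variables inside expressions is deprecated in MySQL 8.0. The sketch below runs the window-function query against an in-memory SQLite database, which is only a stand-in so it can execute standalone and needs SQLite 3.25+ for window functions. The table layout (one row per user, with the day in `timestamp`) is an assumption based on the question.

```python
import sqlite3

# Hypothetical layout: one row per user, the day stored in `timestamp`.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (timestamp TEXT, users TEXT)")
daily = {"2015-11-27": 1, "2015-11-28": 5, "2015-11-29": 3, "2015-11-30": 7}
for day, n in daily.items():
    conn.executemany(
        "INSERT INTO my_table VALUES (?, ?)",
        [(day, f"user{i}") for i in range(n)],
    )

# Count per day in a derived table, then let a window SUM accumulate the
# daily counts in date order (the same SQL also runs on MySQL 8.0+).
query = """
SELECT timestamp,
       day_count AS users,
       SUM(day_count) OVER (ORDER BY timestamp) AS sum_each_day
FROM (SELECT timestamp, COUNT(users) AS day_count
      FROM my_table
      GROUP BY timestamp) AS t1
ORDER BY timestamp
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Unlike the variable version, the window's `ORDER BY timestamp` fixes the accumulation order explicitly instead of relying on the order in which rows happen to be read.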

If I understand correctly, this will work:
set @c=0;
SELECT `timestamp`,sum(`users`),(select @c:=@c+sum(`users`))
FROM `my_table`
group by `timestamp`;

Related

get the most common value for each column

I'm attempting to create an SQL query that retrieves the total_cost across all rows in a table. Alongside that, I also need the most common (dominant) value for both columnA and columnB, together with their respective values.
For example, with the following table contents:
cost | columnA | columnB | target
250  | Foo     | Bar     | XYZ
200  | Foo     | Bar     | XYZ
150  | Bar     | Bar     | ABC
250  | Foo     | Bar     | ABC
The result would need to be:
total_cost | columnA_dominant | columnB_dominant | columnA_value | columnB_value
850        | Foo              | Bar              | 250           | 400
So far I can calculate the total cost; that's no issue. I can also get the dominant value for columnA using this answer. After that, though, I'm not sure how to also get the dominant value for columnB, and the respective values too.
This is my current SQL:
SELECT
SUM(`cost`) AS `total_cost`,
COUNT(`columnA`) AS `columnA_dominant`
FROM `table`
WHERE `target` = "ABC"
GROUP BY `columnA_dominant`
ORDER BY `columnA_dominant` DESC
UPDATE: Thanks to @Barmar for the idea of using a subquery, I managed to get the dominant values for columnA and columnB:
SELECT
-- Retrieve total cost.
SUM(`cost`) AS `total_cost`,
-- Get dominant values.
(
SELECT `columnA`
FROM `table`
GROUP BY `columnA`
ORDER BY COUNT(*) DESC
LIMIT 1
) AS `columnA_dominant`,
(
SELECT `columnB`
FROM `table`
GROUP BY `columnB`
ORDER BY COUNT(*) DESC
LIMIT 1
) AS `columnB_dominant`
FROM `table`
WHERE `target` = "XYZ"
However, I'm still having issues figuring out how to calculate the respective values.
You might get close this way. To get percentage values, add COUNT(*) to the subqueries to get the highest count for columnA and columnB, then divide by the total count.
SELECT
SUM(cost),
(
SELECT tt.columnA
FROM T tt
GROUP BY tt.columnA
ORDER BY COUNT(*) DESC
LIMIT 1
) AS columnA_dominant,
(
SELECT tt.columnB
FROM T tt
GROUP BY tt.columnB
ORDER BY COUNT(*) DESC
LIMIT 1
) AS columnB_dominant,
(
SELECT COUNT(*)
FROM T tt
GROUP BY tt.columnA
ORDER BY COUNT(*) DESC
LIMIT 1
) / COUNT(*) AS columnA_percentage,
(
SELECT COUNT(*)
FROM T tt
GROUP BY tt.columnB
ORDER BY COUNT(*) DESC
LIMIT 1
) / COUNT(*) AS columnB_percentage
FROM T t1
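As a sanity check, here is the subquery approach run against the question's sample rows in an in-memory SQLite database, which stands in for MySQL. There is one deliberate change. SQLite's `/` on two integers does integer division, so the count is multiplied by 1.0 first. MySQL's `/` already returns a decimal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (cost INTEGER, columnA TEXT, columnB TEXT, target TEXT)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?, ?)",
    [(250, "Foo", "Bar", "XYZ"), (200, "Foo", "Bar", "XYZ"),
     (150, "Bar", "Bar", "ABC"), (250, "Foo", "Bar", "ABC")],
)

# Each scalar subquery groups by one column and keeps the most frequent group;
# the outer COUNT(*) is the total row count used for the percentage.
query = """
SELECT SUM(cost) AS total_cost,
       (SELECT columnA FROM T GROUP BY columnA ORDER BY COUNT(*) DESC LIMIT 1) AS columnA_dominant,
       (SELECT columnB FROM T GROUP BY columnB ORDER BY COUNT(*) DESC LIMIT 1) AS columnB_dominant,
       (SELECT COUNT(*) FROM T GROUP BY columnA ORDER BY COUNT(*) DESC LIMIT 1) * 1.0 / COUNT(*) AS columnA_percentage,
       (SELECT COUNT(*) FROM T GROUP BY columnB ORDER BY COUNT(*) DESC LIMIT 1) * 1.0 / COUNT(*) AS columnB_percentage
FROM T
"""
row = conn.execute(query).fetchone()
print(row)
```

Foo appears in 3 of 4 rows and Bar in all 4 rows of columnB, so the percentages come out as fractions 0.75 and 1.0.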
If your MySQL version supports window functions (MySQL 8.0+), there is another way that reduces table scans and might perform better than the subqueries:
SELECT SUM(cost) OVER(),
FIRST_VALUE(columnA) OVER (ORDER BY counter1 DESC) columnA_dominant,
FIRST_VALUE(columnB) OVER (ORDER BY counter2 DESC) columnB_dominant,
FIRST_VALUE(counter1) OVER (ORDER BY counter1 DESC) / COUNT(*) OVER() columnA_percentage,
FIRST_VALUE(counter2) OVER (ORDER BY counter2 DESC) / COUNT(*) OVER() columnB_percentage
FROM (
SELECT *,
COUNT(*) OVER (PARTITION BY columnA) counter1,
COUNT(*) OVER (PARTITION BY columnB) counter2
FROM T
) t1
LIMIT 1
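The window-function version can be checked the same way, again using in-memory SQLite (3.25+) as a stand-in for MySQL 8.0 and the question's sample rows. Every output row carries the same values here. `SUM(...) OVER ()` covers the whole table, and `FIRST_VALUE(...) OVER (ORDER BY counterN DESC)` always picks a row from the most frequent group. That is why `LIMIT 1` is enough. As in the previous check, `* 1.0` avoids SQLite's integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (cost INTEGER, columnA TEXT, columnB TEXT, target TEXT)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?, ?)",
    [(250, "Foo", "Bar", "XYZ"), (200, "Foo", "Bar", "XYZ"),
     (150, "Bar", "Bar", "ABC"), (250, "Foo", "Bar", "ABC")],
)

# counter1/counter2 tag every row with the size of its columnA/columnB group;
# FIRST_VALUE over a descending order then yields the largest group's value.
query = """
SELECT SUM(cost) OVER () AS total_cost,
       FIRST_VALUE(columnA) OVER (ORDER BY counter1 DESC) AS columnA_dominant,
       FIRST_VALUE(columnB) OVER (ORDER BY counter2 DESC) AS columnB_dominant,
       FIRST_VALUE(counter1) OVER (ORDER BY counter1 DESC) * 1.0 / COUNT(*) OVER () AS columnA_percentage,
       FIRST_VALUE(counter2) OVER (ORDER BY counter2 DESC) * 1.0 / COUNT(*) OVER () AS columnB_percentage
FROM (SELECT *,
             COUNT(*) OVER (PARTITION BY columnA) AS counter1,
             COUNT(*) OVER (PARTITION BY columnB) AS counter2
      FROM T) AS t1
LIMIT 1
"""
row = conn.execute(query).fetchone()
print(row)
```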
Try this query. Note that it is written in SQL Server syntax (TOP 1); in MySQL, use ORDER BY ... LIMIT 1 instead:
select sum(cost) as total_cost,p.columnA,q.columnB,p.columnA_percentage,q.columnB_percentage
from get_common,(
select top 1 columnA,columnA_percentage
from(
select columnA,count(columnA) as count_columnA,cast(count(columnA) as float)/(select count(columnA) from get_common) as columnA_percentage
from get_common
group by columnA)s
order by count_columnA desc
)p,
(select top 1 columnB,columnB_percentage
from (
select columnB,count(columnB) as count_columnB, cast(count(columnB) as float)/(select count(columnB) from get_common) as columnB_percentage
from get_common
group by columnB) t
order by count_columnB desc)q
group by p.columnA,q.columnB,p.columnA_percentage,q.columnB_percentage
So, to get the percentage and the dominant value, you need a separate query for each column, like this:
select top 1 columnA,columnA_percentage
from(
select columnA,count(columnA) as count_columnA,cast(count(columnA) as float)/(select count(columnA) from get_common) as columnA_percentage
from get_common
group by columnA)s
order by count_columnA desc
Then you can join it with the sum query to get all the values you want.
Hope this helps.
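Here is the same "one query per column, then join" idea with MySQL-style `ORDER BY ... LIMIT 1` in place of `TOP 1`. It runs against the question's sample rows in an in-memory SQLite database as a stand-in for MySQL, and the table name `get_common` is taken from the answer. Multiplying by 1.0 replaces `CAST(... AS float)` so that SQLite does not perform integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE get_common (cost INTEGER, columnA TEXT, columnB TEXT, target TEXT)")
conn.executemany(
    "INSERT INTO get_common VALUES (?, ?, ?, ?)",
    [(250, "Foo", "Bar", "XYZ"), (200, "Foo", "Bar", "XYZ"),
     (150, "Bar", "Bar", "ABC"), (250, "Foo", "Bar", "ABC")],
)

# Each derived table returns exactly one row (LIMIT 1), so cross-joining them
# with the one-row total puts every value into a single result row.
query = """
SELECT s.total_cost, p.columnA, q.columnB, p.columnA_percentage, q.columnB_percentage
FROM (SELECT SUM(cost) AS total_cost FROM get_common) AS s
CROSS JOIN (SELECT columnA,
                   COUNT(*) * 1.0 / (SELECT COUNT(*) FROM get_common) AS columnA_percentage
            FROM get_common
            GROUP BY columnA
            ORDER BY COUNT(*) DESC
            LIMIT 1) AS p
CROSS JOIN (SELECT columnB,
                   COUNT(*) * 1.0 / (SELECT COUNT(*) FROM get_common) AS columnB_percentage
            FROM get_common
            GROUP BY columnB
            ORDER BY COUNT(*) DESC
            LIMIT 1) AS q
"""
row = conn.execute(query).fetchone()
print(row)
```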

SQL Query about percentage selection

I am trying to write a query for a condition:
If at least 80 percent (4 or more rows, since 4/5*100 = 80%) of the 5 most recent rows (by the Date column) for a KEY have Value = A or Value = B, then change the flag from fail to pass for the entire KEY.
Here is the input and output sample:
I have highlighted recent rows with green colour in the sample.
Can someone help me in this?
I got as far as finding the 5 most recent rows with the following code:
select * from (
select *, row_number() over (partition by `KEY` order by `date` desc) as rn
from tb1
) t
where rn <= 5
I couldn't figure out what to do after this.
Test this:
WITH
-- enumerate rows per key group
cte1 AS ( SELECT *,
ROW_NUMBER() OVER (PARTITION BY `key` ORDER BY `date` DESC) rn
FROM sourcetable ),
-- take 5 recent rows only, check there are at least 4 rows with A/B
cte2 AS ( SELECT `key`
FROM cte1
WHERE rn <= 5
GROUP BY `key`
HAVING ( SUM(`value` = 'A') >= 4
OR SUM(`value` = 'B') >= 4 )
-- optionally also require that the key has at least 5 rows:
-- AND SUM(rn = 5)
)
-- update rows with found key values
UPDATE sourcetable
JOIN cte2 USING (`key`)
SET flag = 'PASS';
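Here is a runnable sketch of the same CTE logic on an in-memory SQLite database (3.25+), used as a stand-in for MySQL 8.0 with small hypothetical sample data. SQLite has no multi-table `UPDATE ... JOIN`, so the sketch filters with `WHERE key IN (...)` instead. On MySQL, the JOIN form from the answer is the one to keep.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sourcetable (`key` TEXT, `date` TEXT, `value` TEXT, flag TEXT)")
rows = [
    # K1 (hypothetical): the 5 newest rows hold four 'A' values -> PASS
    ("K1", "2021-01-01", "B"), ("K1", "2021-01-02", "A"), ("K1", "2021-01-03", "A"),
    ("K1", "2021-01-04", "C"), ("K1", "2021-01-05", "A"), ("K1", "2021-01-06", "A"),
    # K2 (hypothetical): four 'A' values overall, only three among the 5 newest -> stays FAIL
    ("K2", "2021-01-01", "A"), ("K2", "2021-01-02", "A"), ("K2", "2021-01-03", "A"),
    ("K2", "2021-01-04", "C"), ("K2", "2021-01-05", "A"), ("K2", "2021-01-06", "C"),
]
conn.executemany("INSERT INTO sourcetable VALUES (?, ?, ?, 'FAIL')", rows)

update = """
WITH
cte1 AS (SELECT *,
                ROW_NUMBER() OVER (PARTITION BY `key` ORDER BY `date` DESC) AS rn
         FROM sourcetable),
cte2 AS (SELECT `key`
         FROM cte1
         WHERE rn <= 5
         GROUP BY `key`
         HAVING SUM(`value` = 'A') >= 4 OR SUM(`value` = 'B') >= 4)
UPDATE sourcetable
SET flag = 'PASS'
WHERE `key` IN (SELECT `key` FROM cte2)
"""
conn.execute(update)
flags = conn.execute("SELECT DISTINCT `key`, flag FROM sourcetable ORDER BY `key`").fetchall()
print(flags)
```

K2 shows why the `rn <= 5` filter matters. Its oldest 'A' falls outside the window, so the key does not pass.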
I'm on version 5.7. – Ayn76
Convert the CTEs to subqueries, and emulate ROW_NUMBER() using user-defined variables.
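The answer suggests user-defined variables. A different option that also works on 5.7 is to compute the row number with a correlated subquery: 1 plus the number of newer rows for the same key. This avoids relying on the evaluation order of user variables, which MySQL leaves undefined. It assumes no key has two rows with the same date. The sketch uses in-memory SQLite as a stand-in and the same hypothetical sample data as the CTE check, and it only selects the passing keys. On MySQL that result could then be joined into the UPDATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sourcetable (`key` TEXT, `date` TEXT, `value` TEXT, flag TEXT)")
rows = [
    ("K1", "2021-01-01", "B"), ("K1", "2021-01-02", "A"), ("K1", "2021-01-03", "A"),
    ("K1", "2021-01-04", "C"), ("K1", "2021-01-05", "A"), ("K1", "2021-01-06", "A"),
    ("K2", "2021-01-01", "A"), ("K2", "2021-01-02", "A"), ("K2", "2021-01-03", "A"),
    ("K2", "2021-01-04", "C"), ("K2", "2021-01-05", "A"), ("K2", "2021-01-06", "C"),
]
conn.executemany("INSERT INTO sourcetable VALUES (?, ?, ?, 'FAIL')", rows)

# rn = 1 + number of newer rows for the same key (no window functions, no CTEs).
query = """
SELECT ranked.`key`
FROM (SELECT s.`key`, s.`value`,
             (SELECT COUNT(*) FROM sourcetable s2
              WHERE s2.`key` = s.`key` AND s2.`date` > s.`date`) + 1 AS rn
      FROM sourcetable s) AS ranked
WHERE ranked.rn <= 5
GROUP BY ranked.`key`
HAVING SUM(ranked.`value` = 'A') >= 4 OR SUM(ranked.`value` = 'B') >= 4
"""
passing = [key for (key,) in conn.execute(query)]
print(passing)
```

The correlated count makes the query quadratic per key, which is usually acceptable when each key has only a handful of rows.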

Compute 2 subqueries then group by date

I have this table:
I want to run the subqueries first, then add them together grouped by date.
The expected result should be like this:
I am running this query:
(
SELECT DATE_FORMAT(dd1.modified_datetime,'%Y-%m-%d') as date, (v1+v2) as value FROM
(SELECT modified_datetime, Sum(data->"$.amount") as v1
FROM transactions
GROUP BY modified_datetime) as dd1 ,
(SELECT modified_datetime, MAX(data->"$.amount") as v2
FROM transactions
GROUP BY modified_datetime) as dd2
GROUP BY dd1.modified_datetime, value
)
and getting this result:
Use a JOIN between the subqueries (and between each subsequent one, if you add more):
(SELECT modified_datetime, Sum(data->"$.amount") as v1
FROM transactions
GROUP BY modified_datetime) as dd1 JOIN
(SELECT modified_datetime, MAX(data->"$.amount") as v2
FROM transactions
GROUP BY modified_datetime) as dd2 ON dd1.modified_datetime=dd2.modified_datetime
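Here is a small runnable check of why the JOIN matters, using in-memory SQLite as a stand-in for MySQL. As a hypothetical simplification, a plain `amount` column replaces `data->"$.amount"`, and SQLite's `date()` replaces `DATE_FORMAT(..., '%Y-%m-%d')`. With a comma between the subqueries and no join condition, every group of dd1 pairs with every group of dd2. The `ON` clause keeps only matching datetimes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical simplification: `amount` stands in for data->"$.amount".
conn.execute("CREATE TABLE transactions (modified_datetime TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [("2021-05-01 10:00:00", 10), ("2021-05-01 10:00:00", 5), ("2021-05-02 09:30:00", 7)],
)

# Joined on the grouping key: one row per datetime, v1 (SUM) + v2 (MAX).
query = """
SELECT date(dd1.modified_datetime) AS day, dd1.v1 + dd2.v2 AS value
FROM (SELECT modified_datetime, SUM(amount) AS v1
      FROM transactions GROUP BY modified_datetime) AS dd1
JOIN (SELECT modified_datetime, MAX(amount) AS v2
      FROM transactions GROUP BY modified_datetime) AS dd2
  ON dd1.modified_datetime = dd2.modified_datetime
ORDER BY day
"""
result = conn.execute(query).fetchall()

# Without a join condition the comma produces a cross join: 2 groups x 2 groups.
cross_rows = conn.execute("""
SELECT COUNT(*)
FROM (SELECT modified_datetime FROM transactions GROUP BY modified_datetime) AS dd1,
     (SELECT modified_datetime FROM transactions GROUP BY modified_datetime) AS dd2
""").fetchone()[0]
print(result, cross_rows)
```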
If I followed you correctly, you can use union all and aggregation:
select date_format(dt, '%Y-%m-%d') dt_day, sum(amount) value
from (
select modified_datetime dt, data ->> '$.amount' amount from transactions
union all
select created_datetime, data ->> '$.amount' from transactions
) t
group by dt_day
order by dt_day

Mysql - Accumulatively count the total on a row by row basis

I'm trying in MySQL to count the number of users created each day and then get a cumulative figure on a row-by-row basis. I have followed other suggestions on here, but I cannot seem to get the accumulation right.
The problem is that it keeps counting from the base number of 200 and does not take previous rows into account.
Whereas I would expect it to return:
My Sql is as follows;
SELECT day(created_at), count(*), (@something := @something+count(*)) as value
FROM myTable
CROSS JOIN (SELECT @something := 200) r
GROUP BY day(created_at);
To create the table and populate it, you can use:
CREATE TABLE myTable (
id INT AUTO_INCREMENT,
created_at DATETIME,
PRIMARY KEY (id)
);
INSERT INTO myTable (created_at)
VALUES ('2018-04-01'),
('2018-04-01'),
('2018-04-01'),
('2018-04-01'),
('2018-04-02'),
('2018-04-02'),
('2018-04-02'),
('2018-04-03'),
('2018-04-03');
You can view this on SqlFiddle.
Use a subquery:
SELECT day, cnt, (@s := @s + cnt)
FROM (SELECT day(created_at) as day, count(*) as cnt
FROM myTable
GROUP BY day(created_at)
) d CROSS JOIN
(SELECT @s := 0) r;
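On MySQL 8.0+, a window `SUM` over the per-day subquery gives the same cumulative figure without variables. Here it runs on the question's sample data in an in-memory SQLite database, which stands in for MySQL. The helper name `cumulative_counts` and its `start` parameter are hypothetical. `start` shows how the question's base of 200 could be added. SQLite's `date()` replaces MySQL's `day()`, which also keeps days from different months apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER PRIMARY KEY AUTOINCREMENT, created_at TEXT)")
conn.executemany(
    "INSERT INTO myTable (created_at) VALUES (?)",
    [(d,) for d in ["2018-04-01"] * 4 + ["2018-04-02"] * 3 + ["2018-04-03"] * 2],
)


def cumulative_counts(conn, start=0):
    """Per-day counts plus a running total that begins at `start` (hypothetical helper)."""
    query = """
    SELECT day, cnt, ? + SUM(cnt) OVER (ORDER BY day) AS running_total
    FROM (SELECT date(created_at) AS day, COUNT(*) AS cnt
          FROM myTable
          GROUP BY date(created_at)) AS d
    ORDER BY day
    """
    return conn.execute(query, (start,)).fetchall()


print(cumulative_counts(conn))
print(cumulative_counts(conn, 200))
```

The grouping happens in the subquery and the accumulation happens outside it, which is the same split the variable-based answer uses.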
GROUP BY and user variables have not worked reliably together for a long time. In more recent versions, ORDER BY also needs to be applied in a subquery.

How to get rows with max(of a column) when using group by

When I tried:
select *
from some_table
group by table_id
order by timestamp desc;
I am getting rows that have the lowest timestamp values for that particular table_id (which I use for grouping).
How can I get the rows that have the highest timestamp value for each table_id?
I also tried:
select *
from some_table
group by table_id
having max(timestamp)
order by timestamp desc;
which gives the same result as in the 1st case.
Thank You
select *
from your_table t
inner join
(
select table_id, max(created_timestamp) as mts
from your_table
group by table_id
) x on x.table_id = t.table_id
and x.mts = t.created_timestamp
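Here is the join-back-to-the-max approach run on small hypothetical data in an in-memory SQLite database, which stands in for MySQL. `payload` represents the table's other columns. If two rows tie on the latest timestamp for a table_id, both are returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; `payload` stands in for the table's other columns.
conn.execute("CREATE TABLE your_table (table_id INTEGER, created_timestamp TEXT, payload TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [(1, "2020-01-01 08:00:00", "old"), (1, "2020-01-03 08:00:00", "new"),
     (2, "2020-01-02 08:00:00", "only")],
)

# x holds the latest timestamp per table_id; joining on both columns keeps
# only the row(s) that carry that timestamp.
query = """
SELECT t.table_id, t.created_timestamp, t.payload
FROM your_table t
INNER JOIN (SELECT table_id, MAX(created_timestamp) AS mts
            FROM your_table
            GROUP BY table_id) x
  ON x.table_id = t.table_id AND x.mts = t.created_timestamp
ORDER BY t.table_id
"""
latest = conn.execute(query).fetchall()
print(latest)
```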
In MySQL you can do
select *, max(created_timestamp) as mts
from your_table
group by table_id
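but that does not guarantee that the other columns come from the row holding max(created_timestamp); they may come from any row with that table_id. With ONLY_FULL_GROUP_BY (enabled by default since MySQL 5.7.5), this query is rejected outright.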
SELECT * FROM (SELECT * FROM some_table ORDER BY timestamp DESC) t1 GROUP BY t1.table_id
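Sorting in a derived table and then grouping relies on GROUP BY keeping the first row it sees. MySQL does not guarantee that behavior, and ONLY_FULL_GROUP_BY rejects the query. On MySQL 8.0+, `ROW_NUMBER()` states the intent directly. Here is a sketch on in-memory SQLite (3.25+) as a stand-in, with hypothetical sample data in which `payload` represents the other columns. Unlike the join approach, this returns exactly one row per table_id even when timestamps tie; which tied row wins is then arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; `payload` stands in for the table's other columns.
conn.execute("CREATE TABLE some_table (table_id INTEGER, timestamp TEXT, payload TEXT)")
conn.executemany(
    "INSERT INTO some_table VALUES (?, ?, ?)",
    [(1, "2020-01-01 08:00:00", "old"), (1, "2020-01-03 08:00:00", "new"),
     (2, "2020-01-02 08:00:00", "only")],
)

# Number the rows of each table_id from newest to oldest and keep the first.
query = """
SELECT table_id, timestamp, payload
FROM (SELECT *,
             ROW_NUMBER() OVER (PARTITION BY table_id ORDER BY timestamp DESC) AS rn
      FROM some_table) AS t1
WHERE rn = 1
ORDER BY table_id
"""
latest = conn.execute(query).fetchall()
print(latest)
```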