Cannot cumulatively sum `COUNT(*)` - mysql

The second section of this answer uses variables to create a cumulative sum of another column. I'm doing the same thing, except that I am using a GROUP BY statement, and summing COUNT(*) instead of a column. Here is my code to create a minimal table and insert values:
CREATE TABLE `test_group_cumulative` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`group_id` int(11) unsigned NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `test_group_cumulative` (`id`, `group_id`)
VALUES
(1, 1),
(2, 2),
(3, 3);
And here is the code that is failing:
SELECT
`group_id`,
COUNT(*) AS `count`,
@count_cumulative := @count_cumulative + COUNT(*) AS `count_cumulative`
FROM `test_group_cumulative` AS `tgc`
JOIN (SELECT @count_cumulative := 0) AS `_count_cumulative`
GROUP BY `group_id`
ORDER BY `id`;
Here is the result:
group_id count count_cumulative
1 1 1
2 1 1
3 1 1
As you can see, count_cumulative is NOT summing correctly. However, here's the weird part. If I replace the COUNT(*) in count_cumulative with its value, 1, the query works correctly.
@count_cumulative := @count_cumulative + 1 AS `count_cumulative`
Here is the correct result:
group_id count count_cumulative
1 1 1
2 1 2
3 1 3
Obviously, in my app, there will be more than one item in each group, so COUNT(*) won't always be 1. I know there are ways to do this with joins or subqueries, and I'll do that if I have to, but in my mind this SHOULD work. So why isn't COUNT(*) working inside of a cumulative sum?

I agree with @Ashalynd: the value of COUNT(*) has not been evaluated yet. Here is a little experiment I did:
1.
SELECT
GROUP_ID,
@COUNTER := @COUNTER + COUNT(*) GROUPCOUNT,
@COUNTER COUNTER
FROM
TEST_GROUP_CUMULATIVE,
(SELECT @COUNTER := 0) R
GROUP BY
GROUP_ID;
-- RESULT
============
GROUP_ID GROUPCOUNT COUNTER
------------------------------------
1 1 0
2 1 0
3 1 0
2.
SELECT @COUNTER;
-- RESULT
=============
@COUNTER
--------
1
For each group the variable is being initialized as 0. This means COUNT(*) has not been evaluated yet.
Also, when you do:
1.
SELECT
GROUP_ID,
@COUNTER := @COUNTER + 1 GROUPCOUNT,
@COUNTER COUNTER
FROM
TEST_GROUP_CUMULATIVE,
(SELECT @COUNTER := 0) R
GROUP BY
GROUP_ID;
-- RESULT
============
GROUP_ID GROUPCOUNT COUNTER
------------------------------------
1 1 1
2 1 2
3 1 3
2.
SELECT @COUNTER;
-- RESULT
=============
@COUNTER
--------
3
It does not have to evaluate the 1 first; it adds it directly, which gives you the cumulative sum.

This is a problem I often face when doing time series analysis. My preferred way to tackle it is to wrap the query in a second SELECT and introduce the counter in the last layer. You can adapt this technique to more complicated data flows using temporary tables, if required.
I did this small sqlfiddle using the schema you present: http://sqlfiddle.com/#!2/cc97e/21
And here is the query to get the cumulative count:
SELECT
tgc.group_id, @count_cumulative := @count_cumulative + cnt as cum_cnt
FROM (
SELECT
group_id, COUNT(*) AS cnt
FROM `test_group_cumulative`
group by group_id
order by id) AS `tgc`,
(SELECT @count_cumulative := 0) AS `temp_var`;
This is the result I get:
GROUP_ID CUM_CNT
1 1
2 2
3 3
The reason your attempt did not work:
When you do a GROUP BY with the user variable, MySQL executes the individual groups independently, and at that time each group is assigned the variable's current value, which in this case is 0.
If, you ran this query:
SELECT @count_cumulative;
immediately after
SELECT
`group_id`,
COUNT(*) AS `count`,
@count_cumulative := @count_cumulative + COUNT(*) AS `count_cumulative`
FROM `test_group_cumulative` AS `tgc`
JOIN (SELECT @count_cumulative := 0) AS `_count_cumulative`
GROUP BY `group_id`
ORDER BY `id`;
you would get the value 1. For each of your groups, @count_cumulative is being reset to 0.
Hence, in my proposed solution, I circumvent this issue by generating the 'group-counts' first and then doing the accumulation.
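To make the evaluation order concrete, here is a small Python sketch (not MySQL) of the same two-phase approach: compute the per-group counts first, then take a running total over them, mirroring the inner and outer SELECTs above:

```python
from itertools import accumulate

# Rows from the example table: (id, group_id)
rows = [(1, 1), (2, 2), (3, 3)]

# Inner SELECT: GROUP BY group_id -> one count per group
counts = {}
for _id, group_id in rows:
    counts[group_id] = counts.get(group_id, 0) + 1
group_counts = sorted(counts.items())

# Outer SELECT: running total over the already-computed counts
running = list(accumulate(cnt for _gid, cnt in group_counts))
result = [(gid, tot) for (gid, _cnt), tot in zip(group_counts, running)]
print(result)  # [(1, 1), (2, 2), (3, 3)]
```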

Related

User-defined variable in ranking statement

I will appreciate any help on this issue. I have already spent hours without finding a real solution.
I have this SQL query:
SELECT to_place, rank
FROM
(SELECT g1.to_place as to_place, g1.pcount as pcount,
@rank := IF(@current_to_place = g1.to_place, @rank + 1, 1) AS rank,
@current_to_place := g1.to_place
FROM
(select
to_place, count(*) as pcount
from temp_workflows
group by to_place
order by to_place,pcount desc) g1
ORDER BY g1.to_place, g1.pcount DESC) ranked
In subquery g1, I am grouping my data to find the most common occurrences of to_place. Then I want to rank those occurrences in ascending order (so I can later select the top 3 most common occurrences per to_place category).
The issue is that the user-defined variable is unpredictable (@rank sometimes stays at 1), which is probably related to the fact that I should not reference and assign the same variable (@current_to_place) in one statement. I read a lot about using separate statements etc., but I could not find a way to write my statement differently. How can I define @current_to_place elsewhere so the result stays the same?
Thanks in advance for your help.
I think you should be testing pcount to get the rank, and you should initialise the variables:
DROP TABLE IF EXISTS T;
CREATE TABLE T
(to_place int);
insert into t values (1),(2),(2),(3),(3),(3);
SELECT to_place, rank
FROM
(
SELECT g1.to_place as to_place, g1.pcount as pcount,
@rank := IF(@current_to_place <> pcount, @rank + 1, 1) AS rank,
@current_to_place := pcount
FROM
(select
to_place, count(*) as pcount
from t
group by to_place
order by to_place,pcount desc) g1
cross join(select @rank:=0,@current_to_place:=0) r
ORDER BY g1.pcount DESC
)
ranked
+----------+------+
| to_place | rank |
+----------+------+
| 3 | 1 |
| 2 | 2 |
| 1 | 3 |
+----------+------+
3 rows in set (0.016 sec)
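As a sanity check of the logic (using the same made-up rows as the test table above, not the asker's real data), the rank-by-count idea can be sketched in Python; with no tied counts this matches the result set shown:

```python
from collections import Counter

# Hypothetical to_place values, matching the test rows above
to_places = [1, 2, 2, 3, 3, 3]
pcounts = Counter(to_places)

# ORDER BY pcount DESC, then assign ranks; with no tied counts this
# behaves like the @rank counter that increments on each new pcount
ordered = sorted(pcounts.items(), key=lambda kv: -kv[1])
ranked = [(place, rank) for rank, (place, _cnt) in enumerate(ordered, start=1)]
print(ranked)  # [(3, 1), (2, 2), (1, 3)]
```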

SELECT without duplicates unless interrupted by other values

Given the following data:
name | temp
-----------
hoi | 15
hoi | 15
hoi | 16
hoi | 15
hej | 13
hoi | 13
I would like to select the data in the given two columns without duplicates. However, I do want to keep duplicates if they were interrupted by another value:
name | temp
-----------
hoi | 15 // selected
hoi | 15 // ignored duplicate
hoi | 15 // ignored duplicate
hoi | 16 // selected
hoi | 15 // selected because while not being unique it follows a different value
hoi | 15 // ignored duplicate
hej | 13 // selected
hoi | 13 // selected
hoi | 13 // ignored duplicate
hoi | 14 // selected
hoi | 13 // selected because while not being unique it follows a different value
This question was hard for me to formulate given English is not my native tongue. Feel free to edit the question or ask for clarification.
Edit:
There is an id field and a datetime field.
Edit 2:
I use MySQL 5.7.
Since you are using MySQL 5.7, which doesn't support analytical functions, you will need to use variables to store the values of temp and name, from the previous row:
SELECT t.ID,
t.Name,
t.Temp
FROM ( SELECT t.*,
IF(@temp = t.temp AND @name = t.Name, 1, 0) AS IsDuplicate,
@temp:= t.temp,
@name:= t.Name
FROM YourTable AS t
CROSS JOIN (SELECT @temp := 0, @name := '') AS v
ORDER BY t.ID
) AS t
WHERE t.IsDuplicate = 0
ORDER BY ID;
Example on DB<>Fiddle
The key parts are (not in the order in which they appear, but in the order in which it is logical to think about it).
(1) Initialise the variables, and order by ID (or whatever field(s) you like) to ensure variables are assigned in the correct order
CROSS JOIN (SELECT @temp := 0, @name := '') AS v
ORDER BY t.ID
(2) Check whether the values stored in the variables match the current row, and flag with a 1 or a 0:
IF(@temp = t.temp AND @name = t.Name, 1, 0) AS IsDuplicate
(3) Assign the values of temp and name in the current row to the variables, so they can be checked against the next row:
@temp:= t.temp,
@name:= t.Name
(4) Remove duplicates from the final data set:
WHERE t.IsDuplicate = 0;
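The variable-based duplicate check boils down to comparing each row with the previous one; here is a minimal Python sketch of that logic, using made-up sample rows:

```python
# Made-up sample rows: (id, name, temp), already in id order
rows = [
    (1, "hoi", 15), (2, "hoi", 15), (3, "hoi", 16),
    (4, "hoi", 15), (5, "hej", 13), (6, "hoi", 13),
]

kept = []
prev = None  # plays the role of the @name/@temp variables
for row_id, name, temp in rows:
    if (name, temp) != prev:  # IsDuplicate = 0
        kept.append((row_id, name, temp))
    prev = (name, temp)       # remember this row for the next one
print(kept)
```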
To go one further, you could change the IsDuplicate flag to be a group marker, and use GROUP BY, so you can find out how many records there were in total, while still not displaying duplicates:
SELECT MIN(ID) AS FirstID,
t.Name,
t.Temp,
COUNT(*) AS Records,
MAX(ID) AS LastID
FROM ( SELECT t.*,
@group:= IF(@temp = t.temp AND @name = t.Name, @group, @group + 1) AS GroupID,
@temp:= t.temp,
@name:= t.Name
FROM YourTable AS t
CROSS JOIN (SELECT @temp := 0, @name := '', @group:= 0) AS v
ORDER BY t.ID
) AS t
GROUP BY t.GroupID, t.Name, t.Temp
ORDER BY t.GroupID;
Example on DB<>Fiddle
This may be surplus to requirements, but it can be useful as you are able to extract a lot more information than when just identifying duplicate rows.
Finally if/when you upgrade to version 8.0 or newer, you will be able to use ROW_NUMBER(), or if you move to any other DBMS that supports ROW_NUMBER() (which is most nowadays), then you can use the following:
SELECT MIN(ID) AS FirstID,
t.Name,
t.Temp,
COUNT(*) AS Records,
MAX(ID) AS LastID
FROM ( SELECT t.*,
ROW_NUMBER() OVER(ORDER BY ID) -
ROW_NUMBER() OVER(PARTITION BY Temp, Name ORDER BY ID) AS GroupID
FROM YourTable AS t
ORDER BY t.ID
) AS t
GROUP BY t.GroupID, t.Name, t.Temp
ORDER BY t.GroupID;
Example on DB<>Fiddle
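The row-number-difference trick can be simulated outside SQL as well; this Python sketch (made-up rows) computes the same GroupID key and the per-run FirstID/LastID/Records summary:

```python
# Made-up sample rows: (id, name, temp), in id order
rows = [(1, "hoi", 15), (2, "hoi", 15), (3, "hoi", 16),
        (4, "hoi", 15), (5, "hej", 13), (6, "hoi", 13)]

# ROW_NUMBER() OVER (ORDER BY id) minus
# ROW_NUMBER() OVER (PARTITION BY name, temp ORDER BY id)
# is constant inside each consecutive run, so it works as a group key.
per_partition = {}
groups = {}
for overall_rn, (row_id, name, temp) in enumerate(rows, start=1):
    per_partition[(name, temp)] = per_partition.get((name, temp), 0) + 1
    key = (name, temp, overall_rn - per_partition[(name, temp)])
    groups.setdefault(key, []).append(row_id)

# (FirstID, LastID, Records) per consecutive run
summary = [(ids[0], ids[-1], len(ids)) for ids in groups.values()]
print(summary)
```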
Generic Solution
You can use the following query to do this on any DBMS:
select nd.*
from dedup nd
inner join (
-- find the previous id for each id
select id, (select max(id) from dedup where id < o.id) prev_id
from dedup o
) id_to_prev on id_to_prev.id = nd.id
-- join with the prev row to check for dups
left join dedup d on d.id = id_to_prev.prev_id
and d.name = nd.name
and d.temp = nd.temp
where d.id is null -- if no prev row found with same name+temp, include this row
order by nd.id
SQL Fiddle: http://sqlfiddle.com/#!9/0584ca3/9
If You are using Oracle:
select name, temp from (
select id,
name,
temp,
lag(temp,1,-99999) over (order by id) as temp_prev
from table
order by id) t
where t.temp != t.temp_prev
might work for you (depending on your Oracle version!); it uses the LAG analytic function to look at the previous row's value, builds a derived table, then filters it.
create table #temp (name varchar(3),temp int)
insert into #temp values ('hoi',15)
insert into #temp values ('hoi',15)
insert into #temp values ('hoi',15)
insert into #temp values ('hoi',16)
insert into #temp values ('hoi',15)
insert into #temp values ('hoi',15)
insert into #temp values ('hej',13)
insert into #temp values ('hoi',13)
insert into #temp values ('hoi',13)
insert into #temp values ('hoi',14)
insert into #temp values ('hoi',13)
;with FinalResult as (
select ROW_NUMBER()Over(partition by name,temp order by name) RowNumber,*
from #temp
)
select * from FinalResult where RowNumber =1
drop table #temp
You want to look at the previous row in order to decide whether to show a row or not. This would be easy with LAG, available as of MySQL 8. With MySQL 5.7 you need a correlated subquery with LIMIT instead to get the previous row.
select *
from mytable
where not (name, temp) <=>
(
select prev.name, prev.temp
from mytable prev
where prev.id < mytable.id
order by prev.id desc
limit 1
);
Demo: https://dbfiddle.uk/?rdbms=mysql_5.7&fiddle=4c775dbee12298cd93c5087d7085982f

MySQL count distinct data depending on IF condition

I have a rather different problem with a database query. Here is a slightly different scenario:
I have a table with 3 columns: ID, ItemId and TypeId. I need a count query that counts ItemId and TypeId together, but ignores duplicate rows. For example;
Id ItemId TypeId
-- ------ ------
1 1 1 -> count +1
2 1 1 -> ignore
3 1 2 -> count -1
4 1 2 -> ignore
5 1 1 -> count +1
result count = 1
In the end, if a distinct row is repeated, the count ignores that row. But when the TypeId changes for a specific Item, it should increase or decrease the count: TypeId equal to 1 counts +1, TypeId equal to 2 counts -1.
In MySQL, you would seemingly use count(distinct):
select count(distinct itemId, typeId)
from t;
However, you really have a gaps-and-islands problem. You are looking at the ordering to see where things change.
If I trust that the id has no gaps, you can do:
select count(*)
from t left join
t tprev
on t.id = tprev.id + 1
where not ((t.itemId, t.typeid) <=> (tprev.itemId, tprev.typeid))
Try the following query. This employs User-defined session variables. It will work in all the cases (including gaps in Id):
SELECT
SUM(dt.factor) AS total_count
FROM
( SELECT
@factor := IF(@item = ItemId AND
@type = TypeId,
0,
IF(TypeID = 2, -1, 1)
) AS factor,
@item := ItemId,
@type := TypeId
FROM your_table
CROSS JOIN (SELECT @item := 0,
@type := 0,
@factor := 0) AS user_init_vars
ORDER BY Id
) AS dt
DB Fiddle DEMO
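The factor logic reduces to: walk the rows in Id order, skip consecutive duplicates, and add +1 or -1 depending on TypeId. A small Python sketch over the question's sample rows:

```python
# The question's sample rows: (Id, ItemId, TypeId), in Id order
rows = [(1, 1, 1), (2, 1, 1), (3, 1, 2), (4, 1, 2), (5, 1, 1)]

total = 0
prev = None  # stands in for the @item/@type session variables
for _id, item, type_id in rows:
    if (item, type_id) != prev:              # value changed -> count this row
        total += -1 if type_id == 2 else 1   # TypeId 2 subtracts, else adds
    prev = (item, type_id)
print(total)  # 1
```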

Update all names in a column incremental

I have a column "Customer". I would like to update all the names of the rows as such "Name1", "Name2", ... , "NameX".
If I do
UPDATE Customers
SET ContactName='Name1';
It sets every row to 'Name1'. How can I do this incremental? +1 for every name.
Try this
update Customers,(SELECT @n := 0) m set ContactName = concat('Name', @n := @n + 1);
set @i = 0;
update Customers
set ContactName=concat('Name', @i := @i+1)
SQL Fiddle
I'm not too versed in SQL, but you should be able to use a WHILE loop to simulate a FOR loop. Generally:
DECLARE @cnt INT = 0;
WHILE @cnt < cnt_total
BEGIN
{...statements...}
SET @cnt = @cnt + 1;
END;
where 'cnt_total' is the total number of loops you want to perform and 'statements' is the set of actions you want to perform during each iteration. You should be able to adapt this to your problem.
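Whatever mechanism you use (variable or loop), the effect is just numbering rows in order. A quick Python sketch over a hypothetical customer list:

```python
# Hypothetical customer list standing in for the Customers table
customers = ["Alice", "Bob", "Carol"]

# Same effect as ContactName = CONCAT('Name', @n := @n + 1):
renamed = [f"Name{i}" for i, _name in enumerate(customers, start=1)]
print(renamed)  # ['Name1', 'Name2', 'Name3']
```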
Here's my solution in PostgreSQL:
In our case, we had some sales simulations with repeated names and our client asked us to prevent same user_id with repeated simulation name.
We had something like the following (sample simulation table):
id uid name
1 1 simu
2 1 simu
3 2 test
4 2 simu
5 2 test
6 2 simu
which had to be turned into this:
id uid name
1 1 simu - 1
2 1 simu - 2
3 2 test - 1
4 2 simu - 1
5 2 test - 2
6 2 simu - 2
-- Create a table to store duplicated names plus an index
-- row_number() over (partition by s1.name) will create a name with a number
-- for each duplicated name
create table sim_tmp_name as
select *
from (select s1.id, s1.name || ' - ' || row_number() over (partition by s1.name) as name
from simulation s1
where s1.id in (
select t1.id
from (
select *,
row_number() over (order by s.name)
from simulation s
where s.name in (
select t.name
from (select app_user_id, name, count(1)
from simulation
group by 1, 2
having count(1) > 1
order by 3 desc) as t)
order by s.name) as t1
)) as sn;
-- Here we update our real table with the brand-new generated names in sim_tmp_name
update simulation s1
set name = (select s2.name from sim_tmp_name s2 where s2.id = s1.id)
where s1.id in (select s2.id from sim_tmp_name s2);
-- Here we create an index to avoid duplications
create unique index simulation_id_name_idx on simulation (app_user_id, upper(name));
-- Here we drop the temporary table
drop table sim_tmp_name;
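The renaming step is essentially a row_number() per partition appended as a suffix; here is a Python sketch over the sample rows, partitioning by (uid, name), which is what the expected output implies:

```python
# Sample rows from the answer: (id, uid, name), in id order
rows = [(1, 1, "simu"), (2, 1, "simu"), (3, 2, "test"),
        (4, 2, "simu"), (5, 2, "test"), (6, 2, "simu")]

# row_number() per (uid, name) partition, appended as a suffix
seen = {}
renamed = []
for row_id, uid, name in rows:
    seen[(uid, name)] = seen.get((uid, name), 0) + 1
    renamed.append((row_id, uid, f"{name} - {seen[(uid, name)]}"))
print(renamed)
```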

MySQL: how to select the Nth value of each group with GROUP BY

I want to select the 2nd response column value of each new_threads group, with a zero as the value if it is a group of 1 row.
new_treads|response
------------------
1 | 0
1 | 1
2 | 0
2 | 0
2 | 1
... | ...
9 | 0
9 | 1
9 | 0
10 | 0
The output being:
new_treads|response
------------------
1 | 1
2 | 0
... | ...
9 | 1
10 | 0
So far, I understand how to get the first with MIN, but I need the 2nd
SELECT
thread,
min(response)
FROM messages
GROUP BY thread;
I would like to use GROUP BY because I'm using GROUP BY for other SELECTs as well
Thanks!
Since the rows are not "numbered", you need to create a number for each group and then select it. I'd do that with user variables:
select thread, response
from (
select @n := (case
when m.thread = @prev_thread then @n
else 0
end) + 1 as n -- If the current thread is the same as the
-- previous row, then increase the counter,
-- else, reset it
, @prev_thread := m.thread as thread -- Update the value of
-- @prev_thread
, m.response
from
(select @n := 0, @prev_thread := 0) as init
-- The 'init' subquery initializes the
-- temp variables:
-- @n is a counter
-- @prev_thread is an identifier for the
-- previous thread id
, messages as m
order by m.thread -- You need to add a second column to order
-- each response (such as "response_id", or
-- something like that), otherwise the returned
-- row may be a random one
) as a
where n = 2; -- Select the rows from the subquery where the counter equals 2
The above works quite fine to find the 2nd row of each group, but only if there's one. So now: how to get a NULL value if there isn't a second row?
The easiest way to do this would be to use a left join:
select t.thread, b.response
from (select distinct thread from messages) as t
left join (
-- Put the above query here
) as b on t.thread = b.thread;
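Putting both pieces together, the logic is: number the rows within each thread, keep row number 2, and fall back to 0 for single-row threads. A Python sketch over hypothetical rows:

```python
# Hypothetical (thread, response) rows, already ordered within each thread
rows = [(1, 0), (1, 1), (2, 0), (2, 0), (2, 1),
        (9, 0), (9, 1), (9, 0), (10, 0)]

# Number rows within each thread (the @n counter), keep n == 2,
# and fall back to 0 for one-row threads (the LEFT JOIN step)
position = {}
second = {}
for thread, response in rows:
    position[thread] = position.get(thread, 0) + 1
    if position[thread] == 2:
        second[thread] = response
result = [(t, second.get(t, 0)) for t in position]
print(result)  # [(1, 1), (2, 0), (9, 1), (10, 0)]
```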
SELECT
thread,
min(response)
FROM messages
GROUP BY thread
HAVING response > min(response)
Try this; I just want to know if it works.
I would like to elaborate on the answer above. While it worked well for me, it took some time to piece together the context and generalize it so I could apply it to my code. I hope that this answer will better generalize what is laid out above...
SELECT *
FROM (SELECT DISTINCT keyField        -- keyField is the field the query is grouping by
      FROM TABLE
      -- Add keyField constraint --
      -- Add non-keyField constraint --
     ) p
INNER JOIN (SELECT *,
            @n := (CASE                           -- Iterating through...
                   WHEN keyField = @prev_keyField -- When keyField value == previous keyField value
                   THEN @n + 1                    -- n is the row number within the group
                   ELSE 1                         -- When keyField value != previous keyField value, n is the 1st row in the group
                   END) AS n,
            @prev_keyField := keyField            -- Remember the keyField value for the next iteration
            FROM (SELECT @n := 0, @prev_keyField := 0) r, TABLE
            -- Add non-keyField constraint --
            ORDER BY keyField, sortField DESC     -- Order by keyField and the field you are sorting by,
                                                  -- e.g. keyField could be `thread`,
                                                  -- and sortField could be `timestamp` if you are sorting by time
           ) s ON s.keyField = p.keyField
WHERE s.n = 2                                     -- Define which row in the group you want in the query