How to increment count of occurrences of column value in MySQL

I have the following column names:
customer_email
increment_id
other_id (pseudo name)
created_at
increment_id and other_id will be unique; customer_email will have duplicates. As the results are returned, I want to know which occurrence of the email each row is.
For each row, I want to know how many times the customer_email value has shown up so far. There will be an order by clause at the end on the created_at field, and I also plan to add a where clause of occurrences < 2.
I am querying > 5 million rows but performance isn't too important because I'll be running this as a report on a read-replica database from production. In my use case, I will sacrifice performance for robustness.
| customer_email | increment_id | other_id | created_at          | occurrences <- I want this |
|----------------|--------------|----------|---------------------|----------------------------|
| joe@test.com   | 1            | 81       | 2019-11-00 00:00:00 | 1                          |
| sue@test.com   | 2            | 82       | 2019-11-00 00:01:00 | 1                          |
| bill@test.com  | 3            | 83       | 2019-11-00 00:02:00 | 1                          |
| joe@test.com   | 4            | 84       | 2019-11-00 00:03:00 | 2                          |
| mike@test.com  | 5            | 85       | 2019-11-00 00:04:00 | 1                          |
| sue@test.com   | 6            | 86       | 2019-11-00 00:05:00 | 2                          |
| joe@test.com   | 7            | 87       | 2019-11-00 00:06:00 | 3                          |

You can use variables in earlier versions of MySQL:
select t.*,
       (@rn := if(@ce = customer_email, @rn + 1,
                  if(@ce := customer_email, 1, 1)
                 )
       ) as occurrences
from (select t.*
      from t
      order by customer_email, created_at
     ) t cross join
     (select @ce := '', @rn := 0) params;
In MySQL 8+, I would recommend row_number():
select t.*,
row_number() over (partition by customer_email order by created_at) as occurrences
from t;
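
To check the row_number() version against the sample rows from the question, here is a minimal sketch in Python. It assumes SQLite (3.25 or later) as a stand-in for MySQL 8, and the in-memory table is only for illustration; the window function itself is written the same way in both.

```python
import sqlite3

# Assumption: SQLite (3.25+) stands in for MySQL 8; the in-memory table t is
# only for illustration. Timestamps are copied from the question as plain text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (customer_email text, increment_id integer, created_at text)"
)
conn.executemany(
    "insert into t values (?, ?, ?)",
    [
        ("joe@test.com", 1, "2019-11-00 00:00:00"),
        ("sue@test.com", 2, "2019-11-00 00:01:00"),
        ("bill@test.com", 3, "2019-11-00 00:02:00"),
        ("joe@test.com", 4, "2019-11-00 00:03:00"),
        ("mike@test.com", 5, "2019-11-00 00:04:00"),
        ("sue@test.com", 6, "2019-11-00 00:05:00"),
        ("joe@test.com", 7, "2019-11-00 00:06:00"),
    ],
)

# Number each email's rows in created_at order, then list them chronologically.
occurrences = conn.execute(
    """
    select customer_email, increment_id,
           row_number() over (partition by customer_email
                              order by created_at) as occurrences
    from t
    order by created_at
    """
).fetchall()

for row in occurrences:
    print(row)
```

The third column should come out as 1, 1, 1, 2, 1, 2, 3, matching the desired column in the question.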

If you are running MySQL 8.0, you can just do a window count:
select
t.*,
count(*) over(partition by customer_email order by created_at) occurrences
from mytable t
You don't need an order by clause at the end of the query for this to work (but you need one if you want to order the results).
If you need to filter on the results of the window count, an additional level is needed, since window functions cannot be used in the where clause of a query:
select *
from (
select
t.*,
count(*) over(partition by customer_email order by created_at) occurrences
from mytable t
) t
where occurrences < 2
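
One thing worth checking before relying on the filter: with an order by inside the window, count(*) uses the default frame, which includes every row tied on created_at, while row_number() always numbers rows 1, 2, 3. The sketch below shows the difference on made-up rows. It assumes SQLite (3.25+) as a stand-in for MySQL 8, and the two rows sharing a timestamp are hypothetical.

```python
import sqlite3

# Assumption: SQLite (3.25+) stands in for MySQL 8; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (customer_email text, created_at text)")
conn.executemany(
    "insert into mytable values (?, ?)",
    [
        ("joe@test.com", "2019-11-01 00:00:00"),
        ("joe@test.com", "2019-11-01 00:00:00"),  # same timestamp: a tie
        ("joe@test.com", "2019-11-01 00:05:00"),
    ],
)

# Compare the running count with row_number() over the same window.
ties = conn.execute(
    """
    select created_at,
           count(*) over (partition by customer_email
                          order by created_at) as cnt,
           row_number() over (partition by customer_email
                              order by created_at) as rn
    from mytable
    order by created_at, rn
    """
).fetchall()

for row in ties:
    print(row)
```

Both tied rows get a count of 2, so a filter like `< 2` would drop them both; if exactly one "first" row per email is needed, row_number() is probably the safer choice.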

Related

How to select all entities grouped by common property without loop?

Is there any way to fetch all entities from a table grouped by a common property without a loop?
Table storage looks like this
id | product_id | category_id
-----------+-----------------------+--------------------------
1 | 1 | 15
2 | 2 | 17
3 | 3 | 18
4 | 4 | 17
5 | 5 | 15
6 | 6 | 17
7 | 7 | 18
and the final result is supposed to look like this
id | product_id | category_id
-----------+-----------------------+--------------------------
1 | 1 | 15
2 | 2 | 17
3 | 3 | 18
4 | 5 | 15
5 | 4 | 17
6 | 7 | 18
7 | 6 | 15
What I want is this:
Select each record grouped by category id. That is, if the table size is 3200, I need to select all 3200 records grouped by category id in ASC order.
You seem to want the values interleaved. You can use row_number() in the order by:
select s.*
from storage s
order by row_number() over (partition by category_id order by id),
         category_id;
Here is a db<>fiddle.
EDIT:
In older versions of MySQL, you can assign a sequential number within each group using variables and then use that for ordering:
select s.*
from (select s.*,
             (@rn := if(@cid = category_id, @rn + 1,
                        if(@cid := category_id, 1, 1)
                       )
             ) as seqnum
      from (select s.* from storage s order by category_id, id) s cross join
           (select @rn := 0, @cid := -1) params
     ) s
order by seqnum, id;
The SQL Fiddle has both methods.
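
For anyone who wants to try the interleaving without a MySQL server, here is a rough sketch in Python. It assumes SQLite (3.25+) as a stand-in, uses the column names from the question's table (id, product_id, category_id), and computes the row number in a subquery before sorting on it.

```python
import sqlite3

# Assumption: SQLite (3.25+) stands in for MySQL 8; rows are the question's.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table storage (id integer primary key, product_id integer, "
    "category_id integer)"
)
conn.executemany(
    "insert into storage values (?, ?, ?)",
    [(1, 1, 15), (2, 2, 17), (3, 3, 18), (4, 4, 17),
     (5, 5, 15), (6, 6, 17), (7, 7, 18)],
)

# seqnum is the position of each row inside its category; sorting by it first
# takes one row from every category per "round".
interleaved = conn.execute(
    """
    select id, product_id, category_id
    from (select s.*,
                 row_number() over (partition by category_id
                                    order by id) as seqnum
          from storage s) s
    order by seqnum, category_id
    """
).fetchall()

for row in interleaved:
    print(row)
```

The category_id column should cycle 15, 17, 18, 15, 17, 18 and end with the leftover 17.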

How do I use row_number() with partitioning and without ordering?

My table looks like:
table_1
| Id | Num |
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 2 |
| 5 | 1 |
| 6 | 2 |
| 7 | 2 |
I want a row_number next to the 'num' column, but as soon as num changes its value, the row_number resets.
I want my table to look like:
| Id | Num | row_num |
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 1 | 3 |
| 4 | 2 | 1 |
| 5 | 1 | 1 |
| 6 | 2 | 1 |
| 7 | 2 | 2 |
One way to get your desired output is to use lag and a conditional sum to flag when the number changes, then you can use row_number and partition by this flag:
with lagNum as (
  select id, num, Lag(num) over(order by id) as v
  from table_1
), changed as (
  select id, num,
         Sum(case when num = v then 0 else 1 end) over(order by id rows unbounded preceding) as v
  from lagNum
)
select id, num, row_number() over(partition by v order by id) as row_num
from changed
Example Fiddle
This does require at least MySQL 8, which added support for window functions.
This is a type of gaps-and-islands problem. For this version, the simplest solution is probably to identify the islands using the difference of row numbers:
select t.*,
       row_number() over (partition by num, seqnum - seqnum_2 order by id) as row_num
from (select t.*,
             row_number() over (order by id) as seqnum,
             row_number() over (partition by num order by id) as seqnum_2
      from table_1 t
     ) t;
If you run the subquery, you will see how the difference identifies the "adjacent" values of num. Including num in the partition keeps islands with different values apart when their differences happen to coincide.
Note: If (as in your example) the ids are sequential with no gaps, you can simplify this to:
select t.*,
       row_number() over (partition by num, id - seqnum_2 order by id) as row_num
from (select t.*,
             row_number() over (partition by num order by id) as seqnum_2
      from table_1 t
     ) t;
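
Here is a small sketch for trying the difference-of-row-numbers idea outside MySQL. It assumes SQLite (3.25+) as a stand-in, and the helper name number_islands is made up for this example. The query partitions by num as well as by the difference, because two islands with different num values can end up with the same difference (the second, hypothetical input shows such a case).

```python
import sqlite3

# Assumption: SQLite (3.25+) stands in for MySQL 8.
QUERY = """
select id, num,
       row_number() over (partition by num, seqnum - seqnum_2
                          order by id) as row_num
from (select t.*,
             row_number() over (order by id) as seqnum,
             row_number() over (partition by num order by id) as seqnum_2
      from table_1 t) t
order by id
"""


def number_islands(rows):
    """Hypothetical helper: load (id, num) rows and return row_num per id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table table_1 (id integer primary key, num integer)")
    conn.executemany("insert into table_1 values (?, ?)", rows)
    return [row_num for _id, _num, row_num in conn.execute(QUERY)]


# Rows from the question.
question_rows = [(1, 1), (2, 1), (3, 1), (4, 2), (5, 1), (6, 2), (7, 2)]
print(number_islands(question_rows))

# Hypothetical rows where ids 2 and 3 share the same difference (1) but have
# different num values, so the num column is needed in the partition.
alternating_rows = [(1, 1), (2, 2), (3, 1)]
print(number_islands(alternating_rows))
```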

MySQL Query with distinct on a column value

I have this table (with 100k+ rows):
room_id | emote_id | count | since
----------------------------------------
1 | 22 | 718| 1577135778
1 | 23 | 124| 1577135178
1 | 24 | 842| 1577135641
2 | 22 | 124| 1577135748
2 | 23 | 345| 1577136441
2 | 24 | 43| 1577543578
3 | 22 | 94| 1572135778
3 | 23 | 4718| 1577135641
3 | 24 | 18| 1577134661
4 | 22 | 78| 1577125641
4 | 23 | 128| 1577135778
4 | 24 | 278| 1577132577
I want to get for each emote_id the row where count is the highest for this emote_id
So for this example I'd like to get this as response:
room_id | emote_id | count | since
----------------------------------------
1 | 22 | 718| 1577135778
3 | 23 | 4718| 1577135641
1 | 24 | 842| 1577135641
I'm stuck at building the query and need help. :(
You can use a nested subquery with max() aggregation:
select t1.*
from tab t1
join (select emote_id, max(count) as count
from tab
group by emote_id ) t2
on t1.emote_id = t2.emote_id
and t1.count = t2.count
For MySQL 8+, you can use window (analytic) functions such as dense_rank():
select room_id, emote_id, count, since
from
(
select t.*, dense_rank() over (partition by emote_id order by count desc) as dr
from tab t
) tt
where tt.dr = 1
Demo
With dense_rank(), every row that ties for the maximum count of an emote_id is returned. If the analytic function were row_number(), only one row per emote_id would be returned, even when there is a tie.
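
The tie behaviour can be seen with a quick sketch like the one below. It assumes SQLite (3.25+) as a stand-in for MySQL 8, uses a few rows from the question plus one hypothetical row (room 5) that ties with room 1 on emote 22, and the helper name top_rows is made up for this example.

```python
import sqlite3

# Assumption: SQLite (3.25+) stands in for MySQL 8.
conn = sqlite3.connect(":memory:")
conn.execute(
    'create table tab (room_id integer, emote_id integer, '
    '"count" integer, since integer)'
)
conn.executemany(
    "insert into tab values (?, ?, ?, ?)",
    [
        (1, 22, 718, 1577135778),
        (2, 22, 124, 1577135748),
        (3, 23, 4718, 1577135641),
        (1, 24, 842, 1577135641),
        (5, 22, 718, 1577000000),  # hypothetical row tying room 1 on emote 22
    ],
)


def top_rows(rank_fn):
    # rank_fn is "dense_rank" or "row_number". It is pasted into the SQL text,
    # so it must never come from user input.
    query = f"""
        select room_id, emote_id, "count"
        from (select t.*,
                     {rank_fn}() over (partition by emote_id
                                       order by "count" desc) as rk
              from tab t) tt
        where rk = 1
        order by emote_id, room_id
    """
    return conn.execute(query).fetchall()


with_ties = top_rows("dense_rank")   # keeps both rows tied at 718
one_each = top_rows("row_number")    # keeps exactly one row per emote_id
print(with_ties)
print(one_each)
```

Which of the two tied rows row_number() keeps is not defined unless a tie-breaker is added to the window's order by.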

How can I sum every N rows in MariaDB SQL?

I'm having trouble finding a way to perform a simple addition on a table in MariaDB SQL.
I have a table called Traffic:
| start_time | end_time | col1 | col2 |
| 1485075600.000000 | 1485075900.000000 | 10 | 20 |
| 1485075900.000000 | 1485076200.000000 | 20 | 30 |
| 1485076200.000000 | 1485076500.000000 | 40 | 50 |
| 1485076500.000000 | 1485076800.000000 | 50 | 60 |
How can I sum every N rows (over col1 and col2)?
I mean, to merge rows and sum the values of col1 and col2.
Assuming the given table and N = 2,
the result will be:
| start_time | end_time | col1 | col2|
| 1485075600.000000 | 1485076200.000000 | 30 | 50 |
| 1485076200.000000 | 1485076800.000000 | 90 | 110 |
If the table size isn't a multiple of N, take all you can.
Does anyone have any idea? I don't have ids to group by.
You would do this by enumerating the rows. To be correct, you need a column to specify the ordering -- SQL tables represent unordered sets, so they need a column for the ordering.
Let me assume it is start_time. The rest is just aggregation and arithmetic:
select min(start_time) as start_time, max(end_time) as end_time,
       sum(col1) as col1, sum(col2) as col2
from (select t.*, (@rn := @rn + 1) as rn
      from traffic t cross join
           (select @rn := 0) params
      order by start_time
     ) t
group by floor( (rn - 1) / @N);
The @N value is the size of the groups.
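
On servers with window functions (MariaDB 10.2+ or MySQL 8+), the same enumerate-then-group idea can be written with row_number() instead of a variable. Here is a minimal sketch that runs it through SQLite (3.25+) as a stand-in; it assumes ordering by start_time, and the timestamps are stored as integers to keep the example short.

```python
import sqlite3

N = 2  # size of each group of rows

# Assumption: SQLite (3.25+) stands in for MariaDB; rows are the question's.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table traffic (start_time integer, end_time integer, "
    "col1 integer, col2 integer)"
)
conn.executemany(
    "insert into traffic values (?, ?, ?, ?)",
    [
        (1485075600, 1485075900, 10, 20),
        (1485075900, 1485076200, 20, 30),
        (1485076200, 1485076500, 40, 50),
        (1485076500, 1485076800, 50, 60),
    ],
)

# Rows 1..N fall into bucket 0, rows N+1..2N into bucket 1, and so on.
# SQLite divides integers with truncation; MySQL and MariaDB need floor()
# (or the DIV operator) for the same effect.
summed = conn.execute(
    """
    select min(start_time), max(end_time), sum(col1), sum(col2)
    from (select t.*, row_number() over (order by start_time) as rn
          from traffic t) t
    group by (rn - 1) / ?
    order by 1
    """,
    (N,),
).fetchall()

for row in summed:
    print(row)
```

A trailing group with fewer than N rows is simply summed over whatever rows it has.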

Get the top n results per group [duplicate]

I am using the SQL below to retrieve the last 20 rows from the table, grouped by date. I would like to limit it so that within each post_day group only the top 10 rows by votes (descending) are selected.
SELECT *, DATE(timestamp) as post_day
FROM stories
ORDER BY post_day DESC, votes DESC
LIMIT 0, 20
This is what the table looks like:
| STORYID | TIMESTAMP  | VOTES |
|---------|------------|-------|
| 1       | 2015-03-10 | 1     |
| 2       | 2015-03-10 | 2     |
| 3       | 2015-03-9  | 5     |
| 4       | 2015-03-9  | 3     |
Schema
create table stories
( storyid int auto_increment primary key,
theDate date not null,
votes int not null
);
insert stories(theDate,votes) values
('2015-03-10',1),
('2015-03-10',2),
('2015-03-09',5),
('2015-03-09',3),
('2015-03-10',51),
('2015-03-10',26),
('2015-03-09',75),
('2015-03-09',2),
('2015-03-10',12),
('2015-03-10',32),
('2015-03-09',51),
('2015-03-09',63),
('2015-03-10',1),
('2015-03-10',11),
('2015-03-09',5),
('2015-03-09',21),
('2015-03-10',1),
('2015-03-10',2),
('2015-03-09',5),
('2015-03-09',3),
('2015-03-10',51),
('2015-03-10',26),
('2015-03-09',75),
('2015-03-09',2),
('2015-03-10',12),
('2015-03-10',44),
('2015-03-09',11),
('2015-03-09',7),
('2015-03-10',19),
('2015-03-10',7),
('2015-03-09',51),
('2015-03-09',79);
The Query
set @rn := 0, @thedate := '';
select theDate, votes
from
(
   select storyid, theDate, votes,
          @rn := if(@thedate = theDate, @rn + 1, 1) as rownum,
          @thedate := theDate as not_used
   from stories
   order by theDate, votes desc
) A
where A.rownum <= 10;
The Results
+------------+-------+
| theDate | votes |
+------------+-------+
| 2015-03-09 | 79 |
| 2015-03-09 | 75 |
| 2015-03-09 | 75 |
| 2015-03-09 | 63 |
| 2015-03-09 | 51 |
| 2015-03-09 | 51 |
| 2015-03-09 | 21 |
| 2015-03-09 | 11 |
| 2015-03-09 | 7 |
| 2015-03-09 | 5 |
| 2015-03-10 | 51 |
| 2015-03-10 | 51 |
| 2015-03-10 | 44 |
| 2015-03-10 | 32 |
| 2015-03-10 | 26 |
| 2015-03-10 | 26 |
| 2015-03-10 | 19 |
| 2015-03-10 | 12 |
| 2015-03-10 | 12 |
| 2015-03-10 | 11 |
+------------+-------+
20 rows in set, 1 warning (0.00 sec)
Usually you would use ROW_NUMBER() per group to number the records inside each group and then select the records with ROW_NUMBER <= 10. Before version 8.0, MySQL has no ROW_NUMBER() window function, but you can use user-defined variables to emulate it:
select storyId, post_day, votes
from (
   select storyId,
          DATE(timestamp) as post_day,
          votes,
          @num := if(@grp = DATE(timestamp), @num + 1, 1) as row_number,
          @grp := DATE(timestamp) as dummy
   from stories, (select @num := 0, @grp := null) as T
   order by DATE(timestamp) DESC, votes DESC
) as x where x.row_number <= 10;
SQLFiddle demo
Also look at:
How to select the first/least/max row per group in SQL
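
For comparison, on MySQL 8+ the ROW_NUMBER() approach needs no variables. Here is a rough sketch run through SQLite (3.25+) as a stand-in; it uses a handful of rows from the schema above and keeps the top 2 per day instead of the top 10 so the output stays short. TOP_N is a name made up for this example.

```python
import sqlite3

TOP_N = 2  # hypothetical limit; the question uses 10

# Assumption: SQLite (3.25+) stands in for MySQL 8.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table stories (storyid integer primary key, theDate text, "
    "votes integer)"
)
conn.executemany(
    "insert into stories (theDate, votes) values (?, ?)",
    [
        ("2015-03-10", 1),
        ("2015-03-10", 2),
        ("2015-03-09", 5),
        ("2015-03-09", 3),
        ("2015-03-10", 51),
        ("2015-03-09", 75),
    ],
)

# Rank the stories of each day by votes, then keep only the first TOP_N ranks.
top_per_day = conn.execute(
    """
    select theDate, votes
    from (select theDate, votes,
                 row_number() over (partition by theDate
                                    order by votes desc) as rn
          from stories) ranked
    where rn <= ?
    order by theDate, votes desc
    """,
    (TOP_N,),
).fetchall()

for row in top_per_day:
    print(row)
```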