Handling MySql Group By - mysql

Below is the MySQL code:
CREATE TABLE pricing
(
`id` INT NOT NULL AUTO_INCREMENT, `cost` FLOAT NOT NULL,
`valid_on` TIMESTAMP NOT NULL, `quantity` INT NOT NULL,
PRIMARY KEY (`id`)
);
INSERT INTO pricing (`id`, `cost`, `valid_on`, `quantity`) VALUES
(NULL, '4', '2017-01-01 00:00:00', '1'),
(NULL, '4', '2017-01-02 00:00:00', '1'),
(NULL, '4', '2017-01-03 00:00:00', '1'),
(NULL, '5', '2017-01-04 00:00:00', '2'),
(NULL, '5', '2017-01-05 00:00:00', '2'),
(NULL, '4', '2017-01-06 00:00:00', '2'),
(NULL, '4', '2017-01-07 00:00:00', '3'),
(NULL, '5', '2017-01-08 00:00:00', '3'),
(NULL, '5', '2017-01-09 00:00:00', '3'),
(NULL, '4', '2017-01-10 00:00:00', '3'),
(NULL, '4', '2017-01-11 00:00:00', '3'),
(NULL, '4', '2017-01-12 00:00:00', '2'),
(NULL, '5', '2017-01-13 00:00:00', '2'),
(NULL, '5', '2017-01-14 00:00:00', '2');
When I group by quantity, I get the following results:
select quantity, sum(cost) from pricing GROUP BY quantity
1 - 12
2 - 28
3 - 22
But what I actually need is something like this:
quantity start_date end_date cost
1 2017-01-01 00:00:00 2017-01-03 00:00:00 12
2 2017-01-04 00:00:00 2017-01-06 00:00:00 14
3 2017-01-07 00:00:00 2017-01-11 00:00:00 22
2 2017-01-12 00:00:00 2017-01-14 00:00:00 14
Can someone please help me solve this issue...

Try this:
SELECT quantity,
       MIN(valid_on) AS start_date, MAX(valid_on) AS end_date,
       SUM(cost)
FROM (
   SELECT id, cost, valid_on, quantity,
          @rn := @rn + 1 AS rn,
          @grn := IF(@q = quantity, @grn + 1,
                     IF(@q := quantity, 1, 1)) AS grp
   FROM pricing
   CROSS JOIN (SELECT @rn := 0, @q := 0, @grn := 0) AS vars
   ORDER BY valid_on, quantity) AS t
GROUP BY rn - grp, quantity
The query uses variables to identify islands of consecutive records that have the same quantity value. rn counts rows across the whole table while grp restarts at 1 for every island, so rn - grp is constant within an island. Grouping by that difference handles each island separately and calculates its start/end dates as well as the sum of cost.

This is a pain to do in MySQL. You need to identify the groups. One method -- which is not particularly efficient -- uses a trick. For each row it counts the number of previous rows where the quantity is different from the given row. This identifies adjacent groups with the same value.
select quantity, sum(cost), min(valid_on) as start_valid_on
from (select p.*,
(select count(*)
from pricing p2
where p2.valid_on < p.valid_on and p2.quantity <> p.quantity
) as grp
from pricing p
) p
group by grp, quantity;
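On MySQL 8.0 or later, the same islands can probably be found with window functions instead of user variables or a correlated subquery. This is only a sketch, assuming the pricing table from the question and unique valid_on values; the alias island is my own name, not something from the answers above.

```sql
-- Rows of one island share the same difference between a global row number
-- and a per-quantity row number, both ordered by valid_on.
-- The global number is never smaller than the per-quantity one,
-- so the subtraction cannot go negative.
SELECT quantity,
       MIN(valid_on) AS start_date,
       MAX(valid_on) AS end_date,
       SUM(cost)     AS cost
FROM (
    SELECT quantity, valid_on, cost,
           ROW_NUMBER() OVER (ORDER BY valid_on)
         - ROW_NUMBER() OVER (PARTITION BY quantity ORDER BY valid_on) AS island
    FROM pricing
) AS t
GROUP BY quantity, island
ORDER BY start_date;
```

Against the sample data this should return the four rows asked for in the question (sums 12, 14, 22 and 14). It also avoids assigning user variables inside SELECT, which MySQL 8.0 has deprecated.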

Related

How to get the difference (delta) between the entry counts of consecutive days with window functions?

I have a table with a few fields: id, country, ip, created_at. I am trying to get the deltas between the total number of entries on one day and the total on the next day.
CREATE TABLE session (
id int NOT NULL AUTO_INCREMENT,
country varchar(50) NOT NULL,
ip varchar(255),
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (id)
);
INSERT INTO `session` (`id`, `country`, `ip`, `created_at`) VALUES
('1', 'IN', '10.100.102.11', '2021-04-05 20:26:02'),
('2', 'IN', '10.100.102.11', '2021-04-05 19:26:02'),
('3', 'US', '10.120.102.11', '2021-04-17 10:26:02'),
('4', 'US', '10.100.112.11', '2021-04-16 12:26:02'),
('5', 'AU', '10.100.102.122', '2021-04-12 19:36:02'),
('6', 'AU', '10.100.102.122', '2021-04-12 18:20:02'),
('7', 'AU', '10.100.102.122', '2021-04-12 23:26:02'),
('8', 'US', '10.100.102.2', '2021-04-16 21:33:01'),
('9', 'AU', '10.100.102.122', '2021-04-18 20:46:02'),
('10', 'AU', '10.100.102.111', '2021-04-04 13:19:12'),
('11', 'US', '10.100.112.11', '2021-04-16 12:26:02'),
('12', 'IN', '10.100.102.11', '2021-04-05 15:26:02'),
('13', 'IN', '10.100.102.11', '2021-04-05 19:26:02');
I have written this query to get the delta:
SELECT T1.date1 as date, IFNULL(T1.cnt1-T2.cnt2, T1.cnt1) as delta from (
select TA.dateA as date1, MAX(TA.countA) as cnt1 from (
select DATE(created_at) AS dateA, COUNT(*) AS countA
FROM session
GROUP BY DATE(created_at)
UNION
select DISTINCT DATE(DATE(created_at)+1) AS dateA, 0 AS countA
FROM session
) as TA
group by TA.dateA
) as T1
LEFT OUTER JOIN (
select DATE(DATE(created_at)+1) AS date2,
COUNT(*) AS cnt2
FROM session
GROUP BY DATE(created_at)
) as T2
ON T1.date1=T2.date2
ORDER BY date;
http://sqlfiddle.com/#!9/4f5fd26/60
With it I get these results:
date delta
2021-04-04 1
2021-04-05 3
2021-04-06 -4
2021-04-12 3
2021-04-13 -3
2021-04-16 3
2021-04-17 -2
2021-04-18 0
2021-04-19 -1
Is there any room to improve or optimize this, with or without window functions? (I am new to SQL and still playing around.)
Try a shorter version (LATERAL needs MySQL 8.0.14 or later):
with grp as (
SELECT t.dateA, SUM(t.cnt) AS countA
FROM session,
LATERAL (
select DATE(created_at) AS dateA, 1 as cnt
union all
-- add one day with INTERVAL so that month ends roll over correctly
select DATE(created_at) + INTERVAL 1 DAY, 0 as cnt
) t
GROUP BY dateA
)
select t1.dateA as date, IFNULL(t1.countA-t2.countA, t1.countA) as delta
from grp t1
left join grp t2 on t2.dateA + INTERVAL 1 DAY = t1.dateA
order by t1.dateA
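Since the question asks about window functions, here is a sketch using LAG() (MySQL 8.0 or later), assuming the session table from the question. The CTE names days and daily are my own. Every day that has entries also contributes the following day with a count of 0, so the row just before any given day is either the previous calendar day or a zero-count day, which is why comparing with the previous row is enough here.

```sql
WITH days AS (
    -- one row per entry on its own day, plus a zero-weight row on the next day
    SELECT DATE(created_at) AS d, 1 AS cnt FROM session
    UNION ALL
    SELECT DATE(created_at) + INTERVAL 1 DAY, 0 FROM session
),
daily AS (
    SELECT d, SUM(cnt) AS cnt
    FROM days
    GROUP BY d
)
SELECT d AS date,
       cnt - LAG(cnt, 1, 0) OVER (ORDER BY d) AS delta  -- first row compares against 0
FROM daily
ORDER BY d;
```

On the sample rows this should give the same nine deltas as the original query (1, 3, -4, 3, -3, 3, -2, 0, -1) without a self join.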

Ties on Hall of Fame (group player, max level then max score for each game when month is...)

I need to list a Hall of Fame of the best players. The database stores every single game played by each player, across different games.
Level has priority; if the levels are the same, the highest score decides.
I've a database with user_id, level, score, game and data. Schema here:
CREATE TABLE IF NOT EXISTS `docs` (`user_id` int(6) unsigned NOT NULL,
`level` int(3) unsigned NOT NULL,`game` varchar(30) NOT NULL,
`score` int(5) unsigned NOT NULL,
`data` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
INSERT INTO `docs` (`user_id`, `level`, `game`, `score`,`data`) VALUES
('1', '7', 'pacman', '8452','2018-02-14 15:00:00'),
('1', '9', 'pacman', '9999','2018-02-10 16:30:00'),
('2', '8', 'pacman', '8500','2018-02-24 17:30:00'),
('1', '10', 'pacman', '9100','2018-02-15 18:30:00'),
('1', '10', 'pacman', '8800','2018-02-15 18:11:00'),
('1', '11', 'snake', '9600','2018-02-14 15:00:00'),
('1', '6', 'snake', '7020','2018-02-11 11:30:00'),
('2', '8', 'snake', '8500','2018-02-24 14:00:00'),
('2', '12', 'snake', '9200','2018-02-25 19:00:00'),
('2', '12', 'snake', '9800','2018-02-25 19:20:00'),
('1', '4', 'pacman', '2452','2018-03-11 15:00:00'),
('1', '6', 'pacman', '4999','2018-03-07 16:30:00'),
('2', '7', 'pacman', '5500','2018-03-02 17:30:00'),
('1', '7', 'pacman', '5100','2018-03-01 18:30:00'),
('1', '3', 'snake', '3600','2018-03-03 15:00:00'),
('1', '5', 'snake', '4220','2018-03-01 11:30:00'),
('2', '5', 'snake', '3900','2018-03-04 14:00:00'),
('2', '5', 'snake', '5200','2018-03-05 19:00:00');
I want to retrieve the Hall of Fame for a selected month and game.
For example, if I choose pacman in March, the result should be:
user level score
2 7 5500
1 7 5100
I tried this, as suggested in another similar topic:
select d1.*
from docs d1
left outer join docs d2
on (d1.user_id = d2.user_id and d1.level < d2.level)
where d2.user_id is null
order by level desc;
But I get duplicate levels for the same user, and I can't choose the game or the month.
SELECT x.* FROM docs x
JOIN
(select user_id
, game
, MONTH(data) month
, MAX(score) score
from docs
where game = 'pacman'
and MONTH(data) = 3
group
by user_id
, game
, MONTH(data)
) y
ON y.user_id = x.user_id
AND y.game = x.game
AND y.month = MONTH(x.data)
AND y.score = x.score;
or something like that
After a lot of work, study, and research, this is the best solution for me:
SELECT user_id, level, score, game
FROM (
SELECT *,
@rn := IF(user_id = @g, @rn + 1, 1) rn,
@g := user_id
FROM (select @g := null, @rn := 0) x,
docs where game='pacman'
ORDER BY user_id, level desc, score desc, game
) X
WHERE rn = 1 order by level desc, score desc;
The explanation is in this topic: Select one value from a group based on order from other columns
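For MySQL 8.0 or later, a window-function variant of the same idea might look like the sketch below. It assumes the docs table from the question and also applies the month filter the question asks for; the date range for March 2018 and the alias ranked are my own choices for illustration.

```sql
SELECT user_id, level, score
FROM (
    SELECT user_id, level, score,
           -- best row per user: highest level first, then highest score
           ROW_NUMBER() OVER (PARTITION BY user_id
                              ORDER BY level DESC, score DESC) AS rn
    FROM docs
    WHERE game = 'pacman'
      AND data >= '2018-03-01' AND data < '2018-04-01'
) AS ranked
WHERE rn = 1
ORDER BY level DESC, score DESC;
```

With the sample data this should return user 2 (level 7, score 5500) followed by user 1 (level 7, score 5100), matching the expected result in the question.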

MYSQL self join get row based on MAX and having

I need to COUNT rows from 2 tables and only get the rows with the highest g_event_id if events.event_id IN (30, 31, 32, 33). Only rows where events.event_id is 30-33 should be taken into account.
My tables:
CREATE TABLE event_parties
(`g_event_id` int, `agent_id` int)
;
INSERT INTO event_parties
(`g_event_id`, `agent_id`)
VALUES
(2917, '2'),
(2918, '2'),
(2919, '2'),
(3067, '3'),
(3078, '3'),
(3079, '1'),
(3082, '1'),
(3917, '2'),
(3918, '2'),
(3919, '2'),
(4067, '3'),
(4078, '3'),
(4079, '1'),
(5067, '3'),
(5078, '3'),
(5079, '1'),
(6067, '3'),
(6078, '3'),
(6079, '1'),
(7067, '3'),
(7078, '3'),
(7079, '1'),
(8067, '3'),
(8078, '3'),
(8079, '1')
;
CREATE TABLE events
(`g_event_id` int, `event_id` int, `event_time` datetime)
;
INSERT INTO events
(`g_event_id`, `event_id`, `event_time`)
VALUES
(2917, '29', '2016-10-19 15:24:25'),
(2918, '31', '2016-10-19 15:24:28'),
(2919, '21', '2016-10-19 15:29:46'),
(3067, '29', '2016-10-20 15:33:46'),
(3078, '23', '2016-10-21 15:29:46'),
(3079, '29', '2016-10-20 15:34:46'),
(3082, '30', '2016-10-21 15:42:46'),
(3917, '29', '2016-10-19 15:24:25'),
(3918, '31', '2016-10-19 15:24:28'),
(3919, '21', '2016-10-19 15:29:46'),
(4067, '29', '2016-10-20 15:33:46'),
(4078, '23', '2016-10-21 15:29:46'),
(4079, '29', '2016-10-20 15:34:46'),
(5067, '29', '2016-10-20 15:33:46'),
(5078, '23', '2016-10-21 15:29:46'),
(5079, '29', '2016-10-20 15:34:46'),
(6067, '29', '2016-10-20 15:33:46'),
(6078, '23', '2016-10-21 15:29:46'),
(6079, '29', '2016-10-20 15:34:46'),
(7067, '29', '2016-10-20 15:33:46'),
(7078, '23', '2016-10-21 15:29:46'),
(7079, '29', '2016-10-20 15:34:46'),
(8067, '29', '2016-10-20 15:33:46'),
(8078, '23', '2016-10-21 15:29:46'),
(8079, '29', '2016-10-20 15:34:46')
;
The select is supposed to give me the status of the call center agents: I want to count how many agents (agent_id) are in each state (event_id). Since the events table is just a log of agent events, I only need to count the latest g_event_id (the highest value) for each agent_id, and the tricky part is that I only want to consider rows where event_id IN (30, 31, 32, 33).
So basically: select the row with the highest g_event_id (among those with event_id IN (30, 31, 32, 33)) for each agent_id.
I need a JOIN between these two tables on g_event_id. The field g_event_id is the key and only appears once. I need all fields from the events table, and I need the row with the highest g_event_id or the highest event_time.
Like this:
event_id N_events
-------- ----------
31 1
30 1
I have this select so far:
SELECT event_id,
COUNT(events.event_id) AS N_events
FROM event_parties
INNER JOIN events USING (g_event_id)
LEFT JOIN event_parties AS later_event
ON (later_event.agent_id = event_parties.agent_id
AND later_event.g_event_id > event_parties.g_event_id)
WHERE later_event.g_event_id IS NULL AND event_parties.agent_id != 0 AND events.`event_id` IN (30,31,32,33)
GROUP BY events.event_id
The problem with the select above is that it only considers each agent's row with the highest g_event_id overall. I want to first restrict to rows having events.event_id IN (30, 31, 32, 33) and then count the rows with the highest g_event_id among those.
I have been trying to use HAVING after the GROUP BY (HAVING events.event_id IN (30,31,32,33)) without any success.
This query should give you your result:
select e.event_id, count(stats.agent_id) as N_count
from (
select max(p.g_event_id) as g_event_id, p.agent_id
from events e
join event_parties p
on e.g_event_id = p.g_event_id
where e.event_id in (30,31,32,33)
group by p.agent_id
) as stats
join events e
on e.g_event_id = stats.g_event_id
group by e.event_id;
The inner query (stats) first retrieves the latest relevant status of each agent: it will get the largest g_event_id for each agent_id with an event_id in the given range (so at most one row for each agent).
It will then be joined with the events-table to retrieve the actual event_id for this g_event_id; then it counts the number of agents per event_id.
As worked out in the comments, this assumes that g_event_id is the primary key for both tables (but especially for events), and that the newest status is given by the largest g_event_id, not the event_time.
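The same "latest relevant row per agent, then count per state" logic can also be sketched with ROW_NUMBER() on MySQL 8.0 or later. This assumes the two tables from the question and, like the answer above, that the largest g_event_id marks the newest status; the alias ranked is my own.

```sql
SELECT event_id, COUNT(*) AS N_events
FROM (
    SELECT p.agent_id, e.event_id,
           -- rn = 1 marks each agent's newest event within the 30-33 range
           ROW_NUMBER() OVER (PARTITION BY p.agent_id
                              ORDER BY e.g_event_id DESC) AS rn
    FROM event_parties AS p
    JOIN events AS e ON e.g_event_id = p.g_event_id
    WHERE e.event_id IN (30, 31, 32, 33)
) AS ranked
WHERE rn = 1
GROUP BY event_id;
```

On the sample data, agent 2's newest matching row is 3918 (event 31) and agent 1's is 3082 (event 30), so each of those two states should come back with a count of 1.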

Use nested subquery to fetch value with multiple condition

I have a table named test_plan (id, unit, num)
I inserted some values
INSERT INTO `test_plan` (`id`, `unit`, `num`) VALUES
('1', '1', '12'),
('2', '1', '13'),
('3', '1', '14'),
('4', '1', '10'),
('5', '2', '10'),
('6', '2', '9'),
('7', '2', '-1'),
('8', '2', '-1'),
('9', '2', '-1'),
('10', '3', '-1'),
('11', '3', '-1'),
('12', '3', '-1');
For each unit, I need the fraction of its rows whose num is not equal to -1.
That is, the query should show, one row per unit, that unit 1 is 100% completed, unit 2 is 40% completed, and unit 3 is 0% completed. I can count the rows for each unit, but not how much of it is completed.
I tried JOIN for this
SELECT a.unit, numb / count(*) as frac FROM test_plan as a
LEFT OUTER JOIN (SELECT unit, count(num) as numb FROM test_plan where num != -1 group by unit) as b
ON a.unit = b.unit group by a.unit;
Try this:
select unit,
(sum(case when num = -1 then 0 else 1 end) / count(*)) * 100 as pct_complete
from test_plan group by unit;
There's no need for a nested subquery; the combination of aggregation and a CASE expression is sufficient.
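A shorter variant of the same aggregation, as a sketch that relies on MySQL evaluating a comparison to 1 or 0 (so it is MySQL-specific), again assuming the test_plan table from the question:

```sql
-- AVG over a 1/0 flag is the fraction of rows where the flag is 1
SELECT unit,
       AVG(num <> -1) * 100 AS pct_complete
FROM test_plan
GROUP BY unit;
```

For the sample rows this should report 100 for unit 1, 40 for unit 2 and 0 for unit 3. Note that rows with a NULL num would be skipped by AVG here, whereas the CASE version counts them as completed.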

SQL query to show top x records with evenly distributed values

I have a database of contacts at companies, with multiple contacts per company in different departments. Each company has turnover and industry data attached to it.
I need to write a query that shows the top 10 most recently added contacts (by unix timestamp), but I don't want them all to be Marketing contacts (even if the top 10 are). I would like to look at the top 100 instead and pick 10 contacts from different departments. So instead of the top 10 all being Marketing, there might be 2 Marketing, 2 I.T., 2 HR, 2 Personnel.
So my query basically is this:
SELECT DISTINCT `surname`, `job_title`, `company_name`
FROM (`company_database`)
WHERE `employee_code` IN ('6', '7', '8', '9', '10', '11', '12', '13')
AND `turnover_code` IN ('5', '6', '7', '8')
AND `contact_code` IN ('16', '17', '26', '27', '9', '10', '30', '31', '23', '24', '12', '13') AND `industry_code` NOT IN ('22', '17', '35', '36') LIMIT 10
But that simply returns unique rows. What I need is one contact per company and no more than one contact of each contact_code type. I also only want 10 rows returned, but obviously, to get one contact per contact code, the query will need to look at more than 10 rows.
Is this possible in a single query? Or should I do something programmatically to whittle down the results of a query?
You can work with a temporary table using the MyISAM engine and a trick: when the AUTO_INCREMENT column is the second column of a multi-column key, MyISAM restarts the counter for each distinct value of the first column.
If you create the following temporary table:
create table tmp_company_sequence
( surname varchar(255)
,job_title varchar(255)
,company_name varchar(255)
,date_added date
,contact_code int
,counter int auto_increment
,primary key (contact_code,counter)
) engine=MyISAM;
Now
insert into `tmp_company_sequence`( `surname`, `job_title`, `company_name`,`contact_code`,`date_added`)
SELECT DISTINCT `surname`, `job_title`, `company_name`,`contact_code`,`date_added`
FROM (`company_database`)
WHERE `employee_code` IN ('6', '7', '8', '9', '10', '11', '12', '13')
AND `turnover_code` IN ('5', '6', '7', '8')
AND `contact_code` IN ('16', '17', '26', '27', '9', '10', '30', '31', '23', '24', '12', '13') AND `industry_code` NOT IN ('22', '17', '35', '36')
order by contact_code, date_added desc;
Your temporary table will now hold all the contacts with a counter. The counter is increased for every contact of the same contact_code, so the newest contact with a certain contact code will have counter = 1, the next most recent will have counter = 2, and so on.
You can now do a
select *
from tmp_company_sequence
order by counter asc, date_added desc
limit 10;
This will give you a list of the latest contacts added over all contact_codes.
Edit:
I just realised this could be done with a single query, but it is even more ugly:
SELECT `surname`
, `job_title`
, `company_name`
, `contact_code`
FROM(
SELECT
`surname`
, `job_title`
, `company_name`
, `contact_code`
, `date_added`
, IF(contact_code = @prev_contact_code, @i := @i + 1, @i := 1) AS counter
, @prev_contact_code := contact_code
FROM
(`company_database`)
,(SELECT @i := 1, @prev_contact_code := NULL) AS vars
WHERE `employee_code` IN ('6', '7', '8', '9', '10', '11', '12', '13')
AND `turnover_code` IN ('5', '6', '7', '8')
AND `contact_code` IN (
'16'
, '17'
, '26'
, '27'
, '9'
, '10'
, '30'
, '31'
, '23'
, '24'
, '12'
, '13'
)
AND `industry_code` NOT IN ('22', '17', '35', '36')
ORDER BY contact_code
, date_added DESC) sub
WHERE counter = 1
ORDER BY date_added DESC
LIMIT 10;
This does basically the same as the option with the temporary table, but it creates the counter on the fly by storing the contact_code of the previous row in a user-defined variable. It is messy, but it can be done in a single query.
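On MySQL 8.0 or later, the per-contact_code counter can probably be produced with ROW_NUMBER() instead of variables. This is a sketch only: it reuses the filters from the question, and date_added is assumed to be the column that records when the contact was added (the real column name is not shown in the question). The alias ranked is my own.

```sql
SELECT surname, job_title, company_name, contact_code
FROM (
    SELECT surname, job_title, company_name, contact_code, date_added,
           -- counter = 1 is the newest contact within each contact_code
           ROW_NUMBER() OVER (PARTITION BY contact_code
                              ORDER BY date_added DESC) AS counter
    FROM company_database
    WHERE employee_code IN ('6', '7', '8', '9', '10', '11', '12', '13')
      AND turnover_code IN ('5', '6', '7', '8')
      AND contact_code IN ('16', '17', '26', '27', '9', '10',
                           '30', '31', '23', '24', '12', '13')
      AND industry_code NOT IN ('22', '17', '35', '36')
) AS ranked
WHERE counter = 1
ORDER BY date_added DESC
LIMIT 10;
```

To allow up to two contacts per contact_code, as in the example from the question, the outer filter could be changed to counter <= 2 and the ordering to counter, date_added DESC, in the same way as the temporary-table version.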