MySQL sequence create - mysql

I need some help pulling records that happen in a sequence in the MySQL environment.
My dataset consists of cross-country games and the winning and losing country. I need to identify countries that have won at least 3 games in a row. Below is a reproducible example. I created a matches table.
CREATE TABLE matches (date DATE, winner CHAR(10), loser CHAR(10));
INSERT INTO matches (date,winner,loser) VALUES (STR_TO_DATE('3-03-2013', '%m-%d-%Y') ,'USA','CHINA');
INSERT INTO matches (date,winner,loser) VALUES (STR_TO_DATE('3-05-2013', '%m-%d-%Y') ,'USA','RUSSIA');
INSERT INTO matches (date,winner,loser) VALUES (STR_TO_DATE('3-06-2013', '%m-%d-%Y') ,'FRANCE','GERMANY');
INSERT INTO matches (date,winner,loser) VALUES (STR_TO_DATE('3-09-2013', '%m-%d-%Y') ,'USA','RUSSIA');
INSERT INTO matches (date,winner,loser) VALUES (STR_TO_DATE('3-11-2013', '%m-%d-%Y') ,'USA','INDIA');
INSERT INTO matches (date,winner,loser) VALUES (STR_TO_DATE('3-15-2013', '%m-%d-%Y') ,'USA','AUSTRALIA');
INSERT INTO matches (date,winner,loser) VALUES (STR_TO_DATE('3-15-2013', '%m-%d-%Y') ,'USA','NEW ZEALAND');
I created another table, matches2, which numbers every match in date order.
CREATE TABLE matches2
(
date DATE,
winner CHAR(10),
loser CHAR(10),
`row` INT
);
INSERT INTO matches2
(
`row`,
winner,
date,
loser
)
SELECT `row`,
winner,
date,
loser
FROM
(
SELECT winner,
(@winner := @winner + 1) AS `row`,
date,
loser
FROM matches,
(SELECT @winner := 0) r
) x
ORDER BY date;
The table matches2 looks like this:
date        winner  loser       row
2013-03-03  USA     CHINA       1
2013-03-05  USA     RUSSIA      2
2013-03-06  FRANCE  GERMANY     3
2013-03-09  USA     RUSSIA      4
2013-03-11  USA     INDIA       5
2013-03-15  USA     AUSTRALIA   6
2013-03-15  USA     NEW ZEALAN  7
As the data shows, USA has won more than 3 games in a row. How do I write a query to capture this sequence?

You can do this with a sequence of self-joins. Each join requires the next row to be won by the same country, so three copies of the table find three wins in a row:
select m1.*, m2.date, m3.date
from matches2 m1 join
     matches2 m2
     on m2.`row` = m1.`row` + 1 and m2.winner = m1.winner join
     matches2 m3
     on m3.`row` = m2.`row` + 1 and m3.winner = m2.winner;
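To sanity-check the self-join idea, here is a minimal Python sketch that runs the same three-way join against an in-memory SQLite database (standard sqlite3 module), loaded with the question's matches2 rows. It is an illustration, not MySQL. The row-number column is named rn here, an assumption made to sidestep the ROW keyword. The query returns the row where each streak starts.

```python
import sqlite3

# Rows of matches2 from the question: (date, winner, loser, rn).
# "rn" replaces the question's `row` column (an assumption, to avoid the ROW keyword).
MATCHES2 = [
    ("2013-03-03", "USA", "CHINA", 1),
    ("2013-03-05", "USA", "RUSSIA", 2),
    ("2013-03-06", "FRANCE", "GERMANY", 3),
    ("2013-03-09", "USA", "RUSSIA", 4),
    ("2013-03-11", "USA", "INDIA", 5),
    ("2013-03-15", "USA", "AUSTRALIA", 6),
    ("2013-03-15", "USA", "NEW ZEALAND", 7),
]

# m1 is the first win of a streak; m2 and m3 must be the next two rows, won by the same country.
STREAK_SQL = """
SELECT m1.winner, m1.rn
FROM matches2 m1
JOIN matches2 m2 ON m2.rn = m1.rn + 1 AND m2.winner = m1.winner
JOIN matches2 m3 ON m3.rn = m2.rn + 1 AND m3.winner = m2.winner
ORDER BY m1.rn
"""


def streak_starts(rows):
    """Return (winner, starting row) for every run of 3 consecutive wins."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE matches2 (date TEXT, winner TEXT, loser TEXT, rn INTEGER)")
        con.executemany("INSERT INTO matches2 VALUES (?, ?, ?, ?)", rows)
        return con.execute(STREAK_SQL).fetchall()
    finally:
        con.close()


print(streak_starts(MATCHES2))  # [('USA', 4), ('USA', 5)]
```

A four-game streak shows up once per starting row (rows 4 and 5 here). Selecting DISTINCT m1.winner instead would list each country only once.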

Here's another approach to returning the "winner" of at least three in a row, if we consider only the matches that the country participated in as a series. That is, an intervening match between two different countries isn't considered to break another team's winning streak.
SELECT z.winner
FROM (SELECT @cnt := IF(v.team=@prev_team AND v.winner=@prev_winner,@cnt+1,1) AS cnt
, @prev_team := v.team AS team
, @prev_winner := v.winner AS winner
FROM (SELECT t.team
, m.winner
, m.loser
, m.date
FROM (SELECT @prev_team := NULL, @prev_winner := NULL, @cnt := 0) i
CROSS
JOIN ( SELECT w.winner AS team
FROM matches w
GROUP BY w.winner
) t
JOIN matches m
ON m.winner = t.team
OR m.loser = t.team -- include the team's losses, which break its streak
ORDER BY t.team, m.date
) v
) z
WHERE z.cnt = 3
AND z.winner = z.team -- only count wins by the team whose series this is
GROUP BY z.winner;
Here's an example test case:
CREATE TABLE matches (`date` DATE, `winner` VARCHAR(12), `loser` VARCHAR(12), `row` INT);
INSERT INTO matches (`date`,`winner`,`loser`,`row`) VALUES
(STR_TO_DATE('3-03-2013', '%m-%d-%Y') ,'USA' ,'CHINA' ,1)
,(STR_TO_DATE('3-05-2013', '%m-%d-%Y') ,'USA' ,'RUSSIA' ,2)
,(STR_TO_DATE('3-06-2013', '%m-%d-%Y') ,'FRANCE' ,'GERMANY' ,3)
,(STR_TO_DATE('3-08-2013', '%m-%d-%Y') ,'USA' ,'RUSSIA' ,4)
,(STR_TO_DATE('3-10-2013', '%m-%d-%Y') ,'FRANCE' ,'RUSSIA' ,5)
,(STR_TO_DATE('3-12-2013', '%m-%d-%Y') ,'SRI LANKA','MALAYSIA' ,6)
,(STR_TO_DATE('3-14-2013', '%m-%d-%Y') ,'USA' ,'AUSTRALIA' ,7)
,(STR_TO_DATE('3-16-2013', '%m-%d-%Y') ,'FRANCE' ,'RUSSIA' ,8)
,(STR_TO_DATE('3-18-2013', '%m-%d-%Y') ,'USA' ,'NEW ZEALAND',9);
In the matches that 'USA' participated in, they won every time. They played 5 matches, and they won 5 matches.
France also won three matches that they participated in, with no "loss" between those wins.
The query in this answer reports both 'USA' and 'FRANCE' as winning "three in a row".
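The per-team semantics can be modeled in a few lines of Python, which is handy for checking what the variable-based query should return. This is a sketch of the idea, not a translation of the query. It walks the matches in date order and keeps a running win count per team, and only that team's own loss resets its count. The name per_team_streaks is hypothetical.

```python
from collections import defaultdict

# Test case from the answer: (date, winner, loser); ISO dates sort chronologically as strings.
TEST_MATCHES = [
    ("2013-03-03", "USA", "CHINA"),
    ("2013-03-05", "USA", "RUSSIA"),
    ("2013-03-06", "FRANCE", "GERMANY"),
    ("2013-03-08", "USA", "RUSSIA"),
    ("2013-03-10", "FRANCE", "RUSSIA"),
    ("2013-03-12", "SRI LANKA", "MALAYSIA"),
    ("2013-03-14", "USA", "AUSTRALIA"),
    ("2013-03-16", "FRANCE", "RUSSIA"),
    ("2013-03-18", "USA", "NEW ZEALAND"),
]


def per_team_streaks(matches, length=3):
    """Teams with `length` consecutive wins among the matches they played.

    Only a team's own loss resets its run; matches between other teams are ignored.
    """
    run = defaultdict(int)  # team -> current number of consecutive wins
    found = set()
    for _date, winner, loser in sorted(matches):
        run[winner] += 1
        run[loser] = 0  # a loss ends the losing team's streak
        if run[winner] >= length:
            found.add(winner)
    return sorted(found)


print(per_team_streaks(TEST_MATCHES))  # ['FRANCE', 'USA']
```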

Related

Get employees who received a raise in 2 consecutive years

I am trying to get employees who received a raise in 2 consecutive years; in this case, employee 1000 is the right answer.
Here is the data and the SQL I have tried.
EID   SALARY  YEAR
1000  10,000  2016
1000  7,500   2015
1000  6,000   2014
1001  8,000   2016
1001  7,500   2015
1002  7,500   2016
1002  7,500   2015
1002  5,000   2014
1003  6,000   2016
1003  7,000   2015
1003  5,000   2014
I have used the following code; however, it only gives a row number per EID and does not compare last year's salary with the present year's. I need to find the employees who got a raise in 2 consecutive years.
select *,
row_number() over (partition by eid order by year desc) as rn
from gs;
You can do it using the LEAD window function to fetch the two previous salaries of each employee. Because the rows are ordered by year descending, LEAD looks back in time. Then keep the employees who have at least one row where prev_salary2 < prev_salary1 < salary.
SELECT DISTINCT
eid
FROM (
SELECT
eid,
year,
salary,
(LEAD(salary, 1) OVER(PARTITION BY eid ORDER BY year DESC)) AS prev_salary1,
(LEAD(salary, 2) OVER(PARTITION BY eid ORDER BY year DESC)) AS prev_salary2
FROM
employees
) consecutive3
WHERE
salary > prev_salary1
AND
prev_salary1 > prev_salary2
The assumption is that there are no missing years, i.e. no year for which an employee's salary was not recorded.
Here's the fiddle: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=8c0d8a1deec8e77bb32a173656c3e386.
EDIT: Detailed explanation
Let's take the example of Jennifer, who has worked for three years and got these salaries:
2018 -> 65000
2017 -> 55000
2016 -> 50000
She's a candidate for being selected, as her salary was raised two years in a row.
1. LEAD(salary, 1) OVER(PARTITION BY eid ORDER BY year DESC)
Allows you to get the salary for year "Y" and the salary for year "Y-1":
("year" -> "salary", "previous_salary")
2018 -> 65000 , 55000
2017 -> 55000 , 50000
2016 -> 50000 , NULL
2. LEAD(salary, 2) OVER(PARTITION BY eid ORDER BY year DESC)
Allows you to get the salary for year "Y" and the salary for year "Y-2":
("year" -> "salary", "previous_salary", "previous_salary_by_2_years")
2018 -> 65000 , 55000 , 50000
2017 -> 55000 , 50000 , NULL
2016 -> 50000 , NULL , NULL
3. WHERE salary > prev_salary1 AND prev_salary1 > prev_salary2
Filters the employees who
have their year3 salary higher than their year2 salary (salary > prev_salary1)
have their year2 salary higher than their year1 salary (prev_salary1 > prev_salary2)
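To try the LEAD query outside MySQL, the sketch below runs it in SQLite through Python's sqlite3 module against the question's data, with the thousands separators dropped from the salaries. It assumes the linked SQLite library is version 3.25 or later, the release that added window functions. The table and helper names are illustrative.

```python
import sqlite3

# Salary data from the question as (eid, salary, year).
SALARIES = [
    (1000, 10000, 2016), (1000, 7500, 2015), (1000, 6000, 2014),
    (1001, 8000, 2016), (1001, 7500, 2015),
    (1002, 7500, 2016), (1002, 7500, 2015), (1002, 5000, 2014),
    (1003, 6000, 2016), (1003, 7000, 2015), (1003, 5000, 2014),
]

# Same shape as the LEAD answer: with years sorted newest first, LEAD(salary, n) is n years back.
RAISE_SQL = """
SELECT DISTINCT eid
FROM (
    SELECT eid, year, salary,
           LEAD(salary, 1) OVER (PARTITION BY eid ORDER BY year DESC) AS prev_salary1,
           LEAD(salary, 2) OVER (PARTITION BY eid ORDER BY year DESC) AS prev_salary2
    FROM employees
) consecutive3
WHERE salary > prev_salary1 AND prev_salary1 > prev_salary2
ORDER BY eid
"""


def two_raises_in_a_row(rows):
    """Return the eids whose salary rose in two consecutive years."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE employees (eid INTEGER, salary INTEGER, year INTEGER)")
        con.executemany("INSERT INTO employees VALUES (?, ?, ?)", rows)
        # Rows with a NULL prev_salary (too few earlier years) fail the WHERE and drop out.
        return [eid for (eid,) in con.execute(RAISE_SQL)]
    finally:
        con.close()


print(two_raises_in_a_row(SALARIES))  # [1000]
```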
I know that this has already been answered, but here is my take: use the LAG function to determine whether there was an increase from the previous year, and apply it twice.
SELECT *
FROM (
SELECT
t2.*,
LAG(increase) over (partition by eid order by year) AS increasePrevYear
FROM (
SELECT
t1.*,
COALESCE(salary - LAG(salary) over (partition by eid order by year), 0) > 0 AS increase
FROM tbl_test t1
) as t2
) t3 where increase AND increasePrevYear
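A CTE-based variant, which only checks whether the raises happened in the two most recent recorded years:
with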
dates as
(
select
a.*,
dense_rank() OVER (
partition by eid
order by year desc, salary
)
as rn,
case
when
lead(salary,2)over(partition by eid order by year, salary)
>
lead(salary,1)over(partition by eid order by year, salary)
and
lead(salary,1)over(partition by eid order by year, salary)
>
salary
then
1
else
0
end
as flag
from
employees a
)
select
eid
from
dates
where
rn = 3
and flag = 1
Not a beautiful query, but straightforward: find employees who had a salary in a year where the salary in the previous year was lower and the salary in the year before that even lower. Using LAG is more elegant, but I thought I'd throw this in, just to show an alternative.
select *
from employee
where exists
(
select null
from gs
where gs.eid = employee.id
and exists
(
select null
from gs prev
where prev.eid = gs.eid
and prev.year = gs.year - 1
and prev.salary < gs.salary
and exists
(
select null
from gs prevprev
where prevprev.eid = prev.eid
and prevprev.year = prev.year - 1
and prevprev.salary < prev.salary
)
)
);
Same thing with a join:
select *
from employee
where exists
(
select null
from gs
join gs prev on prev.eid = gs.eid
and prev.year = gs.year - 1
and prev.salary < gs.salary
join gs prevprev on prevprev.eid = prev.eid
and prevprev.year = prev.year - 1
and prevprev.salary < prev.salary
where gs.eid = employee.id
);
For versions prior to 8.0 (mine is 5.7), which lack window functions, I wrote a stored procedure to do the job. First, get all the eids that have at least three years of salary records, which is the minimum needed for two consecutive raises. Then fetch and compare the rows of those eids, in year order, using a cursor. The result is stored in the temporary table t.
delimiter //
drop procedure if exists lucky_emp//
create procedure lucky_emp()
begin
declare old_eid int default 0;
declare old_salary int;
declare new_eid int ;
declare new_salary int;
declare bonus_year int;
declare fin bool default false;
declare c cursor for select eid,salary from salary where eid in(select eid from salary group by eid having count(eid)>=3) order by eid,year;
declare continue handler for not found set fin=true;
drop temporary table if exists t ;
create temporary table t (t_eid int);
open c;
lp:loop
fetch c into new_eid ,new_salary;
if fin=true then
leave lp;
end if;
if new_eid !=old_eid then
set old_eid=new_eid,old_salary=0,bonus_year=0;
end if;
if new_salary> old_salary then
set bonus_year=bonus_year+1,old_salary=new_salary;
else
-- no raise: this year starts a new run and becomes the new baseline
set bonus_year=1,old_salary=new_salary;
end if;
if bonus_year=3 and new_eid not in(select t_eid from t) then
insert t values(new_eid);
end if;
end loop lp;
end//
delimiter ;
call lucky_emp();
select * from t;
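The procedure's counting logic can be checked with a Python sketch of the same cursor-style scan: rows sorted by eid and year, with one running counter per employee. As in the procedure, the counter includes the first year, so reaching 3 means two raises in a row. In this sketch, a year without a raise restarts the count at that year and becomes the new baseline. Like the procedure, it assumes there are no missing years. The function name is hypothetical.

```python
from itertools import groupby

# Salary data from the question as (eid, salary, year).
SALARIES = [
    (1000, 10000, 2016), (1000, 7500, 2015), (1000, 6000, 2014),
    (1001, 8000, 2016), (1001, 7500, 2015),
    (1002, 7500, 2016), (1002, 7500, 2015), (1002, 5000, 2014),
    (1003, 6000, 2016), (1003, 7000, 2015), (1003, 5000, 2014),
]


def raised_twice_running(rows):
    """Return eids whose salary rose in two consecutive recorded years.

    `run` counts the current stretch of rising salaries including its first year,
    so a value of 3 means two raises in a row.
    """
    found = []
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))  # by eid, then year
    for eid, group in groupby(ordered, key=lambda r: r[0]):
        run, previous = 0, None
        for _eid, salary, _year in group:
            if previous is not None and salary > previous:
                run += 1
            else:
                run = 1  # first year, or no raise: start a new stretch here
            previous = salary
            if run == 3:
                found.append(eid)
                break
    return found


print(raised_twice_running(SALARIES))  # [1000]
```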
A LAG-based variant that joins the lagged salaries back to the table:
Select a.*, b.prev_sal1, b.prev_sal2
from employees a
join (
Select eid ,year,
lag(salary,1) over (partition by eid order by year) as prev_sal1,
lag(salary,2) over (partition by eid order by year) as prev_sal2
from employees ) b
on a.eid=b.eid
and a.year = b.year
where salary>prev_sal1 and prev_sal1>prev_sal2
fiddle: https://dbfiddle.uk/rfGv31zM

SQL query, how to improve?

I did a task to write an SQL query and I wonder if I can improve it somehow.
Description:
Let's say we have a database for some online service.
Let's create the tables and insert some data:
create table players (
player_id integer not null unique,
group_id integer not null
);
create table matches (
match_id integer not null unique,
first_player integer not null,
second_player integer not null,
first_score integer not null,
second_score integer not null
);
insert into players values(20, 2);
insert into players values(30, 1);
insert into players values(40, 3);
insert into players values(45, 1);
insert into players values(50, 2);
insert into players values(65, 1);
insert into matches values(1, 30, 45, 10, 12);
insert into matches values(2, 20, 50, 5, 5);
insert into matches values(13, 65, 45, 10, 10);
insert into matches values(5, 30, 65, 3, 15);
insert into matches values(42, 45, 65, 8, 4);
The output of the query should be:
group_id | winner_id
--------------------
1 | 45
2 | 20
3 | 40
So, we should output the winner (player id) of each group. The winner is the player who got the maximum number of points in matches.
If a player is alone in the group, they are the winner automatically; if players have an equal number of points, the winner is the one with the lower id.
The output should be ordered by the group_id field.
My solution:
SELECT
results.group_id,
results.winner_id
FROM
(
SELECT
summed.group_id,
summed.player_id AS winner_id,
MAX(summed.sum) AS total_score
FROM
(
SELECT
mapped.player_id,
mapped.group_id,
SUM(mapped.points) AS sum
FROM
(
SELECT
p.player_id,
p.group_id,
CASE WHEN p.player_id = m.first_player THEN m.first_score WHEN p.player_id = m.second_player THEN m.second_score ELSE 0 END AS points
FROM
players AS p
LEFT JOIN matches AS m ON p.player_id = m.first_player
OR p.player_id = m.second_player
) AS mapped
GROUP BY
mapped.player_id
) as summed
GROUP BY
summed.group_id
ORDER BY
summed.group_id
) AS results;
It works, but I'm 99% sure it can be cleaner. I'd be thankful for any suggestions.
First, use UNION ALL to extract 2 columns from matches, player_id and score, so that every player's score in every match becomes one row.
Then aggregate to get each player's total score.
Finally, do a LEFT join of players to the resultset you obtained. Use GROUP_CONCAT() to collect all players of each group in descending order of their total score (ties broken by the lower player_id), and pick the 1st player with SUBSTRING_INDEX():
SELECT p.group_id,
SUBSTRING_INDEX(GROUP_CONCAT(p.player_id ORDER BY t.score DESC, p.player_id), ',', 1) winner_id
FROM players p
LEFT JOIN (
SELECT player_id, SUM(score) score
FROM (
SELECT first_player player_id, first_score score FROM matches
UNION ALL
SELECT second_player, second_score FROM matches
) t
GROUP BY player_id
) t ON t.player_id = p.player_id
GROUP BY p.group_id
ORDER BY p.group_id;
See the demo.
Note that, by doing a LEFT join, you get all groups in the results, even those whose players did not participate in any match (just like your sample data). In that case, the winner is the player with the lowest id (just like your expected results), because of the p.player_id tie-breaker inside GROUP_CONCAT().
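The three steps (unpivot, aggregate, pick the top player per group) can be sketched in plain Python to check the expected output. This illustrates the logic only. A player with no matches is treated as ranking below everyone, mirroring how a NULL total sorts last under ORDER BY ... DESC in MySQL. The name group_winners is hypothetical.

```python
from collections import defaultdict

# Sample data from the question.
PLAYERS = [(20, 2), (30, 1), (40, 3), (45, 1), (50, 2), (65, 1)]  # (player_id, group_id)
MATCHES = [  # (match_id, first_player, second_player, first_score, second_score)
    (1, 30, 45, 10, 12),
    (2, 20, 50, 5, 5),
    (13, 65, 45, 10, 10),
    (5, 30, 65, 3, 15),
    (42, 45, 65, 8, 4),
]


def group_winners(players, matches):
    # Step 1: unpivot every match into two (player_id, score) rows, like the UNION ALL.
    rows = [(m[1], m[3]) for m in matches] + [(m[2], m[4]) for m in matches]

    # Step 2: total score per player.
    totals = defaultdict(int)
    for player_id, score in rows:
        totals[player_id] += score

    # Step 3: per group, the smallest sort key wins: highest total first, then lowest id.
    best = {}
    for player_id, group_id in players:
        total = totals.get(player_id)  # None when the player has no matches
        key = (-total if total is not None else float("inf"), player_id)
        if group_id not in best or key < best[group_id]:
            best[group_id] = key
    return [(group_id, best[group_id][1]) for group_id in sorted(best)]


print(group_winners(PLAYERS, MATCHES))  # [(1, 45), (2, 20), (3, 40)]
```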
You can unpivot the matches table and sum the points per player (which is, I think, what you want). Starting from players with a LEFT JOIN keeps players who played no match, such as 40:
select p.player_id, p.group_id, sum(score) as sum_score
from players p left join
     ((select first_player as player_id, first_score as score
       from matches
      ) union all
      (select second_player as player_id, second_score as score
       from matches
      )
     ) mp
     using (player_id)
group by p.player_id, p.group_id;
Next, you can introduce a window function to get the top scorer in each group:
select group_id, player_id as winner_id, sum_score
from (select p.player_id, p.group_id, sum(score) as sum_score,
             row_number() over (partition by p.group_id order by sum(score) desc, p.player_id asc) as seqnum
      from players p left join
           ((select first_player as player_id, first_score as score
             from matches
            ) union all
            (select second_player as player_id, second_score as score
             from matches
            )
           ) mp
           using (player_id)
      group by p.player_id, p.group_id
     ) pg
where seqnum = 1
order by group_id;
If you actually want the maximum score over all the matches rather than the sum(), then use max() instead of sum().
Here's another way:
WITH match_records AS (
SELECT match_id,first_player players, first_score scores FROM matches UNION ALL
SELECT match_id,second_player, second_score FROM matches
)
SELECT group_id, player_id
FROM
(SELECT group_id, player_id, players, SUM(scores) ts,
ROW_NUMBER() OVER (PARTITION BY group_id ORDER BY SUM(scores) DESC, player_id) pos -- ties go to the lower id
FROM players p LEFT JOIN match_records mr ON mr.players=p.player_id
GROUP BY group_id, player_id, players) fp
WHERE pos=1
ORDER BY group_id;
It's basically the same idea as others (to un-pivot the matches table) but with a slightly different operation.
Demo fiddle

Querying Customers who have rented a movie at least once every week or in the Weekend

I have a DB for movie_rental. The tables I have are:
Customer Level:
Primary key: Customer_id(INT)
first_name(VARCHAR)
last_name(VARCHAR)
Movie Level:
Primary key: Film_id(INT)
title(VARCHAR)
category(VARCHAR)
Rental Level:
Primary key: Rental_id(INT).
The other columns in this table are:
Rental_date(DATETIME)
customer_id(INT)
film_id(INT)
payment_date(DATETIME)
amount(DECIMAL(5,2))
Now the task is to create a master list of customers, categorized as follows:
Regulars, who rent at least once a week
Weekenders, for whom most of their rentals come on Saturday and Sundays
I am not looking for code here, but for the logic to approach this problem. I have tried quite a number of ways but could not work out how to check for a customer id in each week. The code I tried is as follows:
select
r.customer_id
, concat(c.first_name, ' ', c.last_name) as Customer_Name
, dayname(r.rental_date) as day_of_rental
, case
when dayname(r.rental_date) in ('Monday','Tuesday','Wednesday','Thursday','Friday')
then 'Regulars'
else 'Weekenders'
end as Customer_Category
from rental r
inner join customer c on r.customer_id = c.customer_id;
I know it is not correct but I am not able to think beyond this.
First, you don't need the customer table for this. You can add that in after you have the classification.
To solve the problem, you need the following information:
The total number of rentals.
The total number of weeks with a rental.
The total number of weeks overall or with no rental.
The total number of rentals on weekend days.
You can obtain this information using aggregation:
select r.customer_id,
count(*) as num_rentals,
count(distinct yearweek(rental_date)) as num_weeks,
(to_days(max(rental_date)) - to_days(min(rental_date)) ) / 7 as num_weeks_overall,
sum(dayname(r.rental_date) in ('Saturday', 'Sunday')) as weekend_rentals
from rental r
group by r.customer_id;
Now, your question is a bit vague on thresholds and what to do if someone only rents on weekends but does so every week. So, I'll just make arbitrary assumptions for the final categorization:
select r.customer_id,
(case when num_weeks >= 10 and
num_weeks >= num_weeks_overall * 0.9
then 'Regular' -- at least 10 weeks and rents in 90% of the weeks
when weekend_rentals >= 0.8 * num_rentals
then 'Weekender' -- 80% of rentals are on the weekend
else 'Hoi Polloi'
end) as category
from (select r.customer_id,
count(*) as num_rentals,
count(distinct yearweek(rental_date)) as num_weeks,
(to_days(max(rental_date)) - to_days(min(rental_date)) ) / 7 as num_weeks_overall,
sum(dayname(r.rental_date) in ('Saturday', 'Sunday')) as weekend_rentals
from rental r
group by r.customer_id
) r;
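The aggregate-then-classify logic can be prototyped in Python before writing the SQL. This sketch uses the same arbitrary thresholds as the query's comments: at least 10 weeks covering 90% of the customer's span, or else 80% weekend rentals. It uses ISO weeks from datetime.isocalendar() as a stand-in for MySQL's YEARWEEK(), whose default mode starts weeks on Sunday, so week counts may differ slightly. The function name classify is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def classify(rentals):
    """Categorize customers from (customer_id, rental_datetime) pairs."""
    by_customer = defaultdict(list)
    for customer_id, rented_at in rentals:
        by_customer[customer_id].append(rented_at)

    categories = {}
    for customer_id, dates in by_customer.items():
        num_rentals = len(dates)
        # ISO (year, week) pairs stand in for YEARWEEK(); week boundaries may differ.
        num_weeks = len({tuple(d.isocalendar())[:2] for d in dates})
        # Whole-day difference, like TO_DAYS(max) - TO_DAYS(min), divided by 7.
        num_weeks_overall = (max(dates).date() - min(dates).date()).days / 7
        weekend_rentals = sum(1 for d in dates if d.weekday() >= 5)  # 5 = Sat, 6 = Sun
        if num_weeks >= 10 and num_weeks >= num_weeks_overall * 0.9:
            categories[customer_id] = "Regular"
        elif weekend_rentals >= 0.8 * num_rentals:
            categories[customer_id] = "Weekender"
        else:
            categories[customer_id] = "Hoi Polloi"
    return categories


# Usage: customer 1 rents every Monday for 12 weeks, customer 2 only on one weekend.
rentals = [(1, datetime(2024, 1, 1) + timedelta(weeks=i)) for i in range(12)]
rentals += [(2, datetime(2024, 1, 6, 18, 0)), (2, datetime(2024, 1, 7, 12, 30))]
print(classify(rentals))  # {1: 'Regular', 2: 'Weekender'}
```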
The problem with the current approach is that every rental of every customer will be treated separately. I am assuming a customer might rent more than once and so, we will need to aggregate all rental data for a customer to calculate the category.
So to create the master table, you have mentioned in the logic that weekenders are customers "for whom most of their rentals come on Saturday and Sundays", whereas regulars are customers who rent at least once a week.
Two questions:
What is the logic for "most" for weekenders?
Are these two categories mutually exclusive? From the statement it does not seem so, because a customer might rent every week but only on a Saturday or a Sunday, which would fit both.
I have written a solution in the Oracle SQL dialect (it works, but performance could be improved) with this logic: if the customer has rented more on weekdays than on weekends, the customer is a Regular; otherwise, a Weekender. This query can be modified based on the answers to the questions above.
select
c.customer_id,
c.first_name || ' ' || c.last_name as Customer_Name,
case
when r.reg_count>r.we_count then 'Regulars'
else 'Weekenders'
end as Customer_Category
from customer c
inner join
(select customer_id, count(case when trim(to_char(rental_date, 'DAY')) in ('MONDAY','TUESDAY','WEDNESDAY','THURSDAY','FRIDAY') then 1 end) as reg_count,
count(case when trim(to_char(rental_date, 'DAY')) in ('SATURDAY','SUNDAY') then 1 end) as we_count
from rental group by customer_id) r on r.customer_id=c.customer_id;
Updated query, based on the clarification given in a comment:
select
c.customer_id,
c.first_name || ' ' || c.last_name as Customer_Name,
case when rg.cnt>0 then 1 else 0 end as REGULAR,
case when we.cnt>0 then 1 else 0 end as WEEKENDER
from customer c
left outer join
(select customer_id, count(rental_id) cnt from rental where trim(to_char(rental_date, 'DAY')) in ('MONDAY','TUESDAY','WEDNESDAY','THURSDAY','FRIDAY') group by customer_id) rg on rg.customer_id=c.customer_id
left outer join
(select customer_id, count(rental_id) cnt from rental where trim(to_char(rental_date, 'DAY')) in ('SATURDAY','SUNDAY') group by customer_id) we on we.customer_id=c.customer_id;
Test data:
insert into customer values (1, 'nonsensical', 'coder');
insert into rental values(1, 1, sysdate, 1, sysdate, 500);
insert into customer values (2, 'foo', 'bar');
insert into rental values(2, 2, sysdate-5, 2, sysdate-5, 800); -- the current day is Friday, so sysdate-5 is a Sunday
Query output (first query):
CUSTOMER_ID  CUSTOMER_NAME      CUSTOMER_CATEGORY
1            nonsensical coder  Regulars
2            foo bar            Weekenders
Query output (second query):
CUSTOMER_ID  CUSTOMER_NAME      REGULAR  WEEKENDER
1            nonsensical coder  1        0
2            foo bar            0        1
This is a study of cohorts. First find the minimal expression of each group:
# Weekday regulars
SELECT
customer_id
FROM rental
WHERE WEEKDAY(rental_date) < 5; # 0-4 are weekdays
# Weekend warriors
SELECT
customer_id
FROM rental
WHERE WEEKDAY(rental_date) > 4; # 5 and 6 are weekends
Now we know how to get a listing of customers who have rented on weekdays and weekends, inclusive. These queries only actually tell us that these were customers who visited on a day in the given series, hence we need to make some judgements.
Let's introduce a periodicity, which lets us set thresholds. We also need aggregation: we count the distinct weeks in which each customer rented, grouping by rental.customer_id.
# Weekday regulars
SELECT
customer_id
, COUNT(DISTINCT YEARWEEK(rental_date)) AS weeks_as_customer
FROM rental
WHERE WEEKDAY(rental_date) < 5
GROUP BY customer_id;
# Weekend warriors
SELECT
customer_id
, COUNT(DISTINCT YEARWEEK(rental_date)) AS weeks_as_customer
FROM rental
WHERE WEEKDAY(rental_date) > 4
GROUP BY customer_id;
We also need a determinant period:
FLOOR(DATEDIFF(DATE(NOW()), '2019-01-01') / 7) AS weeks_in_period
Put those together:
# Weekday regulars
SELECT
customer_id
, period.total_weeks
, COUNT(DISTINCT YEARWEEK(rental_date)) AS weeks_as_customer
FROM rental
CROSS JOIN (
SELECT FLOOR(DATEDIFF(DATE(NOW()), '2019-01-01') / 7) AS total_weeks
) AS period
WHERE WEEKDAY(rental_date) < 5
GROUP BY customer_id, period.total_weeks;
# Weekend warriors
SELECT
customer_id
, period.total_weeks
, COUNT(DISTINCT YEARWEEK(rental_date)) AS weeks_as_customer
FROM rental
CROSS JOIN (
SELECT FLOOR(DATEDIFF(DATE(NOW()), '2019-01-01') / 7) AS total_weeks
) AS period
WHERE WEEKDAY(rental_date) > 4
GROUP BY customer_id, period.total_weeks;
So now we can introduce our threshold accumulator per cohort.
# Weekday regulars
SELECT
customer_id
, period.total_weeks
, COUNT(DISTINCT YEARWEEK(rental_date)) AS weeks_as_customer
FROM rental
CROSS JOIN (
SELECT FLOOR(DATEDIFF(DATE(NOW()), '2019-01-01') / 7) AS total_weeks
) AS period
WHERE WEEKDAY(rental_date) < 5
GROUP BY customer_id, period.total_weeks
HAVING total_weeks = weeks_as_customer;
# Weekend warriors
SELECT
customer_id
, period.total_weeks
, COUNT(DISTINCT YEARWEEK(rental_date)) AS weeks_as_customer
FROM rental
CROSS JOIN (
SELECT FLOOR(DATEDIFF(DATE(NOW()), '2019-01-01') / 7) AS total_weeks
) AS period
WHERE WEEKDAY(rental_date) > 4
GROUP BY customer_id, period.total_weeks
HAVING total_weeks = weeks_as_customer;
Then we can use these to subquery our master list.
SELECT
customer.customer_id
, CONCAT(customer.first_name, ' ', customer.last_name) AS customer_name
, CASE
WHEN regulars.customer_id IS NOT NULL THEN 'regular'
WHEN weekenders.customer_id IS NOT NULL THEN 'weekender'
ELSE NULL
END AS category
FROM customer
LEFT JOIN (
SELECT
rental.customer_id
, period.total_weeks
, COUNT(DISTINCT YEARWEEK(rental.rental_date)) AS weeks_as_customer
FROM rental
CROSS JOIN (
SELECT FLOOR(DATEDIFF(DATE(NOW()), '2019-01-01') / 7) AS total_weeks
) AS period
WHERE WEEKDAY(rental.rental_date) < 5
GROUP BY rental.customer_id, period.total_weeks
HAVING total_weeks = weeks_as_customer
) AS regulars ON customer.customer_id = regulars.customer_id
LEFT JOIN (
SELECT
rental.customer_id
, period.total_weeks
, COUNT(DISTINCT YEARWEEK(rental.rental_date)) AS weeks_as_customer
FROM rental
CROSS JOIN (
SELECT FLOOR(DATEDIFF(DATE(NOW()), '2019-01-01') / 7) AS total_weeks
) AS period
WHERE WEEKDAY(rental.rental_date) > 4
GROUP BY rental.customer_id, period.total_weeks
HAVING total_weeks = weeks_as_customer
) AS weekenders ON customer.customer_id = weekenders.customer_id
HAVING category IS NOT NULL;
There is some ambiguity about whether cross-cohort customers should be left out (for instance, regulars who missed a week because they rented only on the weekend that week). You would need to settle this kind of inclusivity/exclusivity question.
That would involve going back to the cohort-specific queries and tuning them to capture that distinction, and/or adding cross-cohort subqueries that can be combined in other ways to give a better picture at the top level.
However, I think what I've provided matches reasonably with what you've provided given this caveat.

Minimum number of Meeting Rooms required to Accommodate all Meetings in MySQL

I have the following columns in a table called meetings: meeting_id (int), start_time (time), end_time (time). Assuming that this table has data for one calendar day only, what is the minimum number of rooms I need to accommodate all the meetings? Room size and the number of people attending the meetings don't matter.
Here's the solution:
select * from
(select t.start_time,
t.end_time,
count(*) - 1 overlapping_meetings,
count(*) minimum_rooms_required,
group_concat(distinct concat(y.start_time,' to ',t.end_time)
separator ' // ') meeting_details from
(select 1 meeting_id, '08:00' start_time, '09:15' end_time union all
select 2, '13:20', '15:20' union all
select 3, '10:00', '14:00' union all
select 4, '13:55', '16:25' union all
select 5, '14:00', '17:45' union all
select 6, '14:05', '17:45') t left join
(select 1 meeting_id, '08:00' start_time, '09:15' end_time union all
select 2, '13:20', '15:20' union all
select 3, '10:00', '14:00' union all
select 4, '13:55', '16:25' union all
select 5, '14:00', '17:45' union all
select 6, '14:05', '17:45') y
on t.start_time between y.start_time and y.end_time
group by start_time, end_time) z;
My question: is there anything wrong with this answer? Even if there's nothing wrong with it, can someone share a better one?
Let's say you have a table called meetings with the columns id, start_time and end_time.
Then you can use this query to get the minimum number of meeting rooms required to accommodate all meetings.
select max(minimum_rooms_required)
from (select count(*) minimum_rooms_required
from meetings t
left join meetings y on t.start_time >= y.start_time and t.start_time < y.end_time group by t.id
) z;
This is clearer and simpler, and it works fine.
Meetings can "overlap". So, GROUP BY start_time, end_time can't figure this out.
Not every algorithm can be done in SQL. Or, at least, it may be grossly inefficient.
I would use a real programming language for the computation, leaving the database for what it is good at -- being a data repository.
Build an array of 1440 entries (minutes in a day); initialize each to 0.
Foreach meeting:
    Foreach minute in the meeting (excluding the last minute):
        increment that element of the array.
Find the largest element in the array: that is the number of rooms needed.
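Here is a minimal Python version of the array approach described above. It assumes times are 'HH:MM' strings within a single day (an end time of 24:00 is not handled), and min_rooms_by_minute is a hypothetical name.

```python
def min_rooms_by_minute(meetings):
    """Minimum rooms for (start, end) pairs given as 'HH:MM' strings on one day.

    One counter per minute of the day; each meeting increments every minute it
    occupies except its last, so back-to-back meetings can share a room.
    """

    def to_minute(hhmm):
        hours, minutes = hhmm.split(":")
        return int(hours) * 60 + int(minutes)

    occupied = [0] * 1440  # minutes in a day
    for start, end in meetings:
        for minute in range(to_minute(start), to_minute(end)):
            occupied[minute] += 1
    return max(occupied)


# The six meetings hard-coded in the question's query.
QUESTION_MEETINGS = [
    ("08:00", "09:15"), ("13:20", "15:20"), ("10:00", "14:00"),
    ("13:55", "16:25"), ("14:00", "17:45"), ("14:05", "17:45"),
]
print(min_rooms_by_minute(QUESTION_MEETINGS))  # 4
```

The array costs a fixed 1440 counters however many meetings there are; the work per meeting grows with its length in minutes.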
CREATE TABLE [dbo].[Meetings](
[id] [int] NOT NULL,
[Starttime] [time](7) NOT NULL,
[EndTime] [time](7) NOT NULL
) ON [PRIMARY]
GO
Sample data set:
INSERT INTO Meetings VALUES (1,'8:00','09:00')
INSERT INTO Meetings VALUES (2,'8:00','10:00')
INSERT INTO Meetings VALUES (3,'10:00','11:00')
INSERT INTO Meetings VALUES (4,'11:00','12:00')
INSERT INTO Meetings VALUES (5,'11:00','13:00')
INSERT INTO Meetings VALUES (6,'13:00','14:00')
INSERT INTO Meetings VALUES (7,'13:00','15:00')
To find the minimum number of rooms required, run the query below:
create table #TempMeeting
(
id int, Starttime time, EndTime time, MeetingRoomNo int, Rownumber int
)
insert into #TempMeeting
select id, Starttime, EndTime, 0 as MeetingRoomNo,
       ROW_NUMBER() over (order by Starttime asc) as Rownumber
from Meetings
declare @RowCounter int
select top 1 @RowCounter = Rownumber from #TempMeeting order by Rownumber
WHILE @RowCounter <= (select count(*) from #TempMeeting)
BEGIN
    -- let the next unassigned meeting that starts after this one ends reuse its room
    update #TempMeeting set MeetingRoomNo = 1
    where Rownumber = (select top 1 Rownumber from #TempMeeting
                       where Rownumber > @RowCounter
                         and Starttime >= (select top 1 EndTime from #TempMeeting
                                           where Rownumber = @RowCounter)
                         and MeetingRoomNo = 0
                       order by Rownumber)
    set @RowCounter = @RowCounter + 1
END
-- every meeting that could not reuse a room needs a room of its own
select count(*) from #TempMeeting where MeetingRoomNo = 0
Consider a table meetings with columns id, start_time and end_time (stored as text such as '08:00'). Then the following query (PostgreSQL syntax) should give the correct answer.
with mod_meetings as (select id, to_timestamp(start_time, 'HH24:MI')::TIME as start_time,
to_timestamp(end_time, 'HH24:MI')::TIME as end_time from meetings)
select CASE when max(a_cnt)>1 then max(a_cnt)+1
when max(a_cnt)=1 and max(b_cnt)=1 then 2 else 1 end as rooms
from
(select count(*) as a_cnt, a.id, count(b.id) as b_cnt from mod_meetings a left join mod_meetings b
on a.start_time>b.start_time and a.start_time<b.end_time group by a.id) join_table;
Sample data:
DROP TABLE IF EXISTS meeting;
CREATE TABLE "meeting" (
"meeting_id" INTEGER NOT NULL UNIQUE,
"start_time" TEXT NOT NULL,
"end_time" TEXT NOT NULL,
PRIMARY KEY("meeting_id")
);
INSERT INTO meeting values (1,'08:00','14:00');
INSERT INTO meeting values (2,'09:00','10:30');
INSERT INTO meeting values (3,'11:00','12:00');
INSERT INTO meeting values (4,'12:00','13:00');
INSERT INTO meeting values (5,'10:15','11:00');
INSERT INTO meeting values (6,'12:00','13:00');
INSERT INTO meeting values (7,'10:00','10:30');
INSERT INTO meeting values (8,'11:00','13:00');
INSERT INTO meeting values (9,'11:00','14:00');
INSERT INTO meeting values (10,'12:00','14:00');
INSERT INTO meeting values (11,'10:00','14:00');
INSERT INTO meeting values (12,'12:00','14:00');
INSERT INTO meeting values (13,'10:00','14:00');
INSERT INTO meeting values (14,'13:00','14:00');
Solution:
DROP VIEW IF EXISTS Final;
CREATE VIEW Final AS SELECT time, group_concat(event), sum(num) num from (
select start_time time, 's' event, 1 num from meeting
union all
select end_time time, 'e' event, -1 num from meeting)
group by 1
order by 1;
select max(room) AS Min_Rooms_Required FROM (
select
a.time,
sum(b.num) as room
from
Final a
, Final b
where a.time >= b.time
group by a.time
order by a.time
);
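The view-based answer is a sweep over events: each start adds a room, each end frees one, and the running total peaks at the number of rooms needed. The Python sketch below mirrors it. Like the view, which groups by time, it nets out events that share a timestamp, so a meeting ending at 12:00 frees its room for one starting at 12:00. It assumes zero-padded 'HH:MM' strings, which sort chronologically as text. The name min_rooms_by_events is hypothetical.

```python
def min_rooms_by_events(meetings):
    """Minimum rooms via a +1/-1 event sweep over zero-padded 'HH:MM' strings."""
    net = {}  # time -> starts minus ends at that moment
    for start, end in meetings:
        net[start] = net.get(start, 0) + 1
        net[end] = net.get(end, 0) - 1
    rooms = best = 0
    for time in sorted(net):  # zero-padded 'HH:MM' sorts chronologically as text
        rooms += net[time]  # running total = meetings in progress right after `time`
        best = max(best, rooms)
    return best


# Sample data from the answer (meetings 1 to 14).
SAMPLE = [
    ("08:00", "14:00"), ("09:00", "10:30"), ("11:00", "12:00"), ("12:00", "13:00"),
    ("10:15", "11:00"), ("12:00", "13:00"), ("10:00", "10:30"), ("11:00", "13:00"),
    ("11:00", "14:00"), ("12:00", "14:00"), ("10:00", "14:00"), ("12:00", "14:00"),
    ("10:00", "14:00"), ("13:00", "14:00"),
]
print(min_rooms_by_events(SAMPLE))  # 9
```

Sorting the events costs O(n log n), independent of how long the meetings last.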
Here's an explanation of gashu's nicely working code (which also serves as a non-code explanation of how to solve this in any language).
First, renaming 'minimum_rooms_required' to 'overlap' would make the whole thing much easier to understand, because for each start or end time we want to know the number of ongoing, overlapping meetings. Once we have found the maximum, there is no way to get by with fewer rooms than that, because those meetings all overlap.
By the way, I think there might be a mistake in the code: it should check whether t.start_time or t.end_time is between y.start_time and y.end_time. Counterexample: meeting 1 starts at 8:00 and ends at 11:00, and meeting 2 starts at 10:00 and ends at 12:00.
(I'd post this as a comment on gashu's answer, but I don't have enough reputation.)
I'd go for the LEAD() analytic function:
select
sum(needs_room_ind) as min_rooms
from (
select
id,
start_time,
end_time,
case when lead(start_time,1) over (order by start_time asc) between start_time
and end_time then 1 else 0 end as needs_room_ind
from
meetings
) a
In my view, the approach is to take, at the moment each meeting starts, the difference between how many meetings have started and how many have ended (assuming meetings start and end on time).
My code was like this:
with alpha as
(
select a.meeting_id,a.start_time,
count(distinct b.meeting_id) ttl_meeting_start_before,
count(distinct c.meeting_id) ttl_meeting_end_before
from meeting a
left join
(
select meeting_id,start_time from meeting
) b
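on b.start_time <= a.start_time -- meetings started by now, including this one
left join
(
select meeting_id,end_time from meeting
) c
on c.end_time <= a.start_time -- meetings already over; one ending now frees its room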
group by a.meeting_id,a.start_time
)
select max(ttl_meeting_start_before-ttl_meeting_end_before) max_meeting_room
from alpha

2 different AVG column in 1 table with select

I am trying to make a table that shows an AVG of pickpockets in districts with markets and an AVG of pickpockets in districts without a market.
I would like to have the output like this:
district with market | district without market
----------------------------------------------
269 | 34
But instead I get this:
district with market | district without market
----------------------------------------------
269 | 269
34 | 34
This is the query I used:
select round(avg(average),0) as districts_with_markets, round(avg(average),0) as districts_without_markets
from zakkenrollerij
where wijk in (select district
from market)
union
select round(avg(average),0) as districts_with_markets, round(avg(average),0) as districts_without_markets
from zakkenrollerij
where wijk not in (select district
from market)
I hope someone can help me :D
Assuming that district is unique in market, you can do this with a left join and conditional aggregation:
select round(avg(case when m.district is not null then average end), 0) as districts_with_markets,
round(avg(case when m.district is null then average end), 0) as districts_without_markets
from zakkenrollerij z left join
market m
on m.district = z.wijk;
If this is not the case, then use a subquery and a flag:
select round(avg(case when hasMarketFlag then average end), 0) as districts_with_markets,
round(avg(case when not hasMarketFlag then average end), 0) as districts_without_markets
from (select z.*,
(exists (select 1
from market m
where m.district = z.wijk
)
) as hasMarketFlag
from zakkenrollerij z
) z;
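The first (LEFT JOIN) version above can be tried in SQLite via Python's sqlite3 module. It returns a single row with both averages side by side. The districts and averages below are made-up toy values, not the asker's data. Like the answer, the sketch assumes each district appears at most once in market, because a duplicate would count that district's average twice.

```python
import sqlite3

# Made-up toy values (not the asker's data): (wijk, average) per district.
DISTRICTS = [("Centrum", 100), ("Oost", 200), ("Noord", 10), ("West", 20), ("Zuid", 30)]
MARKETS = ["Centrum", "Oost"]  # districts that have a market, each listed once

# CASE without ELSE yields NULL, which AVG ignores, so each column averages only its group.
SQL = """
SELECT ROUND(AVG(CASE WHEN m.district IS NOT NULL THEN z.average END), 0) AS districts_with_markets,
       ROUND(AVG(CASE WHEN m.district IS NULL THEN z.average END), 0) AS districts_without_markets
FROM zakkenrollerij z
LEFT JOIN market m ON m.district = z.wijk
"""


def market_averages(districts, markets):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE zakkenrollerij (wijk TEXT, average INTEGER)")
        con.execute("CREATE TABLE market (district TEXT)")
        con.executemany("INSERT INTO zakkenrollerij VALUES (?, ?)", districts)
        con.executemany("INSERT INTO market VALUES (?)", [(d,) for d in markets])
        return con.execute(SQL).fetchone()
    finally:
        con.close()


print(market_averages(DISTRICTS, MARKETS))  # (150.0, 20.0)
```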
Try this:
Select sum(dist_with_markets) as district_with_markets,
sum(dist_without_markets) as district_without_markets
from
(
select round(avg(average),0) as dist_with_markets, 0 as dist_without_markets
from zakkenrollerij
where wijk in (select district
from market)
union
select 0 as dist_with_markets, round(avg(average),0) as dist_without_markets
from zakkenrollerij
where wijk not in (select district
from market) ) a;
Hope this helps:-)
Please try the following...
SELECT ROUND( AVG( with_markets ) ) AS districts_with_markets,
ROUND( AVG( without_markets ) ) AS districts_without_markets
FROM ( SELECT average AS with_markets,
NULL AS without_markets
FROM zakkenrollerij
WHERE wijk IN ( SELECT district
FROM market )
UNION ALL
SELECT NULL,
average
FROM zakkenrollerij
WHERE wijk NOT IN ( SELECT district
FROM market )
) AS tempTable;
This starts by forming a list of all the values of average within zakkenrollerij that belong to districts with a market. No attempt to perform calculations is made at this stage. The second column is for the values from districts without a market; all of its values are set to NULL at this stage.
This list is then joined vertically with its "without" counterpart using the UNION ALL operator. Plain UNION would discard duplicate rows, so two districts with the same average would be counted only once.
The joined list then has the ROUND( AVG() ) operations performed upon its columns.
If you have any questions or comments, then please feel free to post a Comment accordingly.
You are currently going through the table twice, selecting the same expression twice under different names:
round(avg(average),0) as districts_with_markets, round(avg(average),0) as districts_without_market
Instead of uniting the tables afterwards you can use CASE to select a variable with specific conditions. This should give the wanted result:
select round(avg(case when wijk in (select district from market) then average else null end),0) as districts_with_markets,
round(avg(case when wijk not in (select district from market) then average else null end),0) as districts_without_markets
from zakkenrollerij