MySQL: count consecutive rows with the same value

I have read a few similar questions on counting consecutive rows, but none of them gave me a clear answer. I hope someone could give me some help with my problem. I have the following table as an example.
create table medical
(PatientID int,
Date Date,
TookTest int
);
insert into medical(PatientID, Date, TookTest)
values
(1, '2014-01-01', 1),
(1, '2014-01-05', 1),
(1, '2014-01-10', 1),
(2, '2014-01-01', 1),
(2, '2014-01-10', 0),
(2, '2014-01-20', 1),
(3, '2014-01-01', 1),
(3, '2014-01-07', 1),
(3, '2014-01-12', 1),
(3, '2014-01-21', 1),
(4, '2014-01-03', 1),
(4, '2014-01-05', 1),
(4, '2014-01-22', 0),
(4, '2014-01-27', 1)
This table records which patients took a medical test on which dates. The PatientID and Date columns are self-explanatory. The last column, TookTest, is a binary indicator: 1 means the patient took a test on that date and 0 means they did not. The rows are sorted by PatientID and Date when the table is created. I would like to count the number of patients who took a test at least 3 times consecutively. In the example, patients 1 and 3 each took 3 or more tests in a row, so the answer is 2. Could anyone show me how to write this query in MySQL? Thanks in advance for your help!

SELECT
    m_id
FROM (
    SELECT
        m.PatientID AS m_id,
        m.Date AS m_date,
        m.TookTest,
        -- @a counts consecutive tests for the current patient, starting at 0
        IF(m.TookTest = 1 AND @b = m.PatientID, @a := @a + 1, @a := 0) AS new_count,
        -- @b remembers the PatientID of the row just processed
        @b := m.PatientID
    FROM medical m
    JOIN (
        SELECT
            @a := 0,
            @b := 0
    ) AS t
) AS TEMP
WHERE new_count >= 2
GROUP BY m_id
This does the calculation for you. The only odd thing is that the count starts at 0 instead of 1, so 3 consecutive tests give a count of 2. See the fiddle if you have questions: http://sqlfiddle.com/#!2/22ba28/12
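For anyone who finds the user-variable trick hard to follow, here is a rough Python transliteration of the same counter logic, run on the sample rows from the question. It is only a sketch: the names `a`, `b` and `matched` are made up here to mirror @a and @b, and the rows are assumed to arrive already sorted by patient and date, as the question states.

```python
# Sample rows from the question: (PatientID, Date, TookTest), already sorted.
rows = [
    (1, "2014-01-01", 1), (1, "2014-01-05", 1), (1, "2014-01-10", 1),
    (2, "2014-01-01", 1), (2, "2014-01-10", 0), (2, "2014-01-20", 1),
    (3, "2014-01-01", 1), (3, "2014-01-07", 1), (3, "2014-01-12", 1), (3, "2014-01-21", 1),
    (4, "2014-01-03", 1), (4, "2014-01-05", 1), (4, "2014-01-22", 0), (4, "2014-01-27", 1),
]

a = 0  # plays the role of @a: zero-based counter of consecutive tests
b = 0  # plays the role of @b: PatientID of the previous row
matched = set()

for patient_id, _date, took_test in rows:
    # Same condition as the IF(...) in the query: extend the run only when
    # the test was taken and the previous row belongs to the same patient.
    if took_test == 1 and b == patient_id:
        a += 1
    else:
        a = 0
    b = patient_id
    if a >= 2:  # a counter of 2 means three rows in a row
        matched.add(patient_id)

print(sorted(matched))  # [1, 3]
print(len(matched))     # 2
```

Because the counter resets to 0 on the first row of each patient, a value of 2 corresponds to the third consecutive test, which is why the query filters on new_count >= 2.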

This code also works.
set @test = 0, @id = 0, @count = 0;
select m.id, max(count)
from (
    select
        @count := if(TookTest = 1 and PatientID = @id, @count + 1, 0) as count,
        @test := TookTest,
        @id := PatientID as id
    from medical) as m
group by m.id
having max(count) >= 2;
This code counts the historical maximum number of consecutive TookTest rows rather than the length of the most recent consecutive run. (The distinction does not change the result here because the example data is too small.)
My coding background is R, Python and Java. Probably because of that experience, I find it hard to grasp using join twice in this context. My code above is a way to get around it. I hope this answer helps others in a similar situation.
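To make the difference between the historical maximum and the most recent run concrete, here is a small Python sketch. The `history` list is hypothetical data (it is not in the question), and `max_streak` and `latest_streak` are helper names invented for this illustration.

```python
def max_streak(flags):
    """Longest run of consecutive 1s anywhere in the sequence."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best


def latest_streak(flags):
    """Length of the run of 1s that ends at the last element."""
    count = 0
    for flag in reversed(flags):
        if not flag:
            break
        count += 1
    return count


# Hypothetical TookTest history for one patient, oldest first.
history = [1, 1, 1, 0, 1]

print(max_streak(history))     # 3
print(latest_streak(history))  # 1
```

With this history the patient qualifies under the "historical maximum" rule but not under a "most recent run" rule, so it is worth deciding which one you actually need.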

Related

I took a HackerRank online test and got this question, which I couldn't solve. Please help me out if you can.

Question's Image
I don't understand how to show the details for the same person on different dates. What should I group the data by to make this happen?
I have added an image of the question; please check the link above. I can't post embedded images because I am new to Stack Overflow.
I have made sample tables as a test case for the problem, for your convenience.
Please help if you can.
Create Table delivery
(
deliveryId int primary key,
delivery_date date,
De_Id int ,
Pickup_time time ,
delivery_time time
);
Insert Into delivery Values (450, '2020-04-17' , 111, '8:00', '9:00');
Insert Into delivery Values (451, '2020-04-17' , 111, '21:00', '23:00');
Insert Into delivery Values (452, '2020-04-17' , 111, '11:00', '11:30');
Insert Into delivery Values (453, '2020-04-17' , 112, '2:00', '3:35');
Insert Into delivery Values (454, '2020-04-17' , 112, '4:00', '4:40');
Insert Into delivery Values (455, '2020-04-17' , 112, '5:00', '7:00');
Insert Into delivery Values (456, '2020-04-18' , 111, '9:00', '11:00');
Insert Into delivery Values (457, '2020-04-18' , 111, '8:50', '9:55');
Insert Into delivery Values (458, '2020-04-18' , 111, '7:00', '9:06');
Insert Into delivery Values (459, '2020-04-18' , 112, '2:00', '3:35');
Insert Into delivery Values (460, '2020-04-18' , 112, '4:00', '4:40');
Insert Into delivery Values (461, '2020-04-18' , 112, '5:00', '7:00');
Create Table delivery_executive
(
ID int primary key,
Name varchar(20)
);
Insert into delivery_executive Values (111, 'Abby');
Insert into delivery_executive Values (112, 'Binto');
Here's one way to solve this in MySQL (8.0 or later, since it needs a CTE and window functions): use row_number to rank each executive's deliveries per date by the time between pickup and delivery, keep only the top two, then use a conditional aggregate to pivot those two rows into one row with a column for each Id.
with t as (
    select deliveryid, de_id, delivery_date,
        Row_Number() over(partition by delivery_date, de_id order by subtime(delivery_time, Pickup_time) desc) rn
    from delivery
)
select e.Name, t.delivery_date,
    Max(case when rn=1 then t.deliveryid end) FirstDeliveryId,
    Max(case when rn=2 then t.deliveryid end) SecondDeliveryId
from t
join delivery_executive e on e.id=t.de_id
where rn<=2
group by t.delivery_date, e.Name
How max and case help.
It helps to work through in stages. So firstly if we start with just what the CTE (the with t as ) element returns for the sample data, we see 12 rows - 6 for each date - each with 3 rows per de_id and a row number rn ordered 1-3 against each group with 1 having the longest time between the pickup and delivery times and 3 the shortest.
If we just look at select * from t where rn<=2, that naturally removes the rows where rn=3 and we're left with the data we want: the two longest delivery time periods for each date, for each executive.
The desired output is to have just two rows per each executive, one for each date, and the two IDs to be in two columns.
Looking at case first, we can add two new columns as the result of a case expression,
case when rn=1 then t.deliveryId end,
case when rn=2 then t.deliveryId end
The first case will have a value only where rn=1 likewise the second case will have a value only where rn=2 (by default without an else it returns NULL on all other non-matched cases).
So now we have our First and Second Id values but they are still split over two rows.
The other values on each pair of rows (rn=1 and rn=2), are the same in each case, being the same delivery_date and the same de_id, so this is now where we can aggregate the rows together.
The final selected columns are the de_id and delivery_date and we want to group by these - ie all the values that are the same are condensed onto a single row provided the columns that we do not group by are aggregated in some way.
The most common aggregation would be sum, for example if we were only selecting the row number column we could sum(rn) and the result would be 3 for each aggregated group (1+2). However using max it will return the maximum value for each group of rows, and since we only have either a value (deliveryId) or NULL, it returns the deliveryId.
Therefore we wrap each case expression with a max() function, group by the remaining columns and the result is the grouped columns occupy a single row in each case since the aggregate of the case expressions also returns 1 row.
Fiddle here
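To try the max/case step in isolation, here is a small Python sketch using the standard sqlite3 module (SQLite rather than MySQL, purely so it runs without a server). The `ranked` table is an assumption made for this example: its rn values are typed in by hand and are the ranks the row_number step would give the 2020-04-17 sample rows, with names already joined in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ranked (name TEXT, delivery_date TEXT, deliveryid INTEGER, rn INTEGER)"
)
# Hand-ranked rows for 2020-04-17: rn = 1 is the longest pickup-to-delivery time.
conn.executemany(
    "INSERT INTO ranked VALUES (?, ?, ?, ?)",
    [
        ("Abby", "2020-04-17", 451, 1),
        ("Abby", "2020-04-17", 450, 2),
        ("Abby", "2020-04-17", 452, 3),
        ("Binto", "2020-04-17", 455, 1),
        ("Binto", "2020-04-17", 453, 2),
        ("Binto", "2020-04-17", 454, 3),
    ],
)

# Each CASE yields the id on one row of the pair and NULL on the other;
# MAX ignores the NULL, so every group collapses to a single row.
rows = conn.execute(
    """
    SELECT name, delivery_date,
           MAX(CASE WHEN rn = 1 THEN deliveryid END) AS FirstDeliveryId,
           MAX(CASE WHEN rn = 2 THEN deliveryid END) AS SecondDeliveryId
    FROM ranked
    WHERE rn <= 2
    GROUP BY delivery_date, name
    ORDER BY name
    """
).fetchall()

for row in rows:
    print(row)
# ('Abby', '2020-04-17', 451, 450)
# ('Binto', '2020-04-17', 455, 453)
```

Removing the MAX() wrappers and the GROUP BY from the query shows the intermediate state described above, where the two ids sit on separate rows with NULL in the other column.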

SSRS - avg function by subtotals

I have details, subtotals and totals.
When I put the avg function in the totals line, I get the average of every detail row.
I need the average of the subtotals instead.
How can I do that?
week1
day1..... 2
day3..... 3
day4..... 4
day6..... 2
total.... 11 sum()
week2
day1..... 3
day2..... 2
total..... 5 sum()
Total
........... 16 sum() OK
............ 2,66666 avg() here should be (11+5)/2 =8
Result after implementing solution
I created a dataset to replicate your sample data as follows:
DECLARE @t TABLE (week int, day int, amount int)
INSERT INTO @t VALUES
(1, 1, 2),
(1, 3, 3),
(1, 4, 4),
(1, 6, 2),
(2, 1, 3),
(2, 2, 2)
SELECT * FROM @t
I then built a simple tablix as you had done (more or less)
I included the incorrect results you had for illustration and then added a new expression to calculate this correctly.
The result looks like this
You can ignore the other datasets, this is just a report I use for testing. Only dataset3 is used here.
The expression used was this
=Sum(Fields!amount.Value) / CountDistinct(Fields!week.Value)
You'll just need to edit this to match your field names. It basically just sums all the detail amounts then divides by the number of distinct weeks in the dataset.
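The arithmetic behind that expression can be checked with a few lines of Python on the same sample rows. This is just a sanity check of the numbers, not SSRS code, and the variable names are made up for the example.

```python
from collections import defaultdict

# (week, day, amount) rows, same as the sample dataset above.
rows = [(1, 1, 2), (1, 3, 3), (1, 4, 4), (1, 6, 2), (2, 1, 3), (2, 2, 2)]

# Subtotal per week, like the group total lines in the report.
weekly_totals = defaultdict(int)
for week, _day, amount in rows:
    weekly_totals[week] += amount

total = sum(amount for _week, _day, amount in rows)

# Avg() over the detail rows: divides by the number of rows.
row_average = total / len(rows)

# Sum / CountDistinct(week): divides by the number of weeks instead.
subtotal_average = total / len(weekly_totals)

print(dict(weekly_totals))    # {1: 11, 2: 5}
print(round(row_average, 5))  # 2.66667
print(subtotal_average)       # 8.0
```

Dividing the grand total by the number of distinct weeks gives the same value as averaging the two subtotals, (11 + 5) / 2.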

SQL Query for exact match in many to many relation

I have the following tables(only listing the required attributes)
medicine (id, name),
generic (id, name),
med_gen (med_id references medicine(id),gen_id references generic(id), potency)
Sample Data
medicine
(1, 'Crocin')
(2, 'Stamlo')
(3, 'NT Kuf')
generic
(1, 'Hexachlorodine')
(2, 'Methyl Benzoate')
med_gen
(1, 1, '100mg')
(1, 2, '50ml')
(2, 1, '100mg')
(2, 2, '60ml')
(3, 1, '100mg')
(3, 2, '50ml')
I want all the medicines that are equivalent to a given medicine. Two medicines are equivalent if they have the same generics and the same potency for each generic. In the sample data above, all three have the same generics, but only 1 and 3 also have the same potency for the corresponding generics. So 1 and 3 are equivalent medicines.
I want to find the equivalent medicines given a medicine id.
NOTE: One medicine may have any number of generics. The medicine table has around 102000 records, the generic table around 2200 and the potency table around 200000 records, so performance is a key point.
NOTE 2: The database used is MySQL.
One way to do it in MySQL is to leverage the GROUP_CONCAT() function:
SELECT g.med_id
FROM
(
SELECT med_id, GROUP_CONCAT(gen_id ORDER BY gen_id) gen_id, GROUP_CONCAT(potency ORDER BY potency) potency
FROM med_gen
WHERE med_id = 1 -- here 1 is med_id for which you're trying to find analogs
) o JOIN
(
SELECT med_id, GROUP_CONCAT(gen_id ORDER BY gen_id) gen_id, GROUP_CONCAT(potency ORDER BY potency) potency
FROM med_gen
WHERE med_id <> 1 -- here 1 is med_id for which you're trying to find analogs
GROUP BY med_id
) g
ON o.gen_id = g.gen_id
AND o.potency = g.potency
Output:
| MED_ID |
|--------|
| 3 |
Here is SQLFiddle demo
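The idea of comparing one "signature" per medicine can also be sketched in Python on the sample data. Note the assumption: this version keeps each generic paired with its potency in the signature, which is a slight variation on the two separate concatenated strings in the query. `equivalents` is a name invented for the example.

```python
from collections import defaultdict

# (med_id, gen_id, potency) rows from the sample med_gen data.
med_gen = [
    (1, 1, "100mg"), (1, 2, "50ml"),
    (2, 1, "100mg"), (2, 2, "60ml"),
    (3, 1, "100mg"), (3, 2, "50ml"),
]


def equivalents(target_id, rows):
    """Return ids of medicines whose (generic, potency) set equals the target's."""
    signatures = defaultdict(set)
    for med_id, gen_id, potency in rows:
        signatures[med_id].add((gen_id, potency))
    target = signatures.get(target_id)
    if target is None:
        return []
    return sorted(
        med_id
        for med_id, signature in signatures.items()
        if med_id != target_id and signature == target
    )


print(equivalents(1, med_gen))  # [3]
print(equivalents(2, med_gen))  # []
```

Keeping the pairs together means two medicines only match when each generic has the same potency, which is the definition of equivalence given in the question.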

Query to get categorised sets of splits

Given this table structure:
CREATE TABLE IF NOT EXISTS splits (
id INT AUTO_INCREMENT,
sector_id INT,
type VARCHAR(100),
percentage INT,
PRIMARY KEY (id),
INDEX (type)
) ENGINE MyISAM;
And this data set:
INSERT INTO splits (sector_id, type, percentage) VALUES
(1, 'Manager', '50'),
(1, 'Sales Rep', '50'),
(2, 'Manager', '75'),
(2, 'Sales Rep', '25'),
(3, 'Manager', '75'),
(3, 'Sales Rep', '25'),
(4, 'Manager', '100'),
(5, 'Manager', '100'),
(6, 'Manager', '100');
How could I return the number of sectors that split in the same way, like this:
Split | Number
---------------+-------
50% M / 50% SR | 1
75% M / 25% SR | 2
100% M | 3
So this shows that 1 sector (id 1) has a split ratio of 50/50, 2 sectors (ids 2, 3) have a split ratio of 75/25 and 3 sectors (ids 4, 5, 6) have a split ratio of 100/0.
Here is a SQL Fiddle with the database setup: http://sqlfiddle.com/#!2/6b19f/1
What have you tried?
I cannot even think of where to start to solve this problem, so I apologise for not being able to show an attempted solution. I will update this question if I get anywhere.
The reason why I want to do this all in the database (and not the application) is because our automated reporting tools can be pointed to a table/view/query and automatically apply filtering, sorting, charting etc. To do it manually in the application loses all the default functionality.
I don't really understand the problem. Your DB contains already all the data you want to retrieve?!
SELECT
    sector_id AS Number,
    type,
    percentage
FROM
    splits
The easiest thing now would be to take your software and turn those (type, percentage) tuples into strings. Why do you need the database to create and concatenate this string?
Can there be more than 2 types?
For Postgres I'd use an array of tuples for output:
SELECT
sector_id,
array_agg(row(percentage, type))
FROM
splits
GROUP BY
sector_id
Correct Query:
SELECT
    x.y,
    COUNT(*) c
FROM (
    SELECT
        sector_id,
        -- ORDER BY inside GROUP_CONCAT keeps the label order deterministic,
        -- instead of relying on the row order of a sorted subquery
        GROUP_CONCAT(CONCAT(percentage, '% '), type ORDER BY type SEPARATOR ' / ') AS y
    FROM splits
    GROUP BY sector_id
) x
GROUP BY x.y
ORDER BY c
Result will look like this:
50% Manager / 50% Sales Rep | 1
75% Manager / 25% Sales Rep | 2
100% Manager | 3
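The two grouping steps (build one label per sector, then count identical labels) can be sketched in Python on the sample data. This is only an illustration of the logic, with variable names chosen for the example.

```python
from collections import Counter, defaultdict

# (sector_id, type, percentage) rows from the sample data set.
splits = [
    (1, "Manager", 50), (1, "Sales Rep", 50),
    (2, "Manager", 75), (2, "Sales Rep", 25),
    (3, "Manager", 75), (3, "Sales Rep", 25),
    (4, "Manager", 100), (5, "Manager", 100), (6, "Manager", 100),
]

# Inner step: collect the parts of each sector (the inner GROUP BY sector_id).
by_sector = defaultdict(list)
for sector_id, type_, percentage in splits:
    by_sector[sector_id].append((type_, percentage))

# Build one label per sector, sorting by type so equal splits give equal labels.
labels = {
    sector_id: " / ".join(f"{pct}% {type_}" for type_, pct in sorted(parts))
    for sector_id, parts in by_sector.items()
}

# Outer step: count how many sectors share each label (the outer GROUP BY).
counts = Counter(labels.values())

for label, number in sorted(counts.items(), key=lambda item: item[1]):
    print(f"{label} | {number}")
# 50% Manager / 50% Sales Rep | 1
# 75% Manager / 25% Sales Rep | 2
# 100% Manager | 3
```

The sort inside the label matters: without a fixed order, two sectors with the same split could produce different strings and be counted separately.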

return rows where sum on a field less than a given value

My knowledge of MySQL is basic. I want to build a query that returns, in ascending order, all rows whose values add up to a given total. I can't figure out how to do that. Using sum() only returns one row. I've tried a subquery, but it returns all rows. I don't want anybody to do my work for me; I just want help figuring this out.
Does anybody have an idea?
How can I retrieve all rows whose "value" field sums to 30?
Example:
given value: 30
field to sum: value
table:
id name value order
1 name1 3 1
2 name2 10 6
3 name3 13 3
4 name4 5 8
5 name5 20 25
So, the query must return:
id 1, id 3, id 2, id 4
Thanks in advance.
set @total := 0;
select id, name, value, `order`
from
    (select
        id, name, value, `order`,
        @total := if(@total is null, 0, @total) + `order` as total
    from THE_TABLE
    order by `order`
    ) as derived
where total <= 30;
I'm using Postgres as the database, and I think this does what you want. I'm not sure whether it works the same way in MySQL:
CREATE TABLE test (
id int,
name varchar(50),
value int,
order_ int
);
INSERT INTO test values (1, 'name1', 3, 1);
INSERT INTO test values (3, 'name3', 13, 3);
INSERT INTO test values (2, 'name2', 10, 6);
INSERT INTO test values (4, 'name4', 5, 8);
INSERT INTO test values (5, 'name5', 20, 25);
SELECT * FROM (SELECT *, SUM(value) OVER (ORDER BY order_) sumvalues FROM TEST) a WHERE sumvalues <30
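The running-total idea behind SUM(value) OVER (ORDER BY order_) can be traced in Python on the sample table. This sketch assumes the same reading as the window query above: sum the value column in ascending order of the order column and keep rows while the running total stays below 30. The variable names are made up for the example.

```python
from itertools import accumulate

# (id, name, value, order) rows from the question's sample table.
rows = [
    (1, "name1", 3, 1),
    (2, "name2", 10, 6),
    (3, "name3", 13, 3),
    (4, "name4", 5, 8),
    (5, "name5", 20, 25),
]
limit = 30

# Sort by the order column, then compute the cumulative sum of value.
ordered = sorted(rows, key=lambda row: row[3])
running = list(accumulate(row[2] for row in ordered))

# Keep the ids whose running total is still below the limit.
selected = [row[0] for row, total in zip(ordered, running) if total < limit]

print(running)   # [3, 16, 26, 31, 51]
print(selected)  # [1, 3, 2]
```

Under this reading id 4 is left out, because it pushes the running total to 31. The question's expected result lists id 4 as well, so the asker may have intended a different rule (for example, including the row that first crosses the limit); adjust the comparison if so.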