Is it possible to change the following MySQL query to use a join instead of a subquery for efficiency (or is there another way to make it faster)? I have a table of patient visits to an emergency department. The table lists arrival and departure times. I need the query to return the total number of patients that were already present in the emergency department (the "census") when each patient arrived.
My table looks something like this:
+------+------+---------------------+---------------------+
| id | name | arrival | departure |
+------+------+---------------------+---------------------+
| 1 | Joe | 2010-01-01 00:00:00 | 2010-01-01 02:00:00 |
| 2 | John | 2010-01-01 00:05:00 | 2010-01-01 03:00:00 |
| 3 | Jane | 2010-01-01 01:00:00 | 2010-01-01 04:00:00 |
...
With a desired result like this:
+------+--------+
| name | census |
+------+--------+
| Joe | 0 |
| John | 1 |
| Jane | 2 |
...
The following query works, but is quite slow (about 3.5 seconds on 180,000 rows). Is there a way to increase the efficiency of this query (with some sort of join, or other method)?
select name, arrival,
(SELECT count(*)
FROM patient_arrivals as b
WHERE b.arrival <= a.arrival and b.departure >= a.departure) as census
FROM patient_arrivals as a
I don't think a join will help. Instead, you need to restructure the query. The following gets the number of patients in the room at any particular time:
select t, sum(num) as num, @total := @total + num as total
from (select arrival as t, 1 as num
from patient_arrivals
union all
select departure, -1
from patient_arrivals
) t cross join
(select @total := 0) vars
group by t
order by t
Then, you can use this as a subquery for the join:
select pa.*, tnum.total as census
from patient_arrivals pa join
(select t, sum(num) as num, @total := @total + num as total
from (select arrival as t, 1 as num
from patient_arrivals
union all
select departure, -1
from patient_arrivals
) t cross join
(select @total := 0) vars
group by t
order by t
) tnum
on pa.arrival = tnum.t;
This gives the number when the patient arrives. For the total that overlap:
select pa.*, max(tnum.total) as census
from patient_arrivals pa join
(select t, sum(num) as num, @total := @total + num as total
from (select arrival as t, 1 as num
from patient_arrivals
union all
select departure, -1
from patient_arrivals
) t cross join
(select @total := 0) vars
group by t
order by t
) tnum
on tnum.t between pa.arrival and pa.departure
group by pa.id
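The +1/-1 event idea behind these queries can also be sketched outside SQL. The Python below is only an illustration under stated assumptions: the function name `census_at_arrival` and the tuple layout are made up for this example, the rows are the three sample visits from the question, and a departure that falls on exactly the same instant as an arrival is assumed to be processed first.

```python
from datetime import datetime

def census_at_arrival(visits):
    """Return {name: number of patients already present when that patient arrived}.

    visits: list of (name, arrival, departure) tuples (hypothetical layout).
    """
    events = []
    for name, arrival, departure in visits:
        # kind 0 = departure, kind 1 = arrival; sorting on (time, kind) puts a
        # departure before an arrival at the same instant (an assumption).
        events.append((departure, 0, name))
        events.append((arrival, 1, name))
    events.sort()

    present = 0
    census = {}
    for _, kind, name in events:
        if kind == 1:
            census[name] = present  # patients in the department before this arrival
            present += 1
        else:
            present -= 1
    return census

def ts(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")

sample = [
    ("Joe", ts("2010-01-01 00:00:00"), ts("2010-01-01 02:00:00")),
    ("John", ts("2010-01-01 00:05:00"), ts("2010-01-01 03:00:00")),
    ("Jane", ts("2010-01-01 01:00:00"), ts("2010-01-01 04:00:00")),
]
result = census_at_arrival(sample)
print(result)
```

One sorted pass over 2N events replaces the per-row count of the correlated subquery, which is why the restructured query scales better.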
I have a table called order_match which contains order_buyer_Id as the id of the transaction, createdby as the id of the buyer, createdAt as the date when the transaction happened, and quantity as the quantity of each order.
In this case, I want to count the orders (order_buyer_Id) for each buyer (createdby) and then find the maximum and the minimum count.
This is the example data:
+----------------+-----------+------------+----------+
| order_buyer_id | createdby | createdAt  | quantity |
+----------------+-----------+------------+----------+
| 19123          | 19        | 2017-02-02 | 0.4      |
| 193241         | 19        | 2017-02-02 | 0.5      |
| 123123         | 20        | 2017-02-02 | 1        |
| 32242          | 20        | 2017-02-02 | 4        |
| 32434          | 20        | 2017-02-02 | 3        |
+----------------+-----------+------------+----------+
and if I run the query, the expected result is:
+-----+-----+---------+--------+
| max | min | average | median |
+-----+-----+---------+--------+
| 4   | 0.4 | 1.78    | 1      |
+-----+-----+---------+--------+
This is the fiddle
http://www.sqlfiddle.com/#!9/d89772/15
and this is my query
SELECT MAX(quantity) AS max,
MIN(quantity) AS min,
AVG(quantity) AS average,
AVG(CASE WHEN rn IN (FLOOR((@tr+1)/2), FLOOR((@tr+2)/2)) THEN quantity END) AS median
FROM (
SELECT count,
@rn := @rn + 1 AS rn,
@tr := @rn AS tr
FROM (
SELECT COUNT(*) AS count
FROM order_match
GROUP BY order_buyer_Id
order by quantity
) o
CROSS JOIN (SELECT @rn := 0) init
) c
You are getting the error because quantity is not in your subquery.
Either join with your table again to get the quantity, or include quantity in your select (based on your sample data, grouping by quantity as well gives the same result):
SELECT MAX(quantity) AS max,
MIN(quantity) AS min,
AVG(quantity) AS average,
AVG(CASE WHEN rn IN (FLOOR((@tr+1)/2), FLOOR((@tr+2)/2)) THEN quantity END) AS median
FROM (
SELECT count, quantity,
@rn := @rn + 1 AS rn,
@tr := @rn AS tr
FROM (
SELECT COUNT(*) AS count,Quantity
FROM order_match
GROUP BY order_buyer_Id,Quantity
order by quantity
) o
CROSS JOIN (SELECT @rn := 0) init
) c
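As a cross-check of the expected numbers, and of the "middle rank" trick used in the median expression, here is a small Python sketch. It is not part of the query: the helper name `median_by_rank` is made up for illustration, and the list is simply the quantity column from the example data.

```python
from statistics import mean, median

quantities = [0.4, 0.5, 1, 4, 3]  # quantity column from the example data

def median_by_rank(values):
    """Median via the same 1-based ranks the SQL uses:
    FLOOR((n+1)/2) and FLOOR((n+2)/2), averaged."""
    ordered = sorted(values)
    n = len(ordered)
    lo = (n + 1) // 2
    hi = (n + 2) // 2
    return (ordered[lo - 1] + ordered[hi - 1]) / 2

summary = {
    "max": max(quantities),
    "min": min(quantities),
    "average": round(mean(quantities), 2),
    "median": median_by_rank(quantities),
}
print(summary)
```

For an odd row count both ranks point at the same middle row; for an even count they point at the two middle rows, so averaging them gives the usual median.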
SELECT t.max,t.min,t.average,0.00 AS 'Median'
FROM
(SELECT MAX(quantity) AS max,
MIN(quantity) AS min,
SUM(quantity)/COUNT(distinct createdby) AS average
FROM order_match)t
union
SELECT 0.00 AS 'max',0.00 AS 'min',0.00 AS 'Average',
((2*t1.average/3)+t1.mode) AS 'Median'
FROM (SELECT count(FLOOR(quantity)),IFNULL(FLOOR(quantity),min(quantity)) AS 'mode'
FROM order_match GROUP BY quantity HAVING
count(FLOOR(quantity))>1)t1
Imagine we have a table, students, that contains id, name, marks and grade. Write a query that returns the name of the last student at which the running sum of marks, ordered by grade, equals 100.
example
- id | name | marks | grade |
- 01 | Jeff | 40 | 1 |
- 02 | Annie| 40 | 3 |
- 03 | Ramy | 20 | 5 |
- 04 | Jenny| 20 | 2 |
so the result should return
Annie
because Annie is the last row at which the running sum of marks equals 100. Jeff comes first because his grade is 1, Jenny is second and Annie is third: Jeff(40) + Jenny(20) + Annie(40) = 100.
You can compute a running sum with a MySQL user variable.
This query should work from MySQL 5.1 and up.
Query
SELECT
Table1_alias.name
FROM (
SELECT
Table1.name
, (@running_marks_sum := @running_marks_sum + Table1.marks) AS running_marks_sum
FROM
Table1
CROSS JOIN (SELECT @running_marks_sum := 0) AS init_user_param
ORDER BY
Table1.grade ASC
) AS Table1_alias
WHERE
Table1_alias.running_marks_sum = 100
Result
| name |
| ----- |
| Annie |
MySQL 8.0+ only
Query
SELECT
Table1_alias.name
FROM (
SELECT
Table1.name
, SUM(Table1.marks) OVER(ORDER BY Table1.grade) AS running_marks_sum
FROM
Table1
) AS Table1_alias
WHERE
Table1_alias.running_marks_sum = 100;
Result
| name |
| ----- |
| Annie |
Keep the cumulative sum of marks in a variable, use that query as a subquery, and select the row whose total is 100. If no row has a cumulative total of exactly 100, you won't get any result.
Query
set @total := 0;
select `id`, `name`, `marks`, `grade` from(
select `id`, `name`, `marks`, `grade`, (@total := @total + `marks`) as `total`
from `your_table_name`
order by `grade`
) as `t`
where `t`.`total` = 100;
Given the database structure above, below is one way to get the output:
select name from (select * from (SELECT id,name,grade,marks, @total := @total + marks AS total FROM (stud, (select @total := 0) t) order by grade ) t WHERE total <=100 ) final_view order by grade desc limit 1
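To make the running-sum logic in these answers concrete, here is a hedged Python sketch of the same steps: order by grade, accumulate marks, pick the row where the total hits the target. The function name `student_reaching` and the tuple layout are assumptions for illustration only.

```python
from itertools import accumulate

students = [  # (id, name, marks, grade) from the example table
    (1, "Jeff", 40, 1),
    (2, "Annie", 40, 3),
    (3, "Ramy", 20, 5),
    (4, "Jenny", 20, 2),
]

def student_reaching(rows, target):
    """Name of the row where the running sum of marks (ordered by grade)
    equals target, or None if the sum never lands exactly on it."""
    ordered = sorted(rows, key=lambda r: r[3])
    running = accumulate(r[2] for r in ordered)
    for row, total in zip(ordered, running):
        if total == target:
            return row[1]
    return None

print(student_reaching(students, 100))
```

Like the `= 100` queries above, this returns nothing when no prefix sums to exactly the target; the `<= 100 ... limit 1` variant instead returns the last row that does not exceed it.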
I want to get something like this
MySQL data
(dat_reg)
1.1.2000
1.1.2000
1.1.2000
2.1.2000
2.1.2000
3.1.2000
I want to get:
(dat_reg) (count)
1.1.2000 - 3
2.1.2000 - 5
3.1.2000 - 6
What I tried is this:
SELECT COUNT( * ) as a , DATE_FORMAT( dat_reg, '%d.%m.%Y' ) AS dat
FROM members
WHERE (dat_reg > DATE_SUB(NOW() , INTERVAL 5 DAY))
GROUP BY DATE_FORMAT(dat_reg, '%d.%m.%Y')
ORDER BY dat_reg
but I get:
1.1.2000 - 3 | 2.1.2000 - 2 | 3.1.2000 - 1
Any tips on how to write a query for this?
I would suggest using variables in MySQL:
SELECT d.*, (@sumc := @sumc + cnt) as running_cnt
FROM (SELECT DATE_FORMAT(dat_reg, '%d.%m.%Y') as dat, COUNT(*) as cnt
FROM members
WHERE dat_reg > DATE_SUB(NOW() , INTERVAL 5 DAY)
GROUP BY dat
ORDER BY dat_reg
) d CROSS JOIN
(SELECT @sumc := 0) params;
If you want a cumulative count from the beginning of time, then you need an additional subquery:
SELECT d.*
FROM (SELECT d.*, (@sumc := @sumc + cnt) as running_cnt
FROM (SELECT DATE_FORMAT(dat_reg, '%d.%m.%Y') as dat, dat_reg, COUNT(*) as cnt
FROM members
GROUP BY dat
ORDER BY dat_reg
) d CROSS JOIN
(SELECT @sumc := 0) params
) d
WHERE dat_reg > DATE_SUB(NOW() , INTERVAL 5 DAY)
A subquery counting the rows where the registration date is less than or equal to the current registration date could help you out.
SELECT m2.dat_reg,
(SELECT count(*)
FROM members m3
WHERE m3.dat_reg <= m2.dat_reg) count
FROM (SELECT DISTINCT m1.dat_reg
FROM members m1
WHERE m1.dat_reg > date_sub(now(), INTERVAL 5 DAY)) m2
ORDER BY m2.dat_reg;
(If there are days on which no one registered and you don't want gaps in the result, you need to replace the subquery aliased m2 with a table or subquery that has all days in the respective range.)
I believe you can use the window functions to do the work:
mysql> SELECT employee, sale, date, SUM(sale) OVER (PARTITION by employee ORDER BY date) AS cum_sales FROM sales;
+----------+------+------------+-----------+
| employee | sale | date | cum_sales |
+----------+------+------------+-----------+
| odin | 200 | 2017-03-01 | 200 |
| odin | 300 | 2017-04-01 | 500 |
| odin | 400 | 2017-05-01 | 900 |
| thor | 400 | 2017-03-01 | 400 |
| thor | 300 | 2017-04-01 | 700 |
| thor | 500 | 2017-05-01 | 1200 |
+----------+------+------------+-----------+
In your case you already have the right groups; it is only a matter of specifying the order in which you want the data to be aggregated.
Source: https://mysqlserverteam.com/mysql-8-0-2-introducing-window-functions/
Cheers
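Applied to the members table from the question, the window-function approach might look like the sketch below. To keep it runnable without a MySQL server it uses Python's built-in sqlite3 module, which is a swap-in and not what the question uses; it assumes SQLite 3.25 or newer (the first version with window functions) and stores dat_reg as ISO text. The SELECT itself is standard SQL and should read the same in MySQL 8.0+.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (id INTEGER PRIMARY KEY, dat_reg TEXT)")

# Sample data from the question: three, two and one registrations per day.
dates = ["2000-01-01"] * 3 + ["2000-01-02"] * 2 + ["2000-01-03"]
conn.executemany("INSERT INTO members (dat_reg) VALUES (?)", [(d,) for d in dates])

rows = conn.execute(
    """
    SELECT dat_reg,
           SUM(cnt) OVER (ORDER BY dat_reg) AS running_cnt  -- cumulative count
    FROM (SELECT dat_reg, COUNT(*) AS cnt
          FROM members
          GROUP BY dat_reg) AS d
    ORDER BY dat_reg
    """
).fetchall()
print(rows)
```

The inner query produces one row per day; the window SUM then accumulates those daily counts in date order, so no user variable is needed.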
Here is a solution using rank and a continuous count variable:
WITH ranked AS (
SELECT m.*
,ROW_NUMBER() OVER (PARTITION BY m.dat_reg ORDER BY m.id DESC) AS rn
FROM (
select id, dat_reg
,@cnt := @cnt + 1 AS ccount from members
,(SELECT @cnt := 0) var
WHERE (dat_reg > DATE_SUB(NOW(), INTERVAL 5 DAY))
) AS m
)
SELECT DATE_FORMAT(dat_reg, '%d.%m.%Y') as dat, ccount FROM ranked WHERE rn = 1;
I am struggling to GROUP BY contiguous blocks. I've used the following for reference:
- GROUP BY for continuous rows in SQL
- How can I do a contiguous group by in MySQL?
- https://gcbenison.wordpress.com/2011/09/26/queries-that-group-tables-by-contiguous-blocks/
The primary idea is that I am trying to encapsulate periods with a start and end date of a given state. A complexity unlike other examples is that I'm using a date per room_id as the indexing field (rather than a sequential id).
My table:
room_id | calendar_date | state
Sample data:
1 | 2016-03-01 | 'a'
1 | 2016-03-02 | 'a'
1 | 2016-03-03 | 'a'
1 | 2016-03-04 | 'b'
1 | 2016-03-05 | 'b'
1 | 2016-03-06 | 'c'
1 | 2016-03-07 | 'c'
1 | 2016-03-08 | 'c'
1 | 2016-03-09 | 'c'
2 | 2016-04-01 | 'b'
2 | 2016-04-02 | 'a'
2 | 2016-04-03 | 'a'
2 | 2016-04-04 | 'a'
The objective:
room_id | date_start | date_end | state
1 | 2016-03-01 | 2016-03-03 | a
1 | 2016-03-04 | 2016-03-05 | b
1 | 2016-03-06 | 2016-03-09 | c
2 | 2016-04-01 | 2016-04-01 | b
2 | 2016-04-02 | 2016-04-04 | a
The two attempts I've made at this:
1)
SELECT
rooms.row_new,
rooms.state_new,
MIN(rooms.room_id) AS room_id,
MIN(rooms.state) AS state,
MIN(rooms.date) AS date_start,
MAX(rooms.date) AS date_end,
FROM
(
SELECT @r := @r + (@state != state) AS row_new,
@state := state AS state_new,
rooms.*
FROM (
SELECT @r := 0,
@state := ''
) AS vars,
rooms_vw
ORDER BY room_id, date
) AS rooms
WHERE room_id = 1
GROUP BY row_new
ORDER BY room_id, date
;
This is very close to working, but when I print out row_new it starts to jump (1, 2, 3, 5, 7, ...)
2)
SELECT
MIN(rooms_final.calendar_date) AS date_start,
MAX(rooms_final.calendar_date) AS date_end,
rooms_final.state,
rooms_final.room_id,
COUNT(*)
FROM (SELECT
rooms.date,
rooms.state,
rooms.room_id,
CASE
WHEN rooms_merge.state IS NULL OR rooms_merge.state != rooms.state THEN
@rownum := @rownum+1
ELSE
@rownum
END AS row_num
FROM rooms
JOIN (SELECT @rownum := 0) AS row
LEFT JOIN (SELECT rooms.date + INTERVAL 1 DAY AS date,
rooms.state,
rooms.room_id
FROM rooms) AS rooms_merge ON rooms_merge.calendar_date = rooms.calendar_date AND rooms_merge.room_id = rooms.room_id
ORDER BY rooms.room_id, rooms.calendar_date
) AS rooms_final
GROUP BY rooms_final.state, rooms_final.row_num
ORDER BY room_id, calendar_date;
For some reason this returns some rows with a null room_id, and the results are generally inaccurate.
Working with variables is a bit tricky. I would go for:
SELECT MIN(r.room_id) AS room_id, MIN(r.state) AS state,
MIN(r.date) AS date_start, MAX(r.date) AS date_end
FROM (SELECT r.*,
(@grp := if(@rs = concat_ws(':', room_id, state), @grp,
if(@rs := concat_ws(':', room_id, state), @grp + 1, @grp + 1)
)
) as grp
FROM (SELECT r.* FROM rooms_vw r ORDER BY room_id, date
) r CROSS JOIN
(SELECT @grp := 0, @rs := '') AS params
) r
WHERE room_id = 1
GROUP BY room_id, grp
ORDER BY room_id, date_start;
Notes:
Assigning a variable in one expression and using it in another is unsafe. MySQL does not guarantee the order of evaluation of expressions.
In more recent versions of MySQL, you need to do the ORDER BY in a subquery.
In the most recent versions, you can use row_number(), greatly simplifying the calculation.
Thanks to @Gordon Linoff for giving me the insight to get to this answer:
SELECT
MIN(room_id) AS room_id,
MIN(state) AS state,
MIN(date) AS date_start,
MAX(date) AS date_end
FROM
(
SELECT
@r := @r + IF(@state <> state OR @room_id <> room_id, 1, 0) AS row_new,
@state := state AS state_new,
@room_id := room_id AS room_id_new,
tmp_rooms.*
FROM (
SELECT @r := 0,
@room_id := 0,
@state := ''
) AS vars,
(SELECT * FROM rooms WHERE room_id IS NOT NULL ORDER BY room_id, date) tmp_rooms
) AS rooms
GROUP BY row_new
order by room_id, date
;
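The grouping that the variable-based queries emulate (start a new block whenever room_id or state changes in room/date order) can be sketched in a few lines of Python. This is only an illustration: the function name `contiguous_blocks` is hypothetical, the rows are the sample data from the question, and it assumes consecutive rows for a room are consecutive days, so it does not look for date gaps.

```python
from itertools import groupby

rows = [  # (room_id, calendar_date, state) sample data from the question
    (1, "2016-03-01", "a"), (1, "2016-03-02", "a"), (1, "2016-03-03", "a"),
    (1, "2016-03-04", "b"), (1, "2016-03-05", "b"),
    (1, "2016-03-06", "c"), (1, "2016-03-07", "c"),
    (1, "2016-03-08", "c"), (1, "2016-03-09", "c"),
    (2, "2016-04-01", "b"),
    (2, "2016-04-02", "a"), (2, "2016-04-03", "a"), (2, "2016-04-04", "a"),
]

def contiguous_blocks(rows):
    """Collapse consecutive rows sharing (room_id, state) into
    (room_id, date_start, date_end, state) tuples."""
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    blocks = []
    # groupby only merges adjacent rows with equal keys, which is exactly
    # the "contiguous block" behaviour a plain GROUP BY lacks.
    for (room_id, state), run in groupby(ordered, key=lambda r: (r[0], r[2])):
        dates = [r[1] for r in run]
        blocks.append((room_id, dates[0], dates[-1], state))
    return blocks

blocks = contiguous_blocks(rows)
for block in blocks:
    print(block)
```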
I created a view by the following statement.
CREATE VIEW
view_projectHour
AS
SELECT pno
, SUM( hours ) AS total_hours
FROM works_on
GROUP BY pno
ORDER BY total_hours DESC
Now, how can I implement ranking in this view? I want the projects to be ranked. The project with the highest hours must be ranked 1 and be placed on the top and so on. Also there are projects with the same hours.
Unfortunately, MySQL before 8.0 lacks support for analytic functions, in particular RANK() and DENSE_RANK().
To emulate RANK() you can do
SELECT pno, total_hours, rank
FROM
(
SELECT pno, total_hours,
@n := @n + 1 rnum, @r := IF(@h = total_hours, @r, @n) rank, @h := total_hours
FROM
(
SELECT pno, SUM(hours) total_hours
FROM works_on
GROUP BY pno
) q CROSS JOIN (SELECT @n := 0, @r := 0, @h := NULL) i
ORDER BY total_hours DESC, pno
) t
Sample output:
| PNO | TOTAL_HOURS | RANK |
|-----|-------------|------|
| 3 | 61 | 1 |
| 1 | 40 | 2 |
| 2 | 40 | 2 |
| 4 | 10 | 4 |
To emulate DENSE_RANK() you can do
SELECT pno, total_hours, rank
FROM
(
SELECT pno, total_hours,
@r := IF(@h = total_hours, @r, @r + 1) rank, @h := total_hours
FROM
(
SELECT pno, SUM(hours) total_hours
FROM works_on
GROUP BY pno
) q CROSS JOIN (SELECT @r := 0, @h := NULL) i
ORDER BY total_hours DESC, pno
) t
Sample output:
| PNO | TOTAL_HOURS | RANK |
|-----|-------------|------|
| 3 | 61 | 1 |
| 1 | 40 | 2 |
| 2 | 40 | 2 |
| 4 | 10 | 3 |
Note: You can ditch the outer SELECTs if you don't mind having one or two extra columns in your result set.
An alternate solution is to use a JOIN to count how many values are ranked better for each row;
SELECT 1+COUNT(b.total_hours) rank, a.pno, a.total_hours
FROM test a
LEFT JOIN test b
ON a.total_hours < b.total_hours
GROUP BY a.pno, a.total_hours
ORDER BY total_hours DESC;
An SQLfiddle to test with.
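For readers who want to see the difference between the two ranking emulations side by side, here is a small Python sketch of the same bookkeeping the user variables do. The function name `rank_rows` is made up for illustration, and the totals are taken from the sample output above.

```python
def rank_rows(rows):
    """rows: list of (pno, total_hours).
    Returns (pno, total_hours, rank, dense_rank) ordered by hours descending."""
    ordered = sorted(rows, key=lambda r: (-r[1], r[0]))
    out = []
    prev_hours = None
    rank = dense = 0
    for position, (pno, hours) in enumerate(ordered, start=1):
        if hours != prev_hours:
            rank = position   # RANK(): jumps to the row position after a tie
            dense += 1        # DENSE_RANK(): advances by one per distinct value
            prev_hours = hours
        out.append((pno, hours, rank, dense))
    return out

totals = [(1, 40), (2, 40), (3, 61), (4, 10)]
ranked = rank_rows(totals)
for row in ranked:
    print(row)
```

Tied projects share a rank in both schemes; only the rank that follows a tie differs (4 for RANK, 3 for DENSE_RANK in this data).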