I have a table called order_match, which contains order_buyer_Id as the ID of the transaction, createdby as the ID of the buyer, createdAt as the date the transaction happened, and quantity as the quantity of each order.
In this case, I want to count the orders (order_buyer_Id) for each buyer (createdby) and then find the maximum and the minimum count.
This is the example data:
+----------------+-----------+------------+----------+
| order_buyer_id | createdby | createdAt  | quantity |
+----------------+-----------+------------+----------+
| 19123          | 19        | 2017-02-02 | 0.4      |
| 193241         | 19        | 2017-02-02 | 0.5      |
| 123123         | 20        | 2017-02-02 | 1        |
| 32242          | 20        | 2017-02-02 | 4        |
| 32434          | 20        | 2017-02-02 | 3        |
+----------------+-----------+------------+----------+
and if I run the query, the expected result is:
+-----+-----+---------+--------+
| max | min | average | median |
+-----+-----+---------+--------+
| 4   | 0.4 | 1.78    | 1      |
+-----+-----+---------+--------+
This is the fiddle
http://www.sqlfiddle.com/#!9/d89772/15
and this is my query
SELECT MAX(quantity) AS max,
       MIN(quantity) AS min,
       AVG(quantity) AS average,
       AVG(CASE WHEN rn IN (FLOOR((@tr+1)/2), FLOOR((@tr+2)/2)) THEN quantity END) AS median
FROM (
    SELECT count,
           @rn := @rn + 1 AS rn,
           @tr := @rn AS tr
    FROM (
        SELECT COUNT(*) AS count
        FROM order_match
        GROUP BY order_buyer_Id
        ORDER BY quantity
    ) o
    CROSS JOIN (SELECT @rn := 0) init
) c
You are getting the error because quantity is not selected in your subquery, so the outer query cannot see it.
Either you have to join with your table again to get the quantity, or you can include quantity in your select (based on your sample data, even grouping by quantity gives the same result):
SELECT MAX(quantity) AS max,
       MIN(quantity) AS min,
       AVG(quantity) AS average,
       AVG(CASE WHEN rn IN (FLOOR((@tr+1)/2), FLOOR((@tr+2)/2)) THEN quantity END) AS median
FROM (
    SELECT count, quantity,
           @rn := @rn + 1 AS rn,
           @tr := @rn AS tr
    FROM (
        SELECT COUNT(*) AS count, quantity
        FROM order_match
        GROUP BY order_buyer_Id, quantity
        ORDER BY quantity
    ) o
    CROSS JOIN (SELECT @rn := 0) init
) c
SELECT t.max,t.min,t.average,0.00 AS 'Median'
FROM
(SELECT MAX(quantity) AS max,
MIN(quantity) AS min,
SUM(quantity)/COUNT(distinct created_by) AS average
FROM order_match)t
union
SELECT 0.00 AS 'max',0.00 AS 'min',0.00 AS 'Average',
((2*t1.average/3)+t1.mode) AS 'Median'
FROM (SELECT count(FLOOR(quantity)),IFNULL(FLOOR(quantity),min(quantity)) AS 'mode'
FROM order_match GROUP BY quantity HAVING
count(FLOOR(quantity))>1)t1
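For comparison, here is a minimal sketch of the same statistics computed with window functions instead of user variables. It is an illustration, not one of the posted answers: it runs against an in-memory SQLite database from Python (a stand-in for MySQL), and it assumes a database with window-function support, which on the MySQL side means 8.0 or later. The aliases ranked and n are made up for the example.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the rows are the question's sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_match ("
    "order_buyer_id INTEGER, createdby INTEGER, createdAt TEXT, quantity REAL)"
)
conn.executemany(
    "INSERT INTO order_match VALUES (?, ?, ?, ?)",
    [
        (19123, 19, "2017-02-02", 0.4),
        (193241, 19, "2017-02-02", 0.5),
        (123123, 20, "2017-02-02", 1),
        (32242, 20, "2017-02-02", 4),
        (32434, 20, "2017-02-02", 3),
    ],
)

query = """
SELECT MAX(quantity) AS max,
       MIN(quantity) AS min,
       AVG(quantity) AS average,
       -- Average the middle row (odd n) or the two middle rows (even n).
       -- SQLite divides integers as integers; MySQL would need FLOOR() or DIV.
       AVG(CASE WHEN rn IN ((n + 1) / 2, (n + 2) / 2) THEN quantity END) AS median
FROM (
    SELECT quantity,
           ROW_NUMBER() OVER (ORDER BY quantity) AS rn,  -- position in sorted order
           COUNT(*) OVER () AS n                         -- total number of rows
    FROM order_match
) ranked
"""
stats = conn.execute(query).fetchone()
print(stats)
```

Because the row number and the row count both come from window functions, nothing depends on the order in which the server evaluates variable assignments.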
I'm trying to get just the top 3 selling products within each category (the top 3 products by number of occurrences in transactions, i.e. count(id), per category). I searched a lot for a possible solution, with no result. It looks a bit tricky in MySQL, since one can't simply use a top() function and so on. Sample data structure below:
+--------+------------+-----------+
| id |category_id | product_id|
+--------+------------+-----------+
| 1 | 10 | 32 |
| 2 | 10 | 34 |
| 3 | 10 | 32 |
| 4 | 10 | 21 |
| 5 | 10 | 100 |
| 6 | 7 | 101 |
| 7 | 7 | 39 |
| 8 | 7 | 41 |
| 9 | 7 | 39 |
+--------+------------+-----------+
In earlier versions of MySQL, I would recommend using variables:
select cp.*
from (select cp.*,
             (@rn := if(@c = category_id, @rn + 1,
                        if(@c := category_id, 1, 1)
                       )
             ) as rn
      from (select category_id, product_id, count(*) as cnt
            from mytable
            group by category_id, product_id
            order by category_id, count(*) desc
           ) cp cross join
           (select @c := -1, @rn := 0) params
     ) cp
where rn <= 3;
If you are running MySQL 8.0, you can use window function rank() for this:
select *
from (
select
category_id,
product_id,
count(*) cnt,
rank() over(partition by category_id order by count(*) desc) rn
from mytable
group by category_id, product_id
) t
where rn <= 3
In earlier versions, one option is to filter with a correlated subquery:
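select
    category_id,
    product_id,
    count(*) cnt
from mytable t
group by category_id, product_id
-- keep products whose count is at least the third-highest count in the category;
-- coalesce covers categories with fewer than three products
having count(*) >= coalesce((
    select count(*)
    from mytable t1
    where t1.category_id = t.category_id
    group by t1.product_id
    order by count(*) desc
    limit 2, 1
), 0)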
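One detail worth checking on the sample data is how ties are handled: rank() keeps every product tied for third place, while row_number() returns at most three rows per category. The sketch below is only an illustration, run against in-memory SQLite from Python as a stand-in for MySQL 8.0; the helper top3 and the tie-breaker on product_id are assumptions, not part of the answers above.

```python
import sqlite3

# In-memory SQLite stands in for MySQL 8.0; the rows are the question's sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, category_id INTEGER, product_id INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, 10, 32), (2, 10, 34), (3, 10, 32), (4, 10, 21), (5, 10, 100),
     (6, 7, 101), (7, 7, 39), (8, 7, 41), (9, 7, 39)],
)

def top3(window_fn, order_by):
    """Hypothetical helper: top 3 products per category for a given window function."""
    sql = f"""
    SELECT category_id, product_id, cnt
    FROM (
        SELECT c.*,
               {window_fn} OVER (PARTITION BY category_id ORDER BY {order_by}) AS rn
        FROM (SELECT category_id, product_id, COUNT(*) AS cnt
              FROM mytable
              GROUP BY category_id, product_id) c
    ) ranked
    WHERE rn <= 3
    ORDER BY category_id, cnt DESC, product_id
    """
    return conn.execute(sql).fetchall()

# RANK() gives tied products the same rank, so a category can return more than 3 rows.
with_ties = top3("RANK()", "cnt DESC")
# ROW_NUMBER() with a tie-breaker returns at most 3 rows per category.
exactly_three = top3("ROW_NUMBER()", "cnt DESC, product_id")

print(with_ties)
print(exactly_three)
```

In category 10, three products are tied with one sale each, so the rank() version returns four rows for that category and the row_number() version returns three.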
I have this stored procedure:
CREATE DEFINER=`brambang`@`%` PROCEDURE `TNP_PRODUK_FrekuensiPembelianUlang`(IN paramdatefrom CHAR(19), paramdateto CHAR(19))
BEGIN
    SELECT MAX(count) AS max,
           MIN(count) AS min,
           AVG(count) AS average,
           AVG(CASE WHEN rn IN (FLOOR((@tr+1)/2), FLOOR((@tr+2)/2)) THEN count END) AS median
    FROM (
        SELECT count,
               @rn := @rn + 1 AS rn,
               @tr := @rn AS tr
        FROM (
            SELECT COUNT(*) AS count
            FROM order_match om1
            WHERE om1.createdAt BETWEEN paramdatefrom AND paramdateto
              AND om1.order_status_id IN (4, 5, 6, 8)
              AND EXISTS (SELECT 1 FROM order_match om2
                          WHERE om1.createdby = om2.createdby
                            AND om2.createdAt < paramdatefrom
                            AND om2.order_status_id IN (4, 5, 6, 8))
            GROUP BY createdby
            ORDER BY count
        ) o
        CROSS JOIN (SELECT @rn := 0) init
    ) c;
END
and this is the result when I pass in the parameters:
+-----+-----+---------+---------+
| max | min | average | median |
+-----+-----+---------+---------+
| 24 | 1 | 1.6382 | 1.00000 |
+-----+-----+---------+---------+
What should I add to my stored procedure so that the values are rounded like this?
+-----+-----+---------+---------+
| max | min | average | median |
+-----+-----+---------+---------+
| 24 | 1 | 1.64 | 1.0 |
+-----+-----+---------+---------+
Just use ROUND(). If you want two decimals maximum, then:
SELECT ROUND(MAX(count), 2) AS max,
ROUND(MIN(count), 2) AS min,
ROUND(AVG(count), 2) AS average,
ROUND(AVG(CASE WHEN rn IN (FLOOR((@tr+1)/2), FLOOR((@tr+2)/2)) THEN count END), 2) AS median
FROM ...
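The desired output in the question uses two decimals for the average but one for the median, and ROUND() takes the precision per call, so each column can get its own. A tiny sketch using the values from the question, evaluated by in-memory SQLite from Python as a stand-in for MySQL (display formatting of the result may differ between the two):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Literal values taken from the question's result row; each column gets its own precision.
rounded = conn.execute(
    "SELECT ROUND(1.6382, 2) AS average, ROUND(1.00000, 1) AS median"
).fetchone()
print(rounded)
```

In the stored procedure, this would mean wrapping the average in ROUND(..., 2) and the median in ROUND(..., 1).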
I am struggling to GROUP BY contiguous blocks. I've used the following as references:
- GROUP BY for continuous rows in SQL
- How can I do a contiguous group by in MySQL?
- https://gcbenison.wordpress.com/2011/09/26/queries-that-group-tables-by-contiguous-blocks/
The primary idea is that I am trying to encapsulate periods of a given state, each with a start and end date. A complexity unlike the other examples is that I'm using a date per room_id as the indexing field (rather than a sequential id).
My table:
room_id | calendar_date | state
Sample data:
1 | 2016-03-01 | 'a'
1 | 2016-03-02 | 'a'
1 | 2016-03-03 | 'a'
1 | 2016-03-04 | 'b'
1 | 2016-03-05 | 'b'
1 | 2016-03-06 | 'c'
1 | 2016-03-07 | 'c'
1 | 2016-03-08 | 'c'
1 | 2016-03-09 | 'c'
2 | 2016-04-01 | 'b'
2 | 2016-04-02 | 'a'
2 | 2016-04-03 | 'a'
2 | 2016-04-04 | 'a'
The objective:
room_id | date_start | date_end | state
1 | 2016-03-01 | 2016-03-03 | a
1 | 2016-03-04 | 2016-03-05 | b
1 | 2016-03-06 | 2016-03-09 | c
2 | 2016-04-01 | 2016-04-01 | b
2 | 2016-04-02 | 2016-04-04 | a
The two attempts I've made at this:
1)
SELECT
    rooms.row_new,
    rooms.state_new,
    MIN(rooms.room_id) AS room_id,
    MIN(rooms.state) AS state,
    MIN(rooms.date) AS date_start,
    MAX(rooms.date) AS date_end
FROM
(
    SELECT @r := @r + (@state != state) AS row_new,
           @state := state AS state_new,
           rooms.*
    FROM (
        SELECT @r := 0,
               @state := ''
    ) AS vars,
    rooms_vw
    ORDER BY room_id, date
) AS rooms
WHERE room_id = 1
GROUP BY row_new
ORDER BY room_id, date
;
This is very close to working, but when I print out row_new it starts to jump (1, 2, 3, 5, 7, ...)
2)
SELECT
    MIN(rooms_final.calendar_date) AS date_start,
    MAX(rooms_final.calendar_date) AS date_end,
    rooms_final.state,
    rooms_final.room_id,
    COUNT(*)
FROM (SELECT
          rooms.date,
          rooms.state,
          rooms.room_id,
          CASE
              WHEN rooms_merge.state IS NULL OR rooms_merge.state != rooms.state THEN
                  @rownum := @rownum+1
              ELSE
                  @rownum
          END AS row_num
      FROM rooms
      JOIN (SELECT @rownum := 0) AS row
      LEFT JOIN (SELECT rooms.date + INTERVAL 1 DAY AS date,
                        rooms.state,
                        rooms.room_id
                 FROM rooms) AS rooms_merge ON rooms_merge.calendar_date = rooms.calendar_date AND rooms_merge.room_id = rooms.room_id
      ORDER BY rooms.room_id, rooms.calendar_date
) AS rooms_final
GROUP BY rooms_final.state, rooms_final.row_num
ORDER BY room_id, calendar_date;
For some reason this returns some results with a null room_id, and the results are generally inaccurate.
Working with variables is a bit tricky. I would go for:
SELECT MIN(r.room_id) AS room_id, MIN(r.state) AS state,
       MIN(r.date) AS date_start, MAX(r.date) AS date_end
FROM (SELECT r.*,
             (@grp := if(@rs = concat_ws(':', room_id, state), @grp,
                         if(@rs := concat_ws(':', room_id, state), @grp + 1, @grp + 1)
                        )
             ) as grp
      FROM (SELECT r.* FROM rooms_vw r ORDER BY room_id, date
           ) r CROSS JOIN
           (SELECT @grp := 0, @rs := '') AS params
     ) AS r
WHERE room_id = 1
GROUP BY room_id, grp
ORDER BY room_id, date_start;
Notes:
Assigning a variable in one expression and using it in another is unsafe. MySQL does not guarantee the order of evaluation of expressions.
In more recent versions of MySQL, you need to do the ORDER BY in a subquery.
In the most recent versions, you can use row_number(), greatly simplifying the calculation.
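To illustrate the last note, here is a minimal sketch of the row_number() approach on the question's sample data. It runs against in-memory SQLite from Python as a stand-in for a MySQL version with window functions (8.0 or later). The table name rooms and the column calendar_date follow the question's table description, and the alias grp is made up for the example.

```python
import sqlite3

# In-memory SQLite stands in for MySQL 8.0; the rows are the question's sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rooms (room_id INTEGER, calendar_date TEXT, state TEXT)")
sample = [(1, f"2016-03-{d:02d}", s) for d, s in enumerate("aaabbcccc", start=1)]
sample += [(2, f"2016-04-{d:02d}", s) for d, s in enumerate("baaa", start=1)]
conn.executemany("INSERT INTO rooms VALUES (?, ?, ?)", sample)

query = """
SELECT room_id,
       MIN(calendar_date) AS date_start,
       MAX(calendar_date) AS date_end,
       state
FROM (
    SELECT room_id, calendar_date, state,
           -- Position within the room minus position within (room, state):
           -- the difference stays constant while the state does not change.
           ROW_NUMBER() OVER (PARTITION BY room_id ORDER BY calendar_date)
         - ROW_NUMBER() OVER (PARTITION BY room_id, state ORDER BY calendar_date) AS grp
    FROM rooms
) t
GROUP BY room_id, state, grp
ORDER BY room_id, date_start
"""
periods = conn.execute(query).fetchall()
for period in periods:
    print(period)
```

Note that this treats consecutive rows as contiguous even if a calendar date is missing in between; if gaps in the dates should start a new period, the grouping key would need to take the date itself into account.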
Thanks to @Gordon Linoff for giving me insights to get to this answer:
SELECT
    MIN(room_id) AS room_id,
    MIN(state) AS state,
    MIN(date) AS date_start,
    MAX(date) AS date_end
FROM
(
    SELECT
        @r := @r + IF(@state <> state OR @room_id <> room_id, 1, 0) AS row_new,
        @state := state AS state_new,
        @room_id := room_id AS room_id_new,
        tmp_rooms.*
    FROM (
        SELECT @r := 0,
               @room_id := 0,
               @state := ''
    ) AS vars,
    (SELECT * FROM rooms WHERE room_id IS NOT NULL ORDER BY room_id, date) tmp_rooms
) AS rooms
GROUP BY row_new
ORDER BY room_id, date
;
Is it possible to change the following MySQL query to use a join instead of a subquery for efficiency (or is there another way to increase efficiency)? I have a table of patient visits to an emergency department. The table lists arrival and departure times. I need the query to return the total number of patients that were already present in the emergency department (the "census") when each patient arrived.
My table looks something like this:
+------+------+---------------------+---------------------+
| id | name | arrival | departure |
+------+------+---------------------+---------------------+
| 1 | Joe | 2010-01-01 00:00:00 | 2010-01-01 02:00:00 |
| 2 | John | 2010-01-01 00:05:00 | 2010-01-01 03:00:00 |
| 3 | Jane | 2010-01-01 01:00:00 | 2010-01-01 04:00:00 |
...
With a desired result like this:
+------+--------+
| name | census |
+------+--------+
| Joe | 0 |
| John | 1 |
| Jane | 2 |
...
The following query works, but is quite slow (about 3.5 seconds on 180,000 rows). Is there a way to increase the efficiency of this query (with some sort of join, or other method)?
select name, arrival,
(SELECT count(*)
FROM patient_arrivals as b
WHERE b.arrival <= a.arrival and b.departure >= a.departure) as census
FROM patient_arrivals as a
I don't think a join will help. Instead, you need to restructure the query. The following gets the number of patients in the room at any particular time:
select t, sum(num) as num, @total := @total + num as total
from (select arrival as t, 1 as num
      from patient_arrivals
      union all
      select departure, -1
      from patient_arrivals
     ) t cross join
     (select @total := 0) vars
group by t
order by t
Then, you can use this as a subquery for the join:
select pa.*, tnum.total as census
from patient_arrivals pa join
     (select t, sum(num) as num, @total := @total + num as total
      from (select arrival as t, 1 as num
            from patient_arrivals
            union all
            select departure, -1
            from patient_arrivals
           ) t cross join
           (select @total := 0) vars
      group by t
      order by t
     ) tnum
     on pa.arrival = tnum.t;
This gives the number when the patient arrives. For the total that overlaps with the patient's stay:
select pa.*, max(tnum.total) as census
from patient_arrivals pa join
     (select t, sum(num) as num, @total := @total + num as total
      from (select arrival as t, 1 as num
            from patient_arrivals
            union all
            select departure, -1
            from patient_arrivals
           ) t cross join
           (select @total := 0) vars
      group by t
      order by t
     ) tnum
     on tnum.t between pa.arrival and pa.departure
group by pa.id
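The same +1/-1 event idea can also be written with a windowed running sum instead of a user variable. The sketch below is an illustration under stated assumptions, not the answer's method: it runs against in-memory SQLite from Python as a stand-in for MySQL 8.0 or later, and the names events, running, delta, and present are made up. Departures sort before arrivals at the same timestamp, so a patient leaving at the exact moment another arrives is not counted.

```python
import sqlite3

# In-memory SQLite stands in for MySQL 8.0; the rows are the question's sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patient_arrivals (id INTEGER, name TEXT, arrival TEXT, departure TEXT)"
)
conn.executemany(
    "INSERT INTO patient_arrivals VALUES (?, ?, ?, ?)",
    [
        (1, "Joe", "2010-01-01 00:00:00", "2010-01-01 02:00:00"),
        (2, "John", "2010-01-01 00:05:00", "2010-01-01 03:00:00"),
        (3, "Jane", "2010-01-01 01:00:00", "2010-01-01 04:00:00"),
    ],
)

query = """
WITH events AS (
    -- One +1 event per arrival and one -1 event per departure.
    SELECT id, arrival AS t, 1 AS delta FROM patient_arrivals
    UNION ALL
    SELECT id, departure, -1 FROM patient_arrivals
),
running AS (
    SELECT id, t, delta,
           -- Running head count after each event, in time order.
           SUM(delta) OVER (ORDER BY t, delta
                            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS present
    FROM events
)
SELECT p.name,
       r.present - 1 AS census  -- exclude the arriving patient
FROM running r
JOIN patient_arrivals p ON p.id = r.id
WHERE r.delta = 1               -- keep only arrival events
ORDER BY r.t
"""
census = conn.execute(query).fetchall()
print(census)
```

This makes a single sorted pass over the events rather than one correlated count per patient, which is the main reason to expect it to scale better; actual timings would need to be measured on the real table.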
I created a view by the following statement.
CREATE VIEW
view_projectHour
AS
SELECT pno
, SUM( hours ) AS total_hours
FROM works_on
GROUP BY pno
ORDER BY total_hours DESC
Now, how can I implement ranking in this view? I want the projects to be ranked. The project with the highest hours must be ranked 1 and be placed on the top and so on. Also there are projects with the same hours.
Unfortunately, MySQL versions before 8.0 lack support for analytic (window) functions, in particular RANK() and DENSE_RANK().
To emulate RANK() you can do
SELECT pno, total_hours, rank
FROM
(
    SELECT pno, total_hours,
           @n := @n + 1 rnum, @r := IF(@h = total_hours, @r, @n) rank, @h := total_hours
    FROM
    (
        SELECT pno, SUM(hours) total_hours
        FROM works_on
        GROUP BY pno
    ) q CROSS JOIN (SELECT @n := 0, @r := 0, @h := NULL) i
    ORDER BY total_hours DESC, pno
) t
Sample output:
| PNO | TOTAL_HOURS | RANK |
|-----|-------------|------|
| 3 | 61 | 1 |
| 1 | 40 | 2 |
| 2 | 40 | 2 |
| 4 | 10 | 4 |
To emulate DENSE_RANK() you can do
SELECT pno, total_hours, rank
FROM
(
    SELECT pno, total_hours,
           @r := IF(@h = total_hours, @r, @r + 1) rank, @h := total_hours
    FROM
    (
        SELECT pno, SUM(hours) total_hours
        FROM works_on
        GROUP BY pno
    ) q CROSS JOIN (SELECT @r := 0, @h := NULL) i
    ORDER BY total_hours DESC, pno
) t
Sample output:
| PNO | TOTAL_HOURS | RANK |
|-----|-------------|------|
| 3 | 61 | 1 |
| 1 | 40 | 2 |
| 2 | 40 | 2 |
| 4 | 10 | 3 |
Note: You can drop the outer SELECTs if you don't mind having one or two extra columns in your result set.
Here is SQLFiddle demo
An alternate solution is to use a JOIN to count how many values are ranked better for each row;
SELECT 1+COUNT(b.total_hours) rank, a.pno, a.total_hours
FROM test a
LEFT JOIN test b
ON a.total_hours < b.total_hours
GROUP BY a.pno, a.total_hours
ORDER BY total_hours DESC;
An SQLfiddle to test with.
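For reference, on a database that does have window functions (MySQL 8.0 or later), both rankings can be computed directly. The sketch below runs against in-memory SQLite from Python as a stand-in. The works_on rows are hypothetical, chosen only so that the per-project totals match the sample output above (61, 40, 40, 10), and the aliases rnk and dense_rnk are used because rank is a reserved word in MySQL 8.0.

```python
import sqlite3

# In-memory SQLite stands in for MySQL 8.0. The individual rows are made up;
# only the per-project totals (61, 40, 40, 10) follow the sample output.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE works_on (pno INTEGER, hours INTEGER)")
conn.executemany(
    "INSERT INTO works_on VALUES (?, ?)",
    [(3, 30), (3, 31), (1, 40), (2, 15), (2, 25), (4, 10)],
)

query = """
SELECT pno, total_hours,
       RANK() OVER (ORDER BY total_hours DESC) AS rnk,             -- gaps after ties
       DENSE_RANK() OVER (ORDER BY total_hours DESC) AS dense_rnk  -- no gaps
FROM (
    SELECT pno, SUM(hours) AS total_hours
    FROM works_on
    GROUP BY pno
) q
ORDER BY total_hours DESC, pno
"""
ranking = conn.execute(query).fetchall()
for row in ranking:
    print(row)
```

The two tied projects share rank 2 in both columns; the next project is ranked 4 by RANK() and 3 by DENSE_RANK(), matching the two sample outputs above.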