I have a table like this:
CREATE TABLE order_match
(`order_buyer_id` int, `createdby` int, `createdAt` datetime, `quantity` decimal(10,2))
;
INSERT INTO order_match
(`order_buyer_id`, `createdby`, `createdAt`, `quantity`)
VALUES
(19123, 19, '2017-02-02', 5),
(193241, 19, '2017-02-03', 5),
(123123, 20, '2017-02-03', 1),
(32242, 20, '2017-02-04', 4),
(32434, 20, '2017-02-04', 5),
(2132131, 12, '2017-02-02', 6)
;
In this table, order_buyer_id is the ID of the transaction, createdby is the buyer, createdAt is the time of each transaction, and quantity is the quantity of the transaction.
I want to find the maximum, minimum, median and average number of transactions per repeat buyer (a buyer with more than one transaction).
So for this table, the expected result is:
+-----+-----+---------+--------+
| MAX | MIN | Average | Median |
+-----+-----+---------+--------+
| 3 | 2 | 2.5 | 3 |
+-----+-----+---------+--------+
Note: I'm using MySQL 5.7.
I am using this query:
select -- om.createdby, om.quantity, x1.count_
MAX(count(om.createdby)) AS max,
MIN(count(om.createdby)) AS min,
AVG(count(om.createdby)) AS average
from (select count(xx.count_) as count_
from (select count(createdby) as count_ from order_match
group by createdby
having count(createdby) > 1) xx
) x1,
(select createdby
from order_match
group by createdby
having count(createdby) > 1) yy,
order_match om
where yy.createdby = om.createdby
and om.createdAt <= '2017-02-04'
and EXISTS (select 1 from order_match om2
where om.createdby = om2.createdby
and om2.createdAt >= '2017-02-02'
and om2.createdAt <= '2017-02-04')
but it fails with the error:
Invalid use of group function
We can try aggregating by createdby, and then taking the aggregates you want:
SELECT
MAX(cnt) AS MAX,
MIN(cnt) AS MIN,
AVG(cnt) AS Average
FROM
(
SELECT createdby, COUNT(*) AS cnt
FROM order_match
GROUP BY createdby
HAVING COUNT(*) > 1
) t
Simulating a median in MySQL 5.7 is a lot of work, and ugly. If you have a long-term need for medians, consider upgrading to MySQL 8+.
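To sanity-check the numbers outside the database, here is a minimal Python sketch of the same two-step logic: count transactions per buyer, keep buyers with more than one, then aggregate those counts. It also computes the median that the SQL leaves out. For the two repeat buyers (counts 2 and 3) the conventional median is 2.5; the 3 in the expected table matches the upper median (statistics.median_high). The function name repeat_buyer_stats is illustrative only.

```python
from collections import Counter
from statistics import mean, median, median_high

# Rows copied from the order_match sample: (order_buyer_id, createdby, createdAt, quantity)
rows = [
    (19123, 19, "2017-02-02", 5),
    (193241, 19, "2017-02-03", 5),
    (123123, 20, "2017-02-03", 1),
    (32242, 20, "2017-02-04", 4),
    (32434, 20, "2017-02-04", 5),
    (2132131, 12, "2017-02-02", 6),
]

def repeat_buyer_stats(rows):
    """Mirror of GROUP BY createdby HAVING COUNT(*) > 1, then aggregating the counts."""
    counts = Counter(createdby for _, createdby, _, _ in rows)
    repeat = sorted(c for c in counts.values() if c > 1)
    return {
        "max": max(repeat),
        "min": min(repeat),
        "average": mean(repeat),
        "median": median(repeat),            # average of the two middle values
        "median_high": median_high(repeat),  # upper of the two middle values
    }

print(repeat_buyer_stats(rows))
```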
Related
I have a MySQL problem that I can't figure out.
I run a query:
SELECT id, totalsum FROM table ORDER BY totalsum DESC
This could give me the following result:
1, 100000
4, 90000
8, 80000
3, 50000
5, 40000
++++
What I need is a query that works something like this:
SELECT id, totalsum
FROM table ORDER BY totalsum DESC
START LISTING FROM id=8 AND CONTINUE TO THE END OF RESULT / LIMIT
Resulting in something like this:
8, 80000
3, 50000
5, 40000
++++
I cannot use this query:
SELECT id, totalsum
FROM table
WHERE id>=8
ORDER BY totalsum DESC
because the ids that should follow id=8 can be both less than and greater than 8.
I have tried using LIMIT and OFFSET, but that is very slow.
Any advice pointing me in the right direction will be appreciated!
Here's a way to do it (this needs MySQL 8.0+ for the CTE and ROW_NUMBER()):
Assign each row a row_num based on totalsum in descending order (CTE)
Select from the above where row_num >= the row_num of id=8
create table a_table (
id int,
total int);
insert into a_table values
(1, 100000),
(4, 90000),
(8, 80000),
(3, 50000),
(5, 40000);
with cte as (
select id,
total,
row_number() over (order by total desc) as row_num
from a_table)
select *
from cte
where row_num >= (select row_num from cte where id=8);
Result:
id|total|row_num|
--+-----+-------+
8|80000| 3|
3|50000| 4|
5|40000| 5|
EDIT:
The above query may return a wrong result if other rows have the same total. As a comment pointed out, the following query does the job:
select id, total
from a_table
where total <= (select total from a_table where id=8)
order by total desc;
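The difference between the two queries only shows up with ties. A small Python sketch (the function names are hypothetical) contrasts them: the rank-based version keeps everything from the pivot's position onward, so a row tied with id=8 that happens to sort before it is dropped, while the value-based version keeps every row whose total is at most the pivot's total.

```python
# Sample rows from a_table: (id, total)
a_table = [(1, 100000), (4, 90000), (8, 80000), (3, 50000), (5, 40000)]

def from_pivot_by_rank(rows, pivot_id):
    """Like the ROW_NUMBER() CTE: rank by total desc, keep ranks >= the pivot's rank."""
    ranked = sorted(rows, key=lambda r: r[1], reverse=True)  # stable: ties keep input order
    pivot_pos = next(i for i, (rid, _) in enumerate(ranked) if rid == pivot_id)
    return ranked[pivot_pos:]

def from_pivot_by_value(rows, pivot_id):
    """Like the simpler query: keep rows whose total <= the pivot's total, sorted desc."""
    pivot_total = next(total for rid, total in rows if rid == pivot_id)
    return sorted((r for r in rows if r[1] <= pivot_total), key=lambda r: r[1], reverse=True)

print(from_pivot_by_rank(a_table, 8))
print(from_pivot_by_value(a_table, 8))
```

With no ties both functions return the same three rows; prepending a hypothetical row (9, 80000) makes them disagree.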
I need to find the median of total scores per region. I found a solution through trial and error on the data, but the query is not optimized. I need an efficient MySQL query for this problem.
Edit: first, exams have to be filtered from the assessment table; second, total_score needs to be summed over all subjects of each student using the studentassessment table. Finally, the median per region needs to be calculated.
SELECT region,
Avg(total_score) AS median
FROM (SELECT row_num,
region,
total_score,
region_cnt
FROM (SELECT Row_number()
OVER (
partition BY region
ORDER BY total_score) AS row_num,
region,
total_score,
Count(region)
OVER (
partition BY region) AS region_cnt
FROM (SELECT i.region AS region,
Sum(S.score) AS total_score
FROM tredence.assesment A
INNER JOIN tredence.studentassessment S
ON A.id_assessment = S.id_assessment
INNER JOIN tredence.studentinfo i
ON i.id_student = S.id_student
WHERE A.assessment = 'Exam'
GROUP BY S.id_student,
i.region
ORDER BY region,
total_score) t) r
GROUP BY 1,
2,
3,
4
HAVING row_num IN ( Floor(region_cnt / 2), Ceil(region_cnt / 2) )) z
GROUP BY region
ORDER BY median DESC
Tables and columns:
|Assessments |student_info|student_assessment|
|---------------|------------|------------------|
|course_code |course_code |id_assessment |
|batch_code |batch_code |id_student |
|id_assessments |id_student |date_submitted |
|assessment_type|gender |is_banked |
|date |region |score |
Output:
|Region |Median|
|-------------|------|
|North Region | 82 |
|London Region| 80 |
|Scotland | 80 |
|Ireland | 76 |
Assume you reduce the set to the following. (Note: id_student isn't required at this point in the calculation.)
CREATE TABLE tscores (
id int primary key auto_increment
, region int
, id_student int
, total_score int
, index (region, total_score)
);
INSERT INTO tscores (region, id_student, total_score) VALUES
(1, 1000, 40)
, (1, 1001, 50)
, (1, 1002, 30)
, (1, 1003, 90)
, (2, 1101, 50)
, (2, 1102, 51)
, (2, 1103, 55)
;
SQL and Result:
WITH cte1 AS (
SELECT region, total_score
, ((COUNT(*) OVER (PARTITION BY region) + 1) / 2) AS n
, ROW_NUMBER() OVER (PARTITION BY region ORDER BY total_score) AS rn
FROM tscores AS t
)
SELECT region
, truncate(AVG(total_score), 2) AS med_score
FROM cte1 AS t
WHERE rn IN (ceil(n), floor(n))
GROUP BY region
;
+--------+-----------+
| region | med_score |
+--------+-----------+
| 1 | 45.00 |
| 2 | 51.00 |
+--------+-----------+
2 rows in set (0.004 sec)
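The trick is that n = (count + 1) / 2 lands exactly on the middle row for an odd count and halfway between the two middle rows for an even count, so floor(n) and ceil(n) select one or two rows to average. A minimal Python sketch of the same arithmetic on the tscores sample rows reproduces the result above; median_by_region is an illustrative name, not part of the answer.

```python
import math
from collections import defaultdict

# Rows from the tscores example: (region, id_student, total_score)
tscores = [
    (1, 1000, 40), (1, 1001, 50), (1, 1002, 30), (1, 1003, 90),
    (2, 1101, 50), (2, 1102, 51), (2, 1103, 55),
]

def median_by_region(rows):
    by_region = defaultdict(list)
    for region, _student, score in rows:
        by_region[region].append(score)
    result = {}
    for region, scores in sorted(by_region.items()):
        scores.sort()                   # ORDER BY total_score within the partition
        n = (len(scores) + 1) / 2       # same n as the CTE (true division)
        picked = {math.floor(n), math.ceil(n)}  # 1-based row numbers, like rn IN (...)
        result[region] = sum(scores[rn - 1] for rn in picked) / len(picked)
    return result

print(median_by_region(tscores))  # {1: 45.0, 2: 51.0}
```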
There's still not quite enough detail, but here's SQL that runs against your schema, minus what I think were typos in your SQL:
WITH tscores AS (
SELECT i.region AS region
, Sum(S.score) AS total_score
FROM tredence.assessments A
JOIN tredence.studentassessment S
ON A.id_assessment = S.id_assessment
JOIN tredence.studentinfo i
ON i.id_student = S.id_student
WHERE A.assessment = 'Exam'
GROUP BY S.id_student
, i.region
)
, cte1 AS (
SELECT region, total_score
, ((COUNT(*) OVER (PARTITION BY region) + 1) / 2) AS n
, ROW_NUMBER() OVER (PARTITION BY region ORDER BY total_score) AS rn
FROM tscores AS t
)
SELECT region
, truncate(AVG(total_score), 2) AS med_score
FROM cte1 AS t
WHERE rn IN (ceil(n), floor(n))
GROUP BY region
;
I have the following (simplified) Schema.
CREATE TABLE TEST_Appointment(
Appointment_id INT AUTO_INCREMENT PRIMARY KEY,
Property_No INT NOT NULL,
Property_Type varchar(10) NOT NULL
);
INSERT INTO TEST_Appointment(Property_No, Property_Type) VALUES
(1, 'House'),
(1, 'House'),
(1, 'House'),
(2, 'Flat'),
(2, 'Flat'),
(3, 'Flat'),
(4, 'House'),
(5, 'House'),
(6, 'Studio');
I am trying to write a query to get the properties that have the most appointments in each property type group. An example output would be:
Property_No | Property_Type | Number of Appointments
-----------------------------------------------------
1 | House | 3
2 | Flat | 2
6 | Studio | 1
I have the following query to get the number of appointments per property, but I am not sure how to go from there:
SELECT Property_No, Property_Type, COUNT(*)
from TEST_Appointment
GROUP BY Property_Type, Property_No;
If you are running MySQL 8.0, you can use aggregation and window functions:
select *
from (
select property_no, property_type, count(*) no_appointments,
rank() over(partition by property_type order by count(*) desc) rn
from test_appointment
group by property_no, property_type
) t
where rn = 1
In earlier versions, one option uses a having clause and a row-limiting correlated subquery:
select property_no, property_type, count(*) no_appointments
from test_appointment t
group by property_no, property_type
having count(*) = (
select count(*)
from test_appointment t1
where t1.property_type = t.property_type
group by t1.property_no
order by count(*) desc
limit 1
)
Note that both queries allow ties, if any.
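The same "greatest count per group, ties included" logic can be sketched in a few lines of Python (busiest_per_type is a hypothetical name): count appointments per (property, type), find the top count per type, then keep every property matching it, which is what rank() = 1 does.

```python
from collections import Counter

# Rows from TEST_Appointment: (Property_No, Property_Type)
appointments = [
    (1, "House"), (1, "House"), (1, "House"),
    (2, "Flat"), (2, "Flat"), (3, "Flat"),
    (4, "House"), (5, "House"), (6, "Studio"),
]

def busiest_per_type(rows):
    counts = Counter(rows)  # GROUP BY property_no, property_type -> COUNT(*)
    best = {}
    for (_prop, ptype), n in counts.items():
        best[ptype] = max(best.get(ptype, 0), n)  # top count per property type
    # keep every property that matches its type's top count (ties included)
    return sorted(
        (prop, ptype, n) for (prop, ptype), n in counts.items() if n == best[ptype]
    )

print(busiest_per_type(appointments))
```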
I have a table of data:
ProductNum | ProductVariation | Past_Price | Current_Price | Order_Date
------------ ------------------ ------------ --------------- ---------------------
1 33 96.05 100.10 2014-01-01 00:00:00
1 33 97.65 100.10 2014-12-03 12:34:52
1 33 98.98 100.10 2015-01-02 05:50:32
1 33 99.98 100.10 2016-03-02 06:50:43
1 33 100.01 100.10 2016-12-12 06:05:43
1 33 100.05 100.10 2017-01-02 05:34:43
I was wondering if it's possible to query for the rows such that, for each year, we get the row whose date is closest to Dec 31, {Year}?
So the output would be:
ProductNum | ProductVariation | Past_Price | Current_Price | Order_Date
------------ ------------------ ------------ --------------- ---------------------
1 33 98.98 100.10 2015-01-02 05:50:32
1 33 99.98 100.10 2016-03-02 06:50:43
1 33 100.01 100.10 2017-01-02 05:34:43
Each order is the one closest to Dec 31, {Year} for the years 2014, 2015 and 2016.
You can sort by the date difference and get the top 1 row for each year.
For SQL Server:
DECLARE @year2014 datetime2 = '2014-12-31 12:00:00';
DECLARE @year2015 datetime2 = '2015-12-31 12:00:00';
DECLARE @year2016 datetime2 = '2016-12-31 12:00:00';
select * from (
select top(1) * from products
order by abs(datediff(second, @year2014, Order_Date))
) as p
union all
select * from (
select top(1) * from products
order by abs(datediff(second, @year2015, Order_Date))
) as p
union all
select * from (
select top(1) * from products
order by abs(datediff(second, @year2016, Order_Date))
) as p
Change the time of the 31st of December as you like.
For MySQL:
set @year2014 = '2014-12-31 12:00:00';
set @year2015 = '2015-12-31 12:00:00';
set @year2016 = '2016-12-31 12:00:00';
select * from (
select * from products
order by abs(TIMESTAMPDIFF(second, @year2014, Order_Date)) limit 1
) as p
union all
select * from (
select * from products
order by abs(TIMESTAMPDIFF(second, @year2015, Order_Date)) limit 1
) as p
union all
select * from (
select * from products
order by abs(TIMESTAMPDIFF(second, @year2016, Order_Date)) limit 1
) as p
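Each UNION branch does the same thing: order all rows by their absolute distance to Dec 31 of one year and keep the first. A minimal Python sketch of that idea on the question's dates (closest_to_year_end and its noon default are my own choices, mirroring the '12:00:00' in the SQL) reproduces the expected three rows.

```python
from datetime import datetime

# Order_Date values from the question's sample data
order_dates = [
    "2014-01-01 00:00:00", "2014-12-03 12:34:52", "2015-01-02 05:50:32",
    "2016-03-02 06:50:43", "2016-12-12 06:05:43", "2017-01-02 05:34:43",
]

def closest_to_year_end(dates, years, hour=12):
    """For each year, return the date with the smallest absolute distance to Dec 31 at `hour`."""
    fmt = "%Y-%m-%d %H:%M:%S"
    parsed = [datetime.strptime(d, fmt) for d in dates]
    result = {}
    for year in years:
        target = datetime(year, 12, 31, hour)
        # same idea as ORDER BY ABS(TIMESTAMPDIFF(SECOND, target, Order_Date)) LIMIT 1
        best = min(parsed, key=lambda d, t=target: abs((d - t).total_seconds()))
        result[year] = best.strftime(fmt)
    return result

print(closest_to_year_end(order_dates, [2014, 2015, 2016]))
```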
Compute a row_number() for each year, ordered by the absolute datediff() between the order date and December 31 of that year. Then select all rows where one of the row numbers equals 1.
SELECT *
FROM (SELECT *,
row_number() OVER (ORDER BY abs(datediff(second, '2014-12-31', t.order_date))) rn2014,
row_number() OVER (ORDER BY abs(datediff(second, '2015-12-31', t.order_date))) rn2015,
row_number() OVER (ORDER BY abs(datediff(second, '2016-12-31', t.order_date))) rn2016
FROM elbat t) x
WHERE 1 IN (x.rn2014,
x.rn2015,
x.rn2016);
You can use the following. It avoids hard-coding years and copy-pasting unions.
declare @currDate datetime;
select @currDate = '12/31/2019';
while @currDate > '12/31/2013'
begin
select *
from Product
where abs(datediff(second, OrderDate, @currDate))
= (select min(
abs(datediff(second, OrderDate, @currDate))
)
from Product )
select @currDate = dateadd(year,-1,@currDate);
end
I used the following fiddle:
create table Product (ProdNum int, ProdVar int, PastPrice decimal(10,2), CurrentPrice decimal(10,2), OrderDate datetime);
insert into Product values (1, 33, 96.05, 100.10, '2014-01-01 00:00:00');
insert into Product values (1, 33, 97.65, 100.10, '2014-12-03 12:34:52');
insert into Product values (1, 33, 98.98, 100.10, '2015-01-02 05:50:32');
insert into Product values (1, 33, 99.98, 100.10, '2016-03-02 06:50:43');
insert into Product values (1, 33, 100.01, 100.10, '2016-12-12 06:05:43');
insert into Product values (1, 33, 100.05, 100.10, '2017-01-02 05:34:43');
You seem to actually want the first date after the end of the year:
select top (1) with ties t.*
from t
order by row_number() over (partition by year(order_date) order by order_date asc);
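What that query keeps is easiest to see with a small Python sketch (first_order_per_year is an illustrative name): sort the dates, group them by calendar year, and keep the earliest date in each group, which corresponds to the rows whose row_number() is 1. On the sample data it returns one row per year present, so 2014-01-01 00:00:00 appears alongside the three rows from the question's expected output.

```python
from itertools import groupby

# Order_Date values from the question's sample data
order_dates = [
    "2014-01-01 00:00:00", "2014-12-03 12:34:52", "2015-01-02 05:50:32",
    "2016-03-02 06:50:43", "2016-12-12 06:05:43", "2017-01-02 05:34:43",
]

def first_order_per_year(dates):
    """Earliest date in each calendar year, i.e. the rows where row_number() = 1."""
    ordered = sorted(dates)  # 'YYYY-MM-DD HH:MM:SS' strings sort chronologically
    # group consecutive dates by their 4-digit year prefix and keep the first of each group
    return [next(group) for _year, group in groupby(ordered, key=lambda d: d[:4])]

for d in first_order_per_year(order_dates):
    print(d)
```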
Daft SQL question. I have a table like this ('pid' is an auto-increment primary key column):
CREATE TABLE theTable (
`pid` INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
`timestamp` TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
`cost` INT UNSIGNED NOT NULL,
`rid` INT NOT NULL
) Engine=InnoDB;
Actual table data:
INSERT INTO theTable (`pid`, `timestamp`, `cost`, `rid`)
VALUES
(1, '2011-04-14 01:05:07', 1122, 1),
(2, '2011-04-14 00:05:07', 2233, 1),
(3, '2011-04-14 01:05:41', 4455, 2),
(4, '2011-04-14 01:01:11', 5566, 2),
(5, '2011-04-14 01:06:06', 345, 1),
(6, '2011-04-13 22:06:06', 543, 2),
(7, '2011-04-14 01:14:14', 5435, 3),
(8, '2011-04-14 01:10:13', 6767, 3)
;
I want to get the PID of the latest row for each rid (1 result per unique RID). For the sample data, I'd like:
pid | MAX(timestamp) | rid
-----------------------------------
5 | 2011-04-14 01:06:06 | 1
3 | 2011-04-14 01:05:41 | 2
7 | 2011-04-14 01:14:14 | 3
I've tried running the following query:
SELECT MAX(timestamp),rid,pid FROM theTable GROUP BY rid
and I get:
max(timestamp) ; rid; pid
----------------------------
2011-04-14 01:06:06; 1 ; 1
2011-04-14 01:05:41; 2 ; 3
2011-04-14 01:14:14; 3 ; 7
The PID returned is always the first occurrence of a PID for an RID (row/pid 1 is the first time rid 1 is used, row/pid 3 the first time rid 2 is used, row/pid 7 the first time rid 3 is used). Although the query returns the max timestamp for each rid, the pids are not the pids that go with those timestamps in the original table. What query would give me the results I'm looking for?
(Tested in PostgreSQL 9.something)
Identify the rid and timestamp.
select rid, max(timestamp) as ts
from test
group by rid;
1 2011-04-14 18:46:00
2 2011-04-14 14:59:00
Join to it.
select test.pid, test.cost, test.timestamp, test.rid
from test
inner join
(select rid, max(timestamp) as ts
from test
group by rid) maxt
on (test.rid = maxt.rid and test.timestamp = maxt.ts)
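The two steps above (find the max timestamp per rid, then join back on rid and timestamp) can be sketched in plain Python on the question's sample rows. latest_per_rid is a hypothetical name, and the timestamps are compared as ISO-formatted strings, which sort chronologically. Like the SQL join, it returns more than one row for a rid if two rows share the max timestamp.

```python
# Sample rows from theTable: (pid, timestamp, cost, rid)
the_table = [
    (1, "2011-04-14 01:05:07", 1122, 1),
    (2, "2011-04-14 00:05:07", 2233, 1),
    (3, "2011-04-14 01:05:41", 4455, 2),
    (4, "2011-04-14 01:01:11", 5566, 2),
    (5, "2011-04-14 01:06:06", 345, 1),
    (6, "2011-04-13 22:06:06", 543, 2),
    (7, "2011-04-14 01:14:14", 5435, 3),
    (8, "2011-04-14 01:10:13", 6767, 3),
]

def latest_per_rid(rows):
    # Step 1: SELECT rid, MAX(timestamp) AS ts ... GROUP BY rid
    max_ts = {}
    for _pid, ts, _cost, rid in rows:
        if rid not in max_ts or ts > max_ts[rid]:
            max_ts[rid] = ts
    # Step 2: join back on (rid, timestamp) to recover the matching pid and cost
    return [row for row in rows if row[1] == max_ts[row[3]]]

for pid, ts, cost, rid in latest_per_rid(the_table):
    print(pid, ts, rid)
```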
select *
from (
select `pid`, `timestamp`, `cost`, `rid`
from theTable
order by `timestamp` desc
) as mynewtable
group by mynewtable.`rid`
order by mynewtable.`timestamp`
Hope I helped!
SELECT t.pid, t.cost, t.timestamp, t.rid
FROM test AS t
JOIN (
SELECT rid, MAX(timestamp) AS maxtimestamp
FROM test GROUP BY rid
) AS tmax
ON t.rid = tmax.rid AND t.timestamp = tmax.maxtimestamp
I created an index on rid and timestamp.
SELECT test.pid, test.cost, test.timestamp, test.rid
FROM theTable AS test
LEFT JOIN theTable maxt
ON maxt.rid = test.rid
AND maxt.timestamp > test.timestamp
WHERE maxt.rid IS NULL
Showing rows 0 - 2 (3 total, Query took 0.0104 sec)
This method selects all the desired values from theTable (test), left joining it to itself (maxt) on every row with the same rid and a later timestamp. When a row's timestamp is already the highest for its rid, there is no match in maxt, so the maxt columns become NULL, which is exactly what we are looking for. We then filter with WHERE maxt.rid IS NULL (any other maxt column would work too).
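To try the self-join (anti-join) pattern without a MySQL server, Python's built-in sqlite3 module works; this swaps in SQLite purely for demonstration, with timestamps stored as ISO-formatted text so that > compares them chronologically. The query is the one above with an ORDER BY added for stable output; latest_rows_anti_join is an illustrative name.

```python
import sqlite3

# Sample data from the question: (pid, timestamp, cost, rid)
ROWS = [
    (1, "2011-04-14 01:05:07", 1122, 1),
    (2, "2011-04-14 00:05:07", 2233, 1),
    (3, "2011-04-14 01:05:41", 4455, 2),
    (4, "2011-04-14 01:01:11", 5566, 2),
    (5, "2011-04-14 01:06:06", 345, 1),
    (6, "2011-04-13 22:06:06", 543, 2),
    (7, "2011-04-14 01:14:14", 5435, 3),
    (8, "2011-04-14 01:10:13", 6767, 3),
]

ANTI_JOIN_SQL = """
SELECT test.pid, test."timestamp", test.rid
FROM theTable AS test
LEFT JOIN theTable AS maxt
  ON maxt.rid = test.rid
 AND maxt."timestamp" > test."timestamp"
WHERE maxt.rid IS NULL
ORDER BY test.rid
"""

def latest_rows_anti_join():
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            'CREATE TABLE theTable (pid INTEGER PRIMARY KEY, "timestamp" TEXT, '
            "cost INTEGER NOT NULL, rid INTEGER NOT NULL)"
        )
        conn.executemany("INSERT INTO theTable VALUES (?, ?, ?, ?)", ROWS)
        # only rows with no later timestamp for the same rid survive the IS NULL filter
        return conn.execute(ANTI_JOIN_SQL).fetchall()
    finally:
        conn.close()

for row in latest_rows_anti_join():
    print(row)
```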
You could also use subqueries like this:
SELECT ( SELECT MIN(t2.pid)
FROM test t2
WHERE t2.rid = t.rid
AND t2.timestamp = maxtimestamp
) AS pid
, MAX(t.timestamp) AS maxtimestamp
, t.rid
FROM test t
GROUP BY t.rid
But this way, you'll need one more subquery if you want cost included in the shown columns, etc.
So the GROUP BY and JOIN approach is the better solution.
If a higher pid always means a later timestamp, you can avoid a JOIN with:
SELECT pid, rid FROM theTable t1 WHERE t1.pid IN ( SELECT MAX(t2.pid) FROM theTable t2 GROUP BY t2.rid);
Try:
select pid,cost, timestamp, rid from theTable order by timestamp DESC limit 2;