Quest
After a day of running (against nearly 1 GB of data), a set of statements has slowed to 40 inserts per second. I am looking to increase that by an order of magnitude or two.
SQL Code
The code to insert the information comes in two parts: a master record and detail records. The master record:
INSERT INTO MONTH_REF (DISTRICT_ID, STATION_ID, CATEGORY_ID, YEAR, MONTH) VALUES
('101', '0066', '010', 1984, 07);
The detail records:
INSERT INTO DAILY (MONTH_REF_ID, AMOUNT, DAILY_FLAG_ID, DAY) VALUES (
(SELECT ID FROM MONTH_REF M WHERE M.DISTRICT_ID = '101' AND M.STATION_ID = '0066'
AND M.CATEGORY_ID = '010' AND M.YEAR = 1984 AND M.MONTH = 07), 0, ' ', 1);
INSERT INTO DAILY (MONTH_REF_ID, AMOUNT, DAILY_FLAG_ID, DAY) VALUES (
(SELECT ID FROM MONTH_REF M WHERE M.DISTRICT_ID = '101' AND M.STATION_ID = '0066'
AND M.CATEGORY_ID = '010' AND M.YEAR = 1984 AND M.MONTH = 07), 0.5, ' ', 2);
INSERT INTO DAILY (MONTH_REF_ID, AMOUNT, DAILY_FLAG_ID, DAY) VALUES (
(SELECT ID FROM MONTH_REF M WHERE M.DISTRICT_ID = '101' AND M.STATION_ID = '0066'
AND M.CATEGORY_ID = '010' AND M.YEAR = 1984 AND M.MONTH = 07), 0, 'T', 3);
Proposed Solution
The proposed solution eliminates looking up each MONTH_REF_ID by storing it in a local variable, as follows:
INSERT INTO MONTH_REF (DISTRICT_ID, STATION_ID, CATEGORY_ID, YEAR, MONTH) VALUES
('101', '0066', '010', 1984, 07);
SET @month_ref_id := (SELECT LAST_INSERT_ID());
The detail statements then become:
INSERT INTO DAILY (MONTH_REF_ID, AMOUNT, DAILY_FLAG_ID, DAY) VALUES (@month_ref_id, 0, ' ', 1);
INSERT INTO DAILY (MONTH_REF_ID, AMOUNT, DAILY_FLAG_ID, DAY) VALUES (@month_ref_id, 0.5, ' ', 2);
INSERT INTO DAILY (MONTH_REF_ID, AMOUNT, DAILY_FLAG_ID, DAY) VALUES (@month_ref_id, 0, 'T', 3);
Constraints
The MONTH_REF table has an AUTO_INCREMENT primary key and is indexed on it. The DAILY table has no index and no primary key. A primary key can be added to the DAILY table, if it would help.
Question
What is a more efficient way to execute the (billion or so) insert statements than the proposed solution?
Thank you!
This solution works:
INSERT INTO MONTH_REF (DISTRICT_ID,STATION_ID,CATEGORY_ID,YEAR,MONTH) VALUES('101','QFEG','012',1973,08);
SET @month_ref_id := (SELECT LAST_INSERT_ID());
INSERT INTO DAILY (MONTH_REF_ID,AMOUNT,DAILY_FLAG_ID,DAY) VALUES(@month_ref_id,0,' ',1),(@month_ref_id,0,' ',2),(@month_ref_id,0,' ',3);
Inserts went up about four orders of magnitude.
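For anyone driving the load from a script rather than a SQL file, here is a minimal sketch of the same idea: fetch the generated key once per master row, then send all detail rows as one batch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, so cursor.lastrowid plays the role of LAST_INSERT_ID(). The lower-case table layout and the helper name insert_month are made up for the illustration.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; cursor.lastrowid replaces LAST_INSERT_ID().
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE month_ref (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        district_id TEXT, station_id TEXT, category_id TEXT,
        year INTEGER, month INTEGER
    );
    CREATE TABLE daily (
        month_ref_id INTEGER, amount REAL, daily_flag_id TEXT, day INTEGER
    );
""")


def insert_month(conn, master, details):
    """Insert one master row, then all of its detail rows in a single batch."""
    cur = conn.execute(
        "INSERT INTO month_ref (district_id, station_id, category_id, year, month) "
        "VALUES (?, ?, ?, ?, ?)",
        master,
    )
    month_ref_id = cur.lastrowid  # generated key, fetched once per master row
    conn.executemany(
        "INSERT INTO daily (month_ref_id, amount, daily_flag_id, day) "
        "VALUES (?, ?, ?, ?)",
        [(month_ref_id, amount, flag, day) for amount, flag, day in details],
    )
    return month_ref_id


with conn:  # one transaction covers the master row and its details
    new_id = insert_month(
        conn,
        ("101", "0066", "010", 1984, 7),
        [(0, " ", 1), (0.5, " ", 2), (0, "T", 3)],
    )
```

Wrapping each master row and its details in one transaction is a design choice of the sketch, not something measured in this thread.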
start                end                  category
2022:10:14 17:13:00  2022:10:14 17:19:00  A
2022:10:01 16:29:00  2022:10:01 16:49:00  B
2022:10:19 18:55:00  2022:10:19 19:03:00  A
2022:10:31 07:52:00  2022:10:31 07:58:00  A
2022:10:13 18:41:00  2022:10:13 19:26:00  B
The table holds sample data about trips.
The goal is to calculate the total time consumed for each category, e.g. category A = 02:18:02.
First I changed the timestamp format in the CSV file to YYYY/MM/DD HH:MM:SS to match MySQL, and removed the headers.
Then I created a table in MySQL Workbench with the following code:
CREATE TABLE trip (
start TIMESTAMP,
end TIMESTAMP,
category VARCHAR(6)
);
Then, to calculate the consumed time, I wrote:
SELECT category, SUM(TIMEDIFF(end, start)) as length
FROM trip
GROUP BY CATEGORY;
The result was plain numbers: A = 34900 and B = 38000.
So I added a CONVERT to TIME, as follows:
SELECT category, Convert(SUM(TIMEDIFF(end, start)), Time) as length
FROM trip
GROUP BY category;
The result was fine for category A (03:49:00), but unfortunately category B came back as NULL instead of 03:08:00.
What have I done wrong, and what different approach should I have taken?
You can do it as follows.
This also gets around MySQL's TIME value limit of 838:59:59:
SELECT category,
CONCAT(FLOOR(SUM(TIMESTAMPDIFF(SECOND, `start`, `end`))/3600),":",FLOOR((SUM(TIMESTAMPDIFF(SECOND, `start`, `end`))%3600)/60),":",(SUM(TIMESTAMPDIFF(SECOND, `start`, `end`))%3600)%60) as `length`
FROM trip
GROUP BY category;
This is to get time like 00:20:00 instead of 0:20:0
SELECT category,
CONCAT(
if(FLOOR(SUM(TIMESTAMPDIFF(SECOND, `start`, `end`))/3600) >= 10, FLOOR(SUM(TIMESTAMPDIFF(SECOND, `start`, `end`))/3600), CONCAT('0',FLOOR(SUM(TIMESTAMPDIFF(SECOND, `start`, `end`))/3600)) ) ,
":",
if(FLOOR((SUM(TIMESTAMPDIFF(SECOND, `start`, `end`))%3600)/60) >= 10, FLOOR((SUM(TIMESTAMPDIFF(SECOND, `start`, `end`))%3600)/60), CONCAT('0', FLOOR((SUM(TIMESTAMPDIFF(SECOND, `start`, `end`))%3600)/60) ) ),
":",
if( (SUM(TIMESTAMPDIFF(SECOND, `start`, `end`) )%3600)%60 >= 10, (SUM(TIMESTAMPDIFF(SECOND, `start`, `end`) )%3600)%60, concat('0', (SUM(TIMESTAMPDIFF(SECOND, `start`, `end`) )%3600)%60))
) as `length`
FROM trip
GROUP BY category;
You'd calculate the length for each separate trip in seconds, get sum of the lengths per category then convert seconds to time:
SELECT category, SEC_TO_TIME(SUM(TIMESTAMPDIFF(SECOND, `start`, `end`))) as `length`
FROM trip
GROUP BY category;
If the sum exceeds the limit of the TIME data type (838:59:59), that maximum value is returned instead.
For values that exceed the TIME limit, use:
SELECT category,
CONCAT_WS(':',
secs DIV (60 * 60),
LPAD(secs DIV 60 MOD 60, 2, 0),
LPAD(secs MOD 60, 2, 0)) AS `length`
FROM (
SELECT category, SUM(TIMESTAMPDIFF(SECOND, `start`, `end`)) AS secs
FROM trip
GROUP BY category
) subquery
;
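To sanity-check the arithmetic outside the database, here is a small sketch of the same approach (sum the seconds per category, then format) run on the five sample trips from the question. It is plain Python, and the names FMT, format_hms and time_per_category are made up for the illustration. It assumes every trip ends after it starts.

```python
from collections import defaultdict
from datetime import datetime

# The five sample trips from the question, in its original timestamp format.
TRIPS = [
    ("2022:10:14 17:13:00", "2022:10:14 17:19:00", "A"),
    ("2022:10:01 16:29:00", "2022:10:01 16:49:00", "B"),
    ("2022:10:19 18:55:00", "2022:10:19 19:03:00", "A"),
    ("2022:10:31 07:52:00", "2022:10:31 07:58:00", "A"),
    ("2022:10:13 18:41:00", "2022:10:13 19:26:00", "B"),
]

FMT = "%Y:%m:%d %H:%M:%S"


def format_hms(total_seconds):
    """Format a non-negative number of seconds as HH:MM:SS; hours may exceed 838."""
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def time_per_category(trips):
    """Sum the trip lengths in seconds per category, then format each total."""
    totals = defaultdict(int)
    for start, end, category in trips:
        delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
        totals[category] += int(delta.total_seconds())
    return {category: format_hms(secs) for category, secs in totals.items()}


result = time_per_category(TRIPS)
print(result)  # {'A': '00:20:00', 'B': '01:05:00'}
```

Because the hours are computed by integer division rather than stored in a TIME value, a total such as 839 hours comes out as 839:00:00 instead of being capped.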
I am trying to retrieve the percentage of available products at specific merchants over the last 30 days.
Desired result example:
20210504 merchant1 20%
20210504 merchant2 30%
20210505 merchant1 25%
20210505 merchant2 35%
There are 3 tables:
availability (containing availability info for each product and merchant and day)
products (where the manufacturer_id is, that we want to filter for)
merchants (merchant info)
Minimal example: https://www.db-fiddle.com/f/wtnK5R4DWi7Dy6LwLaP4mX/0
This returns the percentage for only one merchant and one day:
-- get percentage of available products per merchant over time
SELECT
m.name AS metric,
t.s AS AMOUNT_AVAILABLE,
count(*) AS AMOUNT_TOTAL,
t.s / count(*) AS percentage
FROM availability p
CROSS JOIN (
SELECT count(*) AS s FROM availability p2
INNER JOIN products mp on p2.SKU = mp.SKU
WHERE
availability = 'sofort lieferbar'
AND date = curdate() - interval 1 day -- testing for one day, but we want a time series
AND mp.MANUFACTURER_ID = 1
-- AND p2.merchant_id = p.merchant_id -- does not work
-- AND merchant_id = 2
-- GROUP BY merchant_id
) t
INNER JOIN products mp on p.SKU = mp.SKU
INNER JOIN merchants m ON m.id = p.MERCHANT_ID
WHERE
p.date = curdate() - interval 1 day
and mp.MANUFACTURER_ID = 1
-- and merchant_id = 2
GROUP BY
merchant_id
Now I am trying to somehow merge the cross join with the from table so I get the info for each merchant and day. How can a cross join be joined with the from table?
Data & Schema:
create table merchants
(
id tinyint unsigned not null
primary key,
name varchar(255) null
);
INSERT INTO merchants (id, name) VALUES (1, 'Amazon');
INSERT INTO merchants (id, name) VALUES (2, 'eBay');
create table availability
(
DATE date not null,
SKU char(10) not null,
merchant_id tinyint unsigned not null,
availability enum ('sofort lieferbar', 'verzögert lieferbar', 'nicht lieferbar', 'außer Handel') null,
constraint DATE
unique (DATE, SKU, merchant_id)
);
INSERT INTO test.availability (DATE, SKU, merchant_id, availability) VALUES ('2021-05-11', '1', 1, 'sofort lieferbar');
INSERT INTO test.availability (DATE, SKU, merchant_id, availability) VALUES ('2021-05-11', '1', 2, 'nicht lieferbar');
INSERT INTO test.availability (DATE, SKU, merchant_id, availability) VALUES ('2021-05-12', '1', 1, 'sofort lieferbar');
INSERT INTO test.availability (DATE, SKU, merchant_id, availability) VALUES ('2021-05-12', '1', 2, 'nicht lieferbar');
INSERT INTO test.availability (DATE, SKU, merchant_id, availability) VALUES ('2021-05-13', '1', 1, 'nicht lieferbar');
INSERT INTO test.availability (DATE, SKU, merchant_id, availability) VALUES ('2021-05-13', '1', 2, 'sofort lieferbar');
create table products
(
SKU char(8) not null
primary key,
NAME varchar(255) null,
MANUFACTURER_ID mediumint unsigned null,
updated datetime default CURRENT_TIMESTAMP not null on update CURRENT_TIMESTAMP
);
INSERT INTO test.products (SKU, NAME, MANUFACTURER_ID, updated) VALUES ('1', 'Sneaker', 1, '2021-05-12 02:27:46');
INSERT INTO test.products (SKU, NAME, MANUFACTURER_ID, updated) VALUES ('2', 'Ball', 1, '2021-05-12 02:27:46');
INSERT INTO test.products (SKU, NAME, MANUFACTURER_ID, updated) VALUES ('3', 'Pen', 2, '2021-05-12 02:27:46');
INSERT INTO test.products (SKU, NAME, MANUFACTURER_ID, updated) VALUES ('4', 'Paper', 2, '2021-05-12 02:27:46');
I have written a query which seems to work for the data you have provided. Let me know if there's any issue and I'll see what I can do.
SELECT CONCAT('merchant', t.ID) as merchant,
t.Date,
g.prod_available / t.all_prod_from_merch AS percentage_available
# gets total number of products in time range
FROM (SELECT ID,
Date,
COUNT(merchant_ID) AS all_prod_from_merch
FROM merchants m
JOIN availability a
ON m.ID = a.merchant_ID
WHERE Date < CURDATE()
AND Date >= curdate() - INTERVAL 10 DAY
GROUP BY merchant_ID,
Date ) t
LEFT JOIN (SELECT merchant_ID,
Date,
COUNT(merchant_ID) AS prod_available
FROM availability
WHERE AVAILABILITY = 'sofort lieferbar'
AND date IN (SELECT Date
FROM availability
WHERE date < CURDATE()
AND date >= CURDATE() - INTERVAL 10 DAY
GROUP BY Date )
GROUP BY merchant_ID,
Date ) g
ON g.merchant_ID = t.ID
AND g.Date = t.Date
ORDER BY t.date;
The first select in the join gets the total number of products in the time range for each merchant. The second one gets those available from each merchant. So the select at the beginning just does the fraction.
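As a cross-check, here is a minimal sketch of a different technique, conditional aggregation, which gets the share per merchant and day in a single pass without joining two subqueries. It assumes Python's built-in sqlite3 module and a simplified copy of the sample schema (plain TEXT columns instead of ENUM). The 30-day date filter is left out because the sample dates are fixed, and the manufacturer filter from the question is kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (sku TEXT PRIMARY KEY, name TEXT, manufacturer_id INTEGER);
    CREATE TABLE availability (date TEXT, sku TEXT, merchant_id INTEGER, availability TEXT);
    INSERT INTO products VALUES
        ('1', 'Sneaker', 1), ('2', 'Ball', 1), ('3', 'Pen', 2), ('4', 'Paper', 2);
    INSERT INTO availability VALUES
        ('2021-05-11', '1', 1, 'sofort lieferbar'),
        ('2021-05-11', '1', 2, 'nicht lieferbar'),
        ('2021-05-12', '1', 1, 'sofort lieferbar'),
        ('2021-05-12', '1', 2, 'nicht lieferbar'),
        ('2021-05-13', '1', 1, 'nicht lieferbar'),
        ('2021-05-13', '1', 2, 'sofort lieferbar');
""")

# AVG over a 0/1 flag gives the fraction of rows that are available.
QUERY = """
    SELECT a.date,
           a.merchant_id,
           AVG(CASE WHEN a.availability = 'sofort lieferbar' THEN 1.0 ELSE 0.0 END) AS share
    FROM availability a
    JOIN products p ON p.sku = a.sku
    WHERE p.manufacturer_id = ?
    GROUP BY a.date, a.merchant_id
    ORDER BY a.date, a.merchant_id
"""

rows = conn.execute(QUERY, (1,)).fetchall()
for date, merchant_id, share in rows:
    print(date, f"merchant{merchant_id}", f"{share:.0%}")
```

With only one product per merchant and day in the sample data, every share is either 0% or 100%.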
I have the following columns in a table called meetings: meeting_id (int), start_time (time), end_time (time). Assuming that this table has data for one calendar day only, what is the minimum number of rooms I need to accommodate all the meetings? Room size and the number of people attending the meetings don't matter.
Here's the solution:
select * from
(select t.start_time,
t.end_time,
count(*) - 1 overlapping_meetings,
count(*) minimum_rooms_required,
group_concat(distinct concat(y.start_time,' to ',t.end_time)
separator ' // ') meeting_details from
(select 1 meeting_id, '08:00' start_time, '09:15' end_time union all
select 2, '13:20', '15:20' union all
select 3, '10:00', '14:00' union all
select 4, '13:55', '16:25' union all
select 5, '14:00', '17:45' union all
select 6, '14:05', '17:45') t left join
(select 1 meeting_id, '08:00' start_time, '09:15' end_time union all
select 2, '13:20', '15:20' union all
select 3, '10:00', '14:00' union all
select 4, '13:55', '16:25' union all
select 5, '14:00', '17:45' union all
select 6, '14:05', '17:45') y
on t.start_time between y.start_time and y.end_time
group by start_time, end_time) z;
My question - is there anything wrong with this answer? Even if there's nothing wrong with this, can someone share a better answer?
Let's say you have a table called 'meetings' with the columns id, start_time and end_time.
Then you can use this query to get the minimum number of meeting rooms required to accommodate all meetings:
select max(minimum_rooms_required)
from (select count(*) minimum_rooms_required
from meetings t
left join meetings y on t.start_time >= y.start_time and t.start_time < y.end_time group by t.id
) z;
This looks clearer and simple and works fine.
Meetings can "overlap". So, GROUP BY start_time, end_time can't figure this out.
Not every algorithm can be done in SQL. Or, at least, it may be grossly inefficient.
I would use a real programming language for the computation, leaving the database for what it is good at -- being a data repository.
Build an array of 1440 (minutes in a day) entries; initialize to 0.
Foreach meeting:
Foreach minute in the meeting (excluding last minute):
increment element in array.
Find the largest element in the array -- the number of rooms needed.
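The steps above can be sketched as follows. This is a minimal Python version, assuming meetings arrive as 'HH:MM' strings within one day. The function names and the MEETINGS list (the six meetings from the question) are only for illustration.

```python
def to_minute(hhmm):
    """Convert an 'HH:MM' string to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def min_rooms(meetings):
    """Count the meetings in progress during each minute and return the peak."""
    busy = [0] * 1440  # one counter per minute of the day
    for start, end in meetings:
        # range() excludes the end minute, so back-to-back meetings share a room
        for minute in range(to_minute(start), to_minute(end)):
            busy[minute] += 1
    return max(busy)


MEETINGS = [
    ("08:00", "09:15"),
    ("13:20", "15:20"),
    ("10:00", "14:00"),
    ("13:55", "16:25"),
    ("14:00", "17:45"),
    ("14:05", "17:45"),
]

rooms = min_rooms(MEETINGS)
print(rooms)  # 4
```

The peak of 4 is reached from 14:05 onward, when the 13:20, 13:55, 14:00 and 14:05 meetings are all running and the 10:00 meeting has already ended.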
CREATE TABLE [dbo].[Meetings](
[id] [int] NOT NULL,
[Starttime] [time](7) NOT NULL,
[EndTime] [time](7) NOT NULL
) ON [PRIMARY]
GO
sample data set:
INSERT INTO Meetings VALUES (1,'8:00','09:00')
INSERT INTO Meetings VALUES (2,'8:00','10:00')
INSERT INTO Meetings VALUES (3,'10:00','11:00')
INSERT INTO Meetings VALUES (4,'11:00','12:00')
INSERT INTO Meetings VALUES (5,'11:00','13:00')
INSERT INTO Meetings VALUES (6,'13:00','14:00')
INSERT INTO Meetings VALUES (7,'13:00','15:00')
To find the minimum number of rooms required, run the query below:
create table #TempMeeting
(
id int,Starttime time,EndTime time,MeetingRoomNo int,Rownumber int
)
insert into #TempMeeting select id, Starttime,EndTime,0 as MeetingRoomNo,ROW_NUMBER()
over (order by starttime asc) as Rownumber from Meetings
declare @RowCounter int
select top 1 @RowCounter=Rownumber from #TempMeeting order by Rownumber
WHILE @RowCounter<=(Select count(*) from #TempMeeting)
BEGIN
update #TempMeeting set MeetingRoomNo=1
where Rownumber=(select top 1 Rownumber from #TempMeeting where
Rownumber>@RowCounter and Starttime>=(select top 1 EndTime from #TempMeeting
where Rownumber=@RowCounter) and MeetingRoomNo=0)
set @RowCounter=@RowCounter+1
END
select count(*) from #TempMeeting where MeetingRoomNo=0
Consider a table meetings with columns id, start_time and end_time. Then the following query should give correct answer.
with mod_meetings as (select id, to_timestamp(start_time, 'HH24:MI')::TIME as start_time,
to_timestamp(end_time, 'HH24:MI')::TIME as end_time from meetings)
select CASE when max(a_cnt)>1 then max(a_cnt)+1
when max(a_cnt)=1 and max(b_cnt)=1 then 2 else 1 end as rooms
from
(select count(*) as a_cnt, a.id, count(b.id) as b_cnt from mod_meetings a left join mod_meetings b
on a.start_time>b.start_time and a.start_time<b.end_time group by a.id) join_table;
Sample DATA:
DROP TABLE IF EXISTS meeting;
CREATE TABLE "meeting" (
"meeting_id" INTEGER NOT NULL UNIQUE,
"start_time" TEXT NOT NULL,
"end_time" TEXT NOT NULL,
PRIMARY KEY("meeting_id")
);
INSERT INTO meeting values (1,'08:00','14:00');
INSERT INTO meeting values (2,'09:00','10:30');
INSERT INTO meeting values (3,'11:00','12:00');
INSERT INTO meeting values (4,'12:00','13:00');
INSERT INTO meeting values (5,'10:15','11:00');
INSERT INTO meeting values (6,'12:00','13:00');
INSERT INTO meeting values (7,'10:00','10:30');
INSERT INTO meeting values (8,'11:00','13:00');
INSERT INTO meeting values (9,'11:00','14:00');
INSERT INTO meeting values (10,'12:00','14:00');
INSERT INTO meeting values (11,'10:00','14:00');
INSERT INTO meeting values (12,'12:00','14:00');
INSERT INTO meeting values (13,'10:00','14:00');
INSERT INTO meeting values (14,'13:00','14:00');
Solution:
DROP VIEW IF EXISTS Final;
CREATE VIEW Final AS SELECT time, group_concat(event), sum(num) num from (
select start_time time, 's' event, 1 num from meeting
union all
select end_time time, 'e' event, -1 num from meeting)
group by 1
order by 1;
select max(room) AS Min_Rooms_Required FROM (
select
a.time,
sum(b.num) as room
from
Final a
, Final b
where a.time >= b.time
group by a.time
order by a.time
);
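The same idea (treat every start as +1 and every end as -1, then take the running total in time order) can be sketched outside SQL. This is a minimal Python version, assuming zero-padded 'HH:MM' strings so that plain string sorting matches time order. The function name and the SAMPLE list (the 14 rows inserted above) are only for illustration.

```python
def min_rooms_sweep(meetings):
    """Sweep over start (+1) and end (-1) events and track the running total."""
    events = []
    for start, end in meetings:
        events.append((start, 1))
        events.append((end, -1))
    # Tuples sort by time first; at equal times -1 sorts before +1,
    # so a room is freed before the next meeting claims one.
    events.sort()
    rooms = best = 0
    for _, delta in events:
        rooms += delta
        best = max(best, rooms)
    return best


SAMPLE = [
    ("08:00", "14:00"), ("09:00", "10:30"), ("11:00", "12:00"), ("12:00", "13:00"),
    ("10:15", "11:00"), ("12:00", "13:00"), ("10:00", "10:30"), ("11:00", "13:00"),
    ("11:00", "14:00"), ("12:00", "14:00"), ("10:00", "14:00"), ("12:00", "14:00"),
    ("10:00", "14:00"), ("13:00", "14:00"),
]

print(min_rooms_sweep(SAMPLE))  # 9
```

For the sample data the peak of 9 occurs at 12:00, right after one meeting ends and four more begin.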
Here's an explanation of gashu's nicely working code (or, put differently, a non-code explanation of how to solve it in any language).
First, renaming the variable 'minimum_rooms_required' to 'overlap' would make the whole thing much easier to understand, because for each start or end time we want to know the number of overlapping ongoing meetings. Once we have found the maximum, we know there is no way of getting by with fewer rooms than that, because those meetings overlap.
By the way, I think there might be a mistake in the code. It should check whether t.start_time or t.end_time is between y.start_time and y.end_time. Counterexample: meeting 1 starts at 8:00 and ends at 11:00, and meeting 2 starts at 10:00 and ends at 12:00.
(I'd post this as a comment on gashu's answer, but I don't have enough reputation.)
I'd go for the LEAD() analytic function:
select
sum(needs_room_ind) as min_rooms
from (
select
id,
start_time,
end_time,
case when lead(start_time,1) over (order by start_time asc) between start_time
and end_time then 1 else 0 end as needs_room_ind
from
meetings
) a
IMO, I want to take the difference between how many meetings have started and how many have ended by the time each meeting_id starts (assuming meetings start and end on time).
My code looks like this:
with alpha as
(
select a.meeting_id,a.start_time,
count(distinct b.meeting_id) ttl_meeting_start_before,
count(distinct c.meeting_id) ttl_meeting_end_before
from meeting a
left join
(
select meeting_id,start_time from meeting
) b
on a.start_time > b.start_time
left join
(
select meeting_id,end_time from meeting
) c
on a.start_time > c.end_time
group by a.meeting_id,a.start_time
)
select max(ttl_meeting_start_before-ttl_meeting_end_before) max_meeting_room
from alpha
I'm having trouble coming up with a query which is going to allow me to keep only the most recent order from a user (maybe a better way to say this is delete all old orders):
CREATE TABLE orders(id integer, created_at datetime, user_id integer, label nvarchar(25));
INSERT INTO orders values(1, now(), 1, 'FRED FIRST');
INSERT INTO orders values(2, DATE_ADD(now(), INTERVAL 1 DAY), 1, 'FRED SECOND');
INSERT INTO orders values(3, DATE_ADD(now(), INTERVAL 2 DAY), 1, 'FRED THIRD');
INSERT INTO orders values(4, DATE_ADD(now(), INTERVAL 1 DAY), 3, 'BARNEY FIRST');
SELECT * FROM orders;
'1','2014-03-07 08:39:36','1','FRED FIRST'
'2','2014-03-08 08:39:36','1','FRED SECOND'
'3','2014-03-09 08:39:36','1','FRED THIRD'
'4','2014-03-08 08:39:36','3','BARNEY FIRST'
I would like to run a query which would leave me with FRED's THIRD order and BARNEY's FIRST order. FRED FIRST and FRED SECOND should be deleted because they are not the latest order from FRED.
Any thoughts about how I might be able to do this with a single query?
EDIT: After posting this, I found something that works (it does what I'm looking to do)-- but it seems a bit messy:
DELETE old_orders
FROM orders old_orders
left outer join(
SELECT MAX(created_at) as created_at, user_id
FROM orders
GROUP BY user_id) new_orders
ON new_orders.user_id = old_orders.user_id and new_orders.created_at = old_orders.created_at
WHERE new_orders.user_id is null;
Use a nested query, like this:
DELETE FROM orders
WHERE id NOT IN (
SELECT id FROM (
select id from orders o JOIN (
select user_id, max(created_at) t from orders group by user_id
) o1 ON o.user_id = o1.user_id AND o.created_at = o1.t
) AS tmp
)
Working Fiddle: http://sqlfiddle.com/#!2/56d913/1
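If you want to try the pattern without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module and the four sample rows from the question (timestamps hard-coded instead of now()). As far as I know, MySQL rejects a DELETE whose subquery reads the target table directly (error 1093), which is why the answer wraps the subquery in the extra derived table tmp. SQLite does not need that wrapper, so the sketch leaves it out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, created_at TEXT, user_id INTEGER, label TEXT);
    INSERT INTO orders VALUES
        (1, '2014-03-07 08:39:36', 1, 'FRED FIRST'),
        (2, '2014-03-08 08:39:36', 1, 'FRED SECOND'),
        (3, '2014-03-09 08:39:36', 1, 'FRED THIRD'),
        (4, '2014-03-08 08:39:36', 3, 'BARNEY FIRST');
""")

with conn:
    # Keep only the rows whose created_at equals the newest one for their user.
    conn.execute("""
        DELETE FROM orders
        WHERE id NOT IN (
            SELECT o.id
            FROM orders o
            JOIN (SELECT user_id, MAX(created_at) AS latest
                  FROM orders
                  GROUP BY user_id) newest
              ON o.user_id = newest.user_id AND o.created_at = newest.latest
        )
    """)

remaining = [row[0] for row in conn.execute("SELECT label FROM orders ORDER BY id")]
print(remaining)  # ['FRED THIRD', 'BARNEY FIRST']
```

Note that if a user has two orders with the same newest created_at, both are kept.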
One way you might achieve this is to set a flag on the row indicating that it is the most recent order. When a new order is placed, you would clear the flag on that customer's other orders and set it on the row you're inserting. Your DELETE query could then just delete all orders that don't have the flag set.
I have a MySQL database with a structure like this...
Site has many Sensors
Sensors has many SensorReadings
I want to get all Sensors for a Site and the last 5 SensorReadings for each of those Sensors. I suspect I'm going to have to do something with a stored procedure and temporary tables (if they even exist in MySQL).
Possibly Something like...
SELECT reading,
date
FROM (select sensor_id,
reading,
date,
@num := if(@sensor_id = sensor_id, @num + 1, 1) as row_number,
@sensor_id := sensor_id as dummy
from sensor_readings
order by sensor_id,
date desc) T
WHERE row_number<=5
Please give your actual table structure(s) in your question.
Full example using MySQL variables. For brevity, this displays the top 2 readings per sensor.
drop table if exists Sensors;
create table Sensors (Id int);
insert Sensors (id) values (1), (2), (3);
drop table if exists SensorReadings;
create table SensorReadings (SensorId int, RecordDate date);
insert SensorReadings (SensorId, RecordDate) values
(1, '2011-01-01'),
(1, '2011-01-02'),
(1, '2011-01-03'),
(2, '2011-01-01'),
(2, '2011-01-02'),
(2, '2011-01-03');
set @num = -1;
set @SensorId = -1;
select *
from Sensors s
join (
select *
, @num := if(@SensorId = SensorId, @num + 1, 1) as rn
, @SensorId := SensorId
from SensorReadings sr
order by
SensorId
, RecordDate desc
) as numbered
on numbered.SensorId = s.Id
where numbered.rn < 3;
Based on Andomar's sample data:
select * from SensorReadings as t1
where (select count(*) from SensorReadings as t2
where t1.sensorid = t2.sensorid and t2.recordDate > t1.recordDate) <2
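If the readings end up in application code anyway, the same "newest N per sensor" selection can be sketched without variables or a correlated subquery. This is a minimal Python version using the sample rows from the answer above. The function name latest_per_sensor is made up, and it assumes ISO-formatted date strings so that string order matches date order.

```python
from itertools import groupby
from operator import itemgetter

# (SensorId, RecordDate) rows from the sample data above.
READINGS = [
    (1, "2011-01-01"), (1, "2011-01-02"), (1, "2011-01-03"),
    (2, "2011-01-01"), (2, "2011-01-02"), (2, "2011-01-03"),
]


def latest_per_sensor(readings, n):
    """Return the n most recent dates for each sensor, newest first."""
    ordered = sorted(readings, key=itemgetter(1), reverse=True)  # newest first
    ordered.sort(key=itemgetter(0))  # stable sort keeps the date order per sensor
    return {
        sensor: [date for _, date in rows][:n]
        for sensor, rows in groupby(ordered, key=itemgetter(0))
    }


top2 = latest_per_sensor(READINGS, 2)
print(top2)  # {1: ['2011-01-03', '2011-01-02'], 2: ['2011-01-03', '2011-01-02']}
```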