I need to calculate the average time of all the operations stored in the database. The table I store operations in looks as follows:
creation_time | operation_type | operation_id
2017-01-03 11:14:25 | START | 1
2017-01-03 11:14:26 | START | 2
2017-01-03 11:14:28 | END | 2
2017-01-03 11:14:30 | END | 1
In this case operation 1 took 5 seconds and operation 2 took 2 seconds to finish.
How can I calculate the average of these operations in MySQL?
EDIT:
It seems that operation_id doesn't need to be unique: a given operation may be executed several times, so the table might look as follows:
creation_time | operation_type | operation_id
2017-01-03 11:14:25 | START | 1
2017-01-03 11:14:26 | START | 2
2017-01-03 11:14:28 | END | 2
2017-01-03 11:14:30 | END | 1
2017-01-03 11:15:00 | START | 1
2017-01-03 11:15:10 | END | 1
What should I add in the query to properly calculate the average time of all these operations?
I'm not sure that a subquery is necessary...
SELECT AVG(TIME_TO_SEC(y.creation_time)-TIME_TO_SEC(x.creation_time)) avg_diff
FROM my_table x
JOIN my_table y
ON y.operation_id = x.operation_id
AND y.operation_type = 'end'
WHERE x.operation_type = 'start';
Since the END of an operation is always after the START, you can use MIN and MAX (this assumes each operation_id occurs only once):
select avg(diff)
from
(
select operation_id,
TIME_TO_SEC(TIMEDIFF(max(creation_time), min(creation_time))) as diff
from your_table
group by operation_id
) tmp
select avg(diff)
from
(
select a1.operation_id,
time_to_sec(timediff(a2.creation_time, a1.creation_time)) as diff -- seconds, so avg() works on a number
from oper a1 -- No table name provided, went with 'oper' because it made sense in my head
inner join oper a2
on a1.operation_id = a2.operation_id
where a1.operation_type = 'START'
and a2.operation_type = 'END'
) tmp -- MySQL requires an alias on a derived table
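The queries above pair rows on operation_id alone, so they do not cover the edited case where an id repeats: the join matches every START with every END of the same id, and MIN/MAX spans all runs of that id. As a rough sketch of the pairing logic (written in Python outside the database; the function name `average_duration` and the in-memory row format are assumptions for illustration), each END can be matched with the most recent open START of the same id, assuming runs of the same operation never overlap:

```python
from datetime import datetime


def average_duration(rows):
    """Average seconds between each START and the END that closes it.

    rows: iterable of (creation_time, operation_type, operation_id) tuples.
    Assumes runs of the same operation_id never overlap in time.
    """
    open_starts = {}   # operation_id -> time of the START not yet closed
    durations = []
    for created, op_type, op_id in sorted(rows):
        if op_type == "START":
            open_starts[op_id] = created
        elif op_type == "END" and op_id in open_starts:
            started = open_starts.pop(op_id)
            durations.append((created - started).total_seconds())
    return sum(durations) / len(durations) if durations else None


def ts(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


# Sample rows from the edited question
rows = [
    (ts("2017-01-03 11:14:25"), "START", 1),
    (ts("2017-01-03 11:14:26"), "START", 2),
    (ts("2017-01-03 11:14:28"), "END", 2),
    (ts("2017-01-03 11:14:30"), "END", 1),
    (ts("2017-01-03 11:15:00"), "START", 1),
    (ts("2017-01-03 11:15:10"), "END", 1),
]
print(average_duration(rows))
```

On the sample rows the three durations are 5, 2 and 10 seconds, so the average is 17/3 seconds.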
I need help to optimize my 3 queries into one.
I have 2 tables. The first lists the image processing servers I use. Different servers can handle different numbers of simultaneous jobs, so it has a quota field, as seen below.
First table name, "img_processing_servers"
| id | server_url | server_key | server_quota |
| 1 | examp.uu.co | X0X1X2XX3X | 5 |
| 2 | examp2.uu.co| X0X1X2YX3X | 3 |
The second table registers if there is a job being performed at this moment on the server
Second table, "img_servers_lock"
| id | lock_server | timestamp |
| 1 | 1 | 2020-04-30 12:08:09 |
| 2 | 1 | 2020-04-30 12:08:09 |
| 3 | 1 | 2020-04-30 12:08:09 |
| 4 | 2 | 2020-04-30 12:08:09 |
| 5 | 2 | 2020-04-30 12:08:09 |
| 6 | 2 | 2020-04-30 12:08:09 |
Basically what I want to achieve is that my image servers don't go past the max quota and crash, so the 3 queries I would like to combine are:
Select one available server that hasn't reached its quota, then insert a lock record for it.
SELECT * FROM `img_processing_servers` WHERE
SELECT COUNT(timestamp) FROM `img_servers_lock` WHERE `lock_server` = id
! if the count is less than the quota, go ahead and register the use
INSERT INTO `img_servers_lock`(`lock_server`, `timestamp`) VALUES (id_of_available_server, now())
How would I go about creating this single query?
My goal is to keep my image servers safe from overload.
Join the two tables and put that into an INSERT query.
INSERT INTO img_servers_lock(lock_server, timestamp)
SELECT s.id, NOW()
FROM img_processing_servers s
LEFT JOIN img_servers_lock l ON l.lock_server = s.id
GROUP BY s.id
HAVING IFNULL(COUNT(l.id), 0) < s.server_quota
ORDER BY s.server_quota - IFNULL(COUNT(l.id), 0) DESC
LIMIT 1
The ORDER BY clause makes it select the server with the most available quota.
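To make the selection rule concrete, here is a small sketch of the same logic in Python, outside the database. The function name `pick_server` and the list-based inputs are assumptions for illustration; the data comes from the two sample tables in the question.

```python
def pick_server(servers, locks):
    """Return the id of the server with the most free quota, or None.

    servers: list of (id, server_quota) pairs.
    locks:   list of lock_server ids, one entry per active lock.
    """
    used = {}
    for server_id in locks:
        used[server_id] = used.get(server_id, 0) + 1

    candidates = [
        (quota - used.get(server_id, 0), server_id)
        for server_id, quota in servers
        if used.get(server_id, 0) < quota      # HAVING COUNT(l.id) < s.server_quota
    ]
    if not candidates:
        return None                             # every server is at its quota
    # ORDER BY free quota DESC LIMIT 1 (ties go to the larger id here;
    # in SQL the winner among ties is not specified)
    return max(candidates)[1]


servers = [(1, 5), (2, 3)]
locks = [1, 1, 1, 2, 2, 2]
print(pick_server(servers, locks))
```

With the sample data, server 2 is already at its quota of 3, so only server 1 (2 free slots) qualifies.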
OK, so I ran into one small bug: s.server_quota had to be added to the GROUP BY for it to work in the HAVING clause.
INSERT INTO img_servers_lock(lock_server, timestamp)
SELECT s.id, NOW()
FROM img_processing_servers s
LEFT JOIN img_servers_lock l ON l.lock_server = s.id
GROUP BY s.id, s.server_quota
HAVING IFNULL(COUNT(l.id), 0) < s.server_quota
ORDER BY s.server_quota - IFNULL(COUNT(l.id), 0) DESC
LIMIT 1
Thanks again Barmar!
I am trying to optimize an SQL query on a large event table (10 million+ rows) for date range search. The table already has a unique index on (lid, did, measurement, date). The query below tries to get the events for three types of measurement (Kilowatts, Current and Voltage) at every 2-second interval of the date column:
SELECT *, FLOOR(UNIX_TIMESTAMP(date)/2) AS timekey
from events
WHERE lid = 1
and did = 1
and measurement IN ("Voltage")
group by timekey
UNION
SELECT *, FLOOR(UNIX_TIMESTAMP(date)/2) AS timekey
from events
WHERE lid = 1
and did = 1
and measurement IN ("Current")
group by timekey
UNION
SELECT *, FLOOR(UNIX_TIMESTAMP(date)/2) AS timekey
from events
WHERE lid = 1
and did = 1
and measurement IN ("Kilowatts")
group by timekey
This is the table that I am trying to look up to.
id | lid | did | measurement | date
---+-----+-----+-------------+--------------------
1  | 1   | 1   | Kilowatts   | 2020-04-27 00:00:00
2  | 1   | 1   | Current     | 2020-04-27 00:00:00
3  | 1   | 1   | Voltage     | 2020-04-27 00:00:00
4  | 1   | 1   | Kilowatts   | 2020-04-27 00:00:01
5  | 1   | 1   | Current     | 2020-04-27 00:00:01
6  | 1   | 1   | Voltage     | 2020-04-27 00:00:01
7  | 1   | 1   | Kilowatts   | 2020-04-27 00:00:02
8  | 1   | 1   | Current     | 2020-04-27 00:00:02
9  | 1   | 1   | Voltage     | 2020-04-27 00:00:02
The expected result is all rows whose date is 2020-04-27 00:00:00 or 2020-04-27 00:00:02. The query above works as expected, but it uses UNION to look up the different measurements, which I suspect is not the optimal way to do it.
Can any SQL expert help me tune this query to improve its performance?
You have one record every second for each and every measurement, and you want to select one record every two seconds.
You could try:
select *
from events
where
lid = 1
and did = 1
and measurement IN ('Voltage', 'Current', 'Kilowatts')
and extract(second from date) % 2 = 0
This would select records that have an even second part.
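As a quick sanity check of that idea, the sketch below (Python, outside the database) rebuilds the nine sample rows and compares the even-second filter with the question's FLOOR(UNIX_TIMESTAMP(date)/2) grouping. It assumes the grouping returns the first row of each 2-second bucket, which MySQL does not guarantee for a non-aggregated SELECT *; the variable names are illustrative.

```python
import calendar
from datetime import datetime, timedelta

MEASUREMENTS = ("Kilowatts", "Current", "Voltage")
first = datetime(2020, 4, 27, 0, 0, 0)

# One row per measurement per second, as in the sample table
events = [(first + timedelta(seconds=s), m) for s in range(3) for m in MEASUREMENTS]

# Filter from the answer: keep rows whose seconds part is even
even_rows = [(d, m) for d, m in events if d.second % 2 == 0]

# Grouping from the question: first row of each (measurement, 2-second bucket)
first_in_bucket = {}
for d, m in events:
    timekey = calendar.timegm(d.timetuple()) // 2   # like FLOOR(UNIX_TIMESTAMP(date)/2), taken in UTC
    first_in_bucket.setdefault((m, timekey), (d, m))

print(sorted(even_rows) == sorted(first_in_bucket.values()))
```

Both approaches keep the rows at 00:00:00 and 00:00:02 for every measurement, as long as there is exactly one row per second.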
Alternatively, if you always have one record every second, another option is row_number() (this requires MySQL 8.0):
select *
from (
select
e.*,
row_number() over(partition by measurement order by date) rn
from events
where
lid = 1
and did = 1
and measurement IN ('Voltage', 'Current', 'Kilowatts')
) t
where rn % 2 = 1
This is a bit less accurate than the previous query though.
Your query is actually three queries combined into one. Luckily they all select rows of data based on similar columns. If you want to make this query run fast you can add the following index:
create index ix1 on events (lid, did, measurement);
In addition to the above suggestions, changing the PRIMARY KEY will give you a little more performance:
PRIMARY KEY(lid, did, date, measurement)
and toss id.
Caveat, there could be hiccups if two readings come in at exactly the same "second". This could easily happen if one reading comes in just after the clock ticks, and the next comes in just before the next tick.
Let's say there is a credit card processing company. Every time a credit card is used, a row gets inserted into a table.
create table tran(
id int,
tran_dt datetime,
card_id int,
merchant_id int,
amount int
);
One wants to know what cards have been used 3+ times in any 15 minute window at the same merchant.
My attempt:
select card_id, date(tran_dt), hour(tran_dt), merchant_id, count(*)
from tran
group by card_id, date(tran_dt), hour(tran_dt), merchant_id
having count(*)>=3
The first problem is that this reports excessive transactions per hour, not per 15-minute window. The second problem is that it would not catch transactions that cross the hour mark, i.e. at 1:59pm and 2:01pm.
To make this simpler, it would be OK to split the hour into 5-minute increments, so we would not have to check 1:00-1:15pm, 1:01-1:16pm, etc. It would be OK to check 1:00-1:15pm, 1:05-1:20pm, etc., if that is easier.
Any ideas on how to fix the SQL? I have a feeling I may need SQL window functions, which are not yet available in MySQL, or a stored procedure that can look at each 15-minute block.
http://sqlfiddle.com/#!9/f2d74/1
You can convert the date/time to seconds and do arithmetic on the seconds to get the value within a 15 minute clock interval:
select card_id, min(tran_dt) as first_charge_time, merchant_id, count(*)
from tran
group by card_id, floor(to_seconds(tran_dt) / (60 * 15)), merchant_id
having count(*) >= 3;
The above uses to_seconds(). In earlier versions of MySQL, you can use unix_timestamp().
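To see what the fixed clock buckets do and do not catch, here is a small Python sketch of the same floor(seconds / 900) arithmetic. The three timestamps are an assumed example for one card at one merchant; `bucket` is a hypothetical helper name.

```python
from collections import Counter
from datetime import datetime

WINDOW = 15 * 60  # bucket width in seconds


def bucket(dt):
    """Index of the fixed 15-minute clock bucket that contains dt."""
    seconds = (dt - datetime(1970, 1, 1)).total_seconds()
    return int(seconds // WINDOW)


# Three charges within two minutes, straddling the 17:00 bucket boundary
times = [
    datetime(2015, 7, 23, 16, 59),
    datetime(2015, 7, 23, 17, 0),
    datetime(2015, 7, 23, 17, 1),
]
counts = Counter(bucket(t) for t in times)
print(max(counts.values()))
```

The charges land in two different buckets (one in 16:45-17:00, two in 17:00-17:15), so no bucket reaches 3 even though all three charges fall within two minutes. That is the gap the "any 15 minute interval" query is meant to close.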
Getting any 15 minute interval is more challenging. You can express the query as:
select t1.*, count(*) as numTransactions
from tran t1 join
tran t2
on t1.merchant_id = t2.merchant_id and
t1.card_id = t2.card_id and
t2.tran_dt >= t1.tran_dt and
t2.tran_dt < t1.tran_dt + interval 15 minute
group by t1.id
having numTransactions >= 3;
Performance of this query might be problematic. An index on tran(card_id, merchant_id, tran_dt) should help a lot.
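The same "window starting at each transaction" logic can be sketched outside the database with a two-pointer scan over each card/merchant's sorted timestamps. This is only an illustration in Python: the name `flagged_windows` and the tuple layout are assumptions, and the sample rows are one card at one merchant plus an unrelated charge.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def flagged_windows(trans, window=timedelta(minutes=15), threshold=3):
    """Return (card_id, merchant_id, window_start, count) for every
    transaction that opens a window holding `threshold` or more charges.

    trans: iterable of (tran_dt, card_id, merchant_id) tuples.
    """
    by_key = defaultdict(list)
    for tran_dt, card_id, merchant_id in trans:
        by_key[(card_id, merchant_id)].append(tran_dt)

    found = []
    for (card_id, merchant_id), times in by_key.items():
        times.sort()
        end = 0
        for start, first in enumerate(times):
            # Advance `end` past every charge inside [first, first + window)
            while end < len(times) and times[end] < first + window:
                end += 1
            if end - start >= threshold:
                found.append((card_id, merchant_id, first, end - start))
    return found


trans = [
    (datetime(2015, 7, 23, 16, 40), 1, 1),
    (datetime(2015, 7, 23, 16, 59), 1, 1),
    (datetime(2015, 7, 23, 17, 0), 1, 1),
    (datetime(2015, 7, 23, 17, 1), 1, 1),
    (datetime(2015, 7, 23, 17, 2), 1, 1),
    (datetime(2015, 7, 23, 17, 3), 2, 2),
    (datetime(2015, 7, 24, 17, 58), 1, 1),
    (datetime(2015, 7, 24, 17, 59), 1, 1),
    (datetime(2015, 7, 24, 18, 0), 1, 1),
]
for row in flagged_windows(trans):
    print(row)
```

Like the self-join, this reports overlapping windows separately: the windows opening at 16:59 and 17:00 are both flagged, so one burst of charges can show up more than once.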
Another option is an insert trigger on the tran table that counts the inserted card_id's transactions over the previous 15 minutes. If the count is 3 or more, the trigger inserts the card into an "audit" table that you can query at your leisure.
-- create table to store audited cards
create table audit_cards(
card_id int,
tran_dt datetime
);
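-- create trigger on tran table to catch the cards used 3 times in 15 min
DELIMITER //
CREATE TRIGGER audit_card AFTER INSERT ON tran
FOR EACH ROW
BEGIN
-- count this card's transactions in the 15 minutes up to the new row
IF (SELECT COUNT(*)
FROM tran
WHERE card_id = NEW.card_id
AND tran_dt BETWEEN NEW.tran_dt - INTERVAL 15 MINUTE AND NEW.tran_dt) >= 3
THEN
INSERT INTO audit_cards (card_id, tran_dt) VALUES (NEW.card_id, NEW.tran_dt);
END IF;
END//
DELIMITER ;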
Then you can run a report on these cards...
select * from audit_cards;
http://dev.mysql.com/doc/refman/5.6/en/trigger-syntax.html
SELECT t1.card_id,t1.merchant_id,count(distinct t1.id)+1 as ChargeCount
FROM tran t1
INNER JOIN tran t2
on t2.card_id=t1.card_id
and t2.merchant_id=t1.merchant_id
and t2.tran_dt <= DATE_ADD(t1.tran_dt, INTERVAL 15 MINUTE)
and t2.id>t1.id
GROUP BY t1.card_id,t1.merchant_id
HAVING ChargeCount>2;
I was able to group all rows belonging to the same 15 minute window, without duplicate records in the result, using a single query.
Say your table has:
| id | tran_dt | card_id | merchant_id | amount |
|----|---------------------|---------|-------------|--------|
| 13 | 2015-07-23 16:40:00 | 1 | 1 | 10 |
| 14 | 2015-07-23 16:59:00 | 1 | 1 | 10 | <-- these should
| 15 | 2015-07-23 17:00:00 | 1 | 1 | 10 | <-- be identified
| 16 | 2015-07-23 17:01:00 | 1 | 1 | 10 | <-- in the
| 17 | 2015-07-23 17:02:00 | 1 | 1 | 10 | <-- first group
| 18 | 2015-07-23 17:03:00 | 2 | 2 | 10 |
...
| 50 | 2015-07-24 17:58:00 | 1 | 1 | 10 | <-- and these
| 51 | 2015-07-24 17:59:00 | 1 | 1 | 10 | <-- in the
| 52 | 2015-07-24 18:00:00 | 1 | 1 | 10 | <-- second
The result will be:
| id | card_id | merchant_id | numTrans | amount | dateTimeFirstTrans | dateTimeLastTrans
|----|---------|-------------|----------|--------|---------------------|---------------------
| 14 | 1 | 1 | 4 | 40 | 2015-07-23 16:59:00 | 2015-07-23 17:02:00
| 50 | 1 | 1 | 3 | 30 | 2015-07-24 17:58:00 | 2015-07-24 18:00:00
The query (SQL Fiddle):
select output.* from
(
select
min(subquery.main_id) as id,
subquery.main_card_id as card_id,
subquery.main_merchant_id as merchant_id,
count(subquery.main_id) as numTrans,
sum(subquery.main_amount) as amount,
min(subquery.x_timeFrameStart) as dateTimeFirstTrans,
max(subquery.x_timeFrameStart) as dateTimeLastTrans
from
(
select
main.id as main_id,
main.card_id as main_card_id,
main.merchant_id as main_merchant_id,
main.tran_dt as main_timeFrameStart,
main.amount as main_amount,
main.tran_dt + INTERVAL 15 MINUTE as main_timeFrameEnd,
xList.tran_dt as x_timeFrameStart,
xList.tran_dt + INTERVAL 15 MINUTE as x_timeFrameEnd
from tran as main
inner join tran as xList on /* cross list */
main.card_id = xList.card_id and
main.merchant_id = xList.merchant_id
where
xList.tran_dt between main.tran_dt and main.tran_dt + INTERVAL 15 MINUTE
) as subquery
group by subquery.main_id, subquery.main_card_id, subquery.main_merchant_id, subquery.main_timeFrameStart, subquery.main_timeFrameEnd
having count(subquery.main_id) >= 3
) as output
left join (
select
xList.id as x_id
from tran as main
inner join tran as xList on /* cross list */
main.card_id = xList.card_id and
main.merchant_id = xList.merchant_id and
main.id <> xList.id /* keep only first of the list */
where
xList.tran_dt between main.tran_dt and main.tran_dt + INTERVAL 15 MINUTE
) as exclude on output.id = exclude.x_id
where exclude.x_id is null;
The query is a bit long, and it repeats one subquery just to filter duplicates, so test and tune it to make sure you don't run into performance problems.
Using MySQL, I have a table that keeps track of user visits:
USER_ID | TIMESTAMP
--------+----------------------
1 | 2014-08-11 14:37:36
2 | 2014-08-11 12:37:36
3 | 2014-08-07 16:37:36
1 | 2014-07-14 15:34:36
1 | 2014-07-09 14:37:36
2 | 2014-07-03 14:37:36
3 | 2014-05-23 15:37:36
3 | 2014-05-13 12:37:36
The time of day is not important; I care about the answer to "how many days between entries".
How do I work out the average number of days between entries with an SQL query?
For example, the output should look something like:
(the output is just a sample, not a reflection of the data table above)
USER_ID | AVG TIME (days)
--------+----------------------
1 | 2
2 | 3
3 | 1
Before version 8.0, MySQL had no direct "get something from a previous row" capabilities (8.0 added window functions such as LAG()). The easiest workaround there is to use a variable to store that "previous" value:
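SET @last = NULL, @last_user = NULL;
SELECT user_id, AVG(diff)
FROM (
SELECT user_id,
-- NULL for each user's first row, so AVG() ignores it
IF(@last_user = user_id, DATEDIFF(timestamp, @last), NULL) AS diff,
@last := timestamp,
@last_user := user_id
FROM yourtable
ORDER BY user_id, timestamp ASC
) AS foo
GROUP BY user_id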
The inner query does your "difference from previous row" calculations, and the outer query does the averaging.
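For comparison, here is the same "difference from the previous row, then average" calculation as a Python sketch, run on the sample visits from the question. The function name `average_gap_days` is an assumption, and only the date part of each timestamp is used, since the time of day does not matter here.

```python
from collections import defaultdict
from datetime import date


def average_gap_days(visits):
    """Map each user_id to the mean number of days between consecutive visits.

    visits: iterable of (user_id, visit_date) pairs. Users with a single
    visit are left out because they have no gap to average.
    """
    by_user = defaultdict(list)
    for user_id, day in visits:
        by_user[user_id].append(day)

    result = {}
    for user_id, days in by_user.items():
        days.sort()
        # Difference between each visit and the one before it
        gaps = [(later - earlier).days for earlier, later in zip(days, days[1:])]
        if gaps:
            result[user_id] = sum(gaps) / len(gaps)
    return result


visits = [
    (1, date(2014, 8, 11)),
    (2, date(2014, 8, 11)),
    (3, date(2014, 8, 7)),
    (1, date(2014, 7, 14)),
    (1, date(2014, 7, 9)),
    (2, date(2014, 7, 3)),
    (3, date(2014, 5, 23)),
    (3, date(2014, 5, 13)),
]
print(average_gap_days(visits))
```

For the sample table this gives 16.5 days for user 1 (gaps of 5 and 28), 39 days for user 2 and 43 days for user 3 (gaps of 10 and 76).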
Here is my problem: I have a MySQL table with the following columns and example data:
id | user | starting date | ending date | activity code
1 | Andy | 2010-04-01 | 2010-05-01 | 3
2 | Andy | 1988-11-01 | 1991-03-01 | 3
3 | Andy | 2005-06-01 | 2008-08-01 | 3
4 | Andy | 2005-08-01 | 2008-11-01 | 3
5 | Andy | 2005-06-01 | 2010-05-01 | 4
6 | Ben | 2010-03-01 | 2011-06-01 | 3
7 | Ben | 2010-03-01 | 2010-05-01 | 4
8 | Ben | 2005-04-01 | 2011-05-01 | 3
As you can see in this table, users can have the same activity code with similar dates or periods. For the same user, periods may or may not overlap. There can also be several overlapping periods in the table.
What I want is a MySQL query that returns the following result:
new id | user | starting date | ending date | activity code
1 | Andy | 2010-04-01 | 2010-05-01 | 3 => ok, no overlap period
2 | Andy | 1988-11-01 | 1991-03-01 | 3 => ok, no overlap period
3 | Andy | 2005-06-01 | 2008-11-01 | 3 => same user, same activity but ending date coming from row 4 as extended period
4 | Andy | 2005-06-01 | 2010-05-01 | 4 => ok other activity code
5 | Ben | 2005-04-01 | 2011-06-01 | 3 => ok other user, but as overlap period rows 6 and 8 for the same user and activity, I take the widest range
6 | Ben | 2010-03-01 | 2010-05-01 | 4 => ok other activity for second user
In other words, for the same user and activity code, if there is no overlap, I need the starting and ending dates as they are. If there is an overlap for the same user and activity code, I need the lowest starting date and the highest ending date from the related rows. I need this for all the users and activity codes in the table, in SQL for MySQL.
I hope this is clear enough and that someone can help, because I have tried code from solutions on this site and others without success.
I have a somewhat convoluted (strictly MySQL-specific) solution:
SET @user = NULL;
SET @activity = NULL;
SET @interval_id = 0;
SELECT
MIN(inn.`starting date`) AS start,
MAX(inn.`ending date`) AS end,
inn.user,
inn.`activity code`
FROM
(SELECT
IF(user <> @user OR `activity code` <> @activity,
@interval_id := @interval_id + 1, NULL),
IF(user <> @user OR `activity code` <> @activity,
@interval_end := STR_TO_DATE('',''), NULL),
@user := user,
@activity := `activity code`,
@interval_id := IF(`starting date` > @interval_end,
@interval_id + 1,
@interval_id) AS interval_id,
@interval_end := IF(`starting date` < @interval_end,
GREATEST(@interval_end, `ending date`),
`ending date`) AS interval_end,
t.*
FROM Table1 t
ORDER BY t.user, t.`activity code`, t.`starting date`, t.`ending date`) inn
GROUP BY inn.user, inn.`activity code`, inn.interval_id;
The underlying idea was shamelessly borrowed from the 1st answer to this question.
You can use this SQL Fiddle to review the results and try different source data.
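The idea behind that query is a sorted sweep: within each user and activity code, walk the periods in order of starting date and extend the current period while the next one starts before it ends. A minimal sketch of that sweep in Python, using the sample rows from the question (the name `merge_periods` and the tuple layout are assumptions for illustration):

```python
from collections import defaultdict
from datetime import date


def merge_periods(rows):
    """Merge overlapping periods per (user, activity_code).

    rows: iterable of (user, start, end, activity_code) tuples.
    Returns a list of (user, activity_code, start, end), sorted by group.
    """
    groups = defaultdict(list)
    for user, start, end, code in rows:
        groups[(user, code)].append((start, end))

    merged = []
    for (user, code), periods in sorted(groups.items()):
        periods.sort()
        cur_start, cur_end = periods[0]
        for start, end in periods[1:]:
            if start <= cur_end:              # overlaps or touches the open period
                cur_end = max(cur_end, end)
            else:                             # gap: close the open period, start a new one
                merged.append((user, code, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((user, code, cur_start, cur_end))
    return merged


rows = [
    ("Andy", date(2010, 4, 1), date(2010, 5, 1), 3),
    ("Andy", date(1988, 11, 1), date(1991, 3, 1), 3),
    ("Andy", date(2005, 6, 1), date(2008, 8, 1), 3),
    ("Andy", date(2005, 8, 1), date(2008, 11, 1), 3),
    ("Andy", date(2005, 6, 1), date(2010, 5, 1), 4),
    ("Ben", date(2010, 3, 1), date(2011, 6, 1), 3),
    ("Ben", date(2010, 3, 1), date(2010, 5, 1), 4),
    ("Ben", date(2005, 4, 1), date(2011, 5, 1), 3),
]
for period in merge_periods(rows):
    print(period)
```

On the sample data this yields the six periods the question asks for, including Andy's 2005-06-01 to 2008-11-01 and Ben's 2005-04-01 to 2011-06-01 for activity code 3.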
Here is a solution - (see http://sqlfiddle.com/#!2/fda3d/15)
SELECT DISTINCT summarized.`user`
, summarized.activity_code
, summarized.true_begin
, summarized.true_end
FROM (
SELECT t1.id,t1.`user`,t1.activity_code
, MIN(LEAST(t1.`starting`, COALESCE(overlap.`starting` ,t1.`starting`))) as true_begin
, MAX(GREATEST(t1.`ending`, COALESCE(overlap.`ending` ,t1.`ending`))) as true_end
FROM t1
LEFT JOIN t1 AS overlap
ON t1.`user` = overlap.`user`
AND t1.activity_code = overlap.activity_code
AND overlap.`ending` >= t1.`starting`
AND overlap.`starting` <= t1.`ending`
AND overlap.id <> t1.id
GROUP BY t1.id, t1.`user`, t1.activity_code) AS summarized;
I am not sure how it will perform with a large data set with many overlaps. You will definitely need an index on the user and activity_code fields - probably the starting and ending date fields also as part of that index.