I must be overlooking something, but I have the following query:
SELECT
`Poster`, Round(Sum(If((`Date`>=DATE_ADD(CURDATE(),INTERVAL -1 Month) And `Date`<CURDATE()),1,0))/DATEDIFF(CURDATE(),DATE_ADD(CURDATE(),INTERVAL -1 Month)),0) AS `statistics`
FROM `forenposts`
GROUP BY `Poster`
ORDER BY `statistics` DESC
LIMIT 5
This takes roughly 15 seconds in a database with more than 1.5 million entries.
Is there an easy way to optimize it, or is the If function simply that slow?
Can you try this query? I moved the constant calculations out of the query, since evaluating them for every row tends to be very slow. I would also use constants rather than the CURDATE() function inside the query, so the DB server can optimize it. I can't test it myself; let me know if it works and I can explain it in more detail.
SET @CURDATE = CURDATE();
SET @DATEADD = DATE_ADD(@CURDATE, INTERVAL -1 MONTH);
SET @DATEDIFF = DATEDIFF(@CURDATE, @DATEADD);
SELECT
`Poster`,
Round(Sum(If((`Date` >= @DATEADD And `Date` < @CURDATE), 1, 0)) / @DATEDIFF, 0) AS `statistics`
FROM `forenposts`
GROUP BY `Poster`
ORDER BY `statistics` DESC
LIMIT 5
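If it is still slow after that, a covering index over the grouping and filter columns might help; this is only a sketch, assuming the table and column names from your query (the index name is made up):
-- covering index: the aggregation can then be served from an index scan
-- instead of a full table scan
CREATE INDEX idx_forenposts_poster_date ON `forenposts` (`Poster`, `Date`);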
I have a real-time data table with time stamps for different data points:
Time_stamp, UID, Parameter1, Parameter2, ....
I have 400 UIDs, so each time_stamp is repeated 400 times.
I want to write a query that uses this table to check whether the real-time data flow to the SQL database is working as expected: a new timestamp should be available every 5 minutes.
What I usually do is query the DISTINCT values of time_stamp in the table, order them descending, do a visual inspection, and copy them to Excel to calculate the difference in minutes between subsequent distinct time_stamps.
Any difference over 5 minutes means I have a problem. I am trying to figure out how to do something similar in SQL, maybe get a table that looks like this. I tried to use LEAD and DISTINCT together but could not write the code myself; I'm just getting started with SQL.
Time_stamp, LEAD over last timestamp
Thank you for your help
You can use the lag() analytical function as follows:
select t.* from
(select t.*,
lag(Time_stamp) over (order by Time_stamp) as lg_ts
from your_table t) t
where timestampdiff(minute, lg_ts, Time_stamp) > 5
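To see it in action, here is a self-contained toy run (the table name and values are made up, and lag() needs MySQL 8.0+):
CREATE TABLE your_table (Time_stamp DATETIME);
INSERT INTO your_table VALUES
('2023-01-01 10:00:00'),
('2023-01-01 10:05:00'),
('2023-01-01 10:17:00');
-- only the row after the 12-minute gap (10:17:00) is returned
SELECT t.*
FROM (SELECT t.*,
             LAG(Time_stamp) OVER (ORDER BY Time_stamp) AS lg_ts
      FROM your_table t) t
WHERE TIMESTAMPDIFF(MINUTE, lg_ts, Time_stamp) > 5;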
Or you can use not exists as follows:
select t.*
from your_table t
where not exists
(select 1 from your_table tt
where tt.Time_stamp < t.Time_stamp
and timestampdiff(minute, tt.Time_stamp, t.Time_stamp) <= 5)
and t.Time_stamp <> (select min(tt.Time_stamp) from your_table tt)
lead() or lag() is the right approach (depending on whether you want to see the row at the start or end of the gap).
For the time comparison, I recommend direct comparisons:
select t.*
from (select t.*,
lead(Time_stamp) over (partition by uid order by Time_stamp) as next_time_stamp
from your_table t
) t
where next_time_stamp > Time_stamp + interval 5 minute;
Note: exactly 5 minutes seems unlikely. You might want a fudge factor such as:
where next_time_stamp > Time_stamp + interval (5*60 + 10) second;
timestampdiff() counts the number of "boundaries" between two values. So, the difference in minutes between 00:00:59 and 00:01:02 is 1. And the difference between 00:00:00 and 00:00:59 is 0.
So, a difference of "5 minutes" could really be 4 minutes and 1 second or could be 5 minutes and 59 seconds.
I have written a query to select all rows where the value of the column 'gvA' is 0 in the previous row and non-zero in the current row. My issue is that this query takes too long to execute.
My table has 40,000 rows and the query takes about 60-65 seconds, which is far too long. How can I improve its performance? Following is my query:
SELECT device_no,datetime
FROM (
SELECT
gvA,
(SELECT e2.gvA
FROM tyn_records e2
WHERE e2.tyn_id < e1.tyn_id
ORDER BY tyn_id DESC LIMIT 1) as previous_value,
datetime,
device_no
FROM tyn_records e1
WHERE gvA > 0 AND DATE(datetime) = CURDATE() - INTERVAL 2 DAY
) selected
WHERE selected.previous_value = 0
My tables are Devices and tyn_records (their definitions were attached as images).
I would do two things:
First, I would rephrase the query a bit: use lag() instead of the correlated subquery, and remove the DATE() function from the left side of the filtering condition. I also moved the gva > 0 filter out of the subquery, since lag() must still see the rows where gva is 0:
select
device_no,
datetime
from (
select
gva,
lag(gva) over(order by tyn_id) as previous_value,
datetime,
device_no
from tyn_records
where datetime >= curdate() - interval 2 day
and datetime < curdate() - interval 1 day
) x
where gva > 0
and previous_value = 0
Second, with the function removed from the left side of the predicate, you can create an index suitable to optimize the query:
create index ix1 on tyn_records (datetime, gva);
As a side note, the way you compute previous_value may not be deterministic and could produce different results each time you run the query. This can happen if the column tyn_id is not unique.
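If tyn_id can indeed repeat, one way to pin the order down is a second sort key in the window; a sketch, assuming datetime works as a tiebreaker:
select
gva,
-- datetime as a tiebreaker makes the scan order, and therefore
-- previous_value, reproducible across runs
lag(gva) over(order by tyn_id, datetime) as previous_value,
datetime,
device_no
from tyn_records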
We have to check 7 million rows to compute campaign statistics. The query takes around 30 seconds to run, and it doesn't improve with indexes.
Indexes didn't change the speed at all.
I tried adding indexes on the WHERE fields, on the WHERE fields + GROUP BY, and on the WHERE fields + SUM.
The server type is MySQL and the server version is 5.5.31.
SELECT
NOW(), `banner_campagne`.name, `banner_view`.banner_uid, SUM(`banner_view`.fetched) AS fetched,
SUM(`banner_view`.loaded) AS loaded,
SUM(`banner_view`.seen) AS seen
FROM `banner_view` INNER JOIN
`banner_campagne`
ON `banner_campagne`.uid = `banner_view`.banner_uid AND
`banner_campagne`.deleted = 0 AND
`banner_campagne`.weergeven = 1
WHERE
`banner_view`.campagne_uid = 6 AND `banner_view`.datetime >= '2019-07-31 00:00:00' AND `banner_view`.datetime < '2019-08-30 00:00:00'
GROUP BY
`banner_view`.banner_uid
I expect the query to run around 5 seconds.
The indexes that you want for this query are probably:
banner_view(campagne_uid, datetime)
banner_campagne(uid, weergeven, deleted)
Note that the order of the columns in the index does matter.
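Spelled out as MySQL statements (the index names are made up):
-- banner_view: equality filter on campagne_uid, then range scan on datetime
CREATE INDEX ix_banner_view_campagne_dt ON banner_view (campagne_uid, datetime);
-- banner_campagne: uid is the join column, followed by the two flag filters
CREATE INDEX ix_banner_campagne_uid ON banner_campagne (uid, weergeven, deleted);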
I use a Mantis bug database (which uses MySQL) and I want to query which bugs had a change in their severity within the last 2 weeks; however, only the last severity change of each bug should be shown.
The problem is that I get multiple entries per bug_id (the primary key), which is not my desired result, since I want only the latest change per bug. This means that somehow I am using the max function and the group by clause wrongly.
Here you can see my query:
SELECT `bug_id`,
max(date_format(from_unixtime(`mantis_bug_history_table`.`date_modified`),'%Y-%m-%d %h:%i:%s')) AS `Severity_changed`,
`mantis_bug_history_table`.`old_value`,
`mantis_bug_history_table`.`new_value`
from `prepared_bug_list`
join `mantis_bug_history_table` on `prepared_bug_list`.`bug_id` = `mantis_bug_history_table`.`bug_id`
where (`mantis_bug_history_table`.`field_name` like 'severity')
group by `bug_id`, `old_value`, `new_value`
having (`Severity_changed` >= (now() - interval 2 week))
order by `bug_id` ASC
For the bug with the id 8 for example I get three entries with this query. The bug with the id 8 had indeed three severity changes within the last 2 weeks but I only want to get the latest severity change.
What could be the problem with my query?
max() is an aggregation function, and it does not appear to be suitable for what you are trying to do.
I have a feeling that what you are trying to do is to get the latest entry for each applicable bug_id in mantis_bug_history_table. If that is true, then I would rewrite the query as follows: write a sub-query getLatest and join it with prepared_bug_list.
Updated answer
Caution: I don't have access to the actual DB tables so this query may have bugs
select
`getLatest`.`last_bug_id`
, `mantis_bug_history_table`.`date_modified`
, `mantis_bug_history_table`.`old_value`
, `mantis_bug_history_table`.`new_value`
from
(
select
(
select
`bug_id`
from
`mantis_bug_history_table`
where
`date_modified` > unix_timestamp() - 14*24*3600 -- two weeks
and `field_name` like 'severity'
and `bug_id` = `prepared_bug_list`.`bug_id`
order by
`date_modified` desc
limit 1
) as `last_bug_id`
from
`prepared_bug_list`
) as `getLatest`
inner join `mantis_bug_history_table`
on `mantis_bug_history_table`.`bug_id` = `getLatest`.`last_bug_id`
order by `getLatest`.`last_bug_id` ASC
I finally have a solution! A friend of mine helped me, and one part of the solution was to include the primary key of the mantis bug history table, which is not bug_id but the column id, a consecutive number.
Another part of the solution was the subquery in the where clause:
select `prepared_bug_list`.`bug_id` AS `bug_id`,
`mantis_bug_history_table`.`old_value` AS `old_value`,
`mantis_bug_history_table`.`new_value` AS `new_value`,
`mantis_bug_history_table`.`type` AS `type`,
date_format(from_unixtime(`mantis_bug_history_table`.`date_modified`),'%Y-%m-%d %H:%i:%s') AS `date_modified`
FROM `prepared_bug_list`
JOIN mantis_import.mantis_bug_history_table
ON `prepared_bug_list`.`bug_id` = mantis_bug_history_table.bug_id
where (mantis_bug_history_table.id = -- id is the id of every history entry, not to be confused with bug_id
(select `mantis_bug_history_table`.`id` from `mantis_bug_history_table`
where ((`mantis_bug_history_table`.`field_name` = 'severity')
and (`mantis_bug_history_table`.`bug_id` = `prepared_bug_list`.`bug_id`))
order by `mantis_bug_history_table`.`date_modified` desc limit 1)
and `date_modified` > unix_timestamp() - 14*24*3600 )
order by `prepared_bug_list`.`bug_id`,`mantis_bug_history_table`.`date_modified` desc
The problem:
We're getting stock prices and trades from a provider, and to speed things up we cache the trades as they come in (1 trade per second per stock is not a lot). We've got around 2,000 stocks, so technically we're expecting as many as 120,000 trades per minute (2,000 * 60). Now, these prices are realtime, but to avoid paying licensing fees to show these data to the customer we need to show the prices delayed by 15 minutes. (We need the realtime prices internally, which is why we've bought and pay for them (they are NOT cheap!))
I feel like I've tried everything, and I've run into an uncountable number of problems.
Things I've tried:
1:
Run a cronjob every 15 seconds with a query that finds, for each stock, the latest trade from more than 15 minutes ago (so its ID can be used for joins):
SELECT
MAX(`time`) as `max_time`,
`stock_id`
FROM
`stocks_trades`
WHERE
`time` <= DATE_SUB(NOW(), INTERVAL 15 MINUTE)
AND
`time` > '0000-00-00 00:00:00'
GROUP BY
`stock_id`
This works very fast - 1.8 seconds with ~2,000,000 rows, but the following is very slow:
SELECT
st.id,
st.stock_id
FROM
(
SELECT
MAX(`time`) as `max_time`,
`stock_id`
FROM
`stocks_trades`
WHERE
`time` <= DATE_SUB(NOW(), INTERVAL 15 MINUTE)
AND
`time` > '0000-00-00 00:00:00'
GROUP BY
`stock_id`
) as `tmp`
INNER JOIN
`stocks_trades` as `st`
ON
(tmp.max_time = st.time AND tmp.stock_id = st.stock_id)
GROUP BY
`stock_id`
...that takes ~180-200 seconds, which is WAY too slow. There's an index on both time and stock_id (individually).
2:
Switch between InnoDB/MyISAM. I'd think I would need InnoDB (we're inserting A LOT of rows from multiple threads and we don't want to block between inserts) - InnoDB seems faster at inserting, but WAY slower at reading (we require both, obviously).
3:
Optimize tables every day. Still slow.
What I think might help:
Using ints instead of DateTime. Perhaps (since the markets are open from 9-22) keep a custom int time, which would be "seconds since 9 o'clock this morning", and use the same method as above (it seems to make some difference, albeit not a lot); see the sketch after this list.
Use MEMORY instead of InnoDB - probably not the best idea with ~1,800,000 rows per 15 minutes, even though we have plenty of memory
Save price/stockID/time in memory in our application receiving the prices (I don't see how this would be any different than using MEMORY, except my code probably will be worse than MySQL's own code)
Keep deleting trades older than 15 minutes in hopes that it'll speed up the queries
Some magic query that I just haven't thought of that uses the indexes perfectly and does magical things
Give up and kill oneself after spending ~12 hours trying to wrap my head around this and different solutions
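For the "ints instead of DateTime" idea above, the conversion could be computed like this (a minimal sketch; it assumes `time` is a DATETIME column and the server runs in exchange-local time):
-- seconds since 09:00 on the trade's own day
SELECT TIME_TO_SEC(TIME(`time`)) - 9*3600 AS secs_since_open
FROM stocks_trades;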
Since you are joining against your subquery on two columns (stock_id, time), MySQL ought to be able to make use of a compound index across both of them, while it cannot make use of either of the individual single-column indexes you already have.
ALTER TABLE `stocks_trades` ADD INDEX `idx_stock_id_time` (`stock_id`, `time`)
Assuming you have an auto-incrementing id as the primary key on stocks_trades (call it stock_trade_id), you could select MAX(stock_trade_id) AS last_id in the inner query and then inner join on last_id = stock_trade_id, so you will be joining on your PK and have no date comparisons in your main join.
SELECT
st.id,
st.stock_id
FROM
(
SELECT
MAX(`stock_trade_id`) as `last_id`,
`stock_id`
FROM
`stocks_trades`
WHERE
`time` <= DATE_SUB(NOW(), INTERVAL 15 MINUTE)
AND
`time` > '0000-00-00 00:00:00'
GROUP BY
`stock_id`
) as `tmp`
INNER JOIN
`stocks_trades` as `st`
ON
(tmp.last_id = st.stock_trade_id)
GROUP BY
`stock_id`
What happens if you run something like this? Change it to use the proper column name for the price if needed:
SELECT st.id, st.stock_id
FROM stocks_trades as st
WHERE time <= DATE_SUB(NOW(), INTERVAL 15 MINUTE)
AND time > DATE_SUB(NOW(), INTERVAL 45 MINUTE)
AND not exists (select 1 from stocks_trades as st2
                where st2.time <= DATE_SUB(NOW(), INTERVAL 15 MINUTE)
                  and st2.stock_id = st.stock_id
                  and st2.time > st.time)
Hope it helps!