MySQL Index calculation? - mysql

I have a query that could do with some optimization, if possible, as it takes 15 seconds to run.
It queries a large database of approximately 1,000,000 records and is slowed down by grouping by hour (which is derived with DATE_FORMAT()).
I indexed all the relevant fields in all the tables, which improved performance significantly, but I don't know how (or whether it's even possible) to create an index for the hour grouping, since the hour isn't a stored field...
I do realise the dataset is very large but I'd like to know if I have any options.
Any help would be appreciated!
Thanks!
SELECT `id`,
tbl1.num,
name,
DATE_FORMAT(`timestamp`,'%x-%v') AS wknum,
DATE_FORMAT(`timestamp`,'%Y-%m-%d') AS date,
DATE_FORMAT(`timestamp`,'%H') as hour,
IF(tbl1.code<>0,codedescription,'') AS status,
SUM(TIME_TO_SEC(`timeblock`))/60 AS time,
SUM(`distance`) AS distance,
SUM(`distance`)/(SUM(TIME_TO_SEC(`timeblock`))/60) AS speed
FROM `tbl1`
LEFT JOIN `tbl2` ON tbl1.code = tbl2.code
LEFT JOIN `tbl3` ON tbl1.status = tbl3.status
LEFT JOIN `tbl4` ON tbl1.conditionnum = tbl4.conditionnum
LEFT JOIN `tbl5` ON tbl1.num = tbl5.num
WHERE `timestamp`>'2013-07-28 00:00:00'
GROUP BY tbl1.num,DATE_FORMAT(`timestamp`,'%H'),`mcstatus`

MySQL generally can't use an index on a column unless the column is isolated in the query. Isolating the column means it must not be part of an expression or be wrapped in a function call, as timestamp is wrapped in DATE_FORMAT() in your GROUP BY.
Solutions:
1- Store the hour in its own column, separate from the timestamp column, and index that column. You can keep it in sync with BEFORE INSERT and BEFORE UPDATE triggers, for example:
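-- Assumes tbl1 already has an indexed `hour` column, e.g. (hypothetical):
-- ALTER TABLE `tbl1` ADD COLUMN `hour` TINYINT, ADD INDEX `idx_hour` (`hour`);
-- plus a one-off backfill: UPDATE `tbl1` SET `hour` = DATE_FORMAT(`timestamp`,'%H');
DELIMITER $$
CREATE TRIGGER `before_update_hour`
BEFORE UPDATE ON `tbl1`
FOR EACH ROW
BEGIN
IF NEW.`timestamp` != OLD.`timestamp` THEN
SET NEW.`hour` = DATE_FORMAT(NEW.`timestamp`,'%H');
END IF;
END$$
DELIMITER ;
DELIMITER $$
CREATE TRIGGER `before_insert_hour`
BEFORE INSERT ON `tbl1`
FOR EACH ROW
BEGIN
SET NEW.`hour` = DATE_FORMAT(NEW.`timestamp`,'%H');
END$$
DELIMITER ;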
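2- If you can use MariaDB, you can use a MariaDB virtual (computed) column instead of triggers. MySQL 5.7 and later provide the equivalent feature as generated columns.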
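To see why a separate, indexed hour column helps, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. That substitution is an assumption: SQLite's planner is not MySQL's, but it shows the same effect. The table t, its columns and the index names are hypothetical. Grouping by a function of ts needs an extra temporary B-tree to sort the groups. Grouping by the stored hour column can be answered by reading the index in order.

```python
import sqlite3

# Hypothetical table standing in for tbl1: ts is the raw timestamp and hour is
# a separately stored copy of its hour (what the triggers above would maintain).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ts TEXT, hour INTEGER, distance REAL)")
conn.execute("CREATE INDEX idx_ts ON t (ts)")
conn.execute("CREATE INDEX idx_hour ON t (hour)")


def plan(sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


# Column wrapped in a function: an extra temporary B-tree is needed to group.
plan_expr = plan("SELECT strftime('%H', ts) AS h, COUNT(*) FROM t GROUP BY strftime('%H', ts)")
# Isolated, indexed column: rows are read in index order, no extra sort step.
plan_col = plan("SELECT hour, COUNT(*) FROM t GROUP BY hour")

print("expression:", plan_expr)
print("column:    ", plan_col)
```

On MySQL itself, run EXPLAIN on the original query and on a version that groups by the stored column to confirm the difference. Look for "Using temporary" in the Extra column.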

MySQL - Find gaps in time series table [duplicate]

Let's say we have a database table with two columns, entry_time and value. entry_time is a TIMESTAMP, while value can be any other data type. The records are fairly regular, entered at roughly x-minute intervals. Sometimes, however, no entry is made for several intervals, producing a 'gap' in the data.
In terms of efficiency, what is the best way to go about finding these gaps of at least time Y (both new and old) with a query?
To start with, let us summarize the number of entries by hour in your table.
SELECT CAST(DATE_FORMAT(entry_time,'%Y-%m-%d %k:00:00') AS DATETIME) hour,
COUNT(*) samplecount
FROM `table`
GROUP BY CAST(DATE_FORMAT(entry_time,'%Y-%m-%d %k:00:00') AS DATETIME)
Now, if you log something every six minutes (ten times an hour) all your samplecount values should be ten. This expression: CAST(DATE_FORMAT(entry_time,'%Y-%m-%d %k:00:00') AS DATETIME) looks hairy but it simply truncates your timestamps to the hour in which they occur by zeroing out the minute and second.
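The truncation and counting can be sketched in Python with made-up six-minute samples. This illustrates the logic only and is not MySQL code. datetime.replace() zeroes the minute and second the same way the DATE_FORMAT expression does, and a Counter plays the role of GROUP BY with COUNT(*).

```python
from collections import Counter
from datetime import datetime, timedelta


def hourly_counts(timestamps):
    """Count samples per hour, like grouping on the truncated timestamp."""
    # replace() zeroes the minute and second, the same truncation that
    # DATE_FORMAT(entry_time, '%Y-%m-%d %k:00:00') performs in the query.
    return Counter(ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps)


# Hypothetical samples logged every six minutes, 10:00 through 11:54.
start = datetime(2013, 7, 28, 10, 0)
samples = [start + i * timedelta(minutes=6) for i in range(20)]

counts = hourly_counts(samples)
for hour, samplecount in sorted(counts.items()):
    print(hour, samplecount)
# 2013-07-28 10:00:00 10
# 2013-07-28 11:00:00 10
```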
This is reasonably efficient, and will get you started. It's very efficient if you can put an index on your entry_time column and restrict your query to, let's say, yesterday's samples as shown here.
SELECT CAST(DATE_FORMAT(entry_time,'%Y-%m-%d %k:00:00') AS DATETIME) hour,
COUNT(*) samplecount
FROM `table`
WHERE entry_time >= CURRENT_DATE - INTERVAL 1 DAY
AND entry_time < CURRENT_DATE
GROUP BY CAST(DATE_FORMAT(entry_time,'%Y-%m-%d %k:00:00') AS DATETIME)
But it isn't much good at detecting whole hours that go by with missing samples, because an hour with no samples produces no row at all. It's also a little sensitive to jitter in your sampling. That is, if your top-of-the-hour sample is sometimes half a minute early (10:59:30) and sometimes half a minute late (11:00:30), your hourly summary counts will be off. So this hour summary (or day summary, or minute summary, etc.) is not bulletproof.
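Both weaknesses are easy to reproduce with the same per-hour count. The sketch below is plain Python with hypothetical sample data. An hour with no samples simply has no entry, so nothing flags it. A top-of-the-hour sample that arrives 30 seconds early is counted in the previous hour.

```python
from collections import Counter
from datetime import datetime, timedelta

start = datetime(2013, 7, 28, 10, 0)
six_min = timedelta(minutes=6)


def per_hour(timestamps):
    # Truncate to the hour and count, as the GROUP BY query does.
    return Counter(ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps)


# 1) A whole hour missing: hypothetical samples for 10:00-12:54 minus the 11:00 hour.
with_gap = [start + i * six_min for i in range(30) if not 10 <= i < 20]
gap_counts = per_hour(with_gap)
print(sorted(gap_counts.items()))  # only 10:00 and 12:00 appear; 11:00 has no entry

# 2) Jitter: the 11:00 sample arrives 30 seconds early, at 10:59:30.
jittered = [start + i * six_min for i in range(20) if i != 10]
jittered.append(datetime(2013, 7, 28, 10, 59, 30))
jitter_counts = per_hour(jittered)
print(jitter_counts[datetime(2013, 7, 28, 10)], jitter_counts[datetime(2013, 7, 28, 11)])  # 11 9
```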
You need a self-join query to get stuff perfectly right; it's a bit more of a hairball and not nearly as efficient.
Let's start by creating a virtual table (subquery) with numbered samples, like this. (Before MySQL 8.0, which added window functions such as ROW_NUMBER() and LAG(), this is a pain in MySQL; some other, more expensive DBMSs make it easier. No matter.)
SELECT @sample:=@sample+1 AS entry_num, c.entry_time, c.value
FROM (
SELECT entry_time, value
FROM `table`
ORDER BY entry_time
) c,
(SELECT @sample:=0) s
This little virtual table gives entry_num, entry_time, value.
Next step, we join it to itself.
SELECT one.entry_num, one.entry_time, one.value,
TIMEDIFF(two.entry_time, one.entry_time) AS gap
FROM (
/* virtual table */
) one
JOIN (
/* same virtual table */
) two ON (two.entry_num - 1 = one.entry_num)
This lines the two copies up next to each other, offset by a single entry, as governed by the ON clause of the JOIN.
Finally, we choose the rows from this table whose gap is larger than your threshold; their entry_time values are the times of the samples right before the missing ones.
The overall self-join query is this. I told you it was a hairball.
SELECT one.entry_num, one.entry_time, one.value,
TIMEDIFF(two.entry_time, one.entry_time) AS gap
FROM (
SELECT @sample:=@sample+1 AS entry_num, c.entry_time, c.value
FROM (
SELECT entry_time, value
FROM `table`
ORDER BY entry_time
) c,
(SELECT @sample:=0) s
) one
JOIN (
SELECT @sample2:=@sample2+1 AS entry_num, c.entry_time, c.value
FROM (
SELECT entry_time, value
FROM `table`
ORDER BY entry_time
) c,
(SELECT @sample2:=0) s
) two ON (two.entry_num - 1 = one.entry_num)
WHERE TIMESTAMPDIFF(SECOND, one.entry_time, two.entry_time) > 600 /* your threshold Y in seconds; 600 is only an example */
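The row numbering and the offset join boil down to comparing each sample with the next one in time order. This Python sketch uses hypothetical data and is not a MySQL replacement. It mirrors the logic: sort the samples, pair element n with element n + 1, and keep the pairs that are further apart than the threshold.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, threshold):
    """Return (before, after) pairs of consecutive samples further apart than threshold."""
    ordered = sorted(timestamps)  # ORDER BY entry_time
    # Pairing element n with element n + 1 plays the role of the join
    # ON (two.entry_num - 1 = one.entry_num) between the two numbered copies.
    pairs = zip(ordered, ordered[1:])
    return [(one, two) for one, two in pairs if two - one > threshold]


start = datetime(2013, 7, 28, 10, 0)
# Hypothetical six-minute samples with the 10:18 and 10:24 entries missing.
samples = [start + i * timedelta(minutes=6) for i in range(10) if i not in (3, 4)]
for before, after in find_gaps(samples, timedelta(minutes=10)):
    print("gap after", before, "until", after)
# gap after 2013-07-28 10:12:00 until 2013-07-28 10:30:00
```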
If you have to do this in production on a large table, you may want to do it for a subset of your data. For example, you could run it each day over the previous two days' samples. This would be decently efficient, and would also make sure you didn't overlook any missing samples right at midnight. To do this, your little row-numbered virtual tables would look like this.
SELECT @sample:=@sample+1 AS entry_num, c.entry_time, c.value
FROM (
SELECT entry_time, value
FROM `table`
WHERE entry_time >= CURRENT_DATE - INTERVAL 2 DAY
AND entry_time < CURRENT_DATE /* the two previous days, but not today */
ORDER BY entry_time
) c,
(SELECT @sample:=0) s
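The two-day window can be checked the same way. In this Python sketch the function names and the hourly sample data are hypothetical. The half-open bounds [today - 2 days, today) correspond to the WHERE clause above. The example shows a gap that straddles midnight being caught, because both days are scanned together.

```python
from datetime import date, datetime, time, timedelta


def two_day_window(today):
    """Half-open bounds [today - 2 days, today), like the WHERE clause above."""
    end = datetime.combine(today, time.min)  # CURRENT_DATE as midnight
    return end - timedelta(days=2), end


def recent_gaps(timestamps, today, threshold):
    start, end = two_day_window(today)
    recent = sorted(ts for ts in timestamps if start <= ts < end)
    return [(a, b) for a, b in zip(recent, recent[1:]) if b - a > threshold]


# Hypothetical hourly samples from 2013-07-28 00:00 to 2013-07-30 05:00,
# with the midnight reading of 2013-07-29 missing.
first = datetime(2013, 7, 28)
samples = [first + timedelta(hours=h) for h in range(54) if h != 24]
print(recent_gaps(samples, date(2013, 7, 30), timedelta(hours=1)))
# one gap: 2013-07-28 23:00 -> 2013-07-29 01:00, across midnight
```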
A very efficient way to do this is with a stored procedure using cursors. I think this is simpler and more efficient than the other answers.
This procedure creates a cursor and iterates it through the datetime records that you are checking. If there is ever a gap of more than what you specify, it will write the gap's begin and end to a table.
DELIMITER $$
CREATE PROCEDURE findgaps(IN max_gap_days INT)
BEGIN
DECLARE done INT DEFAULT FALSE;
DECLARE a,b DATETIME;
DECLARE cur CURSOR FOR SELECT dateTimeCol FROM targetTable
ORDER BY dateTimeCol ASC;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
OPEN cur;
FETCH cur INTO a;
read_loop: LOOP
-- b holds the previous row, a the current (later) one
SET b = a;
FETCH cur INTO a;
IF done THEN
LEAVE read_loop;
END IF;
IF DATEDIFF(a,b) > max_gap_days THEN
INSERT INTO tmp_table (gap_begin, gap_end)
VALUES (b,a);
END IF;
END LOOP;
CLOSE cur;
END$$
DELIMITER ;
In this case it is assumed that 'tmp_table' exists. You could easily define this as a TEMPORARY table in the procedure, but I left it out of this example.
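The cursor loop's bookkeeping is easy to get backwards, so here is the same logic as a Python sketch. findgaps and the sample readings are illustrative only. b always holds the previous row and a holds the current, later row. The difference therefore has to be taken as a minus b, and the gap is recorded as (b, a). In MySQL terms that is TIMESTAMPDIFF(MINUTE, b, a), since TIMESTAMPDIFF(unit, start, end) returns end minus start.

```python
from datetime import datetime, timedelta


def findgaps(times, max_gap):
    """Python rendering of the cursor loop: b is the previous row, a the current one."""
    gaps = []                   # stands in for tmp_table (gap_begin, gap_end)
    rows = iter(sorted(times))  # cursor over ... ORDER BY dateTimeCol ASC
    a = next(rows, None)        # first FETCH cur INTO a
    for row in rows:            # each further FETCH until NOT FOUND
        b, a = a, row           # SET b = a; FETCH cur INTO a
        if a - b > max_gap:     # a is later than b, so a - b is positive
            gaps.append((b, a)) # gap_begin = earlier row, gap_end = later row
    return gaps


# Hypothetical hourly readings with 03:00 and 04:00 missing.
readings = [datetime(2020, 1, 1, h) for h in range(8) if h not in (3, 4)]
print(findgaps(readings, timedelta(minutes=60)))
```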
I'm trying this on MariaDB 10.3.27, so this procedure may not work, but I'm getting an error creating the procedure and I can't figure out why! I have a table called electric_use with a field Intervaldatetime DATETIME that I want to find gaps in. I created a target table electric_use_gaps with fields gap_begin DATETIME and gap_end DATETIME.
The data are taken every hour and I want to know if I'm missing even an hour's worth of data across 5 years.
DELIMITER $$
CREATE PROCEDURE findgaps()
BEGIN
DECLARE done INT DEFAULT FALSE;
DECLARE a,b DATETIME;
DECLARE cur CURSOR FOR SELECT Intervaldatetime FROM electric_use
ORDER BY Intervaldatetime ASC;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
OPEN cur;
FETCH cur INTO a;
read_loop: LOOP
SET b = a;
FETCH cur INTO a;
IF done THEN
LEAVE read_loop;
END IF;
IF TIMESTAMPDIFF(MINUTE,a,b) > [60] THEN
INSERT INTO electric_use_gaps(gap_begin, gap_end)
VALUES (a,b);
END IF;
END LOOP;
CLOSE cur;
END&&
DELIMITER ;
This is the error:
Query: CREATE PROCEDURE findgaps() BEGIN DECLARE done INT DEFAULT FALSE; DECLARE a,b DATETIME; DECLARE cur CURSOR FOR SELECT Intervalda...
Error Code: 1064
You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near '[60] THEN
INSERT INTO electric_use_gaps(gap_begin, gap_end)
...' at line 16

Can't Setup MySQL Trigger To INSERT Into Second Table

I have two tables which I use to store call details in. One table (Call_Detail) stores the header details against each call that gets entered, the second (Call_History) stores every comment against the call. So a single call will only appear ONCE in the Call_Detail table, but may appear multiple times in the Call_History table.
I currently run a query to return the latest comment for a group of calls. I return the header details from Call_Detail and then cross-reference Call_History to find the newest comment (thanks to some outside help). However, this query can be quite time-consuming when run against a large number of calls.
So, to optimize my query, I want to set up a trigger that records these details.
I want to catch any INSERT into the Call_History table and record the comment and date/time in the Call_Detail table against the relevant call ID.
So far I have the following but it doesn't like my syntax for some reason:
DELIMITER $$
CREATE TRIGGER Last_Call_Update
AFTER INSERT ON call_history
FOR EACH ROW
BEGIN
UPDATE call_detail
SET last_updated = NEW.updated_at, last_commment = NEW.body
WHERE id = NEW.ticket_id
END $$
DELIMITER ;
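Add a semicolon after the UPDATE statement. Inside a BEGIN ... END block every statement has to end with a semicolon; the DELIMITER $$ line is what stops the client from treating that semicolon as the end of the whole CREATE TRIGGER.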
DELIMITER $$
CREATE TRIGGER Last_Call_Update
AFTER INSERT ON call_history
FOR EACH ROW
BEGIN
UPDATE call_detail
SET last_updated = NEW.updated_at, last_commment = NEW.body
WHERE id = NEW.ticket_id;
END $$
DELIMITER ;
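To check the trigger logic without a MySQL server, here is a sketch using Python's sqlite3 module. SQLite stands in for MySQL here, which is an assumption. The column names, including last_commment, are taken from the question, and the data are made up. DELIMITER is a mysql command-line client directive, so it is not needed here. The semicolon after the UPDATE inside BEGIN ... END is still required.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE call_detail (
    id INTEGER PRIMARY KEY,
    last_updated TEXT,
    last_commment TEXT
);
CREATE TABLE call_history (
    ticket_id INTEGER,
    updated_at TEXT,
    body TEXT
);
-- Copy each new comment and its time onto the matching call_detail row.
CREATE TRIGGER Last_Call_Update
AFTER INSERT ON call_history
FOR EACH ROW
BEGIN
    UPDATE call_detail
    SET last_updated = NEW.updated_at, last_commment = NEW.body
    WHERE id = NEW.ticket_id;
END;
""")

conn.execute("INSERT INTO call_detail (id) VALUES (1)")
conn.execute("INSERT INTO call_history VALUES (1, '2013-07-28 10:00:00', 'first comment')")
conn.execute("INSERT INTO call_history VALUES (1, '2013-07-28 11:30:00', 'second comment')")
row = conn.execute("SELECT last_updated, last_commment FROM call_detail WHERE id = 1").fetchone()
print(row)  # ('2013-07-28 11:30:00', 'second comment')
```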

MySQL: return updated rows

I am trying to combine these two queries in Twisted (Python):
SELECT * FROM table WHERE group_id = 1013 and time > 100;
and:
UPDATE table SET time = 0 WHERE group_id = 1013 and time > 100
into a single query. Is it possible to do so?
I tried putting the SELECT in a sub query, but I don't think the whole query returns me what I want.
Is there a way to do this? (even better, without a sub query)
Or do I just have to stick with two queries?
Thank You,
Quan
Apparently MySQL does have something that might be of use, especially if you are only updating one row.
This example is from: http://lists.mysql.com/mysql/219882
UPDATE mytable SET
mycolumn = @mycolumn := mycolumn + 1
WHERE mykey = 'dante';
SELECT @mycolumn;
I've never tried this though, but do let me know how you get on.
This is really late to the party, but I had this same problem, and the solution I found most helpful was the following:
SET @uids := null;
UPDATE footable
SET foo = 'bar'
WHERE fooid > 5
AND ( SELECT @uids := CONCAT_WS(',', fooid, @uids) );
SELECT @uids;
from https://gist.github.com/PieterScheffers/189cad9510d304118c33135965e9cddb
You can't combine these queries directly. But you can write a stored procedure that executes both queries. example:
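delimiter |
create procedure upd_select(IN p_group INT, IN p_time INT)
begin
start transaction;
-- read first: once time is set to 0 the rows no longer match time > p_time
SELECT * FROM `table` WHERE group_id = p_group AND time > p_time FOR UPDATE;
UPDATE `table` SET time = 0 WHERE group_id = p_group AND time > p_time;
commit;
end|
delimiter ;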
So what you're trying to do is reset time to zero whenever you access a row -- sort of like a trigger, but MySQL cannot do triggers after SELECT.
Probably the best way to do it with one server request from the app is to write a stored procedure that updates and then returns the row. If it's very important to have the two occur together, wrap the two statements in a transaction.
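Here is a minimal sketch of the read-then-reset pattern inside a single transaction, using Python's sqlite3 module as a stand-in for MySQL. The table name items and the data are hypothetical, since table is a reserved word. The SELECT runs before the UPDATE, because once time is set to 0 the rows no longer match time > 100. BEGIN IMMEDIATE takes SQLite's write lock up front, playing roughly the role that SELECT ... FOR UPDATE plays in MySQL/InnoDB.

```python
import sqlite3

# isolation_level=None: no implicit transactions, so BEGIN/COMMIT below are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, group_id INTEGER, time INTEGER)")
conn.executemany(
    "INSERT INTO items (group_id, time) VALUES (?, ?)",
    [(1013, 50), (1013, 150), (1013, 300), (2000, 500)],
)


def select_and_reset(conn, group_id, min_time):
    """Return the matching rows, then set their time to 0, in one transaction."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        rows = conn.execute(
            "SELECT id, group_id, time FROM items"
            " WHERE group_id = ? AND time > ? ORDER BY id",
            (group_id, min_time),
        ).fetchall()
        conn.execute(
            "UPDATE items SET time = 0 WHERE group_id = ? AND time > ?",
            (group_id, min_time),
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return rows


print(select_and_reset(conn, 1013, 100))  # [(2, 1013, 150), (3, 1013, 300)]
```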
On SQL Server (this is T-SQL, not MySQL syntax), there is a faster way to return the updated rows. It is also more correct on a heavily loaded system, where many clients run the same query against the same database server at the same time:
update table_name WITH (UPDLOCK, READPAST)
SET state = 1
OUTPUT inserted.
In PostgreSQL (MySQL does not support RETURNING), UPDATE can return the modified rows directly:
UPDATE tab SET column=value RETURNING column1,column2...

Mysql Loop through selected column and execute Insert query in the loop

I'm trying to loop over the selected slugs and execute a somewhat complicated INSERT INTO ... SELECT query.
slugs[iteration] is not valid MySQL syntax, but I have to access the fetched slugs one by one inside the query. How could I achieve that?
DELIMITER $$
CREATE PROCEDURE create_sitemap_from_slugs()
BEGIN
SELECT `slug` INTO slugs FROM slug_table;
SELECT COUNT(*) INTO count FROM slug_table;
SET iteration = 0;
START TRANSACTION;
WHILE iteration < count DO
INSERT INTO line_combinations
SELECT REPLACE(`line`, '{a}', slugs[iteration]) AS `line`
FROM line_combinations
WHERE `line` LIKE CONCAT('%/', '{a}', '%');
SET iteration = iteration + 1;
END WHILE;
COMMIT;
END
$$
DELIMITER ;
By the way, I don't want to use any external programming language for this; the procedure will be working on billions of rows. I've read that loops in SQL are not a good idea for performance reasons.
If you suggest another way I would accept this also.
I asked another, more detailed question but couldn't get an answer. If you would like to check that too: https://stackoverflow.com/questions/35320494/fetch-placeholders-from-table-and-place-into-generated-line-combination-pattern
So for each line containing {a}, you need to insert one row per slug in slug_table, with {a} replaced by that slug.
It seems you can do that with just one INSERT ... SELECT:
INSERT INTO line_combinations (`line`)
SELECT REPLACE(lc.`line`, '{a}', st.slug) AS `line`
FROM line_combinations lc, slug_table st
WHERE lc.`line` LIKE CONCAT('%/', '{a}', '%');
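The set-based INSERT ... SELECT can be tried out with Python's sqlite3 module. This is a sketch under stated assumptions. SQLite stands in for MySQL, and the sample lines and slugs are made up. Older SQLite versions have no CONCAT(), so the pattern is written as the literal '%/{a}%'. Each template line containing /{a} is paired with every slug, so no loop is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE line_combinations (line TEXT)")
conn.execute("CREATE TABLE slug_table (slug TEXT)")
conn.executemany("INSERT INTO line_combinations VALUES (?)", [("/shoes/{a}",), ("/about",)])
conn.executemany("INSERT INTO slug_table VALUES (?)", [("red",), ("blue",)])

# One set-based statement instead of a loop: every template line containing
# '/{a}' is paired with every slug (cross join), and {a} is replaced.
conn.execute("""
INSERT INTO line_combinations (line)
SELECT REPLACE(lc.line, '{a}', st.slug)
FROM line_combinations lc
CROSS JOIN slug_table st
WHERE lc.line LIKE '%/{a}%'
""")
lines = sorted(row[0] for row in conn.execute("SELECT line FROM line_combinations"))
print(lines)  # ['/about', '/shoes/blue', '/shoes/red', '/shoes/{a}']
```

Templates without {a}, such as /about, are left alone. The original /shoes/{a} row stays in the table, as it would with the MySQL statement.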
UPDATE:
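You can create a temporary table line_combinations2 and insert into it all the records that match the pattern:
CREATE TEMPORARY TABLE line_combinations2 AS
SELECT `line`
FROM line_combinations
WHERE `line` LIKE CONCAT('%/', '{a}', '%');
Then just use the temporary table in the INSERT instead of the original one:
INSERT INTO line_combinations (`line`)
SELECT REPLACE(lc.`line`, '{a}', st.slug)
FROM line_combinations2 lc, slug_table st;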