This query runs for about 15 hours in production, and I am looking for ways to improve it.
Some changes that I think may help are noted in the comments:
SELECT table1.*
FROM table1
WHERE UPPER(LEFT(table1.cloumn1, 1)) IN ('A', 'B')
AND table1.cloumn2 = 'N' /* add composite index for cloumn2,
column3 */
AND table1.cloumn3 != 'Y'
AND table1.id IN (
SELECT MAX(id)
FROM table1
GROUP BY column5,column6
) /* move this clause to 2nd after
where */
AND table1.column4 IN (
SELECT column1
FROM table2
WHERE column2 IN ('VALUE1', 'VALUE2')
AND (SUBSTRING(column3,6,1) = 'Y'
OR SUBSTRING(column3,25,1) = 'Y')
) /* move this clause to 1st after
where */
AND (table1.column5,table1.column6) NOT IN (
SELECT column1, column2
FROM table3
WHERE table3.column3 IN ('A', 'B')/* add index for this column*/
)
AND DATE_FORMAT(timstampColumn, '%Y/%m/%d') > DATE_ADD(CURRENT_DATE,
INTERVAL - 28 DAY) /* need index ON this col? */ ;
Any comments/suggestions are appreciated.
Update: just by changing the order of the filters, query time improved to ~28 secs. I will update here after adding some indexes and replacing some subqueries with joins.
Assuming you can add useful indexes (which will help on some of your checks), then maybe try and exclude rows as early as possible.
I suspect you have quite a few rows on table1 for each column5 / column6 combination. If you can get just the latest of each of these (ie, using a sub query that you join) as early as possible then you can exclude most rows from table1 before you need to check any of the non indexed WHERE clauses. You can also exclude some of these by doing a further join against a sub query on table3.
Not tested, but if my assumptions about your database structure are correct then this might be an improvement:-
SELECT table1.*
FROM
(
SELECT MAX(table1.id) AS max_id
FROM table1
INNER JOIN
(
SELECT DISTINCT column1, column2
FROM table3
WHERE table3.column3 IN ('A', 'B')
AND DATE_FORMAT(timstampColumn, '%Y/%m/%d') > DATE_ADD(CURRENT_DATE, INTERVAL - 28 DAY)
) sub0_0
ON table1.column5 = sub0_0.column1
AND table1.column6 = sub0_0.column2
WHERE (table1.cloumn1 LIKE 'A%' OR table1.cloumn1 LIKE 'B%')
AND table1.cloumn2 = 'N'
AND table1.cloumn3 != 'Y'
GROUP BY table1.column5,
table1.column6
) sub0
INNER JOIN table1
ON table1.id = sub0.max_id
INNER JOIN
(
SELECT DISTINCT column1
FROM table2
WHERE column2 IN ('VALUE1', 'VALUE2')
AND (SUBSTRING(column3,6,1) = 'Y'
OR SUBSTRING(column3,25,1) = 'Y')
) sub1
ON table1.column4 = sub1.column1
(It might help to see SHOW CREATE TABLE.)
AND DATE_FORMAT(timstampColumn, '%Y/%m/%d') > DATE_ADD(CURRENT_DATE,
INTERVAL - 28 DAY)
cannot use an index; this might be equivalent:
AND timstampColumn > CURRENT_DATE - INTERVAL 28 DAY
Please provide EXPLAIN.
What version are you using?
It might (version dependent) help to turn the IN ( SELECT ... ) clauses into 'derived' tables:
JOIN ( SELECT ... ) ON ...
WHERE (x,y) IN ... is not well optimized. What types of values are they?
With a *_ci collation,
UPPER(LEFT(table1.cloumn1, 1)) IN ('A', 'B')
could be done:
LEFT(table1.cloumn1, 1) IN ('A', 'B')
That won't help performance noticeably. It would be better not to have to break apart columns for testing.
This might use an index involving cloumn1:
table1.cloumn1 >= 'A'
AND table1.cloumn1 < 'C'
The order of things AND'd together rarely matters. The order in the INDEX can make a big difference.
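To illustrate the point about column order inside an index, here is a rough sketch of indexes along the lines hinted at in the question's comments. The index names are made up, the column names follow the question's spelling, and whether the optimizer actually uses any of them depends on the data, so check with EXPLAIN before and after.

```sql
-- Hypothetical index names; not tested against the real schema.

-- Equality column first (cloumn2 = 'N'), then the range column
-- (cloumn1 >= 'A' AND cloumn1 < 'C').
ALTER TABLE table1
  ADD INDEX idx_t1_c2_c1 (cloumn2, cloumn1);

-- Helps the MAX(id) ... GROUP BY column5, column6 subquery.
-- With InnoDB the primary key (assumed here to be id) is appended
-- to every secondary index, so it does not need to be listed.
ALTER TABLE table1
  ADD INDEX idx_t1_c5_c6 (column5, column6);

-- Only useful once the date test is written without DATE_FORMAT(),
-- i.e. timstampColumn > CURRENT_DATE - INTERVAL 28 DAY.
ALTER TABLE table1
  ADD INDEX idx_t1_ts (timstampColumn);
```

Reversing the first index to (cloumn1, cloumn2) would generally be less useful, because the range test on cloumn1 stops the index from narrowing further on cloumn2.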
I have this table in MySQL, for example:
ID | Name
1 | Bob
4 | Adam
6 | Someguy
Notice that IDs 2, 3 and 5 are missing.
How can I write a query so that MySQL returns only the missing IDs, in this case: "2,3,5"?
SELECT a.id+1 AS start, MIN(b.id) - 1 AS end
FROM testtable AS a, testtable AS b
WHERE a.id < b.id
GROUP BY a.id
HAVING start < MIN(b.id)
Hope this link also helps
http://www.codediesel.com/mysql/sequence-gaps-in-mysql/
A more efficient query:
SELECT (t1.id + 1) as gap_starts_at,
(SELECT MIN(t3.id) -1 FROM my_table t3 WHERE t3.id > t1.id) as gap_ends_at
FROM my_table t1
WHERE NOT EXISTS (SELECT t2.id FROM my_table t2 WHERE t2.id = t1.id + 1)
HAVING gap_ends_at IS NOT NULL
Rather than returning multiple ranges of IDs, if you instead want to retrieve every single missing ID itself, each one on its own row, you could do the following:
SELECT id+1 FROM table WHERE id NOT IN (SELECT id-1 FROM table) ORDER BY 1
The query is very efficient. However, it also includes one extra row on the end, which is equal to the highest ID number, plus 1. This last row can be ignored in your server script, by checking for the number of rows returned (mysqli_num_rows), and then using a for loop if the number of rows is greater than 1 (the query will always return at least one row).
Edit:
I recently discovered that my original solution did not return all ID numbers that are missing, in cases where missing numbers are contiguous (i.e. right next to each other). However, the query is still useful in working out whether or not there are numbers missing at all, very quickly, and would be a time saver when used in conjunction with hagensoft's query (top answer). In other words, this query could be run first to test for missing IDs. If anything is found, then hagensoft's query could be run immediately afterwards to help identify the exact IDs that are missing (no time saved, but not much slower at all). If nothing is found, then a considerable amount of time is potentially saved, as hagensoft's query would not need to be run.
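If every missing ID is wanted on its own row, including contiguous ones, one option on MySQL 8.0 or later is to generate the full range with a recursive CTE and anti-join it against the table. This is an untested sketch that reuses the testtable name from the answers above; seq, n and missing_id are illustrative names.

```sql
-- Requires MySQL 8.0+ (recursive CTEs).
-- The recursion depth is capped by cte_max_recursion_depth
-- (default 1000), so raise it for larger ID ranges, e.g.:
-- SET SESSION cte_max_recursion_depth = 1000000;
WITH RECURSIVE seq (n) AS (
  SELECT 1
  UNION ALL
  SELECT n + 1
  FROM seq
  WHERE n < (SELECT MAX(id) FROM testtable)
)
SELECT seq.n AS missing_id
FROM seq
LEFT JOIN testtable AS t
  ON t.id = seq.n
WHERE t.id IS NULL   -- keep only numbers with no matching row
ORDER BY seq.n;
```

For very large tables the gap-range queries are likely to be cheaper, since this approach materializes every number up to MAX(id).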
To add a little to Ivan's answer, this version shows numbers missing at the beginning if 1 doesn't exist:
SELECT 1 as gap_starts_at,
(SELECT MIN(t4.id) -1 FROM testtable t4 WHERE t4.id > 1) as gap_ends_at
FROM testtable t5
WHERE NOT EXISTS (SELECT t6.id FROM testtable t6 WHERE t6.id = 1)
HAVING gap_ends_at IS NOT NULL limit 1
UNION
SELECT (t1.id + 1) as gap_starts_at,
(SELECT MIN(t3.id) -1 FROM testtable t3 WHERE t3.id > t1.id) as gap_ends_at
FROM testtable t1
WHERE NOT EXISTS (SELECT t2.id FROM testtable t2 WHERE t2.id = t1.id + 1)
HAVING gap_ends_at IS NOT NULL;
It would be far more efficient to get the start of the gap in one query and the end of the gap in one query.
I had 18M records and it took me less than a second each to get the two results. When I tried getting them together my query timed out after an hour.
Get the start of gap:
SELECT (t1.id + 1) as MissingID
FROM sequence t1
WHERE NOT EXISTS
(SELECT t2.id
FROM sequence t2
WHERE t2.id = t1.id + 1);
Get the end of gap:
SELECT (t1.id - 1) as MissingID
FROM sequence t1
WHERE NOT EXISTS
(SELECT t2.id
FROM sequence t2
WHERE t2.id = t1.id - 1);
The queries above return two columns, so you can try this to get the missing numbers in a single column:
select start from
(SELECT a.id+1 AS start, MIN(b.id) - 1 AS end
FROM sequence AS a, sequence AS b
WHERE a.id < b.id
GROUP BY a.id
HAVING start < MIN(b.id)) b
UNION
select c.end from (SELECT a.id+1 AS start, MIN(b.id) - 1 AS end
FROM sequence AS a, sequence AS b
WHERE a.id < b.id
GROUP BY a.id
HAVING start < MIN(b.id)) c order by start;
Using window functions (available in MySQL 8),
finding the gaps in the id column can be expressed as:
WITH gaps AS
(
SELECT
LAG(id, 1, 0) OVER(ORDER BY id) AS gap_begin,
id AS gap_end,
id - LAG(id, 1, 0) OVER(ORDER BY id) AS gap
FROM test
)
SELECT
gap_begin,
gap_end
FROM gaps
WHERE gap > 1
;
If you are on an older version of MySQL, you would have to rely on user variables (the so-called poor man's window function idiom):
SELECT
gap_begin,
gap_end
FROM (
SELECT
@id_previous AS gap_begin,
id AS gap_end,
id - @id_previous AS gap,
@id_previous := id
FROM (
SELECT
t.id
FROM test t
ORDER BY t.id
) AS sorted
JOIN (
SELECT
@id_previous := 0
) AS init_vars
) AS gaps
WHERE gap > 1
;
If you want a lighter way to search millions of rows of data:
SET @st=0,@diffSt=0,@diffEnd=0;
SELECT res.startID, res.endID, res.diff
, CONCAT(
"SELECT * FROM lost_consumer WHERE ID BETWEEN "
,res.startID+1, " AND ", res.endID-1) as `query`
FROM (
SELECT
@diffSt:=(@st) `startID`
, @diffEnd:=(a.ID) `endID`
, @st:=a.ID `end`
, @diffEnd-@diffSt-1 `diff`
FROM consumer a
ORDER BY a.ID
) res
WHERE res.diff>0;
check out this http://sqlfiddle.com/#!9/3ea00c/9
I want to replace a null value with previous non-null value using MySQL.
I tried this:
SELECT
`Date_Column`
,CASE
WHEN `Value_Column` is null
THEN (
SELECT
`Value_Column`
FROM
table_name t2
WHERE
`Date_Column` = (
SELECT
MAX(`Date_Column`)
FROM
table_name t3
WHERE
`Date_Column` < t1.`Date_Column`
AND `Value_Column` > 0
)
)
ELSE `Value_Column`
END AS `Value_Column`
FROM
table_name t1
This works but takes really long for big datasets.
I tried this for a subset of data and it worked.
Is there an easier/more efficient way to achieve the same?
Thanks.
Based on your query, your first check should be that your table is correctly indexed by the date column (with the value column included, making it a covering index). If it is, then you could use the following slightly simplified query.
Note that this replaces NULL, as per your description, whereas yours replaced 0, contrary to your description. You should be clear about exactly which behaviour you want.
SELECT
date_column,
COALESCE(
value_column,
(
SELECT lookup.value_column
FROM table_name AS lookup
WHERE lookup.date_column < table_name.date_column
AND lookup.value_column IS NOT NULL
AND table_name.value_column IS NULL
ORDER BY lookup.date_column DESC
LIMIT 1
)
)
FROM
table_name
(on my phone, so please forgive typos)
You can simplify the query to:
SELECT t1.Date_Column,
(CASE WHEN t1.Value_Column = 0
THEN (SELECT t2.Value_Column
FROM table_name t2
WHERE t2.Date_Column < t1.Date_Column AND t2.Value_Column > 0
ORDER BY t2.Date_Column DESC
LIMIT 1
)
ELSE t1.Value_Column
END) AS Value_Column
FROM table_name t1;
This is an improvement on your query because it removes the second nested subquery. But it will still be slow. An index on table_name(Date_Column, Value_Column) might help.
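On MySQL 8.0 or later, a window-function version may avoid the per-row subquery altogether. MySQL has no IGNORE NULLS option, so this untested sketch uses a running count of non-NULL values as a group key; grp and the alias t are illustrative names, and it assumes Date_Column is unique and that only NULL (not 0) should be replaced.

```sql
-- Requires MySQL 8.0+ (window functions).
SELECT
  t.Date_Column,
  -- Each group starts at a non-NULL row, so the first value in the
  -- group is the most recent non-NULL value for every row in it.
  FIRST_VALUE(t.Value_Column)
    OVER (PARTITION BY t.grp ORDER BY t.Date_Column) AS Value_Column
FROM (
  SELECT
    Date_Column,
    Value_Column,
    -- COUNT(col) skips NULLs, so the running count only increases
    -- on rows that have a value.
    COUNT(Value_Column) OVER (ORDER BY Date_Column) AS grp
  FROM table_name
) AS t
ORDER BY t.Date_Column;
```

Rows that come before the first non-NULL value fall into group 0 and stay NULL, which matches the correlated-subquery versions above.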
Below is my MySQL query to find the difference between successive dates for each account and then use the results to prepare a frequency count table. The query is of course very slow, but before tuning it: am I doing the right thing? Please help if you can. A small data sample is included as well.
Appreciate your time.
OZooHA
ID DATE
403 2008-06-01
403 2012-06-01
403 2011-06-01
403 2010-06-01
403 2009-06-01
15028 2011-07-01
15028 2010-07-01
15028 2009-07-01
15028 2008-07-01
SELECT
month_diff,
count(*)
FROM
(SELECT t1.id,
t1.date,
MIN(t2.date) AS lag_date,
TIMESTAMPDIFF(MONTH, t1.date, MIN(t2.date)) AS month_diff
FROM tbl_name t1
INNER JOIN tbl_name t2
ON t1.id = t2.id
AND t2.date > t1.date
GROUP BY t1.id, t1.date
ORDER BY t1.id, t1.date
) d
GROUP BY month_diff
ORDER BY month_diff
Likely, materializing the inline view is taking most of the time. Ensure you have suitable indexes available to improve performance of the join operation; a covering index ON tbl_name (id, date) would likely be optimal for this query.
With a suitable index available (as above) it may be possible to get better performance with a query something like this:
SELECT d.month_diff
, COUNT(*)
FROM ( SELECT IF(@prev_id = t.id
, TIMESTAMPDIFF(MONTH, t.date, @prev_date )
, NULL
) AS month_diff
, @prev_date := t.date
, @prev_id := t.id
FROM tbl_name t
CROSS
JOIN (SELECT @prev_date := NULL, @prev_id := NULL) i
GROUP BY t.id DESC, t.date DESC
) d
WHERE d.month_diff IS NOT NULL
GROUP BY d.month_diff
Note that the usage of MySQL user-defined variables is not guaranteed. But we do observe consistent behavior with queries written in a particular way. (Future versions of MySQL may change the behavior we observe.)
EDIT: I modified the query above, to replace the ORDER BY t.id, t.date with a GROUP BY t.id, t.date... It's not clear from the example data whether (id,date) is guaranteed to be unique. (If we do have that guarantee, then we don't need the GROUP BY, we can just use ORDER BY. Otherwise, we need the GROUP BY to get the same result returned by the original query.)
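On MySQL 8.0 or later, the same successive-date difference could probably be written with LEAD() instead of user variables, which avoids relying on unguaranteed evaluation order. This is an untested sketch; frequency is an illustrative alias, and it assumes (id, date) is unique, since a duplicate date would produce a month_diff of 0 that the original self-join does not return.

```sql
-- Requires MySQL 8.0+ (window functions).
SELECT d.month_diff,
       COUNT(*) AS frequency
FROM (
  SELECT TIMESTAMPDIFF(
           MONTH,
           t.`date`,
           -- next date for the same id, NULL on the last row
           LEAD(t.`date`) OVER (PARTITION BY t.id ORDER BY t.`date`)
         ) AS month_diff
  FROM tbl_name AS t
) AS d
WHERE d.month_diff IS NOT NULL
GROUP BY d.month_diff
ORDER BY d.month_diff;
```

An index on tbl_name (id, date) should still help here, because it lets the rows be read in partition order.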
I have a table containing around 14 million records, and multiple stored procedures that build dynamic SQL from several parameters. I have built indexes on the table, but performance is still a problem. I extracted a query from the dynamic SQL and ran it directly, and it takes between 30 seconds and 1 minute. The queries are plain selects from the table; some of them join another table, with numeric values in the WHERE clause, plus grouping and ordering.
I checked the status output and found that the GROUP BY takes nearly all of the time. I also checked the EXPLAIN output, and it is using the right index.
So what should I do to improve the performance of my queries?
Thanks for your cooperation.
-- EDIT, Added queries directly into question instead of comment.
SELECT
CONCAT(column1, ' - ', column1 + INTERVAL 1 MONTH) AS DateRange,
cast(SUM(column2) as SIGNED) AS Alias1
FROM
Table1
INNER JOIN Table2 DD
ON Table1.Date = Table2.Date
WHERE
Table1.ID = 1
AND (Date BETWEEN 20110101 AND 20110201)
GROUP BY
MONTH(column1)
ORDER BY
Alias1 ASC
LIMIT 0, 10;
and this one:
SELECT
cast(column1 as char(30)) AS DateRange,
cast(SUM(column2) as SIGNED)
FROM
Table1
INNER JOIN Table2 DD
ON Table1.Date = Table2.Date
WHERE
Table1.ID = 1
AND (Date BETWEEN 20110101 AND 20110102)
GROUP BY
column1
ORDER BY
Alias1 ASC
LIMIT 0, 10;
For this query:
SELECT
CONCAT(column1, ' - ', column1 + INTERVAL 1 MONTH) AS DateRange <<--error? never mind
, cast(SUM(column2) as SIGNED)
FROM Table1
INNER JOIN Table2 DD ON Table1.Date = Table2.Date
WHERE Table1.ID = 1
AND (Date BETWEEN 20110101 AND 20110201)
GROUP BY MONTH(column1) <<-- problem 1.
ORDER BY column2 ASC <<-- problem 2.
LIMIT 0, 10;
Problem 1: if you group by a function, MySQL cannot use an index. You can speed this up by adding an extra column YearMonth to Table1 that contains the year+month, putting an index on it, and then grouping by YearMonth.
Problem 2: the ORDER BY does not make sense. You are summing column2, so ordering by that column serves no purpose. If you order by YearMonth ASC the query will run much faster and make more sense.
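One way to add such a column without maintaining it by hand is a stored generated column (MySQL 5.7 or later). This is an untested sketch under several assumptions: that column1 is a DATE or DATETIME column living in Table1, and that the index name idx_yearmonth is free. The join and the date filter from the question are left out for brevity.

```sql
-- EXTRACT(YEAR_MONTH FROM ...) yields a number such as 201101.
ALTER TABLE Table1
  ADD COLUMN YearMonth INT UNSIGNED
    GENERATED ALWAYS AS (EXTRACT(YEAR_MONTH FROM column1)) STORED,
  ADD INDEX idx_yearmonth (YearMonth);

-- Group and order by the indexed column instead of MONTH(column1).
SELECT YearMonth,
       CAST(SUM(column2) AS SIGNED) AS Alias1
FROM Table1
WHERE ID = 1
GROUP BY YearMonth
ORDER BY YearMonth ASC
LIMIT 0, 10;
```

Since the query also filters on ID, a composite index on (ID, YearMonth) might serve it better than YearMonth alone; EXPLAIN should show which one is picked.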