Performance issue on a MySQL database - mysql

I have a table containing around 14 million records, and several stored procedures that build dynamic SQL from multiple parameters. I have built indexes on the table, but I still have a performance issue. I extracted the query from the dynamic SQL and ran it directly, and it takes between 30 seconds and 1 minute. My queries are plain SELECTs from the table, some with a join to another table, with numeric values in the WHERE clause plus GROUP BY and ORDER BY.
I checked the status output and found that the grouping takes almost all of the time. I also checked the EXPLAIN output, and the query is using the right index.
So what should I do to improve my queries' performance?
Thanks for your cooperation.
-- EDIT: added the queries directly to the question instead of a comment.
SELECT
CONCAT(column1, ' - ', column1 + INTERVAL 1 MONTH) AS DateRange,
cast(SUM(column2) as SIGNED) AS Alias1
FROM
Table1
INNER JOIN Table2 DD
ON Table1.Date = Table2.Date
WHERE
Table1.ID = 1
AND (Date BETWEEN 20110101 AND 20110201)
GROUP BY
MONTH(column1)
ORDER BY
Alias1 ASC
LIMIT 0, 10;
and this one:
SELECT
cast(column1 as char(30)) AS DateRange,
cast(SUM(column2) as SIGNED)
FROM
Table1
INNER JOIN Table2 DD
ON Table1.Date = Table2.Date
WHERE
Table1.ID = 1
AND (Date BETWEEN 20110101 AND 20110102)
GROUP BY
column1
ORDER BY
Alias1 ASC
LIMIT 0, 10;

For this query:
SELECT
CONCAT(column1, ' - ', column1 + INTERVAL 1 MONTH) AS DateRange <<--error? never mind
, cast(SUM(column2) as SIGNED)
FROM Table1
INNER JOIN Table2 DD ON Table1.Date = Table2.Date
WHERE Table1.ID = 1
AND (Date BETWEEN 20110101 AND 20110201)
GROUP BY MONTH(column1) <<-- problem 1.
ORDER BY column2 ASC <<-- problem 2.
LIMIT 0, 10;
Problem 1: if you group by a function, MySQL cannot use an index for the grouping. You can speed this up by adding an extra column YearMonth to Table1 that contains the year and month, putting an index on it, and then grouping by YearMonth.
Problem 2: the ORDER BY does not make sense. You are summing column2, so ordering by that column serves no purpose. If you order by YearMonth ASC, the query will run much faster and make more sense.
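A minimal sketch of the YearMonth idea, written in Python with the standard-library sqlite3 module as a stand-in for MySQL so that it can be run anywhere. The yearmonth column, the idx_id_yearmonth index and the sample rows are assumptions made for illustration, not the asker's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (
    id        INTEGER,
    column1   TEXT,      -- the date, stored as 'YYYY-MM-DD'
    column2   INTEGER,   -- the value being summed
    yearmonth INTEGER    -- hypothetical precomputed year * 100 + month
);
CREATE INDEX idx_id_yearmonth ON table1 (id, yearmonth);
""")

# Hypothetical sample rows: (id, date, value)
sample = [(1, "2011-01-05", 10), (1, "2011-01-20", 5),
          (1, "2011-02-01", 7), (2, "2011-01-09", 100)]
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [(i, d, v, int(d[:4]) * 100 + int(d[5:7])) for i, d, v in sample],
)

# Group and order by the plain indexed column instead of MONTH(column1).
monthly_totals = conn.execute("""
    SELECT yearmonth, SUM(column2) AS total
    FROM table1
    WHERE id = 1
    GROUP BY yearmonth
    ORDER BY yearmonth
    LIMIT 10
""").fetchall()
print(monthly_totals)
```

The extra column has to be filled in whenever a row is written; here the Python code does it at insert time.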

Related

ORDER BY with UNION in SQL is not working

Is it possible to order the result when the data comes from several SELECT statements combined with UNION?
In the statement below, the vouchers data is not shown in the same sequence as it was saved in the database. I also tried "ORDER BY v_payments.payment_id ASC", but that did not work either.
( SELECT order_id as id, order_date as date, ... , time FROM orders WHERE client_code = '$searchId' AND order_status = 1 AND order_date BETWEEN '$start_date' AND '$end_date' ORDER BY time)
UNION
( SELECT vouchers.voucher_id as id, vouchers.payment_date as date, v_payments.account_name as name, ac_balance as oldBalance, v_payments.debit as debitAmount, v_payments.description as descriptions,
vouchers.v_no as v_no, vouchers.v_type as v_type, v_payments.credit as creditAmount, time, zero as tax, zero as freightAmount FROM vouchers INNER JOIN v_payments
ON vouchers.voucher_id = v_payments.voucher_id WHERE v_payments.client_code = '$searchId' AND voucher_status = 1 AND vouchers.payment_date BETWEEN '$start_date' AND '$end_date' ORDER BY v_payments.payment_id ASC , time )
UNION
( SELECT return_id as id, return_date as date, ... , time FROM w_return WHERE client_code = '$searchId' AND w_return_status = 1 AND return_date BETWEEN '$start_date' AND '$end_date' ORDER BY time)
Wrap the UNION of the sub-selects in an outer SELECT and order that:
SELECT id, name
FROM
(
SELECT id, name FROM fruits
UNION
SELECT id, name FROM vegetables
)
foods
ORDER BY name
If you want the order to only apply to one of the sub-selects, use parentheses as you are doing.
Note that depending on your DB, the syntax may differ here. And if that's the case, you may get better help by specifying what DB server (MySQL, SQL Server, etc.) you are using and any error messages that result.
You need to put the ORDER BY at the end of the statement, i.e. you order the final result set after UNION-ing the three intermediate result sets.
To use an ORDER BY or LIMIT clause to sort or limit the entire UNION result, parenthesize the individual SELECT statements and place the ORDER BY or LIMIT after the last one. See link below:
ORDER BY and LIMIT in Unions
(SELECT a FROM t1 WHERE a=10 AND B=1)
UNION
(SELECT a FROM t2 WHERE a=11 AND B=2)
ORDER BY a LIMIT 10;
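Both suggestions can be tried out with a small sketch. It uses Python's sqlite3 module as a stand-in for MySQL, and the fruits/vegetables rows are made up. One difference to be aware of: SQLite does not accept parentheses around the individual SELECTs of a UNION, so the second form is written without them; a trailing ORDER BY still applies to the whole UNION result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fruits     (id INTEGER, name TEXT);
CREATE TABLE vegetables (id INTEGER, name TEXT);
INSERT INTO fruits     VALUES (1, 'pear'), (2, 'apple');
INSERT INTO vegetables VALUES (3, 'carrot'), (4, 'beet');
""")

# Form 1: wrap the UNION in a derived table and sort the outer SELECT.
wrapped = conn.execute("""
    SELECT id, name
    FROM (
        SELECT id, name FROM fruits
        UNION
        SELECT id, name FROM vegetables
    ) foods
    ORDER BY name
""").fetchall()

# Form 2: a single ORDER BY after the last SELECT sorts the combined result.
trailing = conn.execute("""
    SELECT id, name FROM fruits
    UNION
    SELECT id, name FROM vegetables
    ORDER BY name
""").fetchall()

print(wrapped)
print(trailing)
```

In both forms the rows from the two tables come back interleaved by name, which is what an ORDER BY inside one branch of the UNION cannot guarantee.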

MySQL 5.7: return IDs that do not exist [duplicate]

I have this table in MySQL, for example:
ID | Name
1 | Bob
4 | Adam
6 | Someguy
As you can see, IDs 2, 3 and 5 are missing.
How can I write a query so that MySQL returns only the missing IDs, in this case "2,3,5"?
SELECT a.id+1 AS start, MIN(b.id) - 1 AS end
FROM testtable AS a, testtable AS b
WHERE a.id < b.id
GROUP BY a.id
HAVING start < MIN(b.id)
Hope this link also helps
http://www.codediesel.com/mysql/sequence-gaps-in-mysql/
A more efficient query:
SELECT (t1.id + 1) as gap_starts_at,
(SELECT MIN(t3.id) -1 FROM my_table t3 WHERE t3.id > t1.id) as gap_ends_at
FROM my_table t1
WHERE NOT EXISTS (SELECT t2.id FROM my_table t2 WHERE t2.id = t1.id + 1)
HAVING gap_ends_at IS NOT NULL
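To see what the gap query returns for the sample table from the question, here is a small sketch in Python using sqlite3 as a stand-in for MySQL. Two things are adapted and should be read as assumptions: the HAVING on a non-aggregate query is replaced by an outer WHERE, because SQLite rejects that form, and the expansion of each range into single IDs is done in Python rather than in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(1, "Bob"), (4, "Adam"), (6, "Someguy")],
)

# Each row with no successor (id + 1) starts a gap; the gap ends just
# before the next existing id. The last row has no next id and is dropped.
gaps = conn.execute("""
    SELECT gap_starts_at, gap_ends_at
    FROM (
        SELECT t1.id + 1 AS gap_starts_at,
               (SELECT MIN(t3.id) - 1
                FROM my_table t3
                WHERE t3.id > t1.id) AS gap_ends_at
        FROM my_table t1
        WHERE NOT EXISTS (SELECT 1 FROM my_table t2 WHERE t2.id = t1.id + 1)
    ) AS candidate_gaps
    WHERE gap_ends_at IS NOT NULL
    ORDER BY gap_starts_at
""").fetchall()

# Expand the inclusive ranges into the individual missing IDs.
missing_ids = [i for start, end in gaps for i in range(start, end + 1)]
print(gaps, missing_ids)
```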
Rather than returning multiple ranges of IDs, if you instead want to retrieve every single missing ID itself, each one on its own row, you could do the following:
SELECT id+1 FROM table WHERE id NOT IN (SELECT id-1 FROM table) ORDER BY 1
The query is very efficient. However, it also includes one extra row on the end, which is equal to the highest ID number, plus 1. This last row can be ignored in your server script, by checking for the number of rows returned (mysqli_num_rows), and then using a for loop if the number of rows is greater than 1 (the query will always return at least one row).
Edit:
I recently discovered that my original solution did not return all ID numbers that are missing, in cases where missing numbers are contiguous (i.e. right next to each other). However, the query is still useful in working out whether or not there are numbers missing at all, very quickly, and would be a time saver when used in conjunction with hagensoft's query (top answer). In other words, this query could be run first to test for missing IDs. If anything is found, then hagensoft's query could be run immediately afterwards to help identify the exact IDs that are missing (no time saved, but not much slower at all). If nothing is found, then a considerable amount of time is potentially saved, as hagensoft's query would not need to be run.
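The limitation described in the edit can be reproduced with the sample IDs from the question. This is a sketch in Python with sqlite3 standing in for MySQL; the table is called testtable here because "table" is only a placeholder name in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testtable (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO testtable VALUES (?)", [(1,), (4,), (6,)])

# Returns id + 1 for every row whose successor does not exist.
reported = [row[0] for row in conn.execute("""
    SELECT id + 1
    FROM testtable
    WHERE id NOT IN (SELECT id - 1 FROM testtable)
    ORDER BY 1
""")]

# More than one row means there is at least one gap; the last row is
# always the highest id plus 1 and is not a real gap.
has_gaps = len(reported) > 1
print(reported, has_gaps)
```

For IDs 1, 4 and 6 the query reports only the first missing ID of each gap plus the trailing extra row, so the contiguous missing ID 3 never appears. That is why it works as a quick test for gaps but not as a full listing.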
To add a little to Ivan's answer, this version shows numbers missing at the beginning if 1 doesn't exist:
SELECT 1 as gap_starts_at,
(SELECT MIN(t4.id) -1 FROM testtable t4 WHERE t4.id > 1) as gap_ends_at
FROM testtable t5
WHERE NOT EXISTS (SELECT t6.id FROM testtable t6 WHERE t6.id = 1)
HAVING gap_ends_at IS NOT NULL limit 1
UNION
SELECT (t1.id + 1) as gap_starts_at,
(SELECT MIN(t3.id) -1 FROM testtable t3 WHERE t3.id > t1.id) as gap_ends_at
FROM testtable t1
WHERE NOT EXISTS (SELECT t2.id FROM testtable t2 WHERE t2.id = t1.id + 1)
HAVING gap_ends_at IS NOT NULL;
It would be far more efficient to get the start of the gap in one query and the end of the gap in one query.
I had 18M records and it took me less than a second each to get the two results. When I tried getting them together my query timed out after an hour.
Get the start of gap:
SELECT (t1.id + 1) as MissingID
FROM sequence t1
WHERE NOT EXISTS
(SELECT t2.id
FROM sequence t2
WHERE t2.id = t1.id + 1);
Get the end of gap:
SELECT (t1.id - 1) as MissingID
FROM sequence t1
WHERE NOT EXISTS
(SELECT t2.id
FROM sequence t2
WHERE t2.id = t1.id - 1);
The queries above return two columns, so you can try this to get the missing numbers in a single column:
select start from
(SELECT a.id+1 AS start, MIN(b.id) - 1 AS end
FROM sequence AS a, sequence AS b
WHERE a.id < b.id
GROUP BY a.id
HAVING start < MIN(b.id)) b
UNION
select c.end from (SELECT a.id+1 AS start, MIN(b.id) - 1 AS end
FROM sequence AS a, sequence AS b
WHERE a.id < b.id
GROUP BY a.id
HAVING start < MIN(b.id)) c order by start;
Using window functions (available in MySQL 8), finding the gaps in the id column can be expressed as:
WITH gaps AS
(
SELECT
LAG(id, 1, 0) OVER(ORDER BY id) AS gap_begin,
id AS gap_end,
id - LAG(id, 1, 0) OVER(ORDER BY id) AS gap
FROM test
)
SELECT
gap_begin,
gap_end
FROM gaps
WHERE gap > 1
;
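The LAG-based query can be tried without a MySQL 8 server. The sketch below runs it through Python's sqlite3 module, which is an assumption of convenience: it needs a bundled SQLite of version 3.25 or later for window functions. The sample IDs are the ones from the question, and the final expansion into single IDs is done in Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO test VALUES (?)", [(1,), (4,), (6,)])

# gap_begin and gap_end are the existing ids on either side of each gap.
gaps = conn.execute("""
    WITH gaps AS (
        SELECT LAG(id, 1, 0) OVER (ORDER BY id) AS gap_begin,
               id AS gap_end,
               id - LAG(id, 1, 0) OVER (ORDER BY id) AS gap
        FROM test
    )
    SELECT gap_begin, gap_end
    FROM gaps
    WHERE gap > 1
    ORDER BY gap_begin
""").fetchall()

# The missing ids lie strictly between the two bounds.
missing_ids = [i for begin, end in gaps for i in range(begin + 1, end)]
print(gaps, missing_ids)
```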
If you are on an older version of MySQL, you have to rely on user variables (the so-called poor man's window function idiom):
SELECT
gap_begin,
gap_end
FROM (
SELECT
@id_previous AS gap_begin,
id AS gap_end,
id - @id_previous AS gap,
@id_previous := id
FROM (
SELECT
t.id
FROM test t
ORDER BY t.id
) AS sorted
JOIN (
SELECT
@id_previous := 0
) AS init_vars
) AS gaps
WHERE gap > 1
;
If you want a lighter way to search millions of rows of data:
SET @st=0,@diffSt=0,@diffEnd=0;
SELECT res.startID, res.endID, res.diff
, CONCAT(
"SELECT * FROM lost_consumer WHERE ID BETWEEN "
,res.startID+1, " AND ", res.endID-1) as `query`
FROM (
SELECT
@diffSt:=(@st) `startID`
, @diffEnd:=(a.ID) `endID`
, @st:=a.ID `end`
, @diffEnd-@diffSt-1 `diff`
FROM consumer a
ORDER BY a.ID
) res
WHERE res.diff>0;
check out this http://sqlfiddle.com/#!9/3ea00c/9

mysql - long running query without proper indexes

This query runs for about 15 hours in production, and I am looking for ways to improve it.
The improvements that I think may help are noted in the comments:
SELECT table1.*
FROM table1
WHERE UPPER(LEFT(table1.cloumn1, 1)) IN ('A', 'B')
AND table1.cloumn2 = 'N' /* add composite index for cloumn2,
column3 */
AND table1.cloumn3 != 'Y'
AND table1.id IN (
SELECT MAX(id)
FROM table1
GROUP BY column5,column6
) /* move this clause to 2nd after
where */
AND table1.column4 IN (
SELECT column1
FROM table2
WHERE column2 IN ('VALUE1', 'VALUE2')
AND (SUBSTRING(column3,6,1) = 'Y'
OR SUBSTRING(column3,25,1) = 'Y')
) /* move this clause to 1st after
where */
AND (table1.column5,table1.column6) NOT IN (
SELECT column1, column2
FROM table3
WHERE table3.column3 IN ('A', 'B')/* add index for this column*/
)
AND DATE_FORMAT(timstampColumn, '%Y/%m/%d') > DATE_ADD(CURRENT_DATE,
INTERVAL - 28 DAY)) /* need index ON this col? */ ;
Any comments/suggestions are appreciated.
Update: just by changing the order of the filters, query time improved to ~28 seconds. I will update here after adding some indexes and replacing some subqueries with joins.
Assuming you can add useful indexes (which will help on some of your checks), then maybe try and exclude rows as early as possible.
I suspect you have quite a few rows on table1 for each column5 / column6 combination. If you can get just the latest of each of these (ie, using a sub query that you join) as early as possible then you can exclude most rows from table1 before you need to check any of the non indexed WHERE clauses. You can also exclude some of these by doing a further join against a sub query on table3.
Not tested, but if my assumptions about your database structure are correct then this might be an improvement:-
SELECT table1.*
FROM
(
SELECT MAX(table1.id) AS max_id
FROM table1
INNER JOIN
(
SELECT DISTINCT column1, column2
FROM table3
WHERE table3.column3 IN ('A', 'B')
AND DATE_FORMAT(timstampColumn, '%Y/%m/%d') > DATE_ADD(CURRENT_DATE, INTERVAL - 28 DAY)
) sub0_0
ON table1.column5 = sub0_0.column1
AND table1.column6 = sub0_0.column2
WHERE (table1.cloumn1 LIKE 'A%' OR table1.cloumn1 LIKE 'B%')
AND table1.cloumn2 = 'N'
AND table1.cloumn3 != 'Y'
GROUP BY table1.column5,
table1.column6
) sub0
INNER JOIN table1
ON table1.id = sub0.max_id
INNER JOIN
(
SELECT DISTINCT column1
FROM table2
WHERE column2 IN ('VALUE1', 'VALUE2')
AND (SUBSTRING(column3,6,1) = 'Y'
OR SUBSTRING(column3,25,1) = 'Y')
) sub1
ON table1.column4 = sub1.column1
(It might help to see SHOW CREATE TABLE.)
AND DATE_FORMAT(timstampColumn, '%Y/%m/%d') > DATE_ADD(CURRENT_DATE,
INTERVAL - 28 DAY))
cannot use an index; this might be equivalent:
AND timstampColumn > CURRENT_DATE - INTERVAL 28 DAY
Please provide EXPLAIN.
What version are you using?
It might (version dependent) help to turn the IN ( SELECT ... ) clauses into 'derived' tables:
JOIN ( SELECT ... ) ON ...
WHERE (x,y) IN ... is not well optimized. What types of values are they?
With a *_ci collation,
UPPER(LEFT(table1.cloumn1, 1)) IN ('A', 'B')
could be done:
LEFT(table1.cloumn1, 1) IN ('A', 'B')
That won't help performance noticeably. It would be better not to have to break apart columns for testing.
This might use an index involving cloumn1:
table1.cloumn1 >= 'A'
AND table1.cloumn1 < 'C'
The order of things AND'd together rarely matters. The order in the INDEX can make a big difference.
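The point about not breaking columns apart can be illustrated with a query plan. The sketch below uses Python's sqlite3 module, so it shows SQLite's planner rather than MySQL's and is only an analogy; the idx_cloumn1 index and the sample rows are hypothetical, and the data is all upper-case initial letters so that case sensitivity does not matter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY, cloumn1 TEXT, cloumn2 TEXT);
CREATE INDEX idx_cloumn1 ON table1 (cloumn1);
INSERT INTO table1 (cloumn1, cloumn2) VALUES
    ('Alpha', 'N'), ('Bravo', 'N'), ('Charlie', 'N'), ('Delta', 'N');
""")

def plan(sql):
    """Return the planner's description of how it would run sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql)
    return " | ".join(row[-1] for row in rows)

# Wrapping the column in a function hides it from the index.
prefix_sql = "SELECT * FROM table1 WHERE substr(cloumn1, 1, 1) IN ('A', 'B')"
# A plain range on the column can be answered from the index.
range_sql = "SELECT * FROM table1 WHERE cloumn1 >= 'A' AND cloumn1 < 'C'"

prefix_plan = plan(prefix_sql)
range_plan = plan(range_sql)
same_rows = sorted(conn.execute(prefix_sql)) == sorted(conn.execute(range_sql))

print(prefix_plan)
print(range_plan)
print(same_rows)
```

On MySQL the equivalent check would be EXPLAIN on both forms of the query against the real table.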

Finding intersection of two select query

I need to find the intersection of the following two queries in MySQL:
SELECT *
FROM project.backup_table
where project.backup_table.date <= (SELECT date FROM project.main_inout_table ORDER BY date desc LIMIT 1)
and project.backup_table.date >= (SELECT date FROM project.main_inout_table ORDER BY date asc LIMIT 1)
SELECT *
FROM project.backup_table
WHERE concat(empid,date) not IN (SELECT concat(empid,date) FROM project.main_inout_table
The tables are:
maintable
backuptable
My attempt:
SELECT * FROM project.backup_table
where project.backup_table.date <= (SELECT date FROM project.main_inout_table
ORDER BY date desc LIMIT 1) and project.backup_table.date >= (SELECT date FROM project.main_inout_table
ORDER BY date asc LIMIT 1) and exists (SELECT * FROM project.backup_table
WHERE concat(empid,date) not IN (SELECT concat(empid,date)
FROM project.main_inout_table));
Problem: the row with tid 4 is present. Shouldn't it be filtered out by the second SELECT query?
The intersection would be the rows that meet both conditions. So, just bring the conditions together:
SELECT bt.*
FROM project.backup_table bt
WHERE bt.date <= (SELECT MAX(date) FROM project.main_inout_table mit) AND
bt.date >= (SELECT MIN(date) FROM project.main_inout_table mit) AND
NOT EXISTS (SELECT 1
FROM project.main_inout_table mit
WHERE mit.empid = bt.empid AND mit.date = bt.date
);
Note the following changes:
The tables are given aliases, which are abbreviations for the table names.
The columns are all qualified with the table aliases.
The first two subqueries simply use MIN() and MAX(). These could be combined into one subquery or join, but this follows your original formulation.
The last subquery uses NOT EXISTS rather than NOT IN with CONCAT(). Actually, this could also use NOT IN with tuples (something that MySQL supports, but not all databases).
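A small runnable sketch of the combined query, using Python's sqlite3 module as a stand-in for MySQL. The project schema prefix is dropped, and the column types and sample rows are invented for illustration since the original table contents are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE main_inout_table (empid INTEGER, date TEXT);
CREATE TABLE backup_table (tid INTEGER, empid INTEGER, date TEXT);
INSERT INTO main_inout_table VALUES (1, '2020-01-02'), (2, '2020-01-05');
INSERT INTO backup_table VALUES
    (1, 1, '2020-01-02'),  -- also in the main table: excluded
    (2, 1, '2020-01-03'),  -- in range, not in the main table: kept
    (3, 2, '2020-01-05'),  -- also in the main table: excluded
    (4, 3, '2020-01-10'),  -- outside the main table's date range: excluded
    (5, 3, '2020-01-04');  -- in range, not in the main table: kept
""")

# Both conditions are applied to the same row of backup_table.
kept_tids = [row[0] for row in conn.execute("""
    SELECT bt.tid
    FROM backup_table bt
    WHERE bt.date <= (SELECT MAX(date) FROM main_inout_table)
      AND bt.date >= (SELECT MIN(date) FROM main_inout_table)
      AND NOT EXISTS (SELECT 1
                      FROM main_inout_table mit
                      WHERE mit.empid = bt.empid AND mit.date = bt.date)
    ORDER BY bt.tid
""")]
print(kept_tids)
```

The attempt in the question behaves differently because its EXISTS subquery is not correlated with the outer row: it is true for every row as long as any unmatched row exists anywhere in backup_table.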

MySQL query is slow - difference in successive dates at the group level

Below is my MySQL query to find the difference between successive dates for each account and then use the results to build a frequency count table. The query is very slow, but before getting to that: am I doing the right thing? Please help if you can. A small data sample is included below.
Appreciate your time.
OZooHA
ID DATE
403 2008-06-01
403 2012-06-01
403 2011-06-01
403 2010-06-01
403 2009-06-01
15028 2011-07-01
15028 2010-07-01
15028 2009-07-01
15028 2008-07-01
SELECT
month_diff,
count(*)
FROM
(SELECT t1.id,
t1.date,
MIN(t2.date) AS lag_date,
TIMESTAMPDIFF(MONTH, t1.date, MIN(t2.date)) AS month_diff
FROM tbl_name T1
INNER JOIN tbl_name T2
ON t1.id = t2.id
AND t2.date > t1.date
GROUP BY t1.id, t1.date
ORDER BY t1.id, t1.date
)
GROUP BY month_diff
ORDER BY month_diff
Likely, materializing the inline view is taking most of the time. Ensure you have suitable indexes available to improve performance of the join operation; a covering index ON tbl_name (id, date) would likely be optimal for this query.
With a suitable index available (as above) it may be possible to get better performance with a query something like this:
SELECT d.month_diff
, COUNT(*)
FROM ( SELECT IF(@prev_id = t.id
, TIMESTAMPDIFF(MONTH, t.date, @prev_date )
, NULL
) AS month_diff
, @prev_date := t.date
, @prev_id := t.id
FROM tbl_name t
CROSS
JOIN (SELECT @prev_date := NULL, @prev_id := NULL) i
GROUP BY t.id DESC, t.date DESC
) d
WHERE d.month_diff IS NOT NULL
GROUP BY d.month_diff
Note that the behavior of MySQL user-defined variables used this way is not guaranteed. But we do observe consistent behavior with queries written in a particular way. (Future versions of MySQL may change the behavior we observe.)
EDIT: I modified the query above, to replace the ORDER BY t.id, t.date with a GROUP BY t.id, t.date... It's not clear from the example data whether (id,date) is guaranteed to be unique. (If we do have that guarantee, then we don't need the GROUP BY, we can just use ORDER BY. Otherwise, we need the GROUP BY to get the same result returned by the original query.)
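As an alternative to user-defined variables, the same frequency table can be sketched with a window function. This is a swapped-in technique, not the query from the answer above. It is written in Python with sqlite3 as a stand-in (SQLite 3.25 or later is assumed), and because SQLite has no TIMESTAMPDIFF the month difference is computed from the year and month parts, which only matches TIMESTAMPDIFF(MONTH, ...) here because every sample date falls on the first of the month.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_name (id INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO tbl_name VALUES (?, ?)",
    [(403, f"{y}-06-01") for y in (2008, 2012, 2011, 2010, 2009)]
    + [(15028, f"{y}-07-01") for y in (2011, 2010, 2009, 2008)],
)

# LEAD() fetches the next date within the same id, replacing the self-join.
frequency = conn.execute("""
    WITH diffs AS (
        SELECT date,
               LEAD(date) OVER (PARTITION BY id ORDER BY date) AS next_date
        FROM tbl_name
    )
    SELECT (CAST(strftime('%Y', next_date) AS INTEGER)
            - CAST(strftime('%Y', date) AS INTEGER)) * 12
           + (CAST(strftime('%m', next_date) AS INTEGER)
              - CAST(strftime('%m', date) AS INTEGER)) AS month_diff,
           COUNT(*) AS occurrences
    FROM diffs
    WHERE next_date IS NOT NULL
    GROUP BY month_diff
    ORDER BY month_diff
""").fetchall()
print(frequency)
```

On MySQL 8 the same shape should work with LEAD() and TIMESTAMPDIFF(MONTH, date, next_date) in place of the strftime arithmetic.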