How to avoid spending time in creating tmp table in MySQL - mysql

I have the following MYSQL query.
SELECT
COUNT(analyzer_host.server) AS count,
analyzer_host.server AS server
FROM
analyzer_host,
analyzer_url,
analyzer_code
WHERE
analyzer_host.server IS NOT NULL
AND analyzer_host.server != ''
AND analyzer_code.account_id = 33
AND analyzer_code.id = analyzer_url.url_id
AND analyzer_url.id = analyzer_host.url_id
GROUP BY analyzer_host.server;
I did some profiling on this query and this is stuck in "Copying to tmp table" . Is there a way I can avoid that. Also any pointers in what is causing the query to create tmp tables.

First
SELECT COUNT(host.server) AS count, host.server AS server
FROM host
JOIN url ON url.id = host.url_id
JOIN code ON code.id = url.url_id
WHERE host.server IS NOT NULL
AND host.server != ''
AND code.account_id = 33
GROUP BY host.server;
That gets rid of analyzer_ clutter and use JOIN...ON syntax.
Second, it seems that the JOINs are not quite right -- is there both an id and a url_id in url? Is the url_id different between host and url?
Does code have PRIMARY KEY(account_id)? That is where the optimizer would like to start.
Please provide EXPLAIN SELECT ... so we can see if it is doing any table scans. If it is, then that is the problem, not the "tmp table".
Please provide SHOW CREATE TABLE for all three tables if you need further discussion.

Related

What is the best solution for adding INDEX to speed up the query?

Now I have a Query that runs 50 minutes on Mysql database and I can't accept that...
I want this process can running under 15 minutes....
insert into appianData.IC_DeletedRecords
(sourcetableid,
concatkey,
sourcetablecode)
select mstid,
concatkey,
'icmstlocationheader' as sourcetablecode
from appianData.IC_MST_LocationHeader
where concatkey not in(select concatkey
from appianData.IC_PURGE_LocationHeader)
The "sourcetableid" and "mstid" are unique.
So what is the best way to add INDEX or optimize on this?
Thank you
I would write the select as:
select mstid, concatkey, 'icmstlocationheader' as sourcetablecode
from appianData.IC_MST_LocationHeader lh
where not exists (select 1
from appianData.IC_PURGE_LocationHeader lhp
where lhp.concatkey = lh.concatkey
);
Then you want an index on IC_PURGE_LocationHeader(concatkey).
Since it is a NOT IN condition you should be able to use a "LEFT JOIN ... WHERE rightTable has no match" without concern for multiple matches inflating the results.
INSERT INTO appianData.IC_DeletedRecords (sourcetableid, concatkey, sourcetablecode)
SELECT m.mstid, m.concatkey, 'icmstlocationheader' as sourcetablecode
FROM appianData.IC_MST_LocationHeader AS m
LEFT JOIN appianData.IC_PURGE_LocationHeader AS p
ON m.concatkey = p.concatkey
WHERE p.concatkey IS NULL
;
With this version query, or the one you presented in the question, indexes on concatkey in both source tables should help significantly.

mysql Query performance is low

I have a query which is running for around 2 hours in last few days. But
before that it took only 2 to 3 minutes of time. i could not able to find
the reason for its sudden slowness. Can any one help me on this?
Please find the below query explain plan[![enter image description here][1]]
[1]...
select
IFNULL(EMAIL,'') as EMAIL,
IFNULL(SITE_CD,'') as SITE_CD,
IFNULL(OPT_TYPE_CD,'') as OPT_TYPE_CD,
IFNULL(OPT_IN_IND,'') as OPT_IN_IND,
IFNULL(EVENT_TSP,'') as EVENT_TSP,
IFNULL(APPLICATION,'') as APPLICATION
from (
SELECT newsletter_entry.email email,
newsletter.site_cd site_cd,
REPLACE (newsletter.TYPE, 'OPTIN_','') opt_type_cd,
CASE
WHEN newsletter_event_temp.post_status = 'SUBSCRIBED' THEN 'Y'
WHEN newsletter_event_temp.post_status = 'UNSUBSCRIBED' THEN
'N'
ELSE ''
END
opt_in_ind,
newsletter_event_temp.event_date event_tsp,
entry_context.application application
FROM amg_toolkit.newsletter_entry,
amg_toolkit.newsletter,
(select NEWSLETTER_EVENT.* from amg_toolkit.NEWSLETTER_EVENT,
amg_toolkit.entry_context where newsletter_event.EVENT_DATE >= '2017-07-11
00:01:23' AND newsletter_event.EVENT_DATE < '2017-07-11 01:01:23' and
newsletter_event.ENTRY_CONTEXT_ID = entry_context.ENTRY_CONTEXT_ID and
entry_context.APPLICATION != 'feedbackloop') newsletter_event_temp,
amg_toolkit.entry_context
WHERE newsletter_entry.newsletter_id = newsletter.newsletter_id
AND newsletter_entry.newsletter_entry_id =
newsletter_event_temp.newsletter_entry_id
AND newsletter.TYPE IN ('OPTIN_PRIM', 'OPTIN_THRD', 'OPTIN_WRLS')
AND newsletter_event_temp.entry_context_id NOT IN
(select d.ENTRY_CONTEXT_ID from amg_toolkit.sweepstake a,
amg_toolkit.sweepstake_entry b, amg_toolkit.user_entry c,
amg_toolkit.entry_context d where a.exclude_data = 'Y' and
a.sweepstake_id=b.sweepstake_id and b.USER_ENTRY_ID=c.USER_ENTRY_ID and
c.ENTRY_CONTEXT_ID = d.ENTRY_CONTEXT_ID)
AND newsletter_event_temp.entry_context_id =
entry_context.entry_context_id
AND newsletter_event_temp.event_date >= '2017-07-11 00:01:23'
AND newsletter_event_temp.event_date < '2017-07-11 01:01:23') a;`
[1]: https://i.stack.imgur.com/cgsS1.png
dont use .*
select only the columns of data you are using in your query.
Avoid nested sub selects if you dont need them.
I don't see a need for them in this query. You query the data 3 times this way instead of just once.
Slowness can be explained by an inefficient query haveing to deal with tables that have a growing number of records.
"Not in" is resource intensive. Can you do that in a better way avoiding "not in" logic?
JOINs are usually faster than subqueries. NOT IN ( SELECT ... ) can usually be turned into LEFT JOIN ... WHERE id IS NULL.
What is the a in a.exclude_data? Looks like a syntax error.
These indexes are likely to help:
newsletter_event: INDEX(ENTRY_CONTEXT_ID, EVENT_DATE) -- in this order
You also need it for newsetter_event_temp, but since that is not possible, something has to give. What version of MySQL are you running? Perhaps you could actually CREATE TEMPORARY TABLE and ADD INDEX.

MySQL Insert from results of left & Right Join results in memory issue

I am having issues running a insert query on 2 large tables. One table is 67,000,000 and the other is 100,000. I am trying to do a LEFT and RIGHT Join on the 2 tables and put the results into another table. The query runs perfect on smaller tables under 1M entries. but when getting to the higher entries it bombs out. I get this error :
Incorrect key file for table 'C:\Windows\TEMP\#sql3838_2_6.MYI'; try to repair it
After reading the solutions online they say to increase the memory utilized by mysql and it's keys for indexing. I have tried this and I am still getting the same issue. I am not sure at this point if it's a bad configuration for mysql or a bar written query.
So I am really looking for a solution of optimizing my query so that it is more memory efficient or a change to my.config to handle the query. Or Splitting the query into 2 different inserts??? would that help?
MySQL Query
INSERT INTO schema.orphan_results (_Doc_ID, Orphan_Entries, Entries_Table, Orphan_File)
SELECT C.A__Doc_ID, C.A_File, C.A_Table, C.B_File
FROM( SELECT A._Doc_ID AS A__Doc_ID, A.File AS A_File, A.Table AS A_Table, B.File AS B_File
FROM schema.Temp_Entries A
LEFT JOIN schema.temp_dir_scan B ON A.File = B.File
UNION SELECT A._Doc_ID as A__Doc_ID, A.File AS A_File, A.Table AS A_Table, B.File AS B_File
FROM schema.Temp_Entries A
RIGHT JOIN schema.temp_dir_scan B ON A.File = B.File) C
WHERE C.A_File IS NULL OR C.B_File IS NULL
Here is my.config for MySql
default-storage-engine=INNODB
max_connections=800
query_cache_size=186M
table_cache=1520
tmp_table_size=900M
thread_cache_size=38
myisam_max_sort_file_size=100G
myisam_sort_buffer_size=268M
key_buffer_size=1160M
read_buffer_size=128K
read_rnd_buffer_size=512K
sort_buffer_size=512K
innodb_additional_mem_pool_size=96M
innodb_buffer_pool_size=563M
My System
16 Gigs of Mem
52 Gigs of Free disk space.
The error message usually results from low disc space, but since 52gigs should be enough (and I assume your filesystem can handle >2gb files) it might be something different.
The following 2 things should work to limit the required temporary space:
You should create an index for temp_dir_scan.File and Temp_Entries.File.
You should use "union all" instead of "union" (or, as you proposed, split the query).
And you could rewrite your code (still, create the index, please):
INSERT INTO schema.orphan_results (_Doc_ID, Orphan_Entries, Entries_Table, Orphan_File)
SELECT A._Doc_ID, A.File, A.Table, null
FROM schema.Temp_Entries A
where not exists (select 1 from schema.temp_dir_scan B where A.File = B.File)
-- or a.file is null -- you might need that if a.file can be null
INSERT INTO schema.orphan_results (_Doc_ID, Orphan_Entries, Entries_Table, Orphan_File)
select null, null, null, B.File
from schema.temp_dir_scan B
where not exists (select 1 from schema.Temp_Entries A where A.File = B.File)
Since UNION has a built-in distinct (though I am not sure if you are aware of that), you might want to use select distinct A._Doc_ID ..., but if you don't really need it, don't!

SQL statement hanging up in MySQL database

I am needing some SQL help. I have a SELECT statement that references several tables and is hanging up in the MySQL database. I would like to know if there is a better way to write this statement so that it runs efficiently and does not hang up the DB? Any help/direction would be appreciated. Thanks.
Here is the code:
Select Max(b.BurID) As BurID
From My.AppTable a,
My.AddressTable c,
My.BurTable b
Where a.AppID = c.AppID
And c.AppID = b.AppID
And (a.Forename = 'Bugs'
And a.Surname = 'Bunny'
And a.DOB = '1936-01-16'
And c.PostcodeAnywhereBuildingNumber = '999'
And c.PostcodeAnywherePostcode = 'SK99 9Q9'
And c.isPrimary = 1
And b.ErrorInd <> 1
And DateDiff(CurDate(), a.ApplicationDate) <= 30)
There is NO mysql error in the log. Sorry.
Pro tip: use explicit JOINs rather than a comma-separated list of tables. It's easier to see the logic you're using to JOIN that way. Rewriting your query to do that gives us this.
select Max(b.BurID) As BurID
From My.AppTable AS a
JOIN My.AddressTable AS c ON a.AppID = c.AppID
JOIN My.BurTable AS b ON c.AppID = b.AppID
WHERE (a.Forename = 'Bugs'
And a.Surname = 'Bunny'
And a.DOB = '1936-01-16'
And c.PostcodeAnywhereBuildingNumber = '999'
And c.PostcodeAnywherePostcode = 'SK99 9Q9'
And c.isPrimary = 1
And b.ErrorInd <> 1
And DateDiff(CurDate(), a.ApplicationDate) <= 30)
Next pro tip: Don't use functions (like DateDiff()) in WHERE clauses, because they defeat using indexes to search. That means you should change the last line of your query to
AND a.ApplicationDate >= CurDate() - INTERVAL 30 DAY
This has the same logic as in your query, but it leaves a naked (and therefore index-searchable) column name in the search expression.
Next, we need to look at your columns to see how you are searching, and cook up appropriate indexes.
Let's start with AppTable. You're screening by specific values of Forename, Surname, and DOB. You're screening by a range of ApplicationDate values. Finally you need AppID to manage your join. So, this compound index should help. Its columns are in the correct order to use a range scan to satisfy your query, and contains the needed results.
CREATE INDEX search1 USING BTREE
ON AppTable
(Forename, Surname, DOB, ApplicationDate, AppID)
Next, we can look at your AddressTable. Similar logic applies. You'll enter this table via the JOINed AppID, and then screen by specific values of three columns. So, try this index
CREATE INDEX search2 USING BTREE
ON AddressTable
(AppID, PostcodeAnywherePostcode, PostcodeAnywhereBuildingNumber, isPrimary)
Finally, we're on to your BurTable. Use similar logic as the other two, and try this index.
CREATE INDEX search3 USING BTREE
ON BurTable
(AppID, ErrorInd, BurID)
This kind of index is called a compound covering index, and can vastly speed up the sort of summary query you have asked about.

How to avoid filesort for that mysql query?

I'm using this kind of queries with different parameters :
EXPLAIN SELECT SQL_NO_CACHE `ilan_genel`.`id` , `ilan_genel`.`durum` , `ilan_genel`.`kategori` , `ilan_genel`.`tip` , `ilan_genel`.`ozellik` , `ilan_genel`.`m2` , `ilan_genel`.`fiyat` , `ilan_genel`.`baslik` , `ilan_genel`.`ilce` , `ilan_genel`.`parabirimi` , `ilan_genel`.`tarih` , `kgsim_mahalleler`.`isim` AS mahalle, `kgsim_ilceler`.`isim` AS ilce, (
SELECT `ilanresimler`.`resimlink`
FROM `ilanresimler`
WHERE `ilanresimler`.`ilanid` = `ilan_genel`.`id`
LIMIT 1
) AS resim
FROM (
`ilan_genel`
)
LEFT JOIN `kgsim_ilceler` ON `kgsim_ilceler`.`id` = `ilan_genel`.`ilce`
LEFT JOIN `kgsim_mahalleler` ON `kgsim_mahalleler`.`id` = `ilan_genel`.`mahalle`
WHERE `ilan_genel`.`ilce` = '703'
AND `ilan_genel`.`durum` = '1'
AND `ilan_genel`.`kategori` = '1'
AND `ilan_genel`.`tip` = '9'
ORDER BY `ilan_genel`.`id` DESC
LIMIT 225 , 15
and this is what i get in explain section:
these are the indexes that i already tried to use:
any help will be deeply appreciated what kind of index will be the best option or should i use another table structure ?
You should first simplify your query to understand your problem better. As it appears your problem is constrained to the ilan_gen1 table, the following query would also show you the same symptoms.:
SELECT * from ilan_gene1 WHERE `ilan_genel`.`ilce` = '703'
AND `ilan_genel`.`durum` = '1'
AND `ilan_genel`.`kategori` = '1'
AND `ilan_genel`.`tip` = '9'
So the first thing to do is check that this is the case. If so, the simpler question is simply why does this query require a file sort on 3661 rows. Now the 'hepsi' index sort order is:
ilce->mahelle->durum->kategori->tip->ozelik
I've written it that way to emphasise that it is first sorted on 'ilce', then 'mahelle', then 'durum', etc. Note that your query does not specify the 'mahelle' value. So the best the index can do is lookup on 'ilce'. Now I don't know the heuristics of your data, but the next logical step in debugging this would be:
SELECT * from ilan_gene1 WHERE `ilan_genel`.`ilce` = '703'`
Does this return 3661 rows?
If so, you should be able to see what is happening. The database is using the hepsi index, to the best of it's ability, getting 3661 rows back then sorting those rows in order to eliminate values according to the other criteria (i.e. 'durum', 'kategori', 'tip').
The key point here is that if data is sorted by A, B, C in that order and B is not specified, then the best logical thing that can be done is: first a look up on A then a filter on the remaining values against C. In this case, that filter is performed via a file sort.
Possible solutions
Supply 'mahelle' (B) in your query.
Add a new index on 'ilan_gene1' that doesn't require 'mahelle', i.e. A->C->D...
Another tip
In case I have misdiagnosed your problem (easy to do when I don't have your system to test against), the important thing here is the approach to solving the problem. In particular, how to break a complicated query into a simpler query that produces the same behaviour, until you get to a very simple SELECT statement that demonstrates the problem. At this point, the answer is usually much clearer.