MySQL: How to efficiently compare multiple fields between tables?

My expertise is not in MySQL, so I wrote this query and it has started to run increasingly slowly, taking 5 minutes or so with 100k rows in EquipmentData and 30k or so in EquipmentDataStaging (which to me is very little data):
CREATE TEMPORARY TABLE dataCompareTemp
SELECT eds.eds_id FROM equipmentdatastaging eds
INNER JOIN equipment e ON e.e_id_string = eds.eds_e_id_string
INNER JOIN equipmentdata ed ON e.e_id = ed.ed_e_id
AND eds.eds_ed_log_time=ed.ed_log_time
AND eds.eds_ed_unit_type=ed.ed_unit_type
AND eds.eds_ed_value = ed.ed_value
I am using this query to compare data rows pulled from a client's device to the current data sitting in their database. From there I take the temp table and use the IDs from it to make conditional decisions. I have e_id_string indexed and e_id indexed, and everything else is not. I know that it looks stupid that I have to compare all this information, but the client's system is spitting out redundant data and I am using this query to find it. Any help on this would be greatly appreciated, whether it is a different SQL approach or MySQL server management. I feel like when I do stuff like this in MSSQL it handles the requests much better, but that is probably because I have something set up incorrectly here.

TIPS
Index all the columns used in the ON or WHERE conditions. Here you need to index eds_ed_log_time, eds_e_id_string, eds_ed_unit_type, eds_ed_value, ed_e_id, ed_log_time, ed_unit_type, and ed_value (see the example index definitions below).
You can also change the syntax to SELECT STRAIGHT_JOIN ...; see the MySQL reference manual for more on STRAIGHT_JOIN.
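For example, composite indexes covering the comparison columns on each side might look like this (the index names are illustrative; the columns come from the query above):
CREATE INDEX idx_ed_compare
    ON equipmentdata (ed_e_id, ed_log_time, ed_unit_type, ed_value);
CREATE INDEX idx_eds_compare
    ON equipmentdatastaging (eds_e_id_string, eds_ed_log_time, eds_ed_unit_type, eds_ed_value);
With covering indexes like these, MySQL can resolve the join from the indexes rather than scanning and comparing full rows.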

Related

Right way to phrase MySQL query across many (possibly empty) tables

I'm trying to do what I think is a set of simple set operations on a database table: several intersections and one union. But I don't seem to be able to express that in a simple way.
I have a MySQL table called Moment, which has many millions of rows. (It happens to be a time-series table but that doesn't impact on my problem here; however, these data have a column 'source' and a column 'time', both indexed.) Queries to pull data out of this table are created dynamically (coming in from an API), and ultimately boil down to a small pile of temporary tables indicating which 'source's we care about, and maybe the 'time' ranges we care about.
Let's say we're looking for
(source in Temp1) AND (
((source in Temp2) AND (time > '2017-01-01')) OR
((source in Temp3) AND (time > '2016-11-15'))
)
Just for excitement, let's say Temp2 is empty --- that part of the API request was valid but happened to include 'no actual sources'.
If I then do
SELECT m.* from Moment as m,Temp1,Temp2,Temp3
WHERE (m.source = Temp1.source) AND (
((m.source = Temp2.source) AND (m.time > '2017-01-01')) OR
((m.source = Temp3.source) AND (m.time > '2016-11-15'))
)
... I get a heaping mound of nothing, because the empty Temp2 gives an empty Cartesian product before we get to the WHERE clause.
Okay, I can do
SELECT m.* from Moment as m
LEFT JOIN Temp1 on m.source=Temp1.source
LEFT JOIN Temp2 on m.source=Temp2.source
LEFT JOIN Temp3 on m.source=Temp3.source
WHERE (m.source = Temp1.source) AND (
((m.source = Temp2.source) AND (m.time > '2017-01-01')) OR
((m.source = Temp3.source) AND (m.time > '2016-11-15'))
)
... but this takes >70ms even on my relatively small development database.
If I manually eliminate the empty table,
SELECT m.* from Moment as m,Temp1,Temp3
WHERE (m.source = Temp1.source) AND (
((m.source = Temp3.source) AND (m.time > '2016-11-15'))
)
... it finishes in 10ms. That's the kind of time I'd expect.
I've also tried putting a single unmatchable row in the empty table and doing SELECT DISTINCT, and it splits the difference at ~40ms. Seems an odd solution though.
This really feels like I'm just conceptualizing the query wrong, that I'm asking the database to do more work than it needs to. What is the Right Way to ask the database this question?
Thanks!
--UPDATE--
I did some actual benchmarks on my actual database, and came up with some really unexpected results.
For the scenario above, all tables indexed on the columns being compared, with an empty table,
doing it with left joins took 3.5 minutes (!!!)
doing it without joins (just 'FROM...WHERE') and adding a null row to the empty table, took 3.5 seconds
even more striking, when there wasn't an empty table, but rather ~1000 rows in each of the temporary tables,
doing the whole thing in one query took 28 minutes (!!!!!), but,
doing each of the three AND clauses separately and then doing the final combination in the code took less than a second.
I still feel I'm expressing the query in some foolish way, since again, all I'm trying to do is one set union (OR) and a few set intersections. It really seems like the DB is making this gigantic Cartesian product when it seriously doesn't need to. All in all, as pointed out in the answer below, keeping some of the intelligence up in the code seems to be the better approach here.
There are various ways to tackle the problem. Needless to say it depends on
how many queries are sent to the database,
the amount of data you are processing in a time interval,
how the database backend is configured to manage it.
For your use case, a little more information would be helpful. One option would be to optimize your query with CASE/COUNT(*) or CASE/LIMIT combinations to sort out empty tables; however, such conditional queries cost more time.
You could split the SQL code to downgrade the scaling of the problem from 1*N^x to y*N^z, where z should be smaller than x.
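For instance, the OR in the query above can be rewritten as a UNION of two simple intersections, so that an empty Temp2 only empties its own branch rather than wiping out the whole result (a sketch against the tables shown in the question):
SELECT m.* FROM Moment AS m
JOIN Temp1 ON m.source = Temp1.source
JOIN Temp2 ON m.source = Temp2.source
WHERE m.time > '2017-01-01'
UNION
SELECT m.* FROM Moment AS m
JOIN Temp1 ON m.source = Temp1.source
JOIN Temp3 ON m.source = Temp3.source
WHERE m.time > '2016-11-15';
Each branch is a plain indexed join, and the UNION takes care of removing duplicates.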
You said that an API is involved; maybe you can handle the temporary "no data" tables differently, or even avoid storing them?
Another option would be to enable query caching:
https://dev.mysql.com/doc/refman/5.5/en/query-cache-configuration.html
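For reference, turning the query cache on at runtime looks roughly like this (the sizes are illustrative values, and note that the query cache was removed entirely in MySQL 8.0):
SET GLOBAL query_cache_type = 1;         -- cache eligible SELECT results
SET GLOBAL query_cache_size = 67108864;  -- 64 MB of memory for cached results
SET GLOBAL query_cache_limit = 1048576;  -- don't cache results larger than 1 MB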

Performance issues of mysql query as compared to oracle query

I have created a complex view which gives me output within a second on an Oracle 10g DBMS, but the same view takes 2-3 minutes on a MySQL DBMS. I have created indexes on all the fields included in the view definition and also increased the query_cache_size, but I still can't get the answer in less time. My query is given below:
select * from results where beltno<1000;
And my view is:
create view results as select person_biodata.*,current_rank.*,current_posting.* from person_biodata,current_rank,current_posting where person_biodata.belt_no=current_rank.belt_no and person_biodata.belt_no=current_posting.belt_no ;
The current_posting view is defined as follows:
select p.belt_no belt_no,ps.ps_name police_station,pl.pl_name posting_as,p.st_date from p_posting p,post_list pl,police_station ps where p.ps_id=ps.ps_id and p.pl_id=pl.pl_id and (p.belt_no,p.st_date) IN(select belt_no,max(st_date) from p_posting group by belt_no);
The current_rank view is defined as follows:
select p.belt_no belt_no,r.r_name from p_rank p,rank r where p.r_id=r.r_id and (p.belt_no,p.st_date) IN (select belt_no,max(st_date) from p_rank group by belt_no)
Some versions of MySQL have a particular problem with IN and subqueries, which you have in this view:
select p.belt_no belt_no,ps.ps_name police_station,pl.pl_name posting_as,p.st_date
from p_posting p,post_list pl,police_station ps
where p.ps_id=ps.ps_id and p.pl_id=pl.pl_id and
(p.belt_no,p.st_date) IN(select belt_no,max(st_date) from p_posting group by belt_no)
Try changing that to:
where exists (select 1
from p_posting
group by belt_no
having belt_no = p.belt_no and p.st_date = max(st_date)
)
There may be other issues, of course. At the very least, you could format your queries so they are readable and use ANSI standard join syntax. Being able to read the queries would be the first step to improving their performance. Then you should use explain in MySQL to see what the query plans are like.
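Combining both suggestions, the current_posting view could be rewritten roughly like this (a sketch only, using the tables and columns shown above; the same pattern would apply to current_rank):
CREATE OR REPLACE VIEW current_posting AS
SELECT p.belt_no AS belt_no,
       ps.ps_name AS police_station,
       pl.pl_name AS posting_as,
       p.st_date
FROM p_posting p
JOIN police_station ps ON p.ps_id = ps.ps_id
JOIN post_list pl ON p.pl_id = pl.pl_id
WHERE EXISTS (SELECT 1
              FROM p_posting p2
              WHERE p2.belt_no = p.belt_no
              GROUP BY p2.belt_no
              HAVING MAX(p2.st_date) = p.st_date);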
Muhammad Jawad, it's quite simple: you have already created indexes, which allow the database to find data fast. But if you modify an indexed table (e.g. INSERT, UPDATE, DELETE), it takes more time than on a table with no indexes, because each index has to be updated as well, and that takes time. So you should only apply indexes to columns that are used for searching. Hope this helps.

MySQL CONCAT '*' symbol toasts the database

I have a table used for lookups which stores the human-readable value in one column and the same text stripped of special characters and spaces in another. E.g., the value "Children's Shows" would appear in the lookup column as "childrens-shows".
Unfortunately the corresponding main table isn't quite that simple - for historical reasons I didn't create myself and now would be difficult to undo, the lookup value is actually stored with surrounding asterisks, e.g. '*childrens-shows*'.
So, while trying to join the lookup table sans-asterisks with the main table that has asterisks, I figured CONCAT would help me add them on-the-fly, e.g.:
SELECT *
FROM main_table m
INNER JOIN lookup_table l
ON l.value = CONCAT('*',m.value,'*')
... and then the table was toast. Not sure if I created an infinite loop or really screwed the data, but it required an ISP backup to get the table responding again. I suspect it's because the '*' symbol is probably reserved, like a wildcard, and I've asked the database to do the equivalent of licking its own elbow. Either way, I'm hesitant to 'experiment' to find the answer given the spectacular way it managed to kill the database.
Thanks in advance to anyone who can (a) tell me what the above actually did to the database, and (b) how I should actually join the tables?
When using CONCAT, MySQL won't use the index. Use EXPLAIN to check this, but a recent problem I had was that on a large table, the indexed column was there, but the key was not used. This should not bork the whole table however, just make it slow. Possibly it ran out of memory, started to swap and then crashed halfway, but you'd need to check the logs to find out.
However, the root cause is clearly bad table design and that's where the solution lies. Any answer you get that allows you to work around this can only be temporary at best.
Best solution is to move this data into a separate table. 'Childrens shows' sounds like a category and therefore repeated data in many rows. This should really be an id for a 'categories' table, which would prevent the DB from having to run CONCAT on every single row in the table, as you could do this:
SELECT *
FROM main_table m
INNER JOIN lookup_table l
ON l.value = m.value
/* and optionally */
INNER JOIN categories cat
ON l.value = cat.id
WHERE cat.name = 'whatever'
I know this is not something you may be able to do given the information you supplied in your question, but really the reason for not being able to make such a change to a badly normalised DB is more important than the code here. Without either the resources or political backing to do things the right way, you will end up with even more headaches like this, which will end up costing more in the long term. Time for a word with the boss perhaps :)
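If you do get the go-ahead, one possible shape of that migration is sketched below. The category_id column name is hypothetical, and it assumes lookup_table has an integer primary key called id and that, as described above, main_table stores the value wrapped in asterisks:
ALTER TABLE main_table ADD COLUMN category_id INT NULL;

UPDATE main_table m
JOIN lookup_table l ON l.value = TRIM(BOTH '*' FROM m.value)  -- strip the stored asterisks once
SET m.category_id = l.id;

-- After the backfill, the join becomes a plain indexed equality:
SELECT *
FROM main_table m
INNER JOIN lookup_table l ON l.id = m.category_id;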

How can I improve this multiple join monster query?

For your sakes I'm not gonna paste it all here; just know that it's hella long and involves a lot of joins.
The table structure i'm working on, at its most complicated, looks like this:
tables = Co, Cn, O, A, N;
scheme (simplified):
N(n)-> (1)every other table
A(n)-> (1)every other table except N
O(n)-> (1)Co, Cn
Cn(n) -> (1)Co
Co(1) -> (1)Cn
The most laborious part of the query goes through the structure like this (I'm trying to get information from table N but also from the other ones):
N join A join O join ((Co join Cn) union (Cn join Co ))
I need to get the contents from N, but I also need stuff from Co and Cn, and I must go through A and O to get them. Now, this may not seem like a huge query, but N can also relate to every one of those other tables and I need to get them ALL; let's just say I can't see the content of the query if I echo it in the application. I join each case separately (N->A, N->O, N->Cn, etc.) and then gather them all up with a UNION.
At the moment, with almost 200,000 rows in N and a couple hundred to almost a thousand in each of the other tables, the database server and the website just hang when I try to run the query.
One solution I thought of was to ditch the foreign keys and make a field for links that goes, for example, from N through all the others: "A_IdA,O_IdO,Co_IdCo,Cn_IdCn", and process it in PHP; but I'd rather try something else before such a drastic change in table design.
I'm having a hard time trying to grasp your exact table structure and query.
However, I feel like you're trying to combine too much into one query, which makes the joins extremely stressful on the DB server. You'd most likely be better off performing multiple simpler queries than trying to do it all in one huge query.
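For example, instead of one giant join-plus-UNION, the lookups could be staged, roughly like this (the table names come from the simplified scheme above, but the id and foreign-key column names are purely hypothetical since the real schema isn't shown):
-- 1. Pull the rows of interest from N with just their foreign keys:
SELECT n.id, n.a_id, n.o_id
FROM N n
WHERE n.id < 1000;

-- 2. Fetch the related rows in separate, smaller queries using those ids:
SELECT a.* FROM A a WHERE a.id IN (1, 2, 3);   -- a_id values collected from step 1
SELECT o.id, co.*, cn.*
FROM O o
JOIN Co co ON co.id = o.co_id
JOIN Cn cn ON cn.id = o.cn_id
WHERE o.id IN (4, 5, 6);                       -- o_id values collected from step 1
The final combination of the pieces then happens in the application, which keeps each individual query cheap for the optimizer.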

MYSQL Triple Join Performance help, Copying to Tmp Table

I'm working on a query for a news site, which will find FeaturedContent for display on the main homepage. Content marked this way is tagged as 'FeaturedContent', and ordered in a featured table by 'homepage'. I currently have the desired output, but the query runs in over 3 seconds, which I need to cut down on. How does one optimize a query like the one which follows?
EDIT: Materialized the view every minute as suggested, down to .4 seconds:
SELECT f.position, s.item_id, s.item_type, s.title, s.caption, s.date
FROM live.search_all s
INNER JOIN live.tags t
ON s.item_id = t.item_id AND s.item_type = t.item_type AND t.tag = 'FeaturedContent'
LEFT OUTER JOIN live.featured f
ON s.item_id = f.item_id AND s.item_type = f.item_type AND f.feature_type = 'homepage'
ORDER BY position IS NULL, position ASC, date
This returns all the homepage features in order, followed by other featured content ordered by date.
The explain looks like this:
|-id---|-select_type-|-table-|-type---|-possible_keys---------|-key--------|-key_len-|-ref---------------------------------------|-rows--|-Extra-------------------------------------------------------------|
|-1----|-SIMPLE------|-t2----|-ref----|-PRIMARY,tag_index-----|-tag_index--|-303-----|-const-------------------------------------|-2-----|-Using where; Using index; Using temporary; Using filesort;--------|
|-1----|-SIMPLE------|-t-----|-ref----|-PRIMARY---------------|-PRIMARY----|-4-------|-newswires.t2.id---------------------------|-1974--|-Using index-------------------------------------------------------|
|-1----|-SIMPLE------|-s-----|-eq_ref-|-PRIMARY, search_index-|-PRIMARY----|-124-----|-newswires.t.item_id,newswires.t.item_type-|-1-----|-------------------------------------------------------------------|
|-1----|-SIMPLE------|-f-----|-index--|-NULL------------------|-PRIMARY----|-190-----|-NULL--------------------------------------|-13----|-Using index-------------------------------------------------------|
And the Profile is as follows:
|-Status---------------|-Time-----|
|-starting-------------|-0.000091-|
|-Opening tables-------|-0.000756-|
|-System lock----------|-0.000005-|
|-Table lock-----------|-0.000008-|
|-init-----------------|-0.000004-|
|-checking permissions-|-0.000001-|
|-checking permissions-|-0.000001-|
|-checking permissions-|-0.000043-|
|-optimizing-----------|-0.000019-|
|-statistics-----------|-0.000127-|
|-preparing------------|-0.000023-|
|-Creating tmp table---|-0.001802-|
|-executing------------|-0.000001-|
|-Copying to tmp table-|-0.311445-|
|-Sorting result-------|-0.014819-|
|-Sending data---------|-0.000227-|
|-end------------------|-0.000002-|
|-removing tmp table---|-0.002010-|
|-end------------------|-0.000005-|
|-query end------------|-0.000001-|
|-freeing items--------|-0.000296-|
|-logging slow query---|-0.000001-|
|-cleaning up----------|-0.000007-|
I'm new to reading the EXPLAIN output, so I'm unsure if I have a better ordering available, or anything rather simple that could be done to speed these along.
The search_all table is the materialized view table which is periodically updated, while the tags and featured tables are views. These views are not optional, and cannot be worked around.
The tags view combines tags and a relational table to get back a listing of tags according to item_type and item_id, but the other views are all simple views of one table.
EDIT: With the materialized view, the biggest bottleneck seems to be the 'Copying to tmp table' step. Without ordering the output, it takes .0025 seconds (much better!), but the final output does need to be ordered. Is there any way to improve the performance of that step, or work around it?
Sorry if the formatting is difficult to read, I'm new and unsure how it is regularly done.
Thanks for your help! If anything else is needed, please let me know!
EDIT: Table sizes, for reference:
Tag Relations: 197,411
Tags: 16,897
Stories: 51,801
Images: 28,383
Videos: 2,408
Featured: 13
I think optimizing your query alone won't be very useful. My first thought is that joining a subquery, itself made of UNIONs, is by itself a double bottleneck for performance.
If you have the option to change your database structure, then I would suggest merging the 3 tables stories, images and videos into one, since they appear to be very similar, adding a type column ENUM('story', 'image', 'video') to differentiate the records; this would remove both the subquery and the union.
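A sketch of what that merged table might look like (the table name and column types are guesses based on the columns selected in the query above):
CREATE TABLE content (
  item_id   INT UNSIGNED NOT NULL,
  item_type ENUM('story', 'image', 'video') NOT NULL,
  title     VARCHAR(255) NOT NULL,
  caption   TEXT,
  date      DATETIME NOT NULL,
  PRIMARY KEY (item_id, item_type),
  KEY idx_date (date)
) ENGINE=InnoDB;
The featured query could then join this single table directly, with no UNION and no subquery.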
Also, it looks like your views on stories and videos are not using an indexed field to filter content. Are you querying an indexed column?
It's a pretty tricky problem without knowing your full table structure and the distribution of your data!
Another option, which would not involve bringing modifications to your existing database (especially if it is already in production), would be to "cache" this information into another table, which would be periodically refreshed by a cron job.
The caching can be done at different levels, either on the full query, or on subparts of it (independent views, or the 3 unions merged into a single cache table, etc.)
The viability of this option depends on whether it is acceptable to display slightly outdated data, or not. It might be acceptable for just some parts of your data, which may imply that you will cache just a subset of the tables/views involved in the query.
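As a sketch of that cache-table approach, the query from the question could feed a plain table that a cron job rebuilds every minute or so (the table name and the column types below are assumptions):
CREATE TABLE featured_content_cache (
  position  INT NULL,
  item_id   INT UNSIGNED NOT NULL,
  item_type VARCHAR(32) NOT NULL,
  title     VARCHAR(255),
  caption   TEXT,
  date      DATETIME,
  PRIMARY KEY (item_id, item_type)
);

-- Run periodically (e.g. from cron) to refresh the cache:
TRUNCATE TABLE featured_content_cache;
INSERT INTO featured_content_cache (position, item_id, item_type, title, caption, date)
SELECT f.position, s.item_id, s.item_type, s.title, s.caption, s.date
FROM live.search_all s
INNER JOIN live.tags t
  ON s.item_id = t.item_id AND s.item_type = t.item_type AND t.tag = 'FeaturedContent'
LEFT OUTER JOIN live.featured f
  ON s.item_id = f.item_id AND s.item_type = f.item_type AND f.feature_type = 'homepage';

-- The homepage then reads the pre-built list with a cheap ordered scan:
SELECT *
FROM featured_content_cache
ORDER BY position IS NULL, position ASC, date;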