Slow MySQL query after using inner join and subquery

I have spent hours trying to get my query to run faster. It works on my database, but it takes 43 seconds to return the result.
Basically, two tables are joined, and I need to return only the latest order_history_id for each order_id with an order_status of 12.
I have tried using table aliases (T1, T2, etc.), but to keep it simple the query below uses the full table names. Any help is greatly appreciated.
SELECT oc_order.order_id, oc_order.firstname, oc_order.lastname
FROM oc_order
INNER JOIN oc_order_history ON oc_order.order_id = oc_order_history.order_id
AND oc_order_history.comment NOT LIKE ''
AND oc_order_history.order_status_id LIKE '12'
AND order_history_id = (SELECT max(order_history_id)
FROM oc_order_history i
WHERE i.order_id = oc_order.order_id)

INNER JOIN can be a little more resource-hungry because it adds another filtering condition: logically it behaves like a LEFT JOIN plus WHERE {joining column} IS NOT NULL. Having indexes on the columns that take part in joins and/or WHERE clauses (especially string columns that are searched with LIKE) helps to minimise the cost.
In your query you are using LIKE where you do not need it. LIKE is meant for searching strings when you only know part of the string, e.g.
name LIKE 'Pet%'
will match all the rows with names like Pete, Peter, Petra, Petronela, Petriarca, Peter Pan, etc. If you want to search for an exact value, use the comparison operators =, <>, <=, >=, e.g.
name = 'Peter'
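To make the difference between the two predicates concrete, here is a minimal sketch. It assumes an in-memory SQLite database (via Python's standard sqlite3 module) as a stand-in for MySQL, and the people table and its rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the table and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
conn.executemany(
    "INSERT INTO people (name) VALUES (?)",
    [("Pete",), ("Peter",), ("Petra",), ("Paul",)],
)

# Prefix search: LIKE with a wildcard matches every name starting with 'Pet'.
prefix_matches = [row[0] for row in conn.execute(
    "SELECT name FROM people WHERE name LIKE 'Pet%' ORDER BY name"
)]

# Exact match: plain equality, no pattern matching involved.
exact_matches = [row[0] for row in conn.execute(
    "SELECT name FROM people WHERE name = 'Peter'"
)]

print(prefix_matches)
print(exact_matches)
```

The same reasoning applies to the order_status_id filter in the question: it compares against one known value, so equality is enough.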
Now consider also this query:
SELECT o.order_id, o.firstname, o.lastname
FROM oc_order o
LEFT JOIN oc_order_history oh USING(order_id)
WHERE oh.order_status_id = 12
AND oh.order_history_id IN (
SELECT MAX(order_history_id)
FROM oc_order_history
GROUP BY order_id
)

Do not use LIKE unless you want to search for text inside a field.
Also, do not put the WHERE conditions into the JOIN conditions.
In addition, check that you have indexes on the fields that you are joining and filtering on.
Update: after taking a look at your tables, I realised that oc_order_history.comment is a TEXT column, so searching on this column is going to be very slow. If you remove that condition (like I did), I am sure your query will be much faster. If you have to filter on the comment column, try changing it to a VARCHAR or putting an index on it.
Also, you should have indexes at least on oc_order_history(order_status_id) and on oc_order_history(order_id):
SELECT oc_order.order_id, oc_order.firstname, oc_order.lastname
FROM oc_order
INNER JOIN oc_order_history ON oc_order.order_id = oc_order_history.order_id
WHERE oc_order_history.order_status_id = 12
AND order_history_id = (SELECT MAX(order_history_id)
FROM oc_order_history i
WHERE i.order_id = oc_order.order_id)
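Another way to express "latest history row per order" is to compute the maximum order_history_id per order once, in a derived table, and join back to it. The following is only a sketch under assumptions: SQLite (through Python's sqlite3) stands in for MySQL, the sample rows are invented, and the index name idx_history_order is hypothetical. Whether this form is faster than the correlated subquery on a real MySQL server depends on the version and the data, so it is worth comparing both with EXPLAIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE oc_order (
        order_id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
    CREATE TABLE oc_order_history (
        order_history_id INTEGER PRIMARY KEY,
        order_id INTEGER, order_status_id INTEGER);
    -- Hypothetical composite index: lets the MAX-per-order lookup use the index.
    CREATE INDEX idx_history_order
        ON oc_order_history (order_id, order_history_id);
""")
conn.executemany("INSERT INTO oc_order VALUES (?, ?, ?)",
                 [(1, "Ann", "A"), (2, "Bob", "B"), (3, "Cy", "C")])
# (order_history_id, order_id, order_status_id); invented sample data.
conn.executemany("INSERT INTO oc_order_history VALUES (?, ?, ?)",
                 [(1, 1, 5), (2, 1, 12), (3, 2, 12), (4, 2, 7), (5, 3, 12)])

latest_status_12 = conn.execute("""
    SELECT o.order_id, o.firstname, o.lastname
    FROM oc_order o
    JOIN (SELECT order_id, MAX(order_history_id) AS last_id
          FROM oc_order_history
          GROUP BY order_id) latest ON latest.order_id = o.order_id
    JOIN oc_order_history oh ON oh.order_history_id = latest.last_id
    WHERE oh.order_status_id = 12
    ORDER BY o.order_id
""").fetchall()

print(latest_status_12)
```

Order 2 is left out because its most recent history row has status 7, even though an earlier row had status 12.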

Related

MySQL LEFT JOIN order of ON conditions

I have a MySQL query with inner joins and one left join and a lot of data in my database, and it's running quite slow. This is roughly my query:
SELECT
main_table.*
FROM
main_table
INNER JOIN
...
LEFT JOIN
second_table ON (main_table.id = second_table.ref_id AND second_table.type = 'foo' AND second_table.bar IS NULL)
WHERE
second_table.id IS NULL
;
An entry from main_table may have one or more referenced entries in second_table. I want to get all results from main_table that either have no results in second_table or only have irrelevant data in the second table (type 'foo' or bar is NULL).
Looking at the EXPLAIN output, MySQL searches for bar IS NULL first, followed by type = 'foo', which still leaves many thousands of results, whereas checking ref_id first would leave only very few results to check the other conditions on.
I only have an index on ref_id, not on type or bar, and I don't feel the need to index them if I could just get the query to search on ref_id first.
--EDIT: I noticed that the copy of the database (the one that has the actual data and runs slowly) also has an index on type and on bar individually, so that's probably why MySQL prefers bar over the other keys. I'm considering a key spanning multiple fields.--
Does anybody have an idea how to optimize this kind of query? Is it possible to force MySQL to use a certain order for the ON conditions?
"Solution": I added an index spanning all the relevant fields.
I don't consider this a real solution, because I believe it would also have been faster if the JOIN had been done on the indexed ref_id first. It probably did so when that was the only index; however, my colleague had the idea to add separate indexes on the other fields as well for some reason, probably because they are needed somewhere else in our application.
What happens if you move the "irrelevant" rows to the WHERE part?
It seems to me the DB should have an easier time joining the tables, and will use the index.
Something like:
SELECT
main_table.*
FROM
main_table
INNER JOIN
...
LEFT JOIN
second_table ON main_table.id = second_table.ref_id
WHERE
second_table.id IS NULL OR
(second_table.type = 'foo' AND second_table.bar IS NULL)
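One thing to check before adopting a rewrite like this: with a LEFT JOIN, a condition in the ON clause and the same condition in the WHERE clause do not mean the same thing, so the two queries can return different rows. The sketch below assumes SQLite (via Python's sqlite3) as a stand-in for MySQL and uses three invented rows; the elided INNER JOIN from the question is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main_table (id INTEGER PRIMARY KEY);
    CREATE TABLE second_table (
        id INTEGER PRIMARY KEY, ref_id INTEGER, type TEXT, bar INTEGER);
    INSERT INTO main_table (id) VALUES (1), (2), (3);
    -- main 1: no rows; main 2: a 'foo' row with bar NULL; main 3: another row.
    INSERT INTO second_table (id, ref_id, type, bar)
        VALUES (10, 2, 'foo', NULL), (11, 3, 'other', 1);
""")

# Conditions in ON: rows without a matching 'foo'/NULL entry survive.
conditions_in_on = [r[0] for r in conn.execute("""
    SELECT m.id
    FROM main_table m
    LEFT JOIN second_table s
           ON m.id = s.ref_id AND s.type = 'foo' AND s.bar IS NULL
    WHERE s.id IS NULL
    ORDER BY m.id
""")]

# Conditions in WHERE: filtering happens after the join has been built.
conditions_in_where = [r[0] for r in conn.execute("""
    SELECT m.id
    FROM main_table m
    LEFT JOIN second_table s ON m.id = s.ref_id
    WHERE s.id IS NULL OR (s.type = 'foo' AND s.bar IS NULL)
    ORDER BY m.id
""")]

print(conditions_in_on)
print(conditions_in_where)
```

On this sample data the first form keeps rows 1 and 3 while the second keeps rows 1 and 2, so the rewrite is only a drop-in replacement if that second behaviour is what is actually wanted.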
In MySQL, JOIN is faster than LEFT JOIN, so you can write your query like this:
SELECT
main_table.*
FROM
main_table
INNER JOIN
...
LEFT JOIN (SELECT second_table.id, second_table.ref_id FROM main_table
JOIN second_table ON main_table.id = second_table.ref_id AND
second_table.type = 'foo' AND second_table.bar IS NULL) AS main_table2 ON
main_table2.ref_id = main_table.id
WHERE
main_table2.id IS NULL;

Using an INNER JOIN without returning any columns from the joined table

Running an INNER JOIN type of query, I get duplicate column names, which can pose a problem. This has been covered here extensively, and I was able to find the solution, which, aside from being fairly logical, is to SELECT only the columns I need.
However, I would like to know how I could run such a query without actually returning any of the columns from the joined table.
This is my MySQL query
SELECT * FROM product z
INNER JOIN crosslink__productXmanufacturer a
ON z.id = a.productId
WHERE
(z.title LIKE "%search_term%" OR z.search_keywords LIKE "%search_term%")
AND
z.availability = 1
AND
a.manufacturerId IN (22,23,24)
Question
How would I modify this MySQL query in order to return only columns from product and none of the columns from crosslink__productXmanufacturer?
Add the table name to the *. Replace
SELECT * FROM product z
with
SELECT z.* FROM product z
Often when you are doing this, the intention may be clearer using in or exists rather than a join. The join is being used for filtering, so putting the condition in the where clause makes sense:
SELECT p.*
FROM product p
WHERE (p.title LIKE '%search_term%' OR p.search_keywords LIKE '%search_term%') AND
p.availability = 1 AND
exists (SELECT 1
FROM crosslink__productXmanufacturer pXm
WHERE pXm.productId = p.id AND pXm.manufacturerId IN (22, 23, 24)
);
With the proper indexes, this should run at least as fast as the join version (the index is crosslink__productXmanufacturer(productId, manufacturerId)). In addition, you don't have to worry about returning duplicate records if there are multiple matches in crosslink__productXmanufacturer.
You may notice two other small changes I made to the query. First, the table aliases are abbreviations of the table names, making the logic easier to follow. Second, the string constants use single quotes (the ANSI standard) rather than double quotes. Using single quotes only for string and date constants helps prevent inadvertent syntax errors.
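The point about duplicates can be seen on a tiny data set. This is a sketch only: SQLite (via Python's sqlite3) stands in for MySQL, the rows and the search term 'widget' are invented, and the index name idx_pxm is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (
        id INTEGER PRIMARY KEY, title TEXT,
        search_keywords TEXT, availability INTEGER);
    CREATE TABLE crosslink__productXmanufacturer (
        productId INTEGER, manufacturerId INTEGER);
    -- Hypothetical name for the index suggested in the answer.
    CREATE INDEX idx_pxm
        ON crosslink__productXmanufacturer (productId, manufacturerId);
    INSERT INTO product VALUES
        (1, 'red widget', '', 1), (2, 'blue widget', '', 1),
        (3, 'green gadget', '', 1);
    -- Product 1 is linked to two of the requested manufacturers.
    INSERT INTO crosslink__productXmanufacturer VALUES
        (1, 22), (1, 23), (2, 99), (3, 22);
""")

join_ids = [r[0] for r in conn.execute("""
    SELECT z.id
    FROM product z
    INNER JOIN crosslink__productXmanufacturer a ON z.id = a.productId
    WHERE (z.title LIKE '%widget%' OR z.search_keywords LIKE '%widget%')
      AND z.availability = 1
      AND a.manufacturerId IN (22, 23, 24)
    ORDER BY z.id
""")]

exists_ids = [r[0] for r in conn.execute("""
    SELECT p.id
    FROM product p
    WHERE (p.title LIKE '%widget%' OR p.search_keywords LIKE '%widget%')
      AND p.availability = 1
      AND EXISTS (SELECT 1
                  FROM crosslink__productXmanufacturer pXm
                  WHERE pXm.productId = p.id
                    AND pXm.manufacturerId IN (22, 23, 24))
    ORDER BY p.id
""")]

print(join_ids)    # product 1 appears once per matching manufacturer
print(exists_ids)  # product 1 appears once
```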

Populate 'temporary' columns with corresponding values during MySQL join query, also limit

I'm doing several MySQL joins to get template variables (i.e. custom fields) and their values (in MODX Evo but it's irrelevant - this is a general MySQL query).
Ideally, I'm looking to create 2 temporary columns in order to use ORDER BY in the query, or something to this effect. I'd like to populate the values for 'event_date' and 'event_featured' for their corresponding ids in these new columns, so that I could then sort the results by these columns.
On a very related note I would like to limit the results to 20 for each unique id, not for each row as would happen if I added LIMIT- it would crop the below result to the . Can this be accomplished at the same time?
Anybody know how / if these are possible? Many thanks in advance.
Code and image of the results below:
SELECT DISTINCT
content.id, content.pagetitle, content.template , content.published,
templates.templatename,
tv_props.name,
tv_values.value
FROM `modx_site_content` AS `content`
LEFT JOIN `modx_site_templates` AS `templates` ON content.template=templates.id
LEFT JOIN `modx_site_tmplvar_templates` AS `template_tvs` ON templates.id=template_tvs.templateid
LEFT JOIN `modx_site_tmplvars` AS `tv_props` ON template_tvs.tmplvarid=tv_props.id
LEFT JOIN `modx_site_tmplvar_contentvalues` AS `tv_values` ON template_tvs.tmplvarid=tv_values.tmplvarid
WHERE templates.id=89
AND (
tv_props.name='event_featured'
OR tv_props.name='event_link_through'
OR tv_props.name='event_title'
OR tv_props.name='event_date'
OR tv_props.name='event_date_text'
OR tv_props.name='event_short_description'
OR tv_props.name='event_list_image'
);
You're going to need a couple of virtual tables, also known as subqueries, to retrieve these two properties of events from your name/value table. The generic name for this kind of query is a "pivot," for your information.
The mental knack is to think of the subquery as a virtual table which you can use in a surrounding query. The subquery for event_date looks like this, I believe.
SELECT content.id AS id,
tv_values.value AS event_date
FROM modx_site_content AS content
LEFT JOIN modx_site_templates AS templates
ON content.template=templates.id
LEFT JOIN modx_site_tmplvar_templates AS template_tvs
ON templates.id=template_tvs.templateid
LEFT JOIN modx_site_tmplvars AS tv_props
ON template_tvs.tmplvarid=tv_props.id
LEFT JOIN modx_site_tmplvar_contentvalues AS tv_values
ON template_tvs.tmplvarid=tv_values.tmplvarid
WHERE tv_props.name = 'event_date'
This little query produces a resultset that's a table relating content id to event date. I honestly don't understand your schema well enough to know if there's just one event date for each content id, so you might need to adjust this query to SELECT more columns. As you debug this, you should try out the subquery and make sure it's giving the results you hope for.
Then, when you're sure the subquery is OK, you join that subquery into your overall query, generically like so.
SELECT DISTINCT
content.id, event_date.event_date, templates.column,
table.column, table.column, etc, etc
FROM modx_site_content AS content
LEFT JOIN table ON condition
LEFT JOIN (
SELECT content.id AS id,
tv_values.value AS event_date
FROM modx_site_content AS content
LEFT JOIN modx_site_templates AS templates
ON content.template=templates.id
LEFT JOIN modx_site_tmplvar_templates AS template_tvs
ON templates.id=template_tvs.templateid
LEFT JOIN modx_site_tmplvars AS tv_props
ON template_tvs.tmplvarid=tv_props.id
LEFT JOIN modx_site_tmplvar_contentvalues AS tv_values
ON template_tvs.tmplvarid=tv_values.tmplvarid
WHERE tv_props.name = 'event_date'
) AS event_date ON event_date.id = content.id
LEFT JOIN etc, etc, etc.
WHERE etc etc etc
Do you see how that goes? You can use tablename AS table or (some query) AS table interchangeably. You can also define a VIEW in your schema that provides the same data, and name it in your query. That's a handy way to make your queries less hairy.
By the way, your query will be easier to read, and may run faster, if you change
AND (
tv_props.name='event_featured'
OR tv_props.name='event_link_through'
OR tv_props.name='event_title' etc )
to
AND tv_props.name IN ('event_featured',
'event_link_through',
'event_title', etc)
You've probably noticed I'm a bit of a stickler for indentation in SQL queries. I find this helpful; I often find mistakes while I'm fixing up the indentation. Your practice may vary.
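Here is a compact, runnable version of the same pivot idea: one derived table per property, each joined back on the content id, which then makes both values available for sorting. It is a sketch under loud assumptions: SQLite (via Python's sqlite3) stands in for MySQL, and the two-table schema (content, tv_values with a name column) is a hypothetical simplification of the MODX tables, not the real layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, flattened stand-in for the MODX content/TV tables.
    CREATE TABLE content (id INTEGER PRIMARY KEY, pagetitle TEXT);
    CREATE TABLE tv_values (contentid INTEGER, name TEXT, value TEXT);
    INSERT INTO content VALUES (1, 'Gig'), (2, 'Fair'), (3, 'Talk');
    INSERT INTO tv_values VALUES
        (1, 'event_date', '2012-03-01'), (1, 'event_featured', '1'),
        (2, 'event_date', '2012-01-15'), (2, 'event_featured', '0'),
        (3, 'event_date', '2012-02-10');
""")

pivoted = conn.execute("""
    SELECT c.id, c.pagetitle,
           d.value AS event_date,
           f.value AS event_featured
    FROM content c
    LEFT JOIN (SELECT contentid, value
               FROM tv_values
               WHERE name = 'event_date') d ON d.contentid = c.id
    LEFT JOIN (SELECT contentid, value
               FROM tv_values
               WHERE name = 'event_featured') f ON f.contentid = c.id
    ORDER BY d.value, c.id
""").fetchall()

for row in pivoted:
    print(row)
```

Each content id now occupies a single row, so a plain LIMIT applies per id rather than per name/value row. Content 3 has no event_featured value, so that column comes back as NULL for it.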

simple joins between 2 mysql tables returning all results every time.. Help!

I just imported a large amount of data into two tables. Let's call them shipments and returns.
When I try to do a simple join (left or inner) based on any criteria in these two tables, the query looks like it tries to do a cross join, finding every combination instead of what the query should be pulling.
Each table has a PK id field, but there is no FK relationship between the two other than some shared fields.
I'm currently just trying to relate them on shipment_id.
I feel this is a simple answer. Am I missing a reference or something obvious that is causing this? Thanks!
Here's an example. This should return under 100 rows. Instead it returns hundreds of thousands.
SELECT r.*
FROM returns as r
left outer join shipments as s
on r.shipment_id = s.shipment_id
where r.date = '2011-06-20'
Here is a query that should work:
SELECT T0.*, T1.*
FROM shipments AS T0 LEFT JOIN returns AS T1 ON T0.shipment_id = T1.shipment_id
ORDER BY T0.shipment_id;
This join assumes a 1:1 relationship on shipment_id.
It would be nice if you included the query you were using.
You need to specify what you are joining on, otherwise it will do a cartesian join:
SELECT r.*
FROM returns as r
LEFT JOIN shipments as s ON s.shipment_id = r.shipment_id
where r.date = '2011-06-20'
Josh,
I would be interested in seeing what would happen if you forced a join to a specific record or set of records instead of the whole table. Assuming there is a shipment with an id of 5 in your table, you could try:
SELECT r.* FROM returns as r
left join shipments as s
ON 5 = r.shipment_id
WHERE r.date = '2011-06-20'
While just a fancy where clause, it would at least prove that the join you are attempting will eventually work correctly. The issue is that your on clause is always returning true, no matter what the value is. This could be because it's not interpreting the shipment_id as an integer, but instead as a true/false variable where any value evaluates to true.
Original Rejected Solution:
No foreign key relationship should be needed in order to make the joins happen. I'm assuming the PK id fields are integers (or numbers, or whatever your RDBMS equivalent is)?
Can you paste a snippet of your SQL query?
Updating based on posted query:
I would add your explicit join criteria in order to rule out any funny business (my guess is since no criteria is specified, it's using 1=1, which always joins). So I would change your query to look like:
SELECT r.*
FROM returns as r
left join shipments as s ON
s.ShipId = r.ReturnId
where r.date = '2011-06-20'
The issue turned out to be very simple, just not readily apparent until going through all the columns. It turns out that the shipment ID was duplicated through every row as it hit the upper limit for the int datatype. This is why joins were returning every record.
After switching the datatype to bigint and reimporting, everything worked great. Thanks all for looking into it.
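The effect is easy to reproduce on a small scale. In the sketch below, SQLite (via Python's sqlite3) stands in for MySQL, and because SQLite does not clip integers, the clipping that MySQL applies to out-of-range values in non-strict SQL mode is simulated in Python with min(); the ids and the helper name join_row_count are invented.

```python
import sqlite3

INT_MAX = 2**31 - 1  # upper bound of a signed 32-bit INT column


def join_row_count(ids):
    """Load the same ids into both tables and count the joined rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE shipments (id INTEGER PRIMARY KEY, shipment_id INTEGER)")
    conn.execute(
        "CREATE TABLE returns (id INTEGER PRIMARY KEY, shipment_id INTEGER)")
    conn.executemany(
        "INSERT INTO shipments (shipment_id) VALUES (?)", [(i,) for i in ids])
    conn.executemany(
        "INSERT INTO returns (shipment_id) VALUES (?)", [(i,) for i in ids])
    (count,) = conn.execute("""
        SELECT COUNT(*)
        FROM returns AS r
        LEFT JOIN shipments AS s ON r.shipment_id = s.shipment_id
    """).fetchone()
    conn.close()
    return count


raw_ids = [3_000_000_001, 3_000_000_002, 3_000_000_003]

# Simulated import into an INT column: every id is clipped to the same value.
clipped_ids = [min(i, INT_MAX) for i in raw_ids]

rows_with_bigint = join_row_count(raw_ids)      # ids stay distinct
rows_with_int = join_row_count(clipped_ids)     # every row matches every row

print(rows_with_bigint, rows_with_int)
```

With three rows per table, the clipped ids turn a three-row result into nine rows; with hundreds of rows per table the same effect produces the hundreds of thousands of rows described in the question.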

What is the size limitation for IN and NOT IN in MySQL

I get an out-of-memory exception in my application when the condition for IN or NOT IN is very large. I would like to know what the limit is.
Perhaps you would be better off with another way to accomplish your query?
I suggest you load your match values into a single-column table, and then inner-join the column being queried to the single column in the new table.
Rather than
SELECT a, b, c FROM t1 WHERE d in (d1, d2, d3, d4, ...)
build a temp table with 1 column, call it "dval"
dval
----
d1
d2
d3
SELECT a, b, c FROM t1
INNER JOIN temptbl ON t1.d = temptbl.dval
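A minimal sketch of that approach from application code follows. It assumes SQLite (via Python's sqlite3) in place of MySQL, and the contents of t1 and the list of wanted values are invented; the table and column names temptbl and dval come from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (a INTEGER, b TEXT, c TEXT, d INTEGER)")
# Invented data: 5000 rows, d cycles through 0..999.
conn.executemany(
    "INSERT INTO t1 (a, b, c, d) VALUES (?, ?, ?, ?)",
    [(i, f"b{i}", f"c{i}", i % 1000) for i in range(5000)],
)

# The values that would otherwise go into a very long IN (...) list.
wanted = list(range(0, 1000, 2))

# Load them into a one-column temporary table instead of the SQL text.
conn.execute("CREATE TEMP TABLE temptbl (dval INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO temptbl (dval) VALUES (?)",
                 [(v,) for v in wanted])

joined = conn.execute("""
    SELECT a, b, c
    FROM t1
    INNER JOIN temptbl ON t1.d = temptbl.dval
    ORDER BY a
""").fetchall()

print(len(joined))
```

The query text stays the same size no matter how many values are loaded, and the values are sent as bound parameters rather than being concatenated into the statement.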
Having to ask about limits when either doing a SQL query or database design is a good indicator that you're doing it wrong.
I only ever use IN and NOT IN when the condition is very small (under 100 rows or so). It performs well in those scenarios. I use an OUTER JOIN when the condition is large as the query doesn't have to look up the "IN" condition for every tuple. You just have to check the table that you want all rows to come from.
For "IN" the join condition IS NOT NULL
For "NOT IN" the join condition IS NULL
e.g.
/* Get purchase orders that have never been rejected */
SELECT po.*
FROM PurchaseOrder po LEFT OUTER JOIN
(/* Get po's that have been rejected */
SELECT po.PurchaseOrderID
FROM PurchaseOrder po INNER JOIN
PurchaseOrderStatus pos ON po.PurchaseOrderID = pos.PurchaseOrderID
WHERE pos.Status = 'REJECTED'
) por ON po.PurchaseOrderID = por.PurchaseOrderID
WHERE por.PurchaseOrderID IS NULL /* We want NOT IN */
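To check that the outer-join form really returns the same rows as NOT IN, here is a small sketch. It assumes SQLite (via Python's sqlite3) as a stand-in for MySQL and uses invented purchase orders; the derived table is simplified to read PurchaseOrderStatus directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PurchaseOrder (PurchaseOrderID INTEGER PRIMARY KEY);
    CREATE TABLE PurchaseOrderStatus (PurchaseOrderID INTEGER, Status TEXT);
    INSERT INTO PurchaseOrder (PurchaseOrderID) VALUES (1), (2), (3);
    -- Order 2 was rejected once and approved later.
    INSERT INTO PurchaseOrderStatus VALUES
        (1, 'APPROVED'), (2, 'REJECTED'), (2, 'APPROVED'), (3, 'PENDING');
""")

# "NOT IN" written as an outer join: keep rows with no rejected match.
via_left_join = [r[0] for r in conn.execute("""
    SELECT po.PurchaseOrderID
    FROM PurchaseOrder po
    LEFT OUTER JOIN (SELECT pos.PurchaseOrderID
                     FROM PurchaseOrderStatus pos
                     WHERE pos.Status = 'REJECTED') por
           ON po.PurchaseOrderID = por.PurchaseOrderID
    WHERE por.PurchaseOrderID IS NULL
    ORDER BY po.PurchaseOrderID
""")]

# The equivalent NOT IN form, for comparison.
via_not_in = [r[0] for r in conn.execute("""
    SELECT po.PurchaseOrderID
    FROM PurchaseOrder po
    WHERE po.PurchaseOrderID NOT IN (SELECT pos.PurchaseOrderID
                                     FROM PurchaseOrderStatus pos
                                     WHERE pos.Status = 'REJECTED')
    ORDER BY po.PurchaseOrderID
""")]

print(via_left_join, via_not_in)
```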
I'm having a similar issue, but I'm only passing 100 three-digit ids in my IN clause. When I look at the stack trace, it actually cuts off the comma-separated values in the IN clause. I don't get an error; I just don't get all the results back. Has anyone had an issue like this before? If it's relevant, I'm using the symfony framework... I'm checking to see if it's a Propel issue, but I just wanted to see if it could be SQL.
I have used IN with quite large lists of IDs - I suspect that the memory problem is not in the query itself. How are you retrieving the results?
This query, for example is from a live site:
SELECT DISTINCT c.id, c.name FROM categories c
LEFT JOIN product_categories pc ON c.id = pc.category_id
LEFT JOIN products p ON p.id = pc.product_id
WHERE p.location_id IN (
955,891,901,877,736,918,900,836,846,914,771,773,833,
893,782,742,860,849,850,812,945,775,784,746,1036,863,
750,763,871,817,749,838,986,794,867,758,923,804,733,
949,808,837,741,747,954,939,865,857,787,820,783,760,
911,745,928,818,887,847,978,852
) ORDER BY c.name ASC
My first pass at the code is terribly naive and there are about 10 of these queries on a single page and the database doesn't blink.
You could, of course, be running a list of 100k values which would be a different story altogether.
I don't know what the limit is, but I've run into this problem before as well. I had to rewrite my query something like this:
select * from foo
where id in (select distinct foo_id from bar where ...)