I have the following schema and query, which is very slow on real data because of the ORDER BY:
http://sqlfiddle.com/#!2/5e7bb/10
The MySQL manual lists this as one of the cases where an index cannot be used to resolve the ORDER BY: "You are joining many tables, and the columns in the ORDER BY are not all from the first nonconstant table that is used to retrieve rows. (This is the first table in the EXPLAIN output that does not have a const join type.)"
But I still need to sort by that column. How should I do this?
UPDATE: the current query, since the fiddle was updated:
SELECT
cpa.product_id,
cp.product_internal_ref,
cp.product_name,
cpa.product_sale_price,
cpa.is_product_service,
cpa.product_service_price
FROM
catalog_products_attributes cpa
JOIN
catalog_products cp ON cp.product_id = cpa.product_id
WHERE
cpa.product_id IN (
SELECT
product_id
FROM
catalog_products_categories
WHERE
category_id = 41
)
ORDER BY
cpa.product_service_price DESC
I have two tables: 'tasknotes' and 'relatedtasks'.
I want to return all the rows from 'tasknotes' in which the value in its 'task_id' column corresponds to a value held in one of two columns in the 'relatedtasks' table. The columns in the 'relatedtasks' table which hold these values are called 'primarytaskid' and 'relatedtaskid'.
I want to return values from the 'tasknotes' if the 'tasknotes.task_id' matches the value stored in either relatedtasks.primarytaskid or relatedtasks.relatedtaskid
I tried to do it several ways. I've been looking at the example here as I thought it might help:
Selecting rows from one table using values gotten from another table MYSQL
But I can't get it quite right. I've also tried the following but I can't seem to get it to work.
"Select * from tasknotes WHERE tasknotes.task_id = ? AND relatedtasks.primarytaskid = ? OR relatedtasks.relatedtaskid= ?";
How should I be doing this? I recognise an option would be to create a version of this code:
SELECT t1.* FROM film t1 WHERE EXISTS (SELECT filmid
FROM film_rating_report t2
WHERE t2.rating = 'GE'
AND t2.filmid = t1.id);
which is from the other stack answer I cited and then run it twice targeting each column distinctly, but there must be a better solution.
********** RESOLVED CODE *****************
Thanks to the answer and comments below the final code is:
String sql = "SELECT t.* "
+ "FROM tasknotes t "
+ "INNER JOIN relatedtasks r "
+ "ON (t.task_id = r.primarytaskid OR t.task_id = r.relatedtaskid) "
+ "WHERE r.primarytaskid=? OR r.relatedtaskid=?";
The values for the '?' placeholders are passed in from statements in the Java code.
This is a job for a JOIN with a nontrivial ON clause.
Try something like this:
SELECT t.*
FROM tasknotes t
JOIN relatedtasks r
ON (t.task_id = r.primarytaskid OR t.task_id = r.relatedtaskid)
Using JOIN rather than LEFT JOIN will suppress the rows from tasknotes that don't meet the ON clause.
If you need to filter, append a WHERE clause to the query. For example:
SELECT t.*
FROM tasknotes t
JOIN relatedtasks r
ON (t.task_id = r.primarytaskid OR t.task_id = r.relatedtaskid)
WHERE t.task_id = ?
The trick here:
All the FROM / JOIN ON stuff is to be considered a chunk of code. It specifies the table (physical or virtual) in your query. In your query your virtual table is
FROM tasknotes t
JOIN relatedtasks r
ON (t.task_id = r.primarytaskid OR t.task_id = r.relatedtaskid)
This virtual table has both tasknotes and relatedtasks columns in it. It has all possible combinations of rows from the two tables that match the ON clause.
(If you did FROM a JOIN b without an ON clause you would get all combinations of rows in a and b. That could be a great many rows.)
You put a SELECT clause before your virtual table to choose the columns you want to retrieve from that virtual table.
You put a WHERE clause after it to choose the rows you want from your virtual table.
In your case you may need SELECT DISTINCT t.* because your virtual table will have a row for each row in tasknotes times each matching row in relatedtasks.
Finally you let the query planner in MySQL figure out how to satisfy your query efficiently.
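To see why SELECT DISTINCT may be needed, here is a minimal sketch of the "virtual table" idea. It is only an illustration: it uses Python's built-in sqlite3 as a stand-in for MySQL, and the note_id and note columns and all sample rows are made up. Tasks 10 and 20 each appear in two relatedtasks rows, so each matching tasknotes row shows up twice in the plain join.

```python
import sqlite3

# SQLite stands in for MySQL here; note_id, note and the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tasknotes (note_id INTEGER PRIMARY KEY, task_id INTEGER, note TEXT);
    CREATE TABLE relatedtasks (primarytaskid INTEGER, relatedtaskid INTEGER);
    INSERT INTO tasknotes VALUES (1, 10, 'a'), (2, 20, 'b'), (3, 30, 'c');
    -- task 10 is related to 20 and task 20 is related to 10
    INSERT INTO relatedtasks VALUES (10, 20), (20, 10);
""")

# The FROM / JOIN / ON chunk is the virtual table; SELECT picks its columns.
join_sql = """
    SELECT {distinct} t.note_id, t.task_id, t.note
    FROM tasknotes t
    JOIN relatedtasks r
      ON (t.task_id = r.primarytaskid OR t.task_id = r.relatedtaskid)
    ORDER BY t.note_id
"""

# Without DISTINCT: one output row per (tasknotes row, matching relatedtasks row).
plain_rows = conn.execute(join_sql.format(distinct="")).fetchall()

# With DISTINCT: each matching tasknotes row once. Task 30 never matches.
distinct_rows = conn.execute(join_sql.format(distinct="DISTINCT")).fetchall()

print(len(plain_rows), distinct_rows)
```

In this toy data the plain join returns four rows for two notes, which is the row multiplication described above.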
I am trying to make the following query run faster than 180 secs:
SELECT
x.di_on_g AS deviceid, SUM(1) AS amount
FROM
(SELECT
g.device_id AS di_on_g
FROM
guide g
INNER JOIN operator_guide_type ogt ON ogt.guide_type_id = g.guide_type_id
INNER JOIN operator_device od ON od.device_id = g.device_id
WHERE
g.operator_id IN (1 , 1)
AND g.locale_id = 1
AND (g.device_id IN ("many (~1500) comma separated IDs coming from my code"))
GROUP BY g.device_id , g.guide_type_id) x
GROUP BY x.di_on_g
ORDER BY amount;
Screenshot from EXPLAIN:
https://ibb.co/da5oAF
Even if I run the subquery as separate query it is still very slow...:
SELECT
g.device_id AS di_on_g
FROM
guide g
INNER JOIN operator_guide_type ogt ON ogt.guide_type_id = g.guide_type_id
INNER JOIN operator_device od ON od.device_id = g.device_id
WHERE
g.operator_id IN (1 , 1)
AND g.locale_id = 1
AND (g.device_id IN ("many (~1500) comma separated IDs coming from my code"))
Screenshot from EXPLAIN:
ibb.co/gJHRVF
I have indexes on g.device_id and on other appropriate places.
Indexes:
SHOW INDEX FROM guide;
ibb.co/eVgmVF
SHOW INDEX FROM operator_guide_type;
ibb.co/f0TTcv
SHOW INDEX FROM operator_device;
ibb.co/mseqqF
All IDs are integers. I tried creating a new temp table for the IDs that come from my code and JOINing that table instead of using the slow IN clause, but that didn't make the query much faster (about 10 secs faster).
None of the tables have more than 300,000 rows and the mysql configuration is good.
And the visual plan:
Query Plan
Any help will be appreciated !
Let's focus on the subquery. The main problem is "inflate-deflate", but I will get to that in a moment.
Add the composite index:
INDEX(locale_id, operator_id, device_id)
Why the duplicated "1" in
g.operator_id IN (1 , 1)
Why does the GROUP BY have 2 columns, when you select only 1? Is there some reason for using GROUP BY instead of DISTINCT? (The latter seems to be your intent.)
The only reason for these
INNER JOIN operator_guide_type ogt ON ogt.guide_type_id = g.guide_type_id
INNER JOIN operator_device od ON od.device_id = g.device_id
would be to verify that there are guides and devices in those other tables. Is that correct? Are ogt.guide_type_id and od.device_id the PRIMARY KEYs, hence unique? If so, why do you need the GROUP BY? Based on the EXPLAIN, it sounds like both of those are related 1:many. So...
SELECT g.device_id AS di_on_g
FROM guide g
WHERE EXISTS( SELECT * FROM operator_guide_type WHERE guide_type_id = g.guide_type_id )
AND EXISTS( SELECT * FROM operator_device WHERE device_id = g.device_id )
AND g.operator_id IN (1)
AND g.locale_id = 1
AND g.device_id IN (...)
Notes:
The GROUP BY is no longer needed.
The "inflate-deflate" of JOIN + GROUP BY is gone. The Explain points this out -- 139K rows inflated to 61M -- very costly.
EXISTS is a "semijoin", meaning that it does not collect all matches, but stops when it finds any match.
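A small sketch of the "inflate-deflate" effect, for illustration only. It uses Python's sqlite3 rather than MySQL, keeps only the join columns from the question, and the rows and the operator_id columns on the two lookup tables are assumptions. It also assumes each (device_id, guide_type_id) pair occurs at most once in guide, which is what makes the EXISTS form return the same rows as JOIN + GROUP BY.

```python
import sqlite3

# SQLite stands in for MySQL; the rows and operator_id columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE guide (id INTEGER PRIMARY KEY, device_id INTEGER, guide_type_id INTEGER);
    CREATE TABLE operator_guide_type (operator_id INTEGER, guide_type_id INTEGER);
    CREATE TABLE operator_device (operator_id INTEGER, device_id INTEGER);
    INSERT INTO guide (device_id, guide_type_id) VALUES (1, 1), (1, 2), (2, 1), (3, 9);
    INSERT INTO operator_guide_type VALUES (1, 1), (2, 1), (3, 1), (1, 2);
    INSERT INTO operator_device VALUES (1, 1), (2, 1), (1, 2);
""")

joins = """
    FROM guide g
    INNER JOIN operator_guide_type ogt ON ogt.guide_type_id = g.guide_type_id
    INNER JOIN operator_device od ON od.device_id = g.device_id
"""

# "Inflate": every guide row is repeated once per matching ogt/od combination.
joined_count = conn.execute("SELECT COUNT(*) " + joins).fetchone()[0]

# "Deflate": GROUP BY collapses the repeated rows again.
grouped = conn.execute(
    "SELECT g.device_id " + joins +
    " GROUP BY g.device_id, g.guide_type_id"
    " ORDER BY g.device_id, g.guide_type_id"
).fetchall()

# Semijoin: each guide row is checked for a match and emitted at most once.
exists_rows = conn.execute("""
    SELECT g.device_id
    FROM guide g
    WHERE EXISTS (SELECT * FROM operator_guide_type WHERE guide_type_id = g.guide_type_id)
      AND EXISTS (SELECT * FROM operator_device WHERE device_id = g.device_id)
    ORDER BY g.device_id, g.guide_type_id
""").fetchall()

print(joined_count, grouped, exists_rows)
```

Here four guide rows turn into eleven joined rows before the GROUP BY shrinks them back to three, while the EXISTS version never builds the larger intermediate set.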
"the mysql configuration is good" -- How much RAM do you have? What Engine is the table? What is the value of innodb_buffer_pool_size?
I'm experimenting with a query that I'll use for pruning two related mysql tables. I'll be using it to delete all but the most recent entries.
This query behaves exactly as I expect:
SELECT
O.id AS O_id,
T.id AS T_id
FROM
rt.ObjectCustomFieldValues AS O
LEFT JOIN rt.Transactions AS T
ON O.id = T.NewReference
WHERE
O.Disabled = 1
AND O.CustomField = 58
AND O.ObjectId = 202784
AND T.id NOT IN (
SELECT
id
FROM
(
SELECT
id
FROM
Transactions
WHERE
Field = 58
AND ObjectId = 202784
ORDER BY
Created DESC
LIMIT 5
) Test
)
For the rows containing ObjectId 202784, I get the ObjectCustomFieldValues ids and the Transactions ids for all but the most recent 5 items.
Now how do I turn this into a general query that I can run over all rows instead of specifying the ObjectId manually?
To summarize, for field id 58, I want to iterate all ObjectId values and for each one, delete all but the most recent ObjectCustomFieldValues and Transactions.
You can view schema details here:
https://github.com/bestpractical/rt/blob/stable/etc/schema.mysql#L112
and here:
https://github.com/bestpractical/rt/blob/stable/etc/schema.mysql#L328
If your tables do not store a timestamp such as UNIX_TIMESTAMP() on INSERT, this could be difficult, because you would be depending on the order in which the rows happen to be stored. If you add a UNIX_TIMESTAMP() column, you can use ORDER BY reliably no matter what.
EDIT. I missed the one main issue I was having. I want to display all the unique 'device_MAC' rows, so I want this query to output 3 rows (as per the original query). The issue I am having is connecting the data table to the remote_node table via dt_short = rn_short, using the row with the maximum timestamp for each dt_short in the data table.
I am having trouble running a query on 3 tables (2 have many to many relations).
What I am trying to do:
Get each distinct rn_IEEE from the remotenodes table with the maximum timestamp (in the example this will get 3 rows with 3 distinct short addresses rn_short)
Join with the devicenames table on device_IEEE
Get each distinct dt_short from the data table with the maximum timestamp
Join dt_short with rn_short from the query above
Now the problem I am running into is that I can do the queries for the above individually, I have even gotten the first 3 of them together into a query but I cannot seem to properly join the last bit of data to get the result that I want.
I have been going in circles trying to solve this. Here is a link to SQL Fiddle which contains all the test data and the query as far as I got it. It does what I want for the first line, but the columns from table 'data' are NULL after the first line:
See this SQL fiddle
After going through your requirements and the data, it looks like you just need to change your query to include an INNER JOIN on the data table instead of a LEFT JOIN
See SQL Fiddle with Demo
select rn.*, dn.*, d.*
from remotenodes rn
inner join devicenames dn
on rn.rn_IEEE = dn.device_IEEE
and rn.rn_timestamp = (SELECT MAX(rn_timestamp) FROM remotenodes
WHERE rn.rn_IEEE = rn_IEEE
GROUP BY rn_IEEE)
inner join data d
on rn.rn_short = d.dt_short
AND d.dt_timestamp = (SELECT MAX(d2.dt_timestamp) AS ts
FROM data d2
WHERE d.dt_short = d2.dt_short
GROUP BY d2.dt_short)
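To make the INNER JOIN vs LEFT JOIN difference concrete, here is a rough sketch run against Python's sqlite3 instead of MySQL. The column names come from the question, but the rows are invented and the tables are trimmed to the columns the join needs. The GROUP BY inside the correlated subqueries is left out because the WHERE already restricts each subquery to a single group.

```python
import sqlite3

# SQLite stands in for MySQL; all sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE remotenodes (rn_IEEE TEXT, rn_short INTEGER, rn_timestamp INTEGER);
    CREATE TABLE devicenames (device_IEEE TEXT, device_name TEXT);
    CREATE TABLE data (dt_short INTEGER, dt_timestamp INTEGER);
    INSERT INTO remotenodes VALUES ('A', 100, 1), ('A', 101, 2), ('B', 200, 1);
    INSERT INTO devicenames VALUES ('A', 'alpha'), ('B', 'beta');
    -- short address 200 has no rows in data
    INSERT INTO data VALUES (101, 5), (101, 7), (100, 9);
""")

sql = """
    SELECT rn.rn_IEEE, rn.rn_short, dn.device_name, d.dt_timestamp
    FROM remotenodes rn
    INNER JOIN devicenames dn
       ON rn.rn_IEEE = dn.device_IEEE
      AND rn.rn_timestamp = (SELECT MAX(r2.rn_timestamp)
                             FROM remotenodes r2
                             WHERE r2.rn_IEEE = rn.rn_IEEE)
    {join} data d
       ON rn.rn_short = d.dt_short
      AND d.dt_timestamp = (SELECT MAX(d2.dt_timestamp)
                            FROM data d2
                            WHERE d2.dt_short = d.dt_short)
    ORDER BY rn.rn_IEEE
"""

# INNER JOIN drops devices whose latest short address has no data rows.
inner_rows = conn.execute(sql.format(join="INNER JOIN")).fetchall()

# LEFT JOIN keeps them, with NULL (None) in the data columns.
left_rows = conn.execute(sql.format(join="LEFT JOIN")).fetchall()

print(inner_rows)
print(left_rows)
```

So which join to use depends on whether devices without any matching data rows should still be listed: the LEFT JOIN keeps one row per device, the INNER JOIN keeps only devices that have data.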
The query in your SQL fiddle is right. Instead of using a left join, use an inner join so that it gives you only the first row.
Cheers.
Thanks for all your answers everyone. I managed to solve the problem by using views.
It's not the most efficient way but I think it will do for now.
Here is the SQL Fiddle link:
http://sqlfiddle.com/#!2/4076e/8
Try this query; for me it's returning one row:
SELECT rn_short, rn_IEEE, device_name
FROM
(SELECT DISTINCTROW dt_short FROM (SELECT * FROM `data` ORDER BY `dt_timestamp` DESC) as data ) as a
JOIN
(SELECT rn_IEEE, rn_short, device_name
FROM devicenames dn
JOIN (SELECT DISTINCTROW rn_IEEE, rn_short
FROM (SELECT * FROM `remotenodes` ORDER BY `rn_timestamp` DESC) as remotenodes
GROUP BY rn_IEEE) as rn
ON dn.device_IEEE = rn.rn_IEEE) as b
ON a.dt_short = b.rn_short
I'm still having problems understanding how to read and interpret MySQL EXPLAIN output and how to optimize a query based on it. I know to create indices on ORDER BY columns but that's about it. Therefore I am hoping you can help me tune this query:
EXPLAIN
SELECT specie.id, specie.commonname, specie.block_description, maximage.title,
maximage.karma, imagefile.file_name, imagefile.width, imagefile.height,
imagefile.transferred
FROM specie
INNER JOIN specie_map ON specie_map.specie_id = specie.id
INNER JOIN (
SELECT *
FROM image
ORDER BY karma DESC
) AS maximage ON specie_map.image_id = maximage.id
INNER JOIN imagefile ON imagefile.image_id = maximage.id
AND imagefile.type = 'small'
GROUP BY specie.commonname
ORDER BY commonname ASC
LIMIT 0 , 24
What this query does is to find the photo with the most karma for a specie. You can see the result of this live:
http://www.jungledragon.com/species
I have a table of species, a table of images, a mapping table in between and an imagefile table, since there are multiple image files (formats) per image.
Explain output:
For the specie table, I have indices on its primary id and the field commonname. For the image table, I have indices on its id and karma field, and a few others not relevant to this question.
This query currently takes 0.8 to 1.1s which is too slow in my opinion. I have a suspicion that the right index will speed this up many times, but I don't know which one.
I think you'd go a long way by getting rid of the subquery. Look at the first and last rows of the "explain" result: it's copying the entire "image" table to a temporary table. You could obtain the same result by replacing the subquery with INNER JOIN image and moving ORDER BY karma DESC to the final ORDER BY clause:
SELECT specie.id, specie.commonname, specie.block_description, maximage.title,
maximage.karma, imagefile.file_name, imagefile.width, imagefile.height,
imagefile.transferred
FROM specie
INNER JOIN specie_map ON specie_map.specie_id = specie.id
INNER JOIN image AS maximage ON specie_map.image_id = maximage.id
INNER JOIN imagefile ON imagefile.image_id = maximage.id
AND imagefile.type = 'small'
GROUP BY specie.commonname
ORDER BY commonname ASC, karma DESC
LIMIT 0 , 24
The real point is that you do not optimize EXPLAIN itself. There is usually a query (or several queries) that you want to be efficient, and EXPLAIN is a way to see whether the query is going to be executed the way you expect.
That is, you need to understand what the execution plan should look like and why, and compare that with the output of the EXPLAIN command. To understand what the plan is going to look like, you should understand how indexes in MySQL work.
In the meantime, your query is a tricky one, since it combines two things that limit efficient index use: a) ordering by a field from one table, and b) finding the last element in each group from another (the latter is a tricky task in itself). Since your database is rather small, you are lucky that your current query is rather fast (though you consider it slow).
I would rewrite the query in a somewhat hacky manner (I assume that there is at least one photo for each specie):
SELECT
specie.id, specie.commonname, specie.block_description,
maximage.title, maximage.karma,
imagefile.file_name, imagefile.width, imagefile.height, imagefile.transferred
FROM (
SELECT s.id,
(SELECT i.id
FROM specie_map sm
JOIN image i ON sm.image_id = i.id
WHERE sm.specie_id = s.id
ORDER BY i.karma DESC
LIMIT 1) as image_id
FROM specie s
ORDER BY s.commonname
LIMIT 0, 24
) as ids
JOIN specie
ON ids.id = specie.id
JOIN image as maximage
ON maximage.id = ids.image_id
JOIN imagefile
ON imagefile.image_id = ids.image_id AND imagefile.type = 'small';
You will need the following indexes:
(commonname) on specie
a composite (specie_id, image_id) on specie_map
a composite (id, karma) on image
a composite (image_id, type) on imagefile
Paging now should happen within the subquery.
The idea is to make complex computations within a subquery that operates with ids only and join for the rest of the data at the top. The data would be ordered in the order of the results of the subquery.
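As a rough illustration of the ids-only subquery idea, the sketch below runs the same query shape against Python's sqlite3 instead of MySQL. The tables are cut down to a few columns and all rows are made up. The outer ORDER BY is my own addition to make the output order deterministic in this toy run, and page_size is a hypothetical parameter replacing the fixed 24.

```python
import sqlite3

# SQLite stands in for MySQL; tables are trimmed and all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE specie (id INTEGER PRIMARY KEY, commonname TEXT);
    CREATE TABLE image (id INTEGER PRIMARY KEY, karma INTEGER, title TEXT);
    CREATE TABLE specie_map (specie_id INTEGER, image_id INTEGER);
    CREATE TABLE imagefile (image_id INTEGER, type TEXT, file_name TEXT);
    INSERT INTO specie VALUES (1, 'zebra'), (2, 'ant'), (3, 'bee');
    INSERT INTO image VALUES (10, 5, 'z1'), (11, 9, 'z2'), (12, 3, 'a1'),
                             (13, 4, 'b1'), (14, 8, 'b2');
    INSERT INTO specie_map VALUES (1, 10), (1, 11), (2, 12), (3, 13), (3, 14);
""")
# Two files per image, e.g. '10s.jpg' (small) and '10l.jpg' (large).
conn.executemany(
    "INSERT INTO imagefile VALUES (?, ?, ?)",
    [(i, t, f"{i}{t[0]}.jpg") for i in range(10, 15) for t in ("small", "large")],
)

page_size = 2  # hypothetical page size for this toy data

rows = conn.execute("""
    SELECT specie.commonname, maximage.title, maximage.karma, imagefile.file_name
    FROM (
        -- Work with ids only: page the species, pick the top-karma image for each.
        SELECT s.id,
               (SELECT i.id
                FROM specie_map sm
                JOIN image i ON sm.image_id = i.id
                WHERE sm.specie_id = s.id
                ORDER BY i.karma DESC
                LIMIT 1) AS image_id
        FROM specie s
        ORDER BY s.commonname
        LIMIT ?
    ) AS ids
    JOIN specie ON ids.id = specie.id
    JOIN image AS maximage ON maximage.id = ids.image_id
    JOIN imagefile ON imagefile.image_id = ids.image_id AND imagefile.type = 'small'
    ORDER BY specie.commonname
""", (page_size,)).fetchall()

print(rows)
```

The first page holds 'ant' and 'bee', each with its highest-karma image, and 'zebra' is never joined against image or imagefile because it falls outside the page.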
It would be better if you could provide the table structures and indexes. I came up with this alternative, it would be nice if you could try this and tell me what happens (I am curious!):
SELECT t.*, imf.* FROM (
SELECT s.*, (SELECT id FROM image WHERE karma = MAX(i.karma) LIMIT 1) AS max_image_id
FROM image i
INNER JOIN specie_map smap ON smap.image_id = i.id
INNER JOIN specie s ON s.id = smap.specie_id
GROUP BY s.commonname
ORDER BY s.commonname ASC
LIMIT 24
) t INNER JOIN imagefile imf
ON t.max_image_id = imf.image_id AND imf.type = 'small'