Speed of LEFT JOIN in the same table - Mysql

Speed of LEFT JOIN in the same table - Mysql - mysql

I have a table called PRODUCTS. It has a field of language. I want to list all rows of language = 'es' which do lack a traduction (corresponding ID) in other language. I have tried the following (id_products is the key relating rows of the same product in different language). It is extremely slow (seconds for a few thousand rows):
SELECT
*
FROM
products AS source
LEFT JOIN products AS target ON source.id_products = target.id_products
AND source.`language` = 'es'
AND target.`language` = 'en'
WHERE
target.id_products IS NULL

My guess is that this is happening due to lack of indexes on the table.
Try adding index on (id_products,language) , that should speed up your query.
In addition you can try to use NOT EXISTS() instead of a left join, maybe it will speed things up a bit as well:
SELECT * FROM products t
WHERE t.language = 'es'
AND NOT EXISTS(SELECT 1 FROM products s
WHERE s.language = 'en'
and s.id_products = t.id_products)

Better index
It will be faster with the composite index in this order:
INDEX(language, id_products)
The query will start with source. For that it needs to look at rows with language = 'es', then reach into target. For target it does not matter which order the index columns are in.
Don't be mislead by the Query cache
If you get a time of less than 1 millisecond, the you are probably getting the answer from the "Query cache". For testing, avoid it by doing
SELECT SQL_NO_CACHE ...
There is no use doing SELECT * ... since you only want source columns, not all the NULLs from target. So either say SELECT source.* or spell out just the columns you want.

There is something weird with where you do the filtering in your query.
You'll get a list of all products, regardless of if source is in 'es' or not. I always recommend to put all the ON-conditions inside parenthesis for clarity.
SELECT *
FROM products AS source
LEFT JOIN products AS target
ON (source.id_products = target.id_products AND target.language = 'en')
WHERE source.language = 'es'
AND target.id_products IS NULL;
And as other points out you also need an index on language for the filtering on source, and depending on how large your tabell is, also on id_products.
alter table products add index search_index (language, id_products);
See this sql fiddle to see it in action.

Related

Is this the proper use of a MySQL index? Why it seems that is not working?

I have a PHP website that shows in a specific page a list of all comments related to that specific url.
My query
I do a SELECT query and I get some results. I wanted to add an index in order to make the query quicker:
SELECT
commentID, comment, users.userID
FROM comments
LEFT JOIN users
ON comments.userID = users.userID
WHERE contentID = ?
Original query in spanish:
SELECT
comentarioID, comentario, usuarios.userID
FROM comentarios
LEFT JOIN usuarios
ON comentarios.userID = usuarios.userID
WHERE contenidoID = ?
My indexes
As you can see is an easy query, but MySQL needs to search between the +14.000 comments in order to show them, so I added an index:
ALTER TABLE comments ADD INDEX(userID);
ALTER TABLE users ADD INDEX(userID);
So here is how comments indexes look without the index:
The result
And here is after I added it:
In both cases (before and after adding the indexes), if I use EXPLAIN for the SELECT query that I've shown at the beginning, I get:
The tables are all InnoDB.
Why there is no real difference?
The speed of the query is almost the same before and after adding the index: (Query took around 0.0163 seconds in both cases).
Is this post duplicated?
Before declaring this is a duplicated issue, please, note that I've already read this post, and this other one, and this other one... but I didn't find the replies there useful, because my case in my opinion is different.

(I presume that the ambiguous attributes in your query are from the comentarios table - you should have qualified these)
Because you are using a LEFT JOIN then the DBMS will always find matching rows in comentarios first before it goes looking for data in usuarios. An index is fast way to find rows. So by the time it has found those matching rows, it has no reason to use the new index.
OTOH if you specified a predicate in the users table, it would have used your new userID index index to find the matching rows in the comments table:
SELECT
comentarioID, comentario, usuarios.userID
FROM comentarios
INNER JOIN usuarios
ON comentarios.userID = usuarios.userID
WHERE usuarios.name = ?
I would expect "UserID" to be unique / the primary key, hence adding a second index on the same attribute is redundant.
Further, if my assumption above holds, your query only outputs attributes which exist in the comentarios table, hence unless you allow comments to be created without a matching user, the join is redundant / expensive and the query can be written as just:
SELECT
comentarioID, comentario, userID
FROM comentarios
WHERE contenidoID = ?

WHERE contenidoID = ? needs INDEX(contenidoID)
WHERE usuarios.name = ? needs INDEX(name)

Complex MySQL query problems and also SQL hangs

I am trying to write an SQL query which is pretty complex. The requirements are as follows:
I need to return these fields from the query:
track.artist
track.title
track.seconds
track.track_id
track.relative_file
album.image_file
album.album
album.album_id
track.track_number
I can select a random track with the following query:
select
track.artist, track.title, track.seconds, track.track_id,
track.relative_file, album.image_file, album.album,
album.album_id, track.track_number
FROM
track, album
WHERE
album.album_id = track.album_id
ORDER BY RAND() limit 10;
Here is where I am having trouble though. I also have a table called "trackfilters1" thru "trackfilters10" Each row has an auto incrementing ID field. Therefore, row 10 is data for album_id 10. These fields are populated with 1's and 0's. For example, album #10 has 10 tracks, then trackfilters1.flags will contain "1111111111" if all tracks are to be included in the search. If track 10 was to be excluded, then it would contain "1111111110"
My problem is including this clause.
The latest query I have come up with is the following:
select
track.artist, track.title, track.seconds,
track.track_id, track.relative_file, album.image_file,
album.album, album.album_id, track.track_number
FROM
track, album, trackfilters1, trackfilters2
WHERE
album.album_id = track.album_id
AND
( (album.album_id = trackfilters1.id)
OR
(album.album_id=trackfilters2.id) )
AND
( (mid(trackfilters1.flags, track.track_number,1) = 1)
OR
( mid(trackfilters2.flags, track.track_number,1) = 1))
ORDER BY RAND() limit 2;
however this is causing SQL to hang. I'm presuming that I'm doing something wrong. Does anybody know what it is? I would be open to suggestions if there is an easier way to achieve my end result, I am not set on repairing my broken query if there is a better way to accomplish this.
Additionally, in my trials, I have noticed when I had a working query and added say, trackfilters2 to the FROM clause without using it anywhere in the query, it would hang as well. This makes me wonder. Is this correct behavior? I would think adding to the FROM list without making use of the data would just make the server procure more data, I wouldn't have expected it to hang.

There's not enough information here to determine what's causing the performance issue.
But here's a few suggestions and comments.
Ditch the old-school comma syntax for the join operations, and use the JOIN keyword instead. And relocate the join predicates to an ON clause.
And for heaven's sake, format the SQL so that it's decipherable by someone trying to read it.
There's some questions here... will there always be a matching row in both trackfilters1 and trackfilters2 for rows you want to return? Or could a row be missing from trackfilters2, and you still want to return the row if there's a matching row in trackfilters1? (The answer to that question determines whether you'd want to use an outer join vs an inner join to those tables.)
For best performance with large sets, having appropriate indexes defined is going to be critical.
Use EXPLAIN to see the execution plan.
I suggest you try writing your query like this:
SELECT track.artist
, track.title
, track.seconds
, track.track_id
, track.relative_file
, album.image_file
, album.album
, album.album_id
, track.track_number
FROM track
JOIN album
ON album.album_id = track.album_id
LEFT
JOIN trackfilters1
ON trackfilters1.id = album.album_id
LEFT
JOIN trackfilters2
ON trackfilters2.id = album.album_id
WHERE MID(trackfilters1.flags, track.track_number, 1) = '1'
OR MID(trackfilters2.flags, track.track_number, 1) = '1'
ORDER BY RAND()
LIMIT 2
And if you want help with performance, provide the output from EXPLAIN, and what indexes are defined.

Alternative to COUNT for innodb to prevent table scan?

I've managed to put together a query that works for my needs, albeit more complicated than I was hoping. But, for the size of tables the query is slower than it should be (0.17s). The reason, based on the EXPLAIN provided below, is because there is a table scan on the meta_relationships table due to it having the COUNT in the WHERE clause on an innodb engine.
Query:
SELECT
posts.post_id,posts.post_name,
GROUP_CONCAT(IF(meta_data.type = 'category', meta.meta_name,null)) AS category,
GROUP_CONCAT(IF(meta_data.type = 'tag', meta.meta_name,null)) AS tag
FROM posts
RIGHT JOIN meta_relationships ON (posts.post_id = meta_relationships.object_id)
LEFT JOIN meta_data ON meta_relationships.meta_data_id = meta_data.meta_data_id
LEFT JOIN meta ON meta_data.meta_id = meta.meta_id
WHERE meta.meta_name = computers AND meta_relationships.object_id
NOT IN (SELECT meta_relationships.object_id FROM meta_relationships
GROUP BY meta_relationships.object_id HAVING count(*) > 1)
GROUP BY meta_relationships.object_id
This particular query, selects posts which have ONLY the computers category. The purpose of count > 1 is to exclude posts that contain computers/hardware, computers/software, etc. The more categories that are selected, the higher the count would be.
Ideally, I'd like to get it functioning like this:
WHERE meta.meta_name IN ('computers') AND meta_relationships.meta_order IN (0)
or
WHERE meta.meta_name IN ('computers','software')
AND meta_relationships.meta_order IN (0,1)
etc..
But unfortunately this doesn't work, because it doesn't take into consideration that there may be a meta_relationships.meta_order = 2.
I've tried...
WHERE meta.meta_name IN ('computers')
GROUP BY meta_relationships.meta_order
HAVING meta_relationships.meta_order IN (0) AND meta_relationships.meta_order NOT IN (1)
but it doesn't return the correct amount of rows.
EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY meta ref PRIMARY,idx_meta_name idx_meta_name 602 const 1 Using where; Using index; Using temporary; Using filesort
1 PRIMARY meta_data ref PRIMARY,idx_meta_id idx_meta_id 8 database.meta.meta_id 1
1 PRIMARY meta_relationships ref idx_meta_data_id idx_meta_data_id 8 database.meta_data.meta_data_id 11 Using where
1 PRIMARY posts eq_ref PRIMARY PRIMARY 4 database.meta_relationships.object_id 1
2 MATERIALIZED meta_relationships index NULL idx_object_id 4 NULL 14679 Using index
Tables/Indexes:
meta
This table contains the category and tag names.
indexes:
PRIMARY KEY (meta_id), KEY idx_meta_name (meta_name)
meta_data
This table contains additional data about the categories and tags such as type (category or tag), description, parent, count.
indexes:
PRIMARY KEY (meta_data_id), KEY idx_meta_id (meta_id)
meta_relationships
This is a junction/lookup table. It contains a foreign key to the posts_id, a foreign key to the meta_data_id, and also contains the order of the categories.
indexes:
PRIMARY KEY (relationship_id), KEY idx_object_id (object_id), KEY idx_meta_data_id (meta_data_id)
The count allows me to only select the posts with that correct level of category. For example, the category computers has posts with only the computers category but it also has posts with computers/hardware. The count filters out posts that contain those extra categories. I hope that makes sense.
I believe the key to optimizing the query is to get away completely from doing the COUNT.
An alternative to the COUNT would possibly be using meta_relationships.meta_order or meta_data.parent instead.
The meta_relationships table will grow quickly and with the current size (~15K rows) I'm hoping to achieve an execution time in the 100th of seconds rather than the 10ths of seconds.
Since there needs to be multiple conditions in the WHERE clause for each category/tag, any answer optimized for a dynamic query is preferred.
I have created an IDE with sample data.
How can I optimize this query?
EDIT :
I was never able to find an optimal solution to this problem. It was really a combination of smcjones recommendation of improving the indexes for which I would recommend doing an EXPLAIN and looking at EXPLAIN Output Format then change the indexes to whatever gives you the best performance.
Also, hpf's recommendation to add another column with the total count helped tremendously. In the end, after changing the indexes, I ended up going with this query.
SELECT posts.post_id,posts.post_name,
GROUP_CONCAT(IF(meta_data.type = 'category', meta.meta_name,null)) AS category,
GROUP_CONCAT(IF(meta_data.type = 'tag', meta.meta_name,null)) AS tag
FROM posts
JOIN meta_relationships ON meta_relationships.object_id = posts.post_id
JOIN meta_data ON meta_relationships.meta_data_id = meta_data.meta_data_id
JOIN meta ON meta_data.meta_id = meta.meta_id
WHERE posts.meta_count = 2
GROUP BY posts.post_id
HAVING category = 'category,subcategory'
After getting rid of the COUNT, the big performance killer was the GROUP BY and ORDER BY, but the indexes are your best friend. I learned that when doing a GROUP BY, the WHERE clause is very important, the more specific you can get the better.

With a combination of optimized queries AND optimizing your tables, you will have fast queries. However, you cannot have fast queries without an optimized table.
I cannot stress this enough: If your tables are structured correctly with the correct amount of indexes, you should not be experiencing any full table reads on a query like GROUP BY... HAVING unless you do so by design.
Based on your example, I have created this SQLFiddle.
Compare that to SQLFiddle #2, in which I added indexes and added a UNIQUE index against meta.meta_naame.
From my testing, Fiddle #2 is faster.
Optimizing Your Query
This query was driving me nuts, even after I made the argument that indexes would be the best way to optimize this. Even though I still hold that the table is your biggest opportunity to increase performance, it did seem that there had to be a better way to run this query in MySQL. I had a revelation after sleeping on this problem, and used the following query (seen in SQLFiddle #3):
SELECT posts.post_id,posts.post_name,posts.post_title,posts.post_description,posts.date,meta.meta_name
FROM posts
LEFT JOIN meta_relationships ON meta_relationships.object_id = posts.post_id
LEFT JOIN meta_data ON meta_relationships.meta_data_id = meta_data.meta_data_id
LEFT JOIN meta ON meta_data.meta_id = meta.meta_id
WHERE meta.meta_name = 'animals'
GROUP BY meta_relationships.object_id
HAVING sum(meta_relationships.object_id) = min(meta_relationships.object_id);
HAVING sum() = min() on a GROUP BY should check to see if there is more than one record of each type. Obviously, each time the record shows up, it will add more to the sum. (Edit: On subsequent tests it seems like this has the same impact as count(meta_relationships.object_id) = 1. Oh well, the point is I believe you can remove subquery and have the same result).
I want to be clear that you won't notice much if any optimization on the query I provided you unless the section, WHERE meta.meta_name = 'animals' is querying against an index (preferably a unique index because I doubt you'll need more than one of these and it will prevent accidental duplication of data).
So, instead of a table that looks like this:
CREATE TABLE meta_data (
meta_data_id BIGINT,
meta_id BIGINT,
type VARCHAR(50),
description VARCHAR(200),
parent BIGINT,
count BIGINT);
You should make sure you add primary keys and indexes like this:
CREATE TABLE meta_data (
meta_data_id BIGINT,
meta_id BIGINT,
type VARCHAR(50),
description VARCHAR(200),
parent BIGINT,
count BIGINT,
PRIMARY KEY (meta_data_id,meta_id),
INDEX ix_meta_id (meta_id)
);
Don't overdo it, but every table should have a primary key, and any time you are aggregating or querying against a specific value, there should be indexes.
When indexes are not used, the MySQL will walk through each row of the table until it finds what you want. In such a limited example as yours this doesn't take too long (even though it's still noticeably slower), but when you add thousands or more records, this will become extraordinarily painful.
In the future, when reviewing your queries, try to identify where your full table scans are occurring and see if there is an index on that column. A good place to start is wherever you are aggregating or using the WHERE syntax.
A note on the count column
I have not found putting count columns into the table to be helpful. It can lead to some pretty serious integrity issues. If a table is properly optimized, It should be very easy to use count() and get the current count. If you want to have it in a table, you can use a VIEW, although that will not be the most efficient way to make the pull.
The problem with putting count columns into a table is that you need to update that count, using either a TRIGGER or, worse, application logic. As your program scales out that logic can either get lost or buried. Adding that column is a deviation from normalization and when something like this is to occur, there should be a VERY good reason.
Some debate exists as to whether there is ever a good reason to do this, but I think I'd be wise to stay out of that debate because there are great arguments on both sides. Instead, I will pick a much smaller battle and say that I see this causing you more headaches than benefits in this use case, so it is probably worth A/B testing.

Since the HAVING seems to be the issue, can you instead create a flag field in the posts table and use that instead? If I understand the query correctly, you're trying to find posts with only one meta_relationship link. If you created a field in your posts table that was either a count of the meta_relationships for that post, or a boolean flag for whether there was only one, and indexed it of course, that would probably be much faster. It would involve updating the field if the post was edited.
So, consider this:
Add a new field to the posts table called "num_meta_rel". It can be an unsigned tinyint as long as you'll never have more than 255 tags to any one post.
Update the field like this:
UPDATE posts
SET num_meta_rel=(SELECT COUNT(object_id) from meta_relationships WHERE object_id=posts.post_id);
This query will take some time to run, but once done you have all the counts precalculated. Note this can be done better with a join, but SQLite (Ideone) only allows subqueries.
Now, you rewrite your query like this:
SELECT
posts.post_id,posts.post_name,
GROUP_CONCAT(IF(meta_data.type = 'category', meta.meta_name,null)) AS category,
GROUP_CONCAT(IF(meta_data.type = 'tag', meta.meta_name,null)) AS tag
FROM posts
RIGHT JOIN meta_relationships ON (posts.post_id = meta_relationships.object_id)
LEFT JOIN meta_data ON meta_relationships.meta_data_id = meta_data.meta_data_id
LEFT JOIN meta ON meta_data.meta_id = meta.meta_id
WHERE meta.meta_name = computers AND posts.num_meta_rel=1
GROUP BY meta_relationships.object_id
If I've done this correctly, the runnable code is here: http://ideone.com/ZZiKgx
Note that this solution requires that you update the num_meta_rel (choose a better name, that one is terrible...) if the post has a new tag associated with it. But that should be much faster than scanning your entire table over and over.

See if this gives you the right answer, possibly faster:
SELECT p.post_id, p.post_name,
GROUP_CONCAT(IF(md.type = 'category', meta.meta_name, null)) AS category,
GROUP_CONCAT(IF(md.type = 'tag', meta.meta_name, null)) AS tag
FROM
( SELECT object_id
FROM meta_relation
GROUP BY object_id
HAVING count(*) = 1
) AS x
JOIN meta_relation AS mr ON mr.object_id = x.object_id
JOIN posts AS p ON p.post_id = mr.object_id
JOIN meta_data AS md ON mr.meta_data_id = md.meta_data_id
JOIN meta ON md.meta_id = meta.meta_id
WHERE meta.meta_name = ?
GROUP BY mr.object_id

Unfortunately I have no possibility to test performance,
But try my query using your real data:
http://sqlfiddle.com/#!9/81b29/13
SELECT
posts.post_id,posts.post_name,
GROUP_CONCAT(IF(meta_data.type = 'category', meta.meta_name,null)) AS category,
GROUP_CONCAT(IF(meta_data.type = 'tag', meta.meta_name,null)) AS tag
FROM posts
INNER JOIN (
SELECT meta_relationships.object_id
FROM meta_relationships
GROUP BY meta_relationships.object_id
HAVING count(*) < 3
) mr ON mr.object_id = posts.post_id
LEFT JOIN meta_relationships ON mr.object_id = meta_relationships.object_id
LEFT JOIN meta_data ON meta_relationships.meta_data_id = meta_data.meta_data_id
INNER JOIN (
SELECT *
FROM meta
WHERE meta.meta_name = 'health'
) meta ON meta_data.meta_id = meta.meta_id
GROUP BY posts.post_id

Use
sum(1)
instead of
count(*)

MySQL - How do I figure out which columns to index?

How do I figure out which columns to index?
SELECT a.ORD_ID AS Manual_Added_Orders,
a.ORD_poOrdID_List AS Auto_Added_Orders,
a.ORDPOITEM_ModelNumber,
a.ORDPO_Number,
a.ORDPOITEM_ID,
(SELECT sum(ORDPOITEM_Qty) AS ORDPOITEM_Qty
FROM orderpoitems
WHERE ORDPOITEM_ModelNumber = a.ORDPOITEM_ModelNumber
AND ORDPO_Number = 123007)
AS ORDPOITEM_Qty,
a.ORDPO_TrackingNumber,
a.ORDPOITEM_Received,
a.ORDPOITEM_ReceivedQty,
a.ORDPOITEM_ReceivedBy,
b.ORDPO_ID
FROM orderpoitems a
LEFT JOIN orderpo b ON (a.ORDPO_Number = b.ORDPO_Number)
WHERE a.ORDPO_Number = 123007
GROUP BY a.ORDPOITEM_ModelNumber
ORDER BY a.ORD_poOrdID_List, a.ORD_ID
I did the explain that is how I am getting these pictures... I added a few indexes... still not looking good.

Well firstly your query could be simplified to:
SELECT a.ORD_ID AS Manual_Added_Orders,
a.ORD_poOrdID_List AS Auto_Added_Orders,
a.ORDPOITEM_ModelNumber,
a.ORDPO_Number,
a.ORDPOITEM_ID,
SUM(ORDPOITEM_Qty) AS ORDPOITEM_Qty
a.ORDPO_TrackingNumber,
a.ORDPOITEM_Received,
a.ORDPOITEM_ReceivedQty,
a.ORDPOITEM_ReceivedBy,
b.ORDPO_ID
FROM orderpoitems a
LEFT JOIN orderpo b ON (a.ORDPO_Number = b.ORDPO_Number)
WHERE a.ORDPO_Number = 123007
GROUP BY a.ORDPOITEM_ModelNumber
ORDER BY a.ORD_poOrdID_List, a.ORD_ID
Secondly I would start by creating a index on the orderpoitems.ORDPO_Number and orderpo.ORDPO_number
Bit hard to say without the table structures.

Read up on indexes and covering index
From what you have, start with what is in your where clause AND join criteria to another table. Also, include if possible and practical, those columns used in group by / order by as order by is typically a killer when finishing a query.
That said, I would have an index on your OrderPOItems table on
( ordpo_number, orderpoitem_ModelNumber, ord_poordid_list, ord_id )
This way, the FIRST element hits your WHERE clause. Next the column for your data grouping, finally, the columns for your order by. This way, the joins and qualifying components can be "covered" from the index alone without having to go to the raw data pages for the rest of the columns being returned. Hopefully a good jump start specific to your scenario.

Long query times for simple MySQL SELECT with JOIN

SELECT COUNT(*)
FROM song AS s
JOIN user AS u
ON(u.user_id = s.user_id)
WHERE s.is_active = 1 AND s.public = 1
The s.active and s.public are index as well as u.user_id and s.user_id.
song table row count 310k
user table row count 22k
Is there a way to optimize this? We're getting 1 second query times on this.

Ensure that you have a compound "covering" index on song: (user_id, is_active, public). Here, we've named the index covering_index:
SELECT COUNT(s.user_id)
FROM song s FORCE INDEX (covering_index)
JOIN user u
ON u.user_id = s.user_id
WHERE s.is_active = 1 AND s.public = 1
Here, we're ensuring that the JOIN is done with the covering index instead of the primary key, so that the covering index can be used for the WHERE clause as well.
I also changed COUNT(*) to COUNT(s.user_id). Though MySQL should be smart enough to pick the column from the index, I explicitly named the column just in case.
Ensure that you have enough memory configured on the server so that all of your indexes can stay in memory.
If you're still having issues, please post the results of EXPLAIN.

Perhaps write it as a stored procedure or view... You could also try selecting all the IDs first then running the count on the result... if you do it all as one query it may be faster. Generally optimisation is done by using nested selects or making the server do the work so in this context that is all I can think of.
SELECT Count(*) FROM
(SELECT song.user_id FROM
(SELECT * FROM song WHERE song.is_active = 1 AND song.public = 1) as t
JOIN user AS u
ON(t.user_id = u.user_id))
Also be sure you are using the correct kind of join.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008