We have an e-store with many complicated links between categories and products.
I'm using a Taxonomy table to store the relations between Products-Categories and Products-Products (as sub-products).
A product may be a member of more than one category.
A product may be a sub-product of another product (possibly of more than one).
A product may be a module of another product (possibly of more than one).
aliases of query :
pr-Product
ct-Category
sp-Sub Product
md-Module
Select pr.*,ifnull(sp.destination_id,0) as `top_id`,
ifnull(ct.destination_id,0) as `category_id`
from Products as pr
Left join Taxonomy as ct
on (ct.source_id=pr.id and ct.source='Products' and ct.destination='Categories')
Left join Taxonomy as sp
on (sp.source_id=pr.id and sp.source='Products' and sp.destination='Products' and sp.type='TOPID')
Left join Modules as md
on(pr.id = md.product_id)
where pr.deleted=false
and ct.destination_id='47'
and sp.destination_id is null
and md.product_id is null
order by pr.order,pr.sub_order
With this query, I'm trying to get all products under category_id=47 that are neither a module nor a sub-product of any other product.
This query takes 23 seconds.
There are 7,820 records in Products, 3,200 records in Modules and 19,000 records in Taxonomy.
I was going to say that MySQL can only use one index per query but it looks like that is no longer the case. I also came across this in another answer:
http://dev.mysql.com/doc/mysql/en/index-merge-optimization.html
However that may not help you.
In the past, when I've come across queries MySQL couldn't optimise, I've settled for precomputing answers in another table using a background job.
What you're trying to do looks like a good fit for a graph database like Neo4j.
MySQL's optimizer is known to be bad at changing outer joins to inner joins automatically; it does the outer join first and then starts to filter data.
In your case the join between Products and Taxonomy can be rewritten as an Inner Join (there's a WHERE-condition on ct.destination_id='47').
Check whether this changes the execution plan and improves performance.
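A minimal sketch of the rewrite suggested above. It runs against SQLite through Python's sqlite3 module rather than MySQL, so it only shows that both forms return the same rows, not how MySQL will plan them. Table and column names follow the question; the sample rows are made up, the order/sub_order columns are left out for brevity, and the two NOT EXISTS anti-joins are a common alternative to LEFT JOIN ... IS NULL that I'm adding, not something the answer proposes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (id INTEGER PRIMARY KEY, deleted INTEGER);
CREATE TABLE Taxonomy (source_id INTEGER, source TEXT, destination TEXT,
                       destination_id INTEGER, type TEXT);
CREATE TABLE Modules (product_id INTEGER);
-- Made-up sample data: products 1-4 are all in category 47.
INSERT INTO Products VALUES (1, 0), (2, 0), (3, 0), (4, 1);
INSERT INTO Taxonomy VALUES
  (1, 'Products', 'Categories', 47, NULL),
  (2, 'Products', 'Categories', 47, NULL),
  (3, 'Products', 'Categories', 47, NULL),
  (4, 'Products', 'Categories', 47, NULL),
  (2, 'Products', 'Products', 1, 'TOPID');  -- product 2 is a sub-product of 1
INSERT INTO Modules VALUES (3);             -- product 3 is a module
""")

# Original shape: three LEFT JOINs, filtered afterwards in WHERE.
outer_join_sql = """
SELECT pr.id
FROM Products AS pr
LEFT JOIN Taxonomy AS ct
  ON ct.source_id = pr.id AND ct.source = 'Products'
 AND ct.destination = 'Categories'
LEFT JOIN Taxonomy AS sp
  ON sp.source_id = pr.id AND sp.source = 'Products'
 AND sp.destination = 'Products' AND sp.type = 'TOPID'
LEFT JOIN Modules AS md ON md.product_id = pr.id
WHERE pr.deleted = 0
  AND ct.destination_id = 47
  AND sp.destination_id IS NULL
  AND md.product_id IS NULL
ORDER BY pr.id
"""

# Rewritten shape: INNER JOIN for the category, NOT EXISTS for the exclusions.
inner_join_sql = """
SELECT pr.id
FROM Products AS pr
JOIN Taxonomy AS ct
  ON ct.source_id = pr.id AND ct.source = 'Products'
 AND ct.destination = 'Categories' AND ct.destination_id = 47
WHERE pr.deleted = 0
  AND NOT EXISTS (SELECT 1 FROM Taxonomy AS sp
                  WHERE sp.source_id = pr.id AND sp.source = 'Products'
                    AND sp.destination = 'Products' AND sp.type = 'TOPID')
  AND NOT EXISTS (SELECT 1 FROM Modules AS md WHERE md.product_id = pr.id)
ORDER BY pr.id
"""

outer_rows = conn.execute(outer_join_sql).fetchall()
inner_rows = conn.execute(inner_join_sql).fetchall()
print(outer_rows, inner_rows)
```

On MySQL itself, comparing the EXPLAIN output of both forms would be the way to tell whether the rewrite actually helps.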
Related
I have ten tables (Product_A, Product_B, Product_C, etc.), each of them having a primary key pointing to a row in the parent table Product.
Basically, I have applied the recommendations from Bill Karwin's SQL Antipatterns book (the solution to this antipattern is described here:
https://fr.slideshare.net/billkarwin/practical-object-oriented-models-in-sql/34-Polymorphic_Assocations_Exclusive_Arcs_Referential )
In order to load a child product, I use something like this:
SELECT * FROM Product
LEFT JOIN Product_A USING (product_id)
LEFT JOIN Product_B USING (product_id)
LEFT JOIN Product_C USING (product_id)
LEFT JOIN Product_D USING (product_id)
WHERE product_id = 1337
etc.
I fear that the more types of child product tables I get, the more JOIN clauses I will have to add, causing the query to end up incredibly slow.
Is using LEFT JOIN to avoid the polymorphic associations antipattern still a workable solution when there are tens of child tables?
Should I instead query the parent table Product first to grab a "product_type", and then execute another query on the appropriate child table depending on the value stored in the "product_type" column?
Update: the first replies on this topic state that this is bad design and that I should create a single table combining the columns from the child tables. But each product type has its own attributes. To put it another way: "A TV might have a pixel count, but that wouldn't make much sense for a blender." @TomH
Thank you
MySQL has a hard limit on the number of tables in a join. The limit is 61 tables, and it's not configurable (I looked at the source code and it's really just hard-coded). So if you have more than 60 product types (60 child tables plus the parent Product table), this is not going to work in a single query.
If the data were stored in the structure you describe, I would run a separate query per product type, so you don't make too many joins.
Or do a query against the Product table first, and then additional queries to the product-type specific tables if you need details.
For example, when would you need to gather the product-specific details all at once? On some kind of search page? Do you think you could design your code to show only the attributes from your primary Product table on the search page?
Then only if a user clicks on a specific product, you'd go to a different page to display detailed information. Or if not a different page, maybe it'd be a dynamic HTML thing where you expand a "+" button to fetch detailed info, and each time you do that, run an AJAX request for the details.
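A rough sketch of the "query Product first, then the type-specific table" approach, using Python's sqlite3 so it can be run as-is. The product_type column comes from the question; the child-table columns (pixel_count, jar_litres), the 'A'/'B' type values and the load_product helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can be converted to dicts
conn.executescript("""
CREATE TABLE Product   (product_id INTEGER PRIMARY KEY, name TEXT, product_type TEXT);
CREATE TABLE Product_A (product_id INTEGER PRIMARY KEY REFERENCES Product(product_id),
                        pixel_count INTEGER);
CREATE TABLE Product_B (product_id INTEGER PRIMARY KEY REFERENCES Product(product_id),
                        jar_litres REAL);
INSERT INTO Product   VALUES (1337, 'TV', 'A'), (1338, 'Blender', 'B');
INSERT INTO Product_A VALUES (1337, 8294400);
INSERT INTO Product_B VALUES (1338, 1.5);
""")

# Fixed mapping from discriminator value to child table, so a table name
# is never built from untrusted input.
CHILD_TABLES = {"A": "Product_A", "B": "Product_B"}


def load_product(product_id):
    """Fetch the parent row, then only the one child table it points at."""
    parent = conn.execute(
        "SELECT * FROM Product WHERE product_id = ?", (product_id,)
    ).fetchone()
    if parent is None:
        return None
    details = dict(parent)
    child_table = CHILD_TABLES.get(parent["product_type"])
    if child_table is not None:
        child = conn.execute(
            f"SELECT * FROM {child_table} WHERE product_id = ?", (product_id,)
        ).fetchone()
        if child is not None:
            details.update(dict(child))
    return details


tv = load_product(1337)
print(tv)
```

This costs two round-trips per product but never more than one child-table lookup, however many product types exist.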
Yes, you can use the product_type (so called "discriminator") to help the DBMS produce a better query plan and avoid unnecessary joins. You can do something like this:
SELECT
*
FROM
Product
LEFT JOIN Product_A
ON product_type = 1 -- Or whatever is the actual value in your case.
AND Product.product_id = Product_A.product_id
LEFT JOIN Product_B
ON product_type = 2
AND Product.product_id = Product_B.product_id
LEFT JOIN Product_C
ON product_type = 3
AND Product.product_id = Product_C.product_id
LEFT JOIN Product_D
ON product_type = 4
AND Product.product_id = Product_D.product_id
WHERE
Product.product_id = 1337
The DBMS should be able to short-circuit all "branches" that don't have the right product_type and avoid the corresponding joins. [1]
Whether this is actually better than using a separate query to fetch product_type and then choosing the corresponding "special" query (and incurring another database round-trip) - that's something you should test. As always, test on the representative amounts of data!
[1] At least Oracle or SQL Server would do that - please check for MySQL!
What kind of data is going into these tables? Is it simply metadata about the products? If that's the case you could create a tall table that describes each product.
For example, a Product_Details table that has three columns: product_id, product_data_key and value, where product_data_key holds what used to be the column names in Product_A, Product_B, Product_C...
You could even have a separate table that better describes product_data_key so it's just a foreign key in Product_Details.
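A small sketch of such a tall table, again via Python's sqlite3 so it is runnable. The three Product_Details columns are the ones named above; the sample keys and values, the composite primary key and the product_details helper are my own assumptions. Note that every value is stored as text here, which is one of the usual trade-offs of this design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Product_Details (
    product_id       INTEGER NOT NULL REFERENCES Product(product_id),
    product_data_key TEXT    NOT NULL,
    value            TEXT,   -- everything is stored as text in this sketch
    PRIMARY KEY (product_id, product_data_key)
);
INSERT INTO Product VALUES (1, 'TV'), (2, 'Blender');
INSERT INTO Product_Details VALUES
    (1, 'pixel_count', '8294400'),
    (1, 'screen_inches', '55'),
    (2, 'jar_litres', '1.5');
""")


def product_details(product_id):
    """Return one product's attributes as a {key: value} dict."""
    rows = conn.execute(
        "SELECT product_data_key, value FROM Product_Details WHERE product_id = ?",
        (product_id,),
    ).fetchall()
    return dict(rows)


tv_details = product_details(1)
print(tv_details)
```

Adding a new product type then means inserting rows with new keys instead of creating another child table and another JOIN.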
Maybe change your design? One product can have many attributes (and many of the same attributes), and those attributes have values.
I suggest three tables:
Products        ProductsAttributes    Attributes
-Product_Id     -Product_Id           -Attribute_Id
-...            -Attribute_Id         -Attribute_Name
                -Value                -...
                -...
Using as such:
SELECT p.Product_Id, a.Attribute_Name, pa.Value FROM Products p
JOIN ProductsAttributes pa ON pa.Product_Id = p.Product_Id
JOIN Attributes a ON a.Attribute_Id = pa.Attribute_Id
Then you can reuse attributes, tie them to products, and store their values. Each product only has the attributes that it needs.
Let's say I have a Products table and 3-4 other tables (Comments, Pictures, Orders and so on) connected to Products in a one-to-many relationship.
How should I select one product and its connected entries from the other tables (Comments, Pictures, Orders) in an easy manner?
I tried using left joins to connect the other tables, but I get duplicated entries. This will also get worse if I want to select many products instead of one.
I was also thinking of one query for each related table but isn't this too slow?
If I understand you correctly, you have e.g.
1 Product
4 comments for said product
10 pictures and
5 orders
and if you do a left join you get 4*10*5=200 results with lots of duplicate comments, pictures and orders. But you only want one row per comment, picture and order.
You will need a separate query for each related table. If there was a way around that, it would be more complex and slower than the separate solution.
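To make the arithmetic concrete, here is a sketch in Python's sqlite3 with exactly the numbers from above: 1 product, 4 comments, 10 pictures and 5 orders. The column names (body, url, quantity) and the related helper are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Comments (id INTEGER PRIMARY KEY, product_id INTEGER, body TEXT);
CREATE TABLE Pictures (id INTEGER PRIMARY KEY, product_id INTEGER, url TEXT);
CREATE TABLE Orders   (id INTEGER PRIMARY KEY, product_id INTEGER, quantity INTEGER);
INSERT INTO Products VALUES (1, 'Widget');
""")
conn.executemany("INSERT INTO Comments (product_id, body) VALUES (1, ?)",
                 [(f"comment {i}",) for i in range(4)])
conn.executemany("INSERT INTO Pictures (product_id, url) VALUES (1, ?)",
                 [(f"pic{i}.jpg",) for i in range(10)])
conn.executemany("INSERT INTO Orders (product_id, quantity) VALUES (1, ?)",
                 [(i + 1,) for i in range(5)])

# One big join: every comment is paired with every picture and every order.
joined = conn.execute("""
    SELECT c.id, pi.id, o.id
    FROM Products p
    LEFT JOIN Comments c  ON c.product_id  = p.id
    LEFT JOIN Pictures pi ON pi.product_id = p.id
    LEFT JOIN Orders o    ON o.product_id  = p.id
    WHERE p.id = 1
""").fetchall()


def related(table, product_id):
    # `table` only ever comes from the fixed tuple below, never from user input.
    return conn.execute(
        f"SELECT * FROM {table} WHERE product_id = ?", (product_id,)
    ).fetchall()


# One query per related table: each row comes back exactly once.
separate = {t: related(t, 1) for t in ("Comments", "Pictures", "Orders")}
print(len(joined), {t: len(rows) for t, rows in separate.items()})
```

The three separate queries return 4 + 10 + 5 = 19 rows in total, against 200 for the single join.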
You are not getting duplicate rows; you are getting the correct result of performing a one-to-many join.
Read up on:
http://en.wikipedia.org/wiki/Relational_algebra#Joins_and_join-like_operators
http://dev.mysql.com/doc/refman/5.0/en/join.html
If you want one row per product entry consider using an aggregate function
http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html
select product.id, product.name, group_concat(comment.text), avg(rating.value)
from product
left join comment on comment.product_id=product.id
left join rating on rating.product_id=product.id
group by product.id, product.name
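One caveat with aggregating over two one-to-many joins at once: the joined rows are still multiplied before grouping, so group_concat repeats each comment once per rating (avg is unaffected here only because every rating is repeated equally often). A sketch in Python's sqlite3, whose group_concat behaves similarly for this purpose; the sample rows are made up, and aggregating each child table in its own subquery is one possible fix, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE comment (product_id INTEGER, text TEXT);
CREATE TABLE rating  (product_id INTEGER, value INTEGER);
INSERT INTO product VALUES (1, 'Widget');
INSERT INTO comment VALUES (1, 'good'), (1, 'bad');
INSERT INTO rating  VALUES (1, 1), (1, 2), (1, 3);
""")

# Query shaped like the one above: 2 comments x 3 ratings = 6 joined rows.
naive = conn.execute("""
    SELECT product.id, group_concat(comment.text), avg(rating.value)
    FROM product
    LEFT JOIN comment ON comment.product_id = product.id
    LEFT JOIN rating  ON rating.product_id  = product.id
    GROUP BY product.id, product.name
""").fetchone()

# Aggregate each child table first, then join one row per product.
pre_aggregated = conn.execute("""
    SELECT product.id, c.texts, r.avg_value
    FROM product
    LEFT JOIN (SELECT product_id, group_concat(text) AS texts
               FROM comment GROUP BY product_id) AS c
           ON c.product_id = product.id
    LEFT JOIN (SELECT product_id, avg(value) AS avg_value
               FROM rating GROUP BY product_id) AS r
           ON r.product_id = product.id
""").fetchone()

print(naive)
print(pre_aggregated)
```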
I've got a query which is taking a long time and I was wondering if there was a better way to do it? Perhaps with joins?
It's currently taking ~2.5 seconds which is way too long.
To explain the structure a little: I have products, "themes" and "categories". A product can be assigned any number of themes or categories. The themeitems and categoryitems tables are linking tables to link a category/theme ID to a product ID.
I want to get a list of all products with at least one theme and category. The query I've got at the moment is below:
SELECT *
FROM themes t, themeitems ti, products p, catitems ci, categories c
WHERE t.ID = ti.THEMEID
AND ti.PRODID = p.ID
AND p.ID = ci.PRODID
AND ci.CATID = c.ID
I'm only actually selecting the rows I need when performing the query but I've removed that to abstract a little.
Any help in the right direction would be great!
Edit: EXPLAIN below
The standard response for this issue is to use explicit JOINs and to ensure there are indexes on the fields used in the JOINs.
SELECT *
FROM themes t
INNER JOIN themeitems ti ON t.ID = ti.THEMEID
INNER JOIN products p ON ti.PRODID = p.ID
INNER JOIN catitems ci ON p.ID = ci.PRODID
INNER JOIN categories c ON ci.CATID = c.ID
Specifying the JOINs explicitly helps the query engine work out what it needs to do, and indexes on the columns used in the joins will enable more rapid joining.
Your query is slow because you don't have any indexes on your tables.
Try:
create unique index pk on themes (ID);
create index fk on themeitems (themeid, prodid);
create unique index pk on products (id);
create index fk on catitems (prodid, catid);
create unique index pk on categories (id);
As @symcbean writes in the comments, the catitems and themeitems indices should probably be unique indices too: if there isn't another column to add to that index (e.g. "validityDate"), please add UNIQUE to the create statement.
Your query is very simple. I do not think the cost will decrease by rewriting it with explicit joins. You can try adding indexes on the appropriate columns.
Simply selecting less data is the glaringly obvious solution here.
Why do you need to know every column and every row every time you run the query? Addressing any one of these 3 factors will improve performance.
I want to get a list of all products with at least one theme and category
That rather implies you don't care which theme and category, in which case.....
SELECT p.*
FROM themeitems ti, products p, catitems ci
WHERE p.ID = ti.PRODID
AND p.ID = ci.PRODID
It may be possible to make the query run significantly faster - but you've not provided details of the table structure, the indexes, the volume of data, the engine type, the query cache configuration, the frequency of data updates, the frequency with which the query is run.....
update
Now that you've provided the explain plan, it's obvious you've got very small amounts of data AND NO RELEVANT INDEXES!!!!!
As a minimum you should add indexes on the product foreign key in the themeitems and catitems tables. Indeed, the primary keys for these tables should be the product id and category id / theme id, and since it's likely that you will have more products than categories or themes then the fields should be in that order in the indexes. (i.e. PRODID,CATID rather than CATID, PRODID)
update2
Given the requirement "to get a list of all products with at least one theme and category", it might be faster still (but the big wins are reducing the number of joins and adding the right indexes) to....
SELECT p.*
FROM products p
INNER JOIN (
SELECT DISTINCT ti.PRODID
FROM themeitems ti, catitems ci
WHERE ti.PRODID=ci.PRODID
) i ON p.id=i.PRODID
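Another way to express "at least one theme and at least one category" is a pair of EXISTS subqueries, which returns each product once without needing DISTINCT. This is a sketch in Python's sqlite3 with made-up rows; the (PRODID, ...) primary keys follow the index advice above, and whether this beats the derived-table version on MySQL is something to measure rather than assume.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products   (ID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE themeitems (THEMEID INTEGER, PRODID INTEGER,
                         PRIMARY KEY (PRODID, THEMEID));
CREATE TABLE catitems   (CATID INTEGER, PRODID INTEGER,
                         PRIMARY KEY (PRODID, CATID));
INSERT INTO products VALUES
    (1, 'both'), (2, 'theme only'), (3, 'category only'), (4, 'neither');
INSERT INTO themeitems VALUES (10, 1), (11, 1), (10, 2);
INSERT INTO catitems   VALUES (20, 1), (21, 1), (20, 3);
""")

# Plain three-table join: product 1 appears once per theme/category pair.
join_rows = conn.execute("""
    SELECT p.ID
    FROM themeitems ti, products p, catitems ci
    WHERE p.ID = ti.PRODID
      AND p.ID = ci.PRODID
""").fetchall()

# EXISTS version: each qualifying product appears exactly once.
exists_rows = conn.execute("""
    SELECT p.ID
    FROM products p
    WHERE EXISTS (SELECT 1 FROM themeitems ti WHERE ti.PRODID = p.ID)
      AND EXISTS (SELECT 1 FROM catitems ci WHERE ci.PRODID = p.ID)
""").fetchall()

print(len(join_rows), exists_rows)
```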
I've made this an answer because I could not post it as a comment.
Basic rule of thumb if you want to remove full table scans with JOINs:
You should index first.
Note that this does not always work with ORDER BY/GROUP BY in combination with JOINs, because Using temporary; Using filesort is often still needed.
Extra, because this is outside the scope of the question: how to fix a slow query that combines ORDER BY/GROUP BY with JOINs.
The MySQL optimizer assumes it should access the smallest table first to get the best execution plan. As a result, MySQL can't always use an index to sort the result, and it needs a temporary table and a filesort to fix the sort order.
(Read more about this in "MySQL slow query using filesort"; that is how I fixed this problem, because Using temporary can really kill performance when MySQL needs a disk-based temporary table.)
I have a table structure like the following:
user
    id
    name
profile_stat
    id
    name
profile_stat_value
    id
    name
user_profile
    user_id
    profile_stat_id
    profile_stat_value_id
My question is:
How do I evaluate a query where I want to find all users with profile_stat_id and profile_stat_value_id for many stats?
I've tried doing an inner self join, but that quickly gets crazy when searching for many stats. I've also tried doing a count on the actual user_profile table, and that's much better, but still slow.
Is there some magic I'm missing? I have about 10 million rows in the user_profile table and want the query to take no longer than a few seconds. Is that possible?
Typically, databases are able to handle 10 million records in a decent manner. I have mostly used Oracle in our professional environment with large amounts of data (about 30-40 million rows), and even join queries on those tables have never taken more than a second or two to run.
One IMPORTANT lesson I learned whenever query performance was bad was to check whether indexes are defined properly on the join fields. E.g. here, profile_stat_id and profile_stat_value_id should have indexes defined (I am assuming user_id is the primary key). This will definitely give you a good performance increase if you have not done that.
After defining the indexes, run the query once or twice to give the DB a chance to build the index tree and query plan before verifying the gain.
Superficially, you seem to be asking for this, which includes no self-joins:
SELECT u.name, u.id, s.name, s.id, v.name, v.id
FROM User_Profile AS p
JOIN User AS u ON u.id = p.user_id
JOIN Profile_Stat AS s ON s.id = p.profile_stat_id
JOIN Profile_Stat_Value AS v ON v.id = p.profile_stat_value_id
Any of the joins listed can be changed to a LEFT OUTER JOIN if the corresponding table need not have a matching entry. All this does is join the central User_Profile table with each of the other three tables on the appropriate joining column.
Where do you think you need a self-join?
[I have not included anything to filter on 'the many stats'; it is not at all clear to me what that part of the question means.]
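If "many stats" means "users who have every one of several (profile_stat_id, profile_stat_value_id) pairs", one common way to avoid a self-join per stat is to filter on the pairs and count matches per user. The sketch below uses Python's sqlite3 and rests on assumptions of mine: each user has at most one value per stat, the index name and sample rows are made up, and users_matching_all is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_profile (
    user_id               INTEGER,
    profile_stat_id       INTEGER,
    profile_stat_value_id INTEGER,
    PRIMARY KEY (user_id, profile_stat_id)  -- assumes one value per stat per user
);
-- Hypothetical index so the (stat, value) filter need not scan the whole table.
CREATE INDEX idx_stat_value
    ON user_profile (profile_stat_id, profile_stat_value_id, user_id);
INSERT INTO user_profile VALUES
    (1, 1, 10), (1, 2, 20), (1, 3, 30),
    (2, 1, 10), (2, 2, 21),
    (3, 1, 10), (3, 2, 20);
""")


def users_matching_all(pairs):
    """Return ids of users that have every (stat_id, value_id) pair in `pairs`."""
    if not pairs:
        raise ValueError("at least one (stat_id, value_id) pair is required")
    conditions = " OR ".join(
        "(profile_stat_id = ? AND profile_stat_value_id = ?)" for _ in pairs
    )
    params = [x for pair in pairs for x in pair] + [len(pairs)]
    sql = f"""
        SELECT user_id
        FROM user_profile
        WHERE {conditions}
        GROUP BY user_id
        HAVING COUNT(*) = ?   -- the user matched every requested pair
        ORDER BY user_id
    """
    return [row[0] for row in conn.execute(sql, params)]


matches = users_matching_all([(1, 10), (2, 20)])
print(matches)
```

User 2 is left out because its value for stat 2 is 21, so it matches only one of the two requested pairs.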
MySQL setup: step by step.
programs -> linked to --> speakers (by program_id)
At this point, it's easy for me to query all the data:
SELECT *
FROM programs
JOIN speakers on programs.program_id = speakers.program_id
Nice and easy.
The trick for me is this. My speakers table is also linked to a third table, "books." So in the "speakers" table, I have "book_id" and in the "books" table, the book_id is linked to a name.
I've tried this (including a WHERE you'll notice):
SELECT *
FROM programs
JOIN speakers on programs.program_id = speakers.program_id
JOIN books on speakers.book_id = books.book_id
WHERE programs.category_id = 1
LIMIT 5
No results.
My questions:
What am I doing wrong?
What's the most efficient way to make this query?
Basically, I want to get back all the programs data and the books data, but instead of the book_id, I need it to come back as the book name (from the 3rd table).
Thanks in advance for your help.
UPDATE:
(rather than opening a brand new question)
The left join worked for me. However, I have a new problem. Multiple books can be assigned to a single speaker.
Using the left join returns two rows! What do I need to add to return only a single row that still lists the two books separately?
Is there any chance that the books table doesn't have any matching rows for speakers.book_id?
Try using a left join which will still return the program/speaker combinations, even if there are no matches in books.
SELECT *
FROM programs
JOIN speakers on programs.program_id = speakers.program_id
LEFT JOIN books on speakers.book_id = books.book_id
WHERE programs.category_id = 1
LIMIT 5
Btw, could you post the table schemas for all tables involved, and exactly what output (or reasonable representation) you'd expect to get?
Edit: response to the OP's comment
You can use GROUP BY and group_concat to put all the books on one row.
e.g.
SELECT speakers.speaker_id,
speakers.speaker_name,
programs.program_id,
programs.program_name,
group_concat(books.book_name)
FROM programs
JOIN speakers on programs.program_id = speakers.program_id
LEFT JOIN books on speakers.book_id = books.book_id
WHERE programs.category_id = 1
GROUP BY speakers.speaker_id
LIMIT 5
Note: since I don't know the exact column names, these may be off
That's typically efficient. You are making some assumption that isn't true. Do your speakers have books assigned? If they don't, that last JOIN should be a LEFT JOIN.
This kind of query is typically pretty efficient, since you almost certainly have primary keys as indexes. The main issue would be whether your indexes are covering (which is more likely to occur if you don't use SELECT *, but instead select only the columns you need).