Multiple left joins and performance - mysql

I have the following tables:
products - 4500 records
Fields: id, sku, name, alias, price, special_price, quantity, desc, photo, manufacturer_id, model_id, hits, publishing
products_attribute_rel - 35000 records
Fields: id, product_id, attribute_id, attribute_val_id
attribute_values - 243 records
Fields: id, attr_id, value, ordering
manufacturers - 29 records
Fields: id, title, publishing
models - 946 records
Fields: id, manufacturer_id, title, publishing
So I get data from these tables by one query:
SELECT jp.*,
jm.id AS jm_id,
jm.title AS jm_title,
jmo.id AS jmo_id,
jmo.title AS jmo_title
FROM `products` AS jp
LEFT JOIN `products_attribute_rel` AS jpar ON jpar.product_id = jp.id
LEFT JOIN `attribute_values` AS jav ON jav.attr_id = jpar.attribute_val_id
LEFT JOIN `manufacturers` AS jm ON jm.id = jp.manufacturer_id
LEFT JOIN `models` AS jmo ON jmo.id = jp.model_id
GROUP BY jp.id HAVING COUNT(DISTINCT jpar.attribute_val_id) >= 0
This query is slow as hell. It takes MySQL hundreds of seconds to handle it.
So how could this query be improved? With small data chunks it works
perfectly well, but I guess the products_attribute_rel table, which
has 35000 records, ruins everything.
Your help would be appreciated.
EDITED
EXPLAIN results of the SELECT query:

The problem is that MySQL uses the join type ALL for 3 tables. That means MySQL performs a full table scan on each of those 3 tables for every row combination from the preceding tables, and only then discards the rows that don't match the ON clause. To get a much faster join type (for instance eq_ref), you must put an index on the columns that are used in the ON clauses.
Be aware, though, that putting an index on every possible column is not recommended. Indexes do speed up SELECT statements, but they also create overhead, since each index must be stored and maintained. This means that data-modifying queries like INSERT, UPDATE and DELETE become slower. I've seen queries deleting only 1000 records in half an hour. It's a trade-off where you have to decide what happens more often and what is more important.
To get more information on MySQL join types, take a look at this.
More on indexes here.
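As a rough sketch of the indexing advice above: the two join columns in the question that are not obviously primary keys are products_attribute_rel.product_id and attribute_values.attr_id. The index names below are hypothetical, and the sketch assumes that manufacturers.id and models.id are already primary keys.

```sql
-- Hypothetical index names; adjust to your own naming scheme.
-- Supports the join condition jpar.product_id = jp.id
CREATE INDEX idx_par_product_id ON products_attribute_rel (product_id);
-- Supports the join condition jav.attr_id = jpar.attribute_val_id
CREATE INDEX idx_av_attr_id ON attribute_values (attr_id);

-- Re-check the plan afterwards: ideally the "type" column
-- no longer shows ALL for jpar and jav.
EXPLAIN
SELECT jp.id, COUNT(DISTINCT jpar.attribute_val_id)
FROM products AS jp
LEFT JOIN products_attribute_rel AS jpar ON jpar.product_id = jp.id
LEFT JOIN attribute_values AS jav ON jav.attr_id = jpar.attribute_val_id
GROUP BY jp.id;
```

Whether the optimizer actually picks these indexes depends on the data, so compare the EXPLAIN output before and after.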

The tables are not so large that the query should take hundreds of seconds. Something is wrong with the table schema. Please add proper indexes; that will surely speed it up.

select distinct
jm.id AS jm_id,
jm.title AS jm_title,
jmo.id AS jmo_id,
jmo.title AS jmo_title
from products jp,
products_attribute_rel jpar,
attribute_values jav,
manufacturers jm,
models jmo
where jpar.product_id = jp.id
and jav.attr_id = jpar.attribute_val_id
and jm.id = jp.manufacturer_id
and jmo.id = jp.model_id
You can do that if you want to select all the data. Hope it works.

Related

MySQL Slow performance with count() of matching records in a joined table

There is a table called "basket_status" in the query below. For each record in basket_status, a count of yarn balls in the basket is being made from another table (yarn_ball_updates).
The basket_status table has 761 rows. The yarn_ball_updates table has 1,204,294 records. Running the query below takes about 30 seconds to 60 seconds (depending on how busy the server is) and returns 750 rows. Obviously my problem is doing a match against 1,204,294 records for all of the 761 basket_status records.
I tried making a view based on the query, but it offered no performance increase. I believe I read that views can't have subqueries and complex joins.
What direction should I take to speed up this query? I've never made a MySQL scheduled task or anything, but it seems like the "basket_status" table should have a "yarn_ball_count" count already in it, and an automated process should be updating that new extra count() column maybe?
Thanks for any help or direction.
SELECT p.id, p.basket_name, p.high_quality, p.yarn_ball_count
FROM (
SELECT q.id, q.basket_name, q.high_quality,
CAST(SUM(IF (q.report_date = mxd.mxdate,1,0)) AS CHAR) yarn_ball_count
FROM (
SELECT bs.id, bs.basket_name, bs.high_quality,ybu.report_date
FROM yb.basket_status bs
JOIN yb.yarn_ball_updates ybu ON bs.basket_name = ybu.alpha_pmn
) q,
(SELECT MAX(ybu.report_date) mxdate FROM yb.yarn_ball_updates ybu) mxd
GROUP BY q.basket_name, q.high_quality ) p
I don't think you need nested queries for this. I'm not a MySQL developer but won't this work?
SELECT bs.id, bs.basket_name, bs.high_quality, count(*) yarn_ball_count
FROM yb.basket_status bs
JOIN yb.yarn_ball_updates ybu ON bs.basket_name = ybu.alpha_pmn
JOIN (SELECT MAX(report_date) AS mxdate FROM yb.yarn_ball_updates) mxd ON ybu.report_date = mxd.mxdate
GROUP BY bs.basket_name, bs.high_quality
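One difference worth noting: an inner join on the latest report date only returns baskets that have at least one update on that date, while the query in the question returns a count of 0 for the others. Below is a sketch that keeps the zero counts, assuming that is the desired result. The index name is hypothetical, and the LEFT JOIN also returns baskets that have no updates at all, which the original query drops.

```sql
-- Hypothetical index so the lookup by basket name and date avoids a full scan
-- of the 1.2M-row table.
CREATE INDEX idx_ybu_pmn_date ON yb.yarn_ball_updates (alpha_pmn, report_date);

SELECT bs.id, bs.basket_name, bs.high_quality,
       COUNT(ybu.alpha_pmn) AS yarn_ball_count  -- counts only matched rows, so 0 when none
FROM yb.basket_status bs
LEFT JOIN yb.yarn_ball_updates ybu
       ON ybu.alpha_pmn = bs.basket_name
      AND ybu.report_date = (SELECT MAX(report_date) FROM yb.yarn_ball_updates)
GROUP BY bs.id, bs.basket_name, bs.high_quality;
```

Run EXPLAIN on it to confirm the index is used before relying on it.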

fast way to get number of records in mysql

I'm writing a query in mysql to join two tables. And both tables have more than 50,000 records.
Table EMP Columns
empid,
project,
code,
Status
Table EMPINFO
empid,
project,
code,
projecttype,
timespent,
skills
Each table has the candidate key [empid, project, code].
So I join the tables using an INNER JOIN
like this: INNER JOIN
ON a.empid = b.empid
and a.project = b.project
and a.code = b.code
I get the result, but if I add count(*) in an outer query to count the number of records, it takes a lot of time and sometimes the connection fails.
Is there any way to speed up getting the number of records?
I would also like to hear suggestions for speeding up the inner join itself, given the same candidate key in both tables.
INDEX(empid, project, code) -- in any order.
Are these tables 1:1? If so, why do the JOIN in order to do the COUNT?
Please provide SHOW CREATE TABLE. (If there are datatype differences, this could be a big problem.)
Please provide the actual SELECT.
How much RAM do you have? Please provide SHOW VARIABLES LIKE '%buffer%';.
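The index suggestion above could look like the following sketch. The index names are hypothetical, and if [empid, project, code] is already declared as the primary key or a unique key in a table, that table needs no extra index.

```sql
-- Hypothetical index names; skip a statement if (empid, project, code)
-- is already the PRIMARY KEY or a UNIQUE key of that table.
ALTER TABLE EMP     ADD INDEX idx_emp_key     (empid, project, code);
ALTER TABLE EMPINFO ADD INDEX idx_empinfo_key (empid, project, code);

-- Count the matches directly instead of wrapping the full join in an outer query.
SELECT COUNT(*)
FROM EMP a
INNER JOIN EMPINFO b
        ON a.empid = b.empid
       AND a.project = b.project
       AND a.code = b.code;
```

If the tables really are 1:1, a plain SELECT COUNT(*) on one of them should give the same number without any join.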

mySQL query performance with INNER JOINs

I have what may be a basic performance question. I've done a lot of SQL queries, but not much in terms of complex inner joins and such. So, here it is:
I have a database with 4 tables, countries, territories, employees, and transactions.
The transactions links up with the employees and countries. The employees links up with the territories. In order to produce a required report, I'm running a PHP script that processes a SQL query against a mySQL database.
SELECT trans.transactionDate, agent.code, agent.type, trans.transactionAmount, agent.territory
FROM transactionTable as trans
INNER JOIN
(
SELECT agent1.code as code, agent1.type as type, territory.territory as territory FROM agentTable as agent1
INNER JOIN territoryTable as territory
ON agent1.zip=territory.zip
) AS agent
ON agent.code=trans.agent
ORDER BY trans.agent
There are about 50,000 records in the agent table, and over 200,000 in the transaction table. The other two are relatively tiny. It's taking about 7 minutes to run this query. And I haven't even inserted the fourth table yet, which needs to relate a field in the transactionTable (country) to a field in the countryTable (country) and return a field in the countryTable (region).
So, two questions:
Where would I logically put the connection between the transactionTable and the countryTable?
Can anyone suggest a way to speed this up?
Thanks.
Your query should be equivalent to this:
SELECT tx.transactionDate,
a.code,
a.type,
tx.transactionAmount,
t.territory
FROM transactionTable tx,
agentTable a,
territoryTable t
WHERE tx.agent = a.code
AND a.zip = t.zip
ORDER BY tx.agent
or to this if you like to use JOIN:
SELECT tx.transactionDate,
a.code,
a.type,
tx.transactionAmount,
t.territory
FROM transactionTable tx
JOIN agentTable a ON tx.agent = a.code
JOIN territoryTable t ON a.zip = t.zip
ORDER BY tx.agent
In order to work fast, you must have the following indexes on your tables:
CREATE INDEX transactionTable_agent ON transactionTable(agent);
CREATE INDEX territoryTable_zip ON territoryTable(zip);
CREATE INDEX agentTable_code ON agentTable(code);
(basically any field that is part of WHERE or JOIN constraint should be indexed).
That said, your table structure looks suspicious in the sense that it is joined on apparently non-unique fields like zip code. You really want to join on unique keys, like agent id, transaction id and so on. Otherwise, expect your queries to generate a lot of redundant data and be really slow.
One more note: INNER JOIN is equivalent to simply JOIN, so there is no reason to type the redundant keyword.
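As for where the fourth table goes: since the question says transactionTable.country relates to countryTable.country and the report needs countryTable.region, one more JOIN on the transaction side should do. This is a sketch built only from the names given in the question; the index name is hypothetical.

```sql
-- Hypothetical index name; supports the new join condition.
CREATE INDEX countryTable_country ON countryTable(country);

SELECT tx.transactionDate,
       a.code,
       a.type,
       tx.transactionAmount,
       t.territory,
       c.region
FROM transactionTable tx
JOIN agentTable a ON tx.agent = a.code
JOIN territoryTable t ON a.zip = t.zip
JOIN countryTable c ON tx.country = c.country  -- the country lives on the transaction, not the agent
ORDER BY tx.agent;
```

If some transactions have no matching country, a LEFT JOIN on countryTable would keep them with a NULL region.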

Optimize query with loads of IN entries

I have quite a massive query that I want to optimize. It selects from 1 table and LEFT JOINs 5 more.
This query takes 0.3428 sec to complete (Results: 4,340 total, Query took 0.3428 sec).
I am working with about 10000 entries, which will definitely grow.
The query by itself is not the problem; the IN clauses are the biggest problem.
I have 2 IN clauses.
Both are in the WHERE clause.
For this specific page load, both have a large number of IDs: 3344 entries. Example: (99, 1, 5, 8458, ...)
Both IN clauses have the same set of 3344 IDs. Example: ((cf.catid IN ( 99, 1, 5, 8458, ... ) AND cf.cid=c.id) OR p.category IN ( 99, 1, 5, 8458, ... ))
The query looks like this:
SELECT
p.id, c.id AS pCid, c.name AS cName, p.name, p.seo,
p.description AS pDescription, cd.description,
p.category, p.archive, cf.catid, cf.pid, p.order_nr,
c.order_nr AS cOrder, c.seo AS cSeo, cat.name AS catName,
cat.order_id, pr.price, pr.sale_price, pr.sale_expiry,
IF( pr.sale_price > 0, pr.sale_price, pr.price ) AS `oPrices`,
pr.member_price, p.`set`, p.get_the_look,
c.from_text_price, c.thumb, c.code AS colour_code,
p.code AS product_code, p.supplier_part_number,
p.oem_part_number, p.make, p.model, p.year, p.sub_model
FROM
products p
LEFT JOIN category_featured cf ON p.id=cf.pid
LEFT JOIN colours c ON c.pid=p.id
LEFT JOIN colour_descriptions cd ON c.id=cd.colour_id
LEFT JOIN category cat ON cat.id=p.category
LEFT JOIN pricing pr ON pr.cid=c.id
WHERE
(
(cf.catid IN ( .. 3344 ID entries .. ) AND cf.cid=c.id) OR p.category IN ( .. 3344 ID entries .. )
)
AND p.archive='0'
AND p.status='1' AND c.status='1'
AND c.archive='0'
AND cat.status IN (1,2)
GROUP BY `c`.`id`
ORDER BY `oPrices` DESC
Is there a better way to check for specific IDs in a table than the IN clause, or maybe a different check altogether?
Speed is the main issue here, I want to achieve the best performance possible.
So far what I did and how some of the settings are set:
I created indexes for those tables (only the columns that are INT (integers) that are used in this query have indexes)
Some tables are MyISAM some are InnoDB (other tables that are not used in the query have a relation with a few tables that are in the query so they had to be InnoDB)
no relations between the tables in the query exist
to run the query I use PHP and MySQLI
Thanks
UPDATE!!!!
I noticed why the query is so slow: the new column oPrices that I create with the IF statement, combined with "ORDER BY oPrices DESC", makes the query slow. Once I remove it, the query only takes 0.00009 sec, which is amazing! But then I won't get correctly ordered data, and even if I do the ordering in PHP I will have to write a new pagination function, which is not ideal.
IN can make a query very difficult to optimize as the index may not be used (you can verify this by using EXPLAIN). An alternative approach would be to load these IDs into a temporary table and then perform a JOIN.
From this link:
http://explainextended.com/2009/08/18/passing-parameters-in-mysql-in-list-vs-temporary-table/
We see that for a large list of parameters, passing them in a
temporary table is much faster than as a constant list, while for
small lists performance is almost the same.
Using a temporary table is the best way to pass large arrays of
parameters in MySQL.
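A minimal sketch of the temporary-table approach, simplified to the p.category condition only. The table name tmp_category_ids is hypothetical, and the four IDs stand in for the full list from the question.

```sql
-- Hypothetical table name; the PRIMARY KEY gives the join an index to use.
CREATE TEMPORARY TABLE tmp_category_ids (
    id INT NOT NULL PRIMARY KEY
);

-- Load the ID list once (only the sample IDs from the question are shown).
INSERT INTO tmp_category_ids (id) VALUES (99), (1), (5), (8458);

-- Replace "p.category IN (...)" with a join against the temporary table.
SELECT p.id, p.name, p.category
FROM products p
JOIN tmp_category_ids t ON t.id = p.category
WHERE p.archive = '0'
  AND p.status = '1';

DROP TEMPORARY TABLE tmp_category_ids;
```

Keep in mind that MySQL does not let a query refer to the same TEMPORARY table more than once, so the full query, which uses the list in two places, would need a second temporary table or a regular table.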

How would I make this query run faster?

How would I make this query run faster...?
SELECT account_id,
account_name,
account_update,
account_sold,
account_mds,
ftp_url,
ftp_livestatus,
number_digits,
number_cw,
client_name,
ppc_status,
user_name
FROM
Accounts,
FTPDetails,
SiteNumbers,
Clients,
PPC,
Users
WHERE Accounts.account_id = FTPDetails.ftp_accountid
AND Accounts.account_id = SiteNumbers.number_accountid
AND Accounts.account_client = Clients.client_id
AND Accounts.account_id = PPC.ppc_accountid
AND Accounts.account_designer = Users.user_id
AND Accounts.account_active = 'active'
AND FTPDetails.ftp_active = 'active'
AND SiteNumbers.number_active = 'active'
AND Clients.client_active = 'active'
AND PPC.ppc_active = 'active'
AND Users.user_active = 'active'
ORDER BY
Accounts.account_update DESC
Thanks in advance :)
EXPLAIN query results:
I don't really have any foreign keys set up... I was trying to avoid making alterations to the database, as I will have to do a complete overhaul soon.
The only primary keys are the id of each table, e.g. account_id, ftp_id, ppc_id ...
Indexes
You need - at least - an index on every field that is used in a JOIN condition.
Indexes on the fields that appear in WHERE or GROUP BY or ORDER BY clauses are most of the time useful, too.
When two or more fields of a table are used in JOINs (or WHERE or GROUP BY or ORDER BY), a compound (combined) index on these fields may be better than separate indexes. For example, in the SiteNumbers table, possible indexes are the compound (number_accountid, number_active) or (number_active, number_accountid).
Conditions on fields that are Boolean (ON/OFF, active/inactive) sometimes slow queries down (as indexes on them are not selective and thus not very helpful). Restructuring (further normalizing) the tables is an option in that case, but you can probably avoid the added complexity.
Besides the usual advice (examine the EXPLAIN plan, add indexes where needed, test variations of the query),
I notice that in your query there is a partial Cartesian product. The table Accounts has one-to-many relationships to three tables: FTPDetails, SiteNumbers and PPC. This has the effect that if you have, for example, 1000 accounts, and every account is related to, say, 10 FTPDetails, 20 SiteNumbers and 3 PPCs, the query will return 600 rows for every account (the product 10x20x3). That is 600K rows in total, with much of the data duplicated.
You could instead split the query into three, plus one for the base data (Accounts and the remaining tables). That way, only 34K rows of data (each of them shorter) would be transferred:
Accounts JOIN Clients JOIN Users (with all fields needed from these tables): 1K rows
Accounts JOIN FTPDetails (with Accounts.account_id and all fields from FTPDetails): 10K rows
Accounts JOIN SiteNumbers (with Accounts.account_id and all fields from SiteNumbers): 20K rows
Accounts JOIN PPC (with Accounts.account_id and all fields from PPC): 3K rows
and then use the data from the 4 queries in the client side to show combined info.
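The four queries could be sketched as follows. Which table each column belongs to is an assumption based on the column prefixes in the question.

```sql
-- 1. Base data: one row per active account.
SELECT a.account_id, a.account_name, a.account_update, a.account_sold, a.account_mds,
       c.client_name, u.user_name
FROM Accounts a
JOIN Clients c ON c.client_id = a.account_client   AND c.client_active = 'active'
JOIN Users   u ON u.user_id   = a.account_designer AND u.user_active   = 'active'
WHERE a.account_active = 'active'
ORDER BY a.account_update DESC;

-- 2. FTP details per account.
SELECT a.account_id, f.ftp_url, f.ftp_livestatus
FROM Accounts a
JOIN FTPDetails f ON f.ftp_accountid = a.account_id AND f.ftp_active = 'active'
WHERE a.account_active = 'active';

-- 3. Site numbers per account.
SELECT a.account_id, s.number_digits, s.number_cw
FROM Accounts a
JOIN SiteNumbers s ON s.number_accountid = a.account_id AND s.number_active = 'active'
WHERE a.account_active = 'active';

-- 4. PPC status per account.
SELECT a.account_id, p.ppc_status
FROM Accounts a
JOIN PPC p ON p.ppc_accountid = a.account_id AND p.ppc_active = 'active'
WHERE a.account_active = 'active';
```

One caveat: the original single query only returns accounts that have at least one active row in every joined table, so the client code has to drop accounts missing from any of the four result sets if that behaviour matters.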
I would add the following indexes:
Table Accounts
index on (account_designer)
index on (account_client)
index on (account_active, account_id)
index on (account_update)
Table FTPDetails
index on (ftp_active, ftp_accountid)
Table SiteNumbers
index on (number_active, number_accountid)
Table PPC
index on (ppc_active, ppc_accountid)
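Written out as statements, the list above would look roughly like this. The index names are hypothetical; only the table and column names come from the question.

```sql
-- Table Accounts
CREATE INDEX idx_accounts_designer  ON Accounts (account_designer);
CREATE INDEX idx_accounts_client    ON Accounts (account_client);
CREATE INDEX idx_accounts_active_id ON Accounts (account_active, account_id);
CREATE INDEX idx_accounts_update    ON Accounts (account_update);

-- Table FTPDetails
CREATE INDEX idx_ftp_active_account ON FTPDetails (ftp_active, ftp_accountid);

-- Table SiteNumbers
CREATE INDEX idx_numbers_active_account ON SiteNumbers (number_active, number_accountid);

-- Table PPC
CREATE INDEX idx_ppc_active_account ON PPC (ppc_active, ppc_accountid);
```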
Use EXPLAIN to find out which index could be used and which index is actually used. Create an appropriate index if necessary.
If FTPDetails.ftp_active only has the two valid entries 'active' and 'inactive', use BOOL as data type.
As a side note: I strongly suggest using explicit joins instead of implicit ones:
SELECT
account_id, account_name, account_update, account_sold, account_mds,
ftp_url, ftp_livestatus,
number_digits, number_cw,
client_name,
ppc_status,
user_name
FROM Accounts
INNER JOIN FTPDetails
ON Accounts.account_id = FTPDetails.ftp_accountid
AND FTPDetails.ftp_active = 'active'
INNER JOIN SiteNumbers
ON Accounts.account_id = SiteNumbers.number_accountid
AND SiteNumbers.number_active = 'active'
INNER JOIN Clients
ON Accounts.account_client = Clients.client_id
AND Clients.client_active = 'active'
INNER JOIN PPC
ON Accounts.account_id = PPC.ppc_accountid
AND PPC.ppc_active = 'active'
INNER JOIN Users
ON Accounts.account_designer = Users.user_id
AND Users.user_active = 'active'
WHERE Accounts.account_active = 'active'
ORDER BY Accounts.account_update DESC
This makes the query much more readable because the join condition is close to the name of the table that is being joined.
EXPLAIN, and benchmark different options. For starters, I'm sure that several queries will be faster than this monster. First, because the query optimiser will spend time examining which join order is best (six tables give 6! = 720 possible orders). And second, queries like SELECT ... WHERE ....active = 'active' will be cached (though that depends on how often the data changes, and on the MySQL version: the query cache was removed in MySQL 8.0).
One of your main problems is here: x.y_active = 'active'
Problem: low cardinality
The active field is a boolean field with 2 possible values, as such it has very low cardinality.
MySQL (or any SQL database, for that matter) will generally not use an index when roughly 30% or more of the rows have the same value.
Forcing the index is useless because it will make your query slower, not faster.
Solution: partition your tables
A solution is to partition your tables on the active columns.
This will exclude all non-active rows from consideration, and will make the select act as if you actually had a working index on the xxx_active fields.
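A minimal sketch of what such partitioning could look like, using LIST COLUMNS partitioning (available since MySQL 5.5). The table name AccountsPartitioned, the column types and the reduced column list are assumptions for illustration only. Note that MySQL requires the partitioning column to be part of every unique key, including the primary key.

```sql
-- Hypothetical table; column types are guesses, not taken from the question.
CREATE TABLE AccountsPartitioned (
    account_id     INT          NOT NULL,
    account_name   VARCHAR(255),
    account_active VARCHAR(10)  NOT NULL,
    -- The partitioning column must be part of the primary key.
    PRIMARY KEY (account_id, account_active)
)
PARTITION BY LIST COLUMNS (account_active) (
    PARTITION p_active   VALUES IN ('active'),
    PARTITION p_inactive VALUES IN ('inactive')
);

-- With the filter on the partitioning column, only p_active should be read.
SELECT account_id, account_name
FROM AccountsPartitioned
WHERE account_active = 'active';
```

Any value other than 'active' or 'inactive' would be rejected on insert with this definition, so the partition list has to cover every value the column can take.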
Sidenote
Please don't ever use implicit WHERE joins; they are much too error-prone and confusing to be useful.
Use a syntax like Oswald's answer instead.
Links:
Cardinality: http://en.wikipedia.org/wiki/Cardinality_(SQL_statements)
Cardinality and indexes: http://www.bennadel.com/blog/1424-Exploring-The-Cardinality-And-Selectivity-Of-SQL-Conditions.htm
MySQL partitioning: http://dev.mysql.com/doc/refman/5.5/en/partitioning.html