How would I make this query run faster...?
SELECT account_id,
account_name,
account_update,
account_sold,
account_mds,
ftp_url,
ftp_livestatus,
number_digits,
number_cw,
client_name,
ppc_status,
user_name
FROM
Accounts,
FTPDetails,
SiteNumbers,
Clients,
PPC,
Users
WHERE Accounts.account_id = FTPDetails.ftp_accountid
AND Accounts.account_id = SiteNumbers.number_accountid
AND Accounts.account_client = Clients.client_id
AND Accounts.account_id = PPC.ppc_accountid
AND Accounts.account_designer = Users.user_id
AND Accounts.account_active = 'active'
AND FTPDetails.ftp_active = 'active'
AND SiteNumbers.number_active = 'active'
AND Clients.client_active = 'active'
AND PPC.ppc_active = 'active'
AND Users.user_active = 'active'
ORDER BY
Accounts.account_update DESC
Thanks in advance :)
EXPLAIN query results:
I don't really have any foreign keys set up... I was trying to avoid making alterations to the database, as it will have to undergo a complete overhaul soon.
The only primary keys are the ids of each table, e.g. account_id, ftp_id, ppc_id ...
Indexes
You need - at least - an index on every field that is used in a JOIN condition.
Indexes on the fields that appear in WHERE or GROUP BY or ORDER BY clauses are usually useful, too.
When two or more fields of a table are used in JOINs (or in WHERE or GROUP BY or ORDER BY), a compound (combined) index on these (two or more) fields may be better than separate indexes. For example, in the SiteNumbers table, possible indexes are the compound (number_accountid, number_active) or (number_active, number_accountid).
Conditions on fields that are Boolean (ON/OFF, active/inactive) sometimes slow queries down, as indexes on them are not selective and thus not very helpful. Restructuring (further normalizing) the tables is an option in that case, but you can probably avoid the added complexity.
Besides the usual advice (examine the EXPLAIN plan, add indexes where needed, test variations of the query),
I notice that in your query there is a partial Cartesian product. The table Accounts has a one-to-many relationship to each of the three tables FTPDetails, SiteNumbers and PPC. This has the effect that if you have, for example, 1000 accounts, and every account is related to, say, 10 FTPDetails, 20 SiteNumbers and 3 PPCs, the query will return 600 rows for every account (the product of 10x20x3) - 600K rows in total, in which a great deal of data is duplicated.
You could instead split the query into three detail queries plus one for the base data (Accounts and the remaining tables). That way, only 34K rows of (smaller) data would be transferred:
Accounts JOIN Clients JOIN Users
(with all fields needed from these tables)
1K rows
Accounts JOIN FTPDetails
(with Accounts.account_id and all fields from FTPDetails)
10K rows
Accounts JOIN SiteNumbers
(with Accounts.account_id and all fields from SiteNumbers)
20K rows
Accounts JOIN PPC
(with Accounts.account_id and all fields from PPC)
3K rows
and then combine the data from the 4 result sets on the client side to show the combined info.
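As a rough sketch (column names taken from the question), the base query could look like this, with the three detail queries following the same pattern:

-- Base data: one row per account (the 1K rows)
SELECT a.account_id, a.account_name, a.account_update, a.account_sold,
       a.account_mds, c.client_name, u.user_name
FROM Accounts a
INNER JOIN Clients c ON a.account_client = c.client_id AND c.client_active = 'active'
INNER JOIN Users u ON a.account_designer = u.user_id AND u.user_active = 'active'
WHERE a.account_active = 'active'
ORDER BY a.account_update DESC;

-- Detail data: one query per child table (FTPDetails shown;
-- SiteNumbers and PPC are analogous)
SELECT a.account_id, f.ftp_url, f.ftp_livestatus
FROM Accounts a
INNER JOIN FTPDetails f ON a.account_id = f.ftp_accountid AND f.ftp_active = 'active'
WHERE a.account_active = 'active';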
I would add the following indexes:
Table Accounts
index on (account_designer)
index on (account_client)
index on (account_active, account_id)
index on (account_update)
Table FTPDetails
index on (ftp_active, ftp_accountid)
Table SiteNumbers
index on (number_active, number_accountid)
Table PPC
index on (ppc_active, ppc_accountid)
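In MySQL syntax, a sketch of those additions (the index names are made up):

ALTER TABLE Accounts
  ADD INDEX idx_designer (account_designer),
  ADD INDEX idx_client (account_client),
  ADD INDEX idx_active_id (account_active, account_id),
  ADD INDEX idx_update (account_update);
ALTER TABLE FTPDetails ADD INDEX idx_active_account (ftp_active, ftp_accountid);
ALTER TABLE SiteNumbers ADD INDEX idx_active_account (number_active, number_accountid);
ALTER TABLE PPC ADD INDEX idx_active_account (ppc_active, ppc_accountid);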
Use EXPLAIN to find out which index could be used and which index is actually used. Create an appropriate index if necessary.
If FTPDetails.ftp_active only has the two valid entries 'active' and 'inactive', use BOOL as data type.
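A sketch of such a migration (hypothetical; the existing 'active'/'inactive' strings have to be converted first):

-- Add a BOOL column, backfill it from the string flag, then swap it in
ALTER TABLE FTPDetails ADD COLUMN ftp_active_bool BOOL NOT NULL DEFAULT 1;
UPDATE FTPDetails SET ftp_active_bool = (ftp_active = 'active');
ALTER TABLE FTPDetails DROP COLUMN ftp_active;
ALTER TABLE FTPDetails CHANGE ftp_active_bool ftp_active BOOL NOT NULL DEFAULT 1;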
As a side note: I strongly suggest using explicit joins instead of implicit ones:
SELECT
account_id, account_name, account_update, account_sold, account_mds,
ftp_url, ftp_livestatus,
number_digits, number_cw,
client_name,
ppc_status,
user_name
FROM Accounts
INNER JOIN FTPDetails
ON Accounts.account_id = FTPDetails.ftp_accountid
AND FTPDetails.ftp_active = 'active'
INNER JOIN SiteNumbers
ON Accounts.account_id = SiteNumbers.number_accountid
AND SiteNumbers.number_active = 'active'
INNER JOIN Clients
ON Accounts.account_client = Clients.client_id
AND Clients.client_active = 'active'
INNER JOIN PPC
ON Accounts.account_id = PPC.ppc_accountid
AND PPC.ppc_active = 'active'
INNER JOIN Users
ON Accounts.account_designer = Users.user_id
AND Users.user_active = 'active'
WHERE Accounts.account_active = 'active'
ORDER BY Accounts.account_update DESC
This makes the query much more readable because the join condition is close to the name of the table that is being joined.
EXPLAIN and benchmark the different options. For starters, I'm sure that several smaller queries will be faster than this monster. First, because the query optimiser will spend a lot of time examining which join order is best (with six tables there are 6! = 720 possibilities). And second, because queries like SELECT ... WHERE x.y_active = 'active' can be served from the query cache (though that depends on the amount of data changes).
One of your main problems is here: x.y_active = 'active'
Problem: low cardinality
The active field is effectively a boolean with only two possible values; as such, it has very low cardinality.
MySQL (or any RDBMS, for that matter) will not use an index when 30% or more of the rows share the same value.
Forcing the index is useless because it will make your query slower, not faster.
Solution: partition your tables
A solution is to partition your tables on the active columns.
This excludes all non-active rows from consideration and makes the SELECT behave as if you actually had a working index on the xxx_active fields.
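A sketch of what that could look like for one of the tables from the first question (MySQL 5.5+ LIST COLUMNS syntax; note that every unique key, including the primary key, must then contain the partitioning column):

-- The primary key must include the partitioning column
ALTER TABLE FTPDetails DROP PRIMARY KEY, ADD PRIMARY KEY (ftp_id, ftp_active);
ALTER TABLE FTPDetails
  PARTITION BY LIST COLUMNS (ftp_active) (
    PARTITION p_active VALUES IN ('active'),
    PARTITION p_inactive VALUES IN ('inactive')
  );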
Sidenote
Please don't ever use implicit WHERE joins; they are much too error-prone and confusing to be useful.
Use a syntax like Oswald's answer instead.
Links:
Cardinality: http://en.wikipedia.org/wiki/Cardinality_(SQL_statements)
Cardinality and indexes: http://www.bennadel.com/blog/1424-Exploring-The-Cardinality-And-Selectivity-Of-SQL-Conditions.htm
MySQL partitioning: http://dev.mysql.com/doc/refman/5.5/en/partitioning.html
Related
I've got a query which is taking a long time and I was wondering if there was a better way to do it? Perhaps with joins?
It's currently taking ~2.5 seconds which is way too long.
To explain the structure a little: I have products, "themes" and "categories". A product can be assigned any number of themes or categories. The themeitems and catitems tables are linking tables that link a theme/category ID to a product ID.
I want to get a list of all products with at least one theme and category. The query I've got at the moment is below:
SELECT *
FROM themes t, themeitems ti, products p, catitems ci, categories c
WHERE t.ID = ti.THEMEID
AND ti.PRODID = p.ID
AND p.ID = ci.PRODID
AND ci.CATID = c.ID
I'm only actually selecting the rows I need when performing the query but I've removed that to abstract a little.
Any help in the right direction would be great!
Edit: EXPLAIN below
Utilising correct JOINs and ensuring there are indexes on the fields used in the JOINs is the standard response to this issue.
SELECT *
FROM themes t
INNER JOIN themeitems ti ON t.ID = ti.THEMEID
INNER JOIN products p ON ti.PRODID = p.ID
INNER JOIN catitems ci ON p.ID = ci.PRODID
INNER JOIN categories c ON ci.CATID = c.ID
The specification of the JOINs assists the query engine in working out what it needs to do, and the indexes on the columns used in the joins will enable more rapid joining.
Your query is slow because you don't have any indexes on your tables.
Try:
create unique index pk on themes (ID)
create index fk on themeitems(themeid, prodid)
create unique index pk on products (id)
create index fk on catitems (prodid, catid)
create unique index pk on categories (id)
As @symcbean writes in the comments, the catitems and themeitems indices should probably be unique indices too; if there isn't another column to add to the index (e.g. "validityDate"), add the UNIQUE keyword to the create statement.
Your query is very simple. I do not think your cost will decrease just by rewriting it with explicit joins. You can try putting indexes on the appropriate columns.
Simply selecting less data is the glaringly obvious solution here.
Why do you need to know every column and every row every time you run the query? Addressing any one of these 3 factors will improve performance.
I want to get a list of all products with at least one theme and category
That rather implies you don't care which theme and category, in which case.....
SELECT p.*
FROM themeitems ti, products p, catitems ci
WHERE p.ID = ti.PRODID
AND p.ID = ci.PRODID
It may be possible to make the query run significantly faster - but you've not provided details of the table structure, the indexes, the volume of data, the engine type, the query cache configuration, the frequency of data updates, the frequency with which the query is run.....
update
Now that you've provided the explain plan, it's obvious you've got very small amounts of data AND NO RELEVANT INDEXES!
As a minimum you should add indexes on the product foreign key in the themeitems and catitems tables. Indeed, the primary keys for these tables should be the product id and the category id / theme id, and since it's likely that you will have more products than categories or themes, the fields should be in that order in the indexes (i.e. PRODID, CATID rather than CATID, PRODID) - see the sketch below.
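A sketch of those primary keys (assuming the linking tables have no conflicting primary key yet):

ALTER TABLE catitems ADD PRIMARY KEY (PRODID, CATID);
ALTER TABLE themeitems ADD PRIMARY KEY (PRODID, THEMEID);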
update2
Given the requirement "to get a list of all products with at least one theme and category", it might be faster still (but the big wins are reducing the number of joins and adding the right indexes) to....
SELECT p.*
FROM product p
INNER JOIN (
SELECT DISTINCT ti.PRODID
FROM themeitems ti, catitems ci
WHERE ti.PRODID=ci.PRODID
) i ON p.id=i.PRODID
I've made this an answer because I could not post it as a comment.
The basic rule of thumb if you want to remove FULL table scans from JOINs: index first.
Note that this does not always work with ORDER BY/GROUP BY in combination with JOINs, because often "Using temporary; Using filesort" is needed.
Extra, since this is outside the scope of the question: how to fix a slow query with ORDER BY/GROUP BY in combination with JOIN.
The MySQL optimizer thinks it needs to access the smallest table first to get the best execution, which means MySQL can't always use an index to sort the result and instead needs a temporary table plus a filesort to fix the wrong sort ordering.
(Read more about this here: MySQL slow query using filesort. This is how I fix the problem, because "Using temporary" can really kill performance when MySQL needs a disk-based temporary table.)
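One common fix (an assumption on my part, not necessarily the one from the linked answer) is to force the join order so that the table whose index matches the ORDER BY is read first, for example with tables from the first question:

-- STRAIGHT_JOIN makes MySQL read Accounts first, so an index on
-- account_update can satisfy the ORDER BY without a filesort
SELECT STRAIGHT_JOIN a.account_id, f.ftp_url
FROM Accounts a
INNER JOIN FTPDetails f ON f.ftp_accountid = a.account_id
WHERE a.account_active = 'active'
ORDER BY a.account_update DESC;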
I have a query on a fact table "foo_success" in a star schema; the table has about 6 million rows. It holds (integer) references to dimension tables and nothing else. We use MyISAM as the storage engine.
The query:
SELECT
hierarchy.level0name,
hierarchy.level1name,
hierarchy.level0,
hierarchy.level1,
date.date,
address.city,
user.emailAddress,
foo_object.name,
foo_object.type,
user_group.groupId,
COUNT(user.id) AS count_user_id,
SUM(foo_object_statistic.passes) AS sum_foo_object_statistic_passes,
SUM(foo_object_statistic.starts) AS sum_foo_object_statistic_starts,
SUM(foo_object_statistic.calls) AS sum_foo_object_statistic_calls
FROM
foo_success,
user,
user_group,
address,
hierarchy,
foo_object,
foo_object_statistic,
date
WHERE (foo_success.userDimensionId = user.id)
AND (foo_success.userGroupDimensionId = user_group.id)
AND (foo_success.addressDimensionId = address.id)
AND (foo_success.hierarchyDimensionId = hierarchy.id)
AND (foo_success.fooObjectDimensionId = foo_object.id)
AND (foo_success.fooObjectStatisticDimensionId = foo_object_statistic.id)
AND (foo_success.dateDimensionId=date.id)
AND hierarchy.level0 = 'XYZ'
AND hierarchy.level1 IS NOT NULL
AND hierarchy.level2 IS NOT NULL
AND hierarchy.level3 IS NOT NULL
AND hierarchy.level4 IS NOT NULL
AND hierarchy.level5 IS NOT NULL
AND hierarchy.level6 IS NULL
AND hierarchy.level7 IS NULL
GROUP BY hierarchy.level0, foo_object.fooObjectId
LIMIT 0, 25;
What I've tried so far:
This is the simple join version, which equals the INNER JOIN alternative in speed.
There are indices on all fields which are joined or which are part of a condition.
I did use EXPLAIN on this query and found that the query cost (# of processed rows) is 128596 for the table user and 77 for the table foo_success.
I tried to remove the dependency on the user table, which leads to a # of processed rows of over 6 million in the fact table foo_success.
It takes about 1.5 minutes to finish this query, which is far from what I'd expect of a data warehouse star schema optimized for read speed. Is there any way I can optimize this monster?
The inefficiency of the query mostly comes from transferring a lot of data you do not actually use: the fields hierarchy.level1name, hierarchy.level0name, hierarchy.level1, date.date, address.city, user.emailAddress, foo_object.name, foo_object.type and user_group.groupId are not included in the GROUP BY clause, which means the information is retrieved for each row, loaded into memory and then just discarded.
What I would recommend is to concentrate the retrieval of all the needed ids and the aggregation in a subquery, and then join to the rest of the tables, so that each join produces no more than a single row per result (you can even move the LIMIT clause into the subquery to minimize the subsequent JOIN operations). After that, you may discover that you are missing some useful indexes.
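A sketch of that rewrite (table and column names from the question; it assumes fooObjectDimensionId maps one-to-one to foo_object.fooObjectId, and it relies on MySQL's loose GROUP BY handling just like the original):

SELECT h.level0name, h.level1name, h.level0, h.level1,
       d.date, a.city, u.emailAddress, o.name, o.type, g.groupId,
       agg.count_user_id, agg.sum_passes, agg.sum_starts, agg.sum_calls
FROM (
    -- Aggregate first, carrying only the dimension ids
    SELECT fs.hierarchyDimensionId, fs.fooObjectDimensionId,
           fs.dateDimensionId, fs.addressDimensionId,
           fs.userDimensionId, fs.userGroupDimensionId,
           COUNT(fs.userDimensionId) AS count_user_id,
           SUM(fst.passes) AS sum_passes,
           SUM(fst.starts) AS sum_starts,
           SUM(fst.calls) AS sum_calls
    FROM foo_success fs
    JOIN foo_object_statistic fst ON fs.fooObjectStatisticDimensionId = fst.id
    JOIN hierarchy h2 ON fs.hierarchyDimensionId = h2.id
    WHERE h2.level0 = 'XYZ'
      AND h2.level1 IS NOT NULL AND h2.level2 IS NOT NULL
      AND h2.level3 IS NOT NULL AND h2.level4 IS NOT NULL
      AND h2.level5 IS NOT NULL
      AND h2.level6 IS NULL AND h2.level7 IS NULL
    GROUP BY h2.level0, fs.fooObjectDimensionId
    LIMIT 0, 25
) agg
JOIN hierarchy h ON agg.hierarchyDimensionId = h.id
JOIN foo_object o ON agg.fooObjectDimensionId = o.id
JOIN date d ON agg.dateDimensionId = d.id
JOIN address a ON agg.addressDimensionId = a.id
JOIN user u ON agg.userDimensionId = u.id
JOIN user_group g ON agg.userGroupDimensionId = g.id;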
SELECT COUNT(*)
FROM song AS s
JOIN user AS u
ON(u.user_id = s.user_id)
WHERE s.is_active = 1 AND s.public = 1
The s.is_active and s.public columns are indexed, as are u.user_id and s.user_id.
song table row count 310k
user table row count 22k
Is there a way to optimize this? We're getting 1 second query times on this.
Ensure that you have a compound "covering" index on song: (user_id, is_active, public). Here, we've named the index covering_index:
SELECT COUNT(s.user_id)
FROM song s FORCE INDEX (covering_index)
JOIN user u
ON u.user_id = s.user_id
WHERE s.is_active = 1 AND s.public = 1
Here, we're ensuring that the JOIN is done with the covering index instead of the primary key, so that the covering index can be used for the WHERE clause as well.
I also changed COUNT(*) to COUNT(s.user_id). Though MySQL should be smart enough to pick the column from the index, I explicitly named the column just in case.
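For reference, a sketch of creating that covering index (column names from the question):

CREATE INDEX covering_index ON song (user_id, is_active, public);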
Ensure that you have enough memory configured on the server so that all of your indexes can stay in memory.
If you're still having issues, please post the results of EXPLAIN.
Perhaps write it as a stored procedure or view... You could also try selecting all the IDs first, then running the count on the result; if you do it all as one query, it may be faster. Generally, optimisation is done by using nested selects or by making the server do the work, so in this context that is all I can think of.
SELECT Count(*) FROM
(SELECT t.user_id FROM
(SELECT * FROM song WHERE song.is_active = 1 AND song.public = 1) AS t
JOIN user AS u
ON (t.user_id = u.user_id)) AS active_public_songs
Also be sure you are using the correct kind of join.
I have a table 'Clients' and a sub-table 'Orders'.
For a certain view I need to display the last order for each client.
Since you cannot use LIMIT in a join, I first used a complex solution with a LEFT JOIN, GROUP_CONCAT and SUBSTRING_INDEX to get the last order, but this is quite slow, since there are millions of records.
Then I thought of just storing the last OrderID in the Clients table, that is updated by a trigger each time the Orders table changes. Then I just do a LEFT JOIN to Orders on this field LastOrderID.
Would an index on the field LastOrderID be of any use in this situation? Or wouldn't it be used since the source table is always Clients, so there is no sorting, searching, etc. done on this field ?
The reason I'm asking is that in reality it's a little bit more complex, I might actually need about 20 of these kind of fields.
update:
My query now is :
SELECT *
FROM Clients AS c
LEFT JOIN Orders AS o ON o.OrderID=c.LastOrderID
Would an index on LastOrderID in Clients improve speed, or is it not necessary?
First of all, do you have an index on the Client foreign key within the Order table?
Doing this alone should increase performance quite considerably.
Perhaps your SQL is wrong?
This is standard SQL: you'll need a single two-column index on (ClientID, OrderID) on the Orders table for this, which will speed up both the aggregate and the self-join:
SELECT
...
FROM
(
SELECT MAX(OrderID) AS LastOrderID, ClientID
FROM Orders
GROUP BY ClientID
) o2
JOIN
Orders o ON o2.ClientID = o.ClientID AND o2.LastOrderID = o.OrderID
JOIN
Clients c ON o.ClientID = c.ClientID
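And a sketch of the supporting two-column index (the index name is made up):

CREATE INDEX idx_client_order ON Orders (ClientID, OrderID);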
so I have a 560MB db with the largest table at 500MB (over 10 million rows)
my query has to join 5 tables and takes about 10 seconds to finish...
SELECT DISTINCT trips.tripid AS tripid,
stops.stopdescrption AS "perron",
Date_format(segments.segmentstart, "%H:%i") AS "time",
Date_format(trips.tripend, "%H:%i") AS "arrival",
Upper(routes.routepublicidentifier) AS "lijn",
plcend.placedescrption AS "destination"
FROM calendar
JOIN trips
ON calendar.vsid = trips.vsid
JOIN routes
ON routes.routeid = trips.routeid
JOIN places plcstart
ON plcstart.placeid = trips.placeidstart
JOIN places plcend
ON plcend.placeid = trips.placeidend
JOIN segments
ON segments.tripid = trips.tripid
JOIN stops
ON segments.stopid = stops.stopid
WHERE stops.stopid IN ( 43914, 23899, 23925, 23908,
23913, 19899, 23871, 43902,
23876, 25563, 18956, 19912,
23889, 23861, 23879, 23884,
23856, 19920, 19898, 23916,
23894, 20985, 23930, 20932,
20986, 22434, 20021, 19893,
19903, 19707, 19935 )
AND calendar.vscdate = Str_to_date('25-10-2011', "%e-%c-%Y")
AND segments.segmentstart >= Str_to_date('15:56', "%H:%i")
AND routes.routeservicetype = 0
AND segments.segmentstart > "00:00:00"
ORDER BY segments.segmentstart
What are things I can do to speed this up? Any tips are welcome, I'm pretty new to SQL...
but I can't change the structure of the db because it's not mine...
Use EXPLAIN to find the bottlenecks: http://dev.mysql.com/doc/refman/5.0/en/explain.html
Then perhaps, add indexes.
If you don't need to select ALL rows, use LIMIT to limit returned result count.
Just looking at the query, I would say that you should make sure that you have indexes on trips.vsid, calendar.vscdate, segments.segmentstart and routes.routeservicetype. I assume that there is already indexes on all the primary keys in the tables.
Using explain as Briedis suggested would show you how well the indexes work.
You might want to add covering indexes for some tables, like for example an index on trips.vsid in which tripid and routeid are included. That way the database can answer from the index alone and doesn't have to read from the actual table.
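For example, a sketch of such a covering index (the index name is made up):

ALTER TABLE trips ADD INDEX idx_trips_cover (vsid, tripid, routeid);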
Edit:
The execution plan tells you that it successfully uses indexes for everything except the segments table, where it does a table scan and filters by the where condition. You should try to make a covering index for segments.segmentstart by including tripid and stopid.
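A sketch of that index (again, the name is made up):

ALTER TABLE segments ADD INDEX idx_segments_cover (segmentstart, tripid, stopid);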
Try adding a clustered index to the routes table on both routeservicetype and routeid.
Depending on the frequency of the data within the routeservicetype field, you may get an improvement by shrinking the amount of data being compared in the join to the trips table.
Looking at the explain plan, you may also want to force the sequence of the table usage by using STRAIGHT_JOIN instead of JOIN (or INNER JOIN), as I've had real improvements with this technique.
Essentially, put the table with the smallest row-count of extracted data at the beginning of the query, and the largest row count table at the end (in this case possibly the segments table?), with the exception of simple lookups (eg. for descriptions).
You may also consider altering the WHERE clause to filter the segments table on stopid instead of the stops table, and creating a clustered index on the segments table on (stopid, tripid and segmentstart) - this index will be effectively able to satisfy two joins and two where clauses from a single index...
To build the index...
ALTER TABLE segments ADD INDEX idx_qry_helper ( stopid, tripid, segmentstart );
And the altered WHERE clause...
WHERE segments.stopid IN ( 43914, 23899, 23925, 23908,
23913, 19899, 23871, 43902,
23876, 25563, 18956, 19912,
23889, 23861, 23879, 23884,
23856, 19920, 19898, 23916,
23894, 20985, 23930, 20932,
20986, 22434, 20021, 19893,
19903, 19707, 19935 )
(rest of the query as before)
At the end of the day, a 10 second response for what appears to be a complex query on a fairly large dataset isn't all that bad!