How to optimize MySQL Views

I have some queries that use views, and they run a lot slower than I would expect given that all the relevant tables are indexed (and not that large anyway).
I hope I can explain this:
My main query looks like this (grossly simplified):
select [stuff] from orders as ord
left join calc_order_status as ors on (ors.order_id = ord.id)
calc_order_status is a view, defined thusly:
create view calc_order_status as
select ord.id AS order_id,
(sum(itm.items * itm.item_price) + ord.delivery_cost) AS total_total
from orders ord
left join order_items itm on itm.order_id = ord.id
group by ord.id
Orders (ord) contain orders, order_items contain the individual items associated with each order and their prices.
All tables are properly indexed, BUT the thing runs slowly, and when I do an EXPLAIN I get:
# id select_type table type possible_keys key key_len ref rows Extra
1 1 PRIMARY ord ALL customer_id NULL NULL NULL 1002 Using temporary; Using filesort
2 1 PRIMARY <derived2> ALL NULL NULL NULL NULL 1002
3 1 PRIMARY cus eq_ref PRIMARY PRIMARY 4 db135147_2.ord.customer_id 1 Using where
4 2 DERIVED ord ALL NULL NULL NULL NULL 1002 Using temporary; Using filesort
5 2 DERIVED itm ref order_id order_id 4 db135147_2.ord.id 2
My guess is, "derived2" refers to the view. The individual items (itm) seem to work fine, indexed by order_id. The problem seems to be Line # 4, which indicates that the system doesn't use a key for the orders table (ord). But in the MAIN query, the order id is already defined:
left join calc_order_status as ors on (ors.order_id = ord.id)
and ord.id (both in the main query and within the view) refer to the primary key.
I have read somewhere that MySQL simply does not optimize views that well and might not utilize keys under some conditions even when they are available. This seems to be one of those cases.
I would appreciate any suggestions. Is there a way to force MySQL to realize "it's all simpler than you think, just use the primary key and you'll be fine"? Or are views the wrong way to go about this at all?

If it is at all possible to remove those joins, remove them. Replacing them with subqueries will speed things up a lot.
You could also try running something like this to see if it makes any speed difference at all:
select [stuff] from orders as ord
left join (
    select ord.id AS order_id,
    (sum(itm.items * itm.item_price) + ord.delivery_cost) AS total_total
    from orders ord
    left join order_items itm on itm.order_id = ord.id
    group by ord.id
) as ors on (ors.order_id = ord.id)

An index is useful for finding a few rows in a big table, but when the query has to read every row anyway, an index just slows things down. So here MySQL probably expects to scan the whole orders table, and therefore decides not to use an index.
You can test whether it would be faster by forcing MySQL to use an index:
from orders as ord force index for join (yourindex)
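For example, a minimal sketch against the original query (PRIMARY is only an assumption here; substitute whichever index name actually applies, and [stuff] stands for whatever columns you select):
-- Minimal sketch: force the join on orders to use a specific index
select [stuff]
from orders as ord force index for join (PRIMARY)
left join calc_order_status as ors on (ors.order_id = ord.id)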

Related

Can I optimise this MySQL query?

My SQL is
SELECT authors.*, COUNT(*) FROM authors
INNER JOIN resources_authors ON authors.author_id=resources_authors.author_id
WHERE
resource_id IN
(SELECT resource_id FROM resources_authors WHERE author_id = '1313')
AND authors.author_id != '1313'
GROUP BY authors.author_id
I have indexes on all the fields in the query, but I still get Using temporary; Using filesort.
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY authors ALL PRIMARY NULL NULL NULL 16025 Using where; Using temporary; Using filesort
1 PRIMARY resources_authors ref author_id author_id 4 authors.author_id 3 Using where
2 DEPENDENT SUBQUERY resources_authors unique_subquery resource_id,author_id,resource_id_2 resource_id 156 func,const 1 Using index; Using where
How can I improve my query, or table structure, to speed this query up?
There's an SQL Fiddle here, if you'd like to experiment: http://sqlfiddle.com/#!2/96d57/2/0
I would approach it a different way by doing a "PreQuery". Get a list of all authors who share resources with the given author, but do NOT include the original author in the final list. Once those authors are determined, get their name/contact info and the total count of common resources, but not the SPECIFIC resources that were in common (that would be a slightly different query).
Now, the query. To help optimize it, I would have two indexes on resources_authors:
one on just (author_id)
another combined on (resource_id, author_id)
which you already have.
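If the single-column index is missing, a minimal sketch of adding it (the index names below are illustrative):
-- Illustrative index names; the combined (resource_id, author_id) index may already exist
ALTER TABLE resources_authors ADD INDEX idx_ra_author (author_id);
ALTER TABLE resources_authors ADD INDEX idx_ra_resource_author (resource_id, author_id);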
Now to explain the inner query. Run that part on its own first and you will see that the execution plan utilizes the index. The intent here: the query starts with resources_authors but only cares about one specific author (the WHERE clause), which keeps this result set very short. That is IMMEDIATELY joined to the resources_authors table again, but ONLY on the same RESOURCE and where the author IS NOT the primary one (from the WHERE clause), giving you only those OTHER authors. By adding a COUNT(), we now know how many resources each of those authors has in common, grouped by author, returning one entry per author. Finally, take that "PreQuery" result set (all records already prequalified above) and join it to authors. Get the details and the count(), and done.
SELECT
A.*,
PreQuery.CommonResources
from
( SELECT
ra2.author_id,
COUNT(*) as CommonResources
FROM
resources_authors ra1
JOIN resources_authors ra2
ON ra1.resource_id = ra2.resource_id
AND NOT ra1.author_id = ra2.author_id
WHERE
ra1.author_id = 1313
GROUP BY
ra2.author_id ) PreQuery
JOIN authors A
ON PreQuery.author_id = A.author_id

Why is this query so slow with MySQL?

Yesterday I found a slow query running on the server (it takes more than 1 minute). It looks like this:
select a.* from a
left join b on a.hotel_id=b.hotel_id and a.hotel_type=b.hotel_type
where b.hotel_id is null
There are 40000+ rows in table a and 10000+ rows in table b. A unique key had already been created on the columns hotel_id and hotel_type in table b, like UNIQUE KEY idx_hotel_id (hotel_id, hotel_type). So I used the EXPLAIN keyword to check the query plan for this SQL and I got the following result:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE a ALL NULL NULL NULL NULL 36804
1 SIMPLE b index NULL idx_hotel_id 185 NULL 8353 Using where; Using index; Not exists
According to the MySQL reference manual, when all parts of an index are used by the join and the index is a PRIMARY KEY or UNIQUE NOT NULL index, the join type will be "eq_ref". See the second row of the query plan: the value of the type column is "index". But I really do have a unique index on hotel_id and hotel_type, and both columns are used by the join. The join type "eq_ref" is more efficient than "ref", "ref" is more efficient than "range", and "index" is the worst join type you want to have except for "ALL". This is what I'm confused about, and I want to know why the join type here is "index". I hope I have described my question clearly, and I'm looking forward to your answers. Thanks!
WHERE ... IS NULL checks can be slow, so maybe it is that. Try rewriting it with NOT EXISTS:
select * from a
where not exists ( select 1 from b where a.hotel_id=b.hotel_id and a.hotel_type=b.hotel_type )
Also: how many records are you returning? If you are returning all 36804 records this could slow things down as well.
Thanks, all the people above! I found the way to solve my problem myself. The columns hotel_id and hotel_type didn't have the same character set. After I made them both "utf8", my query returned its result in less than 10 milliseconds. There is a good article about LEFT JOIN and indexes in MySQL that I strongly recommend: http://explainextended.com/2009/09/18/not-in-vs-not-exists-vs-left-join-is-null-mysql/
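For reference, a minimal sketch of that kind of character-set fix (which table needs changing and the exact column types are assumptions; adjust to your actual definitions):
-- Assumed table and column types; the point is that both join columns must share one character set
ALTER TABLE a
  MODIFY hotel_id VARCHAR(32) CHARACTER SET utf8,
  MODIFY hotel_type VARCHAR(32) CHARACTER SET utf8;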

Optimize the query with index

MySQL query taking 1.6 seconds for 40000 records in table
SELECT aggsm.topicdm_id AS topid,citydm.city_name
FROM AGG_MENTION AS aggsm
JOIN LOCATIONDM AS locdm ON aggsm.locationdm_id = locdm.locationdm_id
JOIN CITY AS citydm ON locdm.city_id = citydm.city_id
JOIN STATE AS statedm ON citydm.state_id = statedm.state_id
JOIN COUNTRY AS cntrydm ON statedm.country_id = cntrydm.country_id
WHERE cntrydm.country_id IN (1,2,3,4)
GROUP BY aggsm.topicdm_id,aggsm.locationdm_id
LIMIT 0,200000
I have 40000 to 50000 records in the AGG_MENTION, LOCATIONDM and CITYDM tables, 500 records in STATEDM and 4 records in the COUNTRY table.
When I run the above query it takes 1.6 sec. Is there a way to optimize the query, and an index on which columns would improve the performance?
Following is the EXPLAIN output:
1 SIMPLE aggsm index agg_sm_locdm_fk_idx agg_sm_datedm_fk_idx 4 36313 Using index; Using temporary; Using filesort
1 SIMPLE locdm eq_ref PRIMARY,city_id_UNIQUE,locationdm_id_UNIQUE,loc_city_fk_idx PRIMARY 8 opinionleaders.aggsm.locationdm_id 1
1 SIMPLE citydm eq_ref PRIMARY,city_id_UNIQUE,city_state_fk_idx PRIMARY 8 opinionleaders.locdm.city_id 1
1 SIMPLE statedm eq_ref PRIMARY,state_id_UNIQUE,state_country_fk_idx PRIMARY 8 opinionleaders.citydm.state_id 1 Using where
1 SIMPLE cntrydm eq_ref PRIMARY,country_id_UNIQUE PRIMARY 8 opinionleaders.statedm.country_id 1 Using index
I would reverse the query and start with STATE first, as that is what your criteria are based upon. You are not actually doing anything with the COUNTRY table except using the country ID, and that column also exists in the STATE table, so you can use STATE.country_id and remove the COUNTRY table from the join.
Additionally, I would have the following indexes (Table: Index):
State: (Country_ID), as that will be the basis of your WHERE criteria.
City: (State_ID, City_Name)
Location: (City_ID)
Agg_Mention: (LocationDM_ID, TopicDM_id)
By having "City_Name" as part of the index, the query doesn't have to go to the actual page data for it; since it is part of the index, it can be read directly from there.
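A minimal sketch of those index definitions (index names are illustrative; table and column names follow the query above):
-- Illustrative index names on the tables from the query
CREATE INDEX idx_state_country ON STATE (country_id);
CREATE INDEX idx_city_state_name ON CITY (state_id, city_name);
CREATE INDEX idx_loc_city ON LOCATIONDM (city_id);
CREATE INDEX idx_agg_loc_topic ON AGG_MENTION (locationdm_id, topicdm_id);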
Many times, including the "STRAIGHT_JOIN" keyword here helps: it tells the optimizer to run the query in the order stated, so it doesn't try to take one of the other tables as its primary basis for querying the data. If that doesn't perform well, you can try again without it.
SELECT STRAIGHT_JOIN
aggsm.topicdm_id AS topid,
citydm.city_name
FROM
STATE AS statedm
JOIN CITY AS citydm
ON statedm.state_id = citydm.state_id
JOIN LOCATIONDM AS locdm
ON citydm.city_id = locdm.city_id
join AGG_MENTION AS aggsm
ON locdm.locationdm_id = aggsm.locationdm_id
WHERE
statedm.country_id IN (1,2,3,4)
GROUP BY
aggsm.topicdm_id,
aggsm.locationdm_id
LIMIT 0,200000

MySQL join optimization for a big query

I have a big query like this, and I can't completely rebuild the application because of the customer:
SELECT count(AdvertLog.id) as count,AdvertLog.advert,AdvertLog.ut_fut_tstamp_dmy as day,
AdvertLog.operation,
Advert.allow_clicks,
Advert.slogan as name,
AdvertLog.log,
(User.tx_reality_credit
+-20
-(SELECT COUNT(advert_log.id) FROM advert_log WHERE ut_fut_tstamp_dmy <= day AND operation = 0 AND advert IN (168))
+(SELECT IF(ISNULL(SUM(log)),0,SUM(log)) FROM advert_log WHERE ut_fut_tstamp_dmy <= day AND operation IN (1, 2) AND advert = 40341 )) AS points
FROM `advert_log` AS AdvertLog
LEFT JOIN `tx_reality_advert` Advert ON Advert.uid = AdvertLog.advert
LEFT JOIN `fe_users` AS User ON (User.uid = Advert.user or User.uid = AdvertLog.advert)
WHERE User.uid = 40341 and AdvertLog.id>0
GROUP BY AdvertLog.ut_fut_tstamp_dmy, AdvertLog.advert
ORDER BY AdvertLog.ut_fut_tstamp_dmy_12 DESC,AdvertLog.operation,count DESC,name
LIMIT 0, 15
It takes 1.5s approximately which is too long.
Indexes:
User.uid
AdvertLog.advert
AdvertLog.operation
AdvertLog.ut_fut_tstamp_dmy
AdvertLog.id
Advert.user
AdvertLog.log
Output of Explain:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY User const PRIMARY PRIMARY 4 const 1 Using temporary; Using filesort
1 PRIMARY AdvertLog range PRIMARY,advert PRIMARY 4 NULL 21427 Using where
1 PRIMARY Advert eq_ref PRIMARY PRIMARY 4 etrend.AdvertLog.advert 1 Using where
3 DEPENDENT SUBQUERY advert_log ref ut_fut_tstamp_dmy,operation,advert advert 5 const 1 Using where
2 DEPENDENT SUBQUERY advert_log index_merge ut_fut_tstamp_dmy,operation,advert advert,operation 5,2 NULL 222 Using intersect(advert,operation); Using where
Can anyone help me? I have tried different things, but with no improvements.
The query is pretty large, and I'd expect this to take a fair bit of time, but you could try adding an index on Advert.uid, if it's not present. Other than that, someone with much better SQL-foo than I will have to answer this.
First, your WHERE clause is based on a specific User.uid, yet there is an index on advert_log by the advert (user ID) column. So, change the WHERE clause to reflect this...
Where
AdvertLog.advert = 40341
Then, remove the "LEFT JOIN" to just a "JOIN" to the user table.
Finally (without a full rewrite of the query), I would tack on the "STRAIGHT_JOIN" keyword...
select STRAIGHT_JOIN
... rest of query ...
Which tells the optimizer to perform the query in the order / relations explicitly stated.
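Putting the WHERE, JOIN and STRAIGHT_JOIN suggestions together, a minimal sketch of the revised outer query might look like this (the points subqueries, extra select columns and ORDER BY from the original are omitted for brevity; this is an illustration, not a drop-in replacement):
-- Illustration only: WHERE moved to AdvertLog.advert, plain JOIN to fe_users, STRAIGHT_JOIN added
SELECT STRAIGHT_JOIN
    COUNT(AdvertLog.id) AS count,
    AdvertLog.advert,
    AdvertLog.ut_fut_tstamp_dmy AS day,
    AdvertLog.operation
FROM `advert_log` AS AdvertLog
LEFT JOIN `tx_reality_advert` Advert ON Advert.uid = AdvertLog.advert
JOIN `fe_users` AS User ON (User.uid = Advert.user OR User.uid = AdvertLog.advert)
WHERE AdvertLog.advert = 40341
  AND AdvertLog.id > 0
GROUP BY AdvertLog.ut_fut_tstamp_dmy, AdvertLog.advert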
Another area to optimize would be to pre-query the "points" (the counts and logs based on advert and operation) once and pull the answer from that as a subquery, instead of running the two dependent subqueries, but I'd be interested to know first whether the WHERE, JOIN and STRAIGHT_JOIN changes above help.
Additionally, look at the join to the user table: it is based EITHER on Advert_Log.Advert (a user ID) or on TX_Reality_Credit.User (another user ID, which does not appear to be the same, since the join between Advert_Log and TX_Reality_Credit (TRC) is based on TRC.UID), unless that is an incorrect assumption. This could give erroneous results, as you are testing for MULTIPLE user IDs: the advert user, and whoever else is the "user" from the "TRC" table, which determines whose credit is applied to the "points" calculation.
To better understand the relationship and context, can you give some more clarification of what is IN these tables from the Advert_Log to TX_Reality_Credit perspective, and the Advert vs UID vs User...

MySQL: simple schema, joining in a view and sorting on unrelated attribute causes unbearable performance hit

I'm creating a database model for use by a diverse set of applications and different kinds of database servers (though I'm mostly testing on MySQL and SQLite now). It's a really simple model that basically consists of one central matches table and many attribute tables that have match_id as their primary key and one other field (the attribute value itself). In other words, every match has exactly one of every type of attribute, and every attribute is stored in a separate table. After experiencing rather bad performance while sorting and filtering on these attributes (FROM matches LEFT JOIN attributes_i_want ON the primary index), I decided to try to improve it. To this end I added an index on every attribute value column. Sorting and filtering performance increased a lot for easy queries.
This simple schema is basically a requirement for the application, so that it is able to auto-discover and use attributes. Thus, to create more complex attributes that are actually based on other results, I decided to use VIEWs that turn one or more other tables that don't necessarily match the attribute-like schema into an attribute schema. I call these meta-attributes (they aren't directly editable either). To the application this is all transparent, so it happily joins in the VIEW as well when it wants to. The problem: it kills performance. When the VIEW is joined in without sorting on any attribute, performance is still acceptable, but combining retrieval of the VIEW with sorting is unacceptably slow (on the order of 1s). Even after reading quite a few tutorials on indexing and some questions here on Stack Overflow, I can't seem to fix it.
_Prerequisites for a solution: in one way or another, num_duplicates must exist as a table or view with the columns match_id and num_duplicates to look like an attribute. I can't change the way attributes are discovered and used. So if I want to see num_duplicates appear in the application it'll have to be as some kind of view or materialized table that makes a num_duplicates table._
Relevant parts of the schema
Main table:
CREATE TABLE `matches` (
`match_id` int(11) NOT NULL,
`source_name` text,
`target_name` text,
`transformation` text,
PRIMARY KEY (`match_id`)
) ENGINE=InnoDB;
Example of a normal attribute (indexed):
CREATE TABLE `error` (
`match_id` int(11) NOT NULL,
`error` double DEFAULT NULL,
PRIMARY KEY (`match_id`),
KEY `error_index` (`error`)
) ENGINE=InnoDB;
(all normal attributes, like error, are basically the same)
Meta-attribute / VIEW:
CREATE VIEW num_duplicates
AS SELECT duplicate AS match_id, COUNT(duplicate) AS num_duplicates
FROM duplicate
GROUP BY duplicate
(this is the only meta-attribute I'm using right now)
Simple query with indexing on the attribute value columns (the part improved by indexes)
SELECT matches.match_id, source_name, target_name, transformation FROM matches
INNER JOIN error ON matches.match_id = error.match_id
ORDER BY error.error
(the performance on this query increased a lot because of the index on error)
(the runtime of this query is on the order of 0.0001 sec)
Slightly more complex queries and their runtimes including the meta-attribute (the still bad part)
SELECT
matches.match_id, source_name, target_name, transformation, STATUS , volume, error, COMMENT , num_duplicates
FROM matches
INNER JOIN STATUS ON matches.match_id = status.match_id
INNER JOIN error ON matches.match_id = error.match_id
LEFT JOIN num_duplicates ON matches.match_id = num_duplicates.match_id
INNER JOIN volume ON matches.match_id = volume.match_id
INNER JOIN COMMENT ON matches.match_id = comment.match_id
(runtime: 0.0263sec) <--- still acceptable
SELECT matches.match_id, source_name, target_name, transformation, STATUS , volume, error, COMMENT , num_duplicates
FROM matches
INNER JOIN STATUS ON matches.match_id = status.match_id
INNER JOIN error ON matches.match_id = error.match_id
LEFT JOIN num_duplicates ON matches.match_id = num_duplicates.match_id
INNER JOIN volume ON matches.match_id = volume.match_id
INNER JOIN COMMENT ON matches.match_id = comment.match_id
ORDER BY error.error
LIMIT 20, 20
(runtime: 0.8866 sec) <--- not acceptable (the query speed is exactly the same with the LIMIT as without it; note: if I could get the version with the LIMIT to be fast, that would already be a big win. I presume it has to scan the entire table, so the LIMIT doesn't matter much)
EXPLAIN of the last query
Of course I tried to solve it myself before coming here, but I must admit I'm not that good at these things and haven't found a way to remove the offending performance killer yet. I know it's most likely the filesort, but I don't know how to get rid of it.
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY error index PRIMARY,match_id error_index 9 NULL 53909 Using index; Using temporary; Using filesort
1 PRIMARY COMMENT eq_ref PRIMARY PRIMARY 4 tangbig4.error.match_id 1
1 PRIMARY STATUS eq_ref PRIMARY PRIMARY 4 tangbig4.COMMENT.match_id 1 Using where
1 PRIMARY matches eq_ref PRIMARY PRIMARY 4 tangbig4.COMMENT.match_id 1 Using where
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 2
1 PRIMARY volume eq_ref PRIMARY PRIMARY 4 tangbig4.matches.match_id 1 Using where
2 DERIVED duplicate index NULL duplicate_index 5 NULL 49222 Using index
By the way, the query without the sort, which still runs acceptably, is EXPLAIN'ed like this:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY COMMENT ALL PRIMARY NULL NULL NULL 49610
1 PRIMARY error eq_ref PRIMARY,match_id PRIMARY 4 tangbig4.COMMENT.match_id 1
1 PRIMARY matches eq_ref PRIMARY PRIMARY 4 tangbig4.COMMENT.match_id 1
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 2
1 PRIMARY STATUS eq_ref PRIMARY PRIMARY 4 tangbig4.COMMENT.match_id 1
1 PRIMARY volume eq_ref PRIMARY PRIMARY 4 tangbig4.matches.match_id 1 Using where
2 DERIVED duplicate index NULL duplicate_index 5 NULL 49222 Using index
Question
So, my question is if someone who know more about databases/MySQL is able to find me a way that I can use/research to increase the performance of my last query.
I've been thinking quite a lot about materialized views, but they are not natively supported in MySQL, and since I'm aiming at as wide a range of SQL servers as possible this might not be ideal. I'm hoping a change to the queries or views might help, or possibly an extra index.
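If a materialized approach did turn out to be acceptable, a minimal sketch of emulating the num_duplicates view as a plain table (the table name is illustrative, and the refresh would have to be triggered by the application or a scheduled event whenever the duplicate table changes):
-- Illustrative emulation of a materialized view; must be refreshed manually
CREATE TABLE num_duplicates_mat (
  match_id int(11) NOT NULL,
  num_duplicates int NOT NULL,
  PRIMARY KEY (match_id)
) ENGINE=InnoDB;
-- Refresh step: rebuild the aggregate from the duplicate table
REPLACE INTO num_duplicates_mat (match_id, num_duplicates)
SELECT duplicate, COUNT(duplicate) FROM duplicate GROUP BY duplicate;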
EDIT: Some random thoughts I've been having about the query:
VERY FAST: joining all tables, excluding the VIEW, sorting
ACCEPTABLE: joining all tables, including the VIEW, no sorting
DOG SLOW: joining all tables, including the VIEW, sorting
But: the VIEW has no influence at all on the sorting; none of its attributes, or even the attributes in its constituent tables, are used to sort. Why does including the sort impact performance that much, then? Is there any way I can convince the database to sort first and then just join up the VIEW? Or can I convince it that the VIEW is not important for sorting?
EDIT2: Following the suggestion by #ace to create a VIEW and then join, at first this didn't seem to help:
DROP VIEW IF EXISTS `matches_joined`;
CREATE VIEW `matches_joined` AS (
SELECT matches.match_id, source_name, target_name, transformation, STATUS , volume, error, COMMENT
FROM matches
INNER JOIN STATUS ON matches.match_id = status.match_id
INNER JOIN error ON matches.match_id = error.match_id
INNER JOIN volume ON matches.match_id = volume.match_id
INNER JOIN COMMENT ON matches.match_id = comment.match_id
ORDER BY error.error
);
followed by:
SELECT matches_joined.*, num_duplicates
FROM matches_joined
LEFT JOIN num_duplicates ON matches_joined.match_id = num_duplicates.match_id
However, using LIMIT on the view did make a difference:
DROP VIEW IF EXISTS `matches_joined`;
CREATE VIEW `matches_joined` AS (
SELECT matches.match_id, source_name, target_name, transformation, STATUS , volume, error, COMMENT
FROM matches
INNER JOIN STATUS ON matches.match_id = status.match_id
INNER JOIN error ON matches.match_id = error.match_id
INNER JOIN volume ON matches.match_id = volume.match_id
INNER JOIN COMMENT ON matches.match_id = comment.match_id
ORDER BY error.error
LIMIT 0, 20
);
Afterwards, the query ran at an acceptable speed. This is already a nice result. However, I feel that I'm jumping through hoops to force the database to do what I want, and the reduction in time is probably only caused by the fact that it now only has to sort 20 rows. What if I have more rows? Is there any other way to force the database to see that joining in the num_duplicates VIEW doesn't influence the sorting in the least? Could I perhaps change the query that makes up the VIEW a bit?
Some things that can be tested if you haven't tried them yet.
Create a view for all joins with sorting.
DROP VIEW IF EXISTS `matches_joined`;
CREATE VIEW `matches_joined` AS (
SELECT matches.match_id, source_name, target_name, transformation, STATUS , volume, error, COMMENT
FROM matches
INNER JOIN STATUS ON matches.match_id = status.match_id
INNER JOIN error ON matches.match_id = error.match_id
INNER JOIN volume ON matches.match_id = volume.match_id
INNER JOIN COMMENT ON matches.match_id = comment.match_id
ORDER BY error.error
);
Then join them with num_duplicates
SELECT matches_joined.*, num_duplicates
FROM matches_joined
LEFT JOIN num_duplicates ON matches_joined.match_id = num_duplicates.match_id
I'm assuming that, as pointed out here, this query will utilize the ORDER BY clause in the view matches_joined.
Some information that may help with optimization:
MySQL :: MySQL 5.0 Reference Manual :: 7.3.1.11 ORDER BY Optimization
The problem was more or less solved by the "VIEW" suggestion that #ace made, but several other types of queries still had performance issues (notably ones with a large OFFSET). In the end a large improvement on all queries of this form came from simply forcing late-row lookup. Note that it is commonly claimed that this is only necessary for MySQL, because MySQL always performs early-row lookup, and that other databases like PostgreSQL don't suffer from this problem. However, extensive benchmarks of my application have shown that PostgreSQL benefits greatly from this approach as well.
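For completeness, a minimal late-row-lookup sketch in the style of the queries above (which columns to fetch in the outer query is up to you; the idea is to page over only the ids first and join the wide rows back afterwards):
-- Inner query pages over narrow (id, sort key) rows only; outer query fetches the wide rows
SELECT m.match_id, m.source_name, m.target_name, m.transformation, e.error
FROM (
    SELECT matches.match_id
    FROM matches
    INNER JOIN error ON matches.match_id = error.match_id
    ORDER BY error.error
    LIMIT 20 OFFSET 20
) AS page
INNER JOIN matches m ON m.match_id = page.match_id
INNER JOIN error e ON e.match_id = page.match_id
ORDER BY e.error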