optimize mysql indexes

I have a slow (1.4 second) query that has been bugging me for a while, so I thought I'd post it and see if anyone can help me optimize my indexes to speed it up:
select sql_calc_found_rows t.id, q.im_id, concat(t.si_id, ' ', t.de), q.date, q.das, q.dac, u.name, q.ac, q.st
from t300q q
left join t300 t on t.id = q.con_id
left join users u on u.id = q.user_id
order by q.date desc limit 0,100
sql explain results:
SIMPLE q ALL 89126 Using filesort
SIMPLE t eq_ref PRIMARY PRIMARY 4 db.q.con_id 1
SIMPLE u eq_ref PRIMARY PRIMARY 4 db.q.user_id 1
session stats:
Handler_read_first = 0
Handler_read_key = 177934
Handler_read_next = 23
Handler_read_prev = 679
Handler_read_rnd = 15
Handler_read_rnd_next = 89127
and I have the following indexes:
t.id - primary key
(q.con_id, q.date, q.user_id) - all three form a single compound index
u.id - primary key
as you can see from the handler stats the size of table q is 89126 rows.
It's not a massive problem but I would like to get the speed down below 1 second for this query if possible.

The query is slow because you don't have an index on date. The compound index cannot be used because date is in the middle of it. Either move date to be the first column in the existing index, or create a standalone index on it.
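For example, a sketch of both options using the table and column names from the question (the existing compound index name here is made up):
CREATE INDEX idx_date ON t300q (`date`);
-- or, to put date first in the existing compound index:
ALTER TABLE t300q DROP INDEX idx_con_date_user,
                  ADD INDEX idx_date_con_user (`date`, con_id, user_id);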

BTW, MySQL uses only equality comparisons for the first two columns of a 3-column index; only the last column can be used for range queries.
Namely:
WHERE x=? AND y=? order by z;
will use an index of columns (x,y,z) (since z can be ranged).
Try moving 'date' to the 3rd column and rewriting the query.
If that doesn't work, then MySQL isn't being smart enough to handle con_id and user_id in the join. Perhaps you could rewrite the query so those join conditions happen in the WHERE clause, as in the sketch below.
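A sketch of that rewrite (note that moving the join conditions into the WHERE clause turns the LEFT JOINs into inner joins, which only returns the same rows if every q row has a matching t and u row):
select sql_calc_found_rows t.id, q.im_id, concat(t.si_id, ' ', t.de), q.date, q.das, q.dac, u.name, q.ac, q.st
from t300q q, t300 t, users u
where t.id = q.con_id and u.id = q.user_id
order by q.date desc limit 0,100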

Try running OPTIMIZE or ANALYZE on your tables, but make sure you do this at a time when there are only a few requests, or better yet none, hitting the server, to avoid any problems. You can find more information about these statements at these links:
http://dev.mysql.com/doc/refman/5.6/en/analyze-table.html
http://dev.mysql.com/doc/refman/4.1/en/optimize-table.html
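For example, for the tables in the question:
ANALYZE TABLE t300q, t300, users;
OPTIMIZE TABLE t300q;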

SQL query optimization on MySQL

I have a SQL query that is taking too much time to execute: around 620 seconds, which is over 10 minutes. How can I optimize it so it does not take so long?
| 190543 | root | localhost | ischolar | Query | 620 | Copying to tmp table
SELECT a.article_id, count(a.article_id) AS views
FROM timed_views_log a
INNER JOIN published_articles pa
ON (a.article_id = pa.article_id)
WHERE
a.date BETWEEN date_format(curdate() - interval 1 month,'%Y-%m-01 00:00:00') AND
date_format(last_day(curdate()-interval 1 month),'%Y-%m-%d 23:59:59')
GROUP BY a.article_id
ORDER BY
views desc
LIMIT 6, 5;
You may try adding indices which target the join and where conditions:
CREATE INDEX idx1 ON timed_views_log (date, article_id);
CREATE INDEX idx2 ON published_articles (article_id);
The first index, if used, should speed up the WHERE clause by allowing MySQL to use only the index to satisfy your filters on the date. The second index should allow MySQL to do the lookup for the join faster.
If you are using SQL Server, you can use the SQL Server query execution plan and the optimizations it suggests.
reference article - https://www.sqlshack.com/using-the-sql-execution-plan-for-query-performance-tuning/
Your query is a join with a WHERE clause, so most likely the data in the tables themselves is large; try adding an index.

Is there any performance difference between those two SQL queries?

I’m a new SQL learner and a newbie to StackOverflow. Hope I didn't miss anything important for a first-time post.
I got the following two queries from my instructor, who says they have different performance. But I can't see why they would differ in terms of logic or computational cost.
Query 1:
SELECT First_Name,
SUM(total_sales_amount) AS sub_total_sales_amount FROM
(
select A.First_Name, C.product_quantity * D.Retail_Price AS total_sales_amount
From join_demo.customer as A
inner join join_demo.customer_order as B on A.customer_id = B.customer_id
inner join join_demo.order_details C on B.order_id = C.order_id
inner join join_demo.product as D on C.product_id= D.product_id
) E
GROUP BY 1
ORDER BY sub_total_sales_amount DESC LIMIT 1;
Query 2 (I was told this one has better performance):
SELECT A.First_Name, SUM(C.product_quantity * D.Retail_Price) AS sub_total_sales_amount
From join_demo.customer as A
inner join join_demo.customer_order as B on A.customer_id = B.customer_id
inner join join_demo.order_details C on B.order_id = C.order_id
inner join join_demo.product as D on C.product_id= D.product_id GROUP BY 1
ORDER BY sub_total_sales_amount DESC LIMIT 1;
I’m running MySQL on my local Mac. But I suppose this one would be a general question regarding to SQL performance tuning.
Could someone please shed light on this question? Much appreciated!
Updated:
Thanks @Tim and @MatBailie. I added EXPLAIN before each query.
The results are exactly the same. I guess the two queries are at the same level of performance.
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE A NULL ALL NULL NULL NULL NULL 3 100 Using temporary; Using filesort
1 SIMPLE B NULL ALL NULL NULL NULL NULL 4 25 Using where; Using join buffer (hash join)
1 SIMPLE C NULL ALL NULL NULL NULL NULL 5 20 Using where; Using join buffer (hash join)
1 SIMPLE D NULL ALL NULL NULL NULL NULL 5 20 Using where; Using join buffer (hash join)
Old versions of MySQL used to automatically materialize derived tables (subqueries in the FROM clause). "Materialize" means that MySQL runs the subquery and saves the results in a temporary location (in this case, before doing the aggregation).
I think the optimizer was improved starting with version 5.7 (although the history may be wrong). Nowadays, MySQL is smarter about materialization and will generally merge a subquery with the outer query.
Hence, more recent versions of MySQL should produce the same execution plan. Of course, optimizers can be confused and the optimizer may decide to materialize the subquery, which would slow down the query under most circumstances.
You can read more about this in the documentation.
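As a quick way to observe the difference yourself (a sketch, assuming MySQL 5.7+, where the derived_merge flag of optimizer_switch controls this behavior):
SET SESSION optimizer_switch = 'derived_merge=off';  -- force materialization of the subquery
-- EXPLAIN the first query again; a derived-table row should now appear in the plan
SET SESSION optimizer_switch = 'derived_merge=on';   -- restore the default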
You should also learn to use meaningful table aliases, such as c for customers. And, qualify all column references so it is clear where the columns come from. Arbitrary letters are probably worse than no aliases at all (assuming the columns are all qualified).
The first query uses an explicit subquery to first generate an intermediate result containing, for each first name, every total amount. Then, it aggregates over name in the outer query to generate the sums you want. The second version does not use any such intermediate subquery, but instead directly aggregates on the joined tables. As a result, the first query may have extra overhead with regard to memory, and also performance, as MySQL has to aggregate over the intermediate table.
However, you should check the EXPLAIN plans of both queries to verify this. It is also possible that MySQL might be smart enough to execute the first query using the same plan as the second one.
Please provide SHOW CREATE TABLE.
It sounds like these indexes are missing:
B: INDEX(customer_id)
C: INDEX(order_id)
D: INDEX(product_id)
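Spelled out as DDL, using the table names from the question's FROM clause (the index names are made up):
ALTER TABLE join_demo.customer_order ADD INDEX idx_customer_id (customer_id);  -- alias B
ALTER TABLE join_demo.order_details ADD INDEX idx_order_id (order_id);         -- alias C
ALTER TABLE join_demo.product ADD INDEX idx_product_id (product_id);           -- alias D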

Large SQL database - solving efficiency

I have the following SQL query which, when I originally coded it, was exceptionally fast; it now takes over 1 second to complete:
SELECT counted/scount as ratio, [etc]
FROM
playlists
LEFT JOIN (
select AID, PLID FROM (SELECT AID, PLID FROM p_s ORDER BY `order` asc, PLSID desc)as g GROUP BY PLID
) as t USING(PLID)
INNER JOIN (
SELECT PLID, count(PLID) as scount from p_s LEFT JOIN audio USING(AID) WHERE removed='0' and verified='1' GROUP BY PLID
) as g USING(PLID)
LEFT JOIN (
select AID, count(AID) as counted FROM a_p_all WHERE ".time()." - playtime < 2678400 GROUP BY AID
) as r USING(AID)
LEFT JOIN audio USING (AID)
LEFT JOIN members USING (UID)
WHERE scount > 4 ORDER BY ratio desc
LIMIT 0, 20
I have identified the problem, the a_p_all table has over 500k rows. This is slowing down the query. I have come up with a solution:
Create a smaller temporary table, that only stores the data necessary, and deletes anything older than is needed.
However, is there a better method to use? Optimally I wouldn't need a temporary table; what do sites such as YouTube/Facebook do for large tables to keep query times fast?
edit
This is the EXPLAIN table for the query in the answer from @spencer7593
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived3> ALL NULL NULL NULL NULL 20
1 PRIMARY u eq_ref PRIMARY PRIMARY 8 q.AID 1 Using index
1 PRIMARY m eq_ref PRIMARY PRIMARY 8 q.UID 1 Using index
3 DERIVED <derived6> ALL NULL NULL NULL NULL 20
6 DERIVED t ALL NULL NULL NULL NULL 21
5 DEPENDENT SUBQUERY s ALL NULL NULL NULL NULL 49 Using where; Using filesort
4 DEPENDENT SUBQUERY c ALL NULL NULL NULL NULL 49 Using where
4 DEPENDENT SUBQUERY o eq_ref PRIMARY PRIMARY 8 database.c.AID 1 Using where
2 DEPENDENT SUBQUERY a ALL NULL NULL NULL NULL 510594 Using where
Two "big rock" issues stand out to me.
Firstly, this predicate
WHERE ".time()." - playtime < 2678400
(I'm assuming that this isn't the actual SQL being submitted to the database, but that what's being sent to the database is something like this...
WHERE 1409192073 - playtime < 2678400
such that we want only rows where playtime is within the past 31 days, i.e. within 31*24*60*60 seconds of the integer value returned by time().)
This predicate can't make use of a range scan operation on a suitable index on playtime. MySQL evaluates the expression on the left side for every row in the table (every row that isn't excluded by some other predicate), and the result of that expression is compared to the literal on the right.
To improve performance, rewrite the predicate so that the comparison is made on the bare column: compare the value stored in the playtime column to an expression that only needs to be evaluated once, for example:
WHERE playtime > 1409192073 - 2678400
With a suitable index available, MySQL can perform a "range" scan operation, and efficiently eliminate a boatload of rows that don't need to be evaluated.
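For example, a suitable index might look like this (a sketch; the index name is made up):
CREATE INDEX a_p_all_IX1 ON a_p_all (playtime);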
The second "big rock" is the inline views, or "derived tables" in MySQL parlance. MySQL is much different than other databases in how inline views are processed. MySQL actually runs that innermost query, and stores the result set as a temporary MyISAM table, and then the outer query runs against the MyISAM table. (The name that MySQL uses, "derived table", makes sense when we understand how MySQL processes the inline view.) Also, MySQL does not "push" predicates down, from an outer query down into the view queries. And on the derived table, there are no indexes created. (I believe MySQL 5.7 is changing that, and does sometimes create indexes, to improve performance.) But large "derived tables" can have a significant performance impact.
Also, the LIMIT clause gets applied last in the statement processing; that's after all the rows in the resultset are prepared and sorted. Even if you are returning only 20 rows, MySQL still prepares the entire resultset; it just doesn't transfer them to the client.
Lots of the column references are not qualified with the table name or alias, so we don't know, for example, which table (p_s or audio) contains the removed and verified columns.
(We know it can't be both, since MySQL isn't throwing an "ambiguous column" error. But MySQL has access to the table definitions, where we don't. MySQL also knows something about the cardinality of the columns: in particular, which columns (or combinations of columns) are UNIQUE, which columns can contain NULL values, etc.)
Best practice is to qualify ALL column references with the table name or (preferably) a table alias. (This makes it much easier on the human reading the SQL, and it also keeps a query from breaking when a new column is added to a table.)
Also, the query has a LIMIT clause, but there's no ORDER BY clause (or implied ORDER BY), which makes the resultset indeterminate. We have no guarantee of which rows will be the "first" ones returned.
EDIT
To return only 20 rows from playlists (out of thousands or more), I might try using correlated subqueries in the SELECT list; using a LIMIT clause in an inline view to winnow down the number of rows that I'd need to run the subqueries for. Correlated subqueries can eat your lunch (and your lunchbox too) in terms of performance with large sets, due to the number of times those need to be run.
From what I can gather, you are attempting to return 20 rows from playlists, picking up the related row from member (by the foreign key in playlists), finding the "first" song in the playlist; getting a count of times that "song" has been played in the past 31 days (from any playlist); getting the number of times a song appears on that playlist (as long as it's been verified and hasn't been removed... the outerness of that LEFT JOIN is negated by the predicates on the removed and verified columns, if either of those columns is from the audio table...).
I'd take a shot with something like this, to compare performance:
SELECT q.*
, ( SELECT COUNT(1)
FROM a_p_all a
WHERE a.playtime < 1409192073 - 2678400
AND a.AID = q.AID
) AS counted
FROM ( SELECT p.PLID
, p.UID
, p.[etc]
, ( SELECT COUNT(1)
FROM p_s c
JOIN audio o
ON o.AID = c.AID
AND o.removed='0'
AND o.verified='1'
WHERE c.PLID = p.PLID
) AS scount
, ( SELECT s.AID
FROM p_s s
WHERE s.PLID = p.PLID
ORDER BY s.order ASC, s.PLSID DESC
LIMIT 1
) AS AID
FROM ( SELECT t.PLID
, t.[etc]
FROM playlists t
ORDER BY NULL
LIMIT 20
) p
) q
LEFT JOIN audio u ON u.AID = q.AID
LEFT JOIN members m ON m.UID = q.UID
LIMIT 0, 20
UPDATE
Dude, the EXPLAIN output is showing that you don't have suitable indexes available. To get any decent chance at performance with the correlated subqueries, you're going to want to add some indexes, e.g.
... ON a_p_all (AID, playtime)
... ON p_s (PLID, order, PLSID, AID)
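Spelled out as full statements (the index names are made up; the unqualified order column needs backticks because ORDER is a reserved word):
CREATE INDEX a_p_all_IX2 ON a_p_all (AID, playtime);
CREATE INDEX p_s_IX1 ON p_s (PLID, `order`, PLSID, AID);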

Complex MySQL Select Left Join Optimization Indexing

I have a very complex query that is running and finding locations of members joining the subscription details and sorting by distance.
Can someone provide instruction on the correct indexes and cardinality I should add to make this load faster.
Right now on 1 million records it takes 75 seconds and I know it can be improved.
Thank you.
SELECT SQL_CALC_FOUND_ROWS
  (((acos(sin((33.987541*pi()/180)) * sin((users_data.lat*pi()/180))
    + cos((33.987541*pi()/180)) * cos((users_data.lat*pi()/180))
    * cos(((-118.472153 - users_data.lon)*pi()/180))))*180/pi())*60*1.1515) as distance,
  subscription_types.location_limit as location_limit,
  users_data.user_id, users_data.last_name, users_data.filename, users_data.user_id,
  users_data.phone_number, users_data.city, users_data.state_code, users_data.zip_code,
  users_data.country_code, users_data.quote, users_data.subscription_id, users_data.company,
  users_data.position, users_data.profession_id, users_data.experience, users_data.account_type,
  users_data.verified, users_data.nationwide,
  IF(listing_type = 'Company', company, last_name) as name
FROM `users_data`
LEFT JOIN `users_reviews` ON users_data.user_id=users_reviews.user_id AND users_reviews.review_status='2'
LEFT JOIN users_locations ON users_locations.user_id=users_data.user_id
LEFT JOIN subscription_types ON users_data.subscription_id=subscription_types.subscription_id
WHERE users_data.active='2'
AND subscription_types.searchable='1'
AND users_data.state_code='CA'
AND users_data.country_code='US'
GROUP BY users_data.user_id
HAVING distance <= '50'
OR location_limit='all'
OR users_data.nationwide='1'
ORDER BY subscription_types.search_priority ASC, distance ASC
LIMIT 0,10
EXPLAIN
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE users_reviews system user_id,review_status NULL NULL NULL 0 const row not found
1 SIMPLE users_locations system user_id NULL NULL NULL 0 const row not found
1 SIMPLE users_data ref subscription_id,active,state_code,country_code state_code 47 const 88241 Using where; Using temporary; Using filesort
1 SIMPLE subscription_types ALL PRIMARY,searchable NULL NULL NULL 4 Using where; Using join buffer
Your query is not that complex. You have only one join, on subscription_types, which is certainly a small table with no more than a few hundred rows.
Where are your indexes? The best way to improve your query is to create indexes on the fields you are filtering on, like active, country_code, state_code and searchable.
Have you created the foreign key on users_data.subscription_id? You need an index on that too.
FORCE INDEX is useless; let the RDBMS determine the best indexes to choose.
The LEFT JOIN is useless too, because the condition subscription_types.searchable='1' filters out the unmatched rows anyway.
The ORDER BY on search_priority implies that you need an index on that column too.
The filtering in the HAVING clause can prevent the indexes from being used. You don't need to put these filters in the HAVING; if I understand your table schema, it is not really the aggregate that is being filtered.
Your table contains 1 million rows, but how many rows are returned without the LIMIT? With the right indexes, the query should execute in under a second.
SELECT ...
FROM `users_data`
INNER JOIN subscription_types
ON users_data.subscription_id = subscription_types.subscription_id
WHERE users_data.active='2'
AND users_data.country_code='US'
AND users_data.state_code='NY'
AND subscription_types.searchable='1'
AND (distance <= '50' OR location_limit='all' OR users_data.nationwide='1')
GROUP BY users_data.user_id
ORDER BY subscription_types.search_priority ASC, distance ASC
LIMIT 0,10
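A sketch of the indexes that advice implies, with the equality-filtered columns leading (the index names are made up):
ALTER TABLE users_data ADD INDEX idx_geo_active (country_code, state_code, active, subscription_id);
ALTER TABLE subscription_types ADD INDEX idx_searchable (searchable, search_priority);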

MySQL performance, inner join, how to avoid Using temporary and filesort

I have table 1 and table 2.
Table 1
PARTNUM - ID_BRAND
partnum is the primary key
id_brand is "indexed"
Table 2
ID_BRAND - BRAND_NAME
id_brand is the primary key
brand_name is "indexed"
Table 1 contains 1 million records and table 2 contains 1,000 records.
I'm trying to optimize some queries using EXPLAIN, and after a lot of tries I have reached a dead end.
EXPLAIN
SELECT pm.partnum, pb.brand_name
FROM products_main AS pm
LEFT JOIN products_brands AS pb ON pm.id_brand=pb.id_brand
ORDER BY pb.brand ASC
LIMIT 0, 10
The query returns this execution plan:
ID, SELECT_TYPE, TABLE, TYPE, POSSIBLE_KEYS, KEY, KEY_LEN , REF, ROWS, EXTRA
1, SIMPLE, pm, range, PRIMARY, PRIMARY, 1, , 1000000, Using where; Using temporary; Using filesort
1, SIMPLE, pb, ref, PRIMARY, PRIMARY, 4, demo.pm.id_pbrand, 1,
The MySQL query optimizer shows a temporary + filesort in the execution plan.
How can I avoid this?
The "EVIL" is in the ORDER BY pb.brand ASC. Ordering by that external field seems to be the bottleneck..
First of all, I question the use of an outer join, seeing as the ORDER BY operates on the right-hand side, and the NULLs injected by the LEFT JOIN are likely to play havoc with it.
Regardless, the simplest approach to speeding up this query would be a covering index on pb.id_brand and pb.brand. This will allow the order by to be evaluated 'using index' with the join condition. The alternative is to find some way to reduce the size of the intermediate result passed to the order-by.
Still, the combination of outer-join, order-by, and limit, leaves me wondering what exactly you are querying for, and if there might not be a better way of expressing the query itself.
Try replacing the join with a subquery. MySQL's optimizer kind of sucks; subqueries often give better performance than joins.
First, try changing your index on the products_brands table. Delete the existing one on brand_name, and create a new one:
ALTER TABLE products_brands ADD INDEX newIdx (brand_name, id_brand)
Then, the table will already have an "orderedByBrandName" index with the ids you need for the join, and you can try:
EXPLAIN
SELECT pb.brand_name, pm.partnum
FROM products_brands AS pb
LEFT JOIN products_main AS pm ON pb.id_brand = pm.id_brand
LIMIT 0, 10
Note that I also changed the order of the tables in the query, so you start with the small one.
This question is somewhat outdated, but I did find it, and so will other people.
MySQL uses a temporary table if the ORDER BY or GROUP BY contains columns from tables other than the first table in the join queue.
So you just need to reverse the join order using STRAIGHT_JOIN, to bypass the order chosen by the optimizer:
SELECT STRAIGHT_JOIN pm.partnum, pb.brand_name
FROM products_brands AS pb
RIGHT JOIN products_main AS pm ON pm.id_brand=pb.id_brand
ORDER BY pb.brand ASC
LIMIT 0, 10
Also make sure that max_heap_table_size AND tmp_table_size variables are set to a number big enough to store the results:
SET global tmp_table_size=100000000;
SET global max_heap_table_size=100000000;
-- 100 megabytes in this example. These can be set in my.cnf config file, too.
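For reference, a sketch of the equivalent my.cnf settings:
[mysqld]
# roughly the same 100 MB as the SET statements above
tmp_table_size = 100M
max_heap_table_size = 100M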