Improving MySQL query?

My attempt was to join the customer and order tables, and to join the lineitem and order tables. I have also indexed the c_mktsegment field. My resultant query is this. Is there anything that I can do to improve it?
select
    o_shippriority,
    l_orderkey,
    o_orderdate,
    sum(l_extendedprice * (1 - l_discount)) as revenue
from
    cust as c
    join ord as o on c.c_custkey = o.o_custkey
    join line as l on o.o_orderkey = l.l_orderkey
where
    c_mktsegment = ':1'
    and o_orderdate < date ':2'
    and l_shipdate > date ':2'
group by
    l_orderkey,
    o_orderdate,
    o_shippriority
order by
    revenue desc,
    o_orderdate;

I don't see anything obviously wrong with this query. For good performance, you probably should have indexes on orders.o_custkey and lineitem.l_orderkey. The index on c_mktsegment will let the DB find customer records quickly, but from there you need to be able to find order and lineitem records.
You should run EXPLAIN to see how the DB is processing the query. This depends on many factors, including the number of records in each table and the distribution of keys, so I can't say what the plan is just by looking at the query. But if you run EXPLAIN and see that it is doing a full table scan, you should add an index to prevent that. That's pretty much rule #1 for query optimization.
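For example, a minimal sketch (the index names here are illustrative, not from the question):

ALTER TABLE ord ADD INDEX idx_ord_custkey (o_custkey);
ALTER TABLE line ADD INDEX idx_line_orderkey (l_orderkey);

EXPLAIN
SELECT ...your query here... ;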

Related

Mysql Select INNER JOIN with order by very slow

I'm trying to speed up a MySQL query. The listings table has several million rows. If I don't sort, I get the result in 0.1 seconds, but once I sort, it takes 7 seconds. What can I improve to speed up the query?
SELECT l.*
FROM listings l
INNER JOIN listings_categories lc
ON l.id=lc.list_id
AND lc.cat_id='2058'
INNER JOIN locations loc
ON l.location_id=loc.id
WHERE l.location_id
IN (7841,7842,7843,7844,7845,7846,7847,7848,7849,7850,7851,7852,7853,7854,7855,7856,7857,7858,7859,7860,7861,7862,7863,7864,7865,7866,7867,7868,7869,7870,7871,7872,7873,7874,7875,7876,7877,7878,7879,7880,7881,7882,7883,7884,7885,7886,7887,7888,7889,7890,7891,7892,7893,7894,7895,7896,7897,7898,7899,7900,7901,7902,7903)
ORDER BY date DESC
LIMIT 0,10;
EXPLAIN SELECT: Using Index l=date, loc=primary, lc=primary
Such performance questions are really difficult to answer and depend on the setup, indexes, etc. So there will likely not be one single solution, and not even clearly correct or incorrect attempts to improve the speed; it is a lot of trial and error. Anyway, some points I noted which often cause performance issues are:
Avoid conditions within joins that belong in the WHERE clause instead. A join should only contain the columns that are joined on, no further conditions. So the lc.cat_id='2058' should be moved to the WHERE clause.
Using IN is often slow. You could try to replace it with OR (l.location_id = 7841 OR l.location_id = 7842 OR ...).
Open the query execution plan and check whether there is something useful for you.
Try to find out if there are special cases/values within the affected columns which slow down your query.
Change "ORDER BY date" to "ORDER BY tablealias.date" and check if this makes a difference in performance. Even if it doesn't, it is easier to read.
If you can rename the column "date", do so, because using SQL keywords as table or column names is not a good idea. I'm unsure whether this influences performance, but it should be avoided if possible. (Both tips are sketched below.)
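As a hedged sketch of those last two tips (the replacement column name and the column definition are assumptions; repeat the column's real definition in the CHANGE statement):

SELECT l.* FROM listings l ... ORDER BY l.`date` DESC LIMIT 0,10;

ALTER TABLE listings CHANGE `date` listed_at DATETIME NOT NULL; -- 'listed_at' is a hypothetical new name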
Good luck!
You can try additional indexes to speed up the query, but you'll have a tradeoff when creating/manipulating data.
These combined keys could speed up the query:
listings: date, location_id
listings_categories: cat_id, list_id
Since the plan says it uses the date index, there would be no need to read the record to check the location_id when using the new index, and the same goes for the join with listings_categories: an index read would be enough.
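In MySQL DDL, those combined keys could look like this (the index names are illustrative):

ALTER TABLE listings ADD INDEX idx_date_loc (`date`, location_id);
ALTER TABLE listings_categories ADD INDEX idx_cat_list (cat_id, list_id);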
l: INDEX(location_id, id)
lc: INDEX(cat_id, list_id)
If those don't suffice, try the following rewrite.
SELECT l2.*
FROM
(
SELECT l1.id
FROM listings AS l1
JOIN listings_categories AS lc ON lc.list_id = l1.id
JOIN locations AS loc ON loc.id = l1.location_id
WHERE lc.cat_id='2058'
AND l1.location_id IN (7841, ..., 7903)
ORDER BY l1.date DESC
LIMIT 0,10
) AS x
JOIN listings l2 ON l2.id = x.id
ORDER BY l2.date DESC;
With
listings: INDEX(location_id, date, id)
listings_categories: INDEX(cat_id, list_id)
The idea here is to get the 10 ids from the index before touching the table itself. Your version is probably shoveling around the whole table before sorting, and only then delivering the 10.

MySQL: long running LEFT JOIN query performance

A MySQL database contains two tables: customer and customer_orders.
The customer table contains 80 million entries and has 80 fields. The ones I am interested in:
Id (PK, int(10))
Location (varchar 255, nullable).
Registration_Date (DateTime, nullable). Indexed.
The customer_orders table contains 40 million entries and only 3 fields:
Id (PK, int(10))
Customer_Id (int(10), FK to customer table)
Order_Date (DateTime, nullable)
When I run the following query, it takes ~800 seconds to execute and returns 40 million entries:
SELECT o.*
FROM customer_orders o
LEFT JOIN customer c ON (c.Id = o.Customer_Id)
WHERE NOT (ISNULL(c.Location)) AND c.Registration_Date < '2018-01-01 00:00:00';
Machine with MySQL server has 32GB of RAM, 28GB assigned to MySQL.
MySQL version: 5.6.39.
Is it normal for MySQL to take this long for such a query on tables with this many records?
How can I improve the performance?
Update:
The customer_orders table does not contain any vital data we would like to keep. It is a kind of copy table holding the orders made within the last 10 days.
Every day we run a stored procedure which deletes orders older than 10 days within a transaction.
At some point, this stored procedure started hitting a timeout because of an unoptimized query, and the number of orders grew every day.
The previous query also contained a COUNT, which, I suppose, is what exceeded the timeout.
Nevertheless, it surprised me that it can take up to 15 minutes for MySQL to fetch 40M records with additional conditions.
I think it's normal. It would be helpful if you shared what EXPLAIN returns for that query.
In order to optimize the query, it might not be a good idea to start with customer_orders, as you are not filtering it in any way (so it performs a full table scan over 40M records). Also, as pointed out in the comments, a LEFT JOIN is not needed here.
I would write your query like this:
SELECT o.*
FROM customer c, customer_orders o
WHERE c.id = o.Customer_Id
AND c.Location IS NOT NULL
AND c.Registration_Date < '2018-01-01'
This will (depending on how many records satisfy the clause Registration_Date < '2018-01-01') filter the customer table first and then join with the customer_orders table, which has an index on Customer_Id.
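A hedged sketch of an index that could support that filter-first plan (the index name and the prefix length on Location are assumptions; InnoDB appends the primary key Id to secondary indexes automatically, so the join key comes along for free):

-- Location is VARCHAR(255), so a prefix may be needed depending on charset/row format.
ALTER TABLE customer ADD INDEX idx_reg_loc (Registration_Date, Location(50));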
Also, maybe not related, but is it normal for you that the query returns 40M records? That is basically the whole customer_orders table. If I am right, that means all orders are from customers registered before '2018-01-01'.
This is too long for a comment...
The first thing to note about your query is that it is not actually performing a LEFT JOIN, since it has conditions in the WHERE clause that refer to the LEFT JOINed table.
It could be rewritten as:
SELECT o.*
FROM customer_orders o
INNER JOIN customer c
ON c.Id = o.Customer_Id
AND c.Location is NOT NULL
AND c.Registration_Date < '2018-01-01 00:00:00';
Being explicit about the join type is better for readability and may help MySQL to find a better execution path for the query.
When it comes to performance, the basic advice is that, for this query, you would need a compound index on all three columns being searched, in the same order as they are used in the query (usually, you want to put the most restrictive condition first, so you might want to adjust this):
ALTER TABLE customer ADD INDEX (Id, Location, Registration_Date);
For more advice on performance, you might want to update your question with the CREATE TABLE statements of your tables and the execution plan of your query.
If my comment and GMB's answer don't end up helping performance much, you can always try writing the query with a different approach. I usually prefer joins to subqueries, but occasionally a subquery turns out to be the best option for the data being handled.
Since you've said the customers table is relatively large compared to the orders table, this could be one of those situations.
SELECT o.*
FROM customer_orders AS o
WHERE o.Customer_Id IN (
SELECT Id
FROM customer
WHERE Location IS NOT NULL
AND Registration_Date < '2018-01-01 00:00:00'
);
I wanted to post a comment, but changed my mind and went with an answer, because the main issue is the question itself.
I don't know how many columns your customer_orders has, but if you are getting
40 million entries
back, I would say you are doing something wrong.
And it is probably not the query itself that is slow, but the data fetching.
To prove that, try to execute EXPLAIN against your query:
EXPLAIN SELECT ...your query here... ;
Then execute
EXPLAIN SELECT ...your query here... LIMIT 1;
Try to LIMIT your results to 1000 for example:
SELECT ...your query here... LIMIT 1000;
When you have the answers, outputs, and stats for these queries, we can discuss your next steps.

Optimize MySQL Index for Multiple Table JOIN

I have 5 tables in MySQL, and the query below takes too long to execute.
Here is the structure of my tables:
Reciept (23,799,640 rows)
reciept_goods (39,398,989 rows)
good (17,514 rows)
good_categories (121 rows)
retail_category (10 rows)
(Table-structure screenshots omitted.)
My indexes:
Date --> reciept.date (#1)
reciept_goods_index --> reciept_goods.recieptId (#1), reciept_goods.shopId (#2), reciept_goods.goodId (#3)
category_id --> good.category_id (#1)
I have the following SQL query:
SELECT
R.shopId,
sales,
sum(Amount) as sum_amount,
count(distinct R.id) as count_reciept,
RC.id,
RC.name
FROM
reciept R
JOIN reciept_goods RG
ON R.id = RG.RecieptId
AND R.ShopID = RG.ShopId
JOIN good G
ON RG.GoodId = G.id
JOIN good_categories GC
ON G.category_id = GC.id
JOIN retail_category RC
ON GC.retail_category_id = RC.id
WHERE
R.date >= '2018-01-01 10:00:00'
GROUP BY
R.shopId,
R.sales,
RC.id
EXPLAIN on this query gives the following result (plan screenshot omitted), with an execution time of 236 sec. If I use straight_join good ON (good.id = reciept_goods.GoodId), the plan changes (screenshot omitted) and the execution time drops to 31 sec:
SELECT STRAIGHT_JOIN ... rest of query
I think the problem is in the indexes of my tables, but I don't understand how to fix them. Can someone help me?
With about 2% of your rows in reciept having the matching date, the second execution plan chosen (with straight_join) seems to be the right execution order. You should be able to optimize it by adding the following covering indexes:
reciept(date, sales)
reciept_goods(recieptId, shopId, goodId, amount)
I assume that the column order in your primary key for reciept_goods is currently (goodId, recieptId, shopId) (or (goodId, shopId, recieptId)). You could change that to (recieptId, shopId, goodId) (and if you look at e.g. the table name, you may have wanted to do this anyway); in that case, you do not need the second index (at least for this query). I would assume that this primary key made MySQL take the slower execution plan (of course assuming that it would otherwise be faster), although sometimes it's just bad statistics, especially on a test server.
With those covering indexes, MySQL should take the faster plan even without straight_join; if it doesn't, just add it again (although I would like to see both execution plans then). Also check that those two new indexes are used in the explain plan; otherwise I may have missed a column.
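Written out as DDL, those covering indexes might look like this (the index names are illustrative; InnoDB appends the primary key to secondary indexes on its own):

ALTER TABLE reciept ADD INDEX idx_date_sales (`date`, sales);
ALTER TABLE reciept_goods ADD INDEX idx_rec_shop_good (recieptId, shopId, goodId, amount);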
It looks like you are depending on walking through a couple of many:many tables? Many people design them inefficiently.
Here I have compiled a list of 7 tips on making mapping tables more efficient. The most important is use of composite indexes.
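A hedged sketch of what such a mapping table can look like with composite indexes in both directions (all names here are hypothetical):

CREATE TABLE reciept_goods_map (
  reciept_id INT UNSIGNED NOT NULL,
  good_id SMALLINT UNSIGNED NOT NULL,
  PRIMARY KEY (reciept_id, good_id), -- covers reciept -> goods lookups
  INDEX (good_id, reciept_id)        -- covers good -> reciepts lookups
) ENGINE=InnoDB;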

Improve JOIN query speed

I have this simple join that works but is HORRIBLY slow, I think because the tech table is very large. There are many instances of each uid, since the table tracks a timestamp per uid, thus the DISTINCT. What is the best way to speed this query up?
SELECT DISTINCT tech.uid,
listing.empno,
listing.firstname,
listing.lastname
FROM tech,
listing
WHERE tech.uid = listing.empno
ORDER BY listing.empno ASC
First, add an index on tech.uid and listing.EmpNo in their respective tables.
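For example (the index names are illustrative):

ALTER TABLE tech ADD INDEX idx_uid (uid);
ALTER TABLE listing ADD INDEX idx_empno (EmpNo);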
After you are sure there are indexes you can try to re-write your query like this:
SELECT DISTINCT tech.uid, listing.EmpNo, listing.FirstName, listing.LastName
FROM listing INNER JOIN tech ON tech.uid = listing.EmpNo
ORDER BY listing.EmpNo ASC;
If it's still not fast enough, put the word EXPLAIN before the query to get some hints about the execution plan of the query.
EXPLAIN SELECT DISTINCT tech.uid, listing.EmpNo, listing.FirstName, listing.LastName
FROM listing INNER JOIN tech ON tech.uid = listing.EmpNo
ORDER BY listing.EmpNo ASC;
Post the EXPLAIN results so we can get better insight.
Hope it helps,
This is a very simple query. The only thing you can do in SQL is add indexes on the fields used in the JOIN/WHERE and ORDER BY clauses (tech.uid, listing.empno), if there are none yet.
If the JOIN fields contain NULL values, they may ruin your performance. You should filter them out in the WHERE clause (WHERE tech.uid IS NOT NULL AND listing.empno IS NOT NULL). If there are many rows with a NULL JOIN field, that data may produce a Cartesian-style result containing an enormous number of rows.
You may also change the MySQL configuration. There are many options useful for performance tuning, like key_buffer_size, sort_buffer_size, tmp_table_size, max_heap_table_size, read_buffer_size, etc.
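As a hedged sketch, many of these can be changed at runtime; the sizes below are placeholders, not recommendations, and must be tuned to your RAM and workload:

SET GLOBAL sort_buffer_size = 4 * 1024 * 1024;      -- allocated per connection
SET GLOBAL tmp_table_size = 64 * 1024 * 1024;
SET GLOBAL max_heap_table_size = 64 * 1024 * 1024;  -- in-memory temp tables use the smaller of these two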

Why does the query take a long time in mysql even with a LIMIT clause?

Say I have an Order table that has 100+ columns and 1 million rows. It has a PK on OrderID and FK constraint StoreID --> Store.StoreID.
1) select * from `Order` order by OrderID desc limit 10;
the above takes a few milliseconds.
2) select * from `Order` o join `Store` s on s.StoreID = o.StoreID order by OrderID desc limit 10;
this somehow can take many seconds. The more inner joins I add, the slower it gets.
3) select OrderID, column1 from `Order` o join `Store` s on s.StoreID = o.StoreID order by OrderID desc limit 10;
this seems to speed the execution up, by limiting the columns we select.
There are a few points that I don't understand here, and I would really appreciate it if anyone more knowledgeable with MySQL (or RDBMS query execution in general) could enlighten me.
Query 1 is fast since it's just a reverse lookup by PK, and the DB only needs to return the first 10 rows it encounters.
I don't see why Query 2 should take forever. Shouldn't the operation be the same, i.e. get the first 10 rows by PK and then join with the other tables? Since there's an FK constraint, it is guaranteed that the relationship is satisfied, so the DB doesn't need to join more rows than necessary and then trim the result, right? Unless the FK constraint allows a null FK? In that case, I guess a left join would make this much faster than an inner join?
Lastly, I guess Query 3 is simply faster because fewer columns are used in those unnecessary joins? But why would the query execution need the other columns while joining? Shouldn't it just join using PKs first, and then fetch the columns for just the 10 rows?
Thanks!
My understanding is that the MySQL engine applies LIMIT after any joins happen.
From http://dev.mysql.com/doc/refman/5.0/en/select.html, The HAVING clause is applied nearly last, just before items are sent to the client, with no optimization. (LIMIT is applied after HAVING.)
EDIT: You could try using this query to take advantage of the PK speed.
select * from (select * from `Order` order by OrderID desc limit 10) o
join `Store` s on s.StoreID = o.StoreID;
All of your examples are asking for table scans of the existing tables, so none of them will be more or less performant except to the degree that MySQL can cache the data or results. Some of your queries have ORDER BY or join criteria, which can take advantage of indexes purely to make the joining process more efficient; however, that is still not the same as having a set of criteria that will trigger the use of indexes.
LIMIT is not a criterion; it can be thought of as filtration once a result set is determined. You save time on the client, once the result set is prepared, but not on the server.
Really, the only way to get the answers you are seeking is to become familiar with:
EXPLAIN EXTENDED your_sql_statement
The output of EXPLAIN will show you how many rows are being looked at by mysql, as well as whether or not any indexes are being used.