I have this query:
select * from
When I execute it, it takes ~45sec with 35k records. Every day I add 5k+ new records to gps_unit_location table. So table will grow.
My current indexes on all id's. Will adding any additional indexes would help me to improve the performance of this query?
thanks.
So,
be sure you have NOT NULL columns and indices on:
INDEX ON gps_unit_location.idgps_unit_location
INDEX ON user.iduser
INDEX ON user_to_gps_unit.iduser
INDEX ON user_to_gps_unit.idgps_unit
INDEX ON gps_unit.idgps_unit
INDEX ON gps_unit_location.idgps_unit
be sure you really need to select all the fields with that star *
try this query:
SELECT
`gps_unit`.`idgps_unit`,
`gps_unit`.`name as name`,
`gps_unit`.`notes as notes`,
`gps_unit`.`serial`,
`gps_unit_location`.`dt` as dt,
`gps_unit_location`.`idgps_unit_location`,
`gps_unit_location`.`lat`,
`gps_unit_location`.`long`,
`ip`,
`unique_id`,
`loc_age`,
`reason_code`,
`speed_kmh`,
`VehHdg`,
`Odometer`,
`event_time_gmt_unix`,
`switches`,
`engine_on_off`
FROM user
INNER JOIN user_to_gps_unit ON user.iduser = user_to_gps_unit.iduser
INNER JOIN gps_unit ON user_to_gps_unit.idgps_unit = gps_unit.idgps_unit
INNER JOIN gps_unit_location ON gps_unit.idgps_unit = gps_unit_location.idgps_unit
INNER JOIN
(SELECT
`gps_unit_location`.`idgps_unit`,
MAX(`gps_unit_location`.`dt`) dtmax
FROM `gps_unit_location`
GROUP BY 1
) r1 ON r1.idgps_unit = gps_unit_location.idgps_unit AND r1.dtmax = gps_unit_location.dt
WHERE
user.iduser = 14
On a side note, I think you don't need the unique indexes on the columns that are defined as primary keys, this causes write overhead on insert/update statements.
The generic answer is to index those columns that are used to join and constrain (ON and WHERE clauses). Use composite indexes (joins first, then constraints next with the lowest cardinality constraints first).
Oh, and make all your IDs 'unsigned'.
Related
how to avoid full table scan when joining on one foreign in key below is my sql query when i use explain select it show the query is scanning all the table even with a where clause
SELECT message_recipients.id, message_recipients.user_type,
COALESCE(guardians.firstname, students.firstname)
FROM message_recipients
LEFT JOIN students ON message_recipients.user_id = students.student_id
LEFT JOIN guardians ON message_recipients.user_id = guardians.guardian_id
WHERE message_recipients.message_id = 2
Also i added index on the message_id column still the same here below is the image of the explain select may be am reading it wrong
the total rows in the table is 8 but the message_id = 2 is just 6 rows and if you check the image you can see its scanning all the 8 rows which is not suppose to be the big question is how do i optimize this to avoid full table scan thanks
One way to avoid a full scan is to use clustered queries with a "with" clause like this:
WITH TB_A AS (
SELEC
A.*
FROM
some_table A
WHERE
A.field_1 = 'some condition to filter first table'
)
, TB_B AS (
SELECT
B.*
FROM
other_table B
WHERE
B.field_2 = 'some condition to filter second table if you need that too'
)
SELECT
A.*
, B.*
FROM
TB_A A
INNER JOIN
TB_B B
ON A.field_1 = B.field_1 --Running without full scan on join and much faster
;
;D
A FOREIGN KEY creates an INDEX, but that INDEX may not be optimal. See if these 'composit' indexes are better:
message_recipients: INDEX(message_id, user_id, id, user_type)
guardians: INDEX(guardian_id, firstname)
students: INDEX(student_id, firstname)
When adding a composite index, DROP index(es) with the same leading columns.
That is, when you have both INDEX(a) and INDEX(a,b), toss the former.
Query-
SELECT SUM(sale_data.total_sale) as totalsale, `sale_data_temp`.`customer_type_cy` as `customer_type`, `distributor_list`.`customer_status` FROM `distributor_list` LEFT JOIN `sale_data` ON `sale_data`.`depo_code` = `distributor_list`.`depo_code` and `sale_data`.`customer_code` = `distributor_list`.`customer_code` LEFT JOIN `sale_data_temp` ON `distributor_list`.`address_coordinates` = `sale_data_temp`.`address_coordinates` LEFT JOIN `item_master` ON `sale_data`.`item_code` = `item_master`.`item_code` WHERE `invoice_date` BETWEEN "2017-04-01" and "2017-11-01" AND `item_master`.`id_category` = 1 GROUP BY `distributor_list`.`address_coordinates`
Query, rewritten with formatting.
SELECT SUM(sale_data.total_sale) as totalsale,
sale_data_temp.customer_type_cy as customer_type,
distributor_list.customer_status
FROM distributor_list
LEFT JOIN sale_data
ON sale_data.depo_code = distributor_list.depo_code
and sale_data.customer_code = distributor_list.customer_code
LEFT JOIN sale_data_temp
ON distributor_list.address_coordinates = sale_data_temp.address_coordinates
LEFT JOIN item_master
ON sale_data.item_code = item_master.item_code
WHERE invoice_date BETWEEN "2017-04-01" and "2017-11-01"
AND item_master.id_category = 1
GROUP BY distributor_list.address_coordinates
DESC-
This Query is taking 7.5 seconds to run. My application contains 3-4 such queries. Therefore loading time appraches 1 min on server.
My sale data table contains 450K records.
Distributor list contains 970 records
Item master contains 7774 records and sale_data_temp contains 324 records.
I am using indexing but it is not being used for sale data table.
All the 400K records are searched as is evident from explain sql.
If I reduce the duration of BETWEEN clause than sale data table uses date index otherwise it scans all 400K rows.
The rows between 01-04-2017 and 01-11-2017 are 84000 but still it scans 400K rows.
MYSQL EXPLAIN-
I have modified queries two times with no success.
Modification 1:
SELECT SUM(sale_data.total_sale) as totalsale, `sale_data_temp`.`customer_type_cy` as `customer_type`, `distributor_list`.`customer_status` FROM `distributor_list` LEFT JOIN `sale_data` ON `sale_data`.`depo_code` = `distributor_list`.`depo_code` and `sale_data`.`customer_code` = `distributor_list`.`customer_code` AND `invoice_date` BETWEEN "2017-04-01" and "2017-11-01" LEFT JOIN `sale_data_temp` ON `distributor_list`.`address_coordinates` = `sale_data_temp`.`address_coordinates` LEFT JOIN `item_master` ON `sale_data`.`item_code` = `item_master`.`item_code` WHERE `item_master`.`id_category` = 1 GROUP BY `distributor_list`.`address_coordinates`
Modification 2
SELECT SQL_NO_CACHE SUM( sd.total_sale ) AS totalsale, `sale_data_temp`.`customer_type_cy` AS `customer_type` , `distributor_list`.`customer_status` FROM `distributor_list` LEFT JOIN (SELECT * FROM `sale_data` WHERE `invoice_date` BETWEEN "2017-04-01" AND "2017-11-01")sd ON `sd`.`depo_code` = `distributor_list`.`depo_code` AND `sd`.`customer_code` = `distributor_list`.`customer_code` LEFT JOIN `sale_data_temp` ON `distributor_list`.`address_coordinates` = `sale_data_temp`.`address_coordinates` LEFT JOIN `item_master` ON `sd`.`item_code` = `item_master`.`item_code` WHERE `item_master`.`id_category` =1 GROUP BY `distributor_list`.`address_coordinates`
HERE ARE MY INDEXES ON SALE DATA TABLE
See the key column of the EXPLAIN results view - no key is being used at the moment so MySQL is not using any of your indexes for filtering out rows so it is scanning the whole table on each query. This is why it is taking so long.
I have taken a look at your first query with relation to your sale_data indices. It looks like you will need to create a new composite index on this table that contains the following columns only:
depo_code, customer_code, item_code, invoice_date, total_sale
I recommend that you name this index test1 and experiment with modifying the ordering of the columns and keep testing again each time using EXPLAIN EXTENDED until you achieve a selected key - you want to see index test1 has been selected in the key column.
See this answer that has helped me before with this, and it will help you understand the importance of correctly ordering your composite indices.
Looking at the cardinality of the single field indices, here is my best attempt at giving you the correct index to apply:
ALTER TABLE `sale_data` ADD INDEX `test1` (`item_code`, `customer_code`, `invoice_date`, `depo_code`, `total_sale`);
Good luck with your mission!
A few things to notice about your query.
You are misusing the notorious MySQL extension to GROUP BY. Read this, then mention the same columns in your GROUP BY clause as you mention in your SELECT clause.
Your LEFT JOIN sale_data and LEFT JOIN item_master operations are actually ordinary JOIN operations. Why? You mention columns from those tables in your WHERE clause.
Your best bet for speedup is doing a date-range scan on an index on sale_data.invoice_date. For some reason known only to the MySQL query planner's feverish machinations, you're not getting it.
Try refactoring your query. Here's one suggestion:
SELECT SUM(sale_data.total_sale) as totalsale,
sale_data_temp.customer_type_cy as customer_type,
distributor_list.customer_status
FROM distributor_list
JOIN sale_data
ON sale_data.invoice_date BETWEEN "2017-04-01" and "2017-11-01"
and sale_data.depo_code = distributor_list.depo_code
and sale_data.customer_code = distributor_list.customer_code
LEFT JOIN sale_data_temp
ON distributor_list.address_coordinates = sale_data_temp.address_coordinates
JOIN item_master
ON sale_data.item_code = item_master.item_code
WHERE item_master.id_category = 1
GROUP BY sale_data_temp.customer_type_cy, distributor_list.customer_status
Try creating a covering index on sale_data for this query. You'll have to mess around a bit to get this right, but this is a starting point. (invoice_date, item_code, depo_code, customer_code, total_sale). The point of a covering index is to allow the query to be satisfied entirely from the index without having to refer back to the table's data. That's why I included total_sale in the index.
Please notice that index I suggested makes your index on invoice_date redundant. You can drop that index.
I am trying to make the following query run faster than 180 secs:
SELECT
x.di_on_g AS deviceid, SUM(1) AS amount
FROM
(SELECT
g.device_id AS di_on_g
FROM
guide g
INNER JOIN operator_guide_type ogt ON ogt.guide_type_id = g.guide_type_id
INNER JOIN operator_device od ON od.device_id = g.device_id
WHERE
g.operator_id IN (1 , 1)
AND g.locale_id = 1
AND (g.device_id IN ("many (~1500) comma separated IDs coming from my code"))
GROUP BY g.device_id , g.guide_type_id) x
GROUP BY x.di_on_g
ORDER BY amount;
Screenshot from EXPLAIN:
https://ibb.co/da5oAF
Even if I run the subquery as separate query it is still very slow...:
SELECT
g.device_id AS di_on_g
FROM
guide g
INNER JOIN operator_guide_type ogt ON ogt.guide_type_id = g.guide_type_id
INNER JOIN operator_device od ON od.device_id = g.device_id
WHERE
g.operator_id IN (1 , 1)
AND g.locale_id = 1
AND (g.device_id IN (("many (~1500) comma separated IDs coming from my code")
Screenshot from EXPLAIN:
ibb.co/gJHRVF
I have indexes on g.device_id and on other appropriate places.
Indexes:
SHOW INDEX FROM guide;
ibb.co/eVgmVF
SHOW INDEX FROM operator_guide_type;
ibb.co/f0TTcv
SHOW INDEX FROM operator_device;
ibb.co/mseqqF
I tried creating a new temp table for the ids and using a JOIN to replace the slow IN clause but that didn't make the query much faster.
All IDs are Integers and I tried creating a new temp table for the ids that come from my code and JOIN that table instead of the slow IN clause but that didn't make the query much faster. (10 secs faster)
None of the tables have more then 300,000 rows and the mysql configuration is good.
And the visual plan:
Query Plan
Any help will be appreciated !
Let's focus on the subquery. The main problem is "inflate-deflate", but I will get to that in a moment.
Add the composite index:
INDEX(locale_id, operator_id, device_id)
Why the duplicated "1" in
g.operator_id IN (1 , 1)
Why does the GROUP BY have 2 columns, when you select only 1? Is there some reason for using GROUP BY instead of DISTINCT. (The latter seems to be your intent.)
The only reason for these
INNER JOIN operator_guide_type ogt ON ogt.guide_type_id = g.guide_type_id
INNER JOIN operator_device od ON od.device_id = g.device_id
would be to verify that there are guides and devices in those other table. Is that correct? Are these the PRIMARY KEYs, hence unique?: ogt.guide_type_id and od.device_id. If so, why do you need the GROUP BY? Based on the EXPLAIN, it sounds like both of those are related 1:many. So...
SELECT g.device_id AS di_on_g
FROM guide g
WHERE EXISTS( SELECT * FROM operator_guide_type WHERE guide_type_id = g.guide_type_id )
AND EXISTS( SELECT * FROM operator_device WHERE device_id = g.device_id
AND g.operator_id IN (1)
AND g.locale_id = 1
AND g.device_id IN (...)
Notes:
The GROUP BY is no longer needed.
The "inflate-deflate" of JOIN + GROUP BY is gone. The Explain points this out -- 139K rows inflated to 61M -- very costly.
EXISTS is a "semijoin", meaning that it does not collect all matches, but stops when it finds any match.
"the mysql configuration is good" -- How much RAM do you have? What Engine is the table? What is the value of innodb_buffer_pool_size?
suppose tables:
main (1M rows):
id, a_id, b_id, c_id, d_id, amounts, year, main_col
a(500 rows):
id, a_col1, a_col2
b(2000 rows):
id, b_col1, b_col2, b_col3
c(500 rows):
id, c_col1, c_col2
d(1000 rows):
id, d_col1, d_col2
And we have query like:
select sum(amounts), main_col
join a on a.id = main.a_id
join b on b.id = main.b_id
join c on c.id = main.c_id
join d on d.id = main.d_id
where a.a_col1 in (...)
and a.a_col2 in (..)
and b.b_col1 in (...)
and b.b_col2 in (...)
and b.b_col3 in (..)
and c.c_col1 in (..)
and c.c_col2 in (..)
and d.d_col1 in (..)
and d.d_col2 in (..)
and year = 2011
group by main.main_col
any idea who to create index on the main table to improve the query performance ?
thanks
Update:
indexes are added to a,b,c,d tables for the columns show up in where
i've tried multiple column indexes on main table (a_id, b_id, c_id, d_id, main_col) which have the best performance than others like add individual indexes on the a_id, b_id... , but it still not fast enough for requirement, on query will take 7 seconds to run for the maximum situation
CREATE INDEX id_main ON main(id)
You could create multiple indexes on additional columns and see how that works for you.
http://dev.mysql.com/doc/refman/5.0/en/create-index.html
It really depends on specifics like how selective each join is, how big each table is, etc. In general it might be wise to add one or more indices on the foreign keys of your main table, but who could say from that limited info?
It might also help to get an idea of how your query is executing-- try looking at MySQL's "explain" or "explain extended".
Since the primary keys are already indexed by default, the join cannot be optimized by adding more indexes. So that leaves your where statements. So it could help adding indexes to the x.x_colN columns.
You should also have an index on your group_by I think
How would I make this query run faster...?
SELECT account_id,
account_name,
account_update,
account_sold,
account_mds,
ftp_url,
ftp_livestatus,
number_digits,
number_cw,
client_name,
ppc_status,
user_name
FROM
Accounts,
FTPDetails,
SiteNumbers,
Clients,
PPC,
Users
WHERE Accounts.account_id = FTPDetails.ftp_accountid
AND Accounts.account_id = SiteNumbers.number_accountid
AND Accounts.account_client = Clients.client_id
AND Accounts.account_id = PPC.ppc_accountid
AND Accounts.account_designer = Users.user_id
AND Accounts.account_active = 'active'
AND FTPDetails.ftp_active = 'active'
AND SiteNumbers.number_active = 'active'
AND Clients.client_active = 'active'
AND PPC.ppc_active = 'active'
AND Users.user_active = 'active'
ORDER BY
Accounts.account_update DESC
Thanks in advance :)
EXPLAIN query results:
I don't really have any foreign keys set up...I was trying to avoid making alterations to the database as will have to do a complete overhaul soon.
only primary keys are the id of each table e.g. account_id, ftp_id, ppc_id ...
Indexes
You need - at least - an index on every field that is used in a JOIN condition.
Indexes on the fields that appear in WHERE or GROUP BY or ORDER BY clauses are most of the time useful, too.
When in a table, two or more fields are used in JOIns (or WHERE or GROUP BY or ORDER BY), a compound (combined) index of these (two or more) fields may be better than separate indexes. For example in the SiteNumbers table, possible indexes are the compound (number_accountid, number_active) or (number_active, number_accountid).
Condition in fields that are Boolean (ON/OFF, active/inactive) are sometimes slowing queries (as indexes are not selective and thus not very helpful). Restructuring (father normalizing) the tables is an option in that case but probably you can avoid the added complexity.
Besides the usual advice (examine the EXPLAIN plan, add indexes where needed, test variations of the query),
I notice that in your query there is a partial Cartesian Product. The table Accounts has a one-to-many relationships to three tables FTPDetails, SiteNumbers and PPC. This has the effect that if you have for example 1000 accounts, and every account is related to, say, 10 FTPDetails, 20 SiteNumbers and 3 PPCs, the query will return for every account 600 rows (the product of 10x20x3). In total 600K rows where many data are duplicated.
You could instead split the query into three plus one for base data (Account and the rest tables). That way, only 34K rows of data (having smaller length) would be transfered :
Accounts JOIN Clients JOIN Users
(with all fields needed from these tables)
1K rows
Accounts JOIN FTPDetails
(with Accounts.account_id and all fields from FTPDetails)
10K rows
Accounts JOIN SiteNumbers
(with Accounts.account_id and all fields from SiteNumbers)
20K rows
Accounts JOIN PPC
(with Accounts.account_id and all fields from PPC)
3K rows
and then use the data from the 4 queries in the client side to show combined info.
I would add the following indexes:
Table Accounts
index on (account_designer)
index on (account_client)
index on (account_active, account_id)
index on (account_update)
Table FTPDetails
index on (ftp_active, ftp_accountid)
Table SiteNumbers
index on (number_active, number_accountid)
Table PPC
index on (ppc_active, ppc_accountid)
Use EXPLAIN to find out which index could be used and which index is actually used. Create an appropriate index if necessary.
If FTPDetails.ftp_active only has the two valid entries 'active' and 'inactive', use BOOL as data type.
As a side note: I strongly suggest using explicit joins instead of implicit ones:
SELECT
account_id, account_name, account_update, account_sold, account_mds,
ftp_url, ftp_livestatus,
number_digits, number_cw,
client_name,
ppc_status,
user_name
FROM Accounts
INNER JOIN FTPDetails
ON Accounts.account_id = FTPDetails.ftp_accountid
AND FTPDetails.ftp_active = 'active'
INNER JOIN SiteNumbers
ON Accounts.account_id = SiteNumbers.number_accountid
AND SiteNumbers.number_active = 'active'
INNER JOIN Clients
ON Accounts.account_client = Clients.client_id
AND Clients.client_active = 'active'
INNER JOIN PPC
ON Accounts.account_id = PPC.ppc_accountid
AND PPC.ppc_active = 'active'
INNER JOIN Users
ON Accounts.account_designer = Users.user_id
AND Users.user_active = 'active'
WHERE Accounts.account_active = 'active'
ORDER BY Accounts.account_update DESC
This makes the query much more readable because the join condition is close to the name of the table that is being joined.
EXPLAIN, benchmark different options. For starters, I'm sure that several queries will be faster than this monster. First, because query optimiser will spend a lot of time examining what join order is the best (5!=120 possibilities). And second, queries like SELECT ... WHERE ....active = 'active' will be cached (though it depends on an amount of data changes).
One of your main problems is here: x.y_active = 'active'
Problem: low cardinality
The active field is a boolean field with 2 possible values, as such it has very low cardinality.
MySQL (or any SQL for that matter will not use an index when 30% or more of the rows have the same value).
Forcing the index is useless because it will make your query slower, not faster.
Solution: partition your tables
A solution is to partition your tables on the active columns.
This will exclude all non-active fields from consideration, and will make the select act as if you actually have a working index on the xxx-active fields.
Sidenote
Please don't ever use implicit where joins, it's much too error prone and consufing to be useful.
Use a syntax like Oswald's answer instead.
Links:
Cardinality: http://en.wikipedia.org/wiki/Cardinality_(SQL_statements)
Cardinality and indexes: http://www.bennadel.com/blog/1424-Exploring-The-Cardinality-And-Selectivity-Of-SQL-Conditions.htm
MySQL partitioning: http://dev.mysql.com/doc/refman/5.5/en/partitioning.html