I have a table with 40,000,000 rows, and I'm trying to optimize my query, because takes too long.
First, this is my table:
CREATE TABLE resume (
yearMonth char(6) DEFAULT NULL,
type char(1) DEFAULT NULL,
agen_o char(5) DEFAULT NULL,
tar char(2) DEFAULT NULL,
cve_ent char(1) DEFAULT NULL,
cve_mun char(3) DEFAULT NULL,
cve_reg int(1) DEFAULT NULL,
id_ope char(1) DEFAULT NULL,
ope_tip char(2) DEFAULT NULL,
ope_cve char(3) DEFAULT NULL,
cob_u int(9) DEFAULT NULL,
tot_imp bigint(15) DEFAULT NULL,
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
This is my query:
SELECT m.name_ope AS cve,
SUBSTRING(r.yearMonth,5,2) AS period,
COUNT(DISTINCT(CONCAT(r.agen_ope,r.cve_ope))) AS num,
SUM(CASE WHEN r.type='A' THEN r.cob_u ELSE 0 END) AS tot_u,
FROM resume r, media m
WHERE CONCAT(r.id_ope,SUBSTRING(r.ope_cve,3,1))=m.ope_cve AND
r.type IN ('C','D','E') AND
SUBSTRING(r.yearMonth,1,4)='2012' AND
r.id_ope='X' AND
SUBSTRING(r.ope_cve,1,2) IN (SELECT cve_med FROM catNac WHERE numero='0')
GROUP BY SUBSTRING(r.yearMonth,5,2),SUBSTRING(r.ope_cve,3,1)
ORDER BY SUBSTRING(r.yearMonth,5,2),SUBSTRING(r.ope_cve,3,1)
So, I added an index with these fields: id_ope, yearMonth, agen_o, because I have other's queries that have this fields in WHERE, with this order
Now my explain output:
1 PRIMARY r ref indice indice 2 const 14774607 Using where; Using filesort
So i added another index with yearMonth, ope_cve, but I still have "using filesort". How can I optimize this?
Thanks
Without modifying your table structure, if you have an index on yearMonth, you can try this:
SELECT m.name_ope AS cve,
SUBSTRING(r.yearMonth,5,2) AS period,
COUNT(DISTINCT(CONCAT(r.agen_ope,r.cve_ope))) AS num,
SUM(CASE WHEN r.type='A' THEN r.cob_u ELSE 0 END) AS tot_u,
FROM resume r, media m
WHERE CONCAT(r.id_ope,SUBSTRING(r.ope_cve,3,1))=m.ope_cve AND
r.type IN ('C','D','E') AND
r.yearMonth LIKE '2012%' AND
r.id_ope='X' AND
SUBSTRING(r.ope_cve,1,2) IN (SELECT cve_med FROM catNac WHERE numero='0')
GROUP BY r.yearMonth,SUBSTRING(r.ope_cve,3,1)
The changes:
Using r.yearMonth LIKE '2012%' should allow an index to be used for that part of your where clause.
Since you're already filtering out every year but 2012, you can group by GROUP BY r.yearMonth alone.
The ORDER BY clause is not needed since MySQL sorts on GROUP BY, unless you include ORDER BY NULL
Related
I am working with agricultural product management system. I have a question regarding a MySQL query. I would like to know how to create the same query using Laravel query builder:
SELECT
vegitables.name, vegitables.image, vegitables.catagory,
AVG(price_wholesale),
SUM(CASE WHEN rank = 1 THEN price_wholesale ELSE 0 END) today,
SUM(CASE WHEN rank = 2 THEN price_wholesale ELSE 0 END) yesterday
FROM (
SELECT
veg_id, price_wholesale, price_date,
RANK() OVER (PARTITION BY veg_id ORDER BY price_date DESC) as rank
FROM old_veg_prices
) p
INNER JOIN vegitables ON p.veg_id = vegitables.id
WHERE rank in (1,2)
GROUP BY veg_id
This Output result get when run query in database:
Following two table are used to get today price yesterday price and price average get from each product.
CREATE TABLE `vegitables` (
`id` bigint(20) UNSIGNED NOT NULL,
`name` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`image` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`catagory` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`total_area` int(11) NOT NULL COMMENT 'Total area of culativate in Sri Lanka (Ha)',
`total_producation` int(11) NOT NULL COMMENT 'Total production particular product(mt)',
`annual_crop_count` int(11) NOT NULL COMMENT 'how many time can crop pre year',
`short_dis` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
ALTER TABLE `vegitables`
ADD PRIMARY KEY (`id`);
ALTER TABLE `vegitables`
MODIFY `id` bigint(20) UNSIGNED NOT NULL AUTO_INCREMENT, AUTO_INCREMENT=3;
COMMIT;
CREATE TABLE `old_veg_prices` (
`id` bigint(20) UNSIGNED NOT NULL,
`veg_id` int(11) NOT NULL,
`price_wholesale` double(8,2) NOT NULL,
`price_retial` double(8,2) NOT NULL,
`price_location` int(11) NOT NULL,
`price_date` date NOT NULL,
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
ALTER TABLE `old_veg_prices`
ADD PRIMARY KEY (`id`);
ALTER TABLE `old_veg_prices`
MODIFY `id` bigint(20) UNSIGNED NOT NULL AUTO_INCREMENT, AUTO_INCREMENT=6;
COMMIT;
I try this site to convert to MySQL query to query builder code. But it show some error's could find it out. Any Way i want to run this code in Laravel with any method??
Your query will not return the data for yesterday and today; it will return the data for two most recent dates (e.g. if today is 2021-11-01 and most recent two dates for for carrots are 2021-10-25 and 2021-10-20 it will use those two dates). Using RANK() ... IN (1, 2) is also incorrect because it can return ranks such as 1 followed by 3 instead of 2.
To get today and yesterday prices you don't need window functions. Just use appropriate where clause and conditional aggregation:
SELECT vegitables.name
, vegitables.image
, vegitables.catagory
, AVG(old_veg_prices.price_wholesale) AS avgwholesale
, SUM(CASE WHEN old_veg_prices.price_date = CURRENT_DATE - INTERVAL 1 DAY THEN old_veg_prices.price_wholesale END) AS yesterday
, SUM(CASE WHEN old_veg_prices.price_date = CURRENT_DATE THEN old_veg_prices.price_wholesale END) AS today
FROM vegitables
INNER JOIN old_veg_prices ON vegitables.id = old_veg_prices.veg_id
WHERE old_veg_prices.price_date IN (CURRENT_DATE - INTERVAL 1 DAY, CURRENT_DATE)
GROUP BY vegitables.id -- other columns from vegitables table are functionally dependent on primary key
The Laravel equivalent would be:
DB::table('vegitables')
->Join('old_veg_prices', 'old_veg_prices.veg_id', '=', 'vegitables.id')
->whereRaw('old_veg_prices.price_date IN (CURRENT_DATE - INTERVAL 1 DAY, CURRENT_DATE)')
->select(
'vegitables.name',
'vegitables.image',
'vegitables.catagory',
DB::raw('AVG(old_veg_prices.price_wholesale) AS avgwholesale'),
DB::raw('SUM(CASE WHEN old_veg_prices.price_date = CURRENT_DATE - INTERVAL 1 DAY THEN old_veg_prices.price_wholesale END) AS yesterday'),
DB::raw('SUM(CASE WHEN old_veg_prices.price_date = CURRENT_DATE THEN old_veg_prices.price_wholesale END) AS today')
)
->groupBy(
'vegitables.id',
'vegitables.name',
'vegitables.image',
'vegitables.catagory'
)
->get();
"Query builder" features of abstraction products often leave out some possible SQL constructs. I recommend you abandon the goal of reverse engineering SQL back to Laravel and simply perform the "raw" query.
Also...
rank() OVER (PARTITION BY veg_id ORDER BY price_date DESC) as rank
requires MySQL 8.0 (MariaDB 10.2).
And suggest you avoid the alias "rank" since that is identical to the name of a function.
I'm trying to optimize my query speed as much as possible. A side problem is that I cannot see the exact query speed, because it is rounded to a whole second. The query does get the expected result and takes about 1 second. The final query should be extended even more and for this reason i am trying to improve it. How can this query be improved?
The database is constructed as an electricity utility company. The query should eventually calculate an invoice. I basically have 4 tables, APX price, powerdeals, powerload, eans_power.
APX price is an hourly price, powerload is a quarterly hour volume. First step is joining these two together for each quarter of an hour.
Second step is that I currently select the EAN that is indicated in the table eans_power.
Finally I will join the Powerdeals that currently consist only of a single line and indicates from which hour, until which hour and weekday from/until it should be applicable. It consist of an hourly volume and price. Currently it is only joined on the hours, but it will be extended to weekdays as well.
MYSQL Query:
SELECT l.DATE, l.PERIOD_FROM, a.PRICE, l.POWERLOAD,
SUM(a.PRICE*l.POWERLOAD), SUM(d.hourly_volume/4)
FROM timeseries.powerload l
INNER JOIN timeseries.apxprice a ON l.DATE = a.DATE
INNER JOIN contracts.eans_power c ON l.ean = c.ean
LEFT OUTER JOIN timeseries.powerdeals d ON d.period_from <= l.period_from
AND d.period_until >= l.period_until
WHERE l.PERIOD_FROM >= a.PERIOD_FROM
AND l.PERIOD_FROM < a.PERIOD_UNTIL
AND l.DATE >= '2018-01-01'
AND l.DATE <= '2018-12-31'
GROUP BY l.date
Explain:
1 SIMPLE c NULL system PRIMARY,ean NULL NULL NULL 1 100.00 Using temporary; Using filesort
1 SIMPLE l NULL ref EAN EAN 21 const 35481 11.11 Using index condition
1 SIMPLE d NULL ALL NULL NULL NULL NULL 1 100.00 Using where; Using join buffer (Block Nested Loop)
1 SIMPLE a NULL ref DATE DATE 4 timeseries.l.date 24 11.11 Using index condition
Create table queries:
apxprice
CREATE TABLE `apxprice` (
`apx_id` int(11) NOT NULL AUTO_INCREMENT,
`date` date DEFAULT NULL,
`period_from` time DEFAULT NULL,
`period_until` time DEFAULT NULL,
`price` decimal(10,2) DEFAULT NULL,
PRIMARY KEY (`apx_id`),
KEY `DATE` (`date`,`period_from`,`period_until`)
) ENGINE=MyISAM AUTO_INCREMENT=29664 DEFAULT CHARSET=latin1
powerdeals
CREATE TABLE `powerdeals` (
`deal_id` int(11) NOT NULL AUTO_INCREMENT,
`date_deal` date NOT NULL,
`start_date` date NOT NULL,
`end_date` date NOT NULL,
`weekday_from` int(11) NOT NULL,
`weekday_until` int(11) NOT NULL,
`period_from` time NOT NULL,
`period_until` time NOT NULL,
`hourly_volume` int(11) NOT NULL,
`price` int(11) NOT NULL,
`type_deal_id` int(11) NOT NULL,
`contract_id` int(11) NOT NULL,
PRIMARY KEY (`deal_id`)
) ENGINE=MyISAM AUTO_INCREMENT=2 DEFAULT CHARSET=latin1
powerload
CREATE TABLE `powerload` (
`powerload_id` int(11) NOT NULL AUTO_INCREMENT,
`ean` varchar(18) DEFAULT NULL,
`date` date DEFAULT NULL,
`period_from` time DEFAULT NULL,
`period_until` time DEFAULT NULL,
`powerload` int(11) DEFAULT NULL,
PRIMARY KEY (`powerload_id`),
KEY `EAN` (`ean`,`date`,`period_from`,`period_until`)
) ENGINE=MyISAM AUTO_INCREMENT=61039 DEFAULT CHARSET=latin1
eans_power
CREATE TABLE `eans_power` (
`ean` char(19) NOT NULL,
`contract_id` int(11) NOT NULL,
`invoicing_id` int(11) NOT NULL,
`street` varchar(255) NOT NULL,
`number` int(11) NOT NULL,
`affix` char(11) NOT NULL,
`postal` char(6) NOT NULL,
`city` varchar(255) NOT NULL,
PRIMARY KEY (`ean`),
KEY `ean` (`ean`,`contract_id`,`invoicing_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
Sample data tables
apx_prices
apx_id,date,period_from,period_until,price
1,2016-01-01,00:00:00,01:00:00,23.86
2,2016-01-01,01:00:00,02:00:00,22.39
powerdeals
deal_id,date_deal,start_date,end_date,weekday_from,weekday_until,period_from,period_until,hourly_volume,price,type_deal_id,contract_id
1,2019-05-15,2018-01-01,2018-12-31,1,5,08:00:00,20:00:00,1000,50,3,1
powerload
powerload_id,ean,date,period_from,period_until,powerload
1,871688520000xxxxxx,2018-01-01,00:00:00,00:15:00,9
2,871688520000xxxxxx,2018-01-01,00:15:00,00:30:00,11
eans_power
ean,contract_id,invoicing_id,street,number,affix,postal,city
871688520000xxxxxx,1,1,road,14,postal,city
Result, without sum() and group by:
DATE,PERIOD_FROM,PRICE,POWERLOAD,a.PRICE*l.POWERLOAD,d.hourly_volume/4,
2018-01-01,00:00:00,27.20,9,244.80,NULL
2018-01-01,00:15:00,27.20,11,299.20,NULL
Result, with sum() and group by:
DATE, PERIOD_FROM, PRICE, POWERLOAD, SUM(a.PRICE*l.POWERLOAD), SUM(d.hourly_volume/4)
2018-01-01,08:00:00,26.33,21,46193.84,12250.0000
2018-01-02, 08:00:00,47.95,43,90623.98,12250.0000
Preliminary optimizations:
Use InnoDB, not MyISAM.
Use CHAR only for constant-lenght strings
Use consistent datatypes (see ean, for example)
For an alternative to using time-to-the-second, check out the Handler counts .
Because range tests (such as l.PERIOD_FROM >= a.PERIOD_FROM AND l.PERIOD_FROM < a.PERIOD_UNTIL) are essentially impossible to optimize, I recommend you expand the table to have one entry per hour (or 1 per quarter hour, if necessary). Looking up a row via a key is much faster than doing a scan of "ALL" the table. 9K rows for an entire year is trivial.
When you get past these recommendations (and the Comments), I will have more tips on optimizing the indexes, especially InnoDB's PRIMARY KEY.
I have found very long query in my system. The MySQL Slow Log says the following:
# Time: 2018-07-08T18:47:02.273314Z
# User#Host: server[server] # localhost [] Id: 1467
# Query_time: 97.251247 Lock_time: 0.000210 Rows_sent: 50 Rows_examined: 41646378
SET timestamp=1531075622;
SELECT n1.full_name AS sender_full_name, s1.email AS sender_email,
e.subject, e.body, e.attach, e.date, e.id, r.status,
n2.full_name AS receiver_full_name, s2.email AS receiver_email,
r.basket,
FROM email_routing r
JOIN email e ON e.id = r.message_id
JOIN people_emails s1 ON s1.id = r.sender_email_id
JOIN people n1 ON n1.id = s1.people_id
JOIN people_emails s2 ON s2.id = r.receiver_email_id
JOIN people n2 ON n2.id = s2.people_id
WHERE r.sender_email_id = 21897 ORDER BY e.date desc LIMIT 0, 50;
The EXPLAIN query shows no full table scan and the query using indexes:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE s1 NULL const PRIMARY PRIMARY 4 const 1 100.00 Using temporary; Using filesort
1 SIMPLE n1 NULL const PRIMARY,ppl PRIMARY 4 const 1 100.00 NULL
1 SIMPLE n2 NULL index PRIMARY,ppl ppl 771 NULL 1 100.00 Using index
1 SIMPLE s2 NULL index PRIMARY s2 771 NULL 3178 10.00 Using where; Using index; Using join buffer (Block Nested Loop)
1 SIMPLE r NULL ref bk1,bk2,msgid bk1 4 server.s2.id 440 6.60 Using where; Using index
1 SIMPLE e NULL eq_ref PRIMARY PRIMARY 4 server.r.message_id 1 100.00 NULL
Here is my SHOW CREATE TABLE queries for the used tables:
CREATE TABLE `email_routing` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`message_id` int(11) NOT NULL,
`sender_email_id` int(11) NOT NULL,
`receiver_email_id` int(11) NOT NULL,
`basket` int(11) NOT NULL,
`status` int(11) NOT NULL,
`popup` int(11) NOT NULL DEFAULT '0',
`tm` int(11) NOT NULL DEFAULT '0',
KEY `id` (`id`),
KEY `bk1` (`receiver_email_id`,`status`,`sender_email_id`,`message_id`,`basket`),
KEY `bk2` (`sender_email_id`,`tm`),
KEY `msgid` (`message_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1055796 DEFAULT CHARSET=utf8
-
CREATE TABLE `email` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`subject` text NOT NULL,
`body` text NOT NULL,
`date` datetime NOT NULL,
`attach` text NOT NULL,
`attach_dir` varchar(255) CHARACTER SET cp1251 DEFAULT NULL,
`attach_subject` varchar(255) DEFAULT NULL,
`attach_content` longtext,
`sphinx_synced` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `Index_2` (`attach_dir`),
KEY `dt` (`date`)
) ENGINE=InnoDB AUTO_INCREMENT=898001 DEFAULT CHARSET=utf8
-
CREATE TABLE `people_emails` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`nick` varchar(255) NOT NULL,
`email` varchar(255) NOT NULL,
`key_name` varchar(255) NOT NULL,
`people_id` int(11) NOT NULL,
`status` int(11) NOT NULL DEFAULT '0',
`activity` int(11) NOT NULL,
`internal_user_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `s2` (`email`,`people_id`)
) ENGINE=InnoDB AUTO_INCREMENT=22146 DEFAULT CHARSET=utf8
-
CREATE TABLE `people` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`fname` varchar(255) CHARACTER SET cp1251 NOT NULL,
`lname` varchar(255) CHARACTER SET cp1251 NOT NULL,
`patronymic` varchar(255) CHARACTER SET cp1251 NOT NULL,
`gender` tinyint(1) NOT NULL,
`full_name` varchar(255) NOT NULL DEFAULT ' ',
`category` int(11) NOT NULL,
`people_type_id` int(255) DEFAULT NULL,
`tags` varchar(255) CHARACTER SET cp1251 NOT NULL,
`job` varchar(255) CHARACTER SET cp1251 NOT NULL,
`post` varchar(255) CHARACTER SET cp1251 NOT NULL,
`profession` varchar(255) CHARACTER SET cp1251 DEFAULT NULL,
`zip` varchar(16) CHARACTER SET cp1251 NOT NULL,
`country` int(11) DEFAULT NULL,
`region` varchar(10) NOT NULL,
`city` varchar(255) CHARACTER SET cp1251 NOT NULL,
`address` varchar(255) CHARACTER SET cp1251 NOT NULL,
`address_date` date DEFAULT NULL,
`last_update_ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `ppl` (`id`,`full_name`)
) ENGINE=InnoDB AUTO_INCREMENT=415040 DEFAULT CHARSET=utf8
Here is the SHOW TABLE STATUS output for those 4 tables:
Name Engine Version Row_format Rows Avg_row_length Data_length Max_data_length Index_length Data_free Auto_increment
email InnoDB 10 Dynamic 753748 12079 9104785408 0 61112320 4194304 898167
email_routing InnoDB 10 Dynamic 900152 61 55132160 0 69419008 6291456 1056033
people InnoDB 10 Dynamic 9538 386 3686400 0 2785280 4194304 415040
people_emails InnoDB 10 Dynamic 3178 752 2392064 0 98304 4194304 22146
MySQL Version 5.7.22 Ubuntu 16.04
However I have noticed one thing - if I take ORDER BY out of the query, but leaving the LIMIT, then query runs almost instantly taking not more than 0.2 seconds. So I have started to think to run query with no ORDER BY and do sorting by PHP means like that but eventually that seems to complicated as using the LIMIT with no ORDER BY I get wronge range to sort.
Is there anything else I could do to speed up or optimize that query?
AS AN ALTERNATIVE I could do sorting and paging by my PHP code. I add addtional columnt into the SELECT ..., UNIX_TIMESTAMP(e.date) as ts and then do:
<?php
...
$main_query = $server->query($query);
$emails_list = $main_query->fetch_all(MYSQLI_ASSOC);
function cmp($a, $b) {
return strcmp($a['ts'], $b['ts']);
}
$emails_sorted = usort($emails_list, "cmp");
for ($i=$start;$i<$lenght;$i++)
{
$singe_email = $emails_sorted[$i]
// Format the output
}
But when I do that I get
Fatal error: Allowed memory size of 134217728 bytes exhausted
at line of $emails_sorted = usort($emails_list, "cmp");
Warning, I'm not very familiar with MySQL, in fact I'm mostly projecting MSSQL experience on top of things I (mostly) read about MySQL.
1) Potential workaround: is it safe to assume that email.id and email.date are always in the same order? From a functional point of view this seems logical as emails get added to the table over time and thus have an ever increasing auto-number... But maybe the initial load of the data was in a different/random order? Anyway, if it is, what happens if you ORDER BY e.id instead of ORDER BY e.date ?
2) Does adding a composite index on email (id, date) (in that order!) help?
3) If all of that does not help, splitting the query into 2 parts might help out the optimizer. (You may need to fix the syntax for MySQL)
-- Locate what we want first
CREATE TEMPORARY TABLE results (
SELECT e.id
r.basket
FROM email_routing r
JOIN email e ON e.id = r.message_id
WHERE r.sender_email_id = 21897
ORDER BY e.date desc LIMIT 0, 50 );
-- Again, having an index on email (id, date) seems like a good idea to me
-- (As a test you may want to add an index on results (id) here, shouldn't take long and
-- in MSSQl it would help build a better query plan, can't tell with MySQL)
-- return actual results
SELECT n1.full_name AS sender_full_name,
s1.email AS sender_email,
e.subject, e.body, e.attach, e.date, e.id, r.status,
n2.full_name AS receiver_full_name,
s2.email AS receiver_email,
r.basket,
FROM results r
JOIN email e ON e.id = r.message_id
JOIN people_emails s1 ON s1.id = r.sender_email_id
JOIN people n1 ON n1.id = s1.people_id
JOIN people_emails s2 ON s2.id = r.receiver_email_id
JOIN people n2 ON n2.id = s2.people_id
ORDER BY e.date desc
If your data comes back that quickly, how about wrapping it... but how many rows are actually GOING to be returned WITHOUT the LIMIT. Maybe you would still get better performance AFTER such as...
select PQ.*
from ( YourQueryWithoutOrderByAndLimt ) PQ
order by PQ.date desc
LIMIT 0, 50;
I suspect this is a case where the MySQL Join Optimizer overestimates the benefits of Block Nested Loop (BNL) join. You can try to turn off BNL by doing:
set optimizer_switch='block_nested_loop=off';
Hopefully this will provide a better join order. You could also try:
set optimizer_prune_level = 0;
to force the join optimizer to explore all possible join orders.
Another option is to use STRAIGHT_JOIN to force a particular join order. In this case, it seems the order as specified in the query text would be good. Hence, to force this particular join order you could write
SELECT STRAIGHT_JOIN ...
Note that whatever you do, you can not expect the query to be as fast as without ORDER BY. As long as you need to find the latest emails from a particular sender, and there is no information about sender in the email table, it is not possible to use an index to avoid sorting without going through all emails from all senders. Things would be different if you had information about date in the email_routing table. Then an index on that table could have been used to avoid sorting.
MySQL cannot use index for order by in your query because
The query joins many tables, and the columns in the ORDER BY are not
all from the first nonconstant table that is used to retrieve rows.
(This is the first table in the EXPLAIN output that does not have a
const join type.)
MySQL Order By Optimization
I have two tables, main_part (3k records) and part_details (25k records)
I tried the following indexes but explain always returns full table scan of 25k records as opposed to about 2k of matched records and Using where; Using temporary; Using filesort
ALTER TABLE `main_part` ADD INDEX `main_part_index_1` (`unit`);
ALTER TABLE `part_details` ADD INDEX `part_details_index_1` (`approved`, `display`, `country`, `id`, `price`);
Here is my query:
SELECT a.part_id, b.my_title,
b.price, a.type,
a.unit, a.certification,
b.my_image,
b.price/a.unit AS priceW
FROM main_part AS a
INNER JOIN part_details AS b ON a.part_id=b.id
WHERE b.approved = 'Yes'
AND b.display = 'On'
AND b.country = 'US'
AND a.unit >= 300
ORDER BY b.price ASC LIMIT 50
One thing that I am aware of is that a.part_id is not a Primary Key in main_part table. Could this be a culprit?
Create tables SQL:
CREATE TABLE `main_part` (
`id` smallint(6) NOT NULL AUTO_INCREMENT,
`part_id` mediumint(9) NOT NULL DEFAULT '0',
`type` varchar(50) NOT NULL DEFAULT '',
`unit` varchar(50) NOT NULL DEFAULT '',
`certification` varchar(50) NOT NULL DEFAULT '',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `part_details` (
`id` mediumint(9) NOT NULL AUTO_INCREMENT,
`asn` varchar(50) NOT NULL DEFAULT '',
`country` varchar(5) NOT NULL DEFAULT '',
`my_title` varchar(200) NOT NULL DEFAULT '',
`display` varchar(5) NOT NULL DEFAULT 'On',
`approved` varchar(5) NOT NULL DEFAULT 'No',
`price` decimal(7,3) NOT NULL DEFAULT '0.000',
`my_image` varchar(250) NOT NULL DEFAULT '',
`update_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
UNIQUE KEY `countryasn` (`country`,`asn`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
For your query the more important index is the JOIN condition and as you are already aware a.part_id isn't primary key, so doesn't have a default index and your first try should be:
ALTER TABLE `main_part` ADD INDEX `main_part_index_1` (`part_id`,`unit`);
Because we are interested on the JOIN condition first you should also change the second index to
ALTER TABLE `part_details` ADD INDEX `part_details_index_1`
(`id`, `approved`, `display`, `country`, `price`);
order matters in the index
Another tip is you start with the basic query:
SELECT *
FROM main_part AS a
INNER JOIN part_details AS b ON a.part_id=b.id
Add index for part_id and id check the explain plan and then start adding condition and updating the index if required.
It seems that most columns used for filtering in part_details aren't very selective (display is probably an On/off switch, country is probably very similar in many products, etc.).
In some cases, when the WHERE clause is not very selective, MySQL may choose to use an index that better suits the ORDER BY clause.
I would try to create this index as well and check in the explain plan if there is any changes:
ALTER TABLE `part_details` ADD INDEX `part_details_price_index` (`price`);
For this query:
SELECT mp.part_id, pd.my_title, pd.price, mp.type,
mp.unit, mp.certification, pd.my_image,
pd.price/mp.unit AS priceW
FROM main_part mp INNER JOIN
part_details pd
ON mp.part_id = pd.id
WHERE pd.approved = 'Yes' AND
pd.display = 'On' AND
pd.country = 'US' AND
mp.unit >= 300
ORDER BY pd.price ASC
LIMIT 50;
For this query, I would start with indexes on part_details(country, display, approved, id, price) and main_part(part_id, unit).
The index on part_details can be used for filtering before the join. It is not easy to get rid of the sort for the order by.
I am in the middle of my first data science project and I am having some trouble with extremely slow queries using MySQL Workbench.
These are my 3 tables (each of which comes from data sets from various websites that have been cleaned up and inserted into MySQL):
CREATE TABLE IF NOT EXISTS `starbucks` (
`STORE_NUMBER` varchar(20) NOT NULL,
`CITY` varchar(50) NOT NULL,
`STATE` char(2) NOT NULL,
`ZIPCODE` char(5) NOT NULL,
`LONG` varchar(10) NOT NULL,
`LAT` varchar(10) NOT NULL,
PRIMARY KEY (`STORE_NUMBER`)
)ENGINE=InnoDB")
CREATE TABLE IF NOT EXISTS `income`(
`STATEFIPS` char(2) NOT NULL,
`STATE` char(2) NOT NULL,
`ZIPCODE` char(5) NOT NULL,
`AGI_STUB` tinyint NOT NULL,
`NUM_RETURNS` float(15,4) NOT NULL,
`TOTAL_INCOME` float(15,4) NOT NULL,
PRIMARY KEY (`STATE`, `ZIPCODE`, `AGI_STUB`)
)ENGINE=InnoDB")
CREATE TABLE IF NOT EXISTS `diversity`(
`COUNTY` varchar(50) NOT NULL,
`STATE` char(2) NOT NULL,
`INDEX` float(7,6) NOT NULL,
`1` float(3,1) NOT NULL,
`2` float(3,1) NOT NULL,
`3` float(3,1) NOT NULL,
`4` float(3,1) NOT NULL,
`5` float(3,1) NOT NULL,
`6` float(3,1) NOT NULL,
`7` float(3,1) NOT NULL,
PRIMARY KEY (`COUNTY`, `STATE`)
)ENGINE=InnoDB")
starbucks has 13,608 records,
income has 166,740 records,
diversity has 3,143 records.
The query I am trying to run:
SELECT i.TOTAL_INCOME,
CASE
WHEN s.STORE_NUMBER IS NOT NULL THEN 1
ELSE 0
END AS has_starbucks
FROM income as i
LEFT OUTER JOIN starbucks as s
ON i.ZIPCODE = s.ZIPCODE
If I limit the result to 1,000 rows it will run fast, however I need to get ALL of the records (no row limit) and this is resulting in the query never returning, and eventually timing out and disconnecting me from the MySQL server.
In the past when working for companies on their databases with MILLIONS of records in them I have never had this much trouble.
What table optimizations do I need to do to fix this? What MySQL settings do I need to change? Any other suggestions welcome.
EDIT
It appears that the 'Duration' of the query never exceeds 0.500 seconds, it is the 'Fetch' part that lasts > 120 seconds. I am not sure if this is useful information.
The first issue is create a proper index on join columns
CREATE INDEX idx1 ON starbucks (ZIPCODE );
CREATE INDEX idx2 ON income (ZIPCODE );
or a verbous index adding the column you select
CREATE INDEX idx2 ON income (ZIPCODE , TOTAL_INCOME);
and check the behavior using explain plan
This won't do much to fix your performance issue but will fix your duplicate row issue -
SELECT i.TOTAL_INCOME, 1 AS has_starbucks
FROM income as i
WHERE i.Zipcode in (Select zipcode from Starbucks)
UNION
SELECT i.TOTAL_INCOME, 0 AS has_starbucks
FROM income as i
WHERE i.Zipcode not in (Select zipcode from Starbucks)
EXISTS is sometimes more efficient than IN
SELECT i.TOTAL_INCOME, 1 AS has_starbucks
FROM income as i
WHERE EXISTS
( SELECT 1
FROM Starbucks s
WHERE s.zipcode = i.Zipcode
)
UNION
SELECT i.TOTAL_INCOME, 0 AS has_starbucks
FROM income as i
WHERE NOT EXISTS
( SELECT 1
FROM Starbucks s
WHERE s.zipcode = i.Zipcode
)