I have a table in MySQL (InnoDB engine) with 100M records. Structure is as below:
CREATE TABLE LEDGER_AGR (
ID BIGINT(20) NOT NULL AUTO_INCREMENT,
`Booking` int(11) NOT NULL,
`LType` varchar(5) NOT NULL,
`PType` varchar(5) NOT NULL,
`FType` varchar(5) DEFAULT NULL,
`TType` varchar(10) DEFAULT NULL,
`AccountCode` varchar(55) DEFAULT NULL,
`AgAccountId` int(11) DEFAULT '0',
`TransactionDate` date NOT NULL,
`DebitAmt` decimal(37,6) DEFAULT '0.000000',
`CreditAmt` decimal(37,6) DEFAULT '0.000000',
PRIMARY KEY (ID),
KEY `TRANSACTION_DATE` (`TransactionDate`)
)
ENGINE=InnoDB;
When I run:
EXPLAIN
SELECT * FROM LEDGER_AGR
WHERE TransactionDate >= '2000-08-01'
AND TransactionDate <= '2017-08-01'
It does not use the TRANSACTION_DATE index. But when I run:
EXPLAIN
SELECT * FROM LEDGER_AGR
WHERE TransactionDate = '2000-08-01'
it uses the TRANSACTION_DATE index. Could someone explain why?
Range query #1 has poor selectivity; equality query #2 has excellent selectivity. The optimizer is very likely to choose the index access path when the result set will be less than about 1% of the table's rows. It is unlikely to prefer the index when the result set is a large fraction of the total (for example, a quarter or half of all rows), because at that point a full table scan is cheaper than performing a random row lookup through the secondary index for every match.
A range of '2000-08-01' thru '2000-08-03' would likely exploit the index.
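You can see both behaviors on the table from the question; the second statement forces the index with a hint purely for comparison (expect the full scan to win anyway when most rows match):

```sql
-- A narrow range is selective enough that the optimizer picks the index on its own:
EXPLAIN
SELECT * FROM LEDGER_AGR
WHERE TransactionDate BETWEEN '2000-08-01' AND '2000-08-03';

-- For the wide range, force the index to compare the two plans:
EXPLAIN
SELECT * FROM LEDGER_AGR FORCE INDEX (TRANSACTION_DATE)
WHERE TransactionDate >= '2000-08-01'
  AND TransactionDate <= '2017-08-01';
```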
cf: mysql not using index?
I have 3 simple tables
Invoices ( ~500k records )
Invoice items, one-to-many relation to invoices ( ~10 million records )
Invoice payments, one-to-many relation to invoices ( ~700k records )
Now, as simple as it sounds, I need to query for unpaid invoices.
Here is the query I am using:
select * from invoices
LEFT JOIN (SELECT invoice_id, SUM(price) as totalAmount
FROM invoice_items
GROUP BY invoice_id) AS t1
ON t1.invoice_id = invoices.id
LEFT JOIN (SELECT invoice_id, SUM(payed_amount) as totalPaid
FROM invoice_payment_transactions
GROUP BY invoice_id) AS t2
ON t2.invoice_id = invoices.id
WHERE totalAmount > totalPaid
Unfortunately, this query takes around 30 seconds, which is far too slow.
Of course I have indexes on "invoice_id" in both the payments and items tables.
When I EXPLAIN the query, I can see that MySQL has to do a full table scan.
I also tried several other query approaches, using "EXISTS" or "IN" with subqueries, but I never got past the full table scan.
I'm pretty sure there is not much that can be done here (except some caching approach), but maybe someone knows how to optimize this?
I need this query to run in about 2 seconds at most.
EDIT:
Thanks to everybody for trying. Please know that I am well aware of the caching strategies I could adopt here; this question is purely about optimizing this query!
Here are the (simplified) table definitions:
CREATE TABLE `invoices`
(
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`created_at` timestamp NOT NULL DEFAULT current_timestamp(),
`date` date NOT NULL,
`title` enum ('M','F','Other') DEFAULT NULL,
`first_name` varchar(191) DEFAULT NULL,
`family_name` varchar(191) DEFAULT NULL,
`street` varchar(191) NOT NULL,
`postal_code` varchar(10) NOT NULL,
`city` varchar(191) NOT NULL,
`country` varchar(2) NOT NULL,
PRIMARY KEY (`id`),
KEY `date` (`date`)
) ENGINE = InnoDB
CREATE TABLE `invoice_items`
(
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`invoice_id` bigint(20) unsigned NOT NULL,
`created_at` timestamp NOT NULL DEFAULT current_timestamp(),
`name` varchar(191) DEFAULT NULL,
`description` text DEFAULT NULL,
`reference` varchar(191) DEFAULT NULL,
`quantity` smallint(6) NOT NULL,
`price` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `invoice_items_invoice_id_index` (`invoice_id`)
) ENGINE = InnoDB
CREATE TABLE `invoice_payment_transactions`
(
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`invoice_id` bigint(20) unsigned NOT NULL,
`created_at` timestamp NOT NULL DEFAULT current_timestamp(),
`transaction_identifier` varchar(191) NOT NULL,
`payed_amount` mediumint(9) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `invoice_payment_transactions_invoice_id_index` (`invoice_id`)
) ENGINE = InnoDB
Plan A:
Summary table by invoice_id and day. (as Bill suggested) Summary Tables
Plan B:
Change the design to have "current" and "history" data. Here, "payments" is a history of money changing hands, while "invoices" becomes "current" in that it carries a "balance_owed" column. This is a philosophy change; it could (and should) be encapsulated in a client subroutine and/or a database stored procedure.
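A minimal sketch of Plan B; the column names `total_amount` and `total_paid` are illustrative, not part of the original schema:

```sql
-- Keep running totals directly on the invoice ("current" data)
ALTER TABLE invoices
  ADD COLUMN total_amount BIGINT NOT NULL DEFAULT 0,
  ADD COLUMN total_paid   BIGINT NOT NULL DEFAULT 0;

-- Whenever an item or payment is recorded, update the totals in the
-- same transaction, e.g. for a payment of 500 on invoice 12345:
UPDATE invoices
SET total_paid = total_paid + 500
WHERE id = 12345;

-- "Unpaid invoices" then avoids both GROUP BY derived tables entirely:
SELECT * FROM invoices WHERE total_paid < total_amount;
```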
Plan C: This may be useful if "most" of the invoices are paid off.
Add a flag to the invoices table indicating paid-off. That will prevent "most" of the JOINs from occurring. (Adding and maintaining that column is about as much work as Plan B.)
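A sketch of Plan C, with a hypothetical `paid_off` flag that application code (or a trigger) keeps up to date:

```sql
ALTER TABLE invoices
  ADD COLUMN paid_off TINYINT(1) NOT NULL DEFAULT 0,
  ADD KEY paid_off_idx (paid_off);

-- Only the (presumably few) open invoices are left for the expensive joins:
SELECT *
FROM invoices
WHERE paid_off = 0;
```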
I have the following table in MYSQL 8:
create table session
(
ID bigint unsigned auto_increment
primary key,
session_id varchar(255) null,
username varchar(255) null,
session_status varchar(255) null,
session_time int null,
time_created int null,
time_last_updated int null,
time_ended int null,
date_created date null
);
I'm executing the following statement:
select * from session where username = VALUE and session_id = VALUE order by time_created desc
What is the optimal index for the table to speed up this query?
The EXPLAIN query tells me I have two potential indexes, which are:
create index username_3
on session (username, time_created);
create index username_session_id_time_created_desc
on session (username, session_id, time_created desc);
I would have thought the index 'username_session_id_time_created_desc' would have been picked, however the EXPLAIN statement says that index 'username_3' is selected instead.
EDIT*
Result of SHOW CREATE TABLE session:
CREATE TABLE `session` (
`ID` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`session_id` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`username` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`session_status` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`session_time` int(11) DEFAULT NULL,
`time_created` int(11) DEFAULT NULL,
`time_last_updated` int(11) DEFAULT NULL,
`time_ended` int(11) DEFAULT NULL,
`date_created` date DEFAULT NULL,
PRIMARY KEY (`ID`),
KEY `username_3` (`username`,`time_created`),
KEY `username_session_id_time_created_desc` (`username`,`session_id`,`time_created`)
) ENGINE=InnoDB AUTO_INCREMENT=76149265 DEFAULT CHARSET=utf8 COLLATE=utf8_bin
Result of EXPLAIN statement:
select_type: SIMPLE
type: ref
possible_keys: username_3,username_session_id_time_created_desc
key: username_3
key_len: 768
ref: const
rows: 1
Extra: Using where
For this query:
select *
from session
where username = %s and session_id = %s
order by time_created desc
The optimal index is (username, session_id, time_created desc). The first two columns can be in either order.
First I thought you had a typo because you wrote
create index username_session_id_time_created_desc on session (username, session_id, time_created desc);
But your create table shows
KEY `username_session_id_time_created_desc` (`username`,`session_id`,`time_created`)
instead of
KEY `username_session_id_time_created_desc` (`username`,`session_id`,`time_created` DESC)
But I now think MySQL 8.0 is using username_3 because of its backward_index_scan optimization.
How you can tell: there is no filesort in your EXPLAIN, so the optimizer must be able to use the username_3 index to sort your result set. (If you remove time_created from every index, you will see "Using filesort". If you are not on MySQL 8.0, the sorting may still be possible with the username_3 index in your version.)
A fiddle in 5.7 sometimes shows "key": "username_session_id_time_created_desc", but not always; it may depend on the index length (field lengths).
MySQL 8.0, by contrast, shows "key": "username_3" together with "backward_index_scan": true.
With only the three-column index present, the plan shows a lower query cost, so why choose the other index?
My guess is that the two-column index is much shorter, and since the backward index scan makes the sort cheap, the optimizer still prefers less I/O at the price of a little more computation.
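You can test that guess on your own data by comparing the two plans directly; EXPLAIN FORMAT=JSON exposes both the cost estimates and the backward_index_scan flag (the literal values below are placeholders):

```sql
-- Plan the optimizer chooses on its own:
EXPLAIN FORMAT=JSON
SELECT * FROM session
WHERE username = 'someuser' AND session_id = 'abc123'
ORDER BY time_created DESC;

-- Plan with the three-column index forced, for comparison:
EXPLAIN FORMAT=JSON
SELECT * FROM session FORCE INDEX (username_session_id_time_created_desc)
WHERE username = 'someuser' AND session_id = 'abc123'
ORDER BY time_created DESC;
```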
I have created partitions on the pricing table. Below is the ALTER statement.
ALTER TABLE `price_tbl`
PARTITION BY HASH(man_code)
PARTITIONS 87;
One partition holds 435,510 records; price_tbl has 6 million records in total.
EXPLAIN shows that only one partition is used for the query, yet the query still takes 3-4 seconds to execute. Below is the query:
EXPLAIN SELECT vrimg.image_cap_id,vm.man_name,vr.range_code,vr.range_name,vr.range_url, MIN(`finance_rental`) AS from_price, vd.der_id AS vehicle_id FROM `range_tbl` vr
LEFT JOIN `image_tbl` vrimg ON vr.man_code = vrimg.man_code AND vr.type_id = vrimg.type_id AND vr.range_code = vrimg.range_code
LEFT JOIN `manufacturer_tbl` vm ON vr.man_code = vm.man_code AND vr.type_id = vm.type_id
LEFT JOIN `derivative_tbl` vd ON vd.man_code=vm.man_code AND vd.type_id = vr.type_id AND vd.range_code=vr.range_code
LEFT JOIN `price_tbl` vp ON vp.vehicle_id = vd.der_id AND vd.type_id = vp.type_id AND vp.product_type_id=1 AND vp.maintenance_flag='N' AND vp.man_code=164
AND vp.initial_rentals_id =(SELECT rental_id FROM `rentals_tbl` WHERE rental_months='9')
AND vp.annual_mileage_id =(SELECT annual_mileage_id FROM `mileage_tbl` WHERE annual_mileage='8000')
WHERE vr.type_id = 1 AND vm.man_url = 'audi' AND vd.type_id IS NOT NULL GROUP BY vd.der_id
Result of EXPLAIN.
The same query without partitioning takes 3-4 seconds; with partitioning it takes 2-3 seconds.
How can we increase query performance? It is still too slow.
Attached are the CREATE TABLE statements.
price table - This consists 6 million records
CREATE TABLE `price_tbl` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`lender_id` bigint(20) DEFAULT NULL,
`type_id` bigint(20) NOT NULL,
`man_code` bigint(20) NOT NULL,
`vehicle_id` bigint(20) DEFAULT NULL,
`product_type_id` bigint(20) DEFAULT NULL,
`initial_rentals_id` bigint(20) DEFAULT NULL,
`term_id` bigint(20) DEFAULT NULL,
`annual_mileage_id` bigint(20) DEFAULT NULL,
`ref` varchar(255) DEFAULT NULL,
`maintenance_flag` enum('Y','N') DEFAULT NULL,
`finance_rental` decimal(20,2) DEFAULT NULL,
`monthly_rental` decimal(20,2) DEFAULT NULL,
`maintenance_payment` decimal(20,2) DEFAULT NULL,
`initial_payment` decimal(20,2) DEFAULT NULL,
`doc_fee` varchar(20) DEFAULT NULL,
PRIMARY KEY (`id`,`type_id`,`man_code`),
KEY `type_id` (`type_id`),
KEY `vehicle_id` (`vehicle_id`),
KEY `term_id` (`term_id`),
KEY `product_type_id` (`product_type_id`),
KEY `finance_rental` (`finance_rental`),
KEY `type_id_2` (`type_id`,`vehicle_id`),
KEY `maintenanace_idx` (`maintenance_flag`),
KEY `lender_idx` (`lender_id`),
KEY `initial_idx` (`initial_rentals_id`),
KEY `man_code_idx` (`man_code`)
) ENGINE=InnoDB AUTO_INCREMENT=5830708 DEFAULT CHARSET=latin1
/*!50100 PARTITION BY HASH (man_code)
PARTITIONS 87 */
derivative table - This consists 18k records.
CREATE TABLE `derivative_tbl` (
`type_id` bigint(20) DEFAULT NULL,
`der_cap_code` varchar(20) DEFAULT NULL,
`der_id` bigint(20) DEFAULT NULL,
`body_style_id` bigint(20) DEFAULT NULL,
`fuel_type_id` bigint(20) DEFAULT NULL,
`trans_id` bigint(20) DEFAULT NULL,
`man_code` bigint(20) DEFAULT NULL,
`range_code` bigint(20) DEFAULT NULL,
`model_code` bigint(20) DEFAULT NULL,
`der_name` varchar(255) DEFAULT NULL,
`der_url` varchar(255) DEFAULT NULL,
`der_intro_year` date DEFAULT NULL,
`der_disc_year` date DEFAULT NULL,
`der_last_spec_date` date DEFAULT NULL,
KEY `der_id` (`der_id`),
KEY `type_id` (`type_id`),
KEY `man_code` (`man_code`),
KEY `range_code` (`range_code`),
KEY `model_code` (`model_code`),
KEY `body_idx` (`body_style_id`),
KEY `capcodeidx` (`der_cap_code`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
range table - This consists 1k records
CREATE TABLE `range_tbl` (
`type_id` bigint(20) DEFAULT NULL,
`man_code` bigint(20) DEFAULT NULL,
`range_code` bigint(20) DEFAULT NULL,
`range_name` varchar(255) DEFAULT NULL,
`range_url` varchar(255) DEFAULT NULL,
KEY `range_code` (`range_code`),
KEY `type_id` (`type_id`),
KEY `man_code` (`man_code`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
PARTITION BY HASH is essentially useless if you are hoping for improved performance. BY RANGE is useful in a few use cases.
In most situations, improving the indexes does at least as much good as partitioning.
Some likely problems:
No explicit PRIMARY KEY for InnoDB tables. Add a natural PK, if applicable, else an AUTO_INCREMENT.
No "composite" indexes -- they often provide a performance boost. Example: The LEFT JOIN between vr and vrimg involves 3 columns; a composite index on those 3 columns in the 'right' table will probably help performance.
Blind use of BIGINT when smaller datatypes would work. (This is an I/O issue when the table is big.)
Blind use of 255 in VARCHAR.
Consider whether most of the columns should be NOT NULL.
That query may be a victim of the "explode-implode" syndrome. This is where you do JOIN(s), which create a big intermediate table, followed by a GROUP BY to bring the row-count back down.
Don't use LEFT unless the 'right' table really is optional. (I see LEFT JOIN vd ... vd.type_id IS NOT NULL.)
Don't normalize "continuous" values (annual_mileage and rental_months). It is not really beneficial for "=" tests, and it severely hurts performance for "range" tests.
Same query without partitioning takes 3-4 sec. Query with partitioning takes 2-3 sec.
The indexes almost always need changing when switching between partitioning and non-partitioning. With the optimal indexes for each case, I predict that performance will be close to the same.
Indexes
These should help performance whether or not it is partitioned:
vm: (man_url)
vr: (man_code, type_id) -- either order
vd: (man_code, type_id, range_code, der_id)
-- `der_id` 4th, else in any order (covering)
vrimg: (man_code, type_id, range_code, image_cap_id)
-- `image_cap_id` 4th, else in any order (covering)
vp: (type_id, der_id, product_type_id, maintenance_flag,
initial_rentals, annual_mileage, man_code)
-- any order (covering)
A "covering" index is an extra boost, in that it can do all the work just in the index's BTree, without touching the data's BTree.
Implement a bunch of what I recommend, then come back (in another Question) for further tweaking.
Usually the "partition key" should be last in a composite index.
I have a table (logs) holding approximately 100k rows. Each row has a timestamp associated with when it was created. When I sort by this timestamp, even with numerous WHERE criteria, the query is much slower than without a sort. I can't seem to find a way to speed it up. I've tried all kinds of indexes.
The query is returning about 25k rows. I have similar queries that need to be run, with slightly different WHERE criteria.
With the ORDER BY, the query takes 0.6 seconds. Without the ORDER BY, the query takes 0.003 seconds.
The table structure is as follows.
CREATE TABLE IF NOT EXISTS `logs` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`shipment_id` int(11) DEFAULT NULL,
`time` timestamp NULL DEFAULT NULL,
`initials` varchar(50) DEFAULT NULL,
`result` int(11) DEFAULT NULL,
`information` int(11) DEFAULT NULL,
`issues` varchar(5) DEFAULT NULL,
`fw_actions` varchar(999) DEFAULT NULL,
`noncompliant` tinyint(4) DEFAULT NULL,
`noncompliant_lead_initials` varchar(50) DEFAULT NULL,
`noncompliant_lead_time` varchar(20) DEFAULT NULL,
`event_id` int(11) DEFAULT NULL,
`action_id` int(11) DEFAULT NULL,
`resolution_id` int(11) DEFAULT NULL,
`noncompliant_reviewed` tinyint(4) NOT NULL DEFAULT '0',
`violation` tinyint(4) DEFAULT NULL,
`approved` tinyint(4) NOT NULL DEFAULT '0',
`approved_time` timestamp NULL DEFAULT NULL,
`approver` int(11) DEFAULT NULL,
`reviewed` tinyint(4) NOT NULL DEFAULT '0',
`reviewed_time` timestamp NULL DEFAULT NULL,
`reviewer` int(11) DEFAULT NULL,
`editor` int(11) DEFAULT NULL,
`summary` varchar(999) DEFAULT NULL,
`updated` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `LOGS_SHIPMENT_ID_TIME` (`shipment_id`,`time`,`action_id`),
KEY `SHIPMENT_ID_IDX` (`shipment_id`),
KEY `logs_updated_index` (`updated`),
KEY `violation_idx` (`violation`,`approved`,`reviewed`,`shipment_id`,`time`,`reviewer`,`approver`,`editor`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=100022 ;
The query is
SELECT * FROM logs
WHERE (logs.approved != 1) AND (logs.violation = 1)
ORDER BY logs.`time` DESC
My EXPLAIN looks like this
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE logs ref violation_idx violation_idx 2 const 1000 Using index condition; Using where; Using filesort
Anyone have a trick here? Thanks!
The key_len column says that MySQL is only using 2 bytes of the index "violation_idx". So it's only using the first two columns, "violation" and "approved", each of which is a tinyint (one byte).
You might be able to improve the performance of this query by making "time" the third column in this index. Currently, it's the fifth column. I don't know what other queries you're doing; this kind of change might hurt performance in other queries.
Also, you might be able to improve the performance by creating an additional index on the "time" column alone. Both those things are worth testing.
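Both suggestions as DDL, so they can be tested side by side (index names are illustrative):

```sql
-- time as the third column, right after the two filtered columns:
ALTER TABLE logs ADD KEY violation_approved_time (violation, approved, `time`);

-- and/or an index on time alone:
ALTER TABLE logs ADD KEY time_idx (`time`);

-- Then re-check the plan for the query in question:
EXPLAIN SELECT * FROM logs
WHERE logs.approved != 1 AND logs.violation = 1
ORDER BY logs.`time` DESC;
```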
Most dbms will benefit from an index that has a descending sort on "time", but MySQL won't.
An index_col_name specification can end with ASC or DESC. These
keywords are permitted for future extensions for specifying ascending
or descending index value storage. Currently, they are parsed but
ignored; index values are always stored in ascending order.
You'll have to find your own comfort level with that. Today, creating an index "DESC" expresses your intent clearly, but a future upgrade to MySQL that starts parsing and implementing that expression might hurt performance for other queries.
Create an index on just time. Further indexes that combine your filter columns with time can help the additional WHERE criteria as well.
I've created the following table:
CREATE TABLE `clicks_summ` (
`dt` INT(7) UNSIGNED NOT NULL,
`banner` SMALLINT(6) UNSIGNED NOT NULL,
`client` SMALLINT(6) UNSIGNED NOT NULL,
`channel` SMALLINT(6) UNSIGNED NOT NULL,
`cnt` INT(11) UNSIGNED NOT NULL,
`lpid` INT(11) NULL DEFAULT NULL,
UNIQUE INDEX `dt` (`dt`, `banner`, `client`, `channel`, `lpid`),
INDEX `banner` (`banner`),
INDEX `channel` (`channel`),
INDEX `client` (`client`),
INDEX `lpid` (`lpid`),
INDEX `cnt` (`cnt`)
)
COLLATE='utf8_unicode_ci'
ENGINE=InnoDB;
and i am using following query to fetch rows/records from this table:
select client, sum(cnt) cnt
from clicks_summ cs
group by client;
and it's awful! It takes about a second to perform this query. EXPLAIN shows me
So the question is: how can I speed up this query? I've tried indexing this table on different fields without any reasonable success. There are now 331,036 rows in the table, which I would not call big.
Try creating an index client_cnt (client, cnt). Another way to make the query faster is to upgrade your hardware :)
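A sketch of that suggestion: with both the grouped column and the summed column in one index, the GROUP BY can be satisfied entirely from the index (look for "Using index" in the EXPLAIN output):

```sql
ALTER TABLE clicks_summ ADD INDEX client_cnt (client, cnt);

EXPLAIN
SELECT client, SUM(cnt) AS cnt
FROM clicks_summ cs
GROUP BY client;
```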
One more thing:
if you always have the same 5 columns in your WHERE clause, a single composite index over those 5 columns will outperform 5 individual indexes ;)