Optimizing a MySQL query to reduce runtime - mysql

Below is the query that is going to run on two tables with 60+ million and 400+ million records. Only the table name differs; otherwise the query is the same for both tables.
SELECT * FROM
(
SELECT A.CUSIP, A.ISIN, A.SEDOL, A.LocalCode, A.MIC, A.ExchgCD, A.PrimaryExchgCD, A.Currency, A.Open, A.High, A.Low, A.Close, A.Mid, A.Ask, A.Last,
A.Bid, A.Bidsize, A.Asksize, A.TradedVolume, A.SecID, A.PriceDate, A.MktCloseDate, A.VolFlag, A.IssuerName, A.TotalTrades, A.CloseType, A.SectyCD,
row_number() OVER (partition by A.CUSIP order by A.MktCloseDate desc) as 'rank'
from EDI_Price04 A
WHERE A.CUSIP IN (
"91879Q109", "583840509", "583840608", "59001A102", "552848103") AND (A.PrimaryExchgCD = A.ExchgCD) AND A.CloseType='CC'
) t WHERE t.rank <= 3;
When the A.CUSIP IN () condition has 10-15 values, the query completes in 2-3 sec. With 400 values it took 28 sec. But I want A.CUSIP IN () to take 2k-3k values at a time.
This is my table structure.
CREATE TABLE `EDI_Price04` (
`MIC` varchar(6) NOT NULL DEFAULT '',
`LocalCode` varchar(60) NOT NULL DEFAULT '' COMMENT 'PricefileSymbol',
`ISIN` varchar(12) DEFAULT NULL,
`Currency` varchar(3) NOT NULL DEFAULT '',
`PriceDate` date DEFAULT NULL,
`Open` double DEFAULT NULL,
`High` double DEFAULT NULL,
`Low` double DEFAULT NULL,
`Close` double DEFAULT NULL,
`Mid` double DEFAULT NULL,
`Ask` double DEFAULT NULL,
`Last` double DEFAULT NULL,
`Bid` double DEFAULT NULL,
`BidSize` int(11) DEFAULT NULL,
`AskSize` int(11) DEFAULT NULL,
`TradedVolume` bigint(20) DEFAULT NULL,
`SecID` int(11) NOT NULL DEFAULT '0',
`MktCloseDate` date NOT NULL DEFAULT '0000-00-00',
`Volflag` char(1) DEFAULT NULL,
`IssuerName` varchar(255) DEFAULT NULL,
`SectyCD` varchar(3) DEFAULT NULL,
`SecurityDesc` varchar(255) DEFAULT NULL,
`SEDOL` varchar(7) DEFAULT NULL,
`CUSIP` varchar(9) DEFAULT NULL COMMENT 'USCode',
`PrimaryExchgCD` varchar(6) DEFAULT NULL,
`ExchgCD` varchar(6) NOT NULL DEFAULT '',
`TradedValue` double DEFAULT NULL,
`TotalTrades` int(11) DEFAULT NULL,
`Comment` varchar(255) DEFAULT NULL,
`Repush` tinyint(4) NOT NULL DEFAULT '0',
`CloseType` varchar(2) NOT NULL DEFAULT '',
PRIMARY KEY (`MIC`,`LocalCode`,`Currency`,`SecID`,`MktCloseDate`,`ExchgCD`,`Repush`,`CloseType`),
KEY `idx_EDI_Price04_0` (`MIC`),
KEY `idx_EDI_Price04_1` (`LocalCode`),
KEY `idx_EDI_Price04_2` (`ISIN`),
KEY `idx_EDI_Price04_3` (`PriceDate`),
KEY `idx_EDI_Price04_4` (`SEDOL`),
KEY `idx_EDI_Price04_5` (`CUSIP`),
KEY `idx_EDI_Price04_6` (`PrimaryExchgCD`),
KEY `idx_EDI_Price04_7` (`ExchgCD`),
KEY `idx_EDI_Price04_8` (`CloseType`),
KEY `idx_EDI_Price04_9` (`MktCloseDate`),
KEY `idx_EDI_Price04_CUSIP_ExchgCD_CloseType_MktCloseDate` (`CUSIP`,`ExchgCD`,`CloseType`,`MktCloseDate`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

For this query:
SELECT *
FROM (SELECT A.*,
ROW_NUMBER() OVER (PARTITION BY A.CUSIP ORDER BY A.MktCloseDate DESC) AS rn
FROM EDI_Price04 A
WHERE A.CUSIP IN ('91879Q109', '583840509', '583840608', '59001A102', '552848103') AND
A.PrimaryExchgCD = A.ExchgCD AND
A.CloseType = 'CC'
) t
WHERE t.rn <= 3;
The place to start is with an index. For this query, you want an index on EDI_Price04(CloseType, CUSIP, ExchgCD, MktCloseDate).
Unfortunately, the condition A.PrimaryExchgCD = A.ExchgCD prevents index seeks. If you were to make changes to the query/data, then one approach would be to add a flag when these are the same, rather than looking at the values separately. That would allow an index on:
EDI_Price04(CloseType, IsPrimary, CUSIP, PrimaryExchgCD, ExchgCD, MktCloseDate)
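One way to get such a flag on MySQL 5.7 or later is a generated column. This is only a sketch under assumptions, not necessarily what the answer had in mind: the column name IsPrimary is taken from the index above, while the index name and the choice of a VIRTUAL generated column are hypothetical. On tables of this size, building the index will take a long time, so it is worth trying on a copy first.

```sql
-- Hypothetical flag column: 1 when the row is on its primary exchange, else 0.
-- IFNULL maps the NULL comparison result (PrimaryExchgCD is nullable) to 0.
ALTER TABLE EDI_Price04
  ADD COLUMN IsPrimary TINYINT
    GENERATED ALWAYS AS (IFNULL(PrimaryExchgCD = ExchgCD, 0)) VIRTUAL;

-- Column list from the answer above; the index name is an assumption.
ALTER TABLE EDI_Price04
  ADD INDEX idx_closetype_isprimary_cusip
    (CloseType, IsPrimary, CUSIP, PrimaryExchgCD, ExchgCD, MktCloseDate);

-- The query then filters on the flag instead of comparing two columns.
SELECT *
FROM (SELECT A.*,
             ROW_NUMBER() OVER (PARTITION BY A.CUSIP
                                ORDER BY A.MktCloseDate DESC) AS rn
      FROM EDI_Price04 A
      WHERE A.CloseType = 'CC'
        AND A.IsPrimary = 1
        AND A.CUSIP IN ('91879Q109', '583840509', '583840608',
                        '59001A102', '552848103')
     ) t
WHERE t.rn <= 3;
```

With two equality conditions leading the index, the CUSIP list can be resolved as a set of index lookups rather than a filter applied after the fact.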

PRIMARY KEY (id),
UNIQUE(`MIC`,`LocalCode`,`Currency`,`SecID`,`MktCloseDate`,
`ExchgCD`,`Repush`,`CloseType`),
-- KEY `idx_EDI_Price04_0` (`MIC`),
KEY `idx_EDI_Price04_1` (`LocalCode`),
KEY `idx_EDI_Price04_2` (`ISIN`),
KEY `idx_EDI_Price04_3` (`PriceDate`),
KEY `idx_EDI_Price04_4` (`SEDOL`),
-- KEY `idx_EDI_Price04_5` (`CUSIP`),
KEY `idx_EDI_Price04_6` (`PrimaryExchgCD`),
KEY `idx_EDI_Price04_7` (`ExchgCD`),
KEY `idx_EDI_Price04_8` (`CloseType`),
KEY `idx_EDI_Price04_9` (`MktCloseDate`),
KEY `idx_EDI_Price04_CUSIP_ExchgCD_CloseType_MktCloseDate` (`CUSIP`,
`ExchgCD`, `CloseType`, `MktCloseDate`),
KEY (CUSIP, MktCloseDate)
Having so many columns in the PK costs in space and insert time. So, I added an id, which needs to be AUTO_INCREMENT.
Keys 0 and 5 are redundant because of the rule "If you have INDEX(a,b), then INDEX(a) is redundant."
I added (CUSIP, MktCloseDate) in hopes that it will optimize the ROW_NUMBER() expression.
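The index changes described above could be applied to the existing table roughly as follows. This is a sketch only: the new index name is an assumption, and the primary key change is left out because rebuilding the clustered index on a table this large is a much heavier operation.

```sql
-- Drop the two single-column indexes that are prefixes of wider keys,
-- and add the index intended to support the window function.
ALTER TABLE EDI_Price04
  DROP INDEX idx_EDI_Price04_0,  -- (MIC): prefix of the wide 8-column key
  DROP INDEX idx_EDI_Price04_5,  -- (CUSIP): prefix of the composite CUSIP index
  ADD INDEX idx_cusip_mktclosedate (CUSIP, MktCloseDate);  -- hypothetical name
```

Running EXPLAIN on the query before and after shows whether the optimizer actually picks the new index.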

Related

How can I tune this MySQL query to run successfully

This is my query on mysql version 8
select sender, fullName, phoneNumber, addressState, businessName, bvn, max(date)
from tranlog t INNER JOIN agent a on t.sender = a.realId
where captureDate < '2022-03-01' and active = 'Y' and thirdparty = 0
group by sender
CREATE TABLE `agent` (
`id` bigint NOT NULL AUTO_INCREMENT,
`realId` varchar(19) DEFAULT NULL,
`active` char(1) DEFAULT NULL,
`phoneNumber` varchar(15) DEFAULT NULL,
`address` varchar(255) DEFAULT NULL,
`addressState` varchar(50) DEFAULT NULL,
`addressCity` varchar(50) DEFAULT NULL,
`fullName` varchar(255) DEFAULT NULL,
`businessName` varchar(255) DEFAULT NULL,
`corporate` bit(1) DEFAULT b'0',
`thirdparty` bit(1) NOT NULL DEFAULT b'0',
PRIMARY KEY (`id`),
KEY `id` (`fee_group`),
KEY `realId` (`realId`),
KEY `agent_password` (`password`),
KEY `agent_idx` (`active`,`thirdparty`)
) ENGINE=InnoDB AUTO_INCREMENT=29784 DEFAULT CHARSET=latin1;
CREATE TABLE `tranlog` (
`id` bigint NOT NULL AUTO_INCREMENT,
`date` datetime DEFAULT NULL,
`captureDate` date DEFAULT NULL,
`sender` varchar(50) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `tranlog_date` (`date`),
KEY `sender` (`sender`),
KEY `tranlog_capturedate_idx` (`captureDate`)
) ENGINE=InnoDB AUTO_INCREMENT=49373312 DEFAULT CHARSET=latin1;
But I keep getting the error 'C:windows\TEMP#sql1234_2' is full, which I believe is about a temporary table.
I have increased tmp_table_size and max_heap_table_size to 3G, yet no reprieve; the error keeps popping up.
Any ideas on how to tune the query?
Add a LIMIT clause to your query. It seems you have a very large result set.

Huge speed difference in two similar queries (MySQL ORDER clause)

Important update (explanation):
I realized that my query with a single DESC order is 10 times slower than the same query with ASC order. The ordered field has an index. Is this normal behavior?
Original question with queries:
I have a MySQL table with a few hundred product items. It's surprising (to me) how much 2 similar SQL queries differ in terms of performance. I don't know why. Can you please give me a hint or explain why the difference is so huge?
This query takes 3ms:
SELECT
*
FROM
`product_items`
WHERE
(product_items.shop_active = 1)
AND (product_items.active = 1)
AND (product_items.active_category_id is not null)
AND (has_picture is not null)
AND (price_orig is not null)
AND (category_min_discount IS NOT NULL)
AND (product_items.slug is not null)
AND `product_items`.`active_category_id` IN (6797, 5926, 5806, 6852)
ORDER BY
price asc
LIMIT 1
But the following query already takes 169ms. The only difference is that the ORDER BY clause contains 2 columns. Every product has a "price" value, while only roughly 1% of products have a "price top" value.
SELECT
*
FROM
`product_items`
WHERE
(product_items.shop_active = 1)
AND (product_items.active = 1)
AND (product_items.active_category_id is not null)
AND (has_picture is not null)
AND (price_orig is not null)
AND (category_min_discount IS NOT NULL)
AND (product_items.slug is not null)
AND `product_items`.`active_category_id` IN (6797, 5926, 5806, 6852)
ORDER BY
price asc,
price_top desc
LIMIT 1
The table structure looks like this:
CREATE TABLE `product_items` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`shop_id` int(11) DEFAULT NULL,
`item_id` varchar(255) DEFAULT NULL,
`productname` varchar(255) DEFAULT NULL,
`description` text,
`url` text,
`url_hash` varchar(255) DEFAULT NULL,
`img_url` text,
`price` decimal(10,2) DEFAULT NULL,
`price_orig` decimal(10,2) DEFAULT NULL,
`discount` decimal(10,2) DEFAULT NULL,
`discount_percent` decimal(10,2) DEFAULT NULL,
`manufacturer` varchar(255) DEFAULT NULL,
`delivery_date` varchar(255) DEFAULT NULL,
`categorytext` text,
`created_at` datetime NOT NULL,
`updated_at` datetime NOT NULL,
`active_category_id` int(11) DEFAULT NULL,
`shop_active` int(11) DEFAULT NULL,
`active` int(11) DEFAULT '0',
`price_top` decimal(10,2) NOT NULL DEFAULT '0.00',
`attention_priority` int(11) DEFAULT NULL,
`attention_priority_over` int(11) DEFAULT NULL,
`has_picture` varchar(255) DEFAULT NULL,
`size` varchar(255) DEFAULT NULL,
`category_min_discount` int(11) DEFAULT NULL,
`slug` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `index_product_items_on_url_hash` (`url_hash`),
KEY `index_product_items_on_shop_id` (`shop_id`),
KEY `index_product_items_on_active_category_id` (`active_category_id`),
KEY `index_product_items_on_productname` (`productname`),
KEY `index_product_items_on_price` (`price`),
KEY `index_product_items_on_discount_percent` (`discount_percent`),
KEY `index_product_items_on_price_top` (`price_top`)
) ENGINE=InnoDB AUTO_INCREMENT=1715708 DEFAULT CHARSET=utf8;
UPDATE
I realized that the difference is mainly in the type of ordering: if I use asc+asc for both columns the query takes around 6ms; if I use asc+desc or desc+asc, the query takes around 160ms.
Thank you.
If creating an index to help ORDER BY doesn't help, try creating an index that helps both WHERE and ORDER BY:
CREATE INDEX product_items_i1 ON product_items (
shop_active,
active,
active_category_id,
has_picture,
price_orig,
category_min_discount,
slug,
price,
price_top DESC
)
Obviously, this is a bit clunky, and you'll have to balance the performance gain for the query with the price of maintaining the index.
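A note on the DESC keyword in that index: descending index keys are only honored from MySQL 8.0 onward; earlier versions parse DESC in an index definition but ignore it, which would fit the asc+desc slowdown described in the question. A quick way to check whether any candidate index helps is to look at the plan. The statement below is a minimal sketch that reuses the question's query unchanged.

```sql
-- Run after creating product_items_i1 (or any other candidate index).
EXPLAIN
SELECT *
FROM product_items
WHERE shop_active = 1
  AND active = 1
  AND has_picture IS NOT NULL
  AND price_orig IS NOT NULL
  AND category_min_discount IS NOT NULL
  AND slug IS NOT NULL
  AND active_category_id IN (6797, 5926, 5806, 6852)
ORDER BY price ASC, price_top DESC
LIMIT 1;
-- In the output, `key` names the index that was chosen, and
-- "Using filesort" in `Extra` means the matching rows are still being
-- sorted instead of being read in index order.
```

If "Using filesort" remains, the index is at best helping the WHERE clause, not the ordering.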

Query slows down if I add a field to the WHERE condition

I have a table (MySQL fiddle) with about 500k records.
CREATE TABLE IF NOT EXISTS `p_transactions` (
`transaction_id` bigint(10) unsigned NOT NULL,
`amount` decimal(19,2) NOT NULL,
`dt` bigint(1) NOT NULL,
`transaction_status` int(1) NOT NULL,
`transaction_type` varchar(15) NOT NULL,
`payment_method` varchar(25) NOT NULL,
`notes` text NOT NULL,
`member_id` int(10) unsigned NOT NULL,
`new_amount` decimal(19,2) NOT NULL,
`paid_amount` decimal(19,2) NOT NULL,
`secret_code` char(40) NOT NULL,
`internal_status` varchar(40) NOT NULL,
`ip_addr` varchar(15) NOT NULL,
`description` text NOT NULL,
`seller_transaction_id` varchar(50) DEFAULT NULL,
`return_url` varchar(255) DEFAULT NULL,
`fail_url` varchar(255) DEFAULT NULL,
`success_url` varchar(255) DEFAULT NULL,
`result_url` varchar(255) DEFAULT NULL,
`user_fee` decimal(19,3) DEFAULT '0.000',
`currency` char(255) DEFAULT 'USD',
`gateway_transaction_id` char(255) DEFAULT NULL,
`load_amount` decimal(19,2) NOT NULL,
`transaction_mode` varchar(1) NOT NULL DEFAULT '',
`p_fee` decimal(19,2) NOT NULL,
`country` varchar(2) NOT NULL,
`email` varchar(255) NOT NULL,
`vat` decimal(19,2) NOT NULL DEFAULT '0.00',
`name` varchar(255) NOT NULL,
`bdate` varchar(255) NOT NULL,
`child_method` varchar(255) NOT NULL,
`processing_fee` decimal(19,2) NOT NULL DEFAULT '0.00',
`flat_fee` varchar(1) NOT NULL DEFAULT 'n',
`user_fee_sum` decimal(19,2) NOT NULL DEFAULT '0.00',
`p_fee_sum` decimal(19,2) NOT NULL DEFAULT '0.00',
`dt_open` bigint(1) NOT NULL DEFAULT '0',
`user_fee_type` varchar(1) NOT NULL DEFAULT 'r',
`custom_gateway_fee` decimal(19,2) NOT NULL DEFAULT '0.00',
`paid_currency` varchar(3) NOT NULL DEFAULT 'USD',
`paid_microtime` bigint(10) unsigned NOT NULL,
`check_ballance` varchar(1) NOT NULL DEFAULT 'n',
PRIMARY KEY (`transaction_id`),
KEY `member_id` (`member_id`),
KEY `payment_method` (`payment_method`),
KEY `child_method` (`child_method`),
KEY `check_ballance` (`check_ballance`),
KEY `dt` (`dt`),
KEY `transaction_type` (`transaction_type`),
KEY `paid_microtime` (`paid_microtime`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
When I execute this query:
SELECT *
FROM `p_transactions`
WHERE dt >= 1517443200
AND dt <= 1523404799
AND member_id = 2051
ORDER BY `paid_microtime` DESC
LIMIT 50;
it runs in 0,000 sec. (+ 0,016 sec. network)
but if I add the condition AND transaction_status = 7 to the query:
SELECT *
FROM `p_transactions`
WHERE dt >= 1517443200
AND dt <= 1523404799
AND member_id = 2051
AND transaction_status = 7
ORDER BY `paid_microtime` DESC
LIMIT 50
the query runs in 12,938 sec. (+ 0,062 sec. network)
Please help me find the reason for this behavior.
PS. There was an index on transaction_status, and it increased execution time even more.
Add a suitable index, such as:
ALTER TABLE p_transactions ADD INDEX (member_id, dt);
or
ALTER TABLE p_transactions ADD INDEX (member_id, dt, transaction_status);
We want member_id column as the leading column in the index, because of the equality comparison, and we expect the result to be a substantially smaller subset of the entire table. We want dt column after that, because of the "range scan" on that.
Including additional columns in the index may allow MySQL to check that condition using values from the index, without a visit/lookup of the row in the underlying table pages.
Either of these indexes would be suitable for both of the queries shown in the question.
Use EXPLAIN to see the execution plan... which index is being used.
There's really no getting around the "Using filesort" operation, since we're pulling a small subset of the entire table.
(If we were pulling the entire table (or a huge subset), we might be able to avoid an expensive sort operation with an access plan that pulls rows in reverse index order, using an index with a leading column of paid_microtime.)
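To see which access path the slow variant is taking, EXPLAIN can be run on it directly. This is a minimal sketch that reuses the question's query and table name as-is.

```sql
EXPLAIN
SELECT *
FROM p_transactions
WHERE dt >= 1517443200
  AND dt <= 1523404799
  AND member_id = 2051
  AND transaction_status = 7
ORDER BY paid_microtime DESC
LIMIT 50;
-- `key`   : the index the optimizer chose
-- `rows`  : its estimate of how many rows it will examine
-- `Extra` : "Using filesort" if the result is sorted after filtering
```

One pattern worth checking for (an assumption, since the data distribution is unknown) is `key` showing paid_microtime: the optimizer then walks that index in order and filters each row, which is slow when few rows satisfy all the conditions.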
For the original query have these
INDEX(member_id, dt)
INDEX(member_id, paid_microtime)
For the secondary query, have
INDEX(transaction_status, member_id, dt)
INDEX(transaction_status, member_id, paid_microtime)
Without getting into the details of the distribution of the data values, we cannot explain why one query is so much slower; however, my 4 indexes should make both queries run faster most of the time.
More discussion of how I came up with those indexes (and why (member_id, dt, transaction_status) is not so good): http://mysql.rjweb.org/doc.php/index_cookbook_mysql
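Written out as DDL against the table from the question, the four indexes might look like this. It is a sketch: the index names are hypothetical, and on a busy table the statement is best run during a quiet period.

```sql
ALTER TABLE p_transactions
  -- for the original query
  ADD INDEX idx_member_dt            (member_id, dt),
  ADD INDEX idx_member_paid          (member_id, paid_microtime),
  -- for the query that also filters on transaction_status
  ADD INDEX idx_status_member_dt     (transaction_status, member_id, dt),
  ADD INDEX idx_status_member_paid   (transaction_status, member_id, paid_microtime);
```

In each index the equality-tested columns come first, followed by the one column that is either range-scanned (dt) or used for ordering (paid_microtime).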

MySQL use separate indices for JOIN and GROUP BY

I am trying to execute the following query
SELECT
a.sessionID AS `sessionID`,
firstSeen, birthday, gender,
isAnonymous, LanguageCode
FROM transactions AS trx
INNER JOIN actions AS a ON a.sessionID = trx.SessionID
WHERE a.ActionType = 'PURCHASE'
GROUP BY trx.TransactionNumber
EXPLAIN provides the following output
id  select_type  table  type  possible_keys                key        key_len  ref                           rows    Extra
1   SIMPLE       trx    ALL   TransactionNumber,SessionID  NULL       NULL     NULL                          225036  Using temporary; Using filesort
1   SIMPLE       a      ref   sessionID                    sessionID  98       infinitiExport.trx.SessionID  1       Using index
The problem is that I am trying to use one field for join and different field for GROUP BY.
How can I tell MySQL to use different indices for same table?
CREATE TABLE `transactions` (
`SessionID` varchar(32) NOT NULL DEFAULT '',
`date` datetime DEFAULT NULL,
`TransactionNumber` varchar(32) NOT NULL DEFAULT '',
`CustomerECommerceTrackID` int(11) DEFAULT NULL,
`SKU` varchar(45) DEFAULT NULL,
`AmountPaid` double DEFAULT NULL,
`Currency` varchar(10) DEFAULT NULL,
`Quantity` int(11) DEFAULT NULL,
`Name` tinytext NOT NULL,
`Category` varchar(45) NOT NULL DEFAULT '',
`customerInfoXML` text,
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`),
KEY `TransactionNumber` (`TransactionNumber`),
KEY `SessionID` (`SessionID`)
) ENGINE=InnoDB AUTO_INCREMENT=212007 DEFAULT CHARSET=utf8;
CREATE TABLE `actions` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`sessionActionDate` datetime DEFAULT NULL,
`actionURL` varchar(255) DEFAULT NULL,
`sessionID` varchar(32) NOT NULL DEFAULT '',
`ActionType` varchar(64) DEFAULT NULL,
`CustomerID` int(11) DEFAULT NULL,
`IPAddressID` int(11) DEFAULT NULL,
`CustomerDeviceID` int(11) DEFAULT NULL,
`customerInfoXML` text,
PRIMARY KEY (`id`),
KEY `ActionType` (`ActionType`),
KEY `CustomerDeviceID` (`CustomerDeviceID`),
KEY `sessionID` (`sessionID`)
) ENGINE=InnoDB AUTO_INCREMENT=15042833 DEFAULT CHARSET=utf8;
Thanks
EDIT 1: My indexes were broken. I had to add a (SessionID, TransactionNumber) index to the transactions table; however, now when I try to include the trx.customerInfoXML column, MySQL stops using the index.
EDIT 2: Another answer did not really solve my problem because it's not standard SQL syntax, and it is generally not a good idea to force indices.
For ORM users such syntax is an unattainable luxury.
EDIT 3 I updated my indices and it solved the problem, see EDIT 1
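The index from EDIT 1 could be created and checked along these lines. This is a sketch: the index name is an assumption, and the SELECT list is cut down to indexed columns so that it is valid under ONLY_FULL_GROUP_BY.

```sql
-- Composite index covering both the join column and the grouping column.
ALTER TABLE transactions
  ADD INDEX idx_session_trxnumber (SessionID, TransactionNumber);  -- hypothetical name

-- Simplified version of the query, restricted to columns in the index.
EXPLAIN
SELECT trx.TransactionNumber,
       MIN(trx.SessionID) AS sessionID
FROM transactions AS trx
INNER JOIN actions AS a ON a.sessionID = trx.SessionID
WHERE a.ActionType = 'PURCHASE'
GROUP BY trx.TransactionNumber;
```

Selecting a column outside the index, such as customerInfoXML, forces a lookup in the base table for every row, which may be why the optimizer falls back to a full scan in that case.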

Perf of select mysql query is really bad

I'm not sure why this query is taking 4 minutes to complete:
SELECT
su.sid,u.uid,u.display_name,u.locale
FROM user u
LEFT JOIN subscription_user su ON su.uid = u.uid
ORDER BY u.display_name DESC
LIMIT 0,25;
Well, I know it's due to the order, remove it and it's very fast. If I change to using INNER JOIN instead it's fast but the issue is not all users may be in the subscription_user table.
CREATE TABLE `user` (
`uid` int(11) NOT NULL AUTO_INCREMENT,
`password` varchar(100) DEFAULT NULL,
`user_type` varchar(10) NOT NULL DEFAULT 'user',
`display_name` varchar(50) NOT NULL,
`email` varchar(100) NOT NULL,
`locale` varchar(8) DEFAULT 'en',
`last_login` datetime DEFAULT NULL,
`auth_type` varchar(10) DEFAULT NULL,
`auth_data` varchar(500) DEFAULT NULL,
`inactive` tinyint(4) NOT NULL DEFAULT '0',
`receive_email` tinyint(4) NOT NULL DEFAULT '1',
`stateid` int(10) DEFAULT NULL,
`owner_group_id` int(11) DEFAULT NULL,
`signature` varchar(500) DEFAULT NULL,
`raw_signature` varchar(500) DEFAULT NULL,
`round_robin` smallint(5) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`uid`),
UNIQUE KEY `email` (`email`),
KEY `stateid` (`stateid`) USING BTREE,
KEY `user_type` (`user_type`) USING BTREE,
KEY `name` (`display_name`)
) ENGINE=InnoDB AUTO_INCREMENT=28343 DEFAULT CHARSET=latin1;
CREATE TABLE `subscription_user` (
`sid` varchar(50) NOT NULL,
`uid` int(11) NOT NULL,
`deleted` tinyint(4) NOT NULL DEFAULT '0',
`forum_user` varchar(50) NOT NULL,
PRIMARY KEY (`sid`,`uid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
When you have an SQL query, an index can only really help if its first column is used by the query.
Your query joins on su.uid = u.uid, but uid is the second column of the subscription_user primary key (sid, uid), so the optimizer cannot use that key to look up rows by uid.
You should either reverse the order of the columns in the primary key, or add a foreign key index or an independent index on uid.
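Either option might look like the following. This is a sketch, with a hypothetical index name, and only one of the two options would normally be applied.

```sql
-- Option 1: add an independent secondary index on uid.
ALTER TABLE subscription_user
  ADD INDEX idx_subscription_user_uid (uid);  -- hypothetical name

-- Option 2 (instead of option 1): reverse the primary key column order.
-- This rebuilds the table, and queries that look up by sid alone would
-- then need their own index.
-- ALTER TABLE subscription_user
--   DROP PRIMARY KEY,
--   ADD PRIMARY KEY (uid, sid);
```

Option 1 is the less invasive change, since it leaves the existing primary key and any lookups by sid untouched.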