Better MySQL Query Performance - mysql

I want to get the SUM of QTYs of an item# grouped be months, the query takes too long (15 - 20) seconds to fetch.
-Total rows: 1495873
-Total Fetched rows: 9 - 12
The relation between two tables (invoice_header and invoice_detail) is (one to many) that the invoice_header is the header of an invoice, with only totals. Which is linked to invoice_detail using location ID (loc_id) and Invoice number (invo_no), as each location has its own serial number. The invoice detail contains the details of each invoice.
Is there's a better way to enhance the performance of that query, here it's:
SELECT SUM(invoice_detail.qty) AS qty, Month(invoice_header.date) AS month
FROM invoice_detail
JOIN invoice_header ON invoice_detail.invo_no = invoice_header.invo_no
AND invoice_detail.loc_id = invoice_header.loc_id
WHERE invoice_detail.item_id = {$itemId}
GROUP BY Month(invoice_header.date)
ORDER BY Month(invoice_header.date)
EXPLAIN:
invoice_header table structure:
CREATE TABLE `invoice_header` (
`invo_type` varchar(1) NOT NULL,
`invo_no` int(20) NOT NULL AUTO_INCREMENT,
`invo_code` varchar(50) NOT NULL,
`date` date NOT NULL,
`time` time NOT NULL,
`cust_id` int(11) NOT NULL,
`loc_id` int(3) NOT NULL,
`cash_man_id` int(11) NOT NULL,
`sales_man_id` int(11) NOT NULL,
`ref_invo_no` int(20) NOT NULL,
`total_amount` decimal(19,2) NOT NULL,
`tax` decimal(19,2) NOT NULL,
`discount_amount` decimal(19,2) NOT NULL,
`net_value` decimal(19,2) NOT NULL,
`split` decimal(19,2) NOT NULL,
`qty` int(11) NOT NULL,
`payment_type_id` varchar(20) NOT NULL,
`comments` varchar(255) NOT NULL,
PRIMARY KEY (`invo_no`,`loc_id`)
) ENGINE=InnoDB AUTO_INCREMENT=20286 DEFAULT CHARSET=utf8
invoice_detail table structure:
CREATE TABLE `invoice_detail` (
`invo_no` int(11) NOT NULL,
`loc_id` int(3) NOT NULL,
`serial` int(11) NOT NULL,
`item_id` varchar(11) NOT NULL,
`size_id` int(5) NOT NULL,
`qty` int(11) NOT NULL,
`rtp` decimal(19,2) NOT NULL,
`type` tinyint(1) NOT NULL,
PRIMARY KEY (`invo_no`,`loc_id`,`serial`),
KEY `item_id` (`item_id`),
KEY `size_id` (`size_id`),
KEY `invo_no` (`invo_no`),
KEY `serial` (`serial`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

How long does it take by following SQL?
SELECT count(*)
FROM invoice_detail
WHERE invoice_detail.item_id = {$itemId}
If this SQL takes 15-20 seconds, you should add an index on field item_id on inovice_detail table.
The invoice_header already had the primary key on the join columns of invo_no and loc_id, so you don't need add other indexes on invoice_header table. But if you add an index on fields invo_no, loc_id and date, that may enhance a little performance by only index scan.

According to see the result of explain invoice_header can not use any multiple index.
You could create index on field invo_no on invoice_header Table.

Try creating a few indexes on some columns: invo_no, loc_id, item_id, date.
based on your explain, your query is scanning all rows in invoice_header. Index the columns that make the join... invo_no, loc_id

Related

How to optimize query by SUM of relations?

I have 3 simple tables
Invoices ( ~500k records )
Invoice items, one-to-many relation to invoices ( ~10 million records )
Invoice payments, one-to-many relation to invoices ( ~700k records )
Now, as simple as it sounds, I need to query for unpaid invoices.
Here is the query I am using:
select * from invoices
LEFT JOIN (SELECT invoice_id, SUM(price) as totalAmount
FROM invoice_items
GROUP BY invoice_id) AS t1
ON t1.invoice_id = invoices.id
LEFT JOIN (SELECT invoice_id, SUM(payed_amount) as totalPaid
FROM invoice_payment_transactions
GROUP BY invoice_id) AS t2
ON t2.invoice_id = invoices.id
WHERE totalAmount > totalPaid
Unfortunately, this query takes around 30 seconds, so way to slow.
Of course I have indexes set for "invoice_id" on both payments and items.
When I "EXPLAIN" the query, I can see that mysql has to do a full table scan.
I also tried several other query approaches, using "EXISTS" or "IN" with subqueries, but I never got around the full table scan.
Pretty sure there is not much that can be done here ( except use some caching approach ), but maybe someone knows how to optimize this ?
I need this query to run in a +/-2 seconds max.
EDIT:
Thanks to everybody for trying. Please just know that I absolutely know how to adopt different caching strategies here, but this question is purely about optimizing this query !
Here are the ( simplified ) table definitions
CREATE TABLE `invoices`
(
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`created_at` timestamp NOT NULL DEFAULT current_timestamp(),
`date` date NOT NULL,
`title` enum ('M','F','Other') DEFAULT NULL,
`first_name` varchar(191) DEFAULT NULL,
`family_name` varchar(191) DEFAULT NULL,
`street` varchar(191) NOT NULL,
`postal_code` varchar(10) NOT NULL,
`city` varchar(191) NOT NULL,
`country` varchar(2) NOT NULL,
PRIMARY KEY (`id`),
KEY `date` (`date`)
) ENGINE = InnoDB
CREATE TABLE `invoice_items`
(
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`invoice_id` bigint(20) unsigned NOT NULL,
`created_at` timestamp NOT NULL DEFAULT current_timestamp(),
`name` varchar(191) DEFAULT NULL,
`description` text DEFAULT NULL,
`reference` varchar(191) DEFAULT NULL,
`quantity` smallint(6) NOT NULL,
`price` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `invoice_items_invoice_id_index` (`invoice_id`),
) ENGINE = InnoDB
CREATE TABLE `invoice_payment_transactions`
(
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`invoice_id` bigint(20) unsigned NOT NULL,
`created_at` timestamp NOT NULL DEFAULT current_timestamp(),
`transaction_identifier` varchar(191) NOT NULL,
`payed_amount` mediumint(9) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `invoice_payment_transactions_invoice_id_index` (`invoice_id`),
) ENGINE = InnoDB
Plan A:
Summary table by invoice_id and day. (as Bill suggested) Summary Tables
Plan B:
Change the design to be "current" and "history". This is where the "payments" is a "history" of money changing hands. Meanwhile "invoices" would be "current" in that it contains a "balance_owed" column. This is a philosophy change; it could (should) be encapsulated in a client subroutine and/or a database Stored Procedure.
Plan C: This may be useful if "most" of the invoices are paid off.
Have a flag in the invoices table to indicate paid-off. That will prevent "most" of the JOINs from occurring. (Well, adding that column is just as hard as doing Plan B.)

What is the best index for this two query?

I want to avoid redundant index, so what is the best composite index for these two query? Based on my understanding, these two query cannot have the same composite index since one need country, the other one need product_id, but if I make the index as below, will it be redundant index, and effect the DB performance?
combine merchant_id, created_at and product_id
combine merchant_id, created_at and country
Query 1
SELECT * from shop_order
WHERE shop_order.merchant_id = ?
AND shop_order.created_at >= TIMESTAMP(?)
AND shop_order.created_at <= TIMESTAMP(?)
AND shop_order.product_id = ?) AS mytable
WHERE product_id IS NOT NULL GROUP BY product_id, title;
Query 2
SELECT COALESCE(SUM(total_price_usd),0) AS revenue,
COUNT(*) as total_order, COALESCE(province, 'Unknown') AS name
FROM shop_order
WHERE DATE(created_at) >= '2021-02-08 13:37:42'
AND DATE(created_at) <= '2021-02-14 22:44:13'
AND merchant_id IN (18,19,20,1)
AND country = 'Malaysia' GROUP BY province;
Table structure
CREATE TABLE `shop_order` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`merchant_id` bigint(20) DEFAULT NULL,
`order_id` bigint(20) NOT NULL,
`customer_id` bigint(20) DEFAULT NULL,
`customer_orders_count` varchar(45) DEFAULT NULL,
`customer_total_spent` varchar(45) DEFAULT NULL,
`customer_email` varchar(100) DEFAULT NULL,
`customer_last_order_name` varchar(45) DEFAULT NULL,
`currency` varchar(10) NOT NULL,
`total_price` decimal(20,8) NOT NULL,
`subtotal_price` decimal(20,8) NOT NULL,
`transaction_fee` decimal(20,8) DEFAULT NULL,
`total_discount` decimal(20,8) DEFAULT '0.00000000',
`shipping_fee` decimal(20,8) DEFAULT '0.00000000',
`total_price_usd` decimal(20,8) DEFAULT NULL,
`transaction_fee_usd` decimal(20,8) DEFAULT NULL,
`country` varchar(50) DEFAULT NULL,
`province` varchar(45) DEFAULT NULL,
`processed_at` datetime DEFAULT NULL,
`refunds` json DEFAULT NULL,
`ffm_status` varchar(50) DEFAULT NULL,
`gateway` varchar(45) DEFAULT NULL,
`confirmed` tinyint(1) DEFAULT NULL,
`cancelled_at` datetime DEFAULT NULL,
`cancel_reason` varchar(100) DEFAULT NULL,
`created` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated` datetime DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`order_number` bigint(1) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`financial_status` varchar(45) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `shop_order_unique` (`merchant_id`,`order_id`),
KEY `merchant_id` (`merchant_id`),
KEY `combine_idx1` (`country`,`merchant_id`,`created_at`)
) ENGINE=InnoDB AUTO_INCREMENT=2237 DEFAULT CHARSET=utf8mb4;
Please help me
Query 1:
INDEX(merchant_id, product_id, -- Put columns for "=" tests first (in any order)
created_at) -- then range
Query 2. First, avoid hiding created_at in a function call (DATE()); it prevents using it in an index.
INDEX(country, -- "="
merchant_id, -- IN
created_at) -- range (after removing DATE)
You are correct in saying that you need separate indexes for those queries. And possibly some of your existing indexes are needed for other queries.
Also, you already have a redundant index. Drop KEY merchant_id (merchant_id), -- You have ate least one other index starting with merchant_id.
Having extra indexes is only a minor performance drag. And the hit is during INSERT or if you UPDATE any column in an index. Generally, the benefit of the 'right' index to a SELECT outweighs the hit to the writes.
Having multiple unique indexes is somewhat a burden. Do you really need id since you have a "natural" PK made up of those two columns? Check to see if other tables need to join on id.
Consider shrinking many of the datasizes. BIGINT takes 8 bytes and has a range that is rarely needed. decimal(20,8) takes 10 bytes and allows up to a trillion dollars; this seems excessive, too. Is customer_orders_count a number?

Slow search query with a one to many join

My problem is a slow search query with a one-to-many relationship between the tables. My tables look like this.
Table Assignment
CREATE TABLE `Assignment` (
`Id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`ProjectId` int(10) unsigned NOT NULL,
`AssignmentTypeId` smallint(5) unsigned NOT NULL,
`AssignmentNumber` varchar(30) NOT NULL,
`AssignmentNumberExternal` varchar(50) DEFAULT NULL,
`DateStart` datetime DEFAULT NULL,
`DateEnd` datetime DEFAULT NULL,
`DateDeadline` datetime DEFAULT NULL,
`DateCreated` datetime DEFAULT NULL,
`Deleted` datetime DEFAULT NULL,
`Lat` double DEFAULT NULL,
`Lon` double DEFAULT NULL,
PRIMARY KEY (`Id`),
KEY `idx_assignment_assignment_type_id` (`AssignmentTypeId`),
KEY `idx_assignment_assignment_number` (`AssignmentNumber`),
KEY `idx_assignment_assignment_number_external`
(`AssignmentNumberExternal`)
) ENGINE=InnoDB AUTO_INCREMENT=5280 DEFAULT CHARSET=utf8;
Table ExtraFields
CREATE TABLE `ExtraFields` (
`assignment_id` int(10) unsigned NOT NULL,
`name` varchar(30) NOT NULL,
`value` text,
PRIMARY KEY (`assignment_id`,`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
My search query
SELECT
`Assignment`.`Id`, COL_5_72, COL_5_73, COL_5_74, COL_5_75, COL_5_76,
COL_5_77 FROM (
SELECT
`Assignment`.`Id`,
`Assignment`.`AssignmentNumber` AS COL_5_72,
`Assignment`.`AssignmentNumberExternal` AS COL_5_73 ,
`AssignmentType`.`Name` AS COL_5_74,
`Assignment`.`DateStart` AS COL_5_75,
`Assignment`.`DateEnd` AS COL_5_76,
`Assignment`.`DateDeadline` AS COL_5_77 FROM `Assignment`
CASE WHEN `ExtraField`.`Name` = "WorkDistrict" THEN
`ExtraField`.`Value` end as COL_5_78 FROM `Assignment`
LEFT JOIN `ExtraFields` as `ExtraField` on
`ExtraField`.`assignment_id` = `Assignment`.`Id`
WHERE `Assignment`.`Deleted` IS NULL -- Assignment should not be removed.
AND (1=1) -- Add assignment filters.
) AS q1
GROUP BY `Assignment`.`Id`
HAVING 1 = 1
AND COL_5_78 LIKE '%Amsterdam East%'
ORDER BY COL_5_72 ASC, COL_5_73 ASC;
When the table is only around 3500 records my query takes a couple of seconds to execute and return the results.
What is a better way to search in the related data? Should I just add a JSON field to the Assignment table and use the MySQL 5.7 Json query features? Or did I made a mistake in designing my database?
You are using select from subquery that forces MySQL to create unindexed temp table for each execution. Remove subquery (you really don't need it here) and it will be much faster.

MySQL Left join take too long for huge data

I have two tables .Property tables and it related photo.One property may have many photo but I want only one any of it related photo, When I use left join MySQL query it become too slow.
Here is my query
SELECT `sp_property`.`id` as propertyid, `sp_property`.`property_name`, `sp_property`.`property_price`, `sp_property`.`adv_type`, `sp_property`.`usd`, `images`.`filepath_name`
FROM (`sp_property`)
LEFT JOIN (select id, Max(property_id) as pid,filepath_name
from sp_property_images
group by property_id) `images`
ON `images`.`pid` = `sp_property`.`id`
WHERE `sp_property`.`published` = 'yes'
GROUP BY `propertyid`
ORDER BY `sp_property`.`feature_listing` desc, `submit_date` desc
LIMIT 1,20
CREATE TABLE IF NOT EXISTS `sp_property_images` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`property_id` varchar(100) NOT NULL,
`filepath_name` text,
`label_name` varchar(45) DEFAULT NULL,
`primary` char(10) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `property_id` (`property_id`),
KEY `id` (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=12941 ;
CREATE TABLE IF NOT EXISTS `sp_property` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`propertytype` varchar(50) NOT NULL,
`adv_type` varchar(45) NOT NULL,
`property_name` text,
`division` varchar(45) NOT NULL,
`township` varchar(45) NOT NULL,
`property_price` decimal(20,2) unsigned DEFAULT NULL,
`price_type` varchar(45) NOT NULL,
`availability` varchar(100) DEFAULT NULL,
`property_address` text,
`p_dimension_length` varchar(45) NOT NULL,
`p_dimension_width` varchar(45) NOT NULL,
`p_dimension_sqft` varchar(45) NOT NULL,
`p_dimension_acre` varchar(45) NOT NULL,
`floor` varchar(45) NOT NULL,
`phone` varchar(100) DEFAULT NULL,
`aircorn` varchar(45) NOT NULL,
`ownership` varchar(45) NOT NULL,
`bedroom` varchar(45) NOT NULL,
`bathroom` varchar(45) NOT NULL,
`special_feature` text,
`amentites` text,
`property_detail` text,
`submit_date` datetime DEFAULT NULL,
`published` varchar(45) NOT NULL,
`published_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`agent_id` varchar(45) NOT NULL,
`source` varchar(45) NOT NULL,
`contact_name` varchar(100) NOT NULL,
`contact_no` varchar(100) NOT NULL,
`contact_address` text NOT NULL,
`contact_email` varchar(100) NOT NULL,
`unit_type` varchar(100) DEFAULT NULL,
`map_lat` varchar(100) DEFAULT NULL,
`map_long` varchar(100) DEFAULT NULL,
`show_map` varchar(3) DEFAULT 'no',
`total_view` bigint(20) NOT NULL DEFAULT '0',
`feature_listing` varchar(10) NOT NULL DEFAULT 'no',
`new_homes_id` int(11) NOT NULL,
`publish_price` int(1) NOT NULL DEFAULT '0',
`usd` decimal(20,2) NOT NULL,
`tag_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=18524 ;
Have you added indices on your tables? You would need three indices on the following columns:
article_photo.a_id for grouping and joining
article_photo.p_id for sorting
article.a_id for joining (although this is hopefully already the PK of your table)
The result of joins is not guaranteed to be sorted in any order, so you probably want to move your ORDER BY clause from the subquery to the outer query:
SELECT * from `article`
LEFT JOIN (
SELECT * from `article_photo`
GROUP BY `a_id`) as images
ON article.a_id = images.a_id
ORDER BY images.p_id DESC
Also, you have no guarantee on which article_photo you will get, since select data without an aggregate function (and only MySQL will allow you to do that).
Now that the question contains the real tables and all information essential to answering, here's my take – first, here's your query:
SELECT `sp_property`.`id` as propertyid, `sp_property`.`property_name`, `sp_property`.`property_price`, `sp_property`.`adv_type`, `sp_property`.`usd`, `images`.`filepath_name`
FROM (`sp_property`)
LEFT JOIN (select id, Max(property_id) as pid,filepath_name
from sp_property_images
group by property_id) `images`
ON `images`.`pid` = `sp_property`.`id`
WHERE `sp_property`.`published` = 'yes'
GROUP BY `propertyid`
ORDER BY `sp_property`.`feature_listing` desc, `submit_date` desc
LIMIT 1,20
Let's see. you are joining sp_property_images.property_id with sp_property.id. These columns have different types (int vs. varchar) and I suppose this results in a severe performance penalty (since the values have to be converted to the same type).
Then, you are filtering by sp_property.published, so I suggest adding an index on this column as well. Also, examine if you really need to have this column as varchar. A bool/bit flag probably suffices as well (if it doesn't, an enum might be a better choice still).
Ordering benefits from an index too. Add an index spanning both columns sp_property.feature_listing and sp_property.submit_date.
If all of the above still doesn't help, you might have to remove the sub-select. It might prevent the mysql engine from recognizing (and using!) the index you have defined on the sp_property_images.property_id column.
This simple query will give you what you asked for, all articles with their photos
SELECT ar.a_id, ar.a_title, ap.p_id, ap.photo_name
FROM article ar
JOIN article_photo ap on ar.a_id = ap.a_id
No reason for left join and grouping there or you wanna get sum on photos by article?

Split non-relational table into multiple relational tables

I have the following table of 60000 rows, in a MySQL 5 database, which is derived from a CSV file:
CREATE TABLE `smts_import` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`AscCode` varchar(5) NOT NULL,
`AscName` varchar(50) NOT NULL,
`RgnCode` varchar(5) NOT NULL,
`RgnName` varchar(100) NOT NULL,
`SCode` varchar(30) NOT NULL,
`SName` varchar(100) NOT NULL,
`AM` int(11) NOT NULL,
`AF` int(11) NOT NULL,
`LG` decimal(10,4) NOT NULL,
`LT` decimal(10,4) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `ars_codes` (`AscCode`,`RgnCode`,`SCode`),
KEY `s_code` (`SCode`)
);
and which contains much repeated data. SCodes are unique, and the relationship between the RgnCode and SCode fields is many-to-one, as is that between the AscCode and RgnCode fields.
I want to split up (normalize) the data into three separate tables in a relational manner, such that no data are repeated:
CREATE TABLE `ascs` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`a_code` varchar(6) NOT NULL,
`a_name` varchar(100) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `a_code` (`a_code`)
);
CREATE TABLE `rgns` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`a_id` int(11) NOT NULL,
`r_code` varchar(6) NOT NULL,
`r_name` varchar(100) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `a_id`+`r_code` (`a_id`, `r_code`),
KEY `r_code` (`r_code`)
);
CREATE TABLE `sms` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`r_id` int(11) NOT NULL,
`s_code` varchar(16) NOT NULL,
`s_name` varchar(100) NOT NULL,
`a_m` int(11) NOT NULL,
`a_f` int(11) NOT NULL,
`lg` decimal(10,4) NOT NULL,
`lt` decimal(10,4) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `r_id+s_code` (`r_id`,`s_code`),
KEY `s_code` (`s_code`)
);
where rgns.a_id is a foreign key on ascs.id, and sms.r_id is a foreign key on rgns.id.
I've created the three tables and successfully populated the first two, ascs and rgns, with unique data from the smts_import table. My problem comes when I try to populate the third table sms with just the SCode, SName, AM, AF, LG and LT fields from the smts_import table, PLUS the appropriate id from the rgns table. And here I just get lost, I've tried many variations on the following:
INSERT INTO sms (r_id, s_code, s_name, a_m, a_f, lg, lt)
SELECT DISTINCT sr.id, SCode, SName, AM, AF, LG, LT
FROM sms_import AS si, rgns AS sr, ascs AS sa
WHERE (sr.r_code = si.RgnCode)
AND (sr.a_id = sa.id)
AND (sa.a_code = si.AscCode)
ORDER BY SCode
but I just end up with too many records. How do I write this insert statement to get the appropriate fields from all records from the sms_import table, plus the correct values in the sms.r_id field from the rgns.id field?
Thanks for your help
"the relationship between the RgnCode and SCode fields is many-to-one"
So you also get many rgns.id per SCode.
If you select distinct rgns.id, SCode, etc. you will get each SCode more than once depending on how many RgnCode you have for it.
I would suggest adding a id column to the sms table and create a separate table sms2rgns which contains the one-to-many relations sms.id -> rgns.id